Testing Financial Data

When I worked for a software house in the trading industry, it was my responsibility to test the financial data that would then be formatted in different charting or analytics engines. The data was served through a REST API, with Azure blobs as the underlying storage. When testing financial data, we are looking for correctness and consistency. As a tester, you must approach it from the angle of: where are the inconsistencies, and where is the data incorrect? If we only look for positives, we will rarely find negatives - look into the valence effect.

Ideally, when testing time series data, you have access to the oracle data source, which could be an SQL database, MongoDB, or cloud storage. You can then formulate queries that filter and sort the data in the same way the API does. This logic can usually be lifted straight from the API code and run against the data source, though sometimes that is difficult (for example, with cloud storage spanning multiple accounts).
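The comparison against an oracle can be sketched in a few lines. This is a minimal illustration, not a real integration: the two fetch functions are hypothetical stand-ins for the REST API call and the oracle query, stubbed with made-up data here so the comparison logic is runnable.

```python
import pandas as pd

# Hypothetical stand-in: in practice this would call the REST API under test.
def fetch_from_api():
    return pd.DataFrame({
        "timestamp": pd.to_datetime(["2024-02-17 00:00", "2024-02-17 00:15"]),
        "close": [52000.0, 52150.0],
    })

# Hypothetical stand-in: in practice this would query the oracle source,
# e.g. pd.read_sql("SELECT timestamp, close ... ORDER BY timestamp", conn).
def fetch_from_oracle():
    return pd.DataFrame({
        "timestamp": pd.to_datetime(["2024-02-17 00:00", "2024-02-17 00:15"]),
        "close": [52000.0, 52150.0],
    })

def compare(api_df, oracle_df):
    # Sort both sides the same way the API does before comparing,
    # so ordering differences don't mask (or fake) data differences.
    api = api_df.sort_values("timestamp").reset_index(drop=True)
    oracle = oracle_df.sort_values("timestamp").reset_index(drop=True)
    return api.equals(oracle)

print(compare(fetch_from_api(), fetch_from_oracle()))  # True when they agree
```

The important design point is that the sorting and filtering live in the test, mirroring the API's behaviour, so a disagreement points at a genuine data problem rather than a presentation difference.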

Questions You Should Ask When Testing Time Series Data

  • Is the format correct?
  • What are the observable boundaries? Hour? Day? Minute? Year?
  • What are the "invisible" boundaries? Are multiple blob files being parsed? Is there any sharding or sorting going on before we see the data?
  • Are there more formats that we can use as a comparison?
  • Do we have an oracle (source of truth) data source?
  • What parameters are required to fetch data?
  • Are the time buckets inclusive or not?
  • Is there depth to the trades (multiple trades in a time bucket)?
  • What is the min/max granularity?
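Some of these questions, such as whether time buckets are inclusive, can be probed with a small script. A minimal sketch using pandas, with made-up trade data, showing how a trade that lands exactly on a bucket boundary gets assigned:

```python
import pandas as pd

# Three trades; the middle one lands exactly on a 15-minute boundary.
trades = pd.DataFrame(
    {"price": [100.0, 101.0, 102.0]},
    index=pd.to_datetime([
        "2024-02-17 00:05", "2024-02-17 00:15", "2024-02-17 00:20",
    ]),
)

# pandas buckets are closed on the left by default: the trade at exactly
# 00:15:00 falls into the 00:15 bucket, not the 00:00 bucket. The API
# under test may choose the opposite convention - which is exactly what
# you need to find out before comparing its output to anything.
buckets = trades["price"].resample("15min").count()
print(buckets)
```

Running this shows one trade in the 00:00 bucket and two in the 00:15 bucket; if the API reports otherwise, its buckets are closed on the other side.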

Jargon

  • Buckets
  • Delta
  • Time series data
  • OHLC / Candlestick chart
  • VWAP
  • Granularity
  • Pandas
  • CSV
  • Snapshot testing
  • Dark deployment
  • Crossed price
  • Bid
  • Ask
  • Spread

What Tools Can Be Used?

When it comes to larger data sets, comparison tools will be your friend, but even with the help of such tools, there is a limit to their effectiveness. As data scales, the testing required will have to be algorithmic and able to process the data at scale in a reasonable time.
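An algorithmic comparison can be as simple as one vectorised merge. A minimal sketch, assuming pandas; the two DataFrames are small in-memory stand-ins for what would in practice be large files loaded from CSV or Parquet:

```python
import pandas as pd

# Expected (oracle) data vs. actual (API) data - made-up values.
expected = pd.DataFrame({"timestamp": [1, 2, 3, 4],
                         "close": [10.0, 11.0, 12.0, 13.0]})
actual = pd.DataFrame({"timestamp": [1, 2, 4],
                       "close": [10.0, 11.5, 13.0]})

# An outer merge with indicator=True finds rows present on only one side
# in a single pass, which scales far better than eyeballing a diff.
merged = expected.merge(actual, on="timestamp", how="outer",
                        suffixes=("_expected", "_actual"), indicator=True)
missing = merged[merged["_merge"] != "both"]
mismatched = merged[(merged["_merge"] == "both")
                    & (merged["close_expected"] != merged["close_actual"])]
print(len(missing), len(mismatched))  # one missing row, one mismatched value
```

Here timestamp 3 is missing from the actual data and timestamp 2 disagrees on price, and both surface without any manual inspection.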

One cheat code that I found when working with larger data sets was to try and chart the data out to look for anomalies. When visualizing data, certain anomalies will jump out at you visually, which would not be obvious when looking at a CSV, for example. One example of an anomaly that I would look out for was a crossed price. A crossed price in the context of the data I was testing was when a bid was higher than an ask price.

If that doesn’t make sense to you straight away, imagine a guitar listed on eBay with an asking price of £30. You wouldn’t expect to see a bid of £50, because any buyer would simply pay the £30 the seller is already willing to accept. The same logic applies in trading: the best bid should always sit below the best ask.

Reverse Engineering a Query

Have a look at the Google chart above, and let’s try and reverse engineer how the query could look. The parameters could look something like:

From_date = 2024-02-17T00:00:00Z (nanosecond precision)
Until_date = 2024-02-18T00:00:00Z (nanosecond precision)
Granularity = 15 minutes
Asset_pair = BTC/USD

Which would look like below in a URI:

https://api.google.finance.com/cryptocurrency/v2/api?From_date=2024-02-17T00:00:00Z&Until_date=2024-02-18T00:00:00Z&Granularity=15&Asset_pair=BTC/USD
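Building the URI from those parameters is worth doing in code rather than by hand, because query-string encoding is itself a source of bugs (note what happens to the `/` in `BTC/USD`). A minimal sketch using the standard library; the endpoint and parameter names are the hypothetical, reverse-engineered ones from above, not a real Google API:

```python
from urllib.parse import urlencode

# Hypothetical endpoint reverse engineered from the chart - not a real API.
base = "https://api.google.finance.com/cryptocurrency/v2/api"
params = {
    "From_date": "2024-02-17T00:00:00Z",
    "Until_date": "2024-02-18T00:00:00Z",
    "Granularity": 15,
    "Asset_pair": "BTC/USD",
}

# urlencode percent-encodes reserved characters: the colons in the
# timestamps become %3A and "BTC/USD" becomes "BTC%2FUSD".
uri = f"{base}?{urlencode(params)}"
print(uri)
```

Whether the API accepts the raw `/`, the encoded `%2F`, or both is itself a good test case.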

Experiments to Try

Here are some experiments I would try:

  • Set the granularity to a random, rarely used number such as 17. Observe how the data behaves, paying special attention to the first and last buckets. Notably, when the requested range does not divide evenly by 17 minutes, is the final partial bucket truncated, extended past the requested end time, or dropped entirely?
  • Try leap years or 29th February on a valid leap year and an invalid leap year. Does the API catch the error or just continue as normal? Internally, are there any warnings?
  • Is the timestamp required to have the exact length of nanoseconds, or does it intelligently calculate the date when the smallest value is not provided?
  • Try other asset pairs. Try pairs that don’t exist.
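The granularity-17 experiment can be set up with simple date arithmetic. A minimal sketch, using only the standard library, showing why the last bucket is interesting: a 24-hour day is 1440 minutes, which does not divide evenly by 17.

```python
from datetime import datetime, timedelta

# Generate 17-minute bucket start times across a one-day window.
start = datetime(2024, 2, 17)
end = datetime(2024, 2, 18)
step = timedelta(minutes=17)

edges = []
t = start
while t < end:
    edges.append(t)
    t += step

# 1440 / 17 = 84.7..., so there are 85 buckets and the last one starts at
# 23:48 and would run to 00:05 the next day. Whether the API truncates it
# at midnight, lets it spill past the requested end, or drops it is
# exactly what this experiment should reveal.
print(len(edges), edges[-1])
```

The same loop with `datetime(2023, 2, 29)` raises a `ValueError` in Python, which is a useful baseline when checking how the API handles invalid leap-year dates.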

