Testing Financial Data

When I worked for a software house in the trading industry, it was my responsibility to test the financial data that would then be formatted in different charting or analytics engines. The data was served through a REST API, with Azure blobs as the underlying storage. When testing financial data, we are looking for correctness and consistency. As a tester, you must approach it from the angle of: where are the inconsistencies, and where is the data incorrect? If we only look for positives, we will rarely find negatives - look into the valence effect.

Ideally, when testing time series data, you have access to the oracle data source, which could be an SQL database, MongoDB, or cloud storage. You can then formulate queries that filter and sort the data in the same way the API does. This logic can usually be lifted straight from the API code and run against the data source, though sometimes that is difficult (for example, with cloud storage spanning multiple accounts).
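The comparison against an oracle can be sketched in a few lines. This is a minimal illustration, not a real integration: the two fetch functions are hypothetical stand-ins for the REST API call and the oracle query, stubbed with made-up data here so the comparison logic is runnable.

```python
import pandas as pd

# Hypothetical stand-in: in practice this would call the REST API under test.
def fetch_from_api():
    return pd.DataFrame({
        "timestamp": pd.to_datetime(["2024-02-17 00:00", "2024-02-17 00:15"]),
        "close": [52000.0, 52150.0],
    })

# Hypothetical stand-in: in practice this would query the oracle source,
# e.g. pd.read_sql("SELECT timestamp, close ... ORDER BY timestamp", conn).
def fetch_from_oracle():
    return pd.DataFrame({
        "timestamp": pd.to_datetime(["2024-02-17 00:00", "2024-02-17 00:15"]),
        "close": [52000.0, 52150.0],
    })

def compare(api_df, oracle_df):
    # Sort both sides the same way the API does before comparing,
    # so ordering differences don't mask (or fake) data differences.
    api = api_df.sort_values("timestamp").reset_index(drop=True)
    oracle = oracle_df.sort_values("timestamp").reset_index(drop=True)
    return api.equals(oracle)

print(compare(fetch_from_api(), fetch_from_oracle()))  # True when they agree
```

The important design point is that the sorting and filtering live in the test, mirroring the API's behaviour, so a disagreement points at a genuine data problem rather than a presentation difference.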

Questions You Should Ask When Testing Time Series Data

  • Is the format correct?
  • What are the observable boundaries? Hour? Day? Minute? Year?
  • What are the "invisible" boundaries? Are multiple blob files being parsed? Is there any sharding or sorting going on before we see the data?
  • Are there more formats that we can use as a comparison?
  • Do we have an oracle (source of truth) data source?
  • What parameters are required to fetch data?
  • Are the time buckets inclusive or not?
  • Is there depth to the trades (multiple trades in a time bucket)?
  • What is the min/max granularity?
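Some of these questions, such as whether time buckets are inclusive, can be probed with a small script. A minimal sketch using pandas, with made-up trade data, showing how a trade that lands exactly on a bucket boundary gets assigned:

```python
import pandas as pd

# Three trades; the middle one lands exactly on a 15-minute boundary.
trades = pd.DataFrame(
    {"price": [100.0, 101.0, 102.0]},
    index=pd.to_datetime([
        "2024-02-17 00:05", "2024-02-17 00:15", "2024-02-17 00:20",
    ]),
)

# pandas buckets are closed on the left by default: the trade at exactly
# 00:15:00 falls into the 00:15 bucket, not the 00:00 bucket. The API
# under test may choose the opposite convention - which is exactly what
# you need to find out before comparing its output to anything.
buckets = trades["price"].resample("15min").count()
print(buckets)
```

Running this shows one trade in the 00:00 bucket and two in the 00:15 bucket; if the API reports otherwise, its buckets are closed on the other side.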

Jargon

  • Buckets
  • Delta
  • Time series data
  • OHLC / Candlestick chart
  • VWAP
  • Granularity
  • Pandas
  • CSV
  • Snapshot testing
  • Dark deployment
  • Crossed price
  • Bid
  • Ask
  • Spread

What Tools Can Be Used?

When it comes to larger data sets, comparison tools will be your friend, but even with the help of such tools, there is a limit to their effectiveness. As data scales, the testing required will have to be algorithmic and able to process the data at scale in a reasonable time.
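An algorithmic comparison can be as simple as one vectorised merge. A minimal sketch, assuming pandas; the two DataFrames are small in-memory stand-ins for what would in practice be large files loaded from CSV or Parquet:

```python
import pandas as pd

# Expected (oracle) data vs. actual (API) data - made-up values.
expected = pd.DataFrame({"timestamp": [1, 2, 3, 4],
                         "close": [10.0, 11.0, 12.0, 13.0]})
actual = pd.DataFrame({"timestamp": [1, 2, 4],
                       "close": [10.0, 11.5, 13.0]})

# An outer merge with indicator=True finds rows present on only one side
# in a single pass, which scales far better than eyeballing a diff.
merged = expected.merge(actual, on="timestamp", how="outer",
                        suffixes=("_expected", "_actual"), indicator=True)
missing = merged[merged["_merge"] != "both"]
mismatched = merged[(merged["_merge"] == "both")
                    & (merged["close_expected"] != merged["close_actual"])]
print(len(missing), len(mismatched))  # one missing row, one mismatched value
```

Here timestamp 3 is missing from the actual data and timestamp 2 disagrees on price, and both surface without any manual inspection.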

One cheat code that I found when working with larger data sets was to try and chart the data out to look for anomalies. When visualizing data, certain anomalies will jump out at you visually, which would not be obvious when looking at a CSV, for example. One example of an anomaly that I would look out for was a crossed price. A crossed price in the context of the data I was testing was when a bid was higher than an ask price.

If that doesn’t make sense to you straight away, imagine a guitar listed on eBay with an asking price of £30. You wouldn’t expect to see a bid of £50, because any buyer would simply pay the £30 the seller is already willing to accept. The same logic applies in trading: the best bid should always sit below the best ask.

Reverse Engineering a Query

Have a look at the Google chart above, and let’s try and reverse engineer how the query could look. The parameters could look something like:

From_date = 2024-02-17T00:00:00Z (nanosecond precision)
Until_date = 2024-02-18T00:00:00Z (nanosecond precision)
Granularity = 15 minutes
Asset_pair = BTC/USD

Which would look like below in a URI:

https://api.google.finance.com/cryptocurrency/v2/api?From_date=2024-02-17T00:00:00Z&Until_date=2024-02-18T00:00:00Z&Granularity=15&Asset_pair=BTC/USD
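Building the URI from those parameters is worth doing in code rather than by hand, because query-string encoding is itself a source of bugs (note what happens to the `/` in `BTC/USD`). A minimal sketch using the standard library; the endpoint and parameter names are the hypothetical, reverse-engineered ones from above, not a real Google API:

```python
from urllib.parse import urlencode

# Hypothetical endpoint reverse engineered from the chart - not a real API.
base = "https://api.google.finance.com/cryptocurrency/v2/api"
params = {
    "From_date": "2024-02-17T00:00:00Z",
    "Until_date": "2024-02-18T00:00:00Z",
    "Granularity": 15,
    "Asset_pair": "BTC/USD",
}

# urlencode percent-encodes reserved characters: the colons in the
# timestamps become %3A and "BTC/USD" becomes "BTC%2FUSD".
uri = f"{base}?{urlencode(params)}"
print(uri)
```

Whether the API accepts the raw `/`, the encoded `%2F`, or both is itself a good test case.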

Experiments to Try

Here are some experiments I would try:

  • Set the granularity to a random, rarely used number such as 17. Observe how the data behaves, paying special attention to the first and last buckets. Notably, when the requested range does not divide evenly by 17 minutes, is the final partial bucket truncated, extended past the requested end time, or dropped entirely?
  • Try leap years or 29th February on a valid leap year and an invalid leap year. Does the API catch the error or just continue as normal? Internally, are there any warnings?
  • Is the timestamp required to have the exact length of nanoseconds, or does it intelligently calculate the date when the smallest value is not provided?
  • Try other asset pairs. Try pairs that don’t exist.
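The granularity-17 experiment can be set up with simple date arithmetic. A minimal sketch, using only the standard library, showing why the last bucket is interesting: a 24-hour day is 1440 minutes, which does not divide evenly by 17.

```python
from datetime import datetime, timedelta

# Generate 17-minute bucket start times across a one-day window.
start = datetime(2024, 2, 17)
end = datetime(2024, 2, 18)
step = timedelta(minutes=17)

edges = []
t = start
while t < end:
    edges.append(t)
    t += step

# 1440 / 17 = 84.7..., so there are 85 buckets and the last one starts at
# 23:48 and would run to 00:05 the next day. Whether the API truncates it
# at midnight, lets it spill past the requested end, or drops it is
# exactly what this experiment should reveal.
print(len(edges), edges[-1])
```

The same loop with `datetime(2023, 2, 29)` raises a `ValueError` in Python, which is a useful baseline when checking how the API handles invalid leap-year dates.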

