In our previous post on datasets, Data Considerations In Systematic Strategies, we looked at the various kinds of data that can serve as input variables in a systematic strategy. However, anyone who has worked with data will testify that getting raw data is one thing and using it is entirely another.

Data Processing

Data can be procured in raw form from exchanges, regulators, or professional data service providers. Since any systematic strategy is only as good as the data behind it, it is essential to procure data from reliable sources alone. Once this is done, the next step is to verify, validate, clean, and store the data; this is probably the most unglamorous, yet critical, part of the entire strategy design process.

The first step is to cross-validate the data over some period, either against a different data source or, for traded data, against direct market observation. Almost every dataset has issues with missing data and outliers; how these are handled is a design choice and can have a meaningful impact on the outcome of the strategy. The next step is therefore to process the data to account for such missing values and outliers. There is no fixed way to do this, so it is essential to have a documented policy for handling such cases. It is also important that datasets are consistent across time: any structural changes that have occurred will have to be accounted for (e.g., corporate actions in stocks, credit rating changes in sovereign bonds). Depending on the application, there might be a requirement for two parallel datasets, one adjusted for such breaks, and the other point-in-time and hence unadjusted. Once this is done, the final step is to store the data locally or on the cloud for efficient use. Most large financial market participants have a separate team that manages and processes the data, and its operation is critical to multiple businesses.
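As a minimal pandas sketch of these steps, consider a daily price series with gaps and a bad tick. The forward-fill policy, the rolling-median window, the 20% outlier threshold, and the 2:1 split date are all illustrative assumptions, not a fixed industry standard.

```python
import numpy as np
import pandas as pd

# Hypothetical daily close series with two gaps and one bad tick (5000.0).
prices = pd.Series(
    [100.0, 101.5, np.nan, 102.0, 5000.0, 103.0, np.nan, 104.0],
    index=pd.date_range("2024-01-01", periods=8, freq="B"),
    name="close",
)

# Step 1 -- missing values: forward-fill, one common documented policy
# for price data (the last traded price is carried forward).
cleaned = prices.ffill()

# Step 2 -- outliers: flag points that deviate from a centred rolling
# median by more than 20% (window and threshold are assumptions here),
# then drop and re-fill them under the same policy.
rolling_median = cleaned.rolling(5, center=True, min_periods=1).median()
outlier = (cleaned / rolling_median - 1).abs() > 0.20
clean = cleaned.mask(outlier).ffill()

# Step 3 -- parallel datasets: keep `clean` as the point-in-time
# (unadjusted) series and build a split-adjusted twin, assuming a
# hypothetical 2:1 split effective on the sixth day.
split_date = clean.index[5]
factor = pd.Series(1.0, index=clean.index)
factor[clean.index < split_date] = 2.0  # pre-split prices are halved
adjusted = clean / factor
```

Keeping both `clean` and `adjusted` mirrors the two-dataset design choice above: backtests use the adjusted series, while point-in-time analysis uses the unadjusted one.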

The Data Edge

It is commonly said that most businesses are now in the business of data, and ‘data as an edge’ is especially true for an investment strategy. A great deal of time and effort is devoted to identifying new and potential data sources, as well as to analyzing and creating proprietary datasets. Researchers and analysts frequently examine varied data sources and devise measures and indicators that could lead to meaningful outcomes. Alternative data sources such as satellite imagery, aggregated financial transactions, and sentiment scores have become increasingly popular. With the constant increase in computing power, there is an ever-growing trove of data available to process and analyze.

If a data source does have some predictive power, the general rule of thumb is that its ease of availability and use is inversely proportional to the length of time it continues to add value; i.e., a proprietary dataset could have a much longer shelf life as an input variable than a publicly available one. This is especially true in the systematic investing world, and hence there is a constant quest to find the next edge: a dataset that can create sustainable value.
