Demystifying big data: How to critically assess quantitative investment signals

Big data is one of the hottest buzzwords in the investment industry today. This is understandable given the prolific growth of data and data processing technology over the past several years and the intriguing applications of big data to equity investing.

It's easy for fiduciaries to be sold by investment managers touting the use of big data, since the alpha factors generated from it are novel and exciting. However, implementing big data concepts in investment strategies is technical and complex, which can make those strategies difficult to assess critically. We believe big data can be an attractive enhancement to any equity strategy, but the signals derived from the data must be assessed with proper skepticism.

As leaders in manager research, Russell Investments is in a privileged position. We meet with hundreds of managers—both quantitative and fundamental—each year, and we have seen the implementation of big data (both good and bad) firsthand over the past decade. This blog post is meant to demystify big data by helping fiduciaries understand what it is and what to look for when assessing its implementation into equity portfolios. If you haven't had enough after that, we also include a bonus section discussing the impact of big data on fundamental equity investors.

What is big data?

The terms big data, alternative data and unstructured data are often used interchangeably. There is significant overlap among the three, but they have slightly different meanings.

Big data refers to extremely large data sets that require advanced computing technology to process. Most unstructured and alternative data sets can be categorised as big data.
Unstructured data is information that is not organised in a consistent, well-defined manner. This makes it difficult to systematically analyse and assign meaning to the data. This is where two other big data buzzwords come into play: natural language processing (NLP) and machine learning (a type of artificial intelligence). These tools can analyse unstructured data and recognise patterns. One of the most commonly used unstructured data sets is transcripts from earnings calls with management. NLP can be used to systematically read the transcripts of thousands of companies and machine learning is used to differentiate between positive and negative words or phrases. The result is an indicator of management sentiment for a large group of companies. Typically, positive sentiment portends good future earnings and upward stock movement.
Alternative data sets are provided by sources other than the company whose stock is being assessed. The information is not commonly found on financial statements. Examples include social media data, satellite data and industry-specific data such as credit card transactions or oil-well data. A simple example of the application of credit card data is using transaction volume to predict quarterly sales for retail companies.
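The credit card example can be sketched as a simple least-squares fit of reported quarterly sales against card transaction volume. This is a minimal illustration only: the function name `fit_line` and every number below are made up for the example, and a real signal would need far more history, data cleansing and out-of-sample testing.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history for one retailer (illustrative values, not real data):
# observed card transaction volume and the sales later reported for each quarter.
card_volume = [1.0, 2.0, 3.0, 4.0]
reported_sales = [10.5, 20.0, 30.5, 40.0]

slope, intercept = fit_line(card_volume, reported_sales)

# Card volume for the current quarter is observed before sales are reported,
# so the fitted line gives an early estimate of the sales figure.
current_volume = 5.0
estimated_sales = intercept + slope * current_volume
print(f"Estimated quarterly sales: {estimated_sales:.1f}")
```

The point of the sketch is timing: the card data arrives before the company reports, so even a crude fit gives an estimate ahead of the official number.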

At Russell Investments, we refer to these data sets collectively as non-traditional.

How does one assess the implementation of big data?

Quantitative managers use data to generate signals that predict movements in stock prices. Our process for assessing alpha signals that use non-traditional data is consistent with how we assess signals that use traditional financial data. Below are some core questions we ask quantitative managers when assessing alpha signals:

Which signals have the highest risk-adjusted weight in the alpha model? Unsurprisingly, quantitative firms often tout their most complex, impressive signals in pitchbooks and presentations. We believe this may amount to marketing hype—don't fall for it! Why? In some cases, most of the alpha model is driven by undifferentiated signals like simple price momentum or book value to price measures, while the unique signals have minimal impact on performance. We focus on the key signals that drive performance—regardless of whether they use big data or not.
What's the economic intuition supporting the signal? At Russell Investments, we have an aversion to data miners, who construct signals based solely on past performance. We believe managers should have a fundamentally-oriented thesis, rather than throwing signals into a back-test and using the best performers. To us, technologically advanced signals are meaningless without intuition. In other words, we must be convinced that the inefficiency the manager is seeking to exploit exists.
What's the data source and how was the signal developed? Our team generally prefers proprietary data sources, but they are uncommon and often don't stay proprietary for long. As such, we generally pay more attention to how the signals that use the data are specified. We prefer signals that are developed in-house and creatively and thoughtfully specified over those that are purchased or widely available. Proprietary signals and data sources are less likely to be used by other investors, which decreases the probability of the excess return being arbitraged away.
Tell us more about the signal. We use our leverage as an influential institutional investor to demand transparency from investment managers. We are unimpressed when a manager simply states that they use non-traditional data. Our questions go far beyond the investment team's initial description of their data sets and signals. This is important since signal specification can have a significant impact on efficacy.

To further illustrate this point, let's return to the example of the management sentiment signal that uses text from earnings transcripts. Some managers use English word dictionaries based on academic papers to gauge sentiment. This leads to a very standard set of words, such as "strong" and "great", to imply positive sentiment. Other managers refine the signal by using proprietary dictionaries to determine word association. This includes using foreign language dictionaries for companies that do not use English as their default language and using strings of words to gauge sentiment. The refined approach can provide a more powerful signal that is likely to persist longer.
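The difference between a standard word list and a dictionary that also recognises strings of words can be sketched as follows. This is a toy illustration under stated assumptions: the dictionaries, the sample sentence and the function names `ngrams` and `sentiment_score` are all hypothetical, and the scoring rule (positive minus negative matches, divided by total matches) is just one simple choice, not any manager's actual method.

```python
import re


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentiment_score(text, positive, negative):
    """Score text in [-1, 1] by counting dictionary matches.

    `positive` and `negative` are sets of lowercase terms; a term may be a
    single word or a multi-word phrase. Returns 0.0 when nothing matches.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    longest = max((len(term.split()) for term in positive | negative), default=1)
    pos = neg = 0
    for n in range(1, longest + 1):
        for gram in ngrams(tokens, n):
            if gram in positive:
                pos += 1
            elif gram in negative:
                neg += 1
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Hypothetical transcript excerpt (not from a real earnings call).
excerpt = "Demand was strong, but we face strong headwinds in Europe."

# A standard single-word dictionary sees only two positive words.
word_only = sentiment_score(excerpt, {"strong", "great"}, {"weak"})

# Adding a phrase lets the score register that "strong headwinds" is bad news.
# Note: this sketch still counts the word "strong" inside the phrase as
# positive; a fuller version would resolve such overlaps.
with_phrases = sentiment_score(
    excerpt, {"strong", "great"}, {"weak", "strong headwinds"}
)

print(word_only, with_phrases)
```

Even in this toy case, the word-only dictionary reads the sentence as uniformly positive, while the phrase-aware version pulls the score down, which is the kind of specification detail worth asking a manager about.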

Other key areas to probe managers on (but that we won't cover here) include data cleansing, signal testing environment, technology infrastructure, risk models, qualitative overrides and trading / implementation.

Big data is powerful but be critical!

When supported by sound economic rationale and thoughtful implementation, we believe that big data and machine learning can be additive to any investment strategy. As big data continues to proliferate in the investment industry, some older, less refined, non-proprietary signals will become commoditised. That said, we believe that well-conditioned, proprietary signals using non-traditional data sources will continue to provide excess return, particularly over intermediate-to-long time horizons.

We hope this blog post will help other fiduciaries differentiate between skilled and unskilled application of non-traditional data and tools.

Bonus: How does increased use of big data affect fundamental managers?

One of the primary advantages of non-traditional data sets is that they are usually available at a higher frequency. This can be particularly advantageous for investment processes that rely on exploiting information that is relevant to short-to-medium term fundamentals (e.g., momentum-oriented managers targeting stocks they expect to benefit from positive earnings surprises and revisions).

We have spoken with some fundamental managers who have begun implementing big data in their processes, particularly in the consumer sector. For example, while companies typically report earnings on a quarterly basis, web-scraping techniques can be used to extract consumer preference trends from social media continuously, providing intra-quarter data that helps predict near-term sales. (On the use of social media data, see our previous blog post regarding data privacy concerns and their impact on Facebook.) Managers relying on traditional data might not have a good sense of sales until they see competitors or suppliers report that quarter.

We have also spoken with fundamental managers who use big data for idea generation. For example, some use machine learning and NLP to quickly process thousands of articles to identify investment ideas by targeting stocks mentioned with specific event-related keywords. While this is a relatively simple use of big data, it allows managers to cast a much wider net for new ideas.
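A stripped-down version of this idea screen might look like the sketch below. It uses plain keyword matching rather than machine learning or NLP, and everything in it is hypothetical: the function name `screen_articles`, the article format, the tickers and the keywords are assumptions made for illustration.

```python
from collections import Counter


def screen_articles(articles, keywords):
    """Count, per ticker, the articles that mention any event keyword.

    `articles` is a list of dicts with a "text" string and a "tickers" list.
    Matching is a case-insensitive substring test, which is deliberately crude.
    """
    lowered = [keyword.lower() for keyword in keywords]
    hits = Counter()
    for article in articles:
        text = article["text"].lower()
        if any(keyword in text for keyword in lowered):
            hits.update(article["tickers"])
    return hits


# Hypothetical articles and tickers (not real companies or news).
sample_articles = [
    {"tickers": ["AAA"], "text": "AAA announces a spin-off of its retail unit."},
    {"tickers": ["BBB"], "text": "BBB opens a new office."},
    {"tickers": ["AAA", "CCC"], "text": "CCC in merger talks with AAA."},
]
event_keywords = ["spin-off", "merger", "buyback"]

ideas = screen_articles(sample_articles, event_keywords)

# Tickers with the most event-related mentions come first.
for ticker, count in ideas.most_common():
    print(ticker, count)
```

The output is only a list of candidates for further fundamental research, which matches the "wider net" role described above rather than a trading signal in its own right.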

Ultimately, we think the combination of big data and fundamental research can be quite powerful. Still, the penetration of big data into fundamental equity investing remains in the very early stages. Most managers remain unsure of how to use the tools or have not considered using them. We have seen some promising applications of this technology and believe managers who implement it sooner will have an advantage. Managers whose processes rely on information relevant to short-to-medium term fundamentals, and who do not invest in big data, could find themselves front-run by investors who use higher-frequency big data or machine learning techniques. Big data can also be informative for those with longer time horizons, but we expect the impact to be smaller.