ETF Insight: The dangers of smart beta backtesting

Scott Longley


Smart beta has a problem, and it lies in the use of data to justify newer, fancier factors which, it might fairly be argued, are designed more to capture the imagination of investors than to reflect what the numbers actually show.

Such would be the conclusion of recent research from Scientific Beta, which has released two papers looking into the definitions of factors used by many of the leading providers in this area.

At the heart of the problem is the ability of researchers to wield ever greater computing power in search of factors which ‘work’ in a given dataset. But as Felix Goltz, research director at Scientific Beta, says, these factors will have no relevance outside the original dataset due to selection biases.

Hence the backtest performance can evaporate once the factor goes live. “Product providers explicitly acknowledge that the guiding principle behind factor definitions is to analyse a large number of possible combinations in short data sets and then retain the factors that deliver the highest backtest performance,” he says.

“If this performance is due to patterns that are specific to the sample, we are unlikely to detect them in backtests for different regions, or in backtests with deeper histories. Likewise, we are unlikely to detect similar performance once the factor goes live.”
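The selection effect Goltz describes can be illustrated with a small simulation. The Python sketch below is a hypothetical illustration, not Scientific Beta's test protocol: it generates 500 candidate "factors" whose true expected return is zero, backtests each over a short five-year window, keeps the one with the best Sharpe ratio and then checks how that same factor fares over a further 20 years. The number of candidates, the window lengths and the volatility are all arbitrary assumptions.

```python
import random
import statistics


def sharpe(returns):
    """Annualised Sharpe ratio of monthly excess returns (risk-free rate assumed zero)."""
    return statistics.mean(returns) / statistics.stdev(returns) * (12 ** 0.5)


def snoop(n_candidates=500, n_in=60, n_out=240, monthly_vol=0.04, seed=1):
    """Pick the best of many pure-noise 'factors' on a short backtest.

    Returns (best backtest Sharpe, Sharpe of that same factor out of sample).
    All parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)
    best_in, best_out = float("-inf"), None
    for _ in range(n_candidates):
        # Every candidate has a true expected return of zero: any apparent
        # edge in the backtest window is a pattern specific to that sample.
        in_sample = [rng.gauss(0.0, monthly_vol) for _ in range(n_in)]
        out_sample = [rng.gauss(0.0, monthly_vol) for _ in range(n_out)]
        s_in = sharpe(in_sample)
        # Keep only the factor with the highest backtest performance.
        if s_in > best_in:
            best_in, best_out = s_in, sharpe(out_sample)
    return best_in, best_out


best_in, best_out = snoop()
print(f"best backtest Sharpe: {best_in:.2f}, same factor afterwards: {best_out:.2f}")
```

Because every candidate is pure noise, the winning backtest Sharpe ratio is an artefact of having searched widely, and the same factor's out-of-sample Sharpe ratio will typically sit close to zero. Raising `n_candidates` tends to push the best backtest higher without improving what happens afterwards.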

For Nicolas Rabener at FactorResearch the problem goes to the heart of the promotion of smart beta. “The purpose of backtesting is to simulate the historical performance of a strategy and the results are commonly used for marketing investment products,” he says.

The better the backtesting, the easier it is to sell the final product – and crucially, of course, investors never get to see the bad backtests. “Unfortunately, attractive characteristics in backtesting are often based on data mining, which results in far less attractive performance once the product is live,” he adds. “Data-snooping describes identifying statistically relevant relationships which are, in fact, the result of spurious patterns.”

Data-snooping is explained in more detail by Vitali Kalesnik, partner and director of research for Europe at Research Affiliates, who points out that the issue lies in how data-mining is applied. In areas such as high-frequency trading, for instance, state-of-the-art machine-learning tools can genuinely be used to spot regularities.

But in higher-capacity areas such as factor investing, with longer holding periods, the data-mining exercise “often turns into the data-snooping exercise of finding spurious relationships in the otherwise random return realisations”.

He went that-a-way

There is an element of quantum theory about why a factor can flunk when it reaches a live environment. The ‘discovery’ of the factor will often be accompanied by the publication of an academic paper which explicitly sets out the case – and gives fund managers the opportunity to implement it and quickly arbitrage away any excess returns.

In other words, once it is seen, it is often gone. Rabener points to an example from the early 1980s of research which identified highly attractive small-cap returns that swiftly disappeared once the paper was published.

But backtesting results disappearing in the light of day is not the only issue. Kalesnik says that the “elephant in the room” is the issue of transaction costs.

“Most studies fail to incorporate the realistic transaction costs into the return estimates,” he says. “Factors requiring higher turnover tend to have prohibitively high transaction costs rendering strategies trying to benefit from them useless at best.”
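Kalesnik's point about turnover can be made concrete with simple arithmetic. The sketch below treats the annual cost drag as turnover multiplied by a round-trip trading cost, a common first approximation that ignores, among other things, market impact rising with fund size. The 4% gross premium, the two turnover levels and the 0.5% round-trip cost are hypothetical inputs, not figures from Research Affiliates.

```python
def cost_drag(annual_turnover: float, round_trip_cost: float) -> float:
    """Return lost to trading per year, as a fraction of the portfolio.

    annual_turnover: fraction of the portfolio replaced per year (3.0 = 300%).
    round_trip_cost: assumed cost of selling one position and buying its
    replacement, as a fraction of the amount traded.
    """
    return annual_turnover * round_trip_cost


def net_premium(gross_premium: float, annual_turnover: float, round_trip_cost: float) -> float:
    """Factor premium left over after the simple turnover-times-cost drag."""
    return gross_premium - cost_drag(annual_turnover, round_trip_cost)


def breakeven_cost(gross_premium: float, annual_turnover: float) -> float:
    """Round-trip cost at which trading consumes the entire gross premium."""
    return gross_premium / annual_turnover


# Hypothetical inputs: the same 4% gross premium harvested at two turnover
# levels, both traded at an assumed 0.5% round-trip cost.
for label, turnover in [("low turnover", 0.2), ("high turnover", 3.0)]:
    net = net_premium(0.04, turnover, 0.005)
    be = breakeven_cost(0.04, turnover)
    print(f"{label}: net premium {net:.2%}, break-even cost {be:.2%}")
```

With these inputs the low-turnover version keeps 3.90% of its 4% premium and could absorb round-trip costs of up to 20% before breaking even, while the high-turnover version keeps 2.50% and is wiped out once costs pass roughly 1.33%.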

He points to his company’s own research, which has shown that most mutual funds targeting momentum failed to deliver outperformance even during periods when theoretical momentum portfolios outperformed.

Factor proliferation

But even if a fund avoids the data-snooping trap and still performs net of transaction costs, the final hurdle it faces is crowding.

“Only the more risky and harder to hold on to strategies still leave a potential for outperformance, but again often with reduced potency,” Kalesnik adds.

As Scientific Beta and others have highlighted, the proliferation of factors is a major concern, with fluid definitions supporting a whole ecosystem.

This is a far cry from the findings of stricter academic research, which at most suggests there are just five factors that can reward investors for the risk taken – value, size, momentum, low volatility and quality.

Yet this is complicated by the fact that there is no single definition for any factor, despite the reams of academic research devoted to the area.

“Even if a particular factor definition is favoured by academic models it may prove to be inadequate for practical investment purposes,” says Kalesnik.

He points to the example of price-to-book as a definition of value. “When Fama and French chose it for their factor model, no regard was paid to transaction costs. Nor did Fama and French foresee that in the high-tech era intangibles, which are not captured by book value, would become a vitally important part of company valuations.”

Insurance policies

So what is an investor to do? For Goltz, independent research can help sort the wheat from the chaff. “Independent replication by third parties is required to conclude that a factor is not purely due to the specific test protocol and the dataset of a single provider,” he says. “Such replication studies are also needed to test whether the original results continue to hold after they have been published.”

Rabener suggests investors should focus on factors backed by academic research, which provides at least some reassurance that their performance is genuine. “Anything that looks too good to be true is likely to disappoint in realised performance,” he says.

Kalesnik makes a similar point. “Usually academic attention can serve as a strong validating signal. When a factor can be defined in multiple ways, and if the factor shows validity in many international markets, this signals that the factor is more likely to be real.”

Still, even with backtests and academic theory, there remains a chance that the selected factor will not work in the future. Here Kalesnik falls back on a familiar argument: investors should concentrate on the one area they can control with certainty, which is transaction costs.

“Focusing on strategies with higher capacity and lower transaction costs can be a prudent way to maximise returns,” he says.

The final piece of advice comes from Rabener. As is currently the case with value in particular, the risk with any factor is that it can suffer prolonged periods of underperformance. This requires investors to be “disciplined when harvesting factor returns and have a long-term perspective”.

ETF Insight is a new series brought to you by ETF Stream. Each week, we shine a light on the key issues from across the European ETF industry, analysing and interpreting the latest trends in the space.
