Passive funds make great play of their index tracking. The idea is, of course, alluring and simple.

Do away with that idiosyncratic manager and replace him or her with a rules-based index which tells the exchange traded fund which stocks or bonds to buy and sell.

As a result, expenses are reduced (no overpaid active fund manager), behavioural quirks are minimised and the investor gets to buy into 'the market' - in a diversified, liquid manner.

In sum, the revolution is passive and the instrument of disruption is the index. But there's a very real problem - that index. By and large these crucial creations are managed by huge corporations which charge fees for every single aspect of the index creation process. The exchange traded funds must pay not only for the right to track the index but also for the data feeds that flow from it.

Rather more pertinently the index firms have become the centre of attention, with each of the major platforms (MSCI, FTSE Russell, S&P Dow Jones, Stoxx) copying each other's iterations of an index. The net effect of this laudable competition is that all the major firms have roughly similar subsets of every imaginable index - be it sector, thematic or smart beta based.

The index firms don't of course develop their indices in isolation. They talk constantly to fund researchers and buyers. They also talk incessantly to ETF issuers about which variant of an index might work best. But in reality, it's all a fairly closed loop, probably led by marketers working out what they can sell next and to whom, as it has always been in the world of investment funds.

Surely though there's room for an alternative approach to index construction, with two ideas in particular jumping to my mind, one focused on the crowd and an open architecture, the other on machine learning. Let's take each in turn.

In our hyper connected age, we've become used to the idea of both open architectures and the power of the crowd. In the first, we choose to build a product (an index, or a software programme) by first being open about its construction and then encouraging the crowd to help develop and build out the product. This model doesn't have to be entirely anarchic but can look a bit more like the open software architecture pioneered by the Mozilla Foundation, which has developed Firefox and Thunderbird - both hugely successful platforms. A central organisation floats an idea but then the crowd helps develop it, tailoring it to specific markets and individual requirements, and perfecting the product over numerous iterations.

The same model could work easily within the index space. One core concern of many investors is that they are forced to invest in the indices and benchmarks imposed on them by exchanges and issuers. Ordinary investors are not necessarily listened to - take ESG, a potentially huge area for open source indexing. Many millennials are hugely interested in a more benign form of investing that allows for climate change impacts as well as broader CSR interests. The current model of listening to these concerns is what one could politely call a focus-panel-influenced command and control model. An index firm gets a call from a leading pension fund or institution with an ethical focus, and it builds an index to accommodate said major investor. What about an alternative crowd based approach where lobby groups and activists concerned about building a better governance model collaborate with ordinary investors to build an open source, constantly evolving ESG index?

Ditch the human

A more radical model might be to look at machine learning. The collective, open sourced model suggested just now is still based on normative judgements, i.e. things that you or I think we might like to see in an index. The existing command and control model beloved of index firms is by contrast based on a more positivistic, data driven model, but it is nevertheless still created by a feedback loop between concerned human actors, i.e. issuers and investors. Why not take the humans away altogether and let the machines do the hard work?

This approach is more radical and asks a more basic question - are humans best placed to make decisions about what works in a complex investment ecosystem? Of course quantitative investing has been around for decades and has helped to lay the foundations for both smart beta and factor investing. And yes, quant investing makes extensive use of machines, but under intensive human control and tutelage. In reality the machines are subordinated to a calculating role, informed by humans who decide the factors and measures.

Why not banish the humans altogether in this process of building a quant based index and rely on machine learning to find out which stocks to pick? Or, as analysts at SocGen have recently asked, can machine learning build us a better stock screen?

A research paper by Georgios Oikonomou and Andrew Lapthorne dives into the world of quant driven stock screening but in truth the exact same question could be asked of any smart beta index - why not get artificial intelligence to build the right index and ditch the human index creators altogether?

The SG analysts define their task as follows: "Our goal is to train our models to predict the direction of a stock based on a set of input data, or as they are known in the Machine Learning world 'features'. Each stock in our sample is represented by a set of 80 fundamental and technical indicators."

The underlying data set comes from a global universe covering FTSE Developed World ex-financial companies from December 1989 through September 2017, with a wide set of 80 factors, "which cover both fundamental and technical indicators stretching across 10 broader factor groups. Our indicators include many traditional quant factors and provide a picture of the company's valuation, profitability, leverage, earnings quality, historical growth and capital allocation, as well as metrics of a stock's price and earnings momentum and risk."

Lapthorne and Oikonomou add one small twist - they focus on "the tails of the data [which] often produces better results, [so] we can remove some of the noise associated with less separable observations". In terms of the Machine Learning (ML) approaches used, they opt for the three most popular:

  • Support Vector Machine is a supervised learning algorithm that is considered by many as the best classifier to date due to its good performance in a wide range of classification tasks.
  • Random Forest is an "ensemble" learning method (or meta-algorithm). In ensemble methods, multiple classifiers are trained to solve the same problem and are then combined by taking a majority vote. In Random Forest, a collection of 'decision trees' is used.
  • AdaBoost (short for Adaptive Boosting) is the most popular boosting learning method. Like bagging, boosting is another ensemble learning method whereby the same classifier is trained multiple times. The idea of boosting is to combine a set of weak learners into a single strong classifier.
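To make the "tails" twist and the majority vote idea a little more concrete, here is a minimal Python sketch under loud assumptions. It first keeps only the best and worst performers from a made-up set of forward returns, then combines three hand-written rules by majority vote. The tickers, thresholds, feature names and function names are all invented for illustration. This is not the SG analysts' code, and fixed rules stand in for the trained decision trees a real Random Forest would use.

```python
# Illustrative only: tickers, returns, features and rules are all made up.

def label_tails(forward_returns, tail_size):
    """Label the worst `tail_size` stocks 0 (down) and the best 1 (up).

    Stocks in the middle of the distribution are dropped as noise.
    """
    ranked = sorted(forward_returns, key=forward_returns.get)
    if tail_size <= 0 or 2 * tail_size > len(ranked):
        raise ValueError("tail_size must fit twice into the sample")
    labels = {ticker: 0 for ticker in ranked[:tail_size]}
    labels.update({ticker: 1 for ticker in ranked[-tail_size:]})
    return labels


def majority_vote(classifiers, features):
    """Return 1 (up) if more than half of the classifiers vote up, else 0."""
    votes = [clf(features) for clf in classifiers]
    return 1 if 2 * sum(votes) > len(votes) else 0


# Three deliberately weak, hypothetical rules standing in for trained trees.
def cheap(f):
    return 1 if f["earnings_yield"] > 0.06 else 0


def rising(f):
    return 1 if f["momentum_12m"] > 0.0 else 0


def profitable(f):
    return 1 if f["return_on_equity"] > 0.10 else 0


forward_returns = {
    "AAA": 0.12, "BBB": -0.08, "CCC": 0.01, "DDD": 0.20, "EEE": -0.15,
    "FFF": 0.00, "GGG": 0.05, "HHH": -0.02, "III": 0.09, "JJJ": -0.11,
}
labels = label_tails(forward_returns, tail_size=3)

stock = {"earnings_yield": 0.08, "momentum_12m": -0.03, "return_on_equity": 0.15}
prediction = majority_vote([cheap, rising, profitable], stock)

print(sorted(labels.items()))
print("ensemble vote:", prediction)
```

In a real study the rules would be learned from the labelled tails rather than written by hand, which is precisely where the human steps back.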

The result? ML, not surprisingly, is a success. "The winner of our little performance horse race is the non-linear SVM followed by the Random Forest algorithm. The linear SVM and AdaBoost lag behind, but they too managed to beat the benchmark model. What is also interesting is the similarity in the performance profile. Despite the differences in overall return, the periods in which they had the best or worst returns are similar across the models. The algorithms might be very different in terms of how they approach classification, but they all try to answer the same question by learning from the same set of data ... Since 2008, for example, the non-linear SVM has still gained c.40% with a c.66% monthly hit rate, which is still pretty impressive. The main exception is when all models suffered during a period that saw quality stocks rallying strongly and value suffering in the first half of the year, and this trend violently reversing in the second half".
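For readers unfamiliar with the term, a "monthly hit rate" can be sketched in a few lines of Python. The quote does not spell out its definition, so the version below assumes it means the share of months in which the strategy beat its benchmark. The return series and function names are hypothetical and chosen only to show the arithmetic.

```python
# Illustrative only: the definition of "hit rate" and all figures are assumptions.

def monthly_hit_rate(strategy_returns, benchmark_returns):
    """Share of months in which the strategy return exceeded the benchmark."""
    if not strategy_returns or len(strategy_returns) != len(benchmark_returns):
        raise ValueError("need two non-empty series of equal length")
    wins = sum(s > b for s, b in zip(strategy_returns, benchmark_returns))
    return wins / len(strategy_returns)


def cumulative_return(monthly_returns):
    """Compound a series of monthly returns into one total return."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth - 1.0


# Made-up monthly returns for a strategy and its benchmark.
strategy = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]
benchmark = [0.01, 0.00, 0.01, 0.02, -0.03, 0.01]

hit_rate = monthly_hit_rate(strategy, benchmark)
total = cumulative_return(strategy)
print(f"hit rate: {hit_rate:.0%}, cumulative return: {total:.1%}")
```

A high hit rate and a high cumulative return need not go together: a strategy can win most months by a little and still lose heavily in a few bad ones.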

Crucially the SG analysts observe that the "performance figures reported above don't take into account any transaction costs. Given the inclusion of high turnover factors and the dynamic nature of the models, it is not surprising to see that the resulting baskets experience very high turnover ... when we cap the turnover to 30%, in line with the base model, we still manage to outperform the base model."
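The quote does not say how the turnover cap is applied, so the following Python sketch shows just one common way of doing it, as an assumption rather than the SG method. It measures one-way turnover as half the sum of absolute weight changes and, if the model's target basket would exceed the cap, moves only part of the way from the old basket towards the target. All tickers, weights and function names are hypothetical.

```python
# Illustrative only: one possible turnover cap, not the method used in the paper.

def one_way_turnover(old, new):
    """Half the sum of absolute weight changes between two portfolios."""
    tickers = set(old) | set(new)
    return 0.5 * sum(abs(new.get(t, 0.0) - old.get(t, 0.0)) for t in tickers)


def cap_turnover(old, target, cap):
    """Blend old and target weights so that turnover does not exceed `cap`."""
    turnover = one_way_turnover(old, target)
    if turnover <= cap:
        return dict(target)
    step = cap / turnover  # fraction of the full trade we can afford
    blended = {}
    for t in sorted(set(old) | set(target)):
        weight = old.get(t, 0.0) + step * (target.get(t, 0.0) - old.get(t, 0.0))
        if weight > 1e-12:  # drop positions that have been fully sold
            blended[t] = weight
    return blended


# Made-up baskets: the model wants to swap AAA for CCC entirely.
old_weights = {"AAA": 0.5, "BBB": 0.5}
target_weights = {"BBB": 0.5, "CCC": 0.5}

capped = cap_turnover(old_weights, target_weights, cap=0.30)
print(capped)
print("turnover after cap:", round(one_way_turnover(old_weights, capped), 4))
```

Because the blend is linear, the capped basket still sums to 100% whenever both the old and target baskets do.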

A new world of investing emerges out of this analysis. Rather than the existing model of social investing, where we follow other great investors like us, might we see a new social investing paradigm emerge where ML inspired AIs compete for attention to build a better index? Could we even take this all one step further and integrate both my alternative models, with collaborative networks of investors and activists working together using AI to build open source indices, where the machines do the really smart data analysis and we the humans set the normative goals?