Some asset managers view machine learning (ML) as a breakthrough for better analysis and prediction. Others argue these techniques are just specialized tools for quant analysts that will not change core asset management practices. Machine Learning for Asset Managers, the first in the Cambridge Elements in Quantitative Finance Series, is a short book that does not fully answer this big question or serve as a foundational text on the subject. It does, however, show how applying the right data analysis techniques can have a significant impact in solving challenging asset management problems that are not solvable through classical statistical analysis.
Traditional treatments of machine learning focus on general prediction techniques, the taxonomy of supervised and unsupervised learning models, the differences between machine learning and deep learning, and broad themes of artificial intelligence. (For a traditional general review, see Artificial Intelligence in Asset Management by Söhnke M. Bartram, Jürgen Branke, and Mehrshad Motahari.) Marcos M. López de Prado, chief investment officer of True Positive Technologies and professor of practice at the Cornell University College of Engineering, uses a more modest yet compelling approach to presenting the value of machine learning. This short work will help readers appreciate the potential power of machine learning techniques because it focuses on solutions to vexing asset management problems.
López de Prado’s presentation of problem-solving techniques provides a useful taste of machine learning for a broad audience. The book’s primary audience, however, consists of quantitative analysts who want to read about new techniques and to access Python code that will jumpstart their implementation of management solutions. A more in-depth analysis can be found in López de Prado’s longer work on the subject, Advances in Financial Machine Learning.
The book’s excellent introduction explains why machine learning techniques will benefit asset managers substantially and why traditional or classical linear techniques have limitations and are often inadequate in asset management. It makes a strong case that ML is not a black box but a set of data tools that enhance theory and improve data clarity. López de Prado focuses on seven complex problems or topics where applying new techniques developed by ML specialists will add value.
The first major topic involves problems with covariance matrices. Noise in the covariance matrix will influence any regression analysis or optimization, so techniques that can better extract signals from noise will improve portfolio management decisions. The second topic in this same general area shows how to “detone” the covariance matrix by extracting the market component that often swamps other valuable covariance matrix information. Expanding the techniques available for signal extraction will support better asset management decisions.
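The detoning idea can be illustrated with a minimal sketch. It assumes that the "market component" is the eigenvector with the largest eigenvalue of a correlation matrix; the function name `detone`, the parameter `n_market`, and the sample matrix are hypothetical and are not taken from the book's code.

```python
import numpy as np

def detone(corr, n_market=1):
    """Remove the n_market largest eigen-components from a correlation matrix.

    Illustrative sketch only: treats the top eigenvector(s) as the market
    component, subtracts them, and rescales the diagonal back to 1.
    """
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    top_vecs = eigvecs[:, :n_market]
    market = top_vecs @ np.diag(eigvals[:n_market]) @ top_vecs.T
    residual = corr - market
    scale = np.sqrt(np.diag(residual))             # restore a unit diagonal
    return residual / np.outer(scale, scale)

# Made-up correlation matrix in which all assets move with the market.
corr = np.array([[1.0, 0.8, 0.6],
                 [0.8, 1.0, 0.7],
                 [0.6, 0.7, 1.0]])
detoned = detone(corr)
print(np.round(detoned, 3))
```

Because the detoned matrix is singular by construction, it is suited to tasks such as clustering rather than to direct inversion in an optimizer.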
Next, López de Prado explains how the distance matrix can be an enhanced method for looking beyond correlation and how the concept of entropy or codependence from information theory can be a useful tool. Building blocks, such as distance functions and clustering techniques, can account for nonlinear effects, nonnormality, and outliers that can unduly influence traditional correlation analysis. For example, optimal clustering, an unsupervised learning technique, can group data of similar quality and can provide greater insight into relationships across markets than is found in the traditional correlation matrix.
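To make the point about nonlinear codependence concrete, here is a rough sketch of an entropy-based distance, the normalized variation of information, estimated from a two-dimensional histogram. The bin count, the function names, and the toy data are assumptions chosen for illustration; this is not the book's implementation.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in nats) of a histogram of counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def normalized_vi(x, y, bins=10):
    """Normalized variation of information: 0 = identical, near 1 = unrelated."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_x = entropy(joint.sum(axis=1))
    h_y = entropy(joint.sum(axis=0))
    h_xy = entropy(joint.ravel())
    vi = 2.0 * h_xy - h_x - h_y                    # H(X|Y) + H(Y|X)
    return vi / h_xy if h_xy > 0 else 0.0

x = np.linspace(-1.0, 1.0, 2001)
y_nonlinear = x ** 2                               # fully determined by x, yet uncorrelated
y_noise = np.random.default_rng(0).normal(size=x.size)

pearson = np.corrcoef(x, y_nonlinear)[0, 1]        # essentially zero
vi_self = normalized_vi(x, x)
vi_nonlinear = normalized_vi(x, y_nonlinear)
vi_noise = normalized_vi(x, y_noise)
print(pearson, vi_self, vi_nonlinear, vi_noise)
```

In this toy case the linear correlation between `x` and `x ** 2` is zero, while the entropy-based distance still places the pair closer together than `x` and independent noise. The estimate depends on the binning, so the numbers are only indicative.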
For those interested in the core problem of prediction, López de Prado discusses the frequently overlooked topic of financial labeling — that is, the setup of forecasting objectives as a key issue in supervised learning. Horizon returns are neither the only nor the best method of labeling data for predictions. For example, most traders are not interested in the difficult problem of forecasting a point estimate of where a stock will be in a week or a month. They are very interested, however, in a model that accurately predicts market direction. In short, the labels for what is being predicted matter.
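The difference between a point forecast target and a directional label can be shown with a small sketch. It uses a fixed horizon and a symmetric threshold purely for illustration; `direction_labels`, `horizon`, and `threshold` are hypothetical names, and the book discusses labeling methods that are not shown here.

```python
import numpy as np

def direction_labels(prices, horizon=5, threshold=0.0):
    """Return forward returns and their direction labels (-1, 0, +1).

    A move smaller than `threshold` in absolute value is labeled 0.
    """
    prices = np.asarray(prices, dtype=float)
    fwd_ret = prices[horizon:] / prices[:-horizon] - 1.0
    labels = np.zeros(len(fwd_ret), dtype=int)
    labels[fwd_ret > threshold] = 1
    labels[fwd_ret < -threshold] = -1
    return fwd_ret, labels

prices = [100, 101, 99, 102, 103, 100]             # made-up price path
fwd_ret, labels = direction_labels(prices, horizon=2, threshold=0.015)
print(labels.tolist())                             # [0, 0, 1, -1]
```

A regression model would be trained on `fwd_ret`, whereas a classifier would be trained on `labels`. The choice changes what the model is asked to learn.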
The book addresses the core problem of p-values and the concept of statistical significance. Attention to this topic has been growing within finance because of the “zoo” of statistically significant risk premiums that cannot be replicated out of sample. This discussion demonstrates the broad application of ML as a general tool, not just for problem solving but also for improved development of theory. Such ML techniques as mean decrease impurity, or MDI, and mean decrease accuracy, or MDA, can serve as effective and more efficient substitutes for p-values.
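The logic behind MDA can be sketched as a permutation test: shuffle one feature at a time in out-of-sample data and measure how much the model's score drops. The example below substitutes an ordinary least squares model for the tree-based learners usually paired with MDI and MDA, and all names and the simulated data are assumptions for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients with an intercept in position 0."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

def r_squared(beta, X, y):
    pred = np.column_stack([np.ones(len(X)), X]) @ beta
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def mean_decrease_accuracy(X_train, y_train, X_test, y_test, n_repeats=20, seed=0):
    """Average drop in out-of-sample R^2 when each feature is shuffled."""
    rng = np.random.default_rng(seed)
    beta = fit_ols(X_train, y_train)
    base = r_squared(beta, X_test, y_test)
    importances = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the link to y
            drops.append(base - r_squared(beta, X_perm, y_test))
        importances[j] = np.mean(drops)
    return importances

# Simulated data: feature 0 matters most, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2000)
importances = mean_decrease_accuracy(X[:1000], y[:1000], X[1000:], y[1000:])
print(np.round(importances, 3))
```

Unlike a p-value, the resulting ranking is computed out of sample and does not rely on distributional assumptions about the residuals.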
Ever since the innovations of Harry Markowitz, portfolio construction has been a source of ongoing frustration for asset managers. The “Markowitz curse,” which limits the successful use of optimization when it is needed most, can be addressed by using such ML techniques as hierarchical clustering and nested clustered optimization to tease out data relationships and to simplify the optimal portfolio solution.
The final topic is tests for overfitting, a key problem for any quantitative asset manager trying to find that perfect model. ML techniques coupled with Monte Carlo simulations, which use the power of fast computing, can be used to provide multiple backtests and to suggest a range of possible Sharpe ratios. A model with a high Sharpe ratio may be just a matter of luck — one return path out of a wide range. Using ML can better identify false strategies and the likelihood of either Type I or Type II statistical errors. Discovering failure in the laboratory will save time and money before strategies are put into production.
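The "matter of luck" point can be illustrated with a short Monte Carlo sketch. It simulates strategies whose true Sharpe ratio is zero and records the best Sharpe ratio found among a number of trials. The daily volatility of 1%, the 252-day annualization, and the function name are assumptions made for this example.

```python
import numpy as np

def best_sharpe_under_null(n_trials, n_obs=252, n_sims=200, seed=0):
    """Best annualized Sharpe ratio among n_trials skill-less strategies.

    Returns one value per simulation; every strategy has zero expected return.
    """
    rng = np.random.default_rng(seed)
    returns = rng.normal(0.0, 0.01, size=(n_sims, n_trials, n_obs))
    sharpe = returns.mean(axis=2) / returns.std(axis=2, ddof=1) * np.sqrt(252)
    return sharpe.max(axis=1)

best_of_1 = best_sharpe_under_null(n_trials=1)
best_of_100 = best_sharpe_under_null(n_trials=100)
print(best_of_1.mean(), best_of_100.mean())
```

With a single backtest, the average Sharpe ratio is close to zero. When the best of 100 backtests is selected, the average "winning" Sharpe ratio is far above zero even though no strategy has any skill, which is why the number of trials needs to be reported alongside the result.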
Machine Learning for Asset Managers uses color for better display graphics and has a significant amount of Python code to help readers who want to implement the techniques presented. Code snippets are useful for readers who want to use this research, but at times, the integration of code and text in this book can be confusing. Although the author is adept at explaining complex topics, some steps, transitions, and conclusions are hard to follow for anyone lacking extensive quantitative knowledge. This work blends some of the author’s practical research projects, but that can be a disadvantage for readers looking for connections between techniques in order to think about machine learning holistically.
Brevity is this work’s advantage, but a longer book would better support the author’s attempt to demonstrate how machine learning can facilitate the development of new theories and complement classical statistical theories. For example, the book’s introduction provides one of the best motivations for using machine learning in asset management that I have read. In just a few short pages, it addresses popular misconceptions, answers frequently asked questions, and explains how machine learning can be directly applied to portfolio management. López de Prado has practical insights that most technical writers lack, so drawing more extensively on his deep ML knowledge would be helpful to readers.
In summary, Machine Learning for Asset Managers successfully shows the power of ML techniques in solving difficult asset management problems, but it should not be viewed as an introduction to the topic for general asset managers. Nevertheless, learning how these techniques can solve problems, as expounded by an author who has enjoyed significant success in asset management, is worth the book’s modest price.
All posts are the opinion of the author. As such, they should not be construed as investment advice, nor do the opinions expressed necessarily reflect the views of CFA Institute or the author’s employer.