From Pattern-Matching to Representation: The Real Shift in AI for Finance

Quantitative finance has used machine learning for decades, but the fundamental approach has barely changed. A new class of models is shifting the paradigm from predicting price movements to understanding what financial assets actually are.

Quantitative finance has used machine learning for longer than almost any other industry. Renaissance Technologies began applying statistical models to market data in the 1980s. Their Medallion Fund went on to produce average annual returns of 66% before fees over three decades, arguably the most impressive track record in the history of investing. The quant revolution that followed brought an entire industry of algorithmic trading, factor investing, and systematic strategies.

But there is something important buried in that history that is often glossed over: the fundamental approach has barely changed. From the earliest trend-following models through to modern gradient-boosted trees and deep learning on order book data, the core logic has remained the same. Find statistical patterns in historical price and volume data. Bet that they persist. Replace them when they stop working.

This is pattern-matching. And pattern-matching has a structural problem.

The half-life problem

Statistical patterns in financial markets have a half-life. A pattern works until it doesn’t. There are several reasons for this, and they compound.

The first is crowding. When a profitable signal is discovered, capital flows toward it. The more participants trading on the same pattern, the more compressed the returns become. What was once an edge gets arbitraged down to nothing. This is not a theoretical concern. Alpha decay is one of the most well-documented phenomena in systematic trading.

The second is regime change. Markets are not stationary systems. The relationships between variables shift as macro conditions, regulation, technology, and participant behaviour evolve. A model trained on one regime has no guarantee of relevance in the next. Renaissance itself experienced this: during the 2020 COVID crash, the Medallion Fund surged 76%, but their externally available funds, running related but different strategies, suffered losses of 20% or more. Those models overcompensated for the initial collapse and then over-hedged in the recovery. The statistical relationships they were calibrated on simply did not hold through a structural break of that magnitude.

The third is the arms race itself. The tools used to discover patterns are accelerating their decay. When a handful of quant shops were mining price data with bespoke statistical models in the 1990s, the patterns they found could persist for months or years. Now, with thousands of firms running similar ML pipelines on overlapping datasets, a new signal can be crowded out in days. The edge is not in finding patterns anymore. It is in finding them fractionally faster than the next fund, which is a game of infrastructure and execution speed rather than insight.

Jim Simons himself acknowledged this dynamic. Renaissance continuously searched for new signals because the old ones degraded. The firm’s success was not built on finding one great pattern. It was built on industrialising the discovery process: continuously identifying, testing, deploying, and retiring statistical relationships faster than the market could erode them.

This works, clearly. But it has a ceiling. Every pattern you find is, by definition, a feature of a specific market regime. It tells you something about what prices did. It tells you nothing about why.

From prediction to representation

There is a genuinely different approach emerging, and it represents a more fundamental shift than simply applying bigger ML models to the same price data.

The idea is straightforward in principle: instead of asking “will this asset’s price go up or down?”, ask “what IS this asset?” Represent it not as a time series of prices but as a point in a high-dimensional conceptual space that captures its identity, its exposures, its relationships to other assets, and the implicit logic of how informed investors treat it.

This is the core idea behind asset embeddings, a line of research most clearly articulated in recent work by Gabaix, Koijen, Richmond, and Yogo. Their approach borrows directly from natural language processing. In NLP, word embeddings represent words as vectors in a continuous space where proximity reflects meaning: “king” is close to “queen”, and the vector from “man” to “woman” is similar to the vector from “king” to “queen.” The same logic can be applied to financial assets by analysing how institutional investors arrange them in portfolios.
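The word-analogy arithmetic can be made concrete with a toy example. The vectors below are hand-made 3-dimensional stand-ins, not trained embeddings, but they show the mechanics: the offset from "man" to "woman" mirrors the offset from "king" to "queen", so vector arithmetic recovers the analogy.

```python
# Toy illustration of embedding-analogy arithmetic with hand-made 3-d
# vectors (not trained embeddings).
import numpy as np

vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.2, 0.8]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - man + woman should land closest to queen in this space.
analogy = vecs["king"] - vecs["man"] + vecs["woman"]
best = max(vecs, key=lambda w: cosine(vecs[w], analogy))
print(best)  # → queen
```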

The intuition is that investors, collectively, encode an enormous amount of information into their allocation decisions. When thousands of institutional investors consistently hold two stocks in similar proportions, those stocks are similar in some meaningful sense, even if that similarity is not captured by standard accounting metrics. By training embedding models on portfolio holdings data, you can recover a latent representation of what each asset is in the eyes of the market’s most informed participants.
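One minimal way to sketch this recovery, assuming nothing more than a matrix of portfolio weights, is a truncated SVD of the holdings matrix: investors on the rows, assets on the columns. The data below is synthetic and the actual research explores richer models (including word2vec-style objectives), so treat this purely as an illustration of the shape of the idea.

```python
# Sketch: recover asset embeddings from a holdings matrix via truncated SVD.
# All data is synthetic; this illustrates the structure, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
n_investors, n_assets, k = 200, 50, 5

# Synthetic holdings: built from a low-rank "style" structure plus noise,
# mimicking the idea that allocation decisions encode latent similarity.
styles = rng.normal(size=(n_investors, k))
loadings = rng.normal(size=(k, n_assets))
holdings = np.abs(styles @ loadings + 0.1 * rng.normal(size=(n_investors, n_assets)))
holdings /= holdings.sum(axis=1, keepdims=True)  # rows become portfolio weights

# Truncated SVD: the right singular vectors give asset embeddings,
# the left singular vectors give investor embeddings as a by-product.
U, s, Vt = np.linalg.svd(holdings, full_matrices=False)
asset_emb = Vt[:k].T * s[:k]      # shape (n_assets, k)
investor_emb = U[:, :k]           # shape (n_investors, k)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Proximity in this space is "similarity in the eyes of investors".
print(asset_emb.shape, cosine(asset_emb[0], asset_emb[1]))
```

Note that the investor embeddings fall out of the same factorisation for free, which is the by-product the research highlights.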

The Gabaix et al. work demonstrates that these embeddings outperform traditional firm characteristics (book-to-market, momentum, size, profitability) on benchmarks including relative valuation prediction, return comovement, and predicting institutional portfolio decisions. They also produce investor embeddings as a by-product, representing not just what assets are but what strategies look like in the same space.

Why this matters beyond the research

The practical implications of this shift extend well beyond academic benchmarks.

Consider drift tracking. A fund manager has an intended strategy: a particular set of exposures, a particular risk profile, a particular position in the market. Over time, as the portfolio evolves through trades and market movements, the actual strategy can drift from the intended one. With price-based metrics, you might catch this through factor exposure analysis, but only along the dimensions you thought to measure. With fund embeddings, you can track the fund’s position in a continuous strategy space and detect drift along dimensions you did not even name.
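A minimal sketch of this kind of monitoring, assuming fund embeddings are already available from a fitted model (the vectors below are simulated), is to compare each period's embedding against a fixed "intended strategy" anchor:

```python
# Sketch of embedding-based drift tracking: compare a fund's embedding each
# period to a fixed intended-strategy anchor. Vectors are simulated here;
# in practice they would come from a fitted embedding model.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
k = 16
intended = rng.normal(size=k)          # anchor: the intended strategy

# Simulate quarterly fund embeddings that slowly drift away from the anchor.
drift_direction = rng.normal(size=k)
history = [intended + 0.15 * t * drift_direction for t in range(8)]

similarities = [cosine(v, intended) for v in history]

# Flag quarters where similarity to the intended strategy drops below
# a (hypothetical) tolerance threshold.
THRESHOLD = 0.9
alerts = [t for t, sim in enumerate(similarities) if sim < THRESHOLD]
print(similarities, alerts)
```

The point of working in the embedding space is that the drift direction need not correspond to any factor you pre-specified; any movement away from the anchor is visible.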

Consider hedging. Traditional hedging relies on measured correlations between assets or exposures to known risk factors. But correlations are, again, backward-looking statistical relationships. They break under exactly the conditions where hedging matters most: structural shocks and regime changes. If instead you understand what an asset is in embedding space, you can reason about hedging based on conceptual similarity and exposure structure rather than historical price co-movement.
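As a sketch of what that looks like operationally, hedge candidates can be ranked by proximity in embedding space rather than by historical correlation. The tickers and embeddings below are synthetic placeholders:

```python
# Sketch: rank hedge candidates by conceptual proximity in embedding space
# rather than historical correlation. Embeddings and tickers are synthetic.
import numpy as np

rng = np.random.default_rng(2)
tickers = [f"ASSET_{i}" for i in range(20)]
emb = rng.normal(size=(20, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-norm rows

def hedge_candidates(target_idx, emb, tickers, top_n=3):
    """Rank other assets by cosine similarity to the target asset."""
    sims = emb @ emb[target_idx]   # cosine similarity, since rows are unit-norm
    order = np.argsort(-sims)
    return [(tickers[i], float(sims[i])) for i in order if i != target_idx][:top_n]

print(hedge_candidates(0, emb, tickers))
```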

Consider stress testing. If you can represent assets in a space that captures their conceptual identity and their relationships, you can ask: “if a geopolitical shock hits Eastern European energy supply chains, which assets in my portfolio are exposed?” Not because you have historical data from an identical previous event, but because the embeddings encode what each asset is and what it is exposed to.
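One way to sketch such a query, under the assumption that a shock can be expressed as a direction in the same embedding space (here a random placeholder; in practice it might be built by averaging the embeddings of assets known to be exposed), is to rank portfolio holdings by their projection onto that direction:

```python
# Sketch of embedding-based stress testing: express a hypothetical shock as
# a direction in embedding space and rank portfolio assets by exposure to it.
# Both the shock vector and the asset embeddings are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(3)
k = 8
portfolio = {f"ASSET_{i}": rng.normal(size=k) for i in range(10)}

# A real shock direction might average embeddings of known-exposed assets;
# here it is a random unit vector.
shock = rng.normal(size=k)
shock /= np.linalg.norm(shock)

def exposure(v, shock):
    return float(v @ shock / np.linalg.norm(v))  # cosine-style exposure score

ranked = sorted(portfolio.items(), key=lambda kv: -exposure(kv[1], shock))
for name, vec in ranked[:3]:
    print(name, round(exposure(vec, shock), 3))
```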

This is the fundamental difference. Pattern-matching is fragile to regime change because patterns are regime-dependent. They are artifacts of a specific set of market conditions. When those conditions shift, the patterns break, and the model has no basis for adaptation because it never understood what was happening. It only knew what the numbers did.

Representations degrade more gracefully. An asset's identity (its sector, its supply chain exposures, its competitive position, the implicit logic of how sophisticated investors treat it) changes much more slowly than the statistical regularities in its price. A model built on representational understanding can reason imperfectly about a novel situation. A model built on price patterns has nothing to fall back on.

The honest caveat

Representations are still learned from historical data. They are not immune to unprecedented structural changes. The claim is not that embedding-based approaches eliminate the problem of distribution shift. The claim is that they operate on a layer of reality that is more stable than price dynamics. Conceptual identity shifts on the timescale of years, as corporate strategy changes. Price patterns shift on the timescale of weeks, as market microstructure changes. Building models on the more stable layer gives you a better foundation for navigating the less stable one.

What this means for the industry

The decades-long history of quantitative finance has been dominated by a single paradigm: find statistical patterns in price data and exploit them before they decay. It has been extraordinarily profitable for the firms that mastered it, and it is not going away. But the marginal returns to that approach are diminishing. The tools are commoditised. The competition is brutal. The half-lives are shortening.

The shift toward representational approaches, understanding what financial objects are rather than just predicting where their prices go, is not a replacement for traditional quant methods. It is a different layer of intelligence. One that is more robust to regime change, more amenable to reasoning under novel conditions, and more aligned with how the best human investors actually think: not in terms of price momentum, but in terms of what a business is, what it is exposed to, and what that implies.

The firms that combine both layers, fast pattern-matching for short-term execution and deep representational understanding for strategic positioning, will have a structural advantage. Not because they found a better pattern, but because they built a better understanding of the market they are operating in.