Reading Between the Lines: NLP for Long-Horizon Factor Investing (Part 1 of 2)

Natural language processing (NLP) can extract investment signals from unstructured text that aren’t already captured by traditional factors like value, quality, momentum, and low volatility.
Unlike news sentiment, year-over-year textual change in SEC filings is a slow-moving, durable signal that investors can use to potentially enhance long-horizon factor strategies.
The signal's persistence may be supported by three drivers that technology doesn't easily erode: limited investor attention, management incentives to bury bad news in revised filings, and compensation for the genuine uncertainty signaled by meaningful rewrites.
The signal is most useful as a value-trap filter: it has historically produced the strongest return spread among stocks that already look risky on conventional measures — the cheap, aggressively investing, unprofitable, high-volatility, and low-momentum segments.

The dominant NLP application in investment management targets short-term trading, with daily news sentiment driving moves that play out over hours or days. That’s not much use to long-term factor investors, who need durable, slow-moving signals.

Unlike news sentiment, which decays in days, signals from SEC filings persist over months to a year, which makes them especially well suited to low-turnover, long-horizon systematic strategies.

Value: Book to price.
Investment: A blend of book, inventory, and asset growth.
Profitability: A blend of gross, operating, and cash-based operating profitability.
Momentum: 12-month return, excluding the most recent month.
Volatility: Trailing one-year daily return volatility.
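The two price-based definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the 252-trading-day window, the optional annualization, and the reading of "excluding the most recent month" as the return from 12 months ago to 1 month ago are all choices made here for the example.

```python
import math
import statistics


def momentum_12_1(monthly_prices):
    """12-month return excluding the most recent month.

    monthly_prices: month-end prices, oldest first (hypothetical input).
    Assumes the common convention: price one month ago divided by
    price twelve months ago, minus one.
    """
    if len(monthly_prices) < 13:
        raise ValueError("need at least 13 month-end prices")
    # [-1] is the latest month-end, [-2] is one month ago,
    # [-13] is twelve months ago.
    return monthly_prices[-2] / monthly_prices[-13] - 1.0


def trailing_volatility(daily_returns, window=252, annualize=True):
    """Sample standard deviation of the last `window` daily returns.

    window=252 approximates one year of trading days (an assumption).
    """
    recent = daily_returns[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two daily returns")
    vol = statistics.stdev(recent)
    # Square-root-of-time scaling is an assumed annualization choice.
    return vol * math.sqrt(window) if annualize else vol
```

Stocks would then be ranked on each measure to form the low-momentum and high-volatility segments; the ranking step is omitted here because the text does not specify the breakpoints.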

Our 30-year analysis of the top 1,000 U.S. stocks shows that even a simple NLP technique (measuring year-over-year textual similarity in SEC 10-K and 10-Q filings) generates a meaningful performance spread. Companies whose filings changed the most from one year to the next ("big changers") underperformed companies with steady language by about 2.9% annually.
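To make the idea of year-over-year textual similarity concrete, here is a minimal sketch. The text above does not say which similarity measure is used, so cosine similarity over raw word counts is an assumption chosen for simplicity; the tokenizer, the function names, and the `change_score` helper are likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as having no overlap
    return dot / (norm_a * norm_b)


def change_score(this_year_text, last_year_text):
    """Higher score means the filing changed more versus the prior year."""
    return 1.0 - cosine_similarity(this_year_text, last_year_text)
```

In use, `change_score` would be computed for each company by comparing a filing with the same filing type from the prior year, and companies would then be sorted from low scores (steady language) to high scores (big changers). A production version would likely need to handle boilerplate sections, stop words, and document length, none of which this sketch addresses.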

Excerpts from Citigroup’s 10-K Filings for 2006 and 2007

Recent 10-Year Results

Subperiod Excess Returns

Annual Rebalancing Results

Key Points

Introduction

Transforming Text into Investment Signals

Our Approach: Building a Signal That Lasts

Case Study: Citigroup on the Eve of the Financial Crisis

Applying the Signal Broadly

Why Does This Signal Work?

Limited Investor Attention

Asymmetric Management Incentives

Compensation for Uncertainty

Does the Signal Hold Up?

Signal Persists over Investable Horizons

Does This Signal Overlap with Other Factors?

Putting It All Together

Looking Ahead: Will the Signal Survive?

What's Next

Appendix

Excerpts from Citigroup’s 10-K Filings for 2006 and 2007

Subperiod Excess Returns

Annual Rebalancing Results

End Notes

References
