qpbqr/Sentiment-Divergence-Factor

Sentiment-Rating Divergence: Quantifying the "Unspoken" Alpha

1. Investment Philosophy & Motivation

The "Divergence" Hypothesis

The core inspiration for this strategy stems from a behavioral pattern observed in the A-share market, summarized by the adage:

"Ratings are political; Text is emotional."

Sell-side analysts often face structural conflicts of interest—such as maintaining relationships with listed companies or supporting investment banking divisions. Consequently, their numerical ratings are often inflated, sticky, or lack sufficient differentiation. However, their true opinions regarding a company's fundamentals are frequently revealed more subtly in the textual content of their reports or the surrounding news flow.

Core Logic: We aim to capture the "Divergence" between Textual Sentiment and Nominal Ratings.

  • If textual sentiment is significantly lower than what the rating implies, it signals hidden downside risk (the analyst is "sugarcoating" the truth).
  • Conversely, strong sentiment backed by a modest rating may indicate unrecognized upside.
  • This Divergence (Residual) strips away the "political" noise found in ratings, isolating a purer Alpha signal.

Adaptation for the US Market

While the original concept relied on full analyst report texts, this project adapts the methodology for the US market due to data availability constraints:

  • Proxy Data: Instead of full analyst reports, we utilize News "Situation" Descriptions. These concise summaries contain detailed event descriptions, serving as an objective proxy for market sentiment.
  • Methodology: We extract sentiment scores from these "Situation" texts using FinBERT and calculate the residual against Analyst Ratings (and other risk factors) to construct the final selection factor.

2. Factor Construction Methodology

The project employs FinBERT (a BERT model pre-trained on financial texts) for sentiment extraction, combined with a rigorous orthogonalization process to ensure the signal is unique.

Step 1: Sentiment Extraction (FinBERT)

Each news "situation" text is scored using the yiyanghkust/finbert-tone model.

  • Probability Adjustment: We ignore the "Neutral" probability to focus on the polarity intensity. $$P_{pos,adj} = \frac{P_{pos}}{P_{pos} + P_{neg}}$$
  • Base Score: Mapping the probability to a centered range $[-0.5, 0.5]$. $$S_{raw} = P_{pos,adj} - 0.5$$
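
The renormalization and centering above can be sketched as a small helper. This is a minimal illustration, not the notebook's code: the function name `raw_sentiment_score` and the fallback to 0.0 when both polar probabilities are zero are assumptions, and the inputs are taken to be the positive and negative class probabilities returned by the classifier.

```python
def raw_sentiment_score(p_pos: float, p_neg: float) -> float:
    """Centered polarity score in [-0.5, 0.5], ignoring the neutral class."""
    polar_mass = p_pos + p_neg
    if polar_mass == 0.0:
        # Assumption: a text with no polar probability mass is scored as neutral.
        return 0.0
    p_pos_adj = p_pos / polar_mass  # renormalize over positive + negative only
    return p_pos_adj - 0.5          # shift so that 0 means balanced polarity


# Hypothetical class probabilities: positive 0.30, negative 0.10, neutral 0.60.
# The neutral mass is dropped, so the score is roughly 0.75 - 0.5 = 0.25.
print(raw_sentiment_score(0.30, 0.10))
```

One consequence of dropping the neutral class is that a text that is 60% neutral can still receive a fairly strong score, because only the ratio between positive and negative mass matters.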

Step 2: Asymmetric Adjustment (The "Fear" Bias)

Markets typically react more violently to negative news than positive news (Loss Aversion). To capture this, we apply a 3x penalty weight to negative scores:

$$ S_{adj} = \begin{cases} S_{raw} & \text{if } S_{raw} \ge 0 \\ S_{raw} \times 3 & \text{if } S_{raw} < 0 \end{cases} $$
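
As a hedged sketch, the piecewise rule can be written as a one-line function. The names `asymmetric_adjust` and `NEGATIVE_PENALTY` are illustrative and not taken from the repository.

```python
NEGATIVE_PENALTY = 3.0  # the 3x weight on negative scores described above


def asymmetric_adjust(s_raw: float, penalty: float = NEGATIVE_PENALTY) -> float:
    """Leave non-negative scores unchanged; scale negative scores by `penalty`."""
    return s_raw if s_raw >= 0 else s_raw * penalty


print(asymmetric_adjust(0.25))   # positive score passes through unchanged
print(asymmetric_adjust(-0.25))  # negative score is multiplied by 3
```

Since $S_{raw}$ lies in $[-0.5, 0.5]$, the adjusted score lies in $[-1.5, 0.5]$, so the score distribution is no longer symmetric around zero.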

Step 3: Time-Series Aggregation

We use a 90-day Linear Decay window to aggregate daily scores into a cumulative sentiment exposure:

$$ Factor_{agg, t} = \sum_{i=0}^{89} w_i \times S_{adj, t-i} $$

(Where $w_i$ decreases linearly over the window, giving higher weight to recent news.)
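
The aggregation can be sketched as follows. The exact weight schedule is not stated above, so this example assumes $w_i = (90 - i)/90$ (unnormalized, with $w_0 = 1$ for the most recent day), treats days without news as a score of 0, and uses a shorter lookback at the start of the history. All function names are hypothetical.

```python
def linear_decay_weights(window: int = 90) -> list[float]:
    """Assumed schedule: w_0 = 1 for today, falling linearly to 1/window."""
    return [(window - i) / window for i in range(window)]


def linear_decay_aggregate(daily_scores: list[float], window: int = 90) -> list[float]:
    """Factor_agg[t] = sum_i w_i * S_adj[t - i], using only days that exist."""
    weights = linear_decay_weights(window)
    aggregated = []
    for t in range(len(daily_scores)):
        lookback = min(window, t + 1)  # shorter window at the start of history
        aggregated.append(
            sum(weights[i] * daily_scores[t - i] for i in range(lookback))
        )
    return aggregated


# A single positive news day followed by silence decays linearly to zero.
print(linear_decay_aggregate([1.0, 0.0, 0.0, 0.0], window=3))
```

Because the weights are summed rather than averaged, names with more news days accumulate larger absolute scores, which is the bias the Event Count regressor in Step 4 is meant to remove.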

Step 4: Strict Orthogonalization (The "Pure" Divergence)

To ensure the signal represents a unique idiosyncratic alpha—rather than a proxy for media attention, size, or sector betas—we perform a multivariate orthogonalization.

We regress the aggregated sentiment score against the following explanatory variables:

  1. Analyst Ratings: To capture the "Divergence" logic.
  2. Event Count: To remove bias from companies with high media volume (which can artificially inflate cumulative scores).
  3. Market Cap (ln): To strip out size factor exposure.
  4. Sector Dummy Variables: To eliminate industry-specific sentiment baselines.

The final factor is the Residual ($\epsilon$) derived from the cross-sectional regression at each time step $t$:

$$ \text{Factor}_{agg, i, t} = \alpha + \beta_1 \text{Rating}_{i,t} + \beta_2 \text{Count}_{i,t} + \beta_3 \ln(\text{MktCap}_{i,t}) + \sum_{s} \delta_s \mathbb{I}(\text{Sector}_{i,t}=s) + \epsilon_{i,t} $$

$$ \textbf{Final Alpha Factor}_{i,t} = \epsilon_{i,t} $$
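A minimal sketch of this residualization for a single date, using ordinary least squares in NumPy. The function and argument names are assumptions rather than the notebook's own; one sector dummy is dropped because the intercept already absorbs it, which leaves the residuals unchanged.

```python
import numpy as np


def cross_sectional_residual(agg, rating, count, mkt_cap, sector):
    """OLS residual of aggregated sentiment for one date's cross-section."""
    agg = np.asarray(agg, dtype=float)
    sectors = np.asarray(sector)
    # One dummy column per sector except the first, which the intercept absorbs.
    levels = np.unique(sectors)
    dummies = (sectors[:, None] == levels[None, 1:]).astype(float)
    design = np.column_stack([
        np.ones(len(agg)),                          # intercept (alpha)
        np.asarray(rating, dtype=float),            # analyst rating
        np.asarray(count, dtype=float),             # event count
        np.log(np.asarray(mkt_cap, dtype=float)),   # ln(market cap)
        dummies,                                    # sector indicators
    ])
    coef, *_ = np.linalg.lstsq(design, agg, rcond=None)
    return agg - design @ coef                      # epsilon_{i,t}


# Hypothetical cross-section of six stocks on one date.
eps = cross_sectional_residual(
    agg=[0.4, -1.2, 0.1, 0.9, -0.3, 0.2],
    rating=[4, 5, 3, 4, 5, 2],
    count=[12, 30, 5, 8, 22, 3],
    mkt_cap=[5e10, 2e11, 8e9, 3e10, 1e11, 4e9],
    sector=["tech", "tech", "energy", "energy", "tech", "energy"],
)
print(eps.round(4))
```

By construction the residuals sum to zero across the whole cross-section and within each sector, and they are uncorrelated in-sample with rating, event count, and log market cap.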


3. Backtesting Performance

Framework

  • Strategy: Long-Only (Decile-based selection)
  • Rebalancing: Monthly
  • Weighting: Market-Cap Weighted
  • Benchmark: S&P 500
  • Neutralization: Sector & Market Cap Neutralized (in backtest engine)
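
The selection and weighting rules can be illustrated with a short sketch. It assumes the portfolio holds the top decile by factor value (the highest residuals) and omits the sector and size neutralization performed by the backtest engine; `top_decile_weights` and the ticker names are hypothetical.

```python
def top_decile_weights(factor: dict, mkt_cap: dict) -> dict:
    """Market-cap weights for the top 10% of names ranked by factor value."""
    ranked = sorted(factor, key=factor.get, reverse=True)
    n_select = max(1, len(ranked) // 10)  # always hold at least one name
    selected = ranked[:n_select]
    total_cap = sum(mkt_cap[ticker] for ticker in selected)
    return {ticker: mkt_cap[ticker] / total_cap for ticker in selected}


# Hypothetical universe of 20 tickers: factor value i, market cap i + 1.
factor = {f"T{i}": float(i) for i in range(20)}
caps = {f"T{i}": float(i + 1) for i in range(20)}
print(top_decile_weights(factor, caps))  # holds T19 and T18, weighted by cap
```

In a monthly rebalancing loop, a function like this would be called once per rebalance date with that date's factor values and market caps.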

Phase 1: Initial Results (Base Factor)

In the backtest, the base factor outperforms the benchmark on both annualized return (18.46% vs. 13.47%) and Sharpe ratio (1.23 vs. 0.91), though volatility and IC stability leave room for improvement.

| Metric | Portfolio | S&P 500 Benchmark |
| --- | --- | --- |
| Total Return | 130.01% | 86.13% |
| Annualized Return | 18.46% | 13.47% |
| Sharpe Ratio | 1.23 | 0.91 |
| Max Drawdown | -18.81% | - |

  • IC Statistics: Mean IC: 0.0115 | ICIR: 0.17 | IC > 0 Ratio: 55.9%
  • Observation: Strong total returns, but the low ICIR indicates the signal's predictive power fluctuates over time. Decile separation was good but not strictly monotonic.
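
For readers unfamiliar with these statistics, the sketch below shows one common way to compute them from a series of per-period ICs. The README does not state the exact definitions used, so the choices here are assumptions: ICIR is taken as mean IC divided by the sample standard deviation of IC (not annualized), and the function name `ic_summary` is illustrative.

```python
from statistics import mean, stdev


def ic_summary(ic_series: list[float]) -> dict:
    """Mean IC, ICIR (mean / sample std), and share of periods with IC > 0."""
    ic_mean = mean(ic_series)
    return {
        "mean_ic": ic_mean,
        "icir": ic_mean / stdev(ic_series),
        "hit_ratio": sum(ic > 0 for ic in ic_series) / len(ic_series),
    }


# Hypothetical monthly ICs, not taken from the backtest.
print(ic_summary([0.02, -0.01, 0.03, 0.00]))
```

Under this definition, a low ICIR alongside a positive mean IC means the per-period IC varies widely relative to its average.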

Phase 2: Transformation & Optimization

To improve signal robustness, additional transformations (e.g., distribution standardization) were applied to the factor in backtest_transformation.ipynb.

| Metric | Transformed Portfolio | S&P 500 Benchmark |
| --- | --- | --- |
| Total Return | 132.73% | 86.13% |
| Annualized Return | 18.74% | 13.47% |
| Sharpe Ratio | 1.24 | 0.91 |
| Max Drawdown | -14.98% | - |

  • IC Statistics: Mean IC: 0.0158 | ICIR: 0.25 | IC > 0 Ratio: 61.0%
  • Observation: The transformation improved the ICIR (0.17 $\to$ 0.25) and reduced the Max Drawdown (-18.81% $\to$ -14.98%), while returns and Sharpe ratio changed only marginally. This is consistent with the "Divergence" logic containing extractable Alpha.

4. Conclusion & Discussion

The backtest results support the "Sentiment-Rating Divergence" hypothesis in the US market.

  1. Data Insight: Even without full analyst reports, publicly available news "situation" data—when processed with FinBERT—can effectively capture the market's "true" sentiment.
  2. Rigorous Construction: By strictly residualizing against Event Count and Common Betas, we ensure the alpha is not derived from mere popularity or style exposure.
  3. Performance: Over the backtest period, the transformed strategy outperforms the S&P 500 on a risk-adjusted basis (Sharpe 1.24 vs 0.91).
  4. Future Work: While the Long-only performance is strong, the factor decile monotonicity could be improved. Future iterations might explore non-linear models to better capture the complex relationship between ratings and sentiment residuals.

5. Repository Structure

| File | Description |
| --- | --- |
| factor_construction_finbert.ipynb | NLP Pipeline: FinBERT implementation, probability renormalization, asymmetric penalties, and linear decay aggregation. |
| backtest_simple.ipynb | Backtest Engine: Standard monthly rebalancing engine. Implements the strict orthogonalization (Residuals vs Rating, Count, Size, Sector). |
| backtest_transformation.ipynb | Optimization: Advanced backtesting with factor transformations to optimize signal distribution and stability. |
| data/ | Contains the processed factor and market data. |
