Sentiment-Rating Divergence: Quantifying the "Unspoken" Alpha

1. Investment Philosophy & Motivation

The "Divergence" Hypothesis

The core inspiration for this strategy stems from a behavioral pattern observed in the A-share market, summarized by the adage:

"Ratings are political; Text is emotional."

Sell-side analysts often face structural conflicts of interest, such as maintaining relationships with listed companies or supporting investment banking divisions. Consequently, their numerical ratings are often inflated, sticky, or insufficiently differentiated. However, their true opinions regarding a company's fundamentals are frequently revealed more subtly in the textual content of their reports or the surrounding news flow.

Core Logic: We aim to capture the "Divergence" between Textual Sentiment and Nominal Ratings.

If textual sentiment is significantly lower than what the rating implies, it signals hidden downside risk (the analyst is "sugarcoating" the truth).
Conversely, strong sentiment paired with a modest rating may indicate unrecognized upside.
This Divergence (Residual) strips away the "political" noise found in ratings, isolating a purer Alpha signal.

Adaptation for the US Market

While the original concept relied on full analyst report texts, this project adapts the methodology for the US market due to data availability constraints:

Proxy Data: Instead of full analyst reports, we utilize News "Situation" Descriptions. These concise summaries contain detailed event descriptions, serving as an objective proxy for market sentiment.
Methodology: We extract sentiment scores from these "Situation" texts using FinBERT and calculate the residual against Analyst Ratings (and other risk factors) to construct the final selection factor.

2. Factor Construction Methodology

The project employs FinBERT (a BERT model pre-trained on financial texts) for sentiment extraction, combined with a rigorous orthogonalization process to ensure the signal is unique.

Step 1: Sentiment Extraction (FinBERT)

Each news "situation" text is scored using the `yiyanghkust/finbert-tone` model.

Probability Adjustment: We ignore the "Neutral" probability to focus on polarity intensity. $$P_{pos,adj} = \frac{P_{pos}}{P_{pos} + P_{neg}}$$
Base Score: We map the adjusted probability to a centered range $[-0.5, 0.5]$. $$S_{raw} = P_{pos,adj} - 0.5$$
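As a minimal sketch of the two formulas above, the renormalization can be written as a small pure function. The function name `polarity_score` and the example probabilities are illustrative assumptions, and so is the handling of the degenerate case where both polarity probabilities are zero. In practice the inputs would come from the model's class probabilities.

```python
def polarity_score(p_pos: float, p_neg: float) -> float:
    """Renormalise over positive/negative only, then centre the result on zero."""
    total = p_pos + p_neg
    if total == 0.0:
        # Assumption (not from the source): no polarity mass is treated as neutral.
        return 0.0
    p_pos_adj = p_pos / total   # share of polarity mass that is positive
    return p_pos_adj - 0.5      # S_raw in [-0.5, 0.5]


# Hypothetical (positive, negative) probabilities; the neutral share is ignored.
for p_pos, p_neg in [(0.60, 0.20), (0.05, 0.15), (0.30, 0.30)]:
    print(p_pos, p_neg, round(polarity_score(p_pos, p_neg), 4))
```

Because the neutral probability is dropped, a text that is 90% neutral but leans positive in the remaining 10% still receives a clearly positive score.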

Step 2: Asymmetric Adjustment (The "Fear" Bias)

Markets typically react more violently to negative news than positive news (Loss Aversion). To capture this, we apply a 3x penalty weight to negative scores:

$$ S_{adj} = \begin{cases} S_{raw} & \text{if } S_{raw} \ge 0 \\ S_{raw} \times 3 & \text{if } S_{raw} < 0 \end{cases} $$
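The piecewise rule above translates directly into code. The following is a sketch only; the names `asymmetric_adjust` and `NEGATIVE_WEIGHT` are illustrative and not taken from the notebooks. Note that after the adjustment the score ranges over $[-1.5, 0.5]$ rather than $[-0.5, 0.5]$.

```python
NEGATIVE_WEIGHT = 3.0  # the 3x penalty applied to negative scores


def asymmetric_adjust(s_raw: float, negative_weight: float = NEGATIVE_WEIGHT) -> float:
    """Leave non-negative scores unchanged; scale negative scores by the penalty weight."""
    return s_raw if s_raw >= 0 else s_raw * negative_weight


# Hypothetical raw scores spanning the full [-0.5, 0.5] range.
for s in (0.5, 0.25, 0.0, -0.25, -0.5):
    print(s, asymmetric_adjust(s))
```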

Step 3: Time-Series Aggregation

We use a 90-day Linear Decay window to aggregate daily scores into a cumulative sentiment exposure:

$$ Factor_{agg, t} = \sum_{k=0}^{89} w_k \times S_{adj, t-k} $$

(Where $w_k$ decreases linearly with the lag $k$, giving higher weight to recent news.)
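The exact weight schedule is not spelled out above, so the sketch below makes two labeled assumptions: the weight at a given lag is proportional to (window minus lag), and the weights are normalized to sum to one. Days without news are assumed to carry a score of zero. The function names are illustrative.

```python
from typing import Sequence

WINDOW = 90  # length of the decay window in days


def linear_decay_weights(window: int = WINDOW, normalise: bool = True) -> list[float]:
    """Weights for lags 0..window-1; lag 0 (today) gets the largest weight.

    Assumption: weight is proportional to (window - lag). Normalising to a sum
    of one is also an assumption; pass normalise=False for raw weights.
    """
    raw = [float(window - lag) for lag in range(window)]
    if not normalise:
        return raw
    total = sum(raw)
    return [w / total for w in raw]


def aggregate(daily_scores: Sequence[float], window: int = WINDOW) -> float:
    """Weighted sum of the most recent scores; daily_scores[-1] is day t."""
    weights = linear_decay_weights(window)
    recent_first = list(daily_scores)[-window:][::-1]  # index = lag in days
    # zip stops at the shorter input, so missing history contributes nothing.
    return sum(w * s for w, s in zip(weights, recent_first))


# The same negative score counts far more when it happened today than 89 days ago.
print(aggregate([0.0] * 89 + [-1.5]))
print(aggregate([-1.5] + [0.0] * 89))
```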

Step 4: Strict Orthogonalization (The "Pure" Divergence)

To ensure the signal represents a unique idiosyncratic alpha—rather than a proxy for media attention, size, or sector betas—we perform a multivariate orthogonalization.

We regress the aggregated sentiment score against the following explanatory variables:

Analyst Ratings: To capture the "Divergence" logic.
Event Count: To remove bias from companies with high media volume (which can artificially inflate cumulative scores).
Market Cap (ln): To strip out size factor exposure.
Sector Dummy Variables: To eliminate industry-specific sentiment baselines.

The final factor is the Residual ($\epsilon$) derived from the cross-sectional regression at each time step $t$:

$$ \text{Factor}_{agg, i, t} = \alpha + \beta_1 \text{Rating}_{i,t} + \beta_2 \text{Count}_{i,t} + \beta_3 \ln(\text{MktCap}_{i,t}) + \sum_{s} \delta_s \mathbb{I}(\text{Sector}_{i,t}=s) + \epsilon_{i,t} $$

$$ \textbf{Final Alpha Factor}_{i,t} = \epsilon_{i,t} $$
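The regression above can be sketched for a single cross-section (one date) as follows. This is a dependency-free illustration rather than the notebook's implementation: it solves the normal equations with a small Gaussian-elimination helper, whereas a linear-algebra library would be the more natural choice in practice. The numeric rating scale, the toy data, and all function names are assumptions. One sector dummy is dropped so the dummies are not collinear with the intercept, which leaves the residuals unchanged.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residualise(sentiment, rating, count, mkt_cap, sector):
    """OLS residuals of aggregated sentiment for one cross-section."""
    dummies = sorted(set(sector))[1:]  # first sector acts as the baseline
    rows = [
        [1.0, r, c, math.log(mc)] + [1.0 if s == d else 0.0 for d in dummies]
        for r, c, mc, s in zip(rating, count, mkt_cap, sector)
    ]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, sentiment)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(rows, sentiment)]


# Hypothetical cross-section of eight stocks; ratings use an assumed 1-5 scale.
sentiment = [0.10, -0.30, 0.05, 0.20, -0.15, 0.00, 0.12, -0.08]
rating = [4.0, 3.0, 5.0, 4.0, 2.0, 3.0, 5.0, 1.0]
count = [12, 3, 7, 20, 5, 9, 2, 15]
mkt_cap = [5e9, 2e10, 8e8, 3e11, 1.5e10, 6e9, 4e10, 9e10]
sector = ["Tech", "Tech", "Energy", "Energy", "Tech", "Energy", "Tech", "Energy"]

residuals = residualise(sentiment, rating, count, mkt_cap, sector)
print([round(e, 4) for e in residuals])
```

By construction the residuals sum to zero and are uncorrelated with each regressor in-sample, which is what makes the final factor orthogonal to rating, event count, size, and sector on each date.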

3. Backtesting Performance

Framework

Strategy: Long-Only (Decile-based selection)
Rebalancing: Monthly
Weighting: Market-Cap Weighted
Benchmark: S&P 500
Neutralization: Sector & Market Cap Neutralized (in backtest engine)
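To make the selection and weighting rules concrete, here is a minimal sketch of one rebalance. It assumes the portfolio holds the top decile by factor value, which the framework above does not state explicitly, and it omits the sector and market-cap neutralization done in the backtest engine. The function name and tickers are hypothetical.

```python
def top_decile_cap_weights(factor: dict[str, float], mkt_cap: dict[str, float]) -> dict[str, float]:
    """Select the top 10% of names by factor value and weight them by market cap."""
    ranked = sorted(factor, key=factor.get, reverse=True)
    n_select = max(1, len(ranked) // 10)  # always hold at least one name
    selected = ranked[:n_select]
    total_cap = sum(mkt_cap[t] for t in selected)
    return {t: mkt_cap[t] / total_cap for t in selected}


# Hypothetical universe of 20 names with increasing factor values and caps.
factor = {f"S{i:02d}": i / 100 for i in range(20)}
mkt_cap = {f"S{i:02d}": 1e9 * (i + 1) for i in range(20)}

weights = top_decile_cap_weights(factor, mkt_cap)
print(weights)
```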

Phase 1: Initial Results (Base Factor)

The base factor outperforms the benchmark on both return (18.46% vs 13.47% annualized) and Sharpe ratio (1.23 vs 0.91), though volatility and IC stability leave room for improvement.

| Metric | Portfolio | S&P 500 Benchmark |
| --- | --- | --- |
| Total Return | 130.01% | 86.13% |
| Annualized Return | 18.46% | 13.47% |
| Sharpe Ratio | 1.23 | 0.91 |
| Max Drawdown | -18.81% | - |

IC Statistics: Mean IC: 0.0115 | ICIR: 0.17 | IC > 0 Ratio: 55.9%
Observation: Strong total returns, but the low ICIR indicates the signal's predictive power fluctuates over time. Decile separation was good but not strictly monotonic.
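For readers unfamiliar with the IC statistics above, the sketch below shows one common way to compute them. It assumes, without confirmation from the notebooks, that IC is the Spearman rank correlation between the factor and the next-period return on each rebalance date, that ICIR is the mean IC divided by its sample standard deviation, and that the "IC > 0 Ratio" is the share of periods with positive IC. All names and numbers in the example are illustrative.

```python
from statistics import mean, stdev


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(factor, forward_returns):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(factor), ranks(forward_returns))


def ic_summary(ic_series):
    return {
        "mean_ic": mean(ic_series),
        "icir": mean(ic_series) / stdev(ic_series),
        "hit_ratio": sum(ic > 0 for ic in ic_series) / len(ic_series),
    }


# Hypothetical per-period ICs, not results from this project.
print(ic_summary([0.10, -0.10, 0.20, 0.20]))
```

Under these definitions, a low ICIR alongside a positive mean IC means the per-period correlations vary widely relative to their average, which matches the observation above.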

Phase 2: Transformation & Optimization

To improve signal robustness, additional transformations (e.g., distribution standardization) were applied to the factor in backtest_transformation.ipynb.

| Metric | Transformed Portfolio | S&P 500 Benchmark |
| --- | --- | --- |
| Total Return | 132.73% | 86.13% |
| Annualized Return | 18.74% | 13.47% |
| Sharpe Ratio | 1.24 | 0.91 |
| Max Drawdown | -14.98% | - |

IC Statistics: Mean IC: 0.0158 | ICIR: 0.25 | IC > 0 Ratio: 61.0%
Observation: The transformation raised the ICIR (0.17 $\to$ 0.25) and reduced the Max Drawdown (-18.81% $\to$ -14.98%), while returns and Sharpe were nearly unchanged (Sharpe 1.23 $\to$ 1.24). This suggests that the "Divergence" logic contains extractable Alpha.
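The return, Sharpe, and drawdown figures reported in the tables above are standard portfolio statistics. As a rough sketch of how such numbers are typically derived from a monthly return series, the helpers below assume 12 periods per year and a zero risk-free rate for the Sharpe ratio. Neither assumption is confirmed by the source, and the function names and sample returns are illustrative.

```python
from statistics import mean, stdev

PERIODS_PER_YEAR = 12  # monthly rebalancing


def total_return(returns):
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualised_return(returns, periods_per_year=PERIODS_PER_YEAR):
    """Geometric annualisation of the compounded return."""
    years = len(returns) / periods_per_year
    return (1.0 + total_return(returns)) ** (1.0 / years) - 1.0


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=PERIODS_PER_YEAR):
    """Mean excess return over its sample standard deviation, annualised."""
    excess = [r - risk_free_per_period for r in returns]
    return mean(excess) / stdev(excess) * periods_per_year ** 0.5


def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve, as a negative number."""
    peak = equity = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


# Hypothetical monthly returns, not data from this project.
sample = [0.10, -0.20, 0.05]
print(total_return(sample), max_drawdown(sample))
```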

4. Conclusion & Discussion

This project successfully verifies the "Sentiment-Rating Divergence" hypothesis in the US market.

Data Insight: Even without full analyst reports, publicly available news "situation" data—when processed with FinBERT—can effectively capture the market's "true" sentiment.
Rigorous Construction: By strictly residualizing against Event Count and Common Betas, we ensure the alpha is not derived from mere popularity or style exposure.
Performance: Over the backtest period, the transformed strategy outperforms the S&P 500 on a risk-adjusted basis (Sharpe 1.24 vs 0.91).
Future Work: While the Long-only performance is strong, the factor decile monotonicity could be improved. Future iterations might explore non-linear models to better capture the complex relationship between ratings and sentiment residuals.

5. Repository Structure

| File | Description |
| --- | --- |
| `factor_construction_finbert.ipynb` | NLP Pipeline: FinBERT implementation, probability renormalization, asymmetric penalties, and linear decay aggregation. |
| `backtest_simple.ipynb` | Backtest Engine: Standard monthly rebalancing engine. Implements the strict orthogonalization (Residuals vs Rating, Count, Size, Sector). |
| `backtest_transformation.ipynb` | Optimization: Advanced backtesting with factor transformations to optimize signal distribution and stability. |
| `data/` | Contains the processed factor and market data. |
