AA reworked their Intelligence Index, and older models now score far lower than the 2026-05-14 snapshot assumed. The live raw score for Qwen3-8B normalizes to 1.5, versus 40.0 in the snapshot. #97 was merged with a conservative max-merge overlay, which keeps the higher of the snapshot and live values, so live downgrades never propagate. To make live data authoritative again:
- Re-derive `_AA_INDEX_MIN` / `_AA_INDEX_MAX` from the current score distribution
- Regenerate the AA fallback snapshot from a fresh scrape on the new scale
- Switch the overlay from max-merge to live-wins
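A rough sketch of how the three steps fit together. It assumes `_AA_INDEX_MIN` / `_AA_INDEX_MAX` drive a plain min-max normalization onto a 0 to 100 scale. That assumption is inferred from the 1.5 vs 40.0 figures and has not been confirmed against the code. All function names are hypothetical, not the repo's actual helpers.

```python
from typing import Dict, Iterable, Tuple


def derive_bounds(raw_scores: Iterable[float]) -> Tuple[float, float]:
    """Hypothetical: re-derive (_AA_INDEX_MIN, _AA_INDEX_MAX) from the current raw distribution."""
    values = list(raw_scores)
    if not values:
        raise ValueError("need at least one raw score to derive bounds")
    return min(values), max(values)


def normalize(raw: float, lo: float, hi: float) -> float:
    """Assumed min-max normalization onto 0..100, clamping out-of-range raw scores."""
    if hi == lo:
        return 0.0  # degenerate distribution: every model gets the same score
    clamped = min(max(raw, lo), hi)
    return 100.0 * (clamped - lo) / (hi - lo)


def overlay_max_merge(snapshot: Dict[str, float], live: Dict[str, float]) -> Dict[str, float]:
    """Current behaviour (per #97): keep the higher value, so live downgrades are masked."""
    merged = dict(snapshot)
    for model, score in live.items():
        merged[model] = max(score, merged.get(model, score))
    return merged


def overlay_live_wins(snapshot: Dict[str, float], live: Dict[str, float]) -> Dict[str, float]:
    """Proposed behaviour: live overrides the snapshot; snapshot-only models stay as fallback."""
    merged = dict(snapshot)
    merged.update(live)
    return merged


if __name__ == "__main__":
    snapshot = {"Qwen3-8B": 40.0}
    live = {"Qwen3-8B": 1.5}
    print(overlay_max_merge(snapshot, live))  # the stale 40.0 survives
    print(overlay_live_wins(snapshot, live))  # the live 1.5 takes effect
```

Live-wins still keeps the snapshot as a fallback for models that the live scrape misses. That is why the snapshot has to be regenerated on the new scale first. Otherwise stale old-scale values would sit next to new-scale live values in the same ranking.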
This sits at the core of ranking accuracy, so I'll take it myself.