docs: annotate the tightened scoring ruler (scores NOT comparable across rulers) by 100yenadmin · Pull Request #1034 · electricsheephq/WorldOS

100yenadmin · 2026-06-19T16:08:23Z

Owner-flagged stable-checkpoint hygiene: current scores read lower than historic because the scoring ruler got more rigorous this cycle, not because quality regressed — and that wasn't annotated anywhere human-readable.

What this adds

qa/SCORING.md §0 — "The scoring ruler is VERSIONED": every run is fenced by scoring_config_version (sc_…) + lens_config_version (lc_…); a number is meaningless without its ruler; the ruler is a deliberately-tightening feedback loop, so newer-ruler numbers read lower by design. Includes a ruler history (current sc_d4b93982763a vs the looser ≤v1.0.4 rulers) + how to re-score a historic transcript for apples-to-apples.
CHANGELOG [1.0.5-rc1] — an explicit "⚠ Scoring ruler tightened" note that ties the current ruler hash to the features that tightened it (#1018 feat(qa): feature-engagement feedback loop — manifest + coverage scorer + forcing gate (WS0, all-WARN); #1001 feat(engine): acts-engine — runtime act cursor + act-transition cues + advance tools (Phase B 1-3); #1002 feat(qa): acts-engine felt-shape scorer + flat-arc WARN gate (Phase B 4-5); #999 fix(audit): un-invert betrayal — gates, agenda values, telegraph anchor; #997 content: ashfall-reach — the romance golden spine (exercises the romance gate); #1024 feat(qa): dm_advanced_time WARN guard — unmask a frozen DM the soft-tick hid (WS-E); #1030 fix(qa): gate-severity audit — stop FATAL behavioral gates false-capping short emergent duos). It also warns against cross-ruler comparison (e.g. the historic gs-ledger-deep 4.8 was scored under an older ruler).
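The SCORING.md §0 rule that a number is meaningless without its ruler can be expressed as a comparison guard. This is a minimal sketch, assuming each score is carried together with both of its stamps; the `Score` type, the `compare` function and the `CrossRulerComparison` exception are hypothetical illustrations, not WorldOS APIs. The current-ruler hashes are the ones quoted in this PR, while the older ruler's stamps and all score values are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


class CrossRulerComparison(ValueError):
    """Raised when two scores were produced under different rulers (hypothetical)."""


@dataclass(frozen=True)
class Score:
    """A score plus the two ruler stamps it was produced under (hypothetical type)."""

    value: float
    scoring_config_version: str  # e.g. "sc_d4b93982763a"
    lens_config_version: str  # e.g. "lc_d7fcfddd5bf7"

    @property
    def ruler(self) -> tuple[str, str]:
        # The ruler is the PAIR of stamps; matching only one of them is not enough.
        return (self.scoring_config_version, self.lens_config_version)


def compare(a: Score, b: Score) -> float:
    """Return a.value - b.value, refusing to compare across rulers."""
    if a.ruler != b.ruler:
        raise CrossRulerComparison(
            f"different rulers {a.ruler} vs {b.ruler}: "
            "re-score one transcript under the other's ruler first"
        )
    return a.value - b.value


# Illustrative values only; the older ruler's stamps are placeholders.
current_a = Score(3.9, "sc_d4b93982763a", "lc_d7fcfddd5bf7")
current_b = Score(4.2, "sc_d4b93982763a", "lc_d7fcfddd5bf7")
historic = Score(4.8, "sc_older_placeholder", "lc_older_placeholder")

# Same ruler: the difference is meaningful.
print(round(compare(current_b, current_a), 2))

# Different rulers: the guard refuses instead of returning a misleading delta.
try:
    compare(current_a, historic)
except CrossRulerComparison as err:
    print("refused:", err)
```

Raising rather than silently returning a delta mirrors the PR's framing: the fix for a cross-ruler question is to re-score the historic transcript under the current ruler, not to compare the raw numbers.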

Why it matters

We're beyond a normal release window; stable, differentiable checkpoints are critical. The sc_/lc_ stamping already exists in scores_db — this makes the human framing match, so past/future runs and versions stay distinguishable and the scorer can serve as the autonomous build-and-improve feedback loop it's designed to be.
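Because every scores_db row is stamped with both ruler versions, historic analysis can partition by ruler instead of pooling. A minimal sketch, assuming (hypothetically) that scores_db can be read as a SQLite table named `scores` with `run_id`, `score`, `scoring_config_version` and `lens_config_version` columns; the real storage and schema may differ, and the rows below are illustrative placeholders, not real results.

```python
from __future__ import annotations

import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for scores_db (hypothetical schema, illustrative rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scores ("
        " run_id TEXT, score REAL,"
        " scoring_config_version TEXT, lens_config_version TEXT)"
    )
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?, ?, ?)",
        [
            ("run-a", 4.0, "sc_d4b93982763a", "lc_d7fcfddd5bf7"),
            ("run-b", 3.5, "sc_d4b93982763a", "lc_d7fcfddd5bf7"),
            ("run-c", 4.8, "sc_older_placeholder", "lc_older_placeholder"),
        ],
    )
    return conn


def mean_score_per_ruler(conn: sqlite3.Connection) -> dict[tuple[str, str], tuple[int, float]]:
    """Run count and mean score per (scoring_config_version, lens_config_version).

    Means are computed only WITHIN a ruler; rows from different rulers are never pooled.
    """
    rows = conn.execute(
        """
        SELECT scoring_config_version, lens_config_version, COUNT(*), AVG(score)
        FROM scores
        GROUP BY scoring_config_version, lens_config_version
        """
    ).fetchall()
    return {(sc, lc): (n, mean) for sc, lc, n, mean in rows}


for ruler, (n, mean) in mean_score_per_ruler(make_demo_db()).items():
    print(ruler, f"n={n}", f"mean={mean:.2f}")
```

Averaging all three illustrative rows together would blend two rulers into one number, which is the kind of mis-comparison the annotation is meant to prevent.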

Docs-only. 🤖 Generated with Claude Code

…mparable across rulers

The 2026-06 cycle materially tightened the scoring ruler (feature-engagement coverage scorer #1018, acts felt-shape #1001/#1002, betrayal un-inversion #999, romance gate #997, dm_advanced_time unmask #1024, gate-severity accuracy #1030), so a run scores LOWER under sc_d4b93982763a/lc_d7fcfddd5bf7 than under the v1.0.4 rulers — BY DESIGN (the scorer is a tightening feedback loop). Document the ruler-version mechanism + history in SCORING.md §0 and annotate it in the v1.0.5-rc1 CHANGELOG, so current numbers are never mis-compared to historic ones (every scores_db row is fenced by scoring_config_version/lens_config_version). Stable-checkpoint hygiene per owner.

coderabbitai · 2026-06-19T16:08:31Z

Warning

Review limit reached

@100yenadmin, we couldn't start this review because you've reached your PR review rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between 540eff2 and 47ad054.

📒 Files selected for processing (2)

CHANGELOG.md
qa/SCORING.md


… annotation (#1034) (#1035)

Checkpoint marking the Guiding Bolt SRD duration fix (found by running the combat-sprint, proven RED->GREEN, adversarially reviewed) + the ruler-version annotation. Still NOT a GA — mech remains below the 4.5 bar; story BG-caliber + satisfaction green.

Co-authored-by: Eva <arncalso@gmail.com>

100yenadmin merged commit 522ec0c into main Jun 19, 2026
20 checks passed

100yenadmin mentioned this pull request Jun 19, 2026

docs(changelog): v1.0.5-rc2 — combat fidelity (#1033) + scoring-ruler annotation (#1034) #1035

Merged

100yenadmin merged 1 commit into main from docs/scoring-ruler-rigor-annotation
