You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
qa/score.sh (the pinned-Claude Angry-DM lens) hangs when scoring a combat-sprint transcript: claude -p produces nothing on stdout for 300s+. Seen on both Mac (cs-mechread re-score) and the VM (cs-postfix-1). The fight itself runs + the behavioral gate passes GREEN — only the lens scoring hangs.
Diagnosis (this session)
Claude is healthy: a direct claude -p --model sonnet probe on the VM returns in ~2s.
Not the bypass flag / root: IS_SANDBOX=1 ... claude -p --permission-mode bypassPermissions + tiny input returns in ~1.5s.
Not throttling/auth: probe + bypass+tiny both work immediately; scorer stderr is empty.
It is specific to the large combat-sprint Angry-DM input (~86KB: rubric_angry_dm.md 31KB + the combat .md ~29KB + the combat state.json ~23KB). The duo Angry-DM scores fine with the same rubric (sweep got angrydm 3.5), so it is combat-sprint-transcript-specific (dense combat state / tool-heavy transcript), not rubric size.
The combat-sprint is the fast mech bug-finder; without a score it can still find behavioral-gate defects but not produce the Angry-DM mechanical number to track lift (e.g. confirming #1033/#1038).
Next (cheap → expensive)
Bisect the input: score with (a) rubric+transcript only, (b) rubric+state only, (c) a trimmed transcript — find the size/content threshold.
Check whether the combat state.json (dense entities/effects/zones) is the trigger vs the transcript.
Try --output-format stream-json to see if it streams anything (distinguish a true CLI hang from slow generation).
If it is a stdin/large-input CLI issue, consider chunking or a state digest for the scorer prompt (the duo uses a leaner state).
Workaround for now: measure mech via a combat-seeking duo (duo Angry-DM scores fine) rather than the combat-sprint.
Symptom
qa/score.sh(the pinned-Claude Angry-DM lens) hangs when scoring a combat-sprint transcript:claude -pproduces nothing on stdout for 300s+. Seen on both Mac (cs-mechread re-score) and the VM (cs-postfix-1). The fight itself runs + the behavioral gate passes GREEN — only the lens scoring hangs.Diagnosis (this session)
claude -p --model sonnetprobe on the VM returns in ~2s.IS_SANDBOX=1 ... claude -p --permission-mode bypassPermissions+ tiny input returns in ~1.5s.rubric_angry_dm.md31KB + the combat.md~29KB + the combatstate.json~23KB). The duo Angry-DM scores fine with the same rubric (sweep got angrydm 3.5), so it is combat-sprint-transcript-specific (dense combat state / tool-heavy transcript), not rubric size.timeout ${WORLDOS_SCORE_TIMEOUT:-300}guard so a hang no longer blocks forever (it now retries / fails loudly) — but the underlying hang remains, so the combat-sprint mech NUMBER feedback loop is still blocked.Impact
The combat-sprint is the fast mech bug-finder; without a score it can still find behavioral-gate defects but not produce the Angry-DM mechanical number to track lift (e.g. confirming #1033/#1038).
Next (cheap → expensive)
state.json(dense entities/effects/zones) is the trigger vs the transcript.--output-format stream-jsonto see if it streams anything (distinguish a true CLI hang from slow generation).Workaround for now: measure mech via a combat-seeking duo (duo Angry-DM scores fine) rather than the combat-sprint.