Angry-DM scorer (claude -p) hangs on combat-sprint transcripts — zero stdout for 300s+

## Symptom
`qa/score.sh` (the pinned-Claude Angry-DM lens) **hangs** when scoring a combat-sprint transcript: `claude -p` produces **nothing** on stdout for 300s+. Seen on both Mac (cs-mechread re-score) and the VM (cs-postfix-1). The fight itself runs and the behavioral gate passes GREEN; only the lens scoring hangs.

## Diagnosis (this session)
- Claude is healthy: a direct `claude -p --model sonnet` probe on the VM returns in ~2s.
- Not the bypass flag / root: `IS_SANDBOX=1 ... claude -p --permission-mode bypassPermissions` + **tiny** input returns in ~1.5s.
- Not throttling/auth: the direct probe and the bypass + tiny-input run both return immediately; scorer stderr is empty.
- **It is specific to the large combat-sprint Angry-DM input** (~86KB: `rubric_angry_dm.md` 31KB + the combat `.md` ~29KB + the combat `state.json` ~23KB). The **duo** Angry-DM scores fine with the same rubric (sweep got angrydm 3.5), so it is combat-sprint-transcript-specific (dense combat state / tool-heavy transcript), not rubric size.
- PR #1039 added a `timeout ${WORLDOS_SCORE_TIMEOUT:-300}` guard so a hang no longer blocks forever (it now retries / fails loudly) — but the underlying hang remains, so the combat-sprint mech NUMBER feedback loop is still blocked.
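For reference, a minimal sketch of what a guard of this kind can look like. This is not the code from PR #1039; the function name `run_with_timeout` is hypothetical, and it assumes GNU `timeout` is on the PATH (stock macOS does not ship it). It relies on `timeout` exiting with status 124 when the limit is hit, which lets the caller tell a hang apart from an ordinary scorer failure.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command under WORLDOS_SCORE_TIMEOUT seconds
# (default 300) and report on stderr how it ended. The command's own
# stdout passes through untouched.
run_with_timeout() {
  local limit="${WORLDOS_SCORE_TIMEOUT:-300}"
  local rc=0
  timeout "$limit" "$@" || rc=$?
  case "$rc" in
    0)   echo "scorer finished" >&2 ;;
    124) echo "scorer timed out after ${limit}s" >&2 ;;  # GNU timeout's timeout status
    *)   echo "scorer failed with status $rc" >&2 ;;
  esac
  return "$rc"
}

# Example (not executed here):
#   run_with_timeout claude -p < prompt.txt > score.txt
```

Treating 124 separately from other non-zero statuses keeps "hung on large input" from being retried or logged the same way as an auth or CLI error.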

## Impact
The combat-sprint is the fast mech bug-finder; without a score it can still find behavioral-gate defects but cannot produce the Angry-DM mechanical number needed to track lift (e.g. confirming #1033/#1038).

## Next (cheap → expensive)
1. Bisect the input: score with (a) rubric+transcript only, (b) rubric+state only, (c) a trimmed transcript — find the size/content threshold.
2. Check whether the combat `state.json` (dense entities/effects/zones) is the trigger vs the transcript.
3. Try `--output-format stream-json` to see if it streams anything (distinguish a true CLI hang from slow generation).
4. If it is a stdin/large-input CLI issue, consider chunking or a state digest for the scorer prompt (the duo uses a leaner state).
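Steps 1 and 2 could be driven by a small probe like the one below. It is a rough sketch, not part of `qa/score.sh`: `probe_variant`, `SCORER_CMD`, and `PROBE_TIMEOUT` are hypothetical names, the file paths in the usage comment are placeholders, and it assumes the scorer reads its prompt from stdin and that GNU `timeout` is available. Each call prints the input size, exit status (124 means the timeout fired), elapsed seconds, and output size, so the variants can be compared side by side.

```shell
#!/usr/bin/env bash
# Hypothetical bisect helper: concatenate the given files into one prompt,
# feed it to the scorer under a short timeout, and print one summary line.
probe_variant() {
  local label="$1"; shift
  local limit="${PROBE_TIMEOUT:-60}"
  local in_file out_file in_bytes out_bytes start end rc=0
  in_file=$(mktemp); out_file=$(mktemp)
  cat "$@" > "$in_file"
  in_bytes=$(wc -c < "$in_file" | tr -d ' ')
  start=$(date +%s)
  # SCORER_CMD is left unquoted on purpose so "claude -p" splits into words.
  timeout "$limit" ${SCORER_CMD:-claude -p} < "$in_file" > "$out_file" || rc=$?
  end=$(date +%s)
  out_bytes=$(wc -c < "$out_file" | tr -d ' ')
  rm -f "$in_file" "$out_file"
  printf '%s in=%s rc=%s secs=%s out=%s\n' \
    "$label" "$in_bytes" "$rc" "$((end - start))" "$out_bytes"
}

# Example (placeholder paths, not executed here):
#   probe_variant rubric+transcript rubric_angry_dm.md combat.md
#   probe_variant rubric+state      rubric_angry_dm.md state.json
#   head -c 15000 combat.md > combat.trim.md
#   probe_variant rubric+trimmed    rubric_angry_dm.md combat.trim.md
```

If only the variants that include `state.json` come back with `rc=124 out=0`, that points at the state rather than the transcript; if every variant above some `in=` size times out, it looks more like a size threshold.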

Workaround for now: measure mech via a **combat-seeking duo** (duo Angry-DM scores fine) rather than the combat-sprint.

Angry-DM scorer (claude -p) hangs on combat-sprint transcripts — zero stdout for 300s+ #1040
