Angry-DM scorer (claude -p) hangs on combat-sprint transcripts — zero stdout for 300s+ #1040

@100yenadmin

Symptom

qa/score.sh (the pinned-Claude Angry-DM lens) hangs when scoring a combat-sprint transcript: claude -p writes nothing to stdout for 300s+. Seen on both the Mac (cs-mechread re-score) and the VM (cs-postfix-1). The fight itself runs and the behavioral gate passes GREEN; only the lens scoring hangs.

Diagnosis (this session)

  • Claude is healthy: a direct claude -p --model sonnet probe on the VM returns in ~2s.
  • Not the bypass flag / root: IS_SANDBOX=1 ... claude -p --permission-mode bypassPermissions + tiny input returns in ~1.5s.
  • Not throttling/auth: the plain probe and the bypass + tiny-input run both return immediately, and the scorer's stderr is empty.
  • It is specific to the large combat-sprint Angry-DM input (~86KB: rubric_angry_dm.md 31KB + the combat .md ~29KB + the combat state.json ~23KB). The duo Angry-DM scores fine with the same rubric (the sweep got angrydm 3.5), so the trigger is the combat-sprint input itself (dense combat state / tool-heavy transcript), not the rubric size.
  • PR #1039 (fix(qa): timeout-guard the scorer claude -p call (stop intermittent hangs)) added a timeout ${WORLDOS_SCORE_TIMEOUT:-300} guard, so a hang no longer blocks forever (it now retries / fails loudly). The underlying hang remains, though, so the combat-sprint mech NUMBER feedback loop is still blocked.
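The probes above compare "returns in ~2s" against "zero stdout for 300s+". A minimal sketch of a probe harness that records time to first stdout byte under a timeout is shown here. It is not the code in qa/score.sh or PR #1039; the function name `probe` and its return keys are hypothetical, and it assumes the scorer input is passed on stdin.

```python
import subprocess
import sys
import threading
import time


def probe(cmd, stdin_text="", timeout_s=300.0):
    """Run cmd with stdin_text on stdin; report time to first stdout byte.

    Hypothetical helper: returns a dict with timed_out, first_byte_s,
    total_s, stdout, stderr and returncode.
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    result = {"first_byte_s": None, "stdout": b"", "stderr": b""}

    def feed():
        # Write in a thread so a large input cannot deadlock against the reader.
        try:
            proc.stdin.write(stdin_text.encode("utf-8"))
            proc.stdin.close()
        except (BrokenPipeError, OSError, ValueError):
            pass  # child exited or was killed before consuming all input

    def read_out():
        first = proc.stdout.read(1)  # blocks until one byte arrives or EOF
        if first:
            result["first_byte_s"] = time.monotonic() - start
        result["stdout"] = first + proc.stdout.read()

    def read_err():
        result["stderr"] = proc.stderr.read()

    threads = [threading.Thread(target=f, daemon=True) for f in (feed, read_out, read_err)]
    for t in threads:
        t.start()
    try:
        proc.wait(timeout=timeout_s)
        timed_out = False
    except subprocess.TimeoutExpired:
        proc.kill()  # same effect as the timeout guard: never block forever
        proc.wait()
        timed_out = True
    for t in threads:
        t.join(timeout=5)

    result["timed_out"] = timed_out
    result["total_s"] = time.monotonic() - start
    result["returncode"] = proc.returncode
    return result


# Self-check against a harmless stand-in command.
demo = probe([sys.executable, "-c", "print('ok')"], timeout_s=30)
print(demo["timed_out"], demo["first_byte_s"] is not None)

# Against the real CLI (assumes claude is on PATH and reads the prompt from stdin):
# probe(["claude", "-p", "--model", "sonnet"], stdin_text=prompt, timeout_s=300)
```

A run with `timed_out` true and `first_byte_s` still `None` matches the symptom in this issue; a late but non-`None` `first_byte_s` would point to slow generation instead.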

Impact

The combat-sprint is the fast mech bug-finder. Without a score it can still find behavioral-gate defects, but it cannot produce the Angry-DM mechanical number needed to track lift (e.g. to confirm #1033/#1038).

Next (cheap → expensive)

  1. Bisect the input: score with (a) rubric+transcript only, (b) rubric+state only, (c) a trimmed transcript, to find the size/content threshold.
  2. Check whether the combat state.json (dense entities/effects/zones) is the trigger vs the transcript.
  3. Try --output-format stream-json to see if it streams anything (distinguish a true CLI hang from slow generation).
  4. If it is a stdin/large-input CLI issue, consider chunking or a state digest for the scorer prompt (the duo uses a leaner state).
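Steps 1, 2 and 4 can be sketched as a small helper that builds the bisect variants and a leaner state digest. This is only an illustration under assumptions: how qa/score.sh actually assembles the prompt is not shown in this issue, so the plain concatenation, the tail-trimming of the transcript, and the names `build_variants`, `variant_sizes` and `digest_state` are all hypothetical. The digest makes no assumption about the state.json schema; it just truncates long lists.

```python
import json


def build_variants(rubric, transcript, state_json, trim_chars=8000):
    """Return named prompt variants for bisecting which part triggers the hang."""
    sep = "\n\n"  # assumption: parts are simply concatenated
    return {
        "full": sep.join([rubric, transcript, state_json]),
        "rubric+transcript": sep.join([rubric, transcript]),
        "rubric+state": sep.join([rubric, state_json]),
        # Assumption: keep the tail of the transcript when trimming.
        "rubric+trimmed_transcript": sep.join([rubric, transcript[-trim_chars:]]),
    }


def variant_sizes(variants):
    """UTF-8 byte size per variant, to correlate hangs with input size."""
    return {name: len(text.encode("utf-8")) for name, text in variants.items()}


def _shrink(value, max_items):
    # Recursively cap list lengths; dicts and scalars are kept as they are.
    if isinstance(value, dict):
        return {k: _shrink(v, max_items) for k, v in value.items()}
    if isinstance(value, list):
        kept = [_shrink(v, max_items) for v in value[:max_items]]
        if len(value) > max_items:
            kept.append(f"... {len(value) - max_items} more omitted")
        return kept
    return value


def digest_state(state_json, max_items=5):
    """Return a compact JSON digest of the state with long lists truncated."""
    return json.dumps(_shrink(json.loads(state_json), max_items), separators=(",", ":"))


# Example with made-up data; real inputs would be the rubric, combat .md and state.json.
state = json.dumps({"entities": list(range(10)), "round": 3})
variants = build_variants("RUBRIC", "turn log " * 50, state, trim_chars=100)
variants["rubric+state_digest"] = "RUBRIC\n\n" + digest_state(state, max_items=2)
for name, size in variant_sizes(variants).items():
    print(f"{name}: {size} bytes")
```

Scoring each variant under the existing timeout guard and noting which ones hang would separate a size threshold from a content trigger; if only the state-bearing variants hang, the digest variant is the cheap next thing to try.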

Workaround for now: measure mech via a combat-seeking duo (duo Angry-DM scores fine) rather than the combat-sprint.
