
fix(qa): timeout-guard the scorer claude -p call (stop intermittent hangs)#1039

Merged
100yenadmin merged 1 commit into
mainfrom
fix/score-timeout-guard
Jun 19, 2026

Conversation

@100yenadmin

Member

The Angry-DM/lens scorer (qa/score.sh) retries on unparseable output but had no timeout on claude -p, so an intermittent hang (stuck stream / slow response) blocked the entire run forever. Seen repeatedly on combat-sprint and north-star: the run fights and gates GREEN, then scoring hangs. One-line fix: wrap claude -p in timeout ${WORLDOS_SCORE_TIMEOUT:-300} (seconds) so a hang becomes empty output that the existing retry loop catches. Healthy scores take ~60–150s, so the 300s default should not fire on a good call. Unblocks the QA feedback loop. Shell-only change (passes bash -n).
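To make the mechanism concrete, here is a minimal sketch of how a wall-clock bound turns a hang into empty output that a retry loop can handle. It is not the contents of qa/score.sh: `score_once`, `score_with_retry`, and `MAX_TRIES` are hypothetical names, the retry budget of 3 is an assumption, and the command under test is passed as arguments rather than being the real `claude -p` pipeline. It also assumes a `timeout` binary (GNU coreutils) is on `PATH`.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper names, not the real qa/score.sh.

SCORE_TIMEOUT="${WORLDOS_SCORE_TIMEOUT:-300}"  # seconds; same default as the PR
MAX_TRIES=3                                    # assumed retry budget

# Run one scoring attempt under a wall-clock bound.
# The real script pipes its prompt into `claude -p`; here the command is "$@".
score_once() {
  timeout "$SCORE_TIMEOUT" "$@"
}

# Retry until an attempt produces non-empty output or the budget runs out.
score_with_retry() {
  local try raw
  for ((try = 1; try <= MAX_TRIES; try++)); do
    # If the child hangs, timeout kills it and exits 124; whatever reached
    # stdout before the kill (typically nothing) is what ends up in raw.
    raw="$(score_once "$@")" || true
    if [[ -n "$raw" ]]; then
      printf '%s\n' "$raw"
      return 0
    fi
    echo "attempt $try: empty output, retrying" >&2
  done
  return 1
}
```

With the default of 300 seconds and healthy calls finishing in roughly 60 to 150 seconds, the bound in this sketch would only trip on a call that is already far outside the normal range.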

…angs blocking forever)

score.sh retries on unparseable/transient output but had NO wall-clock bound on claude -p, so a
genuine hang (stuck stream / slow response that never returns) blocked the entire run forever —
seen repeatedly on combat-sprint + north-star scoring (the run fights+gates GREEN, then scoring
hangs). Wrap claude -p in `timeout ${WORLDOS_SCORE_TIMEOUT:-300}` so a hang becomes empty output
that the existing retry loop catches. Healthy scores are ~60-150s → never fires on a good call.
Unblocks the whole QA feedback loop (every scored run depends on this).
@coderabbitai

coderabbitai Bot commented Jun 19, 2026


Warning

Review limit reached

@100yenadmin, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 13 minutes and 53 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7d3ad4ee-2668-468d-950c-ced58e02e10a

📥 Commits

Reviewing files that changed from the base of the PR and between 928add1 and e4676a3.

📒 Files selected for processing (1)
  • qa/score.sh


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e4676a3bea


Comment thread qa/score.sh
# fires on a good call — it only rescues a genuine hang.
printf '%s' "$INPUT" | env -u ANTHROPIC_BASE_URL -u ANTHROPIC_API_KEY -u ANTHROPIC_AUTH_TOKEN \
-    -u API_TIMEOUT_MS -u CLAUDE_CONFIG_DIR claude -p \
+    -u API_TIMEOUT_MS -u CLAUDE_CONFIG_DIR timeout "${WORLDOS_SCORE_TIMEOUT:-300}" claude -p \


P1: Use worldos_timeout instead of bare timeout

On stock macOS this newly added bare timeout command is not available, and this scorer is called from the local QA lanes such as run_duo.sh and run_combat_sprint.sh; in that environment env ... timeout ... exits 127 before claude starts, leaving $RAW empty on every retry and making every score fail. This reintroduces the same coreutils dependency the repo already fixed for play lanes with worldos_timeout/python fallback, so the scorer timeout guard needs to use that shim or an equivalent fallback.
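The kind of fallback this comment asks for could look roughly like the sketch below. It is not the repo's `worldos_timeout`, whose implementation is not shown in this thread; `portable_timeout` and `py_timeout` are hypothetical names. The assumptions are that `gtimeout` is the name Homebrew's coreutils installs on macOS, and that `python3` is available when neither coreutils binary is. The Python branch inherits stdin, so a prompt piped into the wrapped command still reaches it.

```shell
#!/usr/bin/env bash
# Hypothetical shim; NOT the repo's worldos_timeout.

# Python fallback: run the command with a deadline and exit 124 on expiry,
# mirroring the exit status GNU timeout uses.
py_timeout() {
  local secs="$1"; shift
  python3 -c '
import subprocess, sys
try:
    # The child inherits stdin/stdout, so piped input still reaches it.
    sys.exit(subprocess.run(sys.argv[2:], timeout=float(sys.argv[1])).returncode)
except subprocess.TimeoutExpired:
    sys.exit(124)
except FileNotFoundError:
    sys.exit(127)
' "$secs" "$@"
}

# Prefer a coreutils binary when present, otherwise fall back to Python.
portable_timeout() {
  local secs="$1"; shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$secs" "$@"      # GNU coreutils
  elif command -v gtimeout >/dev/null 2>&1; then
    gtimeout "$secs" "$@"     # coreutils installed via Homebrew
  else
    py_timeout "$secs" "$@"
  fi
}
```

In the changed line, such a shim would stand in for the bare `timeout`. Because `env` can only execute programs, not shell functions, it would have to be invoked as `portable_timeout "${WORLDOS_SCORE_TIMEOUT:-300}" env -u ... claude -p` rather than placed after `env`. Without a fallback of this kind, a missing `timeout` binary makes every attempt exit 127 with empty output, which the retry loop cannot distinguish from a hang.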


@100yenadmin 100yenadmin merged commit 6b989e8 into main Jun 19, 2026
20 checks passed
100yenadmin added a commit that referenced this pull request Jun 19, 2026
…ty (#1038) + scorer timeout-guard (#1039) (#1041)

Test-proven code checkpoint on rc2. Still NOT a GA — the GLM re-measure under the current ruler is
deferred (scorer hangs on combat-sprint transcripts, #1040). Story above bar at depth (old ruler),
mech the real gap, satisfaction green.

Co-authored-by: Eva <arncalso@gmail.com>