fix: measure firstResponseMs on first NDJSON line, not first tool_use #789

Open
km-git007 wants to merge 2 commits into garrytan:main from km-git007:fix/e2e-first-response-timing

@km-git007
Summary

The E2E test diagnostic metric `firstResponseMs` was measuring the wrong event:

  • Documented: Time from spawn to first NDJSON line (first Claude response)
  • Actually measured: Time from spawn to first tool_use event

This caused the metric to be inflated by 5-15 seconds, undermining rate-limit diagnostics.

What Changed

  • Moved `firstResponseMs` timing capture to NDJSON line-read loop
  • Removed buggy timing capture from tool_use handler
  • Updated comments for clarity
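
For reviewers who want to see the difference between the two measurement points, here is a minimal sketch rather than the actual runner code. The names `measureTimings`, `TimedLine`, `StreamEvent`, and `firstToolUseMs` are hypothetical, the event shape is an assumption based on the description above, and the 200 ms / 8000 ms arrival times are illustrative values chosen to sit inside the ranges quoted in this PR.

```typescript
// Hypothetical types: the real runner's types are not shown in this PR.
interface TimedLine {
  atMs: number; // arrival time of the line on the same clock as the spawn time
  line: string; // one raw NDJSON line
}

interface StreamEvent {
  type?: string;
  message?: { content?: Array<{ type?: string }> };
}

interface Timings {
  firstResponseMs: number | null; // spawn -> first NDJSON line (the fixed behavior)
  firstToolUseMs: number | null;  // spawn -> first tool_use (what was measured before)
}

function measureTimings(spawnedAtMs: number, received: TimedLine[]): Timings {
  const timings: Timings = { firstResponseMs: null, firstToolUseMs: null };

  for (const { atMs, line } of received) {
    if (line.trim() === "") continue; // ignore blank lines between records

    // Stamp in the line-read loop, before any parsing or event dispatch.
    // (Stamping before parsing is an assumption of this sketch.)
    if (timings.firstResponseMs === null) {
      timings.firstResponseMs = atMs - spawnedAtMs;
    }

    let event: StreamEvent;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed !== "object" || parsed === null) continue;
      event = parsed as StreamEvent;
    } catch {
      continue; // malformed line: nothing to dispatch
    }

    if (event.type !== "assistant") continue;
    for (const item of event.message?.content ?? []) {
      // The old code effectively set firstResponseMs here instead.
      if (item.type === "tool_use" && timings.firstToolUseMs === null) {
        timings.firstToolUseMs = atMs - spawnedAtMs;
      }
    }
  }
  return timings;
}

// Replay of a made-up stream: an init line early, a tool call much later.
const replay: TimedLine[] = [
  { atMs: 200, line: '{"type":"system","subtype":"init"}' },
  { atMs: 8000, line: '{"type":"assistant","message":{"content":[{"type":"tool_use"}]}}' },
];
console.log(measureTimings(0, replay));
// -> { firstResponseMs: 200, firstToolUseMs: 8000 }
```

Taking arrival times as input instead of reading a clock keeps the sketch deterministic; a real loop would call its clock once per line at the same position.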

Testing

The existing test suite passes. Future test runs should report roughly 100-300 ms for `firstResponseMs` instead of the previous 5000-15000 ms.

km-git007 and others added 2 commits April 3, 2026 12:08
docs: Slate agent integration research + design doc (garrytan#782)
The firstResponseMs metric was documented as "time from spawn to first
NDJSON line" but the code only set it on the first tool_use event within
an assistant message. This caused the metric to measure time to first
tool call rather than time to first Claude response—often inflating the
value by 5-15 seconds.

Now correctly measures time from spawn to the first NDJSON line received,
which accurately reflects Claude's response latency for rate-limit and
performance diagnostics.

Changes:
- Moved firstResponseMs timing capture to line-read loop (line 212-215)
- Removed buggy timing capture from tool_use handler (was line 224)
- Updated comment to clarify we track inter-turn latency for tool calls