[codex] Expose first-response analytics in SDK CLI MCP#18
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive "first-response" analytics suite across the CLI, MCP package, and TypeScript SDK, effectively deprecating the legacy voice.user_to_bot_latency metric. The changes include new CLI commands, MCP tools, and SDK methods for accessing first-response summaries, timeseries, breakdowns, and interaction waterfalls. Feedback focuses on improving the CLI implementation by correcting an overly restrictive type cast for granularity, expanding help descriptions to include missing grouping dimensions, and strengthening numeric validation for the limit parameter to properly handle NaN values.
| .description("Bucketed P50/P95/P99 first-response latency for one character.") | ||
| .requiredOption("--character-id <id>", "Character id") | ||
| .option("--mode <mode>", "Observed mode") | ||
| .option("--group-by <dim>", "Optional group: mode | settings_hash | stage | provider | model | status | turn_scope") |
There was a problem hiding this comment.
The description for the --group-by option is missing several valid grouping dimensions supported by the SDK and MCP tools, specifically breakdown_kind and character_id.
| .option("--group-by <dim>", "Optional group: mode | settings_hash | stage | provider | model | status | turn_scope") | |
| .option("--group-by <dim>", "Optional group: mode | breakdown_kind | stage | settings_hash | turn_scope | provider | model | status | character_id") |
| groupBy: opts.groupBy as FirstResponseGroupBy | undefined, | ||
| turnScope: opts.turnScope as TurnScope, | ||
| latencyKind: opts.latencyKind as FirstResponseLatencyKind, | ||
| granularity: opts.granularity as "hour", |
There was a problem hiding this comment.
The granularity option is cast to the literal type "hour", which is incorrect as the CLI option allows minute, hour, or day. This cast will cause the TypeScript compiler to ignore invalid values while potentially misleading developers about the supported runtime values.
| granularity: opts.granularity as "hour", | |
| granularity: opts.granularity as "minute" | "hour" | "day", |
| .command("breakdown") | ||
| .description("Grouped first-response latency for one character.") | ||
| .requiredOption("--character-id <id>", "Character id") | ||
| .option("--group-by <dim>", "mode | settings_hash | stage | provider | model | status | turn_scope", "mode") |
There was a problem hiding this comment.
The description for the --group-by option is missing several valid grouping dimensions supported by the SDK and MCP tools, specifically breakdown_kind and character_id.
| .option("--group-by <dim>", "mode | settings_hash | stage | provider | model | status | turn_scope", "mode") | |
| .option("--group-by <dim>", "mode | breakdown_kind | stage | settings_hash | turn_scope | provider | model | status | character_id", "mode") |
| turnScope: opts.turnScope as TurnScope, | ||
| latencyKind: opts.latencyKind as FirstResponseLatencyKind, | ||
| range: opts.range as RelativeRange, | ||
| limit: typeof opts.limit === "number" ? opts.limit : undefined, |
There was a problem hiding this comment.
The limit check only verifies if the type is number, which includes NaN. If a non-numeric value is passed to the CLI, opts.limit will be NaN, which is then passed to the SDK and eventually the API. It's safer to use Number.isFinite() to ensure a valid numeric value is sent.
| limit: typeof opts.limit === "number" ? opts.limit : undefined, | |
| limit: typeof opts.limit === "number" && Number.isFinite(opts.limit) ? opts.limit : undefined, |
58058d1 to
597fada
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 58058d1d69
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| handler: (client, args) => | ||
| client.firstResponse.settings(pickString(args, "settingsHash")!, { | ||
| characterId: pickString(args, "characterId"), | ||
| range: pickRange(args), |
There was a problem hiding this comment.
Honor last_30d default for first-response settings
The schema for get_first_response_settings declares range defaults to last_30d, but the handler calls pickRange(args), which falls back to last_24h when range is omitted. That mismatch narrows the lookup window and can cause settings hashes older than 24 hours to appear missing even though callers followed the documented defaults. Use a last_30d fallback in this handler (or read the schema-defaulted value directly) so behavior matches the declared contract.
Useful? React with 👍 / 👎.
597fada to
d6c9d25
Compare
Summary
voice.user_to_bot_latencyfor SLA reporting, and align first-response defaults/help textLocal verification
npm run lint,npm run typecheck,npm test -- --test-name-pattern firstResponse, andnpm run buildinpackages/typescriptnpm testandnpm run typecheckinpackages/mcpnpm run typecheckincligit diff --check