feat(#90): ccxray usage CLI — automated data-driven analysis by lis186 · Pull Request #94 · lis186/ccxray

lis186 · 2026-06-20T17:10:46Z

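Summary

Adds the ccxray usage CLI command: it reads index.ndjson directly and produces a usage analysis within 0.6 seconds, with no server to start. This turns the analysis that #89 did by manually running Python scripts into a zero-cost command, so agents and developers can make data-driven decisions quickly.

Every filter follows the "make existing flags smarter" design (Do-What-I-Mean), so users have no new concepts to learn:

--session accepts the latest/costliest aliases, a title substring, or a UUID prefix
--cwd accepts a directory-name substring; multiple values switch automatically to a project comparison table
--open jumps from the analysis result straight to that session in the dashboard

The design process used a multi-expert simulation (the mental models of the fzf / ripgrep / clig.dev / gh CLI authors) plus a weighted scoring scheme (only options scoring 9 or higher were accepted) to validate each UX decision one by one.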

Detail

What it does

ccxray usage reads ~/.ccxray/logs/index.ndjson directly and prints aggregated analysis. Human-readable by default, --json for agents (<4KB, deterministic, idempotent).

Sections:

meta — total entries, sessions, cost, time range
sessions — by provider, subagent ratio, turn distribution, top 10 costliest sessions (with titles)
models — turns + cost share per model
tools — call counts, fail rate (--tools for full list)
skills — per-skill invocations + loads (unique sessions) + scope detection (user/project/plugin)
prompt hash stability — how often sys/tools/core prompts change between turns
cache — hit rate, plus hit rate bucketed by inter-turn gap (reveals the 1h cache TTL cliff)

Filters (all composable)

ccxray usage --last 7d                   # time filter (d/h/m)
ccxray usage --cwd myproject             # directory name substring
ccxray usage --cwd proj-a,proj-b         # multi-project comparison table
ccxray usage --session latest            # alias
ccxray usage --session costliest         # alias
ccxray usage --session "fix login"       # title substring
ccxray usage --session 950432            # UUID prefix
ccxray usage --session costliest --open  # jump to dashboard

Implementation notes

Fast-path dispatch in server/index.js before any server require — keeps it at 0.6s with no server boot
extractToolCalls in helpers.js now expands Skill/Workflow tool calls to Skill:<name> for per-skill tracking (new data only; old data shows as (pre-tracking))
Skill scope is derived at analysis time by scanning known skill directories — reflects current state, not historical
Cross-validated against raw index computation: entry count, cost, tool calls, cache tokens, model breakdown all match exactly (Codex sessions included)

Tests

29 unit + CLI e2e tests in test/usage.test.js. No hardcoded user paths.

What's documented

docs/wire-protocol-reference.md unaffected. README updated in all three languages (en/zh-TW/ja).

🤖 Generated with Claude Code

Pure CLI command that reads index.ndjson directly (no server needed, 0.6s).

Sections: meta, sessions (with top 10 costliest), models, tools, skills (with scope detection), prompt hash stability, cache hit rates by inter-turn gap, and project comparison.

Smart filters:
- --session: aliases (latest/costliest), title substring, UUID prefix
- --cwd: directory name substring, multi-cwd comparison table
- --last: duration filter (7d/24h/30m)
- --json: agent-consumable output (<4KB)
- --tools: full tool breakdown

Also: expand Skill/Workflow tool calls to Skill:<name> in extractToolCalls for per-skill tracking in future data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add --open: resolve a single session via --session, then open the dashboard to it. Reuses hub.readHubLock() for the port.

Simplify pass (4 parallel reviewers):
- latest resolver: O(1) last-entry instead of O(n) reduce (entries are append-ordered)
- multi-cwd comparison reuses analyze() instead of re-deriving aggregates
- sort session turns once before hashStability + gapVsCache
- openDashboard uses hub.readHubLock() instead of hand-parsing hub.json
- drop redundant title slice and misleading glob alias

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lis186 · 2026-06-20T17:48:32Z

Self-review findings (sorted by severity)

The substantive risk concentrates in the 5-line helpers.js change — extractToolCalls is a shared writer, so its output ripples through the whole dashboard data path, not just usage.js.

🔴 High — `extractToolCalls` `Skill:<name>` expansion has unverified downstream side effects

server/helpers.js:724 now rewrites Skill/Workflow tool-call keys to Skill:<name>. But this function feeds wire-parsers/anthropic.js:55 → live entry → SSE broadcast → index.ndjson. Every new Anthropic turn's toolCalls changes from {Skill: 5} to {"Skill:review": 5}. Three downstream consumers assume the key is exactly 'Skill':

public/miller-columns.js:2332 & :2337 — tc['Skill'] > 0 is the cheap pre-lazy-load signal for "does this turn have a skill call". New data has Skill:review keys → tc['Skill'] is undefined → the Skills section loses its pre-reqLoaded detection (graceful degradation, not a crash — recovers from req.tools once the turn is loaded).
public/entry-rendering.js:513 — tool chips render Skill:review instead of Skill (visible, undocumented UI change).
public/miller-columns.js:1791 — usedTools set counts each Skill:x as a distinct tool → inflates toolUtilization %.

CLAUDE.md requires a browser smoke test for server/UI changes; it wasn't run here. Classic "unit tests don't catch render-pipeline coupling" case.

Fix options: treat Skill:* as Skill in the 3 dashboard consumers (altitude-correct), or run a browser smoke and accept the degradation.

🟡 Medium — whole-file read into memory, no cap

server/usage.js:57 fs.readFileSync loads the entire index.ndjson (~87 MB in practice) then split('\n'). Fine for a one-shot CLI today, but OOM risk as the index grows unbounded. Worth a ponytail: comment naming the ceiling + upgrade path (streaming readline).

🟡 Medium — `--cwd` path-vs-substring heuristic can misfire

server/usage.js:86 decides "path vs substring" via p.startsWith('/') || p.startsWith('~'). Relative paths (./foo, foo/bar) fall into the substring branch and may match unintended cwds. The naming/behavior gap isn't documented in --help.

🟢 Low — `openDashboard` exec string concatenation

exec(`${cmd} ${JSON.stringify(url)}`): the url carries s=${encodeURIComponent(sid.slice(0,8))}. The session id is a hash/UUID and is encoded, so injection risk is low. execFile with an args array would be more robust than string concat.

🟢 Low — no test guards the dashboard coupling

test/usage.test.js covers analyze() and the CLI well, but nothing tests the extractToolCalls Skill:<name> expansion, and no test pins the dashboard's Skill-key assumptions. A future revert wouldn't be caught.

Codex second-pass review pending; will append if it surfaces anything new.

lis186 · 2026-06-20T18:05:55Z

Codex second-pass review — net-new findings

Codex confirmed the 🔴 tc['Skill'] break (ranks it Critical) and surfaced these additional issues not in the first comment:

🔴 High — `Workflow:<name>` expansion is asymmetric (dropped stats)

helpers.js:724 expands both Skill and Workflow to <name>:<x>, but usage.js:175 only rolls them back via name.startsWith('Skill:'). So Workflow:<name> keys are counted in tools.totalCalls but never aggregated into skills — Workflow invocation stats are silently dropped. Also: no evidence in the codebase (no test, no wire doc) that the Workflow tool even uses input.skill. Either drop the Workflow branch or read both prefixes back.

🔴 High — `analyze([])` produces `NaN` for `subagentRatio` (no guard)

usage.js:233 subagentCount / entries.length is unguarded, unlike failRate at :250 which correctly guards entries.length ? … : 0. run() exits early on empty, but analyze() is exported and called directly (tests, multi-cwd per-group) — analyze([]) returns NaN.

🔴 High — `gapVsCache` throws on non-numeric `elapsed`

usage.js:319 — if prev.elapsed is non-numeric, gapSec becomes NaN; the gapSec < 0 guard doesn't catch NaN, so BUCKETS.find(b => NaN < b.max) returns undefined → .key throws. Real data is always numeric, but the guard is absent.

🟡 Medium — `latest` resolves by write-order, not `receivedAt` (regression from simplify pass)

The simplify pass changed latest to entries[entries.length - 1] (O(1)). But index lines are append-order, not guaranteed chronological under hub mode with concurrent clients or startup restoration. latest can silently return a stale session. The original reduce by receivedAt was correct — this is a speed-for-correctness regression I introduced.

🟡 Medium — `--json` caps tools at 10, but help/README say "top 7"

analyze() caps tools.top at 10 (:249), printHuman slices to 7 (:429). HELP text and README both say "default: top 7". JSON consumers without --tools get 10 — documented-contract violation.

🟡 Medium — `analyze()` is not pure (filesystem read)

buildSkillScopeMap() runs readdirSync on plugin trees inside analyze(). Makes the exported function environment-dependent (tests pick up whatever skills the runner has installed) and, in multi-cwd mode, re-scans the same plugin dirs once per project. Build the map once, pass it in.

🟢 Low

docs/normalization-map.md:146 still says extractToolCalls counts by name — now stale.
parseArgs dead test scaffold (test/usage.test.js:144-150) extracts via regex then returns null — invalid --last 7x silently ignored, untested.
Combined --session + --cwd empty result only reports the --session branch in the hint.

Consolidated priority: the Critical tc['Skill'] break + the two 🔴 High correctness bugs (Workflow asymmetry, analyze NaN/gapVsCache crash) should block merge. The latest write-order regression is mine to revert. Medium/Low can be follow-ups.

…ontract

The PR expanded Skill/Workflow tool calls to `Skill:<name>` inside extractToolCalls, a shared writer whose output flows to the dashboard (SSE → miller-columns/entry-rendering) and into index.ndjson. That polluted the toolCalls contract: `tc['Skill']` skill-detection broke, usedTools inflated toolUtilization, and tool chips rendered `Skill:x`.

Fix at the source instead of patching every consumer: revert extractToolCalls to a clean {toolName: count} map and add a dedicated extractSkillCalls → skillCalls index field that only `ccxray usage` reads. The dashboard is untouched (zero render-pipeline risk).

- helpers.js: extractToolCalls no longer expands; new extractSkillCalls counts only the model-initiated `Skill` tool_use (the `Workflow` tool has no `skill` input, so the old Workflow branch was dead and is dropped).
- anthropic.js + entry.js: wire skillCalls through buildEntryFields and INDEX_FIELDS (rebuild-index inherits it via buildEntryFields).
- usage.js: read per-skill detail from skillCalls; legacy entries without the field still surface as (pre-tracking) with no double-count.

Resolves review findings A1/A2/A3 (dashboard coupling) and A4 (Workflow asymmetry) at once.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
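The split between the two extractors can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function bodies are assumptions, and the block shape (`type`, `name`, `input.skill`) is inferred from the review comments and from the general form of Anthropic tool_use content blocks.

```javascript
// Hypothetical sketch: bodies are illustrative, not copied from server/helpers.js.

// Shared contract: plain {toolName: count}; Skill/Workflow are never expanded.
function extractToolCalls(content) {
  const counts = {};
  for (const block of content || []) {
    if (block.type !== 'tool_use') continue;
    counts[block.name] = (counts[block.name] || 0) + 1;
  }
  return counts;
}

// Separate field: per-skill counts, only for Skill calls that carry input.skill.
function extractSkillCalls(content) {
  const counts = {};
  for (const block of content || []) {
    if (block.type !== 'tool_use' || block.name !== 'Skill') continue;
    const skill = block.input && block.input.skill;
    if (typeof skill !== 'string' || skill === '') continue; // no skill input: ignored
    counts[skill] = (counts[skill] || 0) + 1;
  }
  return counts;
}

const content = [
  { type: 'text', text: 'reviewing' },
  { type: 'tool_use', name: 'Skill', input: { skill: 'code-review' } },
  { type: 'tool_use', name: 'Bash', input: { command: 'ls' } },
  { type: 'tool_use', name: 'Skill', input: {} },
];
console.log(extractToolCalls(content));  // { Skill: 2, Bash: 1 }
console.log(extractSkillCalls(content)); // { 'code-review': 1 }
```

With this shape a dashboard check such as tc['Skill'] > 0 keeps working, because the per-skill detail never enters the toolCalls map.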

…ture The CLI tests defaulted CCXRAY_HOME to the runner's real ~/.ccxray, so they only passed where ambient logs happened to exist. CI has no logs → `usage` exits 1 ("no logs found") and 11 assertions failed. This also contradicted the PR's "no hardcoded user paths" claim (it concatenated $HOME/.ccxray and used process.env.HOME as a --cwd prefix). Write a small synthetic index.ndjson into a temp CCXRAY_HOME at module load and point cli()/cliErr() at it; clean up on process exit. The fixture has two /work/* sessions and one /other/* session (so a `/work` prefix is a strict subset), a subagent turn, varied models/tools/cache, and a skillCalls entry — exercising the new field end-to-end in CI too. Verified: full usage suite is green with HOME pointed at an empty dir. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
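A hermetic fixture along these lines might look like the sketch below. The helper name `makeFixtureHome`, the entry fields, and the `logs/index.ndjson` layout under CCXRAY_HOME (mirroring ~/.ccxray/logs/index.ndjson) are assumptions for illustration; the real fixture lives in test/usage.test.js.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: writes a synthetic index into a throwaway CCXRAY_HOME.
function makeFixtureHome(entries) {
  const home = fs.mkdtempSync(path.join(os.tmpdir(), 'ccxray-test-'));
  const logs = path.join(home, 'logs'); // assumed layout: <CCXRAY_HOME>/logs/index.ndjson
  fs.mkdirSync(logs, { recursive: true });
  const ndjson = entries.map((e) => JSON.stringify(e)).join('\n') + '\n';
  fs.writeFileSync(path.join(logs, 'index.ndjson'), ndjson);
  return home;
}

// Field names below are illustrative, not the real index schema.
const fixtureHome = makeFixtureHome([
  { sessionId: 'aaaa1111', cwd: '/work/proj-a', cost: 0.12 },
  { sessionId: 'bbbb2222', cwd: '/work/proj-b', cost: 0.40 },
  { sessionId: 'cccc3333', cwd: '/other/proj-c', cost: 0.05 },
]);

// Remove the temp dir when the test process exits.
process.on('exit', () => fs.rmSync(fixtureHome, { recursive: true, force: true }));

// A CLI test would then spawn the command with this environment:
const childEnv = { ...process.env, CCXRAY_HOME: fixtureHome };
```

Because the fixture is created at module load, the same tests behave identically on a developer machine with real logs and on a CI runner with none.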

Two correctness bugs in the exported analyze() (callable directly and per multi-cwd group, so the run()-level empty guard doesn't protect it):
- subagentRatio divided by entries.length with no guard → analyze([]) returned NaN. Guard like the sibling failRate does.
- gapVsCache computed gapSec from receivedAt/elapsed; a missing or non-numeric receivedAt makes gapSec NaN, which matches no bucket, not even max:Infinity (NaN < Infinity is false), so the bucket lookup threw. (elapsed was already guarded by `parseFloat || 0`; receivedAt was the real source.) Replace `gapSec < 0` with `!(gapSec >= 0)` to skip NaN too.

Tests: analyze([]) zeroed-result and a non-finite-gap case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
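Both guards are small enough to show in isolation. The sketch below is an assumption-laden illustration: the bucket edges and keys are invented (the PR's real BUCKETS are not shown in this thread), and `isSubagent` is a hypothetical field name.

```javascript
// Hypothetical bucket edges in seconds; only the shape ({key, max}) follows the review text.
const BUCKETS = [
  { key: '<5m', max: 300 },
  { key: '5m-1h', max: 3600 },
  { key: '>1h', max: Infinity },
];

function bucketFor(gapSec) {
  // `gapSec < 0` is false for NaN, so NaN would slip through;
  // `!(gapSec >= 0)` rejects negatives and NaN alike.
  if (!(gapSec >= 0)) return null;
  const bucket = BUCKETS.find((b) => gapSec < b.max);
  return bucket ? bucket.key : null; // defensive: an infinite gap matches no bucket
}

function subagentRatio(entries) {
  const n = entries.length;
  // Guard the division so an empty input yields 0 instead of NaN (0 / 0).
  return n ? entries.filter((e) => e.isSubagent).length / n : 0;
}

console.log(bucketFor(NaN));    // null
console.log(bucketFor(4000));   // '>1h'
console.log(subagentRatio([])); // 0
```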

A simplify pass changed `latest` to entries[entries.length - 1] for O(1) lookup, but index.ndjson lines are append-order and can land out of sequence under hub concurrency or startup restoration — so `latest` could silently return a stale session. Restore the receivedAt reduce. Test writes an index whose newest-by-receivedAt entry is the FIRST line (last line is older) so file-order resolution picks the wrong session. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
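A reduce by receivedAt, as opposed to taking the last line, might be sketched like this. The `sessionId` field name and the ISO-string timestamps are assumptions for the example.

```javascript
// Hypothetical sketch: pick the session of the entry with the greatest receivedAt,
// regardless of where that entry sits in the file.
function resolveLatest(entries) {
  let best = null;
  let bestTs = -Infinity;
  for (const e of entries) {
    const ts = new Date(e.receivedAt).getTime();
    // A missing or unparseable receivedAt gives NaN, and NaN > x is false, so it is skipped.
    if (ts > bestTs) {
      best = e;
      bestTs = ts;
    }
  }
  return best ? best.sessionId : null;
}

// Newest entry is the FIRST line; file-order resolution would wrongly pick 'old'.
const outOfOrder = [
  { sessionId: 'new', receivedAt: '2026-06-20T12:00:00Z' },
  { sessionId: 'old', receivedAt: '2026-06-20T08:00:00Z' },
];
console.log(resolveLatest(outOfOrder)); // 'new'
```

The scan is O(n), but it runs once over entries that are already in memory, so the cost is small next to reading the file.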

…ing fs buildSkillScopeMap() (readdirSync over skill/plugin trees) ran inside analyze(), making the exported function environment-dependent — tests picked up whatever skills the runner had installed — and, in multi-cwd mode, re-scanning the same plugin dirs once per project group. Build the map once in run() and pass it via opts.scopeMap; analyze() defaults to {} so direct/test callers are deterministic. The multi-cwd comparison path doesn't pass a map (its output has no scope column), so it no longer scans the filesystem at all. Tests: scope is null without a scopeMap, and resolves when one is injected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

analyze() capped tools.top at 10 while printHuman re-sliced to 7 and --help/README document "default: top 7", so --json consumers silently got 10. Cap once in analyze() at 7 (--tools lifts it to all for both JSON and human); printHuman now iterates the already-capped list. Drops the now-unused opts param from printHuman and its call site. Test pins the contract: 7 by default, all with --tools. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The path-vs-substring heuristic keyed on startsWith('/')||startsWith('~'), but stored cwds are absolute so a literal `~/…` prefix never matched (latent bug), and relative values like `./foo` fell into the substring branch carrying the `./`. Expand `~`/`~/` to home before prefix matching, strip a leading `./` before substring matching, and document the rule in --help (absolute/~ = prefix, bare name = case-insensitive substring). Tests: bare-name substring, ./-stripping, and ~-expansion prefix match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
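The documented rule (absolute or ~ means prefix, bare name means case-insensitive substring) could be expressed as a small classifier. The function name and return shape below are hypothetical, and POSIX-style paths are assumed.

```javascript
const os = require('node:os');

// Hypothetical sketch of the --cwd rule; not the code in server/usage.js.
function classifyCwdPattern(raw, home = os.homedir()) {
  if (raw === '~' || raw.startsWith('~/')) {
    // Stored cwds are absolute, so expand ~ before prefix matching.
    return { mode: 'prefix', value: home + raw.slice(1) };
  }
  if (raw.startsWith('/')) {
    return { mode: 'prefix', value: raw };
  }
  // Relative or bare values: drop a leading ./ and match as a substring.
  return { mode: 'substring', value: raw.replace(/^\.\//, '').toLowerCase() };
}

console.log(classifyCwdPattern('~/proj', '/tmp/home')); // { mode: 'prefix', value: '/tmp/home/proj' }
console.log(classifyCwdPattern('./Foo'));               // { mode: 'substring', value: 'foo' }
```

Taking `home` as a parameter keeps the function testable without touching the real home directory.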

readFileSync + split holds the whole index.ndjson in memory. That's the right tradeoff for a 0.6s one-shot CLI today (tens of MB in practice), so rather than prematurely rewrite to streaming, name the ceiling and the upgrade path (a readline streaming pass, which would make run() async) in a ponytail comment for when the index grows unbounded. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
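For reference, the upgrade path named in that comment could look roughly like the sketch below: a readline pass that folds each line into an aggregate instead of holding the whole file. The helper names and the `cost` field are hypothetical, and, as the commit notes, this makes the caller async.

```javascript
const fs = require('node:fs');
const readline = require('node:readline');

// Hypothetical streaming pass: calls onEntry for each parsed NDJSON line.
async function forEachEntry(file, onEntry) {
  const rl = readline.createInterface({
    input: fs.createReadStream(file),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  for await (const line of rl) {
    if (!line.trim()) continue; // skip blank lines
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip a malformed or partially written line
    }
    onEntry(entry);
  }
}

// Example aggregate: sum a (hypothetical) cost field without buffering the file.
async function totalCost(file) {
  let total = 0;
  await forEachEntry(file, (e) => { total += Number(e.cost) || 0; });
  return total;
}
```

Memory then scales with the size of the aggregate rather than the size of the index, at the price of an async run().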

openDashboard built `${cmd} ${JSON.stringify(url)}` and ran it through exec (a shell). The url is a hash/UUID-derived, encoded session query so injection risk was low, but execFile with an args array removes the shell entirely. Handles the win32 `start` builtin correctly (cmd /c start "" url). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
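A shell-free opener in this spirit might be sketched as below. The `openCommand` helper and the Linux `xdg-open` choice are assumptions; only the win32 form (cmd /c start "" url) comes from the commit message.

```javascript
const { execFile } = require('node:child_process');

// Hypothetical helper: returns [command, args] so the url is never parsed by a shell.
function openCommand(url, platform = process.platform) {
  if (platform === 'darwin') return ['open', [url]];
  // `start` is a cmd builtin; the empty string is its window-title argument.
  if (platform === 'win32') return ['cmd', ['/c', 'start', '', url]];
  return ['xdg-open', [url]]; // assumed default for Linux desktops
}

function openDashboard(url) {
  const [cmd, args] = openCommand(url);
  execFile(cmd, args, (err) => {
    if (err) console.error(`could not open ${url}: ${err.message}`);
  });
}

console.log(openCommand('http://localhost:5577/?s=95043200', 'win32'));
// [ 'cmd', [ '/c', 'start', '', 'http://localhost:5577/?s=95043200' ] ]
```

The port and query string in the example URL are placeholders, not the dashboard's real address.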

`--last 7x` parsed to null and was silently ignored, so the user got an unfiltered result while believing a time filter applied. Exit 1 with a message naming the valid forms (7d, 24h, 30m). Also removes the dead parseArgs IIFE in the test (it read the source, regex-matched, then returned null and was never used). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
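Strict duration parsing for --last can be sketched as follows. The function names are hypothetical, and the sketch throws where the CLI would print the message and exit 1.

```javascript
// Milliseconds per supported unit (d/h/m).
const UNIT_MS = { d: 86_400_000, h: 3_600_000, m: 60_000 };

// Returns a duration in milliseconds, or null when the value is not <digits><d|h|m>.
function parseLast(value) {
  const match = /^(\d+)([dhm])$/.exec(String(value));
  if (!match) return null;
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Hypothetical wrapper: fail loudly instead of silently dropping the filter.
function requireLast(value) {
  const ms = parseLast(value);
  if (ms === null) {
    throw new Error(`invalid --last "${value}": expected forms like 7d, 24h, 30m`);
  }
  return ms;
}

console.log(parseLast('7d')); // 604800000
console.log(parseLast('7x')); // null
```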

The hint ternary reported only --session when both --session and --cwd were set (and never mentioned --last), so a combined no-match sent users chasing the wrong filter. Collect all active filters and tailor the hint: a single filter gets a targeted suggestion; multiple get "loosen one". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Pin extractToolCalls to keep Skill/Workflow as plain keys (so re-adding the Skill:<name> expansion that broke the dashboard would fail a test), and cover extractSkillCalls (skill-name keying, ignores non-Skill tools and Skill calls without a skill input). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Reflect the final tool-call model in the normalization map and data model: extractToolCalls counts by plain tool name (Skill/Workflow not expanded), and the companion extractSkillCalls populates the separate skillCalls index field that `ccxray usage` reads for per-skill stats. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The --cwd ~ test resolved against the real os.homedir(), pulling the runner's home path (and username) into the test data. Set a throwaway $HOME so ~ expands to a temp dir instead — the test no longer touches the real home directory and contains only synthetic paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`latest` and `costliest` scanned the whole index before --last and --cwd were applied, so `usage --session latest --cwd /work/foo` resolved the global newest session and then filtered it out — reporting "no matching entries" even when /work/foo had sessions. Move the --last and --cwd filters ahead of session resolution so the aliases pick the newest / priciest session within the filtered scope. Tests: latest and costliest both scoped by --cwd. (Found by Codex second pass.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
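The ordering fix amounts to filtering first and resolving the alias second. A minimal sketch for the costliest alias, assuming hypothetical `sessionId`, `cwd`, and `cost` fields and an absolute-path --cwd:

```javascript
// Hypothetical: session with the highest summed cost among the given entries.
function resolveCostliest(entries) {
  const totals = new Map();
  for (const e of entries) {
    totals.set(e.sessionId, (totals.get(e.sessionId) || 0) + (e.cost || 0));
  }
  let topId = null;
  let topCost = -Infinity;
  for (const [sid, cost] of totals) {
    if (cost > topCost) {
      topId = sid;
      topCost = cost;
    }
  }
  return topId;
}

// Apply the scope filter BEFORE resolving the alias, so the alias is scope-aware.
function selectCostliest(entries, { cwdPrefix } = {}) {
  const scoped = cwdPrefix
    ? entries.filter((e) => e.cwd === cwdPrefix || e.cwd.startsWith(cwdPrefix + '/'))
    : entries;
  const sid = resolveCostliest(scoped);
  return scoped.filter((e) => e.sessionId === sid);
}

const entries = [
  { sessionId: 's-other', cwd: '/other/x', cost: 9.0 },
  { sessionId: 's-foo-1', cwd: '/work/foo', cost: 1.0 },
  { sessionId: 's-foo-2', cwd: '/work/foo', cost: 2.5 },
];
console.log(selectCostliest(entries, { cwdPrefix: '/work/foo' }).map((e) => e.sessionId));
// [ 's-foo-2' ]
```

Resolving globally first would have picked s-other and then filtered it out, which is the "no matching entries" symptom described above.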

`e.cwd.startsWith(p)` let `--cwd /work/proj` also match a sibling `/work/proj-sibling`, silently inflating project totals. Match the exact dir or a real subtree (`e.cwd === pn || e.cwd.startsWith(pn + '/')`) after trimming trailing slashes. Test: /work/proj matches /work/proj but not /work/proj-sibling. (Found by Codex second pass.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
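The path-bound check from this commit, pulled out as a standalone function (the name `inSubtree` is hypothetical; the comparison itself is the one quoted above):

```javascript
// True when cwd is the pattern directory itself or lies beneath it.
function inSubtree(cwd, pattern) {
  const pn = pattern.replace(/\/+$/, ''); // trim trailing slashes
  return cwd === pn || cwd.startsWith(pn + '/');
}

console.log(inSubtree('/work/proj', '/work/proj'));         // true
console.log(inSubtree('/work/proj/src', '/work/proj/'));    // true
console.log(inSubtree('/work/proj-sibling', '/work/proj')); // false
```

Appending the separator before the prefix test is what keeps a sibling directory with a shared name prefix out of the totals.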

Add an Anthropic test that drives a Skill tool_use through buildEntryFields → buildIndexLine and asserts both the clean toolCalls ({Skill,Bash}) and the persisted skillCalls ({code-review}) survive the INDEX_FIELDS projection. Catches a future drop of skillCalls from the parser output or INDEX_FIELDS, which the helper-only test missed. (Found by Codex second pass.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The usage examples only demonstrated bare-name substring matching for --cwd. Add a line (en/zh-TW/ja) showing that an absolute or ~ path does an exact-subtree prefix match, so both modes are visible in the README. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lis186 · 2026-06-21T02:17:31Z

Review findings resolved + green CI + Codex second pass

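Summary: all 14 items from the two review rounds, the failing CI, and the 3 new issues found by the Codex second pass have been fixed and pushed. Core change: Skill:<name> is no longer stuffed into the shared toolCalls contract; a separate skillCalls index field carries it instead, so the dashboard is unchanged (browser smoke verified that chips render as clean Skill/Bash/Read, with no Skill: leak and zero console errors). 940 tests green; Codex second-pass round 2 clean.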

Key architectural change (resolves 🔴 A1–A4)

Rather than expand Skill/Workflow to Skill:<name> inside extractToolCalls (a shared writer feeding the dashboard and index.ndjson), extractToolCalls is reverted to a clean {toolName: count} map, and a dedicated extractSkillCalls → skillCalls index field carries per-skill granularity that only ccxray usage reads. The dashboard is untouched (zero render-pipeline risk). Verified by browser smoke: tool chips render Skill/Bash/Read, no Skill: leak anywhere, zero console errors. (Workflow has no skill input, so its branch was dead and is dropped.)

Findings → fixes

| Finding (severity) | Commit |
| --- | --- |
| 🔴 `extractToolCalls` dashboard coupling (A1/A2/A3) + `Workflow` asymmetry (A4) | `10c7e3d` |
| 🔴 `analyze([])` NaN + `gapVsCache` crash on non-finite gap | `91ca52f` |
| 🟡 `--session latest` resolved by write-order | `275ebbf` |
| 🟡 `analyze()` not pure (fs read inside) | `6ced143` |
| 🟡 `--json` tools cap 10 vs documented top 7 | `c65fa54` |
| 🟡 `--cwd` path-vs-substring heuristic (+ `~` never matched) | `fe6367c` |
| 🟡 whole-file read, no ceiling | `1213568` |
| 🟢 `openDashboard` shell string → `execFile` | `a2184be` |
| 🟢 invalid `--last` silently ignored + dead test scaffold | `18681f1` |
| 🟢 combined `--session`/`--cwd` empty-hint | `195a7f3` |
| 🟢 no test guarding the dashboard/skill contract | `6d0b375` |
| 🟢 stale `normalization-map.md` / `data-model.md` | `0f2a9b3` |

Failing CI

Root cause: the CLI e2e tests defaulted CCXRAY_HOME to the real ~/.ccxray, so they passed locally but failed in CI (no logs → exit 1, 11 failures). They now use an isolated synthetic CCXRAY_HOME fixture and pass with an empty $HOME. — cfe5c19 (+ hermetic ~ test 53c2c97)

Codex second pass

Round 1 surfaced 3 net-new issues, all fixed:

--session latest/costliest resolved before --last/--cwd → now resolved within the filtered scope — fcbb146
absolute --cwd prefix not path-bound (/work/proj matched /work/proj-sibling) — fd2b763
no test proving skillCalls survives the real buildEntryFields → buildIndexLine projection — db6abce

Round 2: no substantive findings.

Follow-up docs (intentionally not in this PR)

#95: docs: document the ccxray usage --json output schema (agent-facing contract)
#96: docs: document test hygiene (isolated synthetic CCXRAY_HOME, no real user data)
#97: docs: add a decision record for the toolCalls vs skillCalls index contract

Verification: npm test → 940 pass / 0 fail; browser smoke on an isolated server; full diff re-reviewed by Codex (clean).

Record why toolCalls stays a plain {toolName:count} dashboard contract and per-skill detail lives in a separate skillCalls index, so a future "merge the two fields" change doesn't repeat the dashboard breakage from PR #94. Also documents why skillCalls is structurally Anthropic-only. Cross-linked from docs/normalization-map.md §5. Closes #97 Co-authored-by: Justin Lee <justinlee@91app.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(#96): document test hygiene convention

Tests must point CCXRAY_HOME at a throwaway temp dir with their own synthetic index.ndjson and never read the real ~/.ccxray, the fallback that made PR #94's usage e2e tests pass locally but fail in empty-home CI. Adds docs/testing.md (4 rules, canonical pattern from usage.test.js, the $HOME-vs-CCXRAY_HOME distinction incl. the puppeteer Chrome-cache caveat) and a short Test Hygiene section + pointer in CLAUDE.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(#96): run tests against an empty CCXRAY_HOME

Adds a test step that points CCXRAY_HOME at a fresh empty dir under $RUNNER_TEMP, as a backstop against the PR #94 failure class: a test that reads the real ~/.ccxray now finds no logs and fails the build. $HOME is left untouched so puppeteer's Chrome cache stays intact. CCXRAY_HOME is set at the step (not job) level via the $RUNNER_TEMP shell var; the runner context is not available in jobs.<id>.env.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Justin Lee <justinlee@91app.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Justin Lee and others added 2 commits June 20, 2026 22:45

Justin Lee and others added 18 commits June 21, 2026 02:34

lis186 merged commit 79817cd into main Jun 21, 2026
2 checks passed

lis186 deleted the feat/usage-cli-90 branch June 21, 2026 03:06

This was referenced Jun 21, 2026

docs(#96): test hygiene convention + empty-CCXRAY_HOME CI backstop #98

Merged

docs(#97): ADR for toolCalls vs skillCalls index contract #99

Merged

lis186 merged 20 commits into main from feat/usage-cli-90
