
feat(#90): ccxray usage CLI — automated data-driven analysis #94

Merged
lis186 merged 20 commits into main from feat/usage-cli-90 on Jun 21, 2026

lis186 (Owner) commented Jun 20, 2026

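Summary

Adds the ccxray usage CLI command, which reads index.ndjson directly and produces a usage analysis in 0.6 seconds without starting the server. It turns the analysis from #89, which meant running Python scripts by hand, into a zero-cost command so that agents and developers can quickly make data-driven decisions.

Every filter follows a "make existing flags smarter" (Do-What-I-Mean) design, so users don't have to learn new concepts:

  • --session accepts the latest/costliest aliases, a title substring, or a UUID prefix
  • --cwd accepts a directory-name substring; passing multiple values automatically switches to a project comparison table
  • --open jumps from the analysis result straight to that session in the dashboard

Each UX decision was validated during design through a multi-expert simulation (the mental models of the authors of fzf, ripgrep, clig.dev, and gh CLI) plus a weighted scoring scheme that accepted only options scoring 9 or higher.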

Detail

What it does

ccxray usage reads ~/.ccxray/logs/index.ndjson directly and prints aggregated analysis. Human-readable by default, --json for agents (<4KB, deterministic, idempotent).

Sections:

  • meta — total entries, sessions, cost, time range
  • sessions — by provider, subagent ratio, turn distribution, top 10 costliest sessions (with titles)
  • models — turns + cost share per model
  • tools — call counts, fail rate (--tools for full list)
  • skills — per-skill invocations + loads (unique sessions) + scope detection (user/project/plugin)
  • prompt hash stability — how often sys/tools/core prompts change between turns
  • cache — hit rate, plus hit rate bucketed by inter-turn gap (reveals the 1h cache TTL cliff)
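The gap-bucketed cache hit rate can be sketched as follows. This is a minimal illustration, not the usage.js code: the entry fields (`sessionId`, `receivedAt`, `elapsed` in seconds, `inputTokens`, `cacheReadTokens`, `cacheCreationTokens`), the bucket edges, and the hit-rate formula are all assumptions.

```javascript
'use strict';

// Hypothetical bucket edges in seconds; the 1h edge is where a 1h cache TTL cliff would show up.
const BUCKETS = [
  { key: '<5m', max: 5 * 60 },
  { key: '5m-1h', max: 60 * 60 },
  { key: '>=1h', max: Infinity },
];

// Assumed hit-rate definition: cache-read tokens over all input-side tokens.
function hitRate(t) {
  const total = t.input + t.cacheRead + t.cacheCreation;
  return total ? t.cacheRead / total : 0;
}

function gapVsCache(entries) {
  // Group turns by session.
  const bySession = new Map();
  for (const e of entries) {
    if (!bySession.has(e.sessionId)) bySession.set(e.sessionId, []);
    bySession.get(e.sessionId).push(e);
  }
  const acc = {};
  for (const b of BUCKETS) acc[b.key] = { turns: 0, input: 0, cacheRead: 0, cacheCreation: 0 };

  for (const turns of bySession.values()) {
    // Order by receivedAt once; index file order is not guaranteed chronological.
    turns.sort((a, b) => Date.parse(a.receivedAt) - Date.parse(b.receivedAt));
    for (let i = 1; i < turns.length; i++) {
      const prev = turns[i - 1];
      const cur = turns[i];
      // Gap = idle time between the end of the previous turn and the start of this one.
      const prevEndMs = Date.parse(prev.receivedAt) + (parseFloat(prev.elapsed) || 0) * 1000;
      const gapSec = (Date.parse(cur.receivedAt) - prevEndMs) / 1000;
      if (!(gapSec >= 0)) continue; // skips negative gaps and NaN alike
      const a = acc[BUCKETS.find(b => gapSec < b.max).key];
      a.turns += 1;
      a.input += cur.inputTokens || 0;
      a.cacheRead += cur.cacheReadTokens || 0;
      a.cacheCreation += cur.cacheCreationTokens || 0;
    }
  }
  return BUCKETS.map(b => ({ bucket: b.key, turns: acc[b.key].turns, hitRate: hitRate(acc[b.key]) }));
}
```

The first turn of each session has no predecessor, so it contributes to no bucket.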

Filters (all composable)

ccxray usage --last 7d                   # time filter (d/h/m)
ccxray usage --cwd myproject             # directory name substring
ccxray usage --cwd proj-a,proj-b         # multi-project comparison table
ccxray usage --session latest            # alias
ccxray usage --session costliest         # alias
ccxray usage --session "fix login"       # title substring
ccxray usage --session 950432            # UUID prefix
ccxray usage --session costliest --open  # jump to dashboard
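A sketch of how a Do-What-I-Mean --session value might be resolved, assuming hypothetical per-turn fields `sessionId`, `receivedAt`, `cost`, and `title`. The resolution order (aliases, then UUID prefix, then title substring) is an assumption, not taken from the source.

```javascript
'use strict';

// Hypothetical entry shape: { sessionId, receivedAt (ISO string), cost, title }.
function summarizeSessions(entries) {
  const sessions = new Map();
  for (const e of entries) {
    let s = sessions.get(e.sessionId);
    if (!s) {
      s = { id: e.sessionId, cost: 0, lastAt: -Infinity, title: '' };
      sessions.set(e.sessionId, s);
    }
    s.cost += e.cost || 0;
    const t = Date.parse(e.receivedAt);
    if (Number.isFinite(t) && t > s.lastAt) s.lastAt = t;
    if (e.title) s.title = e.title;
  }
  return [...sessions.values()];
}

// Resolve a --session value to matching session ids (empty = no match, >1 = ambiguous).
function resolveSession(query, entries) {
  const sessions = summarizeSessions(entries);
  if (sessions.length === 0) return [];
  const maxBy = key => sessions.reduce((best, s) => (s[key] > best[key] ? s : best)).id;
  if (query === 'latest') return [maxBy('lastAt')]; // newest receivedAt, not last line in the file
  if (query === 'costliest') return [maxBy('cost')];
  const q = query.toLowerCase();
  // Try a UUID prefix first when the query only contains hex digits and dashes.
  if (/^[0-9a-f-]+$/.test(q)) {
    const byPrefix = sessions.filter(s => s.id.toLowerCase().startsWith(q));
    if (byPrefix.length > 0) return byPrefix.map(s => s.id);
  }
  // Otherwise fall back to a case-insensitive title substring.
  return sessions.filter(s => s.title.toLowerCase().includes(q)).map(s => s.id);
}
```

Because the aliases pick the newest or priciest session within whatever entries they are given, a caller would apply --last and --cwd to the entries first.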

Implementation notes

  • Fast-path dispatch in server/index.js before any server require — keeps it at 0.6s with no server boot
  • extractToolCalls in helpers.js now expands Skill/Workflow tool calls to Skill:<name> for per-skill tracking (new data only; old data shows as (pre-tracking))
  • Skill scope is derived at analysis time by scanning known skill directories — reflects current state, not historical
  • Cross-validated against raw index computation: entry count, cost, tool calls, cache tokens, model breakdown all match exactly (Codex sessions included)
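The fast-path idea (decide on the subcommand before loading any server module) can be sketched like this. `dispatch` and the `loaders` thunks are hypothetical stand-ins for the top of server/index.js, where the equivalent is placing the usage module's require inside the `usage` branch rather than at the top of the file.

```javascript
'use strict';

// Pick the subcommand before loading anything heavy. Each loader is a thunk,
// so a module is only loaded on the branch that actually needs it.
function dispatch(argv, loaders) {
  const [cmd, ...rest] = argv;
  if (cmd === 'usage') {
    // Fast path: only the analysis module is loaded; the server never boots.
    return loaders.usage().run(rest);
  }
  return loaders.server().start(argv);
}
```

In a real entry point each loader would be a `() => require(...)` call, so the server's dependency graph is never evaluated for `usage`.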

Tests

29 unit + CLI e2e tests in test/usage.test.js. No hardcoded user paths.

What's documented

docs/wire-protocol-reference.md unaffected. README updated in all three languages (en/zh-TW/ja).

🤖 Generated with Claude Code

Justin Lee and others added 2 commits June 20, 2026 22:45
Pure CLI command that reads index.ndjson directly (no server needed, 0.6s).

Sections: meta, sessions (with top 10 costliest), models, tools, skills
(with scope detection), prompt hash stability, cache hit rates by
inter-turn gap, and project comparison.

Smart filters:
- --session: aliases (latest/costliest), title substring, UUID prefix
- --cwd: directory name substring, multi-cwd comparison table
- --last: duration filter (7d/24h/30m)
- --json: agent-consumable output (<4KB)
- --tools: full tool breakdown

Also: expand Skill/Workflow tool calls to Skill:<name> in extractToolCalls
for per-skill tracking in future data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add --open: resolve a single session via --session, then open the
dashboard to it. Reuses hub.readHubLock() for the port.

Simplify pass (4 parallel reviewers):
- latest resolver: O(1) last-entry instead of O(n) reduce (entries are
  append-ordered)
- multi-cwd comparison reuses analyze() instead of re-deriving aggregates
- sort session turns once before hashStability + gapVsCache
- openDashboard uses hub.readHubLock() instead of hand-parsing hub.json
- drop redundant title slice and misleading glob alias

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lis186 (Owner, Author) commented Jun 20, 2026

Self-review findings (sorted by severity)

The substantive risk concentrates in the 5-line helpers.js change — extractToolCalls is a shared writer, so its output ripples through the whole dashboard data path, not just usage.js.

🔴 High — extractToolCalls Skill:<name> expansion has unverified downstream side effects

server/helpers.js:724 now rewrites Skill/Workflow tool-call keys to Skill:<name>. But this function feeds wire-parsers/anthropic.js:55 → live entry → SSE broadcast → index.ndjson. Every new Anthropic turn's toolCalls changes from {Skill: 5} to {"Skill:review": 5}. Three downstream consumers assume the key is exactly 'Skill':

  1. public/miller-columns.js:2332 & :2337: tc['Skill'] > 0 is the cheap pre-lazy-load signal for "does this turn have a skill call". New data has Skill:review keys, so tc['Skill'] is undefined and the Skills section loses its pre-reqLoaded detection (graceful degradation, not a crash; it recovers from req.tools once the turn is loaded).
  2. public/entry-rendering.js:513: tool chips render Skill:review instead of Skill (a visible, undocumented UI change).
  3. public/miller-columns.js:1791: the usedTools set counts each Skill:x as a distinct tool, which inflates toolUtilization %.

CLAUDE.md requires a browser smoke test for server/UI changes; it wasn't run here. This is the classic case of unit tests not catching render-pipeline coupling.

Fix options: treat Skill:* as Skill in the 3 dashboard consumers (altitude-correct), or run a browser smoke and accept the degradation.
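A minimal sketch of the first fix option (treating Skill:* as Skill in the dashboard consumers). `foldToolCalls` and `hasSkillCall` are hypothetical helpers, not existing dashboard functions; folding is restricted to the `Skill:` prefix so other tool names pass through untouched.

```javascript
'use strict';

// Collapse expanded keys such as "Skill:review" back to the base "Skill" name.
const baseToolName = key => (key.startsWith('Skill:') ? 'Skill' : key);

// Fold a toolCalls map so "Skill" and every "Skill:*" count as one tool.
function foldToolCalls(tc) {
  const out = {};
  for (const [key, n] of Object.entries(tc || {})) {
    const base = baseToolName(key);
    out[base] = (out[base] || 0) + n;
  }
  return out;
}

// Cheap pre-load check that works for both the old and the expanded key shapes.
const hasSkillCall = tc => (foldToolCalls(tc).Skill || 0) > 0;
```

Counting distinct keys of the folded map (rather than the raw one) would also keep usedTools from inflating toolUtilization.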

🟡 Medium — whole-file read into memory, no cap

server/usage.js:57 fs.readFileSync loads the entire index.ndjson (~87 MB in practice) then split('\n'). Fine for a one-shot CLI today, but OOM risk as the index grows unbounded. Worth a ponytail: comment naming the ceiling + upgrade path (streaming readline).

🟡 Medium — --cwd path-vs-substring heuristic can misfire

server/usage.js:86 decides "path vs substring" via p.startsWith('/') || p.startsWith('~'). Relative paths (./foo, foo/bar) fall into the substring branch and may match unintended cwds. The naming/behavior gap isn't documented in --help.

🟢 Low — openDashboard exec string concatenation

exec(`${cmd} ${JSON.stringify(url)}`): the url carries `s=${encodeURIComponent(sid.slice(0,8))}`, and since the session id is a hash/UUID and is URL-encoded, the injection risk is low. `execFile` with an args array would be more robust than string concatenation.

🟢 Low — no test guards the dashboard coupling

test/usage.test.js covers analyze() and the CLI well, but nothing tests the extractToolCalls Skill:<name> expansion, and no test pins the dashboard's Skill-key assumptions. A future revert wouldn't be caught.


Codex second-pass review pending; will append if it surfaces anything new.

lis186 (Owner, Author) commented Jun 20, 2026

Codex second-pass review — net-new findings

Codex confirmed the 🔴 tc['Skill'] break (ranks it Critical) and surfaced these additional issues not in the first comment:

🔴 High — Workflow:<name> expansion is asymmetric (dropped stats)

helpers.js:724 expands both Skill and Workflow to <name>:<x>, but usage.js:175 only rolls them back via name.startsWith('Skill:'). So Workflow:<name> keys are counted in tools.totalCalls but never aggregated into skills — Workflow invocation stats are silently dropped. Also: no evidence in the codebase (no test, no wire doc) that the Workflow tool even uses input.skill. Either drop the Workflow branch or read both prefixes back.

🔴 High — analyze([]) produces NaN for subagentRatio (no guard)

usage.js:233 subagentCount / entries.length is unguarded, unlike failRate at :250 which correctly guards entries.length ? … : 0. run() exits early on empty, but analyze() is exported and called directly (tests, multi-cwd per-group) — analyze([]) returns NaN.

🔴 High — gapVsCache throws on non-numeric elapsed

usage.js:319: if prev.elapsed is non-numeric, gapSec becomes NaN; the gapSec < 0 guard doesn't catch NaN, so BUCKETS.find(b => NaN < b.max) returns undefined and reading .key on it throws. Real data is always numeric, but the guard is absent.

🟡 Medium — latest resolves by write-order, not receivedAt (regression from simplify pass)

The simplify pass changed latest to entries[entries.length - 1] (O(1)). But index lines are append-order, not guaranteed chronological under hub mode with concurrent clients or startup restoration. latest can silently return a stale session. The original reduce by receivedAt was correct — this is a speed-for-correctness regression I introduced.

🟡 Medium — --json caps tools at 10, but help/README say "top 7"

analyze() caps tools.top at 10 (:249), printHuman slices to 7 (:429). HELP text and README both say "default: top 7". JSON consumers without --tools get 10 — documented-contract violation.

🟡 Medium — analyze() is not pure (filesystem read)

buildSkillScopeMap() runs readdirSync on plugin trees inside analyze(). Makes the exported function environment-dependent (tests pick up whatever skills the runner has installed) and, in multi-cwd mode, re-scans the same plugin dirs once per project. Build the map once, pass it in.

🟢 Low

  • docs/normalization-map.md:146 still says extractToolCalls counts by name — now stale.
  • The parseArgs test scaffold (test/usage.test.js:144-150) is dead code: it extracts via regex, then returns null. Meanwhile an invalid value such as --last 7x is silently ignored, and that path is untested.
  • When a combined --session + --cwd query returns nothing, the hint reports only the --session branch.

Consolidated priority: the Critical tc['Skill'] break + the two 🔴 High correctness bugs (Workflow asymmetry, analyze NaN/gapVsCache crash) should block merge. The latest write-order regression is mine to revert. Medium/Low can be follow-ups.

Justin Lee and others added 18 commits June 21, 2026 02:34
…ontract

The PR expanded Skill/Workflow tool calls to `Skill:<name>` inside
extractToolCalls, a shared writer whose output flows to the dashboard
(SSE → miller-columns/entry-rendering) and into index.ndjson. That
polluted the toolCalls contract: `tc['Skill']` skill-detection broke,
usedTools inflated toolUtilization, and tool chips rendered `Skill:x`.

Fix at the source instead of patching every consumer: revert
extractToolCalls to a clean {toolName: count} map and add a dedicated
extractSkillCalls → skillCalls index field that only `ccxray usage`
reads. The dashboard is untouched (zero render-pipeline risk).

- helpers.js: extractToolCalls no longer expands; new extractSkillCalls
  counts only the model-initiated `Skill` tool_use (the `Workflow` tool
  has no `skill` input, so the old Workflow branch was dead — dropped).
- anthropic.js + entry.js: wire skillCalls through buildEntryFields and
  INDEX_FIELDS (rebuild-index inherits it via buildEntryFields).
- usage.js: read per-skill detail from skillCalls; legacy entries
  without the field still surface as (pre-tracking) with no double-count.
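A sketch of the split between the two extractors, assuming Anthropic Messages-style content blocks (`{ type: 'tool_use', name, input }`) as input; the real helpers.js signatures may differ.

```javascript
'use strict';

// Plain { toolName: count } map; Skill and Workflow are NOT expanded.
function extractToolCalls(blocks) {
  const counts = {};
  for (const b of blocks || []) {
    if (b && b.type === 'tool_use' && b.name) counts[b.name] = (counts[b.name] || 0) + 1;
  }
  return counts;
}

// Separate { skillName: count } map, only for Skill tool_use blocks with a skill input.
function extractSkillCalls(blocks) {
  const counts = {};
  for (const b of blocks || []) {
    if (!b || b.type !== 'tool_use' || b.name !== 'Skill') continue;
    const skill = b.input && b.input.skill;
    if (typeof skill !== 'string' || skill === '') continue; // no skill input: ignored
    counts[skill] = (counts[skill] || 0) + 1;
  }
  return counts;
}
```

Keeping the two maps apart means the dashboard's toolCalls contract never changes shape, while the usage CLI still gets per-skill detail.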

Resolves review findings A1/A2/A3 (dashboard coupling) and A4
(Workflow asymmetry) at once.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ture

The CLI tests defaulted CCXRAY_HOME to the runner's real ~/.ccxray, so
they only passed where ambient logs happened to exist. CI has no logs →
`usage` exits 1 ("no logs found") and 11 assertions failed. This also
contradicted the PR's "no hardcoded user paths" claim (it concatenated
$HOME/.ccxray and used process.env.HOME as a --cwd prefix).

Write a small synthetic index.ndjson into a temp CCXRAY_HOME at module
load and point cli()/cliErr() at it; clean up on process exit. The
fixture has two /work/* sessions and one /other/* session (so a `/work`
prefix is a strict subset), a subagent turn, varied models/tools/cache,
and a skillCalls entry — exercising the new field end-to-end in CI too.

Verified: full usage suite is green with HOME pointed at an empty dir.
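The fixture pattern might look like the following. `makeFixtureHome` is a hypothetical helper, and the `logs/index.ndjson` layout under CCXRAY_HOME is assumed from the default ~/.ccxray/logs/index.ndjson path.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway CCXRAY_HOME holding a synthetic index.ndjson.
function makeFixtureHome(entries) {
  const home = fs.mkdtempSync(path.join(os.tmpdir(), 'ccxray-usage-'));
  fs.mkdirSync(path.join(home, 'logs'), { recursive: true });
  const lines = entries.map(e => JSON.stringify(e)).join('\n') + '\n';
  fs.writeFileSync(path.join(home, 'logs', 'index.ndjson'), lines);
  // Remove the temp dir when the test process exits.
  process.on('exit', () => fs.rmSync(home, { recursive: true, force: true }));
  return home;
}
```

A CLI test would then spawn the command with `env: { ...process.env, CCXRAY_HOME: home }`, so it never reads the runner's real ~/.ccxray.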

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two correctness bugs in the exported analyze() (callable directly and per
multi-cwd group, so the run()-level empty guard doesn't protect it):

- subagentRatio divided by entries.length with no guard → analyze([])
  returned NaN. Guard like the sibling failRate does.
- gapVsCache computed gapSec from receivedAt/elapsed; a missing or
  non-numeric receivedAt makes gapSec NaN, which matches no bucket — not
  even max:Infinity (NaN < Infinity is false) — so the bucket lookup threw.
  (elapsed was already guarded by `parseFloat || 0`; receivedAt was the
  real source.) Replace `gapSec < 0` with `!(gapSec >= 0)` to skip NaN too.

Tests: analyze([]) zeroed-result and a non-finite-gap case.
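A small demonstration of why the old `gapSec < 0` check let NaN through, and how the `!(gapSec >= 0)` form and a guarded ratio behave. The bucket table and function names are illustrative only.

```javascript
'use strict';

// Every comparison with NaN is false, including NaN < Infinity.
const BUCKETS = [{ key: '<5m', max: 300 }, { key: '>=5m', max: Infinity }];

function bucketOld(gapSec) {
  if (gapSec < 0) return null;                   // NaN slips past this check
  return BUCKETS.find(b => gapSec < b.max).key;  // find() yields undefined, so .key throws
}

function bucketFixed(gapSec) {
  if (!(gapSec >= 0)) return null;               // also true for NaN, so it is skipped
  return BUCKETS.find(b => gapSec < b.max).key;
}

// Same idea for ratios: guard the empty denominator, as failRate already does.
const safeRatio = (n, d) => (d ? n / d : 0);
```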

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A simplify pass changed `latest` to entries[entries.length - 1] for O(1)
lookup, but index.ndjson lines are append-order and can land out of
sequence under hub concurrency or startup restoration — so `latest`
could silently return a stale session. Restore the receivedAt reduce.

Test writes an index whose newest-by-receivedAt entry is the FIRST line
(last line is older) so file-order resolution picks the wrong session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing fs

buildSkillScopeMap() (readdirSync over skill/plugin trees) ran inside
analyze(), making the exported function environment-dependent — tests
picked up whatever skills the runner had installed — and, in multi-cwd
mode, re-scanning the same plugin dirs once per project group.

Build the map once in run() and pass it via opts.scopeMap; analyze()
defaults to {} so direct/test callers are deterministic. The multi-cwd
comparison path doesn't pass a map (its output has no scope column), so
it no longer scans the filesystem at all.

Tests: scope is null without a scopeMap, and resolves when one is injected.
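A sketch of per-skill aggregation with an injected scope map, assuming entries carry `sessionId`, `toolCalls`, and (for new data) a `skillCalls` field that is always present, possibly empty. `aggregateSkills` is a hypothetical name; with no map passed, scope stays null, as in the tests described above.

```javascript
'use strict';

// Aggregate per-skill stats; the scope map is injected (default {}),
// so this function never touches the filesystem.
function aggregateSkills(entries, { scopeMap = {} } = {}) {
  const skills = {};
  const bump = (name, sessionId, n) => {
    const s = skills[name] || (skills[name] = { invocations: 0, sessions: new Set() });
    s.invocations += n;
    s.sessions.add(sessionId);
  };
  for (const e of entries) {
    if (e.skillCalls) {
      for (const [name, n] of Object.entries(e.skillCalls)) bump(name, e.sessionId, n);
    } else if (e.toolCalls && e.toolCalls.Skill) {
      // Legacy entry: a Skill was used, but its name was never recorded.
      bump('(pre-tracking)', e.sessionId, e.toolCalls.Skill);
    }
  }
  return Object.entries(skills).map(([name, s]) => ({
    name,
    invocations: s.invocations,
    loads: s.sessions.size,        // unique sessions
    scope: scopeMap[name] || null, // null when no map is injected
  }));
}
```

Reading toolCalls.Skill only when skillCalls is absent is what avoids double-counting new entries.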

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
analyze() capped tools.top at 10 while printHuman re-sliced to 7 and
--help/README document "default: top 7", so --json consumers silently
got 10. Cap once in analyze() at 7 (--tools lifts it to all for both JSON
and human); printHuman now iterates the already-capped list. Drops the
now-unused opts param from printHuman and its call site.

Test pins the contract: 7 by default, all with --tools.
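A sketch of capping in one place, with a name tie-break so `--json` output stays deterministic. `topTools` and its `allTools` option are hypothetical names standing in for the --tools behavior.

```javascript
'use strict';

const DEFAULT_TOP_TOOLS = 7; // matches the documented "default: top 7"

// Cap once, in the analysis step, so JSON and human output agree.
function topTools(toolCounts, { allTools = false } = {}) {
  const sorted = Object.entries(toolCounts)
    .map(([name, calls]) => ({ name, calls }))
    .sort((a, b) => b.calls - a.calls || a.name.localeCompare(b.name));
  return allTools ? sorted : sorted.slice(0, DEFAULT_TOP_TOOLS);
}
```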

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The path-vs-substring heuristic keyed on startsWith('/')||startsWith('~'),
but stored cwds are absolute so a literal `~/…` prefix never matched
(latent bug), and relative values like `./foo` fell into the substring
branch carrying the `./`. Expand `~`/`~/` to home before prefix matching,
strip a leading `./` before substring matching, and document the rule in
--help (absolute/~ = prefix, bare name = case-insensitive substring).

Tests: bare-name substring, ./-stripping, and ~-expansion prefix match.
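The classification rule can be sketched as a small parser. `parseCwdFilter` is hypothetical; `home` is injectable so a test never touches the real home directory, and POSIX-style paths are assumed.

```javascript
'use strict';
const os = require('os');
const path = require('path');

// Absolute or ~ paths become prefix matches; bare names become
// case-insensitive substring matches with any leading ./ stripped.
function parseCwdFilter(raw, home = os.homedir()) {
  let p = raw.trim();
  // Stored cwds are absolute, so ~ must be expanded before prefix matching.
  if (p === '~') p = home;
  else if (p.startsWith('~/')) p = path.join(home, p.slice(2));
  if (path.isAbsolute(p)) return { kind: 'prefix', value: p };
  return { kind: 'substring', value: p.replace(/^\.\//, '').toLowerCase() };
}
```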

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
readFileSync + split holds the whole index.ndjson in memory. That's the
right tradeoff for a 0.6s one-shot CLI today (tens of MB in practice), so
rather than prematurely rewrite to streaming, name the ceiling and the
upgrade path (a readline streaming pass, which would make run() async)
in a ponytail comment for when the index grows unbounded.
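The named upgrade path, a readline streaming pass, might look like this sketch. Aggregating inside the callback (instead of collecting every entry into an array) is what would keep memory bounded; `streamIndex` is a hypothetical name, and as the commit notes, its caller would have to become async.

```javascript
'use strict';
const fs = require('fs');
const readline = require('readline');

// Stream index.ndjson line by line instead of readFileSync + split.
async function streamIndex(file, onEntry) {
  const rl = readline.createInterface({
    input: fs.createReadStream(file, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let bad = 0;
  for await (const line of rl) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      bad += 1; // skip a malformed or partially written line
      continue;
    }
    onEntry(entry);
  }
  return { bad };
}
```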

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
openDashboard built `${cmd} ${JSON.stringify(url)}` and ran it through
exec (a shell). The url is a hash/UUID-derived, encoded session query so
injection risk was low, but execFile with an args array removes the shell
entirely. Handles the win32 `start` builtin correctly (cmd /c start "" url).
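A sketch of the shell-free launch. The macOS `open` and Linux `xdg-open` commands are assumptions (the source only names the win32 form), and `openCommand` is a hypothetical helper. On Windows, cmd.exe still parses its own command line, so a URL containing characters such as `&` would need extra care even with `execFile`.

```javascript
'use strict';
const { execFile } = require('child_process');

// Build [command, args] for opening a URL without going through a shell.
// On Windows, `start` is a cmd.exe builtin, so it runs via `cmd /c`;
// the empty string is start's window-title argument.
function openCommand(url, platform = process.platform) {
  if (platform === 'darwin') return ['open', [url]];
  if (platform === 'win32') return ['cmd', ['/c', 'start', '', url]];
  return ['xdg-open', [url]];
}

function openDashboard(url) {
  const [cmd, args] = openCommand(url);
  execFile(cmd, args, err => {
    if (err) console.error(`Could not open a browser; visit ${url} manually (${err.message})`);
  });
}
```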

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`--last 7x` parsed to null and was silently ignored, so the user got an
unfiltered result while believing a time filter applied. Exit 1 with a
message naming the valid forms (7d, 24h, 30m). Also removes the dead
parseArgs IIFE in the test (it read the source, regex-matched, then
returned null and was never used).
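A sketch of strict --last parsing; `parseLast` and `lastOrExit` are hypothetical names, and the message wording is illustrative.

```javascript
'use strict';

const UNIT_MS = { d: 86400000, h: 3600000, m: 60000 };

// Parse --last values like 7d / 24h / 30m into milliseconds; null otherwise.
function parseLast(value) {
  const m = /^(\d+)([dhm])$/.exec(String(value).trim());
  if (!m) return null;
  return Number(m[1]) * UNIT_MS[m[2]];
}

// Exit 1 on an invalid value instead of silently running unfiltered.
function lastOrExit(value) {
  const ms = parseLast(value);
  if (ms === null) {
    console.error(`Invalid --last "${value}". Use a number plus d, h, or m (e.g. 7d, 24h, 30m).`);
    process.exit(1);
  }
  return ms;
}
```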

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The hint ternary reported only --session when both --session and --cwd
were set (and never mentioned --last), so a combined no-match sent users
chasing the wrong filter. Collect all active filters and tailor the hint:
a single filter gets a targeted suggestion; multiple get "loosen one".
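A sketch of a hint built from every active filter; the helper name and hint wording are hypothetical.

```javascript
'use strict';

// Build the "no matching entries" hint from all active filters, not just one.
function emptyResultHint({ session, cwd, last } = {}) {
  const active = [];
  if (session) active.push(`--session ${session}`);
  if (cwd) active.push(`--cwd ${cwd}`);
  if (last) active.push(`--last ${last}`);
  if (active.length === 0) return 'No entries in the index yet.';
  if (active.length === 1) {
    const flag = active[0].split(' ')[0];
    const tips = {
      '--session': 'try latest, a title substring, or a UUID prefix',
      '--cwd': 'try a shorter directory-name substring',
      '--last': 'try a longer window, e.g. --last 30d',
    };
    return `No entries match ${active[0]}; ${tips[flag]}.`;
  }
  return `No entries match ${active.join(' + ')}; loosen one of these filters.`;
}
```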

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pin extractToolCalls to keep Skill/Workflow as plain keys (so re-adding
the Skill:<name> expansion that broke the dashboard would fail a test),
and cover extractSkillCalls (skill-name keying, ignores non-Skill tools
and Skill calls without a skill input).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reflect the final tool-call model in the normalization map and data
model: extractToolCalls counts by plain tool name (Skill/Workflow not
expanded), and the companion extractSkillCalls populates the separate
skillCalls index field that `ccxray usage` reads for per-skill stats.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The --cwd ~ test resolved against the real os.homedir(), pulling the
runner's home path (and username) into the test data. Set a throwaway
$HOME so ~ expands to a temp dir instead — the test no longer touches the
real home directory and contains only synthetic paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`latest` and `costliest` scanned the whole index before --last and --cwd
were applied, so `usage --session latest --cwd /work/foo` resolved the
global newest session and then filtered it out — reporting "no matching
entries" even when /work/foo had sessions. Move the --last and --cwd
filters ahead of session resolution so the aliases pick the newest /
priciest session within the filtered scope.

Tests: latest and costliest both scoped by --cwd. (Found by Codex second pass.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`e.cwd.startsWith(p)` let `--cwd /work/proj` also match a sibling
`/work/proj-sibling`, silently inflating project totals. Match the exact
dir or a real subtree (`e.cwd === pn || e.cwd.startsWith(pn + '/')`) after
trimming trailing slashes.

Test: /work/proj matches /work/proj but not /work/proj-sibling.
(Found by Codex second pass.)
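The path-bound predicate itself is short. This is a sketch with a hypothetical name (`cwdUnder`); the trimming and comparison mirror the expression quoted in the commit.

```javascript
'use strict';

// Match the exact directory or a real subtree only, never a sibling that
// merely shares the same string prefix. Trailing slashes are trimmed first.
function cwdUnder(cwd, dir) {
  const pn = dir.replace(/\/+$/, '');
  if (pn === '') return cwd.startsWith('/'); // "/" matches every absolute cwd
  return cwd === pn || cwd.startsWith(pn + '/');
}
```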

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an Anthropic test that drives a Skill tool_use through
buildEntryFields → buildIndexLine and asserts both the clean toolCalls
({Skill,Bash}) and the persisted skillCalls ({code-review}) survive the
INDEX_FIELDS projection. Catches a future drop of skillCalls from the
parser output or INDEX_FIELDS, which the helper-only test missed.
(Found by Codex second pass.)
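A sketch of what such a projection test guards against. Apart from toolCalls and skillCalls, the field list here is hypothetical, and this `buildIndexLine` is a stand-in, not the entry.js implementation.

```javascript
'use strict';

// Hypothetical whitelist of persisted fields (the real INDEX_FIELDS lives in entry.js).
const INDEX_FIELDS = ['id', 'sessionId', 'receivedAt', 'model', 'toolCalls', 'skillCalls'];

// Project an entry down to the persisted fields and serialize one NDJSON line.
function buildIndexLine(entry, fields = INDEX_FIELDS) {
  const row = {};
  for (const f of fields) if (entry[f] !== undefined) row[f] = entry[f];
  return JSON.stringify(row);
}
```

If skillCalls were ever dropped from the whitelist, the field would silently vanish from index.ndjson, which is exactly what an end-to-end assertion on the projected line catches.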

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The usage examples only demonstrated bare-name substring matching for
--cwd. Add a line (en/zh-TW/ja) showing that an absolute or ~ path does an
exact-subtree prefix match, so both modes are visible in the README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lis186 (Owner, Author) commented Jun 21, 2026

Review findings resolved + green CI + Codex second pass

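Summary: all 14 findings from the two review rounds, the failing CI, and the 3 net-new issues from the Codex second pass are fixed and pushed. The core change: Skill:<name> is no longer packed into the shared toolCalls contract; a separate skillCalls index field is used instead, and the dashboard is unchanged (a browser smoke test confirmed the chips render as clean Skill/Bash/Read, with no Skill: leakage and zero console errors). 940 tests green; Codex second pass round 2 is clean.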

Key architectural change (resolves 🔴 A1–A4)

Rather than expanding Skill/Workflow to Skill:<name> inside extractToolCalls (a shared writer feeding the dashboard and index.ndjson), extractToolCalls is reverted to a clean {toolName: count} map, and a dedicated extractSkillCalls → skillCalls index field carries the per-skill granularity that only ccxray usage reads. The dashboard is untouched (zero render-pipeline risk). Verified by browser smoke: tool chips render Skill/Bash/Read, no Skill: leak anywhere, zero console errors. (Workflow has no skill input, so its branch was dead and is dropped.)

Findings → fixes

| Finding (severity) | Commit |
| --- | --- |
| 🔴 extractToolCalls dashboard coupling (A1/A2/A3) + Workflow asymmetry (A4) | 10c7e3d |
| 🔴 analyze([]) NaN + gapVsCache crash on non-finite gap | 91ca52f |
| 🟡 --session latest resolved by write-order | 275ebbf |
| 🟡 analyze() not pure (fs read inside) | 6ced143 |
| 🟡 --json tools cap 10 vs documented top 7 | c65fa54 |
| 🟡 --cwd path-vs-substring heuristic (+ ~ never matched) | fe6367c |
| 🟡 whole-file read, no ceiling | 1213568 |
| 🟢 openDashboard shell string → execFile | a2184be |
| 🟢 invalid --last silently ignored + dead test scaffold | 18681f1 |
| 🟢 combined --session/--cwd empty-hint | 195a7f3 |
| 🟢 no test guarding the dashboard/skill contract | 6d0b375 |
| 🟢 stale normalization-map.md / data-model.md | 0f2a9b3 |

Failing CI

Root cause: the CLI e2e tests defaulted CCXRAY_HOME to the real ~/.ccxray, so they passed locally but failed in CI (no logs → exit 1, 11 failures). They now use an isolated synthetic CCXRAY_HOME fixture and pass with an empty $HOME. — cfe5c19 (+ hermetic ~ test 53c2c97)

Codex second pass

Round 1 surfaced 3 net-new issues, all fixed:

  • --session latest/costliest resolved before --last/--cwd → now resolved within the filtered scope (fcbb146)
  • absolute --cwd prefix was not path-bound: /work/proj matched /work/proj-sibling (fd2b763)
  • no test proving skillCalls survives the real buildEntryFields → buildIndexLine projection (db6abce)

Round 2: no substantive findings.

Follow-up docs (intentionally not in this PR)

Verification: npm test → 940 pass / 0 fail; browser smoke on an isolated server; full diff re-reviewed by Codex (clean).

lis186 merged commit 79817cd into main on Jun 21, 2026
2 checks passed
lis186 deleted the feat/usage-cli-90 branch on June 21, 2026 03:06
lis186 added a commit that referenced this pull request Jun 21, 2026
Record why toolCalls stays a plain {toolName:count} dashboard contract
and per-skill detail lives in a separate skillCalls index, so a future
"merge the two fields" change doesn't repeat the dashboard breakage from
PR #94. Also documents why skillCalls is structurally Anthropic-only.

Cross-linked from docs/normalization-map.md §5.

Closes #97

Co-authored-by: Justin Lee <justinlee@91app.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lis186 added a commit that referenced this pull request Jun 21, 2026
* docs(#96): document test hygiene convention

Tests must point CCXRAY_HOME at a throwaway temp dir with their own
synthetic index.ndjson and never read the real ~/.ccxray — the fallback
that made PR #94's usage e2e tests pass locally but fail in empty-home CI.

Adds docs/testing.md (4 rules, canonical pattern from usage.test.js, the
$HOME-vs-CCXRAY_HOME distinction incl. the puppeteer Chrome-cache caveat)
and a short Test Hygiene section + pointer in CLAUDE.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(#96): run tests against an empty CCXRAY_HOME

Adds a test step that points CCXRAY_HOME at a fresh empty dir under
$RUNNER_TEMP, as a backstop against the PR #94 failure class: a test that
reads the real ~/.ccxray now finds no logs and fails the build. $HOME is
left untouched so puppeteer's Chrome cache stays intact.

CCXRAY_HOME is set at the step (not job) level via the $RUNNER_TEMP shell
var — the runner context is not available in jobs.<id>.env.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Justin Lee <justinlee@91app.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>