feat(#90): ccxray usage CLI — automated data-driven analysis#94
Conversation
Pure CLI command that reads index.ndjson directly (no server needed, 0.6s). Sections: meta, sessions (with top 10 costliest), models, tools, skills (with scope detection), prompt hash stability, cache hit rates by inter-turn gap, and project comparison. Smart filters: - --session: aliases (latest/costliest), title substring, UUID prefix - --cwd: directory name substring, multi-cwd comparison table - --last: duration filter (7d/24h/30m) - --json: agent-consumable output (<4KB) - --tools: full tool breakdown Also: expand Skill/Workflow tool calls to Skill:<name> in extractToolCalls for per-skill tracking in future data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add --open: resolve a single session via --session, then open the dashboard to it. Reuses hub.readHubLock() for the port. Simplify pass (4 parallel reviewers): - latest resolver: O(1) last-entry instead of O(n) reduce (entries are append-ordered) - multi-cwd comparison reuses analyze() instead of re-deriving aggregates - sort session turns once before hashStability + gapVsCache - openDashboard uses hub.readHubLock() instead of hand-parsing hub.json - drop redundant title slice and misleading glob alias Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Self-review findings (sorted by severity)The substantive risk concentrates in the 5-line 🔴 High —
|
Codex second-pass review — net-new findingsCodex confirmed the 🔴 🔴 High —
|
…ontract
The PR expanded Skill/Workflow tool calls to `Skill:<name>` inside
extractToolCalls, a shared writer whose output flows to the dashboard
(SSE → miller-columns/entry-rendering) and into index.ndjson. That
polluted the toolCalls contract: `tc['Skill']` skill-detection broke,
usedTools inflated toolUtilization, and tool chips rendered `Skill:x`.
Fix at the source instead of patching every consumer: revert
extractToolCalls to a clean {toolName: count} map and add a dedicated
extractSkillCalls → skillCalls index field that only `ccxray usage`
reads. The dashboard is untouched (zero render-pipeline risk).
- helpers.js: extractToolCalls no longer expands; new extractSkillCalls
counts only the model-initiated `Skill` tool_use (the `Workflow` tool
has no `skill` input, so the old Workflow branch was dead — dropped).
- anthropic.js + entry.js: wire skillCalls through buildEntryFields and
INDEX_FIELDS (rebuild-index inherits it via buildEntryFields).
- usage.js: read per-skill detail from skillCalls; legacy entries
without the field still surface as (pre-tracking) with no double-count.
Resolves review findings A1/A2/A3 (dashboard coupling) and A4
(Workflow asymmetry) at once.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ture
The CLI tests defaulted CCXRAY_HOME to the runner's real ~/.ccxray, so
they only passed where ambient logs happened to exist. CI has no logs →
`usage` exits 1 ("no logs found") and 11 assertions failed. This also
contradicted the PR's "no hardcoded user paths" claim (it concatenated
$HOME/.ccxray and used process.env.HOME as a --cwd prefix).
Write a small synthetic index.ndjson into a temp CCXRAY_HOME at module
load and point cli()/cliErr() at it; clean up on process exit. The
fixture has two /work/* sessions and one /other/* session (so a `/work`
prefix is a strict subset), a subagent turn, varied models/tools/cache,
and a skillCalls entry — exercising the new field end-to-end in CI too.
Verified: full usage suite is green with HOME pointed at an empty dir.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two correctness bugs in the exported analyze() (callable directly and per multi-cwd group, so the run()-level empty guard doesn't protect it): - subagentRatio divided by entries.length with no guard → analyze([]) returned NaN. Guard like the sibling failRate does. - gapVsCache computed gapSec from receivedAt/elapsed; a missing or non-numeric receivedAt makes gapSec NaN, which matches no bucket — not even max:Infinity (NaN < Infinity is false) — so the bucket lookup threw. (elapsed was already guarded by `parseFloat || 0`; receivedAt was the real source.) Replace `gapSec < 0` with `!(gapSec >= 0)` to skip NaN too. Tests: analyze([]) zeroed-result and a non-finite-gap case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A simplify pass changed `latest` to entries[entries.length - 1] for O(1) lookup, but index.ndjson lines are append-order and can land out of sequence under hub concurrency or startup restoration — so `latest` could silently return a stale session. Restore the receivedAt reduce. Test writes an index whose newest-by-receivedAt entry is the FIRST line (last line is older) so file-order resolution picks the wrong session. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing fs
buildSkillScopeMap() (readdirSync over skill/plugin trees) ran inside
analyze(), making the exported function environment-dependent — tests
picked up whatever skills the runner had installed — and, in multi-cwd
mode, re-scanning the same plugin dirs once per project group.
Build the map once in run() and pass it via opts.scopeMap; analyze()
defaults to {} so direct/test callers are deterministic. The multi-cwd
comparison path doesn't pass a map (its output has no scope column), so
it no longer scans the filesystem at all.
Tests: scope is null without a scopeMap, and resolves when one is injected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
analyze() capped tools.top at 10 while printHuman re-sliced to 7 and --help/README document "default: top 7", so --json consumers silently got 10. Cap once in analyze() at 7 (—tools lifts it to all for both JSON and human); printHuman now iterates the already-capped list. Drops the now-unused opts param from printHuman and its call site. Test pins the contract: 7 by default, all with --tools. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The path-vs-substring heuristic keyed on startsWith('/')||startsWith('~'),
but stored cwds are absolute so a literal `~/…` prefix never matched
(latent bug), and relative values like `./foo` fell into the substring
branch carrying the `./`. Expand `~`/`~/` to home before prefix matching,
strip a leading `./` before substring matching, and document the rule in
--help (absolute/~ = prefix, bare name = case-insensitive substring).
Tests: bare-name substring, ./-stripping, and ~-expansion prefix match.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
readFileSync + split holds the whole index.ndjson in memory. That's the right tradeoff for a 0.6s one-shot CLI today (tens of MB in practice), so rather than prematurely rewrite to streaming, name the ceiling and the upgrade path (a readline streaming pass, which would make run() async) in a ponytail comment for when the index grows unbounded. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
openDashboard built `${cmd} ${JSON.stringify(url)}` and ran it through
exec (a shell). The url is a hash/UUID-derived, encoded session query so
injection risk was low, but execFile with an args array removes the shell
entirely. Handles the win32 `start` builtin correctly (cmd /c start "" url).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`--last 7x` parsed to null and was silently ignored, so the user got an unfiltered result while believing a time filter applied. Exit 1 with a message naming the valid forms (7d, 24h, 30m). Also removes the dead parseArgs IIFE in the test (it read the source, regex-matched, then returned null and was never used). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The hint ternary reported only --session when both --session and --cwd were set (and never mentioned --last), so a combined no-match sent users chasing the wrong filter. Collect all active filters and tailor the hint: a single filter gets a targeted suggestion; multiple get "loosen one". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pin extractToolCalls to keep Skill/Workflow as plain keys (so re-adding the Skill:<name> expansion that broke the dashboard would fail a test), and cover extractSkillCalls (skill-name keying, ignores non-Skill tools and Skill calls without a skill input). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reflect the final tool-call model in the normalization map and data model: extractToolCalls counts by plain tool name (Skill/Workflow not expanded), and the companion extractSkillCalls populates the separate skillCalls index field that `ccxray usage` reads for per-skill stats. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The --cwd ~ test resolved against the real os.homedir(), pulling the runner's home path (and username) into the test data. Set a throwaway $HOME so ~ expands to a temp dir instead — the test no longer touches the real home directory and contains only synthetic paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`latest` and `costliest` scanned the whole index before --last and --cwd were applied, so `usage --session latest --cwd /work/foo` resolved the global newest session and then filtered it out — reporting "no matching entries" even when /work/foo had sessions. Move the --last and --cwd filters ahead of session resolution so the aliases pick the newest / priciest session within the filtered scope. Tests: latest and costliest both scoped by --cwd. (Found by Codex second pass.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`e.cwd.startsWith(p)` let `--cwd /work/proj` also match a sibling `/work/proj-sibling`, silently inflating project totals. Match the exact dir or a real subtree (`e.cwd === pn || e.cwd.startsWith(pn + '/')`) after trimming trailing slashes. Test: /work/proj matches /work/proj but not /work/proj-sibling. (Found by Codex second pass.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an Anthropic test that drives a Skill tool_use through
buildEntryFields → buildIndexLine and asserts both the clean toolCalls
({Skill,Bash}) and the persisted skillCalls ({code-review}) survive the
INDEX_FIELDS projection. Catches a future drop of skillCalls from the
parser output or INDEX_FIELDS, which the helper-only test missed.
(Found by Codex second pass.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The usage examples only demonstrated bare-name substring matching for --cwd. Add a line (en/zh-TW/ja) showing that an absolute or ~ path does an exact-subtree prefix match, so both modes are visible in the README. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review findings resolved + green CI + Codex second pass繁中:兩輪 review 的 14 項 + 失敗的 CI + Codex 二審新發現的 3 項,已全部修復並推上來。核心改動: Key architectural change (resolves 🔴 A1–A4)Rather than expand Findings → fixes
Failing CIRoot cause: the CLI e2e tests defaulted Codex second passRound 1 surfaced 3 net-new issues, all fixed:
Round 2: no substantive findings. Follow-up docs (intentionally not in this PR)
Verification: |
Record why toolCalls stays a plain {toolName:count} dashboard contract
and per-skill detail lives in a separate skillCalls index, so a future
"merge the two fields" change doesn't repeat the dashboard breakage from
PR #94. Also documents why skillCalls is structurally Anthropic-only.
Cross-linked from docs/normalization-map.md §5.
Closes #97
Co-authored-by: Justin Lee <justinlee@91app.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(#96): document test hygiene convention Tests must point CCXRAY_HOME at a throwaway temp dir with their own synthetic index.ndjson and never read the real ~/.ccxray — the fallback that made PR #94's usage e2e tests pass locally but fail in empty-home CI. Adds docs/testing.md (4 rules, canonical pattern from usage.test.js, the $HOME-vs-CCXRAY_HOME distinction incl. the puppeteer Chrome-cache caveat) and a short Test Hygiene section + pointer in CLAUDE.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci(#96): run tests against an empty CCXRAY_HOME Adds a test step that points CCXRAY_HOME at a fresh empty dir under $RUNNER_TEMP, as a backstop against the PR #94 failure class: a test that reads the real ~/.ccxray now finds no logs and fails the build. $HOME is left untouched so puppeteer's Chrome cache stays intact. CCXRAY_HOME is set at the step (not job) level via the $RUNNER_TEMP shell var — the runner context is not available in jobs.<id>.env. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Justin Lee <justinlee@91app.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
摘要
新增
ccxray usageCLI 命令 — 直接讀取index.ndjson,0.6 秒內產出使用量分析,不需啟動 server。把 #89 手動跑 Python 腳本的分析能力變成一個零成本的指令,讓 agent 和開發者快速做資料驅動的決策。所有過濾器都走「現有 flag 變聰明」的設計(Do-What-I-Mean),使用者不需學新概念:
--session接受latest/costliest別名、標題子字串、UUID 前綴--cwd接受目錄名子字串,多個值自動切換成專案比較表--open從分析結果一鍵跳到 dashboard 的該 session設計過程用多專家模擬(fzf / ripgrep / clig.dev / gh CLI 作者的心智模型)+ 加權評分機制(只接受 9 分以上方案)逐一驗證每個 UX 決策。
Detail
What it does
ccxray usagereads~/.ccxray/logs/index.ndjsondirectly and prints aggregated analysis. Human-readable by default,--jsonfor agents (<4KB, deterministic, idempotent).Sections:
--toolsfor full list)Filters (all composable)
Implementation notes
server/index.jsbefore any serverrequire— keeps it at 0.6s with no server bootextractToolCallsinhelpers.jsnow expandsSkill/Workflowtool calls toSkill:<name>for per-skill tracking (new data only; old data shows as(pre-tracking))Tests
29 unit + CLI e2e tests in
test/usage.test.js. No hardcoded user paths.What's documented
docs/wire-protocol-reference.mdunaffected. README updated in all three languages (en/zh-TW/ja).🤖 Generated with Claude Code