Krablante · Krablante · May 16, 2026 · May 16, 2026
diff --git a/README.md b/README.md
@@ -27,6 +27,10 @@ histories, that is the multi-GB failure mode `cdxusage` avoids by streaming
 JSONL files, indexing compact per-file summaries, and reusing a small local
 cache.
 
+On Linux hosts with a working `perl` and GNU-compatible `xargs -r`, cold full scans also
+use a native batch prefilter. Unsupported hosts and native helper failures fall
+back to the Node scanner automatically.
+
 No SQLite. No daemon. No provider catalogs. No background service.
 
 ## Quick Start
@@ -72,6 +76,7 @@ global install, `npx -y github:Krablante/cdxusage`, or
 - pretty terminal tables and JSON output
 - date filters, timezone, locale, sorting, compact tables
 - automatic Codex home discovery on Linux, macOS, Windows, and WSL
+- Linux/GNU native batch prefilter with Node fallback
 - OpenAI/Codex pricing only, with missing non-OpenAI model prices reported
 - offline pricing fallback and disposable local caches
 - portable folder build with Linux/macOS shell, Windows CMD, and PowerShell launchers
@@ -149,25 +154,26 @@ npm run benchmark -- --since 2026-05-01 --upstream-timeout 25 --cdxusage-timeout
 The helper resolves `@ccusage/codex@latest` and times the actual
 `ccusage-codex` binary so RAM reflects the scanner, not the `npx` wrapper.
 
-Recent local sanity check on a large Codex history:
+Recent local sanity check on a large Codex history, comparing this rollout with
+the previous public commit `1c084b4`:
 
 | Tool | Scenario | Time | RAM | Result |
 | --- | --- | ---: | ---: | --- |
-| `@ccusage/codex@18.0.11` | `--since 2026-05-01`, 45s limit | `>45.03s` | `2.38 GB` before timeout | timed out |
-| `cdxusage` | same filter, cold full scan | `31.61s` | `0.37 GB` | complete |
-| `cdxusage` | same filter, warm cached | `0.41s` | `0.16 GB` | complete |
+| `cdxusage` pre-native baseline | cold app cache | `27.80s` | `0.350 GB` | complete |
+| `cdxusage` native auto scanner | cold app cache | `10.46s` | `0.180 GB` | complete |
+| `cdxusage` pre-native baseline | warm app cache | `0.40s` | `0.166 GB` | complete |
+| `cdxusage` native auto scanner | warm app cache | `0.30s` | `0.143 GB` | complete |
 
 See [docs/benchmark-2026-05-16.md](docs/benchmark-2026-05-16.md) for the
 same-run command output behind this table.
 
 Cold scans read every matching JSONL file for correctness, including resumed
 long-lived sessions whose recent events may live in older session files. After
-the cache is built, the same report is dramatically faster: in this run, the
-warm cached path was at least 99.1% faster than the upstream timeout window.
+the cache is built, the same report completes in well under a second in this run.
 
-The timeout keeps the upstream run from reaching its worst failure mode. The
-upstream path reads and sorts a large archive-shaped set of token events in
-memory; `cdxusage` keeps memory bounded and predictable by avoiding that shape.
+The native prefilter reduces the volume of candidate-line bytes handed to Node
+for processing; this is not a claim of fewer physical disk reads. `bytesRead` in
+`--include-stats` remains the logical source byte count.
 
 ## Portable Folder
 
@@ -219,6 +225,8 @@ npm run check
 npm run lint
 npm run typecheck
 npm test
+npm run test:node
+npm run test:native
 npm run smoke
 npm run portable:smoke
 npm pack --dry-run --json

diff --git a/docs/benchmark-2026-05-16.md b/docs/benchmark-2026-05-16.md
@@ -1,33 +1,66 @@
 # Benchmark Evidence: 2026-05-16
 
-This is a local sanity check on a large Codex history. It is not a universal
-benchmark claim; run `npm run benchmark` on your own archive for local numbers.
+This is dated local evidence from a large Codex history plus static synthetic
+fixtures. It is not a universal benchmark claim; run `npm run benchmark` on
+your own archive for local numbers.
 
-The public benchmark helper resolves `@ccusage/codex@latest`, finds the actual
-`ccusage-codex` binary, and measures that process directly. This avoids
-underreporting RAM by timing only the `npx` wrapper.
+## Real Profile
 
-Command:
+Command shape:
 
 ```bash
-npm run benchmark -- --since 2026-05-01 --upstream-timeout 45 --cdxusage-timeout 90
+node ./bin/cdxusage.mjs daily \
+  --offline --since 2026-05-01 \
+  --include-stats --json
 ```
 
-Raw output:
+The comparison uses the previous public commit `1c084b4` as the baseline and
+the current native-auto scanner worktree as the selected run. Runs are
+application-cache-cold/page-cache-warm on an actively changing local archive, so
+use the synthetic fixtures below for strict accuracy checks.
 
-```text
-| Tool | Time | RAM | Result |
-| --- | ---: | ---: | --- |
-| cdxusage cold | 31.61s | 0.37 GB | complete |
-| cdxusage warm | 0.41s | 0.16 GB | complete |
-| @ccusage/codex@18.0.11 | >45.03s | 2.38 GB | timed out (124) |
-```
+| Tool | Scenario | Wall | CPU | RAM | Result |
+| --- | --- | ---: | ---: | ---: | --- |
+| `cdxusage` pre-native baseline | cold app cache | `27.80s` | `45.36s` | `0.350 GB` | complete |
+| `cdxusage` native auto scanner | cold app cache | `10.46s` | `16.82s` | `0.180 GB` | complete |
+| `cdxusage` pre-native baseline | warm app cache | `0.40s` | `0.57s` | `0.166 GB` | complete |
+| `cdxusage` native auto scanner | warm app cache | `0.30s` | `0.46s` | `0.143 GB` | complete |
+
+Cold native auto versus pre-native baseline:
+
+- 62.4% less wall time
+- 62.9% less CPU time
+- 48.5% less RAM
+- 4,219 JSONL files, about 9.24 GB of logical source bytes
+- about 0.65 GB of candidate bytes delivered into Node candidate-line processing
+- about 93.0% fewer bytes delivered into Node candidate-line processing than
+  the full-scan logical source byte count
+
+The live archive changed between sequential real runs, so token totals are not
+used as strict accuracy evidence here.
+
+## Synthetic Fixtures
+
+Static fixture run comparing public commit `1c084b4` with the native-auto
+worktree:
+
+| Scenario | Cold wall saved | Cold CPU saved | RAM saved | Accuracy |
+| --- | ---: | ---: | ---: | --- |
+| small | 18.2% | 8.3% | 1.1% | match |
+| medium | 42.5% | 28.3% | -1.2% | match |
+| large | 59.1% | 44.1% | 14.2% | match |
+| huge | 60.9% | 47.7% | 29.7% | match |
+| adversarial | 27.8% | 14.3% | 3.5% | match |
 
-Interpretation:
+## Notes
 
 - Cold `cdxusage` scans read the archive for correctness, including resumed
   sessions whose recent activity can live in older session files.
-- Warm `cdxusage` uses the compact file cache/index and completed at least
-  99.1% faster than the upstream timeout window in this run.
-- The upstream run was stopped at 45 seconds before completion; RAM shown is
-  the measured maximum before timeout, not a completed-run peak.
+- Warm `cdxusage` uses the compact file cache/index, so its run time should
+  normally be dominated by changed or appended files.
+- Native acceleration depends on Linux/GNU-compatible tools. Other platforms or
+  native failures use the Node scanner.
+- `nativeOutputBytes` is candidate byte volume delivered into Node processing;
+  `bytesRead` remains the logical source byte count.
+- The public benchmark helper still resolves and times `@ccusage/codex@latest`
+  directly when you want an upstream comparison on your own machine.
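Review note on the benchmark doc above: the cold-run percentages can be re-derived from the rounded values in the "Real Profile" table. The sketch below is only a worked check; `percentSaved` is a hypothetical helper and not part of `cdxusage`. Because the table values are already rounded, a recomputed figure can differ from the documented one in the last digit (the RAM figure recomputes to 48.6% rather than 48.5%).

```javascript
// Hypothetical helper (not in the repository): percent saved versus a baseline.
function percentSaved(baseline, selected) {
  return ((baseline - selected) / baseline) * 100;
}

// Rounded values copied from the cold app cache rows of the "Real Profile" table.
const cold = {
  wallSeconds: { baseline: 27.80, native: 10.46 },
  cpuSeconds: { baseline: 45.36, native: 16.82 },
  ramGb: { baseline: 0.350, native: 0.180 },
};

for (const [metric, { baseline, native }] of Object.entries(cold)) {
  console.log(`${metric}: ${percentSaved(baseline, native).toFixed(1)}% less`);
}

// Candidate bytes handed to Node versus logical source bytes (both approximate, in GB).
const candidateReduction = percentSaved(9.24, 0.65);
console.log(`candidate bytes: about ${candidateReduction.toFixed(1)}% less`);
```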
diff --git a/docs/compatibility.md b/docs/compatibility.md
@@ -53,6 +53,29 @@ public ccusage Codex guide.
 - `--sort <auto|date|month|lastActivity|tokens|cost|input|output|session|directory>`
 - `--order <asc|desc>`
 
+## Scanner Diagnostics
+
+Default scanner selection is `auto`: on Linux hosts with working `perl` and
+GNU-compatible `xargs -r`, `cdxusage` uses a native batch prefilter for cold
+full scans and falls back to the Node scanner when native tooling is
+unavailable or fails. Other platforms use the Node scanner unless explicitly
+forced for diagnostics. Tail reads and cached files keep the normal cache
+semantics.
+
+Internal diagnostic override:
+
+```bash
+CDXUSAGE_SCAN_MODE=node cdxusage daily
+CDXUSAGE_SCAN_MODE=grep-batch cdxusage daily
+```
+
+This is not an upstream compatibility surface. With `--include-stats`,
+`scannerModes` reports aggregate scanner counts, `linesSeen` is physical JSONL
+lines scanned, `candidateLinesSeen` is the subset containing `turn_context` or
+`token_count`, and `nativeOutputBytes` is candidate byte volume delivered into
+Node processing. Cache files and stats output can include absolute local paths,
+model names, token volumes, and estimated cost metadata.
+
 ## JSON Output
 
 `daily --json`:

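Review note on the "Scanner Diagnostics" section above: the documented selection rules can be summarized as a small decision function. This is a hypothetical sketch, not the code in `src/`; `chooseScanMode`, `scanWithFallback`, and their parameters are illustrative names. It also assumes that any value other than `node` or `grep-batch` means `auto`, and that a forced `grep-batch` run falls back to Node on failure, neither of which the docs state explicitly.

```javascript
// Hypothetical sketch of the documented rules; names do not mirror src/.
function chooseScanMode({ env = {}, platform, nativeToolsOk }) {
  const forced = env.CDXUSAGE_SCAN_MODE;
  if (forced === 'node' || forced === 'grep-batch') {
    return forced; // explicit diagnostic override
  }
  // auto (assumed default): native prefilter only on Linux with working
  // perl and GNU-compatible xargs -r; everything else uses the Node scanner.
  return platform === 'linux' && nativeToolsOk ? 'grep-batch' : 'node';
}

// Run the native prefilter when selected and fall back to the Node scanner
// if the native helper throws. nativeScan and nodeScan are caller-supplied
// async functions in this sketch.
async function scanWithFallback(mode, { nativeScan, nodeScan }) {
  if (mode === 'grep-batch') {
    try {
      return { mode, result: await nativeScan() };
    } catch {
      return { mode: 'node', result: await nodeScan() };
    }
  }
  return { mode: 'node', result: await nodeScan() };
}
```

In this sketch a caller would pass `process.env` and `process.platform`, plus the result of whatever tool probe the real scanner performs.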
diff --git a/package.json b/package.json
@@ -11,10 +11,12 @@
   },
   "scripts": {
     "start": "node ./bin/cdxusage.mjs",
-    "check": "node --check ./bin/cdxusage.mjs && node --check ./src/cli.mjs && node --check ./src/codex-home.mjs && node --check ./src/engine.mjs && node --check ./src/format.mjs && node --check ./src/pricing.mjs && node --check ./src/table.mjs && node --check ./scripts/build-portable.mjs && node --check ./scripts/benchmark-local.mjs && node --check ./test/smoke.mjs",
+    "check": "node --check ./bin/cdxusage.mjs && node --check ./src/cli.mjs && node --check ./src/codex-home.mjs && node --check ./src/discovery.mjs && node --check ./src/engine.mjs && node --check ./src/format.mjs && node --check ./src/pricing.mjs && node --check ./src/table.mjs && node --check ./scripts/build-portable.mjs && node --check ./scripts/benchmark-local.mjs && node --check ./test/smoke.mjs",
     "lint": "npm run check",
     "typecheck": "npm run check",
     "test": "node ./test/codex-home.test.mjs && node ./test/engine.test.mjs && node ./test/cli.test.mjs && node ./test/pricing.test.mjs",
+    "test:node": "CDXUSAGE_SCAN_MODE=node npm test",
+    "test:native": "CDXUSAGE_SCAN_MODE=grep-batch npm test",
     "smoke": "node ./test/smoke.mjs",
     "portable:build": "node ./scripts/build-portable.mjs",
     "portable:smoke": "npm run portable:build && sh ./portable/cdxusage --version && node -e \"const fs=require('fs'); for (const file of ['portable/LICENSE','portable/cdxusage.cmd','portable/cdxusage.ps1','portable/src/codex-home.mjs']) if (!fs.existsSync(file)) process.exit(1)\" && node ./test/smoke.mjs --portable",

diff --git a/src/cli.mjs b/src/cli.mjs
@@ -65,30 +65,36 @@ export async function main(argv = process.argv.slice(2), io = { stdout: process.
   }
   const locale = args.locale ?? DEFAULT_LOCALE;
   const mode = args.command === 'sessions' ? 'session' : args.command;
-  const dataPaths = await resolveCodexDataPaths({
-    codexHome: args.codexHome,
-    sessionsDir: args.sessionsDir,
-  });
-  const pricingMode = await resolvePricingMode(args, dataPaths.codexHome);
-  const report = await collectUsage({
-    dataPaths,
-    since,
-    until,
-    timezone,
-    cacheFile: args.cacheFile,
-    pricingCacheFile: args.pricingCacheFile,
-    pricingOffline: args.offline,
-    pricingTtlMs: args.pricingTtlHours != null ? Number(args.pricingTtlHours) * 60 * 60 * 1000 : undefined,
-    pricingFetchTimeoutMs: args.pricingFetchTimeoutMs != null ? Number(args.pricingFetchTimeoutMs) : undefined,
-    pricingTier: pricingMode.tier,
-    pricingPriorityModels: pricingMode.priorityModels,
-    maxCacheBytes: args.maxCacheBytes != null ? Number(args.maxCacheBytes) : undefined,
-    useCache: !args.noCache,
-    clearCache: args.clearCache,
-    saveCache: !args.noSaveCache,
-    discoveryMode: args.discovery,
-    includePricing: !args.noPricing,
-  });
+  let report;
+  try {
+    const dataPaths = await resolveCodexDataPaths({
+      codexHome: args.codexHome,
+      sessionsDir: args.sessionsDir,
+    });
+    const pricingMode = await resolvePricingMode(args, dataPaths.codexHome);
+    report = await collectUsage({
+      dataPaths,
+      since,
+      until,
+      timezone,
+      cacheFile: args.cacheFile,
+      pricingCacheFile: args.pricingCacheFile,
+      pricingOffline: args.offline,
+      pricingTtlMs: args.pricingTtlHours != null ? Number(args.pricingTtlHours) * 60 * 60 * 1000 : undefined,
+      pricingFetchTimeoutMs: args.pricingFetchTimeoutMs != null ? Number(args.pricingFetchTimeoutMs) : undefined,
+      pricingTier: pricingMode.tier,
+      pricingPriorityModels: pricingMode.priorityModels,
+      maxCacheBytes: args.maxCacheBytes != null ? Number(args.maxCacheBytes) : undefined,
+      useCache: !args.noCache,
+      clearCache: args.clearCache,
+      saveCache: !args.noSaveCache,
+      discoveryMode: args.discovery,
+      includePricing: !args.noPricing,
+    });
+  } catch (error) {
+    io.stderr.write(`${error?.message ?? String(error)}\n`);
+    return 1;
+  }
 
   const rows = rowsForMode(report, mode, { locale, sort: args.sort, order: args.order });
   if (args.json) {
@@ -248,6 +254,11 @@ export function parseArgs(argv) {
         if (token.startsWith('--')) {
           throw new Error(`Unknown option: ${token}`);
         }
+        if (COMMANDS.has(token) && !args.commandSpecified) {
+          args.command = token;
+          args.commandSpecified = true;
+          break;
+        }
         throw new Error(`Unknown command or argument: ${token}`);
     }
   }
@@ -309,11 +320,21 @@ function compareRows(a, b, sort, mode) {
 
 function compareDefault(a, b, mode) {
   if (mode === 'session') {
-    return String(a.lastActivity ?? '').localeCompare(String(b.lastActivity ?? ''));
+    return compareTimestamp(a.lastActivity, b.lastActivity);
   }
   return String(a.key ?? '').localeCompare(String(b.key ?? ''));
 }
 
+function compareTimestamp(left, right) {
+  return toSortableTimestamp(left).localeCompare(toSortableTimestamp(right));
+}
+
+function toSortableTimestamp(value) {
+  const text = String(value ?? '');
+  const ms = Date.parse(text);
+  return Number.isFinite(ms) ? new Date(ms).toISOString() : text;
+}
+
 function compareNumber(a, b) {
   return (Number(a) || 0) - (Number(b) || 0);
 }
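Review note on the `compareDefault` change above: comparing raw `lastActivity` strings misorders timestamps that carry different UTC offsets, which is what the normalization addresses. A minimal sketch; the two helper bodies are copied from the hunk, while the sample timestamps are made up for illustration.

```javascript
// Copied from the src/cli.mjs hunk in this diff.
function toSortableTimestamp(value) {
  const text = String(value ?? '');
  const ms = Date.parse(text);
  // Parseable values become UTC ISO strings; anything else keeps its raw text.
  return Number.isFinite(ms) ? new Date(ms).toISOString() : text;
}

function compareTimestamp(left, right) {
  return toSortableTimestamp(left).localeCompare(toSortableTimestamp(right));
}

// Made-up samples: the first is 23:00 UTC on May 15, the second is 23:30 UTC.
const withOffset = '2026-05-16T01:00:00+02:00';
const utc = '2026-05-15T23:30:00Z';

// Previous behavior: raw text comparison, so the "+02:00" value sorts later
// even though it is the earlier instant.
const rawOrder = String(withOffset).localeCompare(String(utc));

// New behavior: both values are normalized to UTC before comparing.
const normalizedOrder = compareTimestamp(withOffset, utc);

console.log({
  rawOrder: Math.sign(rawOrder),
  normalizedOrder: Math.sign(normalizedOrder),
});
```

Values that `Date.parse` cannot read, including missing ones, fall back to plain string comparison, so the previous ordering is preserved for them.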