feat(cli): opt-in result cache for basilisk check + honest warm/cold benchmark #75

Merged
MelbourneDeveloper merged 1 commit into main from feat/cli-result-cache on Jun 4, 2026


@MelbourneDeveloper
Collaborator

TLDR

Add an opt-in, soundness-first on-disk result cache to basilisk check (--cache) that serves results for unchanged files without re-checking them. The cache stays byte-for-byte inert unless the flag is passed.

What Was Added?

  • --cache, --cache-dir, --cache-stats flags on basilisk check (all default-off). A hit is returned only when the target bytes, every file the check read, the effective config, the resolution environment, and the checker version are all unchanged.
  • basilisk-db::cache (CheckCache, Fingerprint) — on-disk entry store that re-verifies every recorded input on lookup before returning a hit.
  • basilisk-common::fs::read_tracked, plus a thread-local ReadRecorder that captures the exact per-check read-set. Both are std-only, so basilisk-common still compiles to the Zed wasm32-wasip2 target.
  • basilisk-checker::CachedDiagnostic — a serde projection of Diagnostic for persistence, with a static-code interner.
  • basilisk-cli::cache_check — CLI glue translating the flags into a lookup/store wrapped around the cold check, plus hit/miss stats.
  • Benchmark harness: new fixtures (e0011_explicit_any.py, e0018_undefined_name.py) and a reworked benchmarks/run.sh that reports an honest warm-vs-cold comparison.
  • Spec/plan: docs/specs/CHECKER-CACHE-SPEC.md (the CHKCACHE spec group), docs/plans/CHECKER-CACHE-PLAN.md, and a docs/INDEX.md entry.
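A thread-local read recorder of this kind can be sketched with std alone. The names read_tracked and ReadRecorder come from this PR. Their signatures, the BTreeMap read-set, the canonical_key fallback and the FNV-1a content_hash are assumptions made for illustration, not basilisk's actual code:

```rust
// Hypothetical sketch of a std-only read recorder; not basilisk's actual API.
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

thread_local! {
    // Some(read-set) while a recorder is active on this thread, None otherwise.
    static READ_SET: RefCell<Option<BTreeMap<PathBuf, u64>>> = RefCell::new(None);
}

/// Stable 64-bit FNV-1a content hash (assumed; the PR does not name the hash).
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x100_0000_01b3);
    }
    h
}

/// Canonicalise the path for use as a key; fall back to the path as given.
pub fn canonical_key(path: &Path) -> PathBuf {
    fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf())
}

/// Same result as fs::read_to_string; also records (key, hash) if a recorder is active.
pub fn read_tracked(path: &Path) -> io::Result<String> {
    let content = fs::read_to_string(path)?; // read errors propagate unchanged
    READ_SET.with(|cell| {
        if let Some(set) = cell.borrow_mut().as_mut() {
            set.insert(canonical_key(path), content_hash(content.as_bytes()));
        }
    });
    Ok(content)
}

/// RAII guard: start() installs an empty read-set, finish() hands it back,
/// and Drop clears the thread-local so an abandoned recorder leaves no state.
pub struct ReadRecorder {
    _private: (),
}

impl ReadRecorder {
    pub fn start() -> Self {
        READ_SET.with(|cell| *cell.borrow_mut() = Some(BTreeMap::new()));
        ReadRecorder { _private: () }
    }

    pub fn finish(self) -> BTreeMap<PathBuf, u64> {
        READ_SET.with(|cell| cell.borrow_mut().take()).unwrap_or_default()
    }
}

impl Drop for ReadRecorder {
    fn drop(&mut self) {
        READ_SET.with(|cell| *cell.borrow_mut() = None);
    }
}
```

With no recorder installed, read_tracked costs one thread-local lookup on top of the plain read. The Drop impl is what lets a recorder that is dropped without finish() leave the thread clean.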

What Was Changed or Deleted?

  • basilisk-parser::parse_file and basilisk-stubs::parse_pyi_file now read via read_tracked instead of std::fs::read_to_string. When no recorder is active, this is byte-for-byte identical to the old call (CHKCACHE-READSET-FS). The only extra work is a single thread-local check that sees None and does nothing.
  • Added serde::Serialize/Deserialize derives to Severity, Span, TypeProvenance, and the basilisk-config types so results can be persisted. This is purely additive, with no change to runtime behaviour.
  • The basilisk-cli check pipeline threads CacheOptions/CacheStats through and short-circuits straight to the original cold check when --cache is absent.
  • No behaviour change for the LSP or any non-cached CLI run: ReadRecorder::start() is only ever called behind the --cache gate.
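The --cache gate can be pictured as a thin wrapper around the cold check. In this hedged sketch, CacheOptions and CacheStats are named in the PR, but their fields are assumptions. The check_with_cache function and the in-memory HashMap standing in for the on-disk store are hypothetical. A real hit would also re-verify the recorded read-set before returning.

```rust
// Hypothetical sketch of the --cache gate; field and function names are assumptions.
use std::collections::HashMap;

pub struct CacheOptions {
    pub enabled: bool, // set by --cache
}

#[derive(Debug, Default, PartialEq)]
pub struct CacheStats {
    pub hits: u32,
    pub misses: u32,
}

/// In-memory stand-in for the on-disk store: cache key -> rendered diagnostics.
pub type Store = HashMap<String, Vec<String>>;

pub fn check_with_cache<F>(
    key: &str,
    opts: &CacheOptions,
    store: &mut Store,
    stats: &mut CacheStats,
    cold_check: F,
) -> Vec<String>
where
    F: FnOnce() -> Vec<String>,
{
    if !opts.enabled {
        // No --cache: run the original cold check and touch nothing else.
        return cold_check();
    }
    if let Some(hit) = store.get(key) {
        // A real lookup would re-verify the recorded read-set before trusting this.
        stats.hits += 1;
        return hit.clone();
    }
    stats.misses += 1;
    let diagnostics = cold_check();
    store.insert(key.to_owned(), diagnostics.clone());
    diagnostics
}
```

The property that matters is the first branch. Without --cache, the wrapper returns straight from the cold check, so no store is opened and nothing is written. That is the behaviour disabled_creates_no_cache_dir asserts end-to-end.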

How Do The Automated Tests Prove It Works?

The correctness contract (CHKCACHE-CONTRACT) is proven end-to-end and at unit level:

  • CLI e2e (crates/basilisk-cli/tests/e2e_cache.rs): second_run_hits_with_identical_output (a hit reproduces byte-identical diagnostics), editing_target_invalidates, editing_dependency_invalidates, changing_config_invalidates (each fingerprint input forces a miss), and disabled_creates_no_cache_dir — proving the feature is inert without --cache (no cache directory is even created).
  • Cache store (crates/basilisk-db/tests/cache_tests.rs): roundtrip_hit_returns_payload, missing_entry_is_a_miss, corrupt_entry_is_a_miss, fingerprint_mismatches_are_misses, changed_dependency_is_a_miss, missing_dependency_is_a_miss, unserialisable_payload_errors.
  • Read-set recorder (crates/basilisk-common/tests/fs_tests.rs): read_without_recorder_returns_content_and_records_nothing (zero behaviour change when inactive), recorder_captures_reads_with_canonical_key_and_hash, dropping_recorder_without_finish_clears_state, read_error_propagates, canonical_key_falls_back_for_missing_path, content_hash_is_deterministic_and_distinguishes.
  • Diagnostic projection (crates/basilisk-checker/tests/cached_tests.rs): projection_round_trips_through_serde_preserving_every_field, optional_fields_round_trip_as_none, interner_reuses_static_storage_for_repeated_codes.

Coverage gates all hold (basilisk-db 100%, basilisk-cli 89% ≥ 88%, basilisk-checker 94% ≥ 93%). The mutation score is unchanged at 90.79% (83 mutants, identical to the baseline, because the new code adds no mutants in the rules/ scope). PEP conformance is unchanged.

Spec / Doc Changes

  • New docs/specs/CHECKER-CACHE-SPEC.md defining the CHKCACHE spec group: the correctness contract, soundness argument, read-set capture, fingerprint composition, on-disk entry format, documented v1 limits, and how this positions against the Salsa endgame.
  • New docs/plans/CHECKER-CACHE-PLAN.md; docs/INDEX.md updated.
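One way to compose a fingerprint from the inputs the contract lists is to hash them as length-prefixed fields with a fixed algorithm. This sketch assumes FNV-1a and string-serialised config and environment inputs; the PR does not say which hash or encoding CHKCACHE uses. The transitively-read files are deliberately left out here, because the entry store re-verifies them individually on lookup. A fixed algorithm matters for an on-disk cache: std's DefaultHasher is documented as unspecified across Rust releases, so using it would silently turn every stored entry into a miss after a toolchain upgrade.

```rust
// Hypothetical fingerprint composition; hash choice and field encoding are assumptions.

/// 64-bit FNV-1a, used here because its output does not depend on the Rust release.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }

    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100_0000_01b3);
        }
    }

    /// Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    fn field(&mut self, bytes: &[u8]) {
        self.write(&(bytes.len() as u64).to_le_bytes());
        self.write(bytes);
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Fingerprint(pub u64);

pub fn fingerprint(
    target: &[u8],
    effective_config: &str,
    resolution_env: &str,
    checker_version: &str,
) -> Fingerprint {
    let mut h = Fnv1a::new();
    h.field(target);
    h.field(effective_config.as_bytes());
    h.field(resolution_env.as_bytes());
    h.field(checker_version.as_bytes());
    Fingerprint(h.0)
}
```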

Breaking Changes

  • None — the cache is strictly opt-in (--cache); all existing behaviour is byte-for-byte unchanged when the flag is absent.

🤖 Generated with Claude Code

Two linked changes.

1. Benchmark — measure warm AND cold for every tool, honestly.
   The mypy timings measured cache hits, not type-checking: hyperfine's warmup
   populated .mypy_cache, so every measured run was "file unchanged ->
   do nothing" (~150ms of interpreter+typeshed load), which is why mypy
   looked flat and fast. Now each tool has a COLD column (full check) and a
   -warm column (repeat run using that tool's own cache):
     * basilisk-warm = --cache result-cache hit
     * mypy-warm     = incremental .mypy_cache hit (cold = --no-incremental)
     * pyright/ty/pyrefly keep NO cross-run result cache (verified: first-
       ever run == repeat run, zero cache artifacts), so warm ~= cold.
   New fixtures e0011 (explicit Any) and e0018 (undefined name). Website
   renders the columns generically (no template change).

2. Opt-in CLI result cache (`basilisk check --cache`) so basilisk has a
   real warm/cold story and warm is detectable for the benchmark.

   Correctness contract [CHKCACHE-CONTRACT]: a hit is returned ONLY when
   the target, every transitively-read source/stub, the effective config,
   and the checker version are all byte-identical to when the entry was
   written. Any doubt -> miss. Faster, never wrong.
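The "any doubt -> miss" rule maps naturally onto Option short-circuiting. This is a hedged sketch of the lookup side. It assumes a hypothetical Entry that stores the fingerprint, the recorded read-set as (path, hash) pairs, and a serialised payload. content_hash is an FNV-1a stand-in, not basilisk's confirmed hash.

```rust
// Hypothetical entry shape and lookup; not basilisk-db's actual types.
use std::fs;
use std::path::PathBuf;

/// Stand-in FNV-1a content hash (the real hash is not specified in the PR).
fn content_hash(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0xcbf2_9ce4_8422_2325, |h, &b| {
        (h ^ u64::from(b)).wrapping_mul(0x100_0000_01b3)
    })
}

pub struct Entry {
    pub fingerprint: u64,
    pub read_set: Vec<(PathBuf, u64)>, // every file the check read, hashed at store time
    pub payload: String,               // serialised diagnostics
}

/// Returns the payload only if nothing has changed; any doubt is a miss.
pub fn lookup(entry: Option<&Entry>, current_fingerprint: u64) -> Option<&str> {
    let entry = entry?; // no entry (or one that failed to deserialise): miss
    if entry.fingerprint != current_fingerprint {
        return None; // target, config, environment or version changed: miss
    }
    for (path, recorded) in &entry.read_set {
        let bytes = fs::read(path).ok()?; // dependency missing or unreadable: miss
        if content_hash(&bytes) != *recorded {
            return None; // dependency edited: miss
        }
    }
    Some(entry.payload.as_str())
}
```

Every failure path returns None rather than an error. This keeps the trade-off in the contract: a spurious miss only costs a cold check, but a spurious hit would return wrong diagnostics.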

   - basilisk-common::fs: thread-local read-recorder; parse_file and
     parse_pyi_file route through it so the exact read-set is captured.
   - basilisk-db::cache: generic content-addressed cache; entries store the
     read-set and re-verify every file on lookup [CHKCACHE-ENTRY].
   - basilisk-checker::CachedDiagnostic: serde projection of Diagnostic with
     a bounded code interner (ErrorCode holds &'static str) [CHKCACHE-DIAG].
   - CLI: --cache / --cache-dir / --cache-stats; off by default.
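ErrorCode holds a &'static str, so deserialising a cached code needs a way back to static storage. This is a minimal sketch of a bounded interner. It assumes the interner leaks each distinct code string once and then reuses it; intern_code is a hypothetical name. The leak is bounded by the number of distinct error codes, not by the number of cached diagnostics.

```rust
// Hypothetical code interner; the PR only states that one exists and is bounded.
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

fn interner() -> &'static Mutex<HashSet<&'static str>> {
    static SET: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();
    SET.get_or_init(|| Mutex::new(HashSet::new()))
}

/// Map a deserialised code string to a &'static str, leaking each distinct code at most once.
pub fn intern_code(code: &str) -> &'static str {
    let mut set = interner().lock().unwrap();
    if let Some(&existing) = set.get(code) {
        return existing; // seen before: reuse the same static storage
    }
    let leaked: &'static str = Box::leak(code.to_owned().into_boxed_str());
    set.insert(leaked);
    leaked
}
```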

   Warm is 3x faster (115ms -> 38ms on the biggest fixture); a miss adds
   ~zero overhead. Editing a dependency invalidates the importer, proven by
   e2e + crate-boundary tests.
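The headline speedup follows directly from the two reported timings for the biggest fixture:

```math
\text{speedup} = \frac{t_{\text{cold}}}{t_{\text{warm}}} = \frac{115\ \text{ms}}{38\ \text{ms}} \approx 3.03,
\qquad
t_{\text{cold}} - t_{\text{warm}} = 77\ \text{ms saved per run}
```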

   Positioning [CHKCACHE-POSITIONING]: for interactive use, a result cache
   needs watcher-driven, dependency-aware (smart) invalidation; v1 has neither
   (it re-verifies lazily on the next lookup), so it helps batch re-runs,
   not the interactive editor. The proper long-term mechanism is the Salsa
   migration (basilisk-db Phase 2), which gives watcher-driven sub-file
   invalidation for free and subsumes this hand-rolled read-set. v1 is a
   small, correct stepping stone.

Spec docs/specs/CHECKER-CACHE-SPEC.md, plan docs/plans/CHECKER-CACHE-PLAN.md.
All CI jobs green locally; coverage thresholds met (db 100%); mutation scope
unchanged.
MelbourneDeveloper merged commit 5bb028b into main on Jun 4, 2026
12 checks passed
MelbourneDeveloper deleted the feat/cli-result-cache branch on June 4, 2026 at 12:16