
perf: cache DAG, batch top-k metadata, add full-pipeline recall benches#8

Open
JosephOIbrahim wants to merge 1 commit into master from claude/perf-measurement

Conversation


@JosephOIbrahim JosephOIbrahim commented May 10, 2026

Summary

Three performance and measurement improvements from the second-pass MoE review (the "PR C" lane from the strategic split). Independent of PR #6 and PR #7: no file overlap.

| Fix | File | What it does | Why it matters |
| --- | --- | --- | --- |
| Cache cognitive DAG | `src/mock_cogexec.py` | Build the DAG + topological order once at module load; `evaluate_dag` reads from `_DAG` / `_DAG_ORDER` instead of rebuilding | The DAG shape is a constant (7 nodes, 5 edges). Sessions evaluate this on every exchange (hundreds or more per session), so N rebuilds become 1. |
| Batch top-k metadata fetch | `python/harlo/encoder/__init__.py` | Replace the per-id `SELECT ... WHERE id = ?` + `fetchone()` loop in `semantic_recall` with a single `SELECT ... WHERE id IN (?,?,...)`; build a `{id: row}` map and read it in distance order | k+1 SQLite round-trips become 2 statements. Removes a measured N+1 (one SELECT for the full candidate scan, then one per top-k id). |
| Full-pipeline recall benches | `crates/hippocampus/benches/recall.rs` | Add three benches alongside the existing `xor_search` micro-bench: `load_all_sdrs 100k`, `recall_full 10k k=5`, `recall_full 100k k=5` | The existing bench measures only popcount + heap over pre-loaded candidates. The real recall path also does a SQLite scan + 25 MB blob read. If load dominates, optimising popcount with SIMD is wasted work. Measure before optimising. |

What's intentionally NOT in this PR

Test plan

  • cargo test -p hippocampus — 42 passed, 0 failed
  • cargo bench --no-run -p hippocampus — new benches compile clean
  • python3 -m py_compile on src/mock_cogexec.py and python/harlo/encoder/__init__.py — syntax clean
  • Adjacent regression sweep (tests/test_motor/ test_brainstem/ test_elenchus/ test_hippocampus/ test_composition/ test_inquiry/ test_sync/ test_hot_store/) — 207 passed, 7 skipped (pre-existing env skips for pxr / mcp), 0 failed
  • Cannot run encoder pytests in this sandbox — pre-existing env issue: python/harlo/encoder/semantic_encoder.py:12 imports numpy which isn't installed. The N+1 fix is syntax-clean and behaviourally equivalent (same ordering, same result keys, fewer round-trips); local CI with numpy can confirm.
  • To get a real recall measurement: `cargo bench -p hippocampus --bench recall -- --warm-up-time 1 --measurement-time 3 --sample-size 20` will produce numbers for `load_all_sdrs 100k` and `recall_full 100k k=5` — the headline diagnostics this PR exists to enable.

Compliance

The 33 inviolable rules in CLAUDE.md are unchanged. R1 / R7 are pure perf; R8 is a pure additive bench. Constitutional greps (`sleep(`, `while True`, `DELETE.*audit`, `float32`, `cosine`, `reasoning_trace` in `elenchus/verifier.py`) all return zero.

https://claude.ai/code/session_017arHKzx5mTUFiry7JhhRPs


Generated by Claude Code

Summary by CodeRabbit

  • Tests

    • Expanded benchmarking suite with new performance measurement functions for recall pipeline operations and data loading scenarios.
  • Refactor

    • Optimized database query performance through batch processing of trace metadata retrieval.
    • Improved computation efficiency via precomputed and cached execution graphs.

Review Change Stack

Three perf / measurement improvements from the second-pass MoE review.
None overlap with PR #6 or PR #7.

* src/mock_cogexec.py — cache the cognitive DAG at module load.  The
  DAG shape is a constant (7 nodes, 5 edges); previously evaluate_dag
  called build_dag() + topological_sort() on every exchange.  A
  typical session evaluates this on every exchange (≥ hundreds per
  session); module-level cache turns ~N rebuilds into 1.
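
  The caching pattern this bullet describes can be sketched as follows. The
  node names, edge set, and function bodies here are illustrative stand-ins,
  not the real `mock_cogexec` code; only the module-level `_DAG` /
  `_DAG_ORDER` shape mirrors the actual change.

  ```python
  def build_dag():
      # Constant-shape adjacency list: 7 nodes, 5 edges (names are made up).
      return {
          "sense": ["attend"], "attend": ["recall", "reason"],
          "recall": ["reason"], "reason": ["respond"],
          "respond": [], "decay": [], "audit": [],
      }

  def topological_sort(dag):
      # Depth-first post-order, reversed, yields a valid topological order.
      order, seen = [], set()

      def visit(node):
          if node in seen:
              return
          seen.add(node)
          for child in dag[node]:
              visit(child)
          order.append(node)

      for node in dag:
          visit(node)
      return order[::-1]

  # Built once at import time; evaluate_dag reuses these on every call
  # instead of rebuilding the graph and re-sorting per exchange.
  _DAG = build_dag()
  _DAG_ORDER = topological_sort(_DAG)

  def evaluate_dag(inputs):
      # Stand-in node evaluation; the point is the read from _DAG_ORDER.
      return {node: inputs.get(node) for node in _DAG_ORDER}
  ```

  Since the graph is immutable, moving construction to import time is safe;
  anything that mutated the DAG per call would need a different approach.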

* python/harlo/encoder/__init__.py — batch the top-k metadata fetch
  in semantic_recall.  Previously: load all candidate SDRs once
  (correct), then for EACH top-k id, execute a separate
  `SELECT ... WHERE id = ?` + fetchone().  Now: one
  `SELECT ... WHERE id IN (?,?,...)` builds a {id: row} map; the
  ordering loop reads from the map.  k+1 round-trips → 2 statements.
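
  A minimal sketch of the batching change described above. The table and
  column names (`traces`, `id`, `meta`) and the helper name are hypothetical,
  not the real encoder schema; the shape of the fix (one `IN (...)` query,
  `{id: row}` map, read back in distance order) is what it illustrates.

  ```python
  import sqlite3

  def fetch_topk_metadata(conn, topk_ids):
      # One batched statement replaces k separate SELECT ... WHERE id = ? calls.
      placeholders = ",".join("?" * len(topk_ids))
      rows = conn.execute(
          f"SELECT id, meta FROM traces WHERE id IN ({placeholders})",
          topk_ids,
      ).fetchall()
      by_id = {row_id: meta for row_id, meta in rows}
      # Read back in the caller's distance order so ranking is unchanged.
      return [(i, by_id[i]) for i in topk_ids if i in by_id]

  # Usage sketch with an in-memory database:
  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY, meta TEXT)")
  conn.executemany("INSERT INTO traces VALUES (?, ?)",
                   [(1, "a"), (2, "b"), (3, "c"), (4, "d")])
  print(fetch_topk_metadata(conn, [3, 1, 4]))  # [(3, 'c'), (1, 'a'), (4, 'd')]
  ```

  Because SQLite caps the number of host parameters per statement, very large
  k would need chunking; for top-k recall sizes this is a non-issue.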

* crates/hippocampus/benches/recall.rs — add three full-pipeline
  benches alongside the existing xor_search micro-benches:
  - `load_all_sdrs 100k` — SQLite scan + blob read
  - `recall_full 10k k=5` — end-to-end load + search + decay + result
  - `recall_full 100k k=5` — same at the constitutional 100K size
  The existing xor_search bench measures only popcount + heap over
  pre-loaded candidates; if SQLite I/O dominates the real recall
  path, optimising popcount with SIMD is wasted work.  Measure
  before optimising.

Verified:
- cargo test -p hippocampus — 42 passed
- cargo bench --no-run -p hippocampus — bench compiles clean
- 207 adjacent tests pass (motor/brainstem/elenchus/hippocampus/
  composition/inquiry/sync/hot_store); 7 skipped on pre-existing
  env issues (pxr/mcp not installed in sandbox)
- Encoder tests env-blocked by missing numpy; encoder/__init__.py
  syntax-clean via py_compile
- Constitutional greps still zero

https://claude.ai/code/session_017arHKzx5mTUFiry7JhhRPs

coderabbitai Bot commented May 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d098aa0c-0c7d-429e-8c37-defe0b55dd0a

📥 Commits

Reviewing files that changed from the base of the PR and between 490e3e0 and b509cfa.

📒 Files selected for processing (3)
  • crates/hippocampus/benches/recall.rs
  • python/harlo/encoder/__init__.py
  • src/mock_cogexec.py

📝 Walkthrough

Walkthrough

Three independent performance improvements: expanded end-to-end benchmarking for recall operations in Rust, batched SQLite metadata queries in semantic search, and precomputed DAG caching in cognitive execution evaluation.

Changes

Benchmark Suite Expansion

| Layer | File(s) | Summary |
| --- | --- | --- |
| Module Setup | `crates/hippocampus/benches/recall.rs` | Module documentation and imports expanded to support both micro-benchmarks (`xor_search`) and full-pipeline benchmarks (`load_all_sdrs`, `recall`). |
| Helper Infrastructure | `crates/hippocampus/benches/recall.rs` | Added a `build_db(n)` function to construct and populate in-memory SQLite databases with `TraceRecord` entries. |
| Benchmark Functions | `crates/hippocampus/benches/recall.rs` | Added `bench_load_all_sdrs_100k`, `bench_recall_full_10k`, and `bench_recall_full_100k` to measure full-pipeline performance across 10K and 100K trace databases. |
| Test Registration | `crates/hippocampus/benches/recall.rs` | Expanded `criterion_group!` to register the new `load_all_sdrs` and `recall_full` benchmarks alongside the existing micro-benchmarks. |

Trace Query Batching

| Layer | File(s) | Summary |
| --- | --- | --- |
| Batch Query Implementation | `python/harlo/encoder/__init__.py` | Optimized `semantic_recall` metadata retrieval from individual `SELECT ... WHERE id = ?` queries to a single batch query using an `IN (...)` clause, with dictionary-keyed assembly preserving distance-based ordering. |

DAG Evaluation Caching

| Layer | File(s) | Summary |
| --- | --- | --- |
| Cache Infrastructure | `src/mock_cogexec.py` | Added module-level private caches `_DAG` and `_DAG_ORDER` for the precomputed graph and topological evaluation order. |
| Function Update | `src/mock_cogexec.py` | Updated `evaluate_dag()` to reuse the cached `_DAG` and `_DAG_ORDER` instead of rebuilding the graph and topological sort on each invocation. |
| Import-Time Setup | `src/mock_cogexec.py` | Added a module initialization block that constructs `_DAG` from `build_dag()` and computes `_DAG_ORDER` via topological sort at import time. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • JosephOIbrahim/Harlo#3: Adds the initial xor_search micro-benchmark to the same recall.rs file; this PR expands that benchmark suite with full-pipeline measurements and supporting infrastructure.

Poem

🐰 Three swift hops through the code we made,
One batches queries, one caches cascades,
And benchmarks now measure from start to the end—
Performance and wisdom, bundled as friends! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped; CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately summarizes all three primary changes: DAG caching, batch metadata fetching, and new benchmark functions for the full recall pipeline. |
| Linked Issues Check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes Check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


