
perf: cache DAG, batch top-k metadata, add full-pipeline recall benches#8

Open
JosephOIbrahim wants to merge 1 commit into master from claude/perf-measurement

Conversation


@JosephOIbrahim JosephOIbrahim commented May 10, 2026

Summary

Three performance and measurement improvements from the second-pass MoE review (the "PR C" lane from the strategic split). Independent of PR #6 and PR #7: no file overlap.

| Fix | File | What it does | Why it matters |
| --- | --- | --- | --- |
| Cache cognitive DAG | `src/mock_cogexec.py` | Build the DAG + topological order once at module load; `evaluate_dag` reads from `_DAG` / `_DAG_ORDER` instead of rebuilding | The DAG shape is a constant (7 nodes, 5 edges). Sessions evaluate this on every exchange (hundreds or more per session), so N rebuilds become 1. |
| Batch top-k metadata fetch | `python/harlo/encoder/__init__.py` | Replace the per-id `SELECT ... WHERE id = ?` + `fetchone()` loop in `semantic_recall` with a single `SELECT ... WHERE id IN (?,?,...)`; build a `{id: row}` map and read it in distance order | k+1 SQLite round-trips become 2 statements. Removes a measured N+1 (one SELECT for the full candidate scan, then one per top-k id). |
| Full-pipeline recall benches | `crates/hippocampus/benches/recall.rs` | Add three benches alongside the existing `xor_search` micro-bench: `load_all_sdrs 100k`, `recall_full 10k k=5`, `recall_full 100k k=5` | The existing bench measures only popcount + heap over pre-loaded candidates. The real recall path also does a SQLite scan + 25 MB blob read. If load dominates, optimising popcount with SIMD is wasted work. Measure before optimising. |

What's intentionally NOT in this PR

Test plan

  • cargo test -p hippocampus — 42 passed, 0 failed
  • cargo bench --no-run -p hippocampus — new benches compile clean
  • python3 -m py_compile on src/mock_cogexec.py and python/harlo/encoder/__init__.py — syntax clean
  • Adjacent regression sweep (tests/test_motor/ test_brainstem/ test_elenchus/ test_hippocampus/ test_composition/ test_inquiry/ test_sync/ test_hot_store/) — 207 passed, 7 skipped (pre-existing env skips for pxr / mcp), 0 failed
  • Cannot run encoder pytests in this sandbox — pre-existing env issue: python/harlo/encoder/semantic_encoder.py:12 imports numpy which isn't installed. The N+1 fix is syntax-clean and behaviourally equivalent (same ordering, same result keys, fewer round-trips); local CI with numpy can confirm.
  • To get a real recall measurement: `cargo bench -p hippocampus --bench recall -- --warm-up-time 1 --measurement-time 3 --sample-size 20` will produce numbers for `load_all_sdrs 100k` and `recall_full 100k k=5` — the headline diagnostics this PR exists to enable.

Compliance

The 33 inviolable rules in CLAUDE.md are unchanged. R1 / R7 are pure perf; R8 is a pure additive bench. Constitutional greps (`sleep(`, `while True`, `DELETE.*audit`, `float32`, `cosine`, `reasoning_trace` in `elenchus/verifier.py`) all return zero.

https://claude.ai/code/session_017arHKzx5mTUFiry7JhhRPs


Generated by Claude Code

Summary by CodeRabbit

  • Tests

    • Expanded benchmarking suite with new performance measurement functions for recall pipeline operations and data loading scenarios.
  • Refactor

    • Optimized database query performance through batch processing of trace metadata retrieval.
    • Improved computation efficiency via precomputed and cached execution graphs.

Review Change Stack

Three perf / measurement improvements from the second-pass MoE review.
None overlap with PR #6 or PR #7.

* src/mock_cogexec.py — cache the cognitive DAG at module load.  The
  DAG shape is a constant (7 nodes, 5 edges); previously evaluate_dag
  called build_dag() + topological_sort() on every exchange.  A
  typical session evaluates this on every exchange (≥ hundreds per
  session); module-level cache turns ~N rebuilds into 1.
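
  The caching pattern this bullet describes can be sketched as follows. The
  node names, edge set, and function bodies here are illustrative stand-ins,
  not the real `mock_cogexec` code; only the module-level `_DAG` /
  `_DAG_ORDER` shape mirrors the actual change.

  ```python
  def build_dag():
      # Constant-shape adjacency list: 7 nodes, 5 edges (names are made up).
      return {
          "sense": ["attend"], "attend": ["recall", "reason"],
          "recall": ["reason"], "reason": ["respond"],
          "respond": [], "decay": [], "audit": [],
      }

  def topological_sort(dag):
      # Depth-first post-order, reversed, yields a valid topological order.
      order, seen = [], set()

      def visit(node):
          if node in seen:
              return
          seen.add(node)
          for child in dag[node]:
              visit(child)
          order.append(node)

      for node in dag:
          visit(node)
      return order[::-1]

  # Built once at import time; evaluate_dag reuses these on every call
  # instead of rebuilding the graph and re-sorting per exchange.
  _DAG = build_dag()
  _DAG_ORDER = topological_sort(_DAG)

  def evaluate_dag(inputs):
      # Stand-in node evaluation; the point is the read from _DAG_ORDER.
      return {node: inputs.get(node) for node in _DAG_ORDER}
  ```

  Since the graph is immutable, moving construction to import time is safe;
  anything that mutated the DAG per call would need a different approach.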

* python/harlo/encoder/__init__.py — batch the top-k metadata fetch
  in semantic_recall.  Previously: load all candidate SDRs once
  (correct), then for EACH top-k id, execute a separate
  `SELECT ... WHERE id = ?` + fetchone().  Now: one
  `SELECT ... WHERE id IN (?,?,...)` builds a {id: row} map; the
  ordering loop reads from the map.  k+1 round-trips → 2 statements.
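
  A minimal sketch of the batching change described above. The table and
  column names (`traces`, `id`, `meta`) and the helper name are hypothetical,
  not the real encoder schema; the shape of the fix (one `IN (...)` query,
  `{id: row}` map, read back in distance order) is what it illustrates.

  ```python
  import sqlite3

  def fetch_topk_metadata(conn, topk_ids):
      # One batched statement replaces k separate SELECT ... WHERE id = ? calls.
      placeholders = ",".join("?" * len(topk_ids))
      rows = conn.execute(
          f"SELECT id, meta FROM traces WHERE id IN ({placeholders})",
          topk_ids,
      ).fetchall()
      by_id = {row_id: meta for row_id, meta in rows}
      # Read back in the caller's distance order so ranking is unchanged.
      return [(i, by_id[i]) for i in topk_ids if i in by_id]

  # Usage sketch with an in-memory database:
  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY, meta TEXT)")
  conn.executemany("INSERT INTO traces VALUES (?, ?)",
                   [(1, "a"), (2, "b"), (3, "c"), (4, "d")])
  print(fetch_topk_metadata(conn, [3, 1, 4]))  # [(3, 'c'), (1, 'a'), (4, 'd')]
  ```

  Because SQLite caps the number of host parameters per statement, very large
  k would need chunking; for top-k recall sizes this is a non-issue.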

* crates/hippocampus/benches/recall.rs — add three full-pipeline
  benches alongside the existing xor_search micro-benches:
  - `load_all_sdrs 100k` — SQLite scan + blob read
  - `recall_full 10k k=5` — end-to-end load + search + decay + result
  - `recall_full 100k k=5` — same at the constitutional 100K size
  The existing xor_search bench measures only popcount + heap over
  pre-loaded candidates; if SQLite I/O dominates the real recall
  path, optimising popcount with SIMD is wasted work.  Measure
  before optimising.

Verified:
- cargo test -p hippocampus — 42 passed
- cargo bench --no-run -p hippocampus — bench compiles clean
- 207 adjacent tests pass (motor/brainstem/elenchus/hippocampus/
  composition/inquiry/sync/hot_store); 7 skipped on pre-existing
  env issues (pxr/mcp not installed in sandbox)
- Encoder tests env-blocked by missing numpy; encoder/__init__.py
  syntax-clean via py_compile
- Constitutional greps still zero

https://claude.ai/code/session_017arHKzx5mTUFiry7JhhRPs

coderabbitai Bot commented May 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d098aa0c-0c7d-429e-8c37-defe0b55dd0a

📥 Commits

Reviewing files that changed from the base of the PR and between 490e3e0 and b509cfa.

📒 Files selected for processing (3)
  • crates/hippocampus/benches/recall.rs
  • python/harlo/encoder/__init__.py
  • src/mock_cogexec.py

📝 Walkthrough

Walkthrough

Three independent performance improvements: expanded end-to-end benchmarking for recall operations in Rust, batched SQLite metadata queries in semantic search, and precomputed DAG caching in cognitive execution evaluation.

Changes

Benchmark Suite Expansion

| Layer | File(s) | Summary |
| --- | --- | --- |
| Module Setup | `crates/hippocampus/benches/recall.rs` | Module documentation and imports expanded to support both micro-benchmarks (`xor_search`) and full-pipeline benchmarks (`load_all_sdrs`, `recall`). |
| Helper Infrastructure | `crates/hippocampus/benches/recall.rs` | Added a `build_db(n)` function to construct and populate in-memory SQLite databases with `TraceRecord` entries. |
| Benchmark Functions | `crates/hippocampus/benches/recall.rs` | Added `bench_load_all_sdrs_100k`, `bench_recall_full_10k`, and `bench_recall_full_100k` to measure full-pipeline performance across 10K and 100K trace databases. |
| Test Registration | `crates/hippocampus/benches/recall.rs` | Expanded `criterion_group!` to register the new `load_all_sdrs` and `recall_full` benchmarks alongside the existing micro-benchmarks. |

Trace Query Batching

| Layer | File(s) | Summary |
| --- | --- | --- |
| Batch Query Implementation | `python/harlo/encoder/__init__.py` | Optimized `semantic_recall` metadata retrieval from individual `SELECT ... WHERE id = ?` queries to a single batch query using an `IN (...)` clause, with dictionary-keyed assembly preserving distance-based ordering. |

DAG Evaluation Caching

| Layer | File(s) | Summary |
| --- | --- | --- |
| Cache Infrastructure | `src/mock_cogexec.py` | Added module-level private caches `_DAG` and `_DAG_ORDER` for the precomputed graph and topological evaluation order. |
| Function Update | `src/mock_cogexec.py` | Updated `evaluate_dag()` to reuse the cached `_DAG` and `_DAG_ORDER` instead of rebuilding the graph and topological sort on each invocation. |
| Import-Time Setup | `src/mock_cogexec.py` | Added a module initialization block that constructs `_DAG` from `build_dag()` and computes `_DAG_ORDER` via topological sort at import time. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • JosephOIbrahim/Harlo#3: Adds the initial xor_search micro-benchmark to the same recall.rs file; this PR expands that benchmark suite with full-pipeline measurements and supporting infrastructure.

Poem

🐰 Three swift hops through the code we made,
One batches queries, one caches cascades,
And benchmarks now measure from start to the end—
Performance and wisdom, bundled as friends! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped; CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately summarizes all three primary changes: DAG caching, batch metadata fetching, and new benchmark functions for the full recall pipeline. |
| Linked Issues Check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes Check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


