Add multi-agent review harness (Opus 4.7, MoE roles, 5-loop) by JosephOIbrahim · Pull Request #2 · JosephOIbrahim/Moneta

JosephOIbrahim · 2026-05-04T02:31:08Z

A long-running orchestrator that spawns role-scoped reviewers mirroring
MONETA.md §6 plus an Adversarial-Reviewer (Commandment 7), governed by a
bespoke constitution sourced verbatim from the locked artifacts.

docs/review-constitution.md: composition of MONETA.md §6/§7/§9,
ARCHITECTURE.md §15.2/§15.6, CLAUDE.md hard rules,
agent-commandments.md, and substrate-conventions.md. No new doctrine —
every clause cites a locked artifact. Hashed and stamped into every
finding; mid-run drift aborts.
scripts/review_roles.py: 7 role-reviewers + Adversarial; 5 themed loops
(spec-conformance → concurrency-atomicity → correctness-edges →
performance-envelope → simplicity-doc-tests); touchpoint glob matching
for role isolation.
scripts/review_synthesizer.py: schema validation, role-isolation
filter, fuzzy dedupe, severity+§9-aware ranking, cross-loop closure
ledger, per-loop markdown rendering.
scripts/review_harness.py: parallel fan-out per loop, JSONL
checkpointing, --resume, --dry-run, --max-loops, prompt caching on
the constitution + repo snapshot, three-retry cap per Commandment 3.
pyproject.toml: [review] optional extra (anthropic>=0.40); not pulled
into Phase 1/3 runtime.
.gitignore: per-round files and findings.jsonl ignored;
synthesis.md re-included.
CLAUDE.md: one-line pointer in the build-and-test section.

Verification: ruff clean, 94 existing tests pass, dry-run loop 1
assembles 8 prompts (constitution+role ~25KB system, snapshot ~521KB
user; constitution hash 3b56e3ffae083c9b), JSONL round-trip works,
schema validator rejects malformed findings, snapshot excludes
harness/constitution to prevent feedback loops.

The live 5-loop run is intentionally not executed in this commit; user
runs it separately after reviewing the harness, the constitution, and
the smoke-test output.

https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL

Summary by CodeRabbit

New Features
- Multi-agent review harness with configurable reviewer roles/themes, checkpointed resumption, per-loop and cross-loop synthesis.
Documentation
- Added review constitution, harness usage docs, synthesis/round brief, and updated API handle-based examples and lifecycle notes.
Bug Fixes / Reliability
- Fixed concurrent protected-deposit race, atomic snapshot/durability window, per-batch consolidation semantics, attention clamping, and improved partial-flush error reporting.
Tests
- New unit and integration tests covering concurrency, durability, consolidation, decay call-sites, ECS clamping, and vector index behaviors.

A long-running orchestrator that spawns role-scoped reviewers mirroring MONETA.md §6 plus an Adversarial-Reviewer (Commandment 7), governed by a bespoke constitution sourced verbatim from the locked artifacts. - docs/review-constitution.md: composition of MONETA.md §6/§7/§9, ARCHITECTURE.md §15.2/§15.6, CLAUDE.md hard rules, agent-commandments.md, and substrate-conventions.md. No new doctrine — every clause cites a locked artifact. Hashed and stamped into every finding; mid-run drift aborts. - scripts/review_roles.py: 7 role-reviewers + Adversarial; 5 themed loops (spec-conformance → concurrency-atomicity → correctness-edges → performance-envelope → simplicity-doc-tests); touchpoint glob matching for role isolation. - scripts/review_synthesizer.py: schema validation, role-isolation filter, fuzzy dedupe, severity+§9-aware ranking, cross-loop closure ledger, per-loop markdown rendering. - scripts/review_harness.py: parallel fan-out per loop, JSONL checkpointing, --resume, --dry-run, --max-loops, prompt caching on the constitution + repo snapshot, three-retry cap per Commandment 3. - pyproject.toml: [review] optional extra (anthropic>=0.40); not pulled into Phase 1/3 runtime. - .gitignore: per-round files and findings.jsonl ignored; synthesis.md re-included. - CLAUDE.md: one-line pointer in the build-and-test section. Verification: ruff clean, 94 existing tests pass, dry-run loop 1 assembles 8 prompts (constitution+role ~25KB system, snapshot ~521KB user; constitution hash 3b56e3ffae083c9b), JSONL round-trip works, schema validator rejects malformed findings, snapshot excludes harness/constitution to prevent feedback loops. The live 5-loop run is intentionally not executed in this commit; user runs it separately after reviewing the harness, the constitution, and the smoke-test output. https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL

Adds an alternate model-call backend that shells out to `claude -p`, making the harness runnable in environments that have an authenticated Claude Code CLI but no ANTHROPIC_API_KEY. The SDK backend is unchanged and still preferred when the env var is set. - New --backend {auto,sdk,cli} flag; auto picks sdk if ANTHROPIC_API_KEY is set, else cli. - _call_via_cli pipes the user prompt over stdin (snapshot ~520KB exceeds argv limits), passes the system prompt via --system-prompt, and parses claude's JSON output. - Retry budget (3 attempts, exponential backoff) is shared across both backends and continues to honor Commandment 3. - Smoke-tested end to end: SMOKE-OK round-trip via _call_via_cli. - .gitignore: docs/review/run.log added (ephemeral tee target). https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL

Final cross-loop rollup from scripts/review_harness.py against this branch. 5 themed loops, 8 reviewers each (7 role + adversarial), constitution hash 3b56e3ffae083c9b. Stats: - 494 findings retained (all loops); 466 open after closures - 17 Critical, 100 High, 8 §9 escalation candidates - Per-loop kept: 67 / 91 / 128 / 91 / 117 - Per-loop invalid: 0 / 0 / 0 / 35 / 0 (loop 4 schema drift) - Per-loop role-isolation drops: 0 / 0 / 0 / 2 / 25 - 0 duplicates across all loops §9 escalation candidates surfaced (Architect-routed): - four-op signature drift vs ARCHITECTURE.md §2 (module-level vs bound method) - protected-floor consolidation contradiction (MONETA.md §2.5 vs ARCHITECTURE.md §6 selection criteria) - vector-authoritative inversion across hydrate path (§7 keystone) - v1.1.0 handle exclusivity model absent from ARCHITECTURE.md - multi-handle protected-quota aggregation bypasses §10 hard cap - quota_override field unbounded vs §10 hard cap Top Critical implementation findings: - WAL truncation race (DurabilityManager.snapshot_ecs) - Attention log drain stale-buffer race + module docstring overpromise - UsdTarget.flush() partial-failure breaks sequential-write atomicity - Stage-partial-batch atomicity break in ConsolidationRunner.run_pass - signal_attention bypasses protected_floor lower bound - Protected-quota count-then-add TOCTOU - docs/api.md still documents pre-v1.1.0 module singleton API Cross-loop memory worked: adversarial reviewer correctly refuted §9 framing of attention-log race in loop 2 (real bug, but ARCHITECTURE.md §5.1 'eventually consistent' permits the loss window — fix is impl correction + docstring softening, not a Round 4 trigger). The constitution and per-round artifacts are gitignored; docs/review/synthesis.md is the durable record. https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL

coderabbitai · 2026-05-04T02:31:47Z

Warning

Rate limit exceeded

@JosephOIbrahim has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 44 minutes and 52 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f86197b-9e97-4185-9017-d652b4054a94

📥 Commits

Reviewing files that changed from the base of the PR and between b7c8223 and 3fbdeb9.

📒 Files selected for processing (8)

ARCHITECTURE.md
CLAUDE.md
MONETA.md
docs/review-constitution.md
docs/round4-brief.md
docs/rounds/round-4.md
src/moneta/api.py
tests/unit/test_api.py

📝 Walkthrough

Walkthrough

Adds a multi-agent review harness (scripts + docs + CI ignores + optional dependency) and a review synthesizer/roles system; plus multiple runtime correctness changes in moneta (deposit concurrency lock, durability snapshot atomicity, per-batch consolidation state, usd_target partial-failure reporting, ECS clamping, attention_log doc), with corresponding tests and documentation updates.

Changes

Multi-agent review harness (docs + CLI + synthesis + config)

Layer / File(s)	Summary
Interface / Extras `pyproject.toml`	Adds `project.optional-dependencies.review` with `anthropic>=0.40`.
Repo filtering / git ignore `.gitignore`	Ignores `docs/review/*` artifacts while unignoring `docs/review/synthesis.md`.
Constitution / Policy `docs/review-constitution.md`	Adds locked constitution describing reviewer rules, strict JSON schema, escalation/closure rules, loop themes, role isolation, and output discipline.
User docs / Runbook `CLAUDE.md`, `docs/round4-brief.md`	Documents the harness, install/run examples, and Round 4 escalation brief derived from synthesis.
Roles & Config `scripts/review_roles.py`	New Role/Theme dataclasses, role touchpoint globs, theme mapping, scope utils, and ranking constants.
Harness CLI / Orchestration `scripts/review_harness.py`	Adds CLI to build deterministic repo snapshot, hash constitution, run N loops, call model backends (Anthropic SDK or `claude` CLI) with retries, extract/stamp findings, adversarial pass, checkpointing, and output files under `docs/review/`.
Synthesis / Validation `scripts/review_synthesizer.py`	Validates findings against constitution schema, enforces role-isolation, deduplicates, ranks, records closures, renders per-loop and final markdown, and provides JSONL helpers.

sequenceDiagram
  participant CLI as review_harness CLI
  participant Snapshot as RepoSnapshot
  participant Const as Constitution (docs)
  participant Model as Model Backend\n(Anthropic SDK / claude CLI)
  participant Synth as Synthesizer
  participant FS as Filesystem (`docs/review/*`)

  CLI->>Snapshot: load_repo_snapshot()
  CLI->>Const: read & hash constitution
  CLI->>Model: assemble_reviewer_prompt() -> call_reviewer()
  Model-->>CLI: response (fenced JSON)
  CLI->>Synth: extract_findings_array() + stamp_constitution_hash()
  CLI->>Model: call adversarial reviewer (uses role findings)
  CLI->>Synth: synthesize_loop(raw_findings)
  Synth-->>FS: write round markdown + findings.jsonl
  CLI->>FS: save_checkpoint(), final synthesis.md (on completion)

Core system correctness changes, errors, and tests (moneta runtime & tests)

Layer / File(s)	Summary
Concurrency primitive `src/moneta/api.py`	Adds `self._deposit_lock` and moves protected-quota count+add into a locked critical section in `Moneta.deposit`.
Durability atomicity `src/moneta/durability.py`, `tests/integration/test_durability_roundtrip.py`	Holds `_lock` across snapshot timestamping, write+fsync+atomic replace, WAL truncation; adds integration test exercising concurrent WAL appends during snapshot.
Per-batch consolidation `src/moneta/consolidation.py`, `tests/integration/test_sequential_writer_ordering.py`	Marks entities `CONSOLIDATED` immediately after each successful batch commit; adds tests for partial-batch success vs full success and minor type annotation update.
ECS utility clamping `src/moneta/ecs.py`, `tests/unit/test_ecs.py`	`apply_attention` now clamps utility upper to 1.0 then lower to `protected_floor`; adds unit test verifying clamp at protected floor.
UsdTarget partial-failure reporting `src/moneta/usd_target.py`	Adds `FlushPartialFailureError` and per-layer try/except in `flush()` to report saved/failed/pending layers and raise on first failure.
Attention log docs `src/moneta/attention_log.py`	Rewrites module docstring to clarify possible small-loss race window around buffer swap.
API docs handle transition `docs/api.md`	Rewrites API docs to a Moneta handle model (`with Moneta(...) as m:`), documents lifecycle errors and per-batch consolidation semantics.
Vector index and other unit tests `tests/unit/test_vector_index.py`, `tests/unit/test_decay.py`, `tests/unit/test_api.py`	Adds comprehensive vector index tests, decay invocation scan tests, protected-deposit concurrency test, and other test updates to reflect runtime changes.
Integration wiring various tests	Multiple new/updated unit and integration tests across the codebase aligning with the above behavior changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

"I hopped through diffs at break of dawn,
I stitched the rules where codes were drawn.
Five loops, many voices sing—🐇
I stamped the law, then nudged a spring.
Files saved, tests passed, the branch goes on."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 49.41% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Add multi-agent review harness (Opus 4.7, MoE roles, 5-loop)' clearly and specifically describes the main addition in this changeset—a new multi-agent review orchestration system with specific implementation details (Claude Opus 4.7, mixture-of-experts roles, 5-iteration loops).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/multi-agent-codebase-review-A8WHg

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (3)

scripts/review_harness.py (1)
340-346: 💤 Low value

Consider adding check=False explicitly for clarity.

The subprocess.run call relies on manual returncode checking (line 347). While this works, explicit check=False documents the intent that you're handling errors manually.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/review_harness.py` around lines 340 - 346, Add an explicit
check=False to the subprocess.run invocation so it is clear errors are handled
manually; update the subprocess.run call that assigns to completed (which passes
cmd and input=bundle.user_prompt) to include check=False to document intent
while preserving existing manual returncode checking and timeout behavior.
scripts/review_roles.py (1)
187-187: 💤 Low value

Optional: Use unpacking syntax per Ruff RUF005.

Ruff suggests (*ROLE_REVIEWERS, ADVERSARIAL) over tuple concatenation. This is purely stylistic — both are correct.
Proposed fix
-ALL_ROLES: tuple[Role, ...] = ROLE_REVIEWERS + (ADVERSARIAL,)
+ALL_ROLES: tuple[Role, ...] = (*ROLE_REVIEWERS, ADVERSARIAL)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/review_roles.py` at line 187, Replace the tuple concatenation used to
build ALL_ROLES with unpacking syntax to satisfy Ruff RUF005: change the
expression that defines ALL_ROLES (currently using ROLE_REVIEWERS +
(ADVERSARIAL,)) to use (*ROLE_REVIEWERS, ADVERSARIAL) so ALL_ROLES remains a
tuple of Role while following the recommended style; ensure the name ALL_ROLES
and the referenced symbols ROLE_REVIEWERS and ADVERSARIAL are used exactly as in
the current definition.
docs/review-constitution.md (1)
125-137: 💤 Low value

Add language specifier to fenced code block.

The fenced code block at line 125 lacks a language specifier, triggering markdownlint MD040. Since this is verbatim text from CLAUDE.md, use ```text to silence the lint warning while preserving intent.
Proposed fix
-```
+```text
 - Do not re-open Round 2 or Round 3 decisions. Escalate per MONETA.md §9.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/review-constitution.md` around lines 125 - 137, The fenced code block
that begins with the line "- Do not re-open Round 2 or Round 3 decisions.
Escalate per MONETA.md §9." is missing a language specifier and triggers
markdownlint MD040; update the opening fence so it specifies the text language
(i.e., change the opening triple-backtick to include "text") for the code block
containing the list so the linter is satisfied while preserving the verbatim
CLAUDE.md content.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/review-constitution.md`:
- Around line 125-137: The fenced code block that begins with the line "- Do not
re-open Round 2 or Round 3 decisions. Escalate per MONETA.md §9." is missing a
language specifier and triggers markdownlint MD040; update the opening fence so
it specifies the text language (i.e., change the opening triple-backtick to
include "text") for the code block containing the list so the linter is
satisfied while preserving the verbatim CLAUDE.md content.

In `@scripts/review_harness.py`:
- Around line 340-346: Add an explicit check=False to the subprocess.run
invocation so it is clear errors are handled manually; update the subprocess.run
call that assigns to completed (which passes cmd and input=bundle.user_prompt)
to include check=False to document intent while preserving existing manual
returncode checking and timeout behavior.

In `@scripts/review_roles.py`:
- Line 187: Replace the tuple concatenation used to build ALL_ROLES with
unpacking syntax to satisfy Ruff RUF005: change the expression that defines
ALL_ROLES (currently using ROLE_REVIEWERS + (ADVERSARIAL,)) to use
(*ROLE_REVIEWERS, ADVERSARIAL) so ALL_ROLES remains a tuple of Role while
following the recommended style; ensure the name ALL_ROLES and the referenced
symbols ROLE_REVIEWERS and ADVERSARIAL are used exactly as in the current
definition.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dd23c72b-506d-4c57-9f63-3ae86b9ce6b8

📥 Commits

Reviewing files that changed from the base of the PR and between 9a46293 and 6350c2c.

📒 Files selected for processing (8)

.gitignore
CLAUDE.md
docs/review-constitution.md
docs/review/synthesis.md
pyproject.toml
scripts/review_harness.py
scripts/review_roles.py
scripts/review_synthesizer.py

Implementation pass against the v1.0.0 multi-agent review synthesis committed in 6350c2c. Fixes the Critical impl bugs that did NOT need §9 escalation; routes the 6 spec-level surprises to docs/round4-brief.md for the Architect. Implementation fixes: - ecs.py apply_attention: clamp utility lower bound against protected_floor so negative-weight signals cannot defeat §10 protection (substrate-L3-signal-attention-bypasses-protected-floor). - api.py deposit: per-handle threading.Lock around count_protected + ecs.add so concurrent protected deposits at quota-1 cannot both succeed (substrate-L2-protected-quota-count-then-add-toctou). - durability.py snapshot_ecs: hold _lock across timestamp capture, JSON dump, and WAL unlink — closes the race window where wal_appends between `now` capture and unlink were silently truncated (persistence-L1-wal-truncation-race). - usd_target.py flush: new FlushPartialFailureError carrying saved/failed/pending layer bookkeeping so SequentialWriter can refuse to advance the vector index when only some layers Saved (usd-L2-flush-partial-failure-breaks-sequential-write-atomicity). - consolidation.py run_pass: hoist STAGED_FOR_SYNC → CONSOLIDATED transition INSIDE the batch loop so a later-batch failure does not pin earlier successfully-committed batches in STAGED_FOR_SYNC forever (consolidation-L2-stage-partial-batch-atomicity-break). - attention_log.py module docstring: soften "Entries cannot be lost" to match ARCHITECTURE.md §5.1 "lock-free, eventually consistent, simplest failure mode" — adversarial-L2 correctly classified the drain race as an impl bug + docstring fix, NOT §9. - docs/api.md: full rewrite to v1.1.0+ handle pattern (collapses 5 separate findings into one pass; replaces moneta.init() / module- level free functions with `with Moneta(MonetaConfig.ephemeral())`). New tests (21): - tests/unit/test_decay.py: AST scan asserting exactly three decay evaluation points (api.py, attention_log.py, consolidation.py) per ARCHITECTURE.md §16; adding a fourth site is §9 Trigger 2. - tests/unit/test_vector_index.py: new file, 16 tests covering §7.1 dim-homogeneity hard-raise, §5.1 silent-no-op for missing entities, query zero-norm guards, snapshot/restore round-trip. - tests/unit/test_ecs.py: floor-clamp regression for apply_attention. - tests/unit/test_api.py: 20-thread concurrent stress for the deposit lock, asserts exactly one of N succeeds at quota-1. - tests/integration/test_durability_roundtrip.py: appender threads hammering wal_append while main thread runs many snapshot_ecs calls; hydrate succeeds, no IndexError, no silent corruption. - tests/integration/test_sequential_writer_ordering.py: per-batch ECS state preservation under mid-batch commit_staging failure. New docs: - docs/round4-brief.md: the 6 genuine §9 Trigger 2 candidates the review surfaced (four-op signature drift, protected-floor consolidation contradiction, vector-authoritative inversion across hydrate, handle exclusivity model absent from ARCHITECTURE.md, quota_override unbounded vs §10 cap, multi-handle aggregation). Architect-routed; no implementation proceeds on affected subsystems until the round closes. Verification: 117 passed, 4 pxr-skipped (was 94 baseline — 23 new tests). All prior tests still green. Ruff clean on modified files. https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/moneta/ecs.py (1)
191-217: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Docstring formula is stale after the protected-floor clamp.

The method docstring at line 193 still advertises Utility = min(1.0, Utility + summed_weight), but the new code also clamps to protected_floor (lines 214-216). With negative summed_weight the floor is now load-bearing, and the docstring contract should reflect it so callers reading the spec line don't misjudge the lower bound.
📝 Proposed docstring fix
         `agg` maps entity_id to `(summed_weight, signal_count)`, produced by
         `attention_log.aggregate`. Per ARCHITECTURE.md §5:
 
-            Utility       = min(1.0, Utility + summed_weight)
+            Utility       = max(protected_floor,
+                                min(1.0, Utility + summed_weight))
             AttendedCount = AttendedCount + signal_count
             LastEvaluated = now
As per coding guidelines: "Do not let documentation lag implementation by more than one PR. Documentarian maintains inline docstrings…".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/moneta/ecs.py` around lines 191 - 217, The docstring for the method that
reduces attention_log.aggregate still states "Utility = min(1.0, Utility +
summed_weight)" but the implementation also clamps the result to
self._protected_floor (via self._protected_floor[row] and new_u < floor logic);
update the docstring to reflect the full contract—Utility = max(protected_floor,
min(1.0, Utility + summed_weight)) (and note that AttendedCount adds
signal_count and missing entities are skipped) so readers see the
protected-floor lower bound; reference attention_log.aggregate,
summed_weight/count from agg, and the fields self._utility and
self._protected_floor in the description.

🧹 Nitpick comments (1)

tests/unit/test_decay.py (1)
188-209: ⚡ Quick win

Make the §16 gate recursive — glob("*.py") skips subpackages.

The docstring claims it walks "every module under src/moneta/", but moneta_root.glob("*.py") is non-recursive and only enumerates top-level files. The package is flat today, but the moment a future PR introduces a subpackage that calls decay_all/decay_value (the exact §9 Trigger 2 scenario this test is meant to catch), the gate will pass silently — the conformance assertion would then be a false positive.

Switching to rglob (with the same name-skip set) keeps the gate honest as the package grows, at zero readability cost.
♻️ Proposed fix
         moneta_root = Path(moneta.__file__).parent
         sites: list[tuple[str, int, str]] = []
-        for py in sorted(moneta_root.glob("*.py")):
+        for py in sorted(moneta_root.rglob("*.py")):
             if py.name in {"ecs.py", "decay.py", "__init__.py"}:
                 continue
Note: if subpackages later host decay implementation files of their own, the name in {...} skip-list will need to grow accordingly, or you can switch to a path-suffix check.
As per coding guidelines: "ARCHITECTURE.md is the locked spec... If the blueprint and spec disagree on a clause, escalate per MONETA.md §9." A silent false-pass here would let a §9 Trigger 2 ship.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_decay.py` around lines 188 - 209, The test scanning code uses
moneta_root.glob("*.py") which is non-recursive; change that call to
moneta_root.rglob("*.py") in the module-scanning block (the code that collects
sites using the moneta_root variable and appends to sites) so it visits
subpackages as well, keeping the existing skip-check for py.name in
{"ecs.py","decay.py","__init__.py"} unchanged; also leave a comment or adjust
the skip strategy (e.g., path-suffix check) if you expect decay implementation
files to appear in subpackages later so the exclusion logic remains correct.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/moneta/usd_target.py`:
- Around line 422-467: The flush logic in UsdTarget currently catches
BaseException and ignores boolean return values from Sdf.Layer.Save(), so
failures that return False are treated as successes; change the loop over
self._layers and the final self._root_layer.Save() handling to (1) call Save()
and if it returns False immediately raise FlushPartialFailureError with the same
saved_layers/failed_layer/pending_layers semantics (use failed_layer=name for
layer loop and "<root>" for root), and (2) narrow exception handlers from except
BaseException to except Exception to still convert raised exceptions (e.g.
pxr.Tf.ErrorException) into FlushPartialFailureError while avoiding masking
SystemExit/KeyboardInterrupt; reference self._layers, layer.Save(),
self._root_layer.Save(), and FlushPartialFailureError when applying the fixes.

In `@tests/integration/test_durability_roundtrip.py`:
- Around line 230-251: The test currently only checks that appends completed and
hydrate succeeds, but doesn't verify each wal_append is either in a snapshot or
replayed from the post-snapshot WAL; modify the test around the wal_append loop
and re-hydration to record each append's (eid, ts) (use the existing written
list or extend run_sleep_pass to return timestamps), capture the timestamp of
the last snapshot call (snapshot_ecs) before closing, then after reopening
Moneta (m2) assert for every recorded (eid, ts) that either ts <=
snapshot_created_at (thus allowed to be in snapshot) or the entry appears in
m2.attention/was replayed (i.e., m2.attention contains that eid+payload),
ensuring each wal_append is accounted for rather than only checking hydrate
succeeded or seeded entity exists.

---

Outside diff comments:
In `@src/moneta/ecs.py`:
- Around line 191-217: The docstring for the method that reduces
attention_log.aggregate still states "Utility = min(1.0, Utility +
summed_weight)" but the implementation also clamps the result to
self._protected_floor (via self._protected_floor[row] and new_u < floor logic);
update the docstring to reflect the full contract—Utility = max(protected_floor,
min(1.0, Utility + summed_weight)) (and note that AttendedCount adds
signal_count and missing entities are skipped) so readers see the
protected-floor lower bound; reference attention_log.aggregate,
summed_weight/count from agg, and the fields self._utility and
self._protected_floor in the description.

---

Nitpick comments:
In `@tests/unit/test_decay.py`:
- Around line 188-209: The test scanning code uses moneta_root.glob("*.py")
which is non-recursive; change that call to moneta_root.rglob("*.py") in the
module-scanning block (the code that collects sites using the moneta_root
variable and appends to sites) so it visits subpackages as well, keeping the
existing skip-check for py.name in {"ecs.py","decay.py","__init__.py"}
unchanged; also leave a comment or adjust the skip strategy (e.g., path-suffix
check) if you expect decay implementation files to appear in subpackages later
so the exclusion logic remains correct.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bc4a2bf4-cb9a-445b-ac23-7ff8542c660b

📥 Commits

Reviewing files that changed from the base of the PR and between 6350c2c and b7c8223.

📒 Files selected for processing (14)

docs/api.md
docs/round4-brief.md
src/moneta/api.py
src/moneta/attention_log.py
src/moneta/consolidation.py
src/moneta/durability.py
src/moneta/ecs.py
src/moneta/usd_target.py
tests/integration/test_durability_roundtrip.py
tests/integration/test_sequential_writer_ordering.py
tests/unit/test_api.py
tests/unit/test_decay.py
tests/unit/test_ecs.py
tests/unit/test_vector_index.py

✅ Files skipped from review due to trivial changes (2)

tests/unit/test_ecs.py
docs/round4-brief.md

coderabbitai · 2026-05-05T14:15:08Z

+        if self._in_memory:
+            return
+
+        saved: List[str] = []
+        # Stable iteration: capture an explicit list so saved_layers and
+        # pending_layers refer to the same ordering on retry diagnostics.
+        layer_items: List[Tuple[str, Sdf.Layer]] = list(self._layers.items())
+        for idx, (name, layer) in enumerate(layer_items):
+            try:
                layer.Save()
+            except BaseException as exc:  # USD raises diverse exception types
+                pending = [n for n, _ in layer_items[idx + 1 :]]
+                _logger.error(
+                    "UsdTarget.flush partial failure on layer=%s "
+                    "saved=%d pending=%d cause=%s",
+                    name,
+                    len(saved),
+                    len(pending),
+                    exc,
+                )
+                raise FlushPartialFailureError(
+                    saved_layers=saved,
+                    failed_layer=name,
+                    pending_layers=pending,
+                    cause=exc,
+                ) from exc
+            saved.append(name)
+
+        # Root layer last: if it raises, every sublayer is durable but the
+        # composition handle is not — still a partial-failure event from
+        # §7's perspective.
+        try:
            self._root_layer.Save()
+        except BaseException as exc:
+            _logger.error(
+                "UsdTarget.flush partial failure on root layer "
+                "saved=%d cause=%s",
+                len(saved),
+                exc,
+            )
+            raise FlushPartialFailureError(
+                saved_layers=saved,
+                failed_layer="<root>",
+                pending_layers=[],
+                cause=exc,
+            ) from exc


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What exception types does pxr Sdf Layer Save raise OpenUSD 0.25.5

💡 Result:

In OpenUSD 0.25.5, SdfLayer::Save does not document any specific “exception types” that it throws; instead, the API is defined to return a boolean success/failure value (Save returns true on success, false if an error occurred) [1]. What you’ll typically see in practice is that OpenUSD may emit Tf.ErrorException instances in error scenarios (for example, when file-format parsing/reading fails in related layer/file-format operations), but that is not stated as the guaranteed exception contract for SdfLayer::Save itself [2]. Therefore, from the authoritative API surface available in the OpenUSD documentation for this release line: there are no enumerated exception types for SdfLayer::Save to report beyond the documented boolean return behavior [1].

Citations:

1: https://openusd.org/dev/api/class_sdf_layer.html

2: Import fails when importing usda into usdc (and usdc into usda) PixarAnimationStudios/OpenUSD#3660

🌐 Web query:

pxr.Sdf.Layer.Save() Python API exception behavior OpenUSD

💡 Result:

The pxr.Sdf.Layer.Save method in the OpenUSD Python API returns a boolean: True if the save was successful, False if an error occurred [1][2]. It does not throw exceptions for save failures; instead, errors are reported via the Tf error system (pxr.Tf errors/warnings) [3]. In Python bindings, TfErrors encountered during the call are automatically converted to Python exceptions like pxr.Tf.ErrorException, but only for unhandled errors—standard save failures return False without exception [3][4]. Examples of failure cases returning False (no exception): - Anonymous layers cannot be saved [1]. - Layer has no remembered file identifier [2]. - Layer not dirty and file exists (unless force=True) [2]. - Specific bugs like identifier change on usdc layers (returns False) [5]. However, fatal errors in underlying file formats (e.g., invalid file parsing during related ops like Import) do throw pxr.Tf.ErrorException [4]. To handle: Check the return value and optionally query Tf diagnostics: from pxr import Sdf, Tf layer = Sdf.Layer.CreateNew('test.usda') success = layer.Save() if not success: print("Save failed - check Tf errors") for diag in Tf.Diagnostics(): print(diag) No exceptions for standard save errors; use return code [1][2].

Citations:

1: https://openusd.org/dev/api/class_sdf_layer.html

2: https://github.com/parallax/ar-export/blob/master/libs/USDPython/USD/pxr/Sdf/__DOC.py

3: https://www.sidefx.com/docs/hdk20.5/py_error_8h_source.html

4: Import fails when importing usda into usdc (and usdc into usda) PixarAnimationStudios/OpenUSD#3660

5: Layer Cannot Be Saved When Identifier Changes for Crate Files PixarAnimationStudios/OpenUSD#3975

Narrow except BaseException to except Exception and check Save() return values instead of relying solely on exception handling.

Verification of OpenUSD 0.25.5 behavior confirms that pxr.Sdf.Layer.Save() primarily returns a boolean (True for success, False for standard failures like anonymous layers or missing file identifiers) rather than raising exceptions. Only fatal errors in underlying file formats trigger pxr.Tf.ErrorException.

The current code at lines 432 and 455 catches BaseException but does not check the return value, which means standard save failures (returning False) are silently ignored and never raise FlushPartialFailureError. This is a safety issue: the flush loop will conclude successfully even if individual layers failed to save.

To fix:

Check the boolean return value: if not layer.Save(): raise FlushPartialFailureError(...)

Narrow the exception handler to except Exception to avoid trapping KeyboardInterrupt and SystemExit

The narrowing from BaseException to Exception remains valid and necessary to prevent masking interpreter control signals.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/moneta/usd_target.py` around lines 422 - 467, The flush logic in UsdTarget currently catches BaseException and ignores boolean return values from Sdf.Layer.Save(), so failures that return False are treated as successes; change the loop over self._layers and the final self._root_layer.Save() handling to (1) call Save() and if it returns False immediately raise FlushPartialFailureError with the same saved_layers/failed_layer/pending_layers semantics (use failed_layer=name for layer loop and "<root>" for root), and (2) narrow exception handlers from except BaseException to except Exception to still convert raised exceptions (e.g. pxr.Tf.ErrorException) into FlushPartialFailureError while avoiding masking SystemExit/KeyboardInterrupt; reference self._layers, layer.Save(), self._root_layer.Save(), and FlushPartialFailureError when applying the fixes.

coderabbitai · 2026-05-05T14:15:08Z

+
+            # Read back: every wal_append must be either in the post-snapshot
+            # WAL OR no-longer-in-WAL because its timestamp predates the
+            # final snapshot's snapshot_created_at (and thus is captured in
+            # the snapshot's rows or correctly truncated as already-applied).
+            # The bug we're guarding against is: an entry written between
+            # `now` capture and unlink on an EARLIER snapshot, with the
+            # entry's timestamp > that snapshot's `now`, then unlinked.
+            # With the fix, that race window is closed: the only entries
+            # that get unlinked are those written under the same lock as
+            # the timestamp capture, which means their timestamps are
+            # strictly < `now`.
+            #
+            # Operationally: re-hydrate and verify the resulting state is
+            # consistent — no IndexError, no silent corruption.
+            assert len(written) == n_appenders * appends_per_thread
+
+        with Moneta(cfg) as m2:
+            # Hydrate must succeed without error, ECS preserved.
+            assert m2.ecs.n == 1
+            mem = m2.ecs.get_memory(eid)
+            assert mem is not None


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test assertions are weaker than the docstring claim.

The docstring promises that every wal_append survives — either captured in snapshot rows or preserved in the post-snapshot WAL. But the actual assertions only check (a) that all appends completed without raising, (b) that hydrate succeeds, and (c) that the seeded entity is present. The seeded entity is captured by run_sleep_pass's snapshot, not by WAL replay, so this test would still pass even if every WAL append were silently truncated.

To actually exercise the truncation race, the test should record the per-thread timestamps, then on re-hydrate verify that for each (eid, ts) the entry was either replayed (via the attention log post-hydrate) or accounted for in a prior snapshot. A simple proxy: append an entry whose timestamp is strictly after the last snapshot_ecs call before close, and assert it shows up on m2.attention after hydrate.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/test_durability_roundtrip.py` around lines 230 - 251, The test currently only checks that appends completed and hydrate succeeds, but doesn't verify each wal_append is either in a snapshot or replayed from the post-snapshot WAL; modify the test around the wal_append loop and re-hydration to record each append's (eid, ts) (use the existing written list or extend run_sleep_pass to return timestamps), capture the timestamp of the last snapshot call (snapshot_ecs) before closing, then after reopening Moneta (m2) assert for every recorded (eid, ts) that either ts <= snapshot_created_at (thus allowed to be in snapshot) or the entry appears in m2.attention/was replayed (i.e., m2.attention contains that eid+payload), ensuring each wal_append is accounted for rather than only checking hydrate succeeded or seeded entity exists.

Round 4 closes the six genuine §9 Trigger 2 (spec-level surprise) candidates the multi-agent review surfaced. Closure document: docs/rounds/round-4.md. Brief: docs/round4-brief.md (now superseded). This round did NOT go to Gemini Deep Think — none of the items required external scoping. Each had a clear locally-correct ruling derivable from existing artifacts plus the ratified implementation history (v1.1.0 surgery, Phase 3 narrow-lock work, Pass 6 closure). Round 4 is post-implementation reconciliation between specs that locked at different times. Six rulings: 1. Four-op API surface is methods on Moneta(config) handle. Type signatures in MONETA.md §2.1 unchanged; dispatch (method vs free function) is implementation. The fifth-op rule applies to public methods of Moneta. Spec amended at MONETA.md §2.1, ARCHITECTURE.md §2 + §16. Code already conformant from v1.1.0 surgery. 2. Protected memories are PINNED, not consolidated. The §6 staging gate (utility < 0.3) is unreachable for protected_floor ≥ 0.3 by construction; this is the deliberate spec choice, not a bug. cortex_protected.usda is reserved for the explicit Phase 3 unpin tool. Spec amended at MONETA.md §2.6, ARCHITECTURE.md §6. 3. Vector index is RUNTIME-authoritative; ECS snapshot is ACROSS-RESTART authoritative. The §7 atomicity guarantee is a within-session property; hydrate's ECS-first ordering is a Phase 1 in-memory shadow consequence. Phase 2 LanceDB persistence is the path to making vector authoritative across restart, if needed. Spec amended at ARCHITECTURE.md §7.2 (new sub-clause). 4. Handle exclusivity is per-process via _ACTIVE_URIS. New ARCHITECTURE.md §17 with five sub-clauses: registry/lifecycle, GIL-atomicity TOCTOU model, fork() behavior (cross-process is bridge layer's concern via flock), PEP 703 free-threading (migration is §9 Trigger 2), SIGTERM cleanup ordering. MONETA.md §2.11 cross-references §17. 5. quota_override BOUNDED at 1 ≤ q ≤ 1000. Default 100 unchanged. New MAX_QUOTA_OVERRIDE constant in src/moneta/api.py; MonetaConfig.__post_init__ raises ValueError on out-of-range construction with a message that cites "Round 4 Ruling 5". Spec amended at MONETA.md §2.10, ARCHITECTURE.md §10. 6. _ACTIVE_URIS scope is "per substrate handle" — disambiguates §10's "per agent." Multi-handle aggregation is by design (an agent process holding N handles on N storage URIs gets N independent quotas). Spec amended at MONETA.md §2.10, ARCHITECTURE.md §10. Code change (Ruling 5 only): - src/moneta/api.py: MAX_QUOTA_OVERRIDE = 1000; MonetaConfig __post_init__ validation; new module-level constant. - tests/unit/test_api.py: 8 new tests covering floor/ceiling acceptance, zero/negative/above-ceiling/unbounded rejection, and the error-message provenance ("Round 4 Ruling 5"). Constitutional updates: - docs/review-constitution.md: round-closure binding now Rounds 1–4 (was 1–3); §3 hard-rule line updated; §12 anti-pattern updated. This means the harness's constitution hash changes and any future review run will treat Round 4 as closed. - CLAUDE.md: source-of-truth list now references round-{1..4}; hard-rules line updated; locked-decisions list updated. - docs/round4-brief.md: top header marks SUPERSEDED, points readers at docs/rounds/round-4.md for the rulings themselves. Verification: 125 passed, 4 pxr-skipped (was 117 pre-Round-4 — 8 new tests for the quota_override bound). Smoke check confirms the bound fires with the expected message and the four-op handle pattern still works end-to-end. https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL

Four-item state-of-union followup: 1. docs/api.md — full rewrite for the v1.1.0+ handle API. The previous text described the deleted singleton API (moneta.init(), module-level moneta.deposit(...), MonetaNotInitializedError, PROTECTED_QUOTA=100 module constant). Replaced with: handle construction via Moneta(MonetaConfig), context-manager pattern, per-handle quota_override, multi-instance section, current error classes (MonetaResourceLockedError, ProtectedQuotaExceededError), runtime requirements section. 2. README.md test counts — bumped 107 → 109 (plain Python) and 147 → 149 (hython total). Drift since the free-threading guard tests landed in 76da067. Historical surgery-table number (94 → 107 delta column) left intact. 3. docs/patent-evidence/README.md — split into "claim-substantiating evidence" (Pass 5 + Pass 6, mapping to claims #4 and #3) and "implementation maturity evidence" (v1.1.0 TOCTOU 480-attempt, v1.2.0-rc1 codeless schema acceptance gate, free-threading guard). Added explicit claims-evidence gap analysis: claims #1 (LIVRPS decay priority) and #2 (variant LOD for fidelity) lack dated entries; claim #3 has only Pass 6 secondary mention. Added "how to add a new entry" section codifying date / claim mapping / prior art / reproducibility requirements. 4. CLAUDE.md hard rules — surfaced the free-threading limitation as an explicit hard rule (was source-only in attention_log.py docstring). Bumped stale "94 existing Phase 1 tests" reference to the current 109 plain / 149 hython numbers. No code changed. No tests added or removed. The four locked decisions in the spec remain untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Six updates that bring the README current with reality post the documentarian-debt followup: 1. Status table — added free-threading guard row (commit 76da067). Module-level surgical addition, not a tagged version, but worth surfacing so the entry isn't only visible by reading source. 2. Locked decisions §3 (concurrency primitive) — appended GIL-required clause. The lock-free swap-and-drain correctness argument depends on the GIL; AttentionLog.__init__ raises under PEP 703. Forward work tracked via §9 Trigger 2. 3. Project structure tests/ block — added test_attention_log_gil_guard.py line. Math now reconciles to 109 plain Python (70 unit + 22 integration + 2 load + 3 twin + 10 twin-adversarial + 2 gil-guard). 4. Lineage paragraph — appended free-threading guard step + this documentarian followup (commit 6cb1fd1) so the timeline stays complete through the present. 5. System overview mermaid — annotated AttentionLog node with "GIL-required · PEP 703 sentinel" so the architecture diagram surfaces the runtime contract, not just the structural one. 6. NEW mermaid diagram — claim → evidence map next to the Novelty claims section. Visualizes the gap analysis from the patent-evidence README: claims #3 and #4 have direct evidence; claims #1 and #2 have dated-entry gaps; implementation maturity evidence (v1.1.0, v1.2.0-rc1, free-threading guard) is shown as auxiliary not claim-substantiating. Same two-color palette as the other 8 diagrams. No code changed. Test suite untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude added 3 commits May 3, 2026 23:21

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add multi-agent review harness (Opus 4.7, MoE roles, 5-loop)#2

Add multi-agent review harness (Opus 4.7, MoE roles, 5-loop)#2
JosephOIbrahim wants to merge 5 commits into
mainfrom
claude/multi-agent-codebase-review-A8WHg

JosephOIbrahim commented May 4, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 4, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 5, 2026

Uh oh!

coderabbitai Bot May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JosephOIbrahim commented May 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JosephOIbrahim commented May 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 4, 2026 •

edited

Loading