Skip to content

Add multi-agent review harness (Opus 4.7, MoE roles, 5-loop)#2

Open
JosephOIbrahim wants to merge 5 commits into
mainfrom
claude/multi-agent-codebase-review-A8WHg
Open

Add multi-agent review harness (Opus 4.7, MoE roles, 5-loop)#2
JosephOIbrahim wants to merge 5 commits into
mainfrom
claude/multi-agent-codebase-review-A8WHg

Conversation

@JosephOIbrahim
Copy link
Copy Markdown
Owner

@JosephOIbrahim JosephOIbrahim commented May 4, 2026

A long-running orchestrator that spawns role-scoped reviewers mirroring
MONETA.md §6 plus an Adversarial-Reviewer (Commandment 7), governed by a
bespoke constitution sourced verbatim from the locked artifacts.

  • docs/review-constitution.md: composition of MONETA.md §6/§7/§9,
    ARCHITECTURE.md §15.2/§15.6, CLAUDE.md hard rules,
    agent-commandments.md, and substrate-conventions.md. No new doctrine —
    every clause cites a locked artifact. Hashed and stamped into every
    finding; mid-run drift aborts.
  • scripts/review_roles.py: 7 role-reviewers + Adversarial; 5 themed loops
    (spec-conformance → concurrency-atomicity → correctness-edges →
    performance-envelope → simplicity-doc-tests); touchpoint glob matching
    for role isolation.
  • scripts/review_synthesizer.py: schema validation, role-isolation
    filter, fuzzy dedupe, severity+§9-aware ranking, cross-loop closure
    ledger, per-loop markdown rendering.
  • scripts/review_harness.py: parallel fan-out per loop, JSONL
    checkpointing, --resume, --dry-run, --max-loops, prompt caching on
    the constitution + repo snapshot, three-retry cap per Commandment 3.
  • pyproject.toml: [review] optional extra (anthropic>=0.40); not pulled
    into Phase 1/3 runtime.
  • .gitignore: per-round files and findings.jsonl ignored;
    synthesis.md re-included.
  • CLAUDE.md: one-line pointer in the build-and-test section.

Verification: ruff clean, 94 existing tests pass, dry-run loop 1
assembles 8 prompts (constitution+role ~25KB system, snapshot ~521KB
user; constitution hash 3b56e3ffae083c9b), JSONL round-trip works,
schema validator rejects malformed findings, snapshot excludes
harness/constitution to prevent feedback loops.

The live 5-loop run is intentionally not executed in this commit; user
runs it separately after reviewing the harness, the constitution, and
the smoke-test output.

https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL

Summary by CodeRabbit

  • New Features
    • Multi-agent review harness with configurable reviewer roles/themes, checkpointed resumption, per-loop and cross-loop synthesis.
  • Documentation
    • Added review constitution, harness usage docs, synthesis/round brief, and updated API handle-based examples and lifecycle notes.
  • Bug Fixes / Reliability
    • Fixed concurrent protected-deposit race, atomic snapshot/durability window, per-batch consolidation semantics, attention clamping, and improved partial-flush error reporting.
  • Tests
    • New unit and integration tests covering concurrency, durability, consolidation, decay call-sites, ECS clamping, and vector index behaviors.

claude added 3 commits May 3, 2026 23:21
A long-running orchestrator that spawns role-scoped reviewers mirroring
MONETA.md §6 plus an Adversarial-Reviewer (Commandment 7), governed by a
bespoke constitution sourced verbatim from the locked artifacts.

- docs/review-constitution.md: composition of MONETA.md §6/§7/§9,
  ARCHITECTURE.md §15.2/§15.6, CLAUDE.md hard rules,
  agent-commandments.md, and substrate-conventions.md. No new doctrine —
  every clause cites a locked artifact. Hashed and stamped into every
  finding; mid-run drift aborts.
- scripts/review_roles.py: 7 role-reviewers + Adversarial; 5 themed loops
  (spec-conformance → concurrency-atomicity → correctness-edges →
  performance-envelope → simplicity-doc-tests); touchpoint glob matching
  for role isolation.
- scripts/review_synthesizer.py: schema validation, role-isolation
  filter, fuzzy dedupe, severity+§9-aware ranking, cross-loop closure
  ledger, per-loop markdown rendering.
- scripts/review_harness.py: parallel fan-out per loop, JSONL
  checkpointing, --resume, --dry-run, --max-loops, prompt caching on
  the constitution + repo snapshot, three-retry cap per Commandment 3.
- pyproject.toml: [review] optional extra (anthropic>=0.40); not pulled
  into Phase 1/3 runtime.
- .gitignore: per-round files and findings.jsonl ignored;
  synthesis.md re-included.
- CLAUDE.md: one-line pointer in the build-and-test section.

Verification: ruff clean, 94 existing tests pass, dry-run loop 1
assembles 8 prompts (constitution+role ~25KB system, snapshot ~521KB
user; constitution hash 3b56e3ffae083c9b), JSONL round-trip works,
schema validator rejects malformed findings, snapshot excludes
harness/constitution to prevent feedback loops.

The live 5-loop run is intentionally not executed in this commit; user
runs it separately after reviewing the harness, the constitution, and
the smoke-test output.

https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
Adds an alternate model-call backend that shells out to `claude -p`,
making the harness runnable in environments that have an authenticated
Claude Code CLI but no ANTHROPIC_API_KEY. The SDK backend is unchanged
and still preferred when the env var is set.

- New --backend {auto,sdk,cli} flag; auto picks sdk if ANTHROPIC_API_KEY
  is set, else cli.
- _call_via_cli pipes the user prompt over stdin (snapshot ~520KB
  exceeds argv limits), passes the system prompt via --system-prompt,
  and parses claude's JSON output.
- Retry budget (3 attempts, exponential backoff) is shared across both
  backends and continues to honor Commandment 3.
- Smoke-tested end to end: SMOKE-OK round-trip via _call_via_cli.
- .gitignore: docs/review/run.log added (ephemeral tee target).

https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
Final cross-loop rollup from scripts/review_harness.py against this
branch. 5 themed loops, 8 reviewers each (7 role + adversarial),
constitution hash 3b56e3ffae083c9b.

Stats:
- 494 findings retained (all loops); 466 open after closures
- 17 Critical, 100 High, 8 §9 escalation candidates
- Per-loop kept: 67 / 91 / 128 / 91 / 117
- Per-loop invalid: 0 / 0 / 0 / 35 / 0 (loop 4 schema drift)
- Per-loop role-isolation drops: 0 / 0 / 0 / 2 / 25
- 0 duplicates across all loops

§9 escalation candidates surfaced (Architect-routed):
- four-op signature drift vs ARCHITECTURE.md §2 (module-level vs bound
  method)
- protected-floor consolidation contradiction (MONETA.md §2.5 vs
  ARCHITECTURE.md §6 selection criteria)
- vector-authoritative inversion across hydrate path (§7 keystone)
- v1.1.0 handle exclusivity model absent from ARCHITECTURE.md
- multi-handle protected-quota aggregation bypasses §10 hard cap
- quota_override field unbounded vs §10 hard cap

Top Critical implementation findings:
- WAL truncation race (DurabilityManager.snapshot_ecs)
- Attention log drain stale-buffer race + module docstring overpromise
- UsdTarget.flush() partial-failure breaks sequential-write atomicity
- Stage-partial-batch atomicity break in ConsolidationRunner.run_pass
- signal_attention bypasses protected_floor lower bound
- Protected-quota count-then-add TOCTOU
- docs/api.md still documents pre-v1.1.0 module singleton API

Cross-loop memory worked: adversarial reviewer correctly refuted §9
framing of attention-log race in loop 2 (real bug, but ARCHITECTURE.md
§5.1 'eventually consistent' permits the loss window — fix is impl
correction + docstring softening, not a Round 4 trigger).

The constitution and per-round artifacts are gitignored;
docs/review/synthesis.md is the durable record.

https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

Warning

Rate limit exceeded

@JosephOIbrahim has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 44 minutes and 52 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f86197b-9e97-4185-9017-d652b4054a94

📥 Commits

Reviewing files that changed from the base of the PR and between b7c8223 and 3fbdeb9.

📒 Files selected for processing (8)
  • ARCHITECTURE.md
  • CLAUDE.md
  • MONETA.md
  • docs/review-constitution.md
  • docs/round4-brief.md
  • docs/rounds/round-4.md
  • src/moneta/api.py
  • tests/unit/test_api.py
📝 Walkthrough

Walkthrough

Adds a multi-agent review harness (scripts + docs + CI ignores + optional dependency) and a review synthesizer/roles system; plus multiple runtime correctness changes in moneta (deposit concurrency lock, durability snapshot atomicity, per-batch consolidation state, usd_target partial-failure reporting, ECS clamping, attention_log doc), with corresponding tests and documentation updates.

Changes

Multi-agent review harness (docs + CLI + synthesis + config)

Layer / File(s) Summary
Interface / Extras
pyproject.toml
Adds project.optional-dependencies.review with anthropic>=0.40.
Repo filtering / git ignore
.gitignore
Ignores docs/review/* artifacts while unignoring docs/review/synthesis.md.
Constitution / Policy
docs/review-constitution.md
Adds locked constitution describing reviewer rules, strict JSON schema, escalation/closure rules, loop themes, role isolation, and output discipline.
User docs / Runbook
CLAUDE.md, docs/round4-brief.md
Documents the harness, install/run examples, and Round 4 escalation brief derived from synthesis.
Roles & Config
scripts/review_roles.py
New Role/Theme dataclasses, role touchpoint globs, theme mapping, scope utils, and ranking constants.
Harness CLI / Orchestration
scripts/review_harness.py
Adds CLI to build deterministic repo snapshot, hash constitution, run N loops, call model backends (Anthropic SDK or claude CLI) with retries, extract/stamp findings, adversarial pass, checkpointing, and output files under docs/review/.
Synthesis / Validation
scripts/review_synthesizer.py
Validates findings against constitution schema, enforces role-isolation, deduplicates, ranks, records closures, renders per-loop and final markdown, and provides JSONL helpers.
sequenceDiagram
  participant CLI as review_harness CLI
  participant Snapshot as RepoSnapshot
  participant Const as Constitution (docs)
  participant Model as Model Backend\n(Anthropic SDK / claude CLI)
  participant Synth as Synthesizer
  participant FS as Filesystem (`docs/review/*`)

  CLI->>Snapshot: load_repo_snapshot()
  CLI->>Const: read & hash constitution
  CLI->>Model: assemble_reviewer_prompt() -> call_reviewer()
  Model-->>CLI: response (fenced JSON)
  CLI->>Synth: extract_findings_array() + stamp_constitution_hash()
  CLI->>Model: call adversarial reviewer (uses role findings)
  CLI->>Synth: synthesize_loop(raw_findings)
  Synth-->>FS: write round markdown + findings.jsonl
  CLI->>FS: save_checkpoint(), final synthesis.md (on completion)
Loading

Core system correctness changes, errors, and tests (moneta runtime & tests)

Layer / File(s) Summary
Concurrency primitive
src/moneta/api.py
Adds self._deposit_lock and moves protected-quota count+add into a locked critical section in Moneta.deposit.
Durability atomicity
src/moneta/durability.py, tests/integration/test_durability_roundtrip.py
Holds _lock across snapshot timestamping, write+fsync+atomic replace, WAL truncation; adds integration test exercising concurrent WAL appends during snapshot.
Per-batch consolidation
src/moneta/consolidation.py, tests/integration/test_sequential_writer_ordering.py
Marks entities CONSOLIDATED immediately after each successful batch commit; adds tests for partial-batch success vs full success and minor type annotation update.
ECS utility clamping
src/moneta/ecs.py, tests/unit/test_ecs.py
apply_attention now clamps utility upper to 1.0 then lower to protected_floor; adds unit test verifying clamp at protected floor.
UsdTarget partial-failure reporting
src/moneta/usd_target.py
Adds FlushPartialFailureError and per-layer try/except in flush() to report saved/failed/pending layers and raise on first failure.
Attention log docs
src/moneta/attention_log.py
Rewrites module docstring to clarify possible small-loss race window around buffer swap.
API docs handle transition
docs/api.md
Rewrites API docs to a Moneta handle model (with Moneta(...) as m:), documents lifecycle errors and per-batch consolidation semantics.
Vector index and other unit tests
tests/unit/test_vector_index.py, tests/unit/test_decay.py, tests/unit/test_api.py
Adds comprehensive vector index tests, decay invocation scan tests, protected-deposit concurrency test, and other test updates to reflect runtime changes.
Integration wiring
various tests
Multiple new/updated unit and integration tests across the codebase aligning with the above behavior changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

"I hopped through diffs at break of dawn,
I stitched the rules where codes were drawn.
Five loops, many voices sing—🐇
I stamped the law, then nudged a spring.
Files saved, tests passed, the branch goes on."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 49.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Add multi-agent review harness (Opus 4.7, MoE roles, 5-loop)' clearly and specifically describes the main addition in this changeset—a new multi-agent review orchestration system with specific implementation details (Claude Opus 4.7, mixture-of-experts roles, 5-iteration loops).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/multi-agent-codebase-review-A8WHg

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
scripts/review_harness.py (1)

340-346: 💤 Low value

Consider adding check=False explicitly for clarity.

The subprocess.run call relies on manual returncode checking (line 347). While this works, explicit check=False documents the intent that you're handling errors manually.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/review_harness.py` around lines 340 - 346, Add an explicit
check=False to the subprocess.run invocation so it is clear errors are handled
manually; update the subprocess.run call that assigns to completed (which passes
cmd and input=bundle.user_prompt) to include check=False to document intent
while preserving existing manual returncode checking and timeout behavior.
scripts/review_roles.py (1)

187-187: 💤 Low value

Optional: Use unpacking syntax per Ruff RUF005.

Ruff suggests (*ROLE_REVIEWERS, ADVERSARIAL) over tuple concatenation. This is purely stylistic — both are correct.

Proposed fix
-ALL_ROLES: tuple[Role, ...] = ROLE_REVIEWERS + (ADVERSARIAL,)
+ALL_ROLES: tuple[Role, ...] = (*ROLE_REVIEWERS, ADVERSARIAL)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/review_roles.py` at line 187, Replace the tuple concatenation used to
build ALL_ROLES with unpacking syntax to satisfy Ruff RUF005: change the
expression that defines ALL_ROLES (currently using ROLE_REVIEWERS +
(ADVERSARIAL,)) to use (*ROLE_REVIEWERS, ADVERSARIAL) so ALL_ROLES remains a
tuple of Role while following the recommended style; ensure the name ALL_ROLES
and the referenced symbols ROLE_REVIEWERS and ADVERSARIAL are used exactly as in
the current definition.
docs/review-constitution.md (1)

125-137: 💤 Low value

Add language specifier to fenced code block.

The fenced code block at line 125 lacks a language specifier, triggering markdownlint MD040. Since this is verbatim text from CLAUDE.md, use ```text to silence the lint warning while preserving intent.

Proposed fix
-```
+```text
 - Do not re-open Round 2 or Round 3 decisions. Escalate per MONETA.md §9.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/review-constitution.md` around lines 125 - 137, The fenced code block
that begins with the line "- Do not re-open Round 2 or Round 3 decisions.
Escalate per MONETA.md §9." is missing a language specifier and triggers
markdownlint MD040; update the opening fence so it specifies the text language
(i.e., change the opening triple-backtick to include "text") for the code block
containing the list so the linter is satisfied while preserving the verbatim
CLAUDE.md content.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/review-constitution.md`:
- Around line 125-137: The fenced code block that begins with the line "- Do not
re-open Round 2 or Round 3 decisions. Escalate per MONETA.md §9." is missing a
language specifier and triggers markdownlint MD040; update the opening fence so
it specifies the text language (i.e., change the opening triple-backtick to
include "text") for the code block containing the list so the linter is
satisfied while preserving the verbatim CLAUDE.md content.

In `@scripts/review_harness.py`:
- Around line 340-346: Add an explicit check=False to the subprocess.run
invocation so it is clear errors are handled manually; update the subprocess.run
call that assigns to completed (which passes cmd and input=bundle.user_prompt)
to include check=False to document intent while preserving existing manual
returncode checking and timeout behavior.

In `@scripts/review_roles.py`:
- Line 187: Replace the tuple concatenation used to build ALL_ROLES with
unpacking syntax to satisfy Ruff RUF005: change the expression that defines
ALL_ROLES (currently using ROLE_REVIEWERS + (ADVERSARIAL,)) to use
(*ROLE_REVIEWERS, ADVERSARIAL) so ALL_ROLES remains a tuple of Role while
following the recommended style; ensure the name ALL_ROLES and the referenced
symbols ROLE_REVIEWERS and ADVERSARIAL are used exactly as in the current
definition.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dd23c72b-506d-4c57-9f63-3ae86b9ce6b8

📥 Commits

Reviewing files that changed from the base of the PR and between 9a46293 and 6350c2c.

📒 Files selected for processing (8)
  • .gitignore
  • CLAUDE.md
  • docs/review-constitution.md
  • docs/review/synthesis.md
  • pyproject.toml
  • scripts/review_harness.py
  • scripts/review_roles.py
  • scripts/review_synthesizer.py

Implementation pass against the v1.0.0 multi-agent review synthesis
committed in 6350c2c. Fixes the Critical impl bugs that did NOT need
§9 escalation; routes the 6 spec-level surprises to docs/round4-brief.md
for the Architect.

Implementation fixes:
- ecs.py apply_attention: clamp utility lower bound against
  protected_floor so negative-weight signals cannot defeat §10
  protection (substrate-L3-signal-attention-bypasses-protected-floor).
- api.py deposit: per-handle threading.Lock around count_protected +
  ecs.add so concurrent protected deposits at quota-1 cannot both
  succeed (substrate-L2-protected-quota-count-then-add-toctou).
- durability.py snapshot_ecs: hold _lock across timestamp capture,
  JSON dump, and WAL unlink — closes the race window where wal_appends
  between `now` capture and unlink were silently truncated
  (persistence-L1-wal-truncation-race).
- usd_target.py flush: new FlushPartialFailureError carrying
  saved/failed/pending layer bookkeeping so SequentialWriter can refuse
  to advance the vector index when only some layers Saved
  (usd-L2-flush-partial-failure-breaks-sequential-write-atomicity).
- consolidation.py run_pass: hoist STAGED_FOR_SYNC → CONSOLIDATED
  transition INSIDE the batch loop so a later-batch failure does not
  pin earlier successfully-committed batches in STAGED_FOR_SYNC
  forever (consolidation-L2-stage-partial-batch-atomicity-break).
- attention_log.py module docstring: soften "Entries cannot be lost"
  to match ARCHITECTURE.md §5.1 "lock-free, eventually consistent,
  simplest failure mode" — adversarial-L2 correctly classified the
  drain race as an impl bug + docstring fix, NOT §9.
- docs/api.md: full rewrite to v1.1.0+ handle pattern (collapses 5
  separate findings into one pass; replaces moneta.init() / module-
  level free functions with `with Moneta(MonetaConfig.ephemeral())`).

New tests (21):
- tests/unit/test_decay.py: AST scan asserting exactly three decay
  evaluation points (api.py, attention_log.py, consolidation.py) per
  ARCHITECTURE.md §16; adding a fourth site is §9 Trigger 2.
- tests/unit/test_vector_index.py: new file, 16 tests covering §7.1
  dim-homogeneity hard-raise, §5.1 silent-no-op for missing entities,
  query zero-norm guards, snapshot/restore round-trip.
- tests/unit/test_ecs.py: floor-clamp regression for apply_attention.
- tests/unit/test_api.py: 20-thread concurrent stress for the deposit
  lock, asserts exactly one of N succeeds at quota-1.
- tests/integration/test_durability_roundtrip.py: appender threads
  hammering wal_append while main thread runs many snapshot_ecs calls;
  hydrate succeeds, no IndexError, no silent corruption.
- tests/integration/test_sequential_writer_ordering.py: per-batch
  ECS state preservation under mid-batch commit_staging failure.

New docs:
- docs/round4-brief.md: the 6 genuine §9 Trigger 2 candidates the
  review surfaced (four-op signature drift, protected-floor
  consolidation contradiction, vector-authoritative inversion across
  hydrate, handle exclusivity model absent from ARCHITECTURE.md,
  quota_override unbounded vs §10 cap, multi-handle aggregation).
  Architect-routed; no implementation proceeds on affected subsystems
  until the round closes.

Verification: 117 passed, 4 pxr-skipped (was 94 baseline — 23 new
tests). All prior tests still green. Ruff clean on modified files.

https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/moneta/ecs.py (1)

191-217: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Docstring formula is stale after the protected-floor clamp.

The method docstring at line 193 still advertises Utility = min(1.0, Utility + summed_weight), but the new code also clamps to protected_floor (lines 214-216). With negative summed_weight the floor is now load-bearing, and the docstring contract should reflect it so callers reading the spec line don't misjudge the lower bound.

📝 Proposed docstring fix
         `agg` maps entity_id to `(summed_weight, signal_count)`, produced by
         `attention_log.aggregate`. Per ARCHITECTURE.md §5:
 
-            Utility       = min(1.0, Utility + summed_weight)
+            Utility       = max(protected_floor,
+                                min(1.0, Utility + summed_weight))
             AttendedCount = AttendedCount + signal_count
             LastEvaluated = now

As per coding guidelines: "Do not let documentation lag implementation by more than one PR. Documentarian maintains inline docstrings…".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/moneta/ecs.py` around lines 191 - 217, The docstring for the method that
reduces attention_log.aggregate still states "Utility = min(1.0, Utility +
summed_weight)" but the implementation also clamps the result to
self._protected_floor (via self._protected_floor[row] and new_u < floor logic);
update the docstring to reflect the full contract—Utility = max(protected_floor,
min(1.0, Utility + summed_weight)) (and note that AttendedCount adds
signal_count and missing entities are skipped) so readers see the
protected-floor lower bound; reference attention_log.aggregate,
summed_weight/count from agg, and the fields self._utility and
self._protected_floor in the description.
🧹 Nitpick comments (1)
tests/unit/test_decay.py (1)

188-209: ⚡ Quick win

Make the §16 gate recursive — glob("*.py") skips subpackages.

The docstring claims it walks "every module under src/moneta/", but moneta_root.glob("*.py") is non-recursive and only enumerates top-level files. The package is flat today, but the moment a future PR introduces a subpackage that calls decay_all/decay_value (the exact §9 Trigger 2 scenario this test is meant to catch), the gate will pass silently — the conformance assertion would then be a false positive.

Switching to rglob (with the same name-skip set) keeps the gate honest as the package grows, at zero readability cost.

♻️ Proposed fix
         moneta_root = Path(moneta.__file__).parent
         sites: list[tuple[str, int, str]] = []
-        for py in sorted(moneta_root.glob("*.py")):
+        for py in sorted(moneta_root.rglob("*.py")):
             if py.name in {"ecs.py", "decay.py", "__init__.py"}:
                 continue

Note: if subpackages later host decay implementation files of their own, the name in {...} skip-list will need to grow accordingly, or you can switch to a path-suffix check.

As per coding guidelines: "ARCHITECTURE.md is the locked spec... If the blueprint and spec disagree on a clause, escalate per MONETA.md §9." A silent false-pass here would let a §9 Trigger 2 ship.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_decay.py` around lines 188 - 209, The test scanning code uses
moneta_root.glob("*.py") which is non-recursive; change that call to
moneta_root.rglob("*.py") in the module-scanning block (the code that collects
sites using the moneta_root variable and appends to sites) so it visits
subpackages as well, keeping the existing skip-check for py.name in
{"ecs.py","decay.py","__init__.py"} unchanged; also leave a comment or adjust
the skip strategy (e.g., path-suffix check) if you expect decay implementation
files to appear in subpackages later so the exclusion logic remains correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/moneta/usd_target.py`:
- Around line 422-467: The flush logic in UsdTarget currently catches
BaseException and ignores boolean return values from Sdf.Layer.Save(), so
failures that return False are treated as successes; change the loop over
self._layers and the final self._root_layer.Save() handling to (1) call Save()
and if it returns False immediately raise FlushPartialFailureError with the same
saved_layers/failed_layer/pending_layers semantics (use failed_layer=name for
layer loop and "<root>" for root), and (2) narrow exception handlers from except
BaseException to except Exception to still convert raised exceptions (e.g.
pxr.Tf.ErrorException) into FlushPartialFailureError while avoiding masking
SystemExit/KeyboardInterrupt; reference self._layers, layer.Save(),
self._root_layer.Save(), and FlushPartialFailureError when applying the fixes.

In `@tests/integration/test_durability_roundtrip.py`:
- Around line 230-251: The test currently only checks that appends completed and
hydrate succeeds, but doesn't verify each wal_append is either in a snapshot or
replayed from the post-snapshot WAL; modify the test around the wal_append loop
and re-hydration to record each append's (eid, ts) (use the existing written
list or extend run_sleep_pass to return timestamps), capture the timestamp of
the last snapshot call (snapshot_ecs) before closing, then after reopening
Moneta (m2) assert for every recorded (eid, ts) that either ts <=
snapshot_created_at (thus allowed to be in snapshot) or the entry appears in
m2.attention/was replayed (i.e., m2.attention contains that eid+payload),
ensuring each wal_append is accounted for rather than only checking hydrate
succeeded or seeded entity exists.

---

Outside diff comments:
In `@src/moneta/ecs.py`:
- Around line 191-217: The docstring for the method that reduces
attention_log.aggregate still states "Utility = min(1.0, Utility +
summed_weight)" but the implementation also clamps the result to
self._protected_floor (via self._protected_floor[row] and new_u < floor logic);
update the docstring to reflect the full contract—Utility = max(protected_floor,
min(1.0, Utility + summed_weight)) (and note that AttendedCount adds
signal_count and missing entities are skipped) so readers see the
protected-floor lower bound; reference attention_log.aggregate,
summed_weight/count from agg, and the fields self._utility and
self._protected_floor in the description.

---

Nitpick comments:
In `@tests/unit/test_decay.py`:
- Around line 188-209: The test scanning code uses moneta_root.glob("*.py")
which is non-recursive; change that call to moneta_root.rglob("*.py") in the
module-scanning block (the code that collects sites using the moneta_root
variable and appends to sites) so it visits subpackages as well, keeping the
existing skip-check for py.name in {"ecs.py","decay.py","__init__.py"}
unchanged; also leave a comment or adjust the skip strategy (e.g., path-suffix
check) if you expect decay implementation files to appear in subpackages later
so the exclusion logic remains correct.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bc4a2bf4-cb9a-445b-ac23-7ff8542c660b

📥 Commits

Reviewing files that changed from the base of the PR and between 6350c2c and b7c8223.

📒 Files selected for processing (14)
  • docs/api.md
  • docs/round4-brief.md
  • src/moneta/api.py
  • src/moneta/attention_log.py
  • src/moneta/consolidation.py
  • src/moneta/durability.py
  • src/moneta/ecs.py
  • src/moneta/usd_target.py
  • tests/integration/test_durability_roundtrip.py
  • tests/integration/test_sequential_writer_ordering.py
  • tests/unit/test_api.py
  • tests/unit/test_decay.py
  • tests/unit/test_ecs.py
  • tests/unit/test_vector_index.py
✅ Files skipped from review due to trivial changes (2)
  • tests/unit/test_ecs.py
  • docs/round4-brief.md

Comment thread src/moneta/usd_target.py
Comment on lines +422 to +467
if self._in_memory:
return

saved: List[str] = []
# Stable iteration: capture an explicit list so saved_layers and
# pending_layers refer to the same ordering on retry diagnostics.
layer_items: List[Tuple[str, Sdf.Layer]] = list(self._layers.items())
for idx, (name, layer) in enumerate(layer_items):
try:
layer.Save()
except BaseException as exc: # USD raises diverse exception types
pending = [n for n, _ in layer_items[idx + 1 :]]
_logger.error(
"UsdTarget.flush partial failure on layer=%s "
"saved=%d pending=%d cause=%s",
name,
len(saved),
len(pending),
exc,
)
raise FlushPartialFailureError(
saved_layers=saved,
failed_layer=name,
pending_layers=pending,
cause=exc,
) from exc
saved.append(name)

# Root layer last: if it raises, every sublayer is durable but the
# composition handle is not — still a partial-failure event from
# §7's perspective.
try:
self._root_layer.Save()
except BaseException as exc:
_logger.error(
"UsdTarget.flush partial failure on root layer "
"saved=%d cause=%s",
len(saved),
exc,
)
raise FlushPartialFailureError(
saved_layers=saved,
failed_layer="<root>",
pending_layers=[],
cause=exc,
) from exc
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What exception types does pxr Sdf Layer Save raise OpenUSD 0.25.5

💡 Result:

In OpenUSD 0.25.5, SdfLayer::Save does not document any specific “exception types” that it throws; instead, the API is defined to return a boolean success/failure value (Save returns true on success, false if an error occurred) [1]. What you’ll typically see in practice is that OpenUSD may emit Tf.ErrorException instances in error scenarios (for example, when file-format parsing/reading fails in related layer/file-format operations), but that is not stated as the guaranteed exception contract for SdfLayer::Save itself [2]. Therefore, from the authoritative API surface available in the OpenUSD documentation for this release line: there are no enumerated exception types for SdfLayer::Save to report beyond the documented boolean return behavior [1].

Citations:


🌐 Web query:

pxr.Sdf.Layer.Save() Python API exception behavior OpenUSD

💡 Result:

The pxr.Sdf.Layer.Save method in the OpenUSD Python API returns a boolean: True if the save was successful, False if an error occurred [1][2]. It does not throw exceptions for save failures; instead, errors are reported via the Tf error system (pxr.Tf errors/warnings) [3]. In Python bindings, TfErrors encountered during the call are automatically converted to Python exceptions like pxr.Tf.ErrorException, but only for unhandled errors—standard save failures return False without exception [3][4]. Examples of failure cases returning False (no exception): - Anonymous layers cannot be saved [1]. - Layer has no remembered file identifier [2]. - Layer not dirty and file exists (unless force=True) [2]. - Specific bugs like identifier change on usdc layers (returns False) [5]. However, fatal errors in underlying file formats (e.g., invalid file parsing during related ops like Import) do throw pxr.Tf.ErrorException [4]. To handle: Check the return value and optionally query Tf diagnostics: from pxr import Sdf, Tf layer = Sdf.Layer.CreateNew('test.usda') success = layer.Save() if not success: print("Save failed - check Tf errors") for diag in Tf.Diagnostics(): print(diag) No exceptions for standard save errors; use return code [1][2].

Citations:


Narrow except BaseException to except Exception and check Save() return values instead of relying solely on exception handling.

Verification of OpenUSD 0.25.5 behavior confirms that pxr.Sdf.Layer.Save() primarily returns a boolean (True for success, False for standard failures like anonymous layers or missing file identifiers) rather than raising exceptions. Only fatal errors in underlying file formats trigger pxr.Tf.ErrorException.

The current code at lines 432 and 455 catches BaseException but does not check the return value, which means standard save failures (returning False) are silently ignored and never raise FlushPartialFailureError. This is a safety issue: the flush loop will conclude successfully even if individual layers failed to save.

To fix:

  1. Check the boolean return value: if not layer.Save(): raise FlushPartialFailureError(...)
  2. Narrow the exception handler to except Exception to avoid trapping KeyboardInterrupt and SystemExit

The narrowing from BaseException to Exception remains valid and necessary to prevent masking interpreter control signals.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/moneta/usd_target.py` around lines 422 - 467, The flush logic in
UsdTarget currently catches BaseException and ignores boolean return values from
Sdf.Layer.Save(), so failures that return False are treated as successes; change
the loop over self._layers and the final self._root_layer.Save() handling to (1)
call Save() and if it returns False immediately raise FlushPartialFailureError
with the same saved_layers/failed_layer/pending_layers semantics (use
failed_layer=name for layer loop and "<root>" for root), and (2) narrow
exception handlers from except BaseException to except Exception to still
convert raised exceptions (e.g. pxr.Tf.ErrorException) into
FlushPartialFailureError while avoiding masking SystemExit/KeyboardInterrupt;
reference self._layers, layer.Save(), self._root_layer.Save(), and
FlushPartialFailureError when applying the fixes.

Comment on lines +230 to +251

# Read back: every wal_append must be either in the post-snapshot
# WAL OR no-longer-in-WAL because its timestamp predates the
# final snapshot's snapshot_created_at (and thus is captured in
# the snapshot's rows or correctly truncated as already-applied).
# The bug we're guarding against is: an entry written between
# `now` capture and unlink on an EARLIER snapshot, with the
# entry's timestamp > that snapshot's `now`, then unlinked.
# With the fix, that race window is closed: the only entries
# that get unlinked are those written under the same lock as
# the timestamp capture, which means their timestamps are
# strictly < `now`.
#
# Operationally: re-hydrate and verify the resulting state is
# consistent — no IndexError, no silent corruption.
assert len(written) == n_appenders * appends_per_thread

with Moneta(cfg) as m2:
# Hydrate must succeed without error, ECS preserved.
assert m2.ecs.n == 1
mem = m2.ecs.get_memory(eid)
assert mem is not None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test assertions are weaker than the docstring claim.

The docstring promises that every wal_append survives — either captured in snapshot rows or preserved in the post-snapshot WAL. But the actual assertions only check (a) that all appends completed without raising, (b) that hydrate succeeds, and (c) that the seeded entity is present. The seeded entity is captured by run_sleep_pass's snapshot, not by WAL replay, so this test would still pass even if every WAL append were silently truncated.

To actually exercise the truncation race, the test should record the per-thread timestamps, then on re-hydrate verify that for each (eid, ts) the entry was either replayed (via the attention log post-hydrate) or accounted for in a prior snapshot. A simple proxy: append an entry whose timestamp is strictly after the last snapshot_ecs call before close, and assert it shows up on m2.attention after hydrate.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_durability_roundtrip.py` around lines 230 - 251, The
test currently only checks that appends completed and hydrate succeeds, but
doesn't verify each wal_append is either in a snapshot or replayed from the
post-snapshot WAL; modify the test around the wal_append loop and re-hydration
to record each append's (eid, ts) (use the existing written list or extend
run_sleep_pass to return timestamps), capture the timestamp of the last snapshot
call (snapshot_ecs) before closing, then after reopening Moneta (m2) assert for
every recorded (eid, ts) that either ts <= snapshot_created_at (thus allowed to
be in snapshot) or the entry appears in m2.attention/was replayed (i.e.,
m2.attention contains that eid+payload), ensuring each wal_append is accounted
for rather than only checking hydrate succeeded or seeded entity exists.

Round 4 closes the six genuine §9 Trigger 2 (spec-level surprise)
candidates the multi-agent review surfaced. Closure document:
docs/rounds/round-4.md. Brief: docs/round4-brief.md (now superseded).

This round did NOT go to Gemini Deep Think — none of the items required
external scoping. Each had a clear locally-correct ruling derivable
from existing artifacts plus the ratified implementation history
(v1.1.0 surgery, Phase 3 narrow-lock work, Pass 6 closure). Round 4 is
post-implementation reconciliation between specs that locked at
different times.

Six rulings:

1. Four-op API surface is methods on Moneta(config) handle. Type
   signatures in MONETA.md §2.1 unchanged; dispatch (method vs free
   function) is implementation. The fifth-op rule applies to public
   methods of Moneta. Spec amended at MONETA.md §2.1, ARCHITECTURE.md
   §2 + §16. Code already conformant from v1.1.0 surgery.

2. Protected memories are PINNED, not consolidated. The §6 staging
   gate (utility < 0.3) is unreachable for protected_floor ≥ 0.3 by
   construction; this is the deliberate spec choice, not a bug.
   cortex_protected.usda is reserved for the explicit Phase 3 unpin
   tool. Spec amended at MONETA.md §2.6, ARCHITECTURE.md §6.

3. Vector index is RUNTIME-authoritative; ECS snapshot is
   ACROSS-RESTART authoritative. The §7 atomicity guarantee is a
   within-session property; hydrate's ECS-first ordering is a Phase 1
   in-memory shadow consequence. Phase 2 LanceDB persistence is the
   path to making vector authoritative across restart, if needed.
   Spec amended at ARCHITECTURE.md §7.2 (new sub-clause).

4. Handle exclusivity is per-process via _ACTIVE_URIS. New
   ARCHITECTURE.md §17 with five sub-clauses: registry/lifecycle,
   GIL-atomicity TOCTOU model, fork() behavior (cross-process is
   bridge layer's concern via flock), PEP 703 free-threading
   (migration is §9 Trigger 2), SIGTERM cleanup ordering. MONETA.md
   §2.11 cross-references §17.

5. quota_override BOUNDED at 1 ≤ q ≤ 1000. Default 100 unchanged.
   New MAX_QUOTA_OVERRIDE constant in src/moneta/api.py;
   MonetaConfig.__post_init__ raises ValueError on out-of-range
   construction with a message that cites "Round 4 Ruling 5". Spec
   amended at MONETA.md §2.10, ARCHITECTURE.md §10.

6. _ACTIVE_URIS scope is "per substrate handle" — disambiguates
   §10's "per agent." Multi-handle aggregation is by design (an
   agent process holding N handles on N storage URIs gets N
   independent quotas). Spec amended at MONETA.md §2.10,
   ARCHITECTURE.md §10.

Code change (Ruling 5 only):
- src/moneta/api.py: MAX_QUOTA_OVERRIDE = 1000; MonetaConfig
  __post_init__ validation; new module-level constant.
- tests/unit/test_api.py: 8 new tests covering floor/ceiling
  acceptance, zero/negative/above-ceiling/unbounded rejection, and
  the error-message provenance ("Round 4 Ruling 5").

Constitutional updates:
- docs/review-constitution.md: round-closure binding now Rounds 1–4
  (was 1–3); §3 hard-rule line updated; §12 anti-pattern updated.
  This means the harness's constitution hash changes and any future
  review run will treat Round 4 as closed.
- CLAUDE.md: source-of-truth list now references round-{1..4};
  hard-rules line updated; locked-decisions list updated.
- docs/round4-brief.md: top header marks SUPERSEDED, points readers
  at docs/rounds/round-4.md for the rulings themselves.

Verification: 125 passed, 4 pxr-skipped (was 117 pre-Round-4 — 8 new
tests for the quota_override bound). Smoke check confirms the bound
fires with the expected message and the four-op handle pattern still
works end-to-end.

https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
JosephOIbrahim pushed a commit that referenced this pull request May 5, 2026
Four-item state-of-union followup:

1. docs/api.md — full rewrite for the v1.1.0+ handle API. The previous
   text described the deleted singleton API (moneta.init(),
   module-level moneta.deposit(...), MonetaNotInitializedError,
   PROTECTED_QUOTA=100 module constant). Replaced with: handle
   construction via Moneta(MonetaConfig), context-manager pattern,
   per-handle quota_override, multi-instance section, current error
   classes (MonetaResourceLockedError, ProtectedQuotaExceededError),
   runtime requirements section.

2. README.md test counts — bumped 107 → 109 (plain Python) and
   147 → 149 (hython total). Drift since the free-threading guard
   tests landed in 76da067. Historical surgery-table number (94 → 107
   delta column) left intact.

3. docs/patent-evidence/README.md — split into "claim-substantiating
   evidence" (Pass 5 + Pass 6, mapping to claims #4 and #3) and
   "implementation maturity evidence" (v1.1.0 TOCTOU 480-attempt,
   v1.2.0-rc1 codeless schema acceptance gate, free-threading guard).
   Added explicit claims-evidence gap analysis: claims #1 (LIVRPS
   decay priority) and #2 (variant LOD for fidelity) lack dated
   entries; claim #3 has only Pass 6 secondary mention. Added "how to
   add a new entry" section codifying date / claim mapping / prior
   art / reproducibility requirements.

4. CLAUDE.md hard rules — surfaced the free-threading limitation as
   an explicit hard rule (was source-only in attention_log.py
   docstring). Bumped stale "94 existing Phase 1 tests" reference to
   the current 109 plain / 149 hython numbers.

No code changed. No tests added or removed. The four locked decisions
in the spec remain untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JosephOIbrahim pushed a commit that referenced this pull request May 5, 2026
Six updates that bring the README current with reality post the
documentarian-debt followup:

1. Status table — added free-threading guard row (commit 76da067).
   Module-level surgical addition, not a tagged version, but worth
   surfacing so the entry isn't only visible by reading source.

2. Locked decisions §3 (concurrency primitive) — appended GIL-required
   clause. The lock-free swap-and-drain correctness argument depends
   on the GIL; AttentionLog.__init__ raises under PEP 703. Forward
   work tracked via §9 Trigger 2.

3. Project structure tests/ block — added test_attention_log_gil_guard.py
   line. Math now reconciles to 109 plain Python (70 unit + 22 integration
   + 2 load + 3 twin + 10 twin-adversarial + 2 gil-guard).

4. Lineage paragraph — appended free-threading guard step + this
   documentarian followup (commit 6cb1fd1) so the timeline stays
   complete through the present.

5. System overview mermaid — annotated AttentionLog node with
   "GIL-required · PEP 703 sentinel" so the architecture diagram
   surfaces the runtime contract, not just the structural one.

6. NEW mermaid diagram — claim → evidence map next to the Novelty
   claims section. Visualizes the gap analysis from the patent-evidence
   README: claims #3 and #4 have direct evidence; claims #1 and #2
   have dated-entry gaps; implementation maturity evidence (v1.1.0,
   v1.2.0-rc1, free-threading guard) is shown as auxiliary not
   claim-substantiating. Same two-color palette as the other 8
   diagrams.

No code changed. Test suite untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants