Add multi-agent review harness (Opus 4.7, MoE roles, 5-loop)#2
Add multi-agent review harness (Opus 4.7, MoE roles, 5-loop)#2JosephOIbrahim wants to merge 5 commits into
Conversation
A long-running orchestrator that spawns role-scoped reviewers mirroring MONETA.md §6 plus an Adversarial-Reviewer (Commandment 7), governed by a bespoke constitution sourced verbatim from the locked artifacts. - docs/review-constitution.md: composition of MONETA.md §6/§7/§9, ARCHITECTURE.md §15.2/§15.6, CLAUDE.md hard rules, agent-commandments.md, and substrate-conventions.md. No new doctrine — every clause cites a locked artifact. Hashed and stamped into every finding; mid-run drift aborts. - scripts/review_roles.py: 7 role-reviewers + Adversarial; 5 themed loops (spec-conformance → concurrency-atomicity → correctness-edges → performance-envelope → simplicity-doc-tests); touchpoint glob matching for role isolation. - scripts/review_synthesizer.py: schema validation, role-isolation filter, fuzzy dedupe, severity+§9-aware ranking, cross-loop closure ledger, per-loop markdown rendering. - scripts/review_harness.py: parallel fan-out per loop, JSONL checkpointing, --resume, --dry-run, --max-loops, prompt caching on the constitution + repo snapshot, three-retry cap per Commandment 3. - pyproject.toml: [review] optional extra (anthropic>=0.40); not pulled into Phase 1/3 runtime. - .gitignore: per-round files and findings.jsonl ignored; synthesis.md re-included. - CLAUDE.md: one-line pointer in the build-and-test section. Verification: ruff clean, 94 existing tests pass, dry-run loop 1 assembles 8 prompts (constitution+role ~25KB system, snapshot ~521KB user; constitution hash 3b56e3ffae083c9b), JSONL round-trip works, schema validator rejects malformed findings, snapshot excludes harness/constitution to prevent feedback loops. The live 5-loop run is intentionally not executed in this commit; user runs it separately after reviewing the harness, the constitution, and the smoke-test output. https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
Adds an alternate model-call backend that shells out to `claude -p`,
making the harness runnable in environments that have an authenticated
Claude Code CLI but no ANTHROPIC_API_KEY. The SDK backend is unchanged
and still preferred when the env var is set.
- New --backend {auto,sdk,cli} flag; auto picks sdk if ANTHROPIC_API_KEY
is set, else cli.
- _call_via_cli pipes the user prompt over stdin (snapshot ~520KB
exceeds argv limits), passes the system prompt via --system-prompt,
and parses claude's JSON output.
- Retry budget (3 attempts, exponential backoff) is shared across both
backends and continues to honor Commandment 3.
- Smoke-tested end to end: SMOKE-OK round-trip via _call_via_cli.
- .gitignore: docs/review/run.log added (ephemeral tee target).
https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
Final cross-loop rollup from scripts/review_harness.py against this branch. 5 themed loops, 8 reviewers each (7 role + adversarial), constitution hash 3b56e3ffae083c9b. Stats: - 494 findings retained (all loops); 466 open after closures - 17 Critical, 100 High, 8 §9 escalation candidates - Per-loop kept: 67 / 91 / 128 / 91 / 117 - Per-loop invalid: 0 / 0 / 0 / 35 / 0 (loop 4 schema drift) - Per-loop role-isolation drops: 0 / 0 / 0 / 2 / 25 - 0 duplicates across all loops §9 escalation candidates surfaced (Architect-routed): - four-op signature drift vs ARCHITECTURE.md §2 (module-level vs bound method) - protected-floor consolidation contradiction (MONETA.md §2.5 vs ARCHITECTURE.md §6 selection criteria) - vector-authoritative inversion across hydrate path (§7 keystone) - v1.1.0 handle exclusivity model absent from ARCHITECTURE.md - multi-handle protected-quota aggregation bypasses §10 hard cap - quota_override field unbounded vs §10 hard cap Top Critical implementation findings: - WAL truncation race (DurabilityManager.snapshot_ecs) - Attention log drain stale-buffer race + module docstring overpromise - UsdTarget.flush() partial-failure breaks sequential-write atomicity - Stage-partial-batch atomicity break in ConsolidationRunner.run_pass - signal_attention bypasses protected_floor lower bound - Protected-quota count-then-add TOCTOU - docs/api.md still documents pre-v1.1.0 module singleton API Cross-loop memory worked: adversarial reviewer correctly refuted §9 framing of attention-log race in loop 2 (real bug, but ARCHITECTURE.md §5.1 'eventually consistent' permits the loss window — fix is impl correction + docstring softening, not a Round 4 trigger). The constitution and per-round artifacts are gitignored; docs/review/synthesis.md is the durable record. https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
📝 WalkthroughWalkthroughAdds a multi-agent review harness (scripts + docs + CI ignores + optional dependency) and a review synthesizer/roles system; plus multiple runtime correctness changes in moneta (deposit concurrency lock, durability snapshot atomicity, per-batch consolidation state, usd_target partial-failure reporting, ECS clamping, attention_log doc), with corresponding tests and documentation updates. ChangesMulti-agent review harness (docs + CLI + synthesis + config)
sequenceDiagram
participant CLI as review_harness CLI
participant Snapshot as RepoSnapshot
participant Const as Constitution (docs)
participant Model as Model Backend\n(Anthropic SDK / claude CLI)
participant Synth as Synthesizer
participant FS as Filesystem (`docs/review/*`)
CLI->>Snapshot: load_repo_snapshot()
CLI->>Const: read & hash constitution
CLI->>Model: assemble_reviewer_prompt() -> call_reviewer()
Model-->>CLI: response (fenced JSON)
CLI->>Synth: extract_findings_array() + stamp_constitution_hash()
CLI->>Model: call adversarial reviewer (uses role findings)
CLI->>Synth: synthesize_loop(raw_findings)
Synth-->>FS: write round markdown + findings.jsonl
CLI->>FS: save_checkpoint(), final synthesis.md (on completion)
Core system correctness changes, errors, and tests (moneta runtime & tests)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
scripts/review_harness.py (1)
340-346: 💤 Low valueConsider adding
check=Falseexplicitly for clarity.The
subprocess.runcall relies on manualreturncodechecking (line 347). While this works, explicitcheck=Falsedocuments the intent that you're handling errors manually.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/review_harness.py` around lines 340 - 346, Add an explicit check=False to the subprocess.run invocation so it is clear errors are handled manually; update the subprocess.run call that assigns to completed (which passes cmd and input=bundle.user_prompt) to include check=False to document intent while preserving existing manual returncode checking and timeout behavior.scripts/review_roles.py (1)
187-187: 💤 Low valueOptional: Use unpacking syntax per Ruff RUF005.
Ruff suggests
(*ROLE_REVIEWERS, ADVERSARIAL)over tuple concatenation. This is purely stylistic — both are correct.Proposed fix
-ALL_ROLES: tuple[Role, ...] = ROLE_REVIEWERS + (ADVERSARIAL,) +ALL_ROLES: tuple[Role, ...] = (*ROLE_REVIEWERS, ADVERSARIAL)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/review_roles.py` at line 187, Replace the tuple concatenation used to build ALL_ROLES with unpacking syntax to satisfy Ruff RUF005: change the expression that defines ALL_ROLES (currently using ROLE_REVIEWERS + (ADVERSARIAL,)) to use (*ROLE_REVIEWERS, ADVERSARIAL) so ALL_ROLES remains a tuple of Role while following the recommended style; ensure the name ALL_ROLES and the referenced symbols ROLE_REVIEWERS and ADVERSARIAL are used exactly as in the current definition.docs/review-constitution.md (1)
125-137: 💤 Low valueAdd language specifier to fenced code block.
The fenced code block at line 125 lacks a language specifier, triggering markdownlint MD040. Since this is verbatim text from
CLAUDE.md, use```textto silence the lint warning while preserving intent.Proposed fix
-``` +```text - Do not re-open Round 2 or Round 3 decisions. Escalate per MONETA.md §9.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/review-constitution.md` around lines 125 - 137, The fenced code block that begins with the line "- Do not re-open Round 2 or Round 3 decisions. Escalate per MONETA.md §9." is missing a language specifier and triggers markdownlint MD040; update the opening fence so it specifies the text language (i.e., change the opening triple-backtick to include "text") for the code block containing the list so the linter is satisfied while preserving the verbatim CLAUDE.md content.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@docs/review-constitution.md`:
- Around line 125-137: The fenced code block that begins with the line "- Do not
re-open Round 2 or Round 3 decisions. Escalate per MONETA.md §9." is missing a
language specifier and triggers markdownlint MD040; update the opening fence so
it specifies the text language (i.e., change the opening triple-backtick to
include "text") for the code block containing the list so the linter is
satisfied while preserving the verbatim CLAUDE.md content.
In `@scripts/review_harness.py`:
- Around line 340-346: Add an explicit check=False to the subprocess.run
invocation so it is clear errors are handled manually; update the subprocess.run
call that assigns to completed (which passes cmd and input=bundle.user_prompt)
to include check=False to document intent while preserving existing manual
returncode checking and timeout behavior.
In `@scripts/review_roles.py`:
- Line 187: Replace the tuple concatenation used to build ALL_ROLES with
unpacking syntax to satisfy Ruff RUF005: change the expression that defines
ALL_ROLES (currently using ROLE_REVIEWERS + (ADVERSARIAL,)) to use
(*ROLE_REVIEWERS, ADVERSARIAL) so ALL_ROLES remains a tuple of Role while
following the recommended style; ensure the name ALL_ROLES and the referenced
symbols ROLE_REVIEWERS and ADVERSARIAL are used exactly as in the current
definition.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dd23c72b-506d-4c57-9f63-3ae86b9ce6b8
📒 Files selected for processing (8)
.gitignoreCLAUDE.mddocs/review-constitution.mddocs/review/synthesis.mdpyproject.tomlscripts/review_harness.pyscripts/review_roles.pyscripts/review_synthesizer.py
Implementation pass against the v1.0.0 multi-agent review synthesis committed in 6350c2c. Fixes the Critical impl bugs that did NOT need §9 escalation; routes the 6 spec-level surprises to docs/round4-brief.md for the Architect. Implementation fixes: - ecs.py apply_attention: clamp utility lower bound against protected_floor so negative-weight signals cannot defeat §10 protection (substrate-L3-signal-attention-bypasses-protected-floor). - api.py deposit: per-handle threading.Lock around count_protected + ecs.add so concurrent protected deposits at quota-1 cannot both succeed (substrate-L2-protected-quota-count-then-add-toctou). - durability.py snapshot_ecs: hold _lock across timestamp capture, JSON dump, and WAL unlink — closes the race window where wal_appends between `now` capture and unlink were silently truncated (persistence-L1-wal-truncation-race). - usd_target.py flush: new FlushPartialFailureError carrying saved/failed/pending layer bookkeeping so SequentialWriter can refuse to advance the vector index when only some layers Saved (usd-L2-flush-partial-failure-breaks-sequential-write-atomicity). - consolidation.py run_pass: hoist STAGED_FOR_SYNC → CONSOLIDATED transition INSIDE the batch loop so a later-batch failure does not pin earlier successfully-committed batches in STAGED_FOR_SYNC forever (consolidation-L2-stage-partial-batch-atomicity-break). - attention_log.py module docstring: soften "Entries cannot be lost" to match ARCHITECTURE.md §5.1 "lock-free, eventually consistent, simplest failure mode" — adversarial-L2 correctly classified the drain race as an impl bug + docstring fix, NOT §9. - docs/api.md: full rewrite to v1.1.0+ handle pattern (collapses 5 separate findings into one pass; replaces moneta.init() / module- level free functions with `with Moneta(MonetaConfig.ephemeral())`). New tests (21): - tests/unit/test_decay.py: AST scan asserting exactly three decay evaluation points (api.py, attention_log.py, consolidation.py) per ARCHITECTURE.md §16; adding a fourth site is §9 Trigger 2. - tests/unit/test_vector_index.py: new file, 16 tests covering §7.1 dim-homogeneity hard-raise, §5.1 silent-no-op for missing entities, query zero-norm guards, snapshot/restore round-trip. - tests/unit/test_ecs.py: floor-clamp regression for apply_attention. - tests/unit/test_api.py: 20-thread concurrent stress for the deposit lock, asserts exactly one of N succeeds at quota-1. - tests/integration/test_durability_roundtrip.py: appender threads hammering wal_append while main thread runs many snapshot_ecs calls; hydrate succeeds, no IndexError, no silent corruption. - tests/integration/test_sequential_writer_ordering.py: per-batch ECS state preservation under mid-batch commit_staging failure. New docs: - docs/round4-brief.md: the 6 genuine §9 Trigger 2 candidates the review surfaced (four-op signature drift, protected-floor consolidation contradiction, vector-authoritative inversion across hydrate, handle exclusivity model absent from ARCHITECTURE.md, quota_override unbounded vs §10 cap, multi-handle aggregation). Architect-routed; no implementation proceeds on affected subsystems until the round closes. Verification: 117 passed, 4 pxr-skipped (was 94 baseline — 23 new tests). All prior tests still green. Ruff clean on modified files. https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/moneta/ecs.py (1)
191-217:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocstring formula is stale after the protected-floor clamp.
The method docstring at line 193 still advertises
Utility = min(1.0, Utility + summed_weight), but the new code also clamps toprotected_floor(lines 214-216). With negativesummed_weightthe floor is now load-bearing, and the docstring contract should reflect it so callers reading the spec line don't misjudge the lower bound.📝 Proposed docstring fix
`agg` maps entity_id to `(summed_weight, signal_count)`, produced by `attention_log.aggregate`. Per ARCHITECTURE.md §5: - Utility = min(1.0, Utility + summed_weight) + Utility = max(protected_floor, + min(1.0, Utility + summed_weight)) AttendedCount = AttendedCount + signal_count LastEvaluated = nowAs per coding guidelines: "Do not let documentation lag implementation by more than one PR. Documentarian maintains inline docstrings…".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/moneta/ecs.py` around lines 191 - 217, The docstring for the method that reduces attention_log.aggregate still states "Utility = min(1.0, Utility + summed_weight)" but the implementation also clamps the result to self._protected_floor (via self._protected_floor[row] and new_u < floor logic); update the docstring to reflect the full contract—Utility = max(protected_floor, min(1.0, Utility + summed_weight)) (and note that AttendedCount adds signal_count and missing entities are skipped) so readers see the protected-floor lower bound; reference attention_log.aggregate, summed_weight/count from agg, and the fields self._utility and self._protected_floor in the description.
🧹 Nitpick comments (1)
tests/unit/test_decay.py (1)
188-209: ⚡ Quick winMake the §16 gate recursive —
glob("*.py")skips subpackages.The docstring claims it walks "every module under
src/moneta/", butmoneta_root.glob("*.py")is non-recursive and only enumerates top-level files. The package is flat today, but the moment a future PR introduces a subpackage that callsdecay_all/decay_value(the exact §9 Trigger 2 scenario this test is meant to catch), the gate will pass silently — the conformance assertion would then be a false positive.Switching to
rglob(with the same name-skip set) keeps the gate honest as the package grows, at zero readability cost.♻️ Proposed fix
moneta_root = Path(moneta.__file__).parent sites: list[tuple[str, int, str]] = [] - for py in sorted(moneta_root.glob("*.py")): + for py in sorted(moneta_root.rglob("*.py")): if py.name in {"ecs.py", "decay.py", "__init__.py"}: continueNote: if subpackages later host decay implementation files of their own, the
name in {...}skip-list will need to grow accordingly, or you can switch to a path-suffix check.As per coding guidelines: "ARCHITECTURE.md is the locked spec... If the blueprint and spec disagree on a clause, escalate per MONETA.md §9." A silent false-pass here would let a §9 Trigger 2 ship.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_decay.py` around lines 188 - 209, The test scanning code uses moneta_root.glob("*.py") which is non-recursive; change that call to moneta_root.rglob("*.py") in the module-scanning block (the code that collects sites using the moneta_root variable and appends to sites) so it visits subpackages as well, keeping the existing skip-check for py.name in {"ecs.py","decay.py","__init__.py"} unchanged; also leave a comment or adjust the skip strategy (e.g., path-suffix check) if you expect decay implementation files to appear in subpackages later so the exclusion logic remains correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/moneta/usd_target.py`:
- Around line 422-467: The flush logic in UsdTarget currently catches
BaseException and ignores boolean return values from Sdf.Layer.Save(), so
failures that return False are treated as successes; change the loop over
self._layers and the final self._root_layer.Save() handling to (1) call Save()
and if it returns False immediately raise FlushPartialFailureError with the same
saved_layers/failed_layer/pending_layers semantics (use failed_layer=name for
layer loop and "<root>" for root), and (2) narrow exception handlers from except
BaseException to except Exception to still convert raised exceptions (e.g.
pxr.Tf.ErrorException) into FlushPartialFailureError while avoiding masking
SystemExit/KeyboardInterrupt; reference self._layers, layer.Save(),
self._root_layer.Save(), and FlushPartialFailureError when applying the fixes.
In `@tests/integration/test_durability_roundtrip.py`:
- Around line 230-251: The test currently only checks that appends completed and
hydrate succeeds, but doesn't verify each wal_append is either in a snapshot or
replayed from the post-snapshot WAL; modify the test around the wal_append loop
and re-hydration to record each append's (eid, ts) (use the existing written
list or extend run_sleep_pass to return timestamps), capture the timestamp of
the last snapshot call (snapshot_ecs) before closing, then after reopening
Moneta (m2) assert for every recorded (eid, ts) that either ts <=
snapshot_created_at (thus allowed to be in snapshot) or the entry appears in
m2.attention/was replayed (i.e., m2.attention contains that eid+payload),
ensuring each wal_append is accounted for rather than only checking hydrate
succeeded or seeded entity exists.
---
Outside diff comments:
In `@src/moneta/ecs.py`:
- Around line 191-217: The docstring for the method that reduces
attention_log.aggregate still states "Utility = min(1.0, Utility +
summed_weight)" but the implementation also clamps the result to
self._protected_floor (via self._protected_floor[row] and new_u < floor logic);
update the docstring to reflect the full contract—Utility = max(protected_floor,
min(1.0, Utility + summed_weight)) (and note that AttendedCount adds
signal_count and missing entities are skipped) so readers see the
protected-floor lower bound; reference attention_log.aggregate,
summed_weight/count from agg, and the fields self._utility and
self._protected_floor in the description.
---
Nitpick comments:
In `@tests/unit/test_decay.py`:
- Around line 188-209: The test scanning code uses moneta_root.glob("*.py")
which is non-recursive; change that call to moneta_root.rglob("*.py") in the
module-scanning block (the code that collects sites using the moneta_root
variable and appends to sites) so it visits subpackages as well, keeping the
existing skip-check for py.name in {"ecs.py","decay.py","__init__.py"}
unchanged; also leave a comment or adjust the skip strategy (e.g., path-suffix
check) if you expect decay implementation files to appear in subpackages later
so the exclusion logic remains correct.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bc4a2bf4-cb9a-445b-ac23-7ff8542c660b
📒 Files selected for processing (14)
docs/api.mddocs/round4-brief.mdsrc/moneta/api.pysrc/moneta/attention_log.pysrc/moneta/consolidation.pysrc/moneta/durability.pysrc/moneta/ecs.pysrc/moneta/usd_target.pytests/integration/test_durability_roundtrip.pytests/integration/test_sequential_writer_ordering.pytests/unit/test_api.pytests/unit/test_decay.pytests/unit/test_ecs.pytests/unit/test_vector_index.py
✅ Files skipped from review due to trivial changes (2)
- tests/unit/test_ecs.py
- docs/round4-brief.md
| if self._in_memory: | ||
| return | ||
|
|
||
| saved: List[str] = [] | ||
| # Stable iteration: capture an explicit list so saved_layers and | ||
| # pending_layers refer to the same ordering on retry diagnostics. | ||
| layer_items: List[Tuple[str, Sdf.Layer]] = list(self._layers.items()) | ||
| for idx, (name, layer) in enumerate(layer_items): | ||
| try: | ||
| layer.Save() | ||
| except BaseException as exc: # USD raises diverse exception types | ||
| pending = [n for n, _ in layer_items[idx + 1 :]] | ||
| _logger.error( | ||
| "UsdTarget.flush partial failure on layer=%s " | ||
| "saved=%d pending=%d cause=%s", | ||
| name, | ||
| len(saved), | ||
| len(pending), | ||
| exc, | ||
| ) | ||
| raise FlushPartialFailureError( | ||
| saved_layers=saved, | ||
| failed_layer=name, | ||
| pending_layers=pending, | ||
| cause=exc, | ||
| ) from exc | ||
| saved.append(name) | ||
|
|
||
| # Root layer last: if it raises, every sublayer is durable but the | ||
| # composition handle is not — still a partial-failure event from | ||
| # §7's perspective. | ||
| try: | ||
| self._root_layer.Save() | ||
| except BaseException as exc: | ||
| _logger.error( | ||
| "UsdTarget.flush partial failure on root layer " | ||
| "saved=%d cause=%s", | ||
| len(saved), | ||
| exc, | ||
| ) | ||
| raise FlushPartialFailureError( | ||
| saved_layers=saved, | ||
| failed_layer="<root>", | ||
| pending_layers=[], | ||
| cause=exc, | ||
| ) from exc |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What exception types does pxr Sdf Layer Save raise OpenUSD 0.25.5
💡 Result:
In OpenUSD 0.25.5, SdfLayer::Save does not document any specific “exception types” that it throws; instead, the API is defined to return a boolean success/failure value (Save returns true on success, false if an error occurred) [1]. What you’ll typically see in practice is that OpenUSD may emit Tf.ErrorException instances in error scenarios (for example, when file-format parsing/reading fails in related layer/file-format operations), but that is not stated as the guaranteed exception contract for SdfLayer::Save itself [2]. Therefore, from the authoritative API surface available in the OpenUSD documentation for this release line: there are no enumerated exception types for SdfLayer::Save to report beyond the documented boolean return behavior [1].
Citations:
- 1: https://openusd.org/dev/api/class_sdf_layer.html
- 2:
Importfails when importingusdaintousdc(andusdcintousda) PixarAnimationStudios/OpenUSD#3660
🌐 Web query:
pxr.Sdf.Layer.Save() Python API exception behavior OpenUSD
💡 Result:
The pxr.Sdf.Layer.Save method in the OpenUSD Python API returns a boolean: True if the save was successful, False if an error occurred [1][2]. It does not throw exceptions for save failures; instead, errors are reported via the Tf error system (pxr.Tf errors/warnings) [3]. In Python bindings, TfErrors encountered during the call are automatically converted to Python exceptions like pxr.Tf.ErrorException, but only for unhandled errors—standard save failures return False without exception [3][4]. Examples of failure cases returning False (no exception): - Anonymous layers cannot be saved [1]. - Layer has no remembered file identifier [2]. - Layer not dirty and file exists (unless force=True) [2]. - Specific bugs like identifier change on usdc layers (returns False) [5]. However, fatal errors in underlying file formats (e.g., invalid file parsing during related ops like Import) do throw pxr.Tf.ErrorException [4]. To handle: Check the return value and optionally query Tf diagnostics: from pxr import Sdf, Tf layer = Sdf.Layer.CreateNew('test.usda') success = layer.Save() if not success: print("Save failed - check Tf errors") for diag in Tf.Diagnostics(): print(diag) No exceptions for standard save errors; use return code [1][2].
Citations:
- 1: https://openusd.org/dev/api/class_sdf_layer.html
- 2: https://github.com/parallax/ar-export/blob/master/libs/USDPython/USD/pxr/Sdf/__DOC.py
- 3: https://www.sidefx.com/docs/hdk20.5/py_error_8h_source.html
- 4:
Importfails when importingusdaintousdc(andusdcintousda) PixarAnimationStudios/OpenUSD#3660 - 5: Layer Cannot Be Saved When Identifier Changes for Crate Files PixarAnimationStudios/OpenUSD#3975
Narrow except BaseException to except Exception and check Save() return values instead of relying solely on exception handling.
Verification of OpenUSD 0.25.5 behavior confirms that pxr.Sdf.Layer.Save() primarily returns a boolean (True for success, False for standard failures like anonymous layers or missing file identifiers) rather than raising exceptions. Only fatal errors in underlying file formats trigger pxr.Tf.ErrorException.
The current code at lines 432 and 455 catches BaseException but does not check the return value, which means standard save failures (returning False) are silently ignored and never raise FlushPartialFailureError. This is a safety issue: the flush loop will conclude successfully even if individual layers failed to save.
To fix:
- Check the boolean return value:
if not layer.Save(): raise FlushPartialFailureError(...) - Narrow the exception handler to
except Exceptionto avoid trappingKeyboardInterruptandSystemExit
The narrowing from BaseException to Exception remains valid and necessary to prevent masking interpreter control signals.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/moneta/usd_target.py` around lines 422 - 467, The flush logic in
UsdTarget currently catches BaseException and ignores boolean return values from
Sdf.Layer.Save(), so failures that return False are treated as successes; change
the loop over self._layers and the final self._root_layer.Save() handling to (1)
call Save() and if it returns False immediately raise FlushPartialFailureError
with the same saved_layers/failed_layer/pending_layers semantics (use
failed_layer=name for layer loop and "<root>" for root), and (2) narrow
exception handlers from except BaseException to except Exception to still
convert raised exceptions (e.g. pxr.Tf.ErrorException) into
FlushPartialFailureError while avoiding masking SystemExit/KeyboardInterrupt;
reference self._layers, layer.Save(), self._root_layer.Save(), and
FlushPartialFailureError when applying the fixes.
|
|
||
| # Read back: every wal_append must be either in the post-snapshot | ||
| # WAL OR no-longer-in-WAL because its timestamp predates the | ||
| # final snapshot's snapshot_created_at (and thus is captured in | ||
| # the snapshot's rows or correctly truncated as already-applied). | ||
| # The bug we're guarding against is: an entry written between | ||
| # `now` capture and unlink on an EARLIER snapshot, with the | ||
| # entry's timestamp > that snapshot's `now`, then unlinked. | ||
| # With the fix, that race window is closed: the only entries | ||
| # that get unlinked are those written under the same lock as | ||
| # the timestamp capture, which means their timestamps are | ||
| # strictly < `now`. | ||
| # | ||
| # Operationally: re-hydrate and verify the resulting state is | ||
| # consistent — no IndexError, no silent corruption. | ||
| assert len(written) == n_appenders * appends_per_thread | ||
|
|
||
| with Moneta(cfg) as m2: | ||
| # Hydrate must succeed without error, ECS preserved. | ||
| assert m2.ecs.n == 1 | ||
| mem = m2.ecs.get_memory(eid) | ||
| assert mem is not None |
There was a problem hiding this comment.
Test assertions are weaker than the docstring claim.
The docstring promises that every wal_append survives — either captured in snapshot rows or preserved in the post-snapshot WAL. But the actual assertions only check (a) that all appends completed without raising, (b) that hydrate succeeds, and (c) that the seeded entity is present. The seeded entity is captured by run_sleep_pass's snapshot, not by WAL replay, so this test would still pass even if every WAL append were silently truncated.
To actually exercise the truncation race, the test should record the per-thread timestamps, then on re-hydrate verify that for each (eid, ts) the entry was either replayed (via the attention log post-hydrate) or accounted for in a prior snapshot. A simple proxy: append an entry whose timestamp is strictly after the last snapshot_ecs call before close, and assert it shows up on m2.attention after hydrate.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration/test_durability_roundtrip.py` around lines 230 - 251, The
test currently only checks that appends completed and hydrate succeeds, but
doesn't verify each wal_append is either in a snapshot or replayed from the
post-snapshot WAL; modify the test around the wal_append loop and re-hydration
to record each append's (eid, ts) (use the existing written list or extend
run_sleep_pass to return timestamps), capture the timestamp of the last snapshot
call (snapshot_ecs) before closing, then after reopening Moneta (m2) assert for
every recorded (eid, ts) that either ts <= snapshot_created_at (thus allowed to
be in snapshot) or the entry appears in m2.attention/was replayed (i.e.,
m2.attention contains that eid+payload), ensuring each wal_append is accounted
for rather than only checking hydrate succeeded or seeded entity exists.
Round 4 closes the six genuine §9 Trigger 2 (spec-level surprise)
candidates the multi-agent review surfaced. Closure document:
docs/rounds/round-4.md. Brief: docs/round4-brief.md (now superseded).
This round did NOT go to Gemini Deep Think — none of the items required
external scoping. Each had a clear locally-correct ruling derivable
from existing artifacts plus the ratified implementation history
(v1.1.0 surgery, Phase 3 narrow-lock work, Pass 6 closure). Round 4 is
post-implementation reconciliation between specs that locked at
different times.
Six rulings:
1. Four-op API surface is methods on Moneta(config) handle. Type
signatures in MONETA.md §2.1 unchanged; dispatch (method vs free
function) is implementation. The fifth-op rule applies to public
methods of Moneta. Spec amended at MONETA.md §2.1, ARCHITECTURE.md
§2 + §16. Code already conformant from v1.1.0 surgery.
2. Protected memories are PINNED, not consolidated. The §6 staging
gate (utility < 0.3) is unreachable for protected_floor ≥ 0.3 by
construction; this is the deliberate spec choice, not a bug.
cortex_protected.usda is reserved for the explicit Phase 3 unpin
tool. Spec amended at MONETA.md §2.6, ARCHITECTURE.md §6.
3. Vector index is RUNTIME-authoritative; ECS snapshot is
ACROSS-RESTART authoritative. The §7 atomicity guarantee is a
within-session property; hydrate's ECS-first ordering is a Phase 1
in-memory shadow consequence. Phase 2 LanceDB persistence is the
path to making vector authoritative across restart, if needed.
Spec amended at ARCHITECTURE.md §7.2 (new sub-clause).
4. Handle exclusivity is per-process via _ACTIVE_URIS. New
ARCHITECTURE.md §17 with five sub-clauses: registry/lifecycle,
GIL-atomicity TOCTOU model, fork() behavior (cross-process is
bridge layer's concern via flock), PEP 703 free-threading
(migration is §9 Trigger 2), SIGTERM cleanup ordering. MONETA.md
§2.11 cross-references §17.
5. quota_override BOUNDED at 1 ≤ q ≤ 1000. Default 100 unchanged.
New MAX_QUOTA_OVERRIDE constant in src/moneta/api.py;
MonetaConfig.__post_init__ raises ValueError on out-of-range
construction with a message that cites "Round 4 Ruling 5". Spec
amended at MONETA.md §2.10, ARCHITECTURE.md §10.
6. _ACTIVE_URIS scope is "per substrate handle" — disambiguates
§10's "per agent." Multi-handle aggregation is by design (an
agent process holding N handles on N storage URIs gets N
independent quotas). Spec amended at MONETA.md §2.10,
ARCHITECTURE.md §10.
Code change (Ruling 5 only):
- src/moneta/api.py: MAX_QUOTA_OVERRIDE = 1000; MonetaConfig
__post_init__ validation; new module-level constant.
- tests/unit/test_api.py: 8 new tests covering floor/ceiling
acceptance, zero/negative/above-ceiling/unbounded rejection, and
the error-message provenance ("Round 4 Ruling 5").
Constitutional updates:
- docs/review-constitution.md: round-closure binding now Rounds 1–4
(was 1–3); §3 hard-rule line updated; §12 anti-pattern updated.
This means the harness's constitution hash changes and any future
review run will treat Round 4 as closed.
- CLAUDE.md: source-of-truth list now references round-{1..4};
hard-rules line updated; locked-decisions list updated.
- docs/round4-brief.md: top header marks SUPERSEDED, points readers
at docs/rounds/round-4.md for the rulings themselves.
Verification: 125 passed, 4 pxr-skipped (was 117 pre-Round-4 — 8 new
tests for the quota_override bound). Smoke check confirms the bound
fires with the expected message and the four-op handle pattern still
works end-to-end.
https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
Four-item state-of-union followup: 1. docs/api.md — full rewrite for the v1.1.0+ handle API. The previous text described the deleted singleton API (moneta.init(), module-level moneta.deposit(...), MonetaNotInitializedError, PROTECTED_QUOTA=100 module constant). Replaced with: handle construction via Moneta(MonetaConfig), context-manager pattern, per-handle quota_override, multi-instance section, current error classes (MonetaResourceLockedError, ProtectedQuotaExceededError), runtime requirements section. 2. README.md test counts — bumped 107 → 109 (plain Python) and 147 → 149 (hython total). Drift since the free-threading guard tests landed in 76da067. Historical surgery-table number (94 → 107 delta column) left intact. 3. docs/patent-evidence/README.md — split into "claim-substantiating evidence" (Pass 5 + Pass 6, mapping to claims #4 and #3) and "implementation maturity evidence" (v1.1.0 TOCTOU 480-attempt, v1.2.0-rc1 codeless schema acceptance gate, free-threading guard). Added explicit claims-evidence gap analysis: claims #1 (LIVRPS decay priority) and #2 (variant LOD for fidelity) lack dated entries; claim #3 has only Pass 6 secondary mention. Added "how to add a new entry" section codifying date / claim mapping / prior art / reproducibility requirements. 4. CLAUDE.md hard rules — surfaced the free-threading limitation as an explicit hard rule (was source-only in attention_log.py docstring). Bumped stale "94 existing Phase 1 tests" reference to the current 109 plain / 149 hython numbers. No code changed. No tests added or removed. The four locked decisions in the spec remain untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six updates that bring the README current with reality post the documentarian-debt followup: 1. Status table — added free-threading guard row (commit 76da067). Module-level surgical addition, not a tagged version, but worth surfacing so the entry isn't only visible by reading source. 2. Locked decisions §3 (concurrency primitive) — appended GIL-required clause. The lock-free swap-and-drain correctness argument depends on the GIL; AttentionLog.__init__ raises under PEP 703. Forward work tracked via §9 Trigger 2. 3. Project structure tests/ block — added test_attention_log_gil_guard.py line. Math now reconciles to 109 plain Python (70 unit + 22 integration + 2 load + 3 twin + 10 twin-adversarial + 2 gil-guard). 4. Lineage paragraph — appended free-threading guard step + this documentarian followup (commit 6cb1fd1) so the timeline stays complete through the present. 5. System overview mermaid — annotated AttentionLog node with "GIL-required · PEP 703 sentinel" so the architecture diagram surfaces the runtime contract, not just the structural one. 6. NEW mermaid diagram — claim → evidence map next to the Novelty claims section. Visualizes the gap analysis from the patent-evidence README: claims #3 and #4 have direct evidence; claims #1 and #2 have dated-entry gaps; implementation maturity evidence (v1.1.0, v1.2.0-rc1, free-threading guard) is shown as auxiliary not claim-substantiating. Same two-color palette as the other 8 diagrams. No code changed. Test suite untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A long-running orchestrator that spawns role-scoped reviewers mirroring
MONETA.md §6 plus an Adversarial-Reviewer (Commandment 7), governed by a
bespoke constitution sourced verbatim from the locked artifacts.
ARCHITECTURE.md §15.2/§15.6, CLAUDE.md hard rules,
agent-commandments.md, and substrate-conventions.md. No new doctrine —
every clause cites a locked artifact. Hashed and stamped into every
finding; mid-run drift aborts.
(spec-conformance → concurrency-atomicity → correctness-edges →
performance-envelope → simplicity-doc-tests); touchpoint glob matching
for role isolation.
filter, fuzzy dedupe, severity+§9-aware ranking, cross-loop closure
ledger, per-loop markdown rendering.
checkpointing, --resume, --dry-run, --max-loops, prompt caching on
the constitution + repo snapshot, three-retry cap per Commandment 3.
into Phase 1/3 runtime.
synthesis.md re-included.
Verification: ruff clean, 94 existing tests pass, dry-run loop 1
assembles 8 prompts (constitution+role ~25KB system, snapshot ~521KB
user; constitution hash 3b56e3ffae083c9b), JSONL round-trip works,
schema validator rejects malformed findings, snapshot excludes
harness/constitution to prevent feedback loops.
The live 5-loop run is intentionally not executed in this commit; user
runs it separately after reviewing the harness, the constitution, and
the smoke-test output.
https://claude.ai/code/session_01Nf6Jd4uNEQHpqohumGwJzL
Summary by CodeRabbit