fix(worker): treat an already-processed block as a no-op, not a reorg#155
Conversation
|
Commit: e56f79c
|
Codecov Report✅ All modified and coverable lines are covered by tests.
... and 3 files with indirect coverage changes 🚀 New features to boost your workflow:
|
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8c542a6492
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The block watcher submits the chain tip on startup to catch up, then streams live `hashblock` notifications. When blocks are mined during that startup window, the tip-submit walks the chain forward past them and the buffered notifications for those same blocks drain afterwards, re-delivering ancestors of the current tip. `sync_to_block` misread each as an L1 reorg: it logged a phantom "reorg detected", rolled the in-memory tip back to the re-delivered ancestor, and re-committed that block's anchor. The re-commit is the damaging part. `get_anchor_state` reconstructs the `AsmState` with empty logs (only the `AnchorState` is persisted), so re-deriving from it drops the block's STF logs. For the Moho extension that silently rewrites the block's persisted MohoState — its export state and next predicate are folded from those logs — desyncing it from what the step proof attested and breaking the recursive-proof chain with `InvalidMohoChain` once a rolled-back block carries real logs (a bridge deposit/withdrawal). A genuine reorg always brings at least one new block on the new branch, i.e. a non-empty plan; an empty plan means the target was already processed. Return early in that case — no rollback, no re-commit, no reorg log — leaving the tip and every derived state where the forward pass left them.
The anchor state DB persists only the `AnchorState`; the STF logs live in the manifest store. `get_anchor_state`/`get_latest_asm_state` rebuilt the `AsmState` with empty logs, on the assumption that reads only ever need `AsmState::state()`. But `store_anchor_state` derives the MohoState and the export-entry index from `AsmState::logs()`, so feeding it a reloaded, logless state silently drops the block's export entries and predicate update — exactly how a re-committed anchor desynced its persisted MohoState from the proven one. Rejoin the logs from the block's manifest so a reconstructed `AsmState` matches what the STF produced, keeping every log-derived artifact correct even when it runs over reloaded state. Genesis has no manifest (it is seeded without running the STF), so its logs come back empty.
8c542a6 to
b4ed6d5
Compare
Validated end-to-end against strata-bridge#638This fix unblocks alpenlabs/strata-bridge#638 (the asm version bump), whose functional tests were failing on the phantom reorg. The CI progression on that PR pins it down:
The remaining failures at |
Description
The ASM worker misclassifies duplicate/lagging ZMQ
hashblocknotifications as L1 reorgs, and the resulting rollback corrupts derived state. This PR fixes it at two layers.The bug
The block watcher submits the chain tip once on startup to catch up, then streams live
hashblocknotifications (drive_asm_from_bitcoininbin/asm-runner/src/block_watcher.rs). When blocks are mined during that startup window, the one-shot tip-submit walks the chain forward past them, and the buffered ZMQ notifications for those same blocks drain afterwards — re-delivering blocks that are now ancestors of the in-memory tip.sync_to_blockread each as a reorg: it logged a phantomASM L1 reorg detected, rolled the in-memory tip back to the re-delivered ancestor, and re-committed that block's anchor.The re-commit is the damaging part.
get_anchor_statereconstructed theAsmStatewith empty logs (only theAnchorStateis persisted; logs live in the manifest).store_anchor_statederives the block'sMohoStateand export-entry index fromAsmState::logs(), so re-committing a logless state snapped the block's persistedMohoStateback to its parent's — dropping that block's export entries and any predicate update. The next block's step proof then read the corruptedfromstate and the recursive-proof chain failed withInvalidMohoChain. This surfaced in the strata-bridge functional tests (which mine bursts of blocks carrying real deposit/withdrawal logs at runner startup); the ASM suite never hit it because it mines empty blocks one at a time after the runner is ready.Fix 1 — worker: an already-processed block is a no-op, not a reorg
A genuine reorg always brings at least one new block on the new branch, i.e. a non-empty plan. An empty plan means the target already has a stored anchor and was processed before.
sync_to_blocknow returns early in that case — no rollback, no re-commit, no reorg log — leaving the tip and every derived state where the forward pass left them. bitcoind only announces blocks on the active chain, so a re-delivered already-stored block is never a real fork.Fix 2 — asm-runner: rejoin manifest logs when reconstructing
AsmStateget_anchor_state/get_latest_asm_statereturned empty logs on the assumption that reads only needAsmState::state(). But the log-derived artifacts (MohoState, the export-entry index) make that assumption false, so a reconstructed state was silently lossy — the root reason the re-commit corrupted anything. They now rejoin the block's logs from its manifest, so a reconstructedAsmStatematches what the STF produced andstore_anchor_stateis genuinely idempotent as documented. Genesis has no manifest (seeded without the STF), so its logs come back empty. (Fix 1 already removes the re-commit, so this is defense-in-depth + an honestget_anchor_statecontract.)Type of Change
Notes to Reviewers
The two
sync_resync_*tests encoded the old "roll the tip back on resync" behaviour, which was only ever exercised by these stale notifications — there is no driver that deliberately rewinds the worker. They are rewritten to assert the new no-op semantics (tip and durablelateststay on the real tip across a restart). All reorg tests (plan_reorg_uses_fork_point,sync_reorg_overwrites_leaves,reorg_orphans_leaves_present_but_unprovable) are unchanged and still pass — genuine reorgs go through the non-empty-plan path untouched.Fix 2 has no dedicated unit test: exercising
AsmWorkerContext::get_anchor_statemeans standing up a full context (bitcoin client, runtime handle, a validAnchorState), which the asm-runner tests deliberately avoid. It is covered end-to-end by the functional tests that read anchor state back (the restart test) and the Moho path (the bridge suite). Happy to add the scaffolding if reviewers prefer.The phantom reorg was also reproduced independently in the ASM functional harness (restarting the runner into a background mining burst produces
ASM L1 reorg detectedwithgetchaintipsshowing a single linear chain).Checklist