
feat: Deferred block proving #1725

Open
sergerad wants to merge 40 commits into next from sergerad-deferred-block-proving

Conversation

sergerad (Collaborator) commented Mar 2, 2026

Context

We are adding deferred (asynchronous) block proving for the node, as described in #1592. Currently, block proving happens synchronously during apply_block, which means block commitment is blocked until the proof is generated.

Blocks will now exhibit committed (not yet proven) and proven states. A committed block is already part of the canonical chain and fully usable. Clients that require proof-level finality can opt into it via the new finality parameter on SyncChainMmr.

Changes

  • Schema: Added proving_inputs BLOB and is_proven BOOLEAN columns to the block_headers table, with partial indexes for querying unproven (is_proven = 0) and proven (is_proven = 1) blocks.
  • Block proof file storage: Block proofs are stored as files via BlockStore (following the existing block file pattern) rather than as BLOBs in SQLite. The DB tracks only an is_proven flag.
  • DB queries: Added mark_block_proven, select_unproven_blocks (excludes genesis block 0), select_block_proving_inputs (returns deserialized BlockProofRequest), and select_latest_proven_block_num.
  • Decoupled proving from apply_block: The BlockProofRequest is now serialized and persisted alongside the block during apply_block.
  • Proof scheduler: Added a background task (proof_scheduler.rs) that drives deferred proving. It queries unproven blocks on startup (restart recovery), listens for new block commits via Notify, and proves blocks concurrently using FuturesOrdered for FIFO completion ordering. Proofs are saved to files, then the block is marked proven in the DB.
  • Finality on SyncChainMmr: Added a Finality enum (COMMITTED, PROVEN) to the protobuf and a finality field on SyncChainMmrRequest.
  • Refactored apply_block query: Introduced ApplyBlockData struct to replace the 7-parameter function signature.
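
For reference, the schema bullet above could correspond to a migration roughly like the following (SQLite syntax; the index names here are illustrative assumptions, not the PR's actual identifiers):

```sql
-- Sketch of the schema change described above; index names are assumed.
ALTER TABLE block_headers ADD COLUMN proving_inputs BLOB;
ALTER TABLE block_headers ADD COLUMN is_proven BOOLEAN NOT NULL DEFAULT 0;

-- Partial indexes so scans for unproven/proven blocks avoid a full table scan.
CREATE INDEX idx_block_headers_unproven
    ON block_headers (block_num) WHERE is_proven = 0;
CREATE INDEX idx_block_headers_proven
    ON block_headers (block_num) WHERE is_proven = 1;
```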

sergerad changed the title from "Sergerad deferred block proving" to "feat: Deferred block proving" on Mar 2, 2026
sergerad marked this pull request as ready for review March 2, 2026 23:42
sergerad requested review from Mirko-von-Leipzig, bobbinth and drahnr and removed the request for Mirko-von-Leipzig March 2, 2026 23:43
let block_nums: Vec<i64> =
SelectDsl::select(schema::block_headers::table, schema::block_headers::block_num)
.filter(schema::block_headers::block_proof.is_null())
.filter(schema::block_headers::block_num.gt(0i64))
Contributor:

Q: Are there proven blocks at genesis?

Collaborator (Author):

Genesis block is never proven.

Collaborator:

I had to think about this, but I think it follows from the fact that the genesis block is probably invalid: it has no transactions; we just magically make state appear in it.

// ================================================================================================

/// Overall timeout for proving a single block.
const BLOCK_PROVE_TIMEOUT: Duration = Duration::from_mins(2);
Contributor:

We should probably make this a bit more relaxed, particularly for CI runs 🫠

Collaborator (Author):

Bumped to 4 minutes. Leaving this comment here for others to weigh in.

continue;
},
};

Contributor:

I think this is racy and we need atomicity in general. The current way of calling works. I'd appreciate a comment on why this works (we have the outer Mutex<()> to protect both)

Collaborator (Author):

I have updated the logic to construct the notify before the db query. See the comment above that line. LMK if my understanding is incorrect.

};
futures.push_back(fut);
}
futures
Contributor:

FuturesOrdered doesn't have a concurrent execution limit; it only guarantees the results are in the input order (FIFO). So https://docs.rs/futures/latest/futures/stream/trait.StreamExt.html#method.buffered might be a strictly better choice, to limit the optimistic proving.

Collaborator (Author):

I went with adding a batch limit for the db query instead.

@drahnr (Contributor) left a comment:

Generally like the approach/LGTM, a few edges to be smoothed

sergerad requested a review from drahnr March 3, 2026 21:07
sergerad requested a review from bobbinth March 3, 2026 22:32
- Introduce `SyncChainMmr` RPC endpoint to sync chain MMR deltas within specified block ranges ([#1591](https://github.com/0xMiden/node/issues/1591)).
- Fixed `TransactionHeader` serialization for row insertion on database & fixed transaction cursor on retrievals ([#1701](https://github.com/0xMiden/node/issues/1701)).
- Added KMS signing support in validator ([#1677](https://github.com/0xMiden/node/pull/1677)).
- Restructured block proving to be asynchronous and added finality field for `SyncChainMmr` requests ([#1725](https://github.com/0xMiden/miden-node/pull/1725)).
Collaborator:

Suggested change
- - Restructured block proving to be asynchronous and added finality field for `SyncChainMmr` requests ([#1725](https://github.com/0xMiden/miden-node/pull/1725)).
+ - Added finality field for `SyncChainMmr` requests ([#1725](https://github.com/0xMiden/miden-node/pull/1725)).

Comment on lines +488 to +491
// Return data up to the latest committed block (default).
FINALITY_COMMITTED = 0;
// Return data only up to the latest proven block.
FINALITY_PROVEN = 1;
Collaborator:

Is the prefix a protobuf thing? (I think so).

https://protobuf.dev/best-practices/dos-donts/#unspecified-enum:

Suggested change
- // Return data up to the latest committed block (default).
- FINALITY_COMMITTED = 0;
- // Return data only up to the latest proven block.
- FINALITY_PROVEN = 1;
+ FINALITY_UNSPECIFIED = 0;
+ // Return data up to the latest committed block (default).
+ FINALITY_COMMITTED = 1;
+ // Return data only up to the latest proven block.
+ FINALITY_PROVEN = 2;


// The finality level to use when clamping the upper bound of the block range.
//
// When set to `FINALITY_COMMITTED` (default), the upper bound is clamped to the chain tip.
Collaborator:

You shouldn't have a default value here iiuc.

NoProvenBlocks,
#[error("database error")]
#[grpc(internal)]
DatabaseError(#[from] DatabaseError),
Collaborator:

Suggested change
- DatabaseError(#[from] DatabaseError),
+ DatabaseError(#[source] DatabaseError),


/// Returns the database.
pub(crate) fn db(&self) -> Arc<Db> {
self.db.clone()
Collaborator:

Suggested change
self.db.clone()
Arc::clone(&self.db)

db: Arc<Db>,
block_prover: Arc<BlockProver>,
block_store: Arc<BlockStore>,
) -> (ProofSchedulerHandle, JoinHandle<Result<(), ProofSchedulerError>>) {
Collaborator:

I find the name a bit confusing since we have two handles. But maybe I just need to shift my perspective a bit.

Comment on lines +98 to +106
// Query all unproven blocks. This handles both startup recovery and new blocks.
let unproven_blocks = match db.select_unproven_blocks(MAX_PROVING_BATCH_SIZE).await {
Ok(blocks) => blocks,
Err(err) => {
error!(target: COMPONENT, %err, "Failed to query unproven blocks, retrying");
tokio::time::sleep(Duration::from_secs(1)).await;
continue;
},
};
Collaborator:

As mentioned elsewhere, I don't think you actually need this.

You only need to track chain_tip and latest_proven_block -- which you can do essentially for free after starting up if you replace Notify with a watch::Receiver<BlockNumber>.

Comment on lines +114 to +119
// Construct proving jobs and drain results in order.
// On any failure we break immediately — dropping remaining futures cancels them.
// The outer loop will re-query unproven blocks and restart the sequence, ensuring
// we never persist a proof while an ancestor block is still unproven.
let mut proving_futures = order_proving_jobs(&db, &block_prover, &unproven_blocks);
while let Some(timeout_result) = proving_futures.next().await {
Collaborator:

I don't think this works as expected - this essentially means we submit a set of blocks for proving, and then wait until all blocks are proven.

What we want is to constantly have N blocks being proven concurrently, when one completes, another should be queued.

Comment on lines +136 to +143
Err(ProveBlockError::Transient(err)) => {
error!(
target: COMPONENT,
%err,
"Block proving failed, abandoning batch and retrying next iteration"
);
break;
},
Collaborator:

This means a single error in the batch restarts every block proof in the batch from scratch?

block_prover: Arc<BlockProver>,
block_store: Arc<BlockStore>,
notify: Arc<Notify>,
) -> Result<(), ProofSchedulerError> {
Collaborator:

I would propose an alternative implementation here, which I think covers my other comments as well.

imo, what we want is:

  • N ongoing proofs
  • blocks in database must only be marked as proven in sequence

What I suggest:

  • move as much as possible into a single proof future
    • it auto-retries unless fatal or overall timeout is exceeded
    • it saves the proof to file and essentially does everything except update the database
  • central loop
    • tracks chain_tip by replacing Notify with watch::Receiver<BlockNumber>
    • tracks the latest proven block number, either passed in from outside or read once from the database before the loop starts
    • tracks proof jobs in a tokio::JoinSet
      • JoinSet capacity is limited to N; each job is mapped to its BlockNumber
      • upon task completion, store the result locally as BlockNumber -> Result<(), Error>
        • add a new proof job (if any remain)
        • if the block number equals last proven + 1, then
          • if Ok(()): update the database and repeat until a sequence gap is hit
          • if Err: crash out
