Kernel auth gap on Withdraw — reproducible PoC + analysis by saroupille · Pull Request #4 · trilitech/tzel

saroupille · 2026-04-21T09:39:36Z

Summary

This branch documents, and provides reproducible proofs of, an authentication gap in the kernel's `Withdraw` path (and, by the same mechanism, `Shield`). Any entity that can submit an external inbox message to the rollup (that is, any Tezos L1 account holder) can drain any known `public_account` to a recipient they control. The `KernelWithdrawReq` struct has no signature field, `apply_kernel_message` runs no authentication check, and the operator's single bearer token offers no protection against messages submitted directly with `octez-client send smart rollup message`.

This is not a bug report framed as "fix me in this PR". It is an analysis branch that (a) makes the gap trivially reproducible in CI, (b) walks through the evidence in the codebase, and (c) sketches the design space. The design decision (accept single-tenant as the intended model, add a Tezos signature, add a WOTS leaf per account, or something else) belongs upstream and is out of scope here.

Built on top of PR #3 (fix/configure-messages-via-dal) to keep the kernel tree consistent with admin-DAL routing.

The gap in one paragraph

`KernelWithdrawReq` has three fields (`sender`, `recipient`, `amount`) and no signature or proof. On `Withdraw`, `apply_kernel_message` checks that `balance(sender) >= amount` and that `recipient` parses as a tz1/KT1 address, writes an outbox message debiting `sender` and crediting `recipient`, and returns. The kernel has no access to the L1 source of the transaction that carried the inbox message, and the struct carries nothing that could bind the withdraw to the true owner of the public account. The tzel-operator's `submit_rollup_message` handler verifies a single bearer token shared across the whole instance, but this is irrelevant: `octez-client send smart rollup message` is callable by any Tezos account holder and bypasses the operator entirely. The `Shield` path has the same structural absence of sender authentication: the STARK proof binds `hash(sender)` but has no private input tying it to ownership of the account.
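To make the structure of the gap concrete, here is a minimal Rust model of the Withdraw path as described above. `WithdrawReq` and `Ledger` are hypothetical stand-ins, not the kernel's types: durable storage is a `HashMap`, and the recipient check is reduced to a prefix test in place of `TezosContract::from_b58check`. The point of the sketch is that nothing in `apply_withdraw` looks at who submitted the request.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified mirror of `KernelWithdrawReq`:
/// three fields, no signature and no proof.
#[derive(Debug, Clone)]
pub struct WithdrawReq {
    pub sender: String,
    pub recipient: String,
    pub amount: u64,
}

/// Toy ledger standing in for the kernel's public balances and outbox.
pub struct Ledger {
    pub balances: HashMap<String, u64>,
    pub outbox: Vec<(String, u64)>,
}

impl Ledger {
    /// Mirrors the checks described for the Withdraw arm: balance check,
    /// recipient format check, outbox write. No caller identity is consulted.
    pub fn apply_withdraw(&mut self, req: &WithdrawReq) -> Result<(), &'static str> {
        let bal = self.balances.get(&req.sender).copied().unwrap_or(0);
        if bal < req.amount {
            return Err("insufficient balance");
        }
        // Stand-in for a real b58check parse: prefix test only.
        if !(req.recipient.starts_with("tz1") || req.recipient.starts_with("KT1")) {
            return Err("invalid recipient");
        }
        self.balances.insert(req.sender.clone(), bal - req.amount);
        self.outbox.push((req.recipient.clone(), req.amount));
        Ok(())
    }
}
```

Anyone who can build a `WithdrawReq` with `sender = "alice"` passes both checks; the `bridge_flow.rs` PoC asserts the same outcome against the real kernel.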

What this branch ships

| File | Purpose |
|---|---|
| `docs/analysis/withdraw-auth-gap.md` | Full write-up: evidence with line references, threat model (public_accounts are enumerable via durable state RPC + bridge deposits are public on L1), blast radius, and four mitigation sketches with tradeoffs. |
| `tezos/rollup-kernel/tests/bridge_flow.rs` (+104) | `withdraw_poc_drains_unauthorized_sender`: runs under `cargo test --test bridge_flow`, no sandbox needed. Configure → deposit 500_001 mutez to `alice` → unauthorized third party submits Withdraw with `sender = "alice"` → asserts the drain succeeded. Positive-passing today (documenting the gap); flip to negative-asserting once auth lands. |
| `scripts/sandbox_withdraw_auth_bypass_poc.sh` | End-to-end sandbox smoke that forks the DAL smoke, keeps setup + deposit, then attacks: submit a Withdraw from `bootstrap2` (explicitly NOT the operator's `source_alias`) via `octez-client send smart rollup message`. Terminates with `VULNERABILITY CONFIRMED` on success. |
| `tezos/rollup-kernel/src/bin/octez_kernel_message.rs` (+21) | `withdraw` subcommand: minimal PoC helper that emits a framed `KernelInboxMessage::Withdraw` ready for `octez-client`. Removes cleanly once authentication is added. |

Evidence (abridged)

- `core/src/kernel_wire.rs:110-115`: `KernelWithdrawReq { sender, recipient, amount }`. No signature, no proof.
- `tezos/rollup-kernel/src/lib.rs:~1009`: `Withdraw` match arm: balance check, recipient format check (`TezosContract::from_b58check` at `:509`), outbox write. Nothing compares `sender` with anything the caller can prove.
- `services/tzel/src/bin/tzel_operator.rs:304`: `require_bearer_auth` is a single-token check against `config.bearer_token`. No per-user mapping.
- `apps/wallet/src/lib.rs:6501`: the legitimate CLI withdraw constructs a `KernelWithdrawReq` with the user's chosen `sender` string and posts it through the operator; the same construction is reachable by any third party.

The attack path used in the sandbox PoC is exactly:

```
octez-client send smart rollup message "hex:[ \"<framed withdraw>\" ]" from bootstrap2
```

The message is sent from `bootstrap2`, which is not the operator's source account. The kernel processes it and alice's balance drains.

Verification

```
$ cargo test --test bridge_flow withdraw_poc_drains_unauthorized_sender
cargo test: 1 passed, 8 filtered out (1 suite, 0.02s)

$ TZEL_OCTEZ_SANDBOX_PRESERVE=1 ./scripts/sandbox_withdraw_auth_bypass_poc.sh
...
==========================================================
VULNERABILITY CONFIRMED: alice's 500001 mutez was drained
by a withdraw message signed by bootstrap2 (not operator).
No bearer token was needed.  No proof was needed.
==========================================================
```

Open question for the kernel maintainer

The design intent here needs to be stated explicitly before any further UX / multi-tenant deployment work proceeds. Specifically:

- Is the current model single-tenant by intent? If so, documenting this constraint in deployment guides, operator runbooks, and wallet UX would prevent misuse. In that case, these PoCs serve as a regression trap rather than a fix target.
- Or is sender authentication at the kernel level planned? The analysis doc sketches two families (Tezos-sig-bound-at-deposit and WOTS-leaf-per-account); both are post-quantum-compatible with the existing kernel structure. If this is the intended direction, it shapes downstream work: bridge contract changes, wallet submission flow, operator submission API, etc.
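For readers weighing the second option, the sketch below shows only the general control flow of an owner-bound withdraw: an owner key recorded when the account is first credited, and a withdraw that must carry a signature over its own fields. Everything here is hypothetical (`AuthWithdrawReq`, `AuthLedger`, the `SignatureVerifier` trait), the concrete scheme (Tezos signature or WOTS leaf) is abstracted behind the trait, and replay protection (a nonce and the rollup address in the signed bytes) is deliberately omitted. It is an illustration, not a proposed design.

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over whichever scheme upstream might choose.
pub trait SignatureVerifier {
    fn verify(&self, public_key: &[u8], message: &[u8], signature: &[u8]) -> bool;
}

/// NOT a signature scheme: "signature" = key bytes followed by message bytes.
/// Trivially forgeable; it exists only to exercise the control flow below.
pub struct InsecureToyVerifier;

impl SignatureVerifier for InsecureToyVerifier {
    fn verify(&self, public_key: &[u8], message: &[u8], signature: &[u8]) -> bool {
        signature.len() == public_key.len() + message.len()
            && signature.starts_with(public_key)
            && signature.ends_with(message)
    }
}

/// Hypothetical withdraw request: the original three fields plus a signature.
pub struct AuthWithdrawReq {
    pub sender: String,
    pub recipient: String,
    pub amount: u64,
    pub signature: Vec<u8>,
}

impl AuthWithdrawReq {
    /// Bytes covered by the signature (no nonce: replay is NOT prevented here).
    pub fn signed_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(self.sender.as_bytes());
        out.push(0);
        out.extend_from_slice(self.recipient.as_bytes());
        out.push(0);
        out.extend_from_slice(&self.amount.to_be_bytes());
        out
    }
}

pub struct AuthLedger {
    pub balances: HashMap<String, u64>,
    /// Owner key bound to each account at deposit time.
    pub owners: HashMap<String, Vec<u8>>,
}

impl AuthLedger {
    pub fn apply_withdraw<V: SignatureVerifier>(
        &mut self,
        verifier: &V,
        req: &AuthWithdrawReq,
    ) -> Result<(), &'static str> {
        let owner = self.owners.get(&req.sender).ok_or("unknown sender")?;
        if !verifier.verify(owner, &req.signed_bytes(), &req.signature) {
            return Err("bad signature");
        }
        let bal = self.balances.get(&req.sender).copied().unwrap_or(0);
        if bal < req.amount {
            return Err("insufficient balance");
        }
        self.balances.insert(req.sender.clone(), bal - req.amount);
        Ok(())
    }
}
```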

Follow-up hygiene

This branch does not propose a fix. The withdraw subcommand in octez_kernel_message.rs is a PoC helper; it should be removed once authentication lands. The Rust test and sandbox script are kept as regression traps.

🤖 Generated with Claude Code

A WOTS-signed `ConfigureVerifier` KernelInboxMessage serializes to 4923 bytes, and `ConfigureBridge` to 4835 bytes. Both exceed the Tezos smart-rollup protocol constant `sc_rollup_message_size_limit` (4096 bytes), so they cannot transit through the L1 external-message path that `octez-client send smart rollup message` uses. This commit extends the existing DAL delivery path, which already routes Shield / Transfer / Unshield payloads too large for L1, to cover the two admin configuration messages.

Kernel-side:

- `core/src/kernel_wire.rs`
  * `KernelDalPayloadKind` gains `ConfigureVerifier` (wire tag 3) and `ConfigureBridge` (wire tag 4).
  * `kernel_dal_payload_kind_{to,from}_wire` handle both.
  * A comment clarifies that tag numbering here is independent of the tags used by `WireKernelInboxMessage`.
  * `KERNEL_WIRE_VERSION` bumped to 10: older clients that read a `DalPointer` with `kind=3|4` now see an explicit envelope version mismatch rather than an opaque tag error.
- `tezos/rollup-kernel/src/lib.rs`
  * `fetch_kernel_message_from_dal` accepts the two new kinds.
  * The dispatch match is reshaped to be exhaustive on `KernelDalPayloadKind`: any future variant will be a compile error until handled here, instead of silently hitting the old `_ => Err("kind mismatch")` arm.
  * Docstring explains that the kind-vs-content check is a defense-in-depth control (it forces honest labeling for auditors) and that authenticity itself comes from the WOTS signature / STARK proof inside the payload; DAL is a public bulletin board with no transport-level authentication.
  * `dal_payload_kind_name` labels the two new variants.

Tooling:

- `tezos/rollup-kernel/src/bin/octez_kernel_message.rs`
  * New `configure-verifier-payload` and `configure-bridge-payload` subcommands emit the raw unframed KernelInboxMessage hex, which is the input for chunking and DAL publication.
  * `dal-pointer` accepts the `configure_verifier` and `configure_bridge` kind tokens.
  * When `TZEL_ROLLUP_CONFIG_ADMIN_ASK_HEX` is unset in debug builds, the fallback to the public dev ask now emits an `eprintln!` warning; silent fallback paired with a release-profile kernel built without admin material would be a footgun.
- `tezos/rollup-kernel/build.rs` (new)
  * Emits `cargo:rerun-if-env-changed=` for the three admin material env vars consumed by `option_env!()` in the kernel source (the config-admin public seed plus the verifier and bridge config-admin WOTS leaves), so a rotation of the admin material always re-bakes the WASM.

Tests:

- `core/src/kernel_wire.rs::tests`
  * Size sentinels for `ConfigureVerifier` (4923 bytes) and `ConfigureBridge` (4835 bytes). The tests pass today; they fail loudly if a future encoding change shifts either size, forcing a review of DAL routing assumptions.

Operator and wallet-server are intentionally left unchanged: by design, admin config messages flow directly from an admin's `octez_kernel_message` + `octez-client` (with the admin's own L1 key and WOTS ask), never through the user-facing operator API. This keeps the operator's interface narrow, preserves admin availability independent of operator health, and prevents a bearer-token leak from granting the ability to inject admin configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
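A minimal sketch of the wire-tag mapping described above. `DalPayloadKind` is a hypothetical mirror of `KernelDalPayloadKind`: tags 3 and 4 come from this commit, while tags 0 to 2 for the pre-existing Shield / Transfer / Unshield kinds are assumptions. Both directions use exhaustive matches, so adding an enum variant fails to compile until it is assigned a tag, and an unknown tag on the wire becomes an explicit error.

```rust
/// Hypothetical mirror of `KernelDalPayloadKind`.
/// Tags 3 and 4 are from the commit; tags 0-2 are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DalPayloadKind {
    Shield,
    Transfer,
    Unshield,
    ConfigureVerifier,
    ConfigureBridge,
}

pub fn kind_to_wire(kind: DalPayloadKind) -> u8 {
    // Exhaustive: a new variant is a compile error here until it gets a tag.
    match kind {
        DalPayloadKind::Shield => 0,
        DalPayloadKind::Transfer => 1,
        DalPayloadKind::Unshield => 2,
        DalPayloadKind::ConfigureVerifier => 3,
        DalPayloadKind::ConfigureBridge => 4,
    }
}

pub fn kind_from_wire(tag: u8) -> Result<DalPayloadKind, String> {
    match tag {
        0 => Ok(DalPayloadKind::Shield),
        1 => Ok(DalPayloadKind::Transfer),
        2 => Ok(DalPayloadKind::Unshield),
        3 => Ok(DalPayloadKind::ConfigureVerifier),
        4 => Ok(DalPayloadKind::ConfigureBridge),
        // Unknown tags are rejected explicitly rather than mapped to a default.
        other => Err(format!("unknown DAL payload kind tag {}", other)),
    }
}
```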

Add an "Admin configuration messages and DAL routing" section to the rollup-kernel README describing:

- why `ConfigureVerifier` and `ConfigureBridge` must use DAL (WOTS signature bloat pushes each message above 4096 bytes);
- the delivery flow end-to-end (admin computes unframed payload, chunks to DAL, injects a `DalPointer` on L1; kernel reassembles pages, verifies the payload hash, decodes, and dispatches);
- a checklist for adding a new oversized message type in the future (wire tag, dispatch arm, CLI subcommand, size sentinel).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
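The delivery flow in that README section can be sketched as a round trip: chunk the unframed payload into pages, record a pointer with the page count, length and payload hash, then reassemble and verify. This is a minimal model under stated assumptions: `PayloadPointer`, `publish` and `reassemble` are hypothetical names, the page size is a parameter (the 4096 used in the usage check is illustrative, not the DAL protocol's page size), and `DefaultHasher` is a non-cryptographic stand-in for whatever hash the kernel actually verifies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical pointer an admin would inject on L1 after publishing pages.
pub struct PayloadPointer {
    pub num_pages: usize,
    pub total_len: usize,
    pub payload_hash: u64,
}

/// Stand-in digest. DefaultHasher is NOT cryptographic; illustration only.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Admin side: split the unframed payload into pages and build the pointer.
pub fn publish(payload: &[u8], page_size: usize) -> (Vec<Vec<u8>>, PayloadPointer) {
    let pages: Vec<Vec<u8>> = payload.chunks(page_size).map(|c| c.to_vec()).collect();
    let ptr = PayloadPointer {
        num_pages: pages.len(),
        total_len: payload.len(),
        payload_hash: digest(payload),
    };
    (pages, ptr)
}

/// Kernel side: reassemble pages and check them against the pointer
/// before any decoding or dispatch happens.
pub fn reassemble(pages: &[Vec<u8>], ptr: &PayloadPointer) -> Result<Vec<u8>, &'static str> {
    if pages.len() != ptr.num_pages {
        return Err("page count mismatch");
    }
    let payload: Vec<u8> = pages.concat();
    if payload.len() != ptr.total_len {
        return Err("length mismatch");
    }
    if digest(&payload) != ptr.payload_hash {
        return Err("payload hash mismatch");
    }
    Ok(payload)
}
```

As the commit above notes, this check only establishes integrity against the pointer; authenticity still has to come from the WOTS signature inside the payload, since anyone can publish to DAL.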

Five fixes bundled to get `octez_rollup_sandbox_dal_smoke.sh` through configure / deposit on an Octez master built from recent trunk:

1. `attestation_lags` invariant. The Alpha protocol gained a restriction (tezos master 8499ce19ac, 2025-12-04) that the last element of `attestation_lags` must equal `attestation_lag`. The mockup generates `attestation_lags = [1,2,3,4,5]` by default, and the script overrides `attestation_lag` to 2 via `DAL_ATTESTATION_LAG`, so `activate_alpha` fails. Force `attestation_lags = [attestation_lag]` in `build_alpha_sandbox_params`.
2. Configure messages via DAL. `configure-verifier` (4944 bytes framed) and `configure-bridge` (4856 bytes framed) exceed `sc_rollup_message_size_limit = 4096`, so the old direct `octez-client send smart rollup message` path fails at encoding. Route both via the DAL delivery path instead, using the new `configure-{verifier,bridge}-payload` CLI subcommands and a generalized `publish_payload_via_dal_and_inject_pointer` helper factored out of `publish_shield_via_dal_and_inject_pointer`.
3. Admin material baked into the release kernel WASM. The release kernel's `authenticate_{verifier,bridge}_config` only accepts admin-signed payloads when the admin leaves are baked in at compile time via `TZEL_ROLLUP_CONFIG_ADMIN_*_HEX`. Without this, the kernel silently rejects every configure payload. Call `scripts/prepare_rollup_config_admin.sh` before the kernel build and source both the runtime (secret ask) and build (public leaves) env files.
4. `xxd -ps -c 0` newline workaround. On our xxd version, `-c 0` still wraps at ~60 characters, inserting newlines that silently break string matches and URL / Michelson arg construction. Pipe to `tr -d '\n'` at the affected call sites: `await_bridge_ticketer`, `deposit_to_bridge`, and the balance-key construction in `main`. Without this, `await_bridge_ticketer` reports "ticketer did not appear" even after the kernel applied the configuration.
5. Caveat on `set -a` scope. A comment makes explicit that exporting `TZEL_ROLLUP_CONFIG_ADMIN_ASK_HEX` to every descendant process is acceptable in sandbox (ephemeral per-workdir ask) but must not be copied to a production runner.

After these fixes, the smoke reaches and applies configure-verifier + configure-bridge + the initial bridge deposit; the subsequent fixture shield step is not within the scope of this patch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
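The protocol invariant behind fix 1 is small enough to state as code. This is a hypothetical check, not taken from the script or the protocol sources: the last element of `attestation_lags` must equal `attestation_lag`, and collapsing the list to a single element, as the fix does, satisfies it by construction.

```rust
/// Hypothetical check of the invariant: last(attestation_lags) == attestation_lag.
/// An empty list fails, since it has no last element.
pub fn attestation_lags_valid(attestation_lag: u32, attestation_lags: &[u32]) -> bool {
    attestation_lags.last() == Some(&attestation_lag)
}

/// The shape of the fix: force `attestation_lags = [attestation_lag]`.
pub fn forced_attestation_lags(attestation_lag: u32) -> Vec<u32> {
    vec![attestation_lag]
}
```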

bde1347 ("Add burned rollup fees and DAL producer note outputs") made two changes the sandbox smoke script never absorbed:

1. `apply_shield` now debits `v + fee + producer_fee` (not just `v`) from the sender's public balance:

   ```rust
   let debit = req.v
       .checked_add(req.fee)
       .and_then(|value| value.checked_add(req.producer_fee))?;
   if bal < debit {
       return Err("insufficient balance");
   }
   ```

   The fixture metadata still exposed only `shield_amount: fixture.shield.v`, and the sandbox deposited exactly that. Post-bde1347 the balance is short by `fee + producer_fee`, the shield fails with "insufficient balance", and the public drain never happens. For the checked-in fixture (v=400_000, fee=100_000, producer_fee=1 mutez) the required deposit is 500_001 mutez instead of 400_000. Rename `shield_amount` to `shield_bridge_deposit` in `FixtureMetadata` and compute it as `v + fee + producer_fee`. The sandbox script picks up the new field and uses the same value for both the bridge deposit and the pre-shield balance assertion.
2. `apply_shield` now appends *two* notes to the Merkle tree per shield: the sender's own commitment and the producer's compensation commitment. The smoke's post-shield assertion still expected `/tzel/v1/state/tree/size == 1` (the pre-fees value), which makes the smoke stall at line 698 even though the shield applied cleanly and the public balance drained to zero. Update the assertion to `2` and leave a comment pointing at `apply_shield` so the next person knows why.

These two regressions were hidden behind an earlier one (configure messages exceeding `sc_rollup_message_size_limit`, fixed in 5071e2e on this branch): the script never got past configure-verifier, so neither bde1347-induced break ever executed. Once configures route through DAL, the smoke advances through the shield and both breaks surface in sequence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
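The deposit arithmetic can be checked directly. A minimal sketch, assuming the debit formula quoted above: `shield_bridge_deposit` mirrors the renamed fixture field, and `check_shield_balance` is a hypothetical helper that turns an overflowing sum into an explicit error with `ok_or` before comparing against the balance.

```rust
/// Required public balance for a shield after bde1347: v + fee + producer_fee.
/// Returns None on u64 overflow.
pub fn shield_bridge_deposit(v: u64, fee: u64, producer_fee: u64) -> Option<u64> {
    v.checked_add(fee)?.checked_add(producer_fee)
}

/// Hypothetical helper: returns the remaining balance after the debit,
/// or the same "insufficient balance" error the kernel reports.
pub fn check_shield_balance(bal: u64, v: u64, fee: u64, producer_fee: u64) -> Result<u64, &'static str> {
    let debit = shield_bridge_deposit(v, fee, producer_fee).ok_or("amount overflow")?;
    if bal < debit {
        return Err("insufficient balance");
    }
    Ok(bal - debit)
}
```

With the checked-in fixture (v=400_000, fee=100_000, producer_fee=1) this gives 500_001; a deposit of only 400_000 is rejected, and a 500_001 deposit drains the public balance to exactly zero.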

The two size sentinels next to this test lock exact byte counts for `ConfigureVerifier` and `ConfigureBridge`. They would have caught the specific regression they target (a WOTS signature change growing those two messages past `sc_rollup_message_size_limit = 4096`), but only because someone knew to write them *after* the regression surfaced. A more general failure mode: a new field lands on any `KernelInboxMessage` variant, pushes its serialized size past 4096, and nothing fails until an operator tries `octez-client send smart rollup message` against a real node and gets rejected at the L1 inbox. That is how commit 2c45d9c broke admin config: unit tests all passed; the break surfaced weeks later in the sandbox smoke.

Add a third test that makes the invariant structural:

- A `Routing` enum (`FitsL1` / `RequiresDal`) classifies every variant. `required_routing` is an **exhaustive match** on `KernelInboxMessage` with no `_` arm: the compiler forces any future variant author to classify the new message before the crate builds.
- `framed_len` computes the on-wire size the L1 inbox actually sees, i.e. `encode_kernel_inbox_message(...).len()` plus the `ExternalMessageFrame::Targetted` overhead (21 bytes: 1 tag + 20 bytes of `SmartRollupHash`). The existing sentinels measure the unframed envelope and under-count by 21 bytes, so a message that lands just below 4096 unframed can still be rejected on wire.
- The assertion is two-sided:
  * FitsL1 with `framed > 4096` fails: the L1 routing is broken.
  * RequiresDal with `framed <= 4096` fails: the DAL plumbing for that variant is dead code and the classification needs revisiting.

Representative instances:

- `ConfigureVerifier` / `ConfigureBridge`: real WOTS-signed configs (same construction as the sentinels).
- `Shield` / `Transfer` / `Unshield`: built with a 4096-byte `proof_bytes` stub, the cheapest size that keeps the RequiresDal classification unambiguous without requiring a full STARK proof in the test harness.
- `Withdraw`: small string fields + `u64`, representative of production.
- `DalPointer`: single-chunk pointer, representative of what the kernel emits.

The frame overhead is replicated as a local constant rather than pulled from `tezos-smart-rollup-encoding` at dev-dep time: that crate pins `tezos_data_encoding = 0.5.2` while `tzel-core` already depends on `tezos_data_encoding = 0.6`, and introducing both majors into the test build for a single 21-byte constant is not worth the friction. The constant is documented with the layout it replicates and verified empirically against `octez_kernel_message dal-pointer` output on a real sr1 address.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
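The two-sided check can be sketched in isolation. `InboxVariant` is a hypothetical payload-free stand-in for `KernelInboxMessage`, and sizes are passed as plain lengths rather than encoded messages. Adding the 21-byte frame overhead to the sentinel sizes (4923 and 4835) gives 4944 and 4856, which match the framed sizes quoted in the sandbox-fix commit.

```rust
/// L1 external-message limit, per the commit above.
pub const SC_ROLLUP_MESSAGE_SIZE_LIMIT: usize = 4096;
/// `ExternalMessageFrame::Targetted` overhead: 1 tag byte + 20-byte rollup hash.
pub const TARGETTED_FRAME_OVERHEAD: usize = 1 + 20;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Routing {
    FitsL1,
    RequiresDal,
}

/// Hypothetical, payload-free stand-in for `KernelInboxMessage`.
pub enum InboxVariant {
    Shield,
    Transfer,
    Unshield,
    Withdraw,
    ConfigureVerifier,
    ConfigureBridge,
    DalPointer,
}

/// No `_` arm: a new variant fails to compile until it is classified.
pub fn required_routing(v: &InboxVariant) -> Routing {
    match v {
        InboxVariant::Withdraw | InboxVariant::DalPointer => Routing::FitsL1,
        InboxVariant::Shield
        | InboxVariant::Transfer
        | InboxVariant::Unshield
        | InboxVariant::ConfigureVerifier
        | InboxVariant::ConfigureBridge => Routing::RequiresDal,
    }
}

/// On-wire size seen by the L1 inbox.
pub fn framed_len(unframed_len: usize) -> usize {
    unframed_len + TARGETTED_FRAME_OVERHEAD
}

/// Two-sided assertion: both misclassification directions are errors.
pub fn check_routing(routing: Routing, unframed_len: usize) -> Result<(), String> {
    let framed = framed_len(unframed_len);
    match routing {
        Routing::FitsL1 if framed > SC_ROLLUP_MESSAGE_SIZE_LIMIT => Err(format!(
            "classified FitsL1 but framed size {} exceeds {}",
            framed, SC_ROLLUP_MESSAGE_SIZE_LIMIT
        )),
        Routing::RequiresDal if framed <= SC_ROLLUP_MESSAGE_SIZE_LIMIT => Err(format!(
            "classified RequiresDal but framed size {} fits on L1",
            framed
        )),
        _ => Ok(()),
    }
}
```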

After bde1347 ("Add burned rollup fees and DAL producer note outputs"), `apply_shield` debits `v + fee + producer_fee` from the sender's public rollup balance, not just `v + fee`. The tutorial instructed readers to deposit `300000` mutez before shielding `200000` mutez, which covers `v + burn (100000)` exactly — leaving the shield short by the configured DAL-producer fee (`dal_fee = 1` mutez as set by the init-shadownet example). The shield step then fails with "insufficient balance" and the tutorial cannot be completed as written. Bump the deposit to `300001` mutez and update the expected post-deposit balance line to match. The extra paragraph explains the math so the next reader understands why the deposit is not a round number. This matches the sandbox smoke fix in 44adaa8, one tree up. The live-shadownet smoke script (`scripts/shadownet_live_e2e_smoke.sh`) needs a similar bump plus `--dal-fee` / `--dal-fee-address` plumbing in `init_profile`; that is more invasive (producer-address generation + operator fee-policy alignment) and is tracked as a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two small corrections flagged by the 2nd / 3rd adversarial review that are cheap enough to land in the same PR rather than trailing as issues.

1. `apps/wallet/src/lib.rs::kernel_message_kind` used to fold `Withdraw` / `ConfigureVerifier` / `ConfigureBridge` into the same `RollupSubmissionKind::Withdraw` arm. The wallet never submits admin `Configure*` messages (they flow through `octez_kernel_message` + `octez-client` directly), so that arm was dead code, and silently mislabelling an admin message as a `Withdraw` would be a hard-to-spot footgun if some future caller ever reached it. Split the arm: `Withdraw` keeps its own mapping, and admin `Configure*` variants become an `unreachable!()` with a message pointing at the admin CLI.
2. `tezos/rollup-kernel/README.md` step 3 of "Adding a new oversized message type" still instructed the next contributor to mirror any new variant into `RollupSubmissionKind` and the operator's submission-matcher, which directly contradicts the design established in commit 5071e2e (admin-signed payloads bypass the operator on purpose, so a bearer-token leak cannot authorise admin injection). The old text also named a function (`submission_kind_matches_message`) that no longer exists. Rewrite step 3 to split the decision by submission path (user-facing via operator, admin-signed via `octez_kernel_message` directly), and add a pointer to the variant-exhaustive size test added in 5c308b4 so the next reader understands that an unclassified variant will break compilation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`fetch_kernel_message_from_dal` had five near-identical arms that checked a `KernelDalPayloadKind` tag against the decoded `KernelInboxMessage` variant and returned the same "payload kind mismatch" error when they disagreed. Collapse the pattern into one `match` that produces a boolean ("does pointer.kind match message?") followed by a single early return with the shared error message.

The `match pointer.kind` arms are still exhaustive (no `_ =>`), so any new `KernelDalPayloadKind` variant added in the future remains a compile error here until it is classified; the structural guarantee is preserved.

Net: -37 / +18 lines, same behaviour, same error message, same compile-time exhaustiveness guarantee. Suggested by the second adversarial review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
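The collapsed shape described above, sketched with hypothetical payload-free enums (`DalKind`, `Decoded`) in place of `KernelDalPayloadKind` and `KernelInboxMessage`: one exhaustive `match` over the pointer kind produces a boolean, then a single early return carries the shared error. The error string format here is an assumption.

```rust
#[derive(Debug, Clone, Copy)]
pub enum DalKind {
    Shield,
    Transfer,
    Unshield,
    ConfigureVerifier,
    ConfigureBridge,
}

/// Hypothetical, payload-free stand-in for the decoded inbox message.
pub enum Decoded {
    Shield,
    Transfer,
    Unshield,
    Withdraw,
    ConfigureVerifier,
    ConfigureBridge,
}

pub fn check_kind(kind: DalKind, msg: &Decoded) -> Result<(), String> {
    // Exhaustive over `kind` with no `_` arm: a new DalKind fails to compile here.
    let kind_matches = match kind {
        DalKind::Shield => matches!(msg, Decoded::Shield),
        DalKind::Transfer => matches!(msg, Decoded::Transfer),
        DalKind::Unshield => matches!(msg, Decoded::Unshield),
        DalKind::ConfigureVerifier => matches!(msg, Decoded::ConfigureVerifier),
        DalKind::ConfigureBridge => matches!(msg, Decoded::ConfigureBridge),
    };
    // Single early return with the shared error message.
    if !kind_matches {
        return Err(format!("payload kind mismatch: pointer kind {:?}", kind));
    }
    Ok(())
}
```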

…per) Emit a framed `KernelInboxMessage::Withdraw` ready to be submitted via `octez-client send smart rollup message`, with no signature and no proof — the `KernelWithdrawReq` struct has no such fields, and neither the kernel nor the operator ask for them on the user withdraw path. Used by `scripts/sandbox_withdraw_auth_bypass_poc.sh` and referenced in `docs/analysis/withdraw-auth-gap.md` to make the auth gap reproducible. Kept minimal (one subcommand, three string/integer parameters, same encoding path as the existing `configure-*` paths) so that a later commit can remove it cleanly once authentication is added to `KernelWithdrawReq`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a kernel-level integration test (`bridge_flow.rs`) that exercises the current auth gap on `KernelInboxMessage::Withdraw`:

1. Configure bridge ticketer (admin path, signed with the dev WOTS ask).
2. Deposit 500_001 mutez to `alice` via the legitimate bridge flow.
3. Submit a Withdraw with `sender = "alice"` and a recipient the attacker controls, as an external inbox message with no signature and no proof. Nothing in the caller's provenance is checked by the kernel (the PVM has no access to the L1 tx author anyway, and the `KernelWithdrawReq` struct has no sig/proof field that could bind the withdraw to the actual owner of `alice`).
4. Assert the withdraw applied, alice's balance is zero, and the outbox message credits the attacker's recipient.

The test is **positive-passing today** (the kernel accepts the attack), which is exactly what documents the gap. When authentication is ever added (e.g. a Tezos sig verified against an owner stored at deposit time, or a per-account WOTS leaf registered and checked), this test MUST be updated to expect a rejection and flip its assertions. At that point it turns into a regression trap against accidentally removing the auth.

Reference: `docs/analysis/withdraw-auth-gap.md` for the full analysis and mitigation sketches. Sandbox-level reproduction at `scripts/sandbox_withdraw_auth_bypass_poc.sh`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two artefacts:

1. `docs/analysis/withdraw-auth-gap.md`: full write-up covering what the gap is, evidence in the code (`KernelWithdrawReq` struct, `apply_kernel_message` match arm, operator's bearer-only auth, direct-L1-submit bypass), threat model, blast radius, and four mitigation sketches with tradeoffs (not an endorsement; the design decision belongs upstream).
2. `scripts/sandbox_withdraw_auth_bypass_poc.sh`: end-to-end reproduction on an octez sandbox. Forked from the DAL smoke, it keeps setup + originate + configure + deposit unchanged, then replaces the shield fixture step with: bootstrap2 (NOT operator) → `octez_kernel_message withdraw` → `octez-client send smart rollup message` from bootstrap2 → kernel applies → alice's balance drained to 0. Terminates with "VULNERABILITY CONFIRMED" on success.

The sandbox PoC and the kernel-level test in `bridge_flow.rs` (previous commit) are complementary: the Rust test exercises the kernel PVM directly (runs in CI under `cargo test`, no sandbox needed), while the sandbox PoC demonstrates the full end-to-end attack path including the "submit from a non-operator tz1 via `octez-client send smart rollup message`" step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

saroupille · 2026-04-21T09:50:15Z

Reopening on my fork with base=fix/configure-messages-via-dal (PR #3's branch) so the diff shows only the 3 analysis commits instead of the 8 carried over from PR #3. Will link the new PR here once created.

saroupille and others added 11 commits April 21, 2026 00:13

saroupille closed this Apr 21, 2026

saroupille wants to merge 11 commits into trilitech:main from saroupille:analysis/withdraw-auth-gap-poc
