Skip to content

adapter: use occ for read-then write, with incremental retries#35192

Draft
aljoscha wants to merge 9 commits intoMaterializeInc:mainfrom
aljoscha:adapter-read-then-write-occ-incremental
Draft

adapter: use occ for read-then write, with incremental retries#35192
aljoscha wants to merge 9 commits intoMaterializeInc:mainfrom
aljoscha:adapter-read-then-write-occ-incremental

Conversation

@aljoscha
Copy link
Copy Markdown
Contributor

We now use SUBSCRIBE instead of PEEK to maintain the desired set of
updates that need to be written. We also don't acquire locks on tables
but instead optimistically try and write our updates at the timestamp
right at our current subscribe frontier.

Additionally, we take the opportunity this provides and move the
sequencing code from the coordinator main loop to the frontend, similar
to how we have done that for peeks in frontend_peek.rs.

Work towards https://github.com/MaterializeInc/database-issues/issues/6686

Implementation of https://github.com/MaterializeInc/materialize/blob/63645b72e83ee26d2cfa99d25d773a06e6accb74/doc/developer/design/20260210_incremental_occ_read_then_write.md

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch 4 times, most recently from 1675464 to 2cb8e6d Compare February 27, 2026 13:59
@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch from 865091f to 22a54a2 Compare March 16, 2026 15:30
@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch 11 times, most recently from 179deb6 to 8b8d81c Compare April 23, 2026 10:33
@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch 9 times, most recently from 01bcd4d to aff3f2e Compare April 28, 2026 13:36
aljoscha and others added 3 commits April 28, 2026 16:38
The cancel scenario periodically fires `pg_cancel_backend` against a
random worker. If that signal arrives during the worker's ROLLBACK,
psycopg's `connection.rollback()` raises a "canceling statement due
to user request" error. The rollback-error handler didn't recognize
that string, silently cleared `rollback_next`, and moved on — but
psycopg was still in `InFailedSqlTransaction`, so the next
psycopg-path query (COPY TO, SET, PREPARE, ...) died client-side
with "current transaction is aborted", which no action tolerates.

Surfaced by nightly 16160 (Parallel Workload (cancel)) on the
frontend-RTW-OCC branch: the RTW path's wider per-statement window
widens the race, but the bug is in the harness. Fix it by forcing a
reconnect on cancelled or aborted-transaction rollback failures —
reconnection is a clean reset, and psycopg doesn't guarantee the
connection is usable after a failed rollback anyway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
We will later use this for the frontend read-then-write implementation.
aljoscha and others added 2 commits April 28, 2026 16:39
We now use SUBSCRIBE instead of PEEK to maintain the desired set of
updates that need to be written. We also don't acquire locks on tables
but instead optimistically try and write our updates at the timestamp
right at our current subscribe frontier.

Additionally, we take the opportunity this provides and move the
sequencing code from the coordinator main loop to the frontend, similar
to how we have done that for peeks in frontend_peek.rs.

Work towards MaterializeInc/database-issues#6686
The original `if active_subscribe.internal { ... as BuiltinTableAppendNotify }`
branch needed an explicit cast (with a `clippy::as_conversions` allow) for
the if/else types to unify. Splitting into a guarded match arm lets the
match's coercion site unsize both `Box::pin`s to the trait-object type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch from aff3f2e to f444d6d Compare April 28, 2026 14:39
aljoscha and others added 4 commits April 28, 2026 16:10
Commit 21bbdde unified the error returned when a peek or subscribe
sees an underlying relation dropped mid-flight to "query could not
complete because relation X was dropped" (via
DroppedDependency::query_terminated_error). The COPY path keeps a
distinct "copy has been terminated because underlying ..." message via
DroppedDependency::copy_terminated_error.

Two assertions were missed in that change:
  - test/sqllogictest/ct_various.slt:190 still expected the old subscribe
    wording, so the FETCH-after-DROP-CT case fails.
  - src/environmentd/tests/sql.rs::test_dont_drop_sinks_twice was
    updated to expect "query could not complete", but the test runs
    COPY (SUBSCRIBE ...), which goes through copy_terminated_error and
    therefore returns "copy has been terminated".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…HEN_WRITE

The `enable_adapter_frontend_occ_read_then_write` dyncfg is sampled
once at coordinator startup (coord.rs:4730) and cached on the
`Coordinator` (coord.rs:1922). The cached bool is what every session
dispatches on (peek_client / try_frontend_read_then_write), and the
two paths it gates — coordinator lock-based RTW and frontend OCC RTW
— must not run concurrently within one process; the module docs on
`frontend_read_then_write` spell this out.

Two consequences of letting `ALTER SYSTEM SET` flip the catalog value
at runtime:

  - It looks like the change took effect (catalog updated, no error)
    when in fact the running process keeps using the startup value
    until restart. This is a footgun for operators and for anyone
    debugging "is this run actually exercising the OCC path?".
  - parallel-workload's `FlipFlagsAction` was randomly toggling it
    every test run. The flips were a no-op for the running process,
    but they were still noise — and worse, they hid the assumption
    that *nothing* reads the dyncfg directly post-startup. If a
    future change forgets the cached bool and reads the dyncfg fresh,
    the flips would silently open a mixed-mode window.

Reject runtime mutation of this var (and any future "startup-only"
vars added to the same allowlist) at the `ALTER SYSTEM SET` / `RESET`
boundary, and remove the var from `FlipFlagsAction.flags_with_values`
so parallel-workload doesn't probe the now-rejected path.

`with_system_parameter_default` and `--system-parameter-default` are
unaffected — they set the startup defaults via a different code path
(catalog config), not via the alter-system sequencer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flip the dyncfg default to false so this nightly exercises the legacy
lock-based read-then-write path. Revert before merging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes for a deterministic-ish repro of the
parallel-workload-dml retraction failure first seen in nightly 16238:

  - Restore FRONTEND_READ_THEN_WRITE default to `true` (revert
    f61a0e5), so the OCC RTW path is exercised again.
  - Pin parse_common_args's --seed default to 1777388214, the seed
    from nightly 16238. Re-running the parallel-workload nightly job
    on this branch should now reproduce that run's exact action mix.

Both changes need to be reverted before merging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant