antithesis: scaffold harness, multi-scenario layout, broadened testdrive corpus #36437

Draft
DAlperin wants to merge 1 commit into MaterializeInc:main from DAlperin:dov/antithesis-harness

Member

DAlperin commented May 7, 2026

Brings up the Antithesis Test Composer harness for Materialize. The branch covers research artifacts, the test-driver mzbuild image, scenario infrastructure, workload helpers (testdrive runner, corpus lists, the upsert_sources prototype), and a few small upstream fixes that were prerequisites.

This is a draft because validation still needs to run on real x86 (see Known limitations) and we still need to wire ANTITHESIS_REPOSITORY into the launch pipeline. Functionally, the harness is ready to ship to Antithesis as the first base scenario.

What's in here

antithesis/

antithesis/
  AGENTS.md                           directory map + scenarios table
  scratchbook/                        SUT analysis, deployment topology,
                                      test-driver integration plan,
                                      scenario strategy, existing-assertions
                                      inventory
  configs/<scenario>/
    mzcompose.py                      source-of-truth composition
    docker-compose.yaml               generated artifact (snouty consumes)
  bin/render-compose-yaml.py          renders configs/<scenario>/docker-compose.yaml
                                      from mzcompose.py and layers on
                                      platform / hostname / container_name /
                                      NO_COLOR Antithesis attributes
  test-driver/                        mzbuild image: MZFROM testdrive +
                                      Python + Antithesis SDK + the workload
                                      tree at /opt/antithesis/test/v1/ +
                                      curated test/testdrive corpus at
                                      /opt/materialize/td/testdrive/
  test/v1/
    helper_bootstrap.py               shared sys.path injector
    testdrive_{sql,kafka,
               load_generator,
               recovery}/             area templates; each picks a random .td
                                      from materialize.antithesis.testdrive_corpus
    upsert_sources/                   randomized helper-driven upsert workload
                                      with expected-state model
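
As a rough illustration of the area-template shape in the tree above, the sketch below picks one .td file at random from a corpus bucket. The bucket contents and the `pick_td` helper are hypothetical. The real lists live in materialize.antithesis.testdrive_corpus, and the real templates may draw their randomness from the Antithesis SDK rather than Python's `random` module.

```python
import random

# Hypothetical stand-in for one corpus bucket; the file names are illustrative only.
EXAMPLE_BUCKET = ["example-a.td", "example-b.td", "example-c.td"]


def pick_td(bucket, rng=None):
    """Return one .td file name chosen uniformly at random from `bucket`."""
    if not bucket:
        raise ValueError("corpus bucket is empty")
    rng = rng or random.Random()
    return rng.choice(bucket)
```

Passing an explicit `rng` keeps the choice reproducible in local runs, for example `pick_td(EXAMPLE_BUCKET, random.Random(0))`.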

misc/python/materialize/antithesis/

| Module | Role |
|---|---|
| sdk.py | SDK wrapper with local fallbacks. The antithesis package coexists with our scaffolding directory because both are namespace packages. |
| testdrive_config.py | Shared TestdriveConfig dataclass (env-driven). |
| td_runner.py | Generic .td runner: subprocess + tolerated-failure retry + reachable() on success and on tolerated failure. |
| testdrive_corpus.py | Curated lists of base-compatible .td files split into 4 area buckets: BASE_SQL (22), BASE_KAFKA (35), BASE_LOAD_GENERATOR (12), BASE_RECOVERY (8); 72 unique files. |
| upsert_sources.py | Prototype workload helper: randomized Kafka UPSERT writes with an expected-state model. |
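
To make the td_runner.py row concrete, here is a minimal sketch of the "subprocess + tolerated-failure retry + reachable()" pattern. It is not the module's actual code: the marker strings, the `run_with_retry` name, and the local `reachable` placeholder are all assumptions made for illustration.

```python
import subprocess
import time

# Hypothetical: output substrings that mark a failure as tolerated
# (for example, errors caused by injected faults).
TOLERATED_MARKERS = ("connection refused", "connection reset")


def reachable(message, details):
    """Local placeholder for an SDK reachability assertion (an assumption)."""
    print(f"reachable: {message} {details}")


def run_with_retry(cmd, attempts=3, delay=0.0, on_reachable=reachable):
    """Run `cmd`; retry tolerated failures, raise on any other failure.

    Returns True on success and False if every attempt hit a tolerated failure.
    """
    for attempt in range(1, attempts + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            on_reachable("td run succeeded", {"attempt": attempt})
            return True
        output = (proc.stdout + proc.stderr).lower()
        if any(marker in output for marker in TOLERATED_MARKERS):
            on_reachable("td run hit tolerated failure", {"attempt": attempt})
            time.sleep(delay)
            continue
        raise RuntimeError(f"untolerated failure (exit {proc.returncode}): {output}")
    return False
```

Recording reachability on both the success path and the tolerated-failure path mirrors the table's description; an untolerated failure surfaces as an exception instead of being retried.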

Why per-scenario configs

antithesis/scratchbook/scenario-strategy.md documents the design with citations to the Antithesis docs (snouty docs CLI). Short version: incompatible topologies (e.g. base SQL+Kafka vs MySQL CDC with multithreaded replicas) get separate config dirs and separate snouty launch invocations; incompatible workloads on the same topology coexist as multiple test templates inside one config (Antithesis selects exactly one template per execution history).

The base scenario is the only one wired up here; mysql_mt_replicas for the SS-95 ticket is the planned next addition.

Why no per-area eventually_* recovery checks

Earlier drafts of this branch had per-area eventually_* commands. They either reduced to tautologies (SELECT count(*) >= 0 always passes if pgwire is up) or to a generic CREATE/INSERT/SELECT/DROP round-trip that was only weakly correlated with the chosen .td's actual semantics. When the singleton picks a random .td you can't write a useful recovery property without knowing what state was created.

Real recovery properties belong either in SUT-side Rust assertions or in scenario-specific eventually_* commands tied to a specific workload — which is exactly the shape upsert_sources/eventually_* already has (it writes a sentinel and waits for it). The scratchbook entry warns against re-adding generic per-area recovery checks.
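
The "writes a sentinel and waits for it" shape can be sketched as a small polling loop. This is an assumption-laden illustration, not the code in upsert_sources/eventually_*: `write_sentinel` and `read_visible` are hypothetical callables standing in for, say, a Kafka produce and a query against Materialize.

```python
import time
import uuid


def wait_for_sentinel(write_sentinel, read_visible, timeout=30.0, interval=0.5):
    """Write a unique sentinel, then poll until it is visible or the timeout passes."""
    sentinel = f"sentinel-{uuid.uuid4().hex}"
    write_sentinel(sentinel)
    deadline = time.monotonic() + timeout
    while True:
        # Always check at least once, even with a very small timeout.
        if sentinel in read_visible():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Because the sentinel is created by the same workload that checks for it, the property is tied to known state, which is the distinction the paragraph above draws against generic per-area checks.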

Upstream fixes pulled in

These could ideally be split out as their own PRs. They were prerequisites for the Antithesis work to function and are limited in scope.

  • misc/python/materialize/cli/mzcompose.py — the shtab Enum-choices workaround broke --arch and --sanitizer on Python 3.13. argparse's post-conversion member in choices check failed because choices were member names while type=Enum returned member objects. Switched to list(action.choices) (Enum members) so argparse and shtab are both happy. Includes the regenerated bash/zsh shell completions.

  • misc/python/materialize/mzbuild.py — the Copy pre-image plugin was unused upstream and crashed when used: Copy.inputs() returned paths relative to the source dir, but the mzbuild fingerprinter expected paths relative to the repo root, so os.lstat(rd.root / rel_path) hit FileNotFoundError. Made Copy.inputs() repo-root-relative and Copy.run() strip the source prefix before computing the destination inside the build context.

  • misc/python/materialize/mzcompose/service.py — added container_name to the ServiceConfig TypedDict (a real Compose field). Antithesis requires it for log/fault attribution.

  • ci/test/lint-main/checks/check-mzcompose-files.sh — exclude antithesis/configs/*/mzcompose.py from the "unused in any CI pipeline file" check; Antithesis runs are submitted via snouty, not Buildkite.

  • ci/builder/requirements.txt — added antithesis==0.2.0 so the SDK resolves locally for type-checking and ad-hoc imports.
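
The argparse mismatch from the first bullet can be reproduced in isolation. The sketch below uses a hypothetical `Arch` enum rather than the repository's real definitions, and it only demonstrates the membership check itself: when `type` returns an Enum member, `choices` must contain members, not member names.

```python
import argparse
from enum import Enum


class Arch(Enum):
    # Hypothetical enum for illustration; not the repository's actual definition.
    X86_64 = "x86_64"
    AARCH64 = "aarch64"


def build_parser(choices):
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--arch", type=Arch, choices=choices)
    return parser


def parses(parser, argv):
    """Return the parsed --arch value, or None if argparse rejects the input."""
    try:
        return parser.parse_args(argv).arch
    except SystemExit:
        return None


# Choices as member names: the converted value is an Arch member, which never
# equals a plain string, so argparse's membership check rejects it.
broken = build_parser([member.name for member in Arch])

# Choices as Enum members: the converted value is found in the list.
fixed = build_parser(list(Arch))
```

With these two parsers, `parses(fixed, ["--arch", "x86_64"])` yields the `Arch.X86_64` member, while the same arguments against `broken` are rejected.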

Known limitations

  • On Apple Silicon, snouty validate antithesis/configs/base fails end-to-end because the amd64 materialized image's clusterd child segfaults under Rosetta during lgalloc init (unix_wait_status(11) → container Exited (139)). Run validate on Linux/x86 instead. Documented in antithesis/AGENTS.md.

  • On Apple Silicon, bin/mzimage acquire --arch x86_64 currently fails to link with ld.lld: error: undefined symbol: getauxval. The materializeinc/crosstools/x86_64-unknown-linux-gnu homebrew formula ships glibc 2.12.1, but Rust 1.95's stdlib references getauxval, which needs glibc 2.16+. Workaround: run the build through ci-builder, which uses the Docker builder's current glibc:

    bin/ci-builder run stable bin/mzimage acquire --arch x86_64 antithesis-test-driver

    The homebrew formula needs an upstream update; CI is unaffected (Linux hosts route through ci-builder by default).

Next steps

  • SS-95: mysql_mt_replicas scenario per scratchbook/test-driver-integration.md.
  • Tier 2 scenarios: pg_cdc, mysql_cdc, sql_server_cdc, s3_copy.
  • Tier 3 structural refactors: parallel-workload regression scenario (gate Database.create() Kafka/CSR/AWS/PG/MySQL/SQLServer/Iceberg CREATE CONNECTIONs on flags), zippy execution adapter.
  • Per-scenario template gating via ANTITHESIS_SCENARIO env in the test-driver entrypoint, so a scenario only sees its compatible templates.
  • SUT-side Rust assertions where they're justified (rare/dangerous internal states, branch outcomes).
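
One possible shape for the planned ANTITHESIS_SCENARIO gating is sketched below. Nothing here exists in the branch: the mapping, the `allowed_templates` helper, and the choice of "base" as the default are assumptions, and the `mysql_mt_replicas` entry is a deliberately empty placeholder.

```python
import os

# Hypothetical mapping from scenario name to the templates it may run.
SCENARIO_TEMPLATES = {
    "base": {
        "testdrive_sql",
        "testdrive_kafka",
        "testdrive_load_generator",
        "testdrive_recovery",
        "upsert_sources",
    },
    "mysql_mt_replicas": set(),  # placeholder; no templates are defined yet
}


def allowed_templates(env=None):
    """Return the template names permitted for the scenario named in `env`."""
    env = os.environ if env is None else env
    scenario = env.get("ANTITHESIS_SCENARIO", "base")  # default is an assumption
    if scenario not in SCENARIO_TEMPLATES:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return SCENARIO_TEMPLATES[scenario]
```

Failing loudly on an unknown scenario name is a design choice for this sketch; an entrypoint could instead fall back to running no templates.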

Tests

Lint passes locally (bin/lint); the audited test commands import cleanly under bin/pyactivate. The compose YAML validates via docker compose config --quiet. End-to-end snouty validate is blocked on the local toolchain limitations above.

🤖 Generated with Claude Code

…ive corpus

Brings up the Antithesis Test Composer harness for Materialize. The branch
covers research artifacts, the test-driver mzbuild image, scenario
infrastructure, workload helpers (testdrive runner, corpus lists, the
upsert-sources prototype), and a few small upstream fixes that were
prerequisites.

Layout
------

    antithesis/
      AGENTS.md                          directory map + scenarios table
      scratchbook/                       SUT analysis, deployment topology,
                                         test-driver integration plan,
                                         scenario strategy, existing-assertions
                                         inventory
      configs/<scenario>/
        mzcompose.py                     source-of-truth composition
        docker-compose.yaml              generated artifact (snouty consumes)
      bin/render-compose-yaml.py         renders configs/<scenario>/docker-compose.yaml
                                         from mzcompose.py and layers on
                                         platform/hostname/container_name/
                                         NO_COLOR Antithesis attributes
      test-driver/                       mzbuild image: MZFROM testdrive +
                                         Python + Antithesis SDK + the workload
                                         tree at /opt/antithesis/test/v1/ +
                                         curated test/testdrive corpus at
                                         /opt/materialize/td/testdrive/
      test/v1/
        helper_bootstrap.py              shared sys.path injector
        testdrive_{sql,kafka,load_generator,recovery}/
                                         area templates; each picks a random
                                         .td from materialize.antithesis.testdrive_corpus
        upsert_sources/                  randomized helper-driven upsert
                                         workload with expected-state model

    misc/python/materialize/antithesis/
      sdk.py                             SDK wrapper with local fallbacks; the
                                         antithesis package coexists with our
                                         scaffolding directory because both
                                         are namespace packages
      testdrive_config.py                shared TestdriveConfig dataclass
      td_runner.py                       generic .td runner: subprocess +
                                         tolerated-failure retry + reachable()
                                         on success and on tolerated failure
      testdrive_corpus.py                curated lists of base-compatible .td
                                         files split into 4 area buckets
                                         (BASE_SQL=22, BASE_KAFKA=35,
                                         BASE_LOAD_GENERATOR=12,
                                         BASE_RECOVERY=8 — 72 unique files)
      upsert_sources.py                  prototype workload helper

Why per-scenario configs
------------------------

`antithesis/scratchbook/scenario-strategy.md` documents the design with
citations to the Antithesis docs (snouty docs CLI). In short: incompatible
topologies (e.g. base SQL+Kafka vs MySQL CDC with multithreaded replicas)
get separate config dirs and separate `snouty launch` invocations;
incompatible workloads on the same topology coexist as multiple test
templates inside one config (Antithesis selects exactly one template per
execution history). The base scenario is the only one wired up here;
`mysql_mt_replicas` for the SS-95 ticket is the planned next addition.

Why no per-area eventually_* recovery checks
--------------------------------------------

Earlier drafts of this branch had per-area `eventually_*` commands. They
either reduced to tautologies (`SELECT count(*) >= 0` always passes if
pgwire is up) or to a generic CREATE/INSERT/SELECT/DROP round-trip that
was only weakly correlated with the chosen .td's actual semantics. When
the singleton picks a random .td you can't write a useful recovery
property without knowing what state was created. Real recovery
properties belong either in SUT-side Rust assertions or in
scenario-specific `eventually_*` commands tied to a specific workload —
which is exactly the shape `upsert_sources/eventually_*` already has
(it writes a sentinel and waits for it). The scratchbook entry warns
against re-adding generic per-area recovery checks.

Upstream fixes pulled in (could be split out later)
---------------------------------------------------

* misc/python/materialize/cli/mzcompose.py — the shtab-Enum-choices
  workaround broke `--arch` and `--sanitizer` on Python 3.13. Argparse's
  post-conversion `member in choices` check failed because choices were
  member names while `type=Enum` returned member objects. Switched to
  `list(action.choices)` (Enum members) so argparse and shtab are both
  happy. Includes the regenerated bash/zsh shell completions.

* misc/python/materialize/mzbuild.py — the Copy pre-image plugin was
  unused upstream and crashed when used: `Copy.inputs()` returned paths
  relative to the source dir, but the mzbuild fingerprinter expected
  paths relative to the repo root, so `os.lstat(rd.root / rel_path)` hit
  FileNotFoundError. Made `Copy.inputs()` repo-root-relative and
  `Copy.run()` strip the source prefix before computing the destination
  inside the build context.

* misc/python/materialize/mzcompose/service.py — added `container_name`
  to `ServiceConfig` (a real Compose field). Antithesis requires it for
  log/fault attribution.

* ci/test/lint-main/checks/check-mzcompose-files.sh — exclude
  `antithesis/configs/*/mzcompose.py` from the 'unused in any CI
  pipeline file' check; Antithesis runs are submitted via snouty, not
  Buildkite.

* ci/builder/requirements.txt — added `antithesis==0.2.0` so the SDK
  resolves locally for type-checking and ad-hoc imports. Coexists with
  our `antithesis/` scaffolding directory because both are namespace
  packages and the merge picks up submodules from site-packages.

Known limitations
-----------------

* On Apple Silicon, `snouty validate antithesis/configs/base` fails
  end-to-end because the amd64 `materialized` image's `clusterd` child
  segfaults under Rosetta during lgalloc init
  (`unix_wait_status(11)` -> container `Exited (139)`). Run validate on
  Linux/x86 instead. Documented in antithesis/AGENTS.md.

* On Apple Silicon, `bin/mzimage acquire --arch x86_64` currently fails
  to link with `ld.lld: error: undefined symbol: getauxval`: the
  `materializeinc/crosstools/x86_64-unknown-linux-gnu` homebrew formula
  ships glibc 2.12.1, but Rust 1.95's stdlib references getauxval which
  needs glibc 2.16+. Workaround:
  `bin/ci-builder run stable bin/mzimage acquire --arch x86_64 antithesis-test-driver`
  uses the Docker builder's current glibc. The homebrew formula needs an
  upstream update; CI is unaffected (Linux hosts route through ci-builder
  by default).

Next steps
----------

* SS-95: `mysql_mt_replicas` scenario per scratchbook/test-driver-integration.md
* Tier 2 scenarios: pg_cdc, mysql_cdc, sql_server_cdc, s3_copy
* Tier 3 structural refactors: parallel-workload regression scenario
  (gate Database.create() Kafka/CSR/AWS/PG/MySQL/SQLServer/Iceberg
  CONNECTIONs on flags), zippy execution adapter
* Per-scenario template gating via `ANTITHESIS_SCENARIO` env in the
  test-driver entrypoint, so a scenario only sees its compatible
  templates
* SUT-side Rust assertions where they're justified (rare/dangerous
  internal states, branch outcomes)