antithesis: scaffold harness, multi-scenario layout, broadened testdrive corpus #36437
Draft
DAlperin wants to merge 1 commit into MaterializeInc:main from
…ive corpus
Brings up the Antithesis Test Composer harness for Materialize. The branch
covers research artifacts, the test-driver mzbuild image, scenario
infrastructure, workload helpers (testdrive runner, corpus lists, the
upsert-sources prototype), and a few small upstream fixes that were
prerequisites.
Layout
------
antithesis/
AGENTS.md directory map + scenarios table
scratchbook/ SUT analysis, deployment topology,
test-driver integration plan,
scenario strategy, existing-assertions
inventory
configs/<scenario>/
mzcompose.py source-of-truth composition
docker-compose.yaml generated artifact (snouty consumes)
bin/render-compose-yaml.py renders configs/<scenario>/docker-compose.yaml
from mzcompose.py and layers on
platform/hostname/container_name/
NO_COLOR Antithesis attributes
test-driver/ mzbuild image: MZFROM testdrive +
Python + Antithesis SDK + the workload
tree at /opt/antithesis/test/v1/ +
curated test/testdrive corpus at
/opt/materialize/td/testdrive/
test/v1/
helper_bootstrap.py shared sys.path injector
testdrive_{sql,kafka,load_generator,recovery}/
area templates; each picks a random
.td from materialize.antithesis.testdrive_corpus
upsert_sources/ randomized helper-driven upsert
workload with expected-state model
misc/python/materialize/antithesis/
sdk.py SDK wrapper with local fallbacks; the
antithesis package coexists with our
scaffolding directory because both
are namespace packages
testdrive_config.py shared TestdriveConfig dataclass
td_runner.py generic .td runner: subprocess +
tolerated-failure retry + reachable()
on success and on tolerated failure
testdrive_corpus.py curated lists of base-compatible .td
files split into 4 area buckets
(BASE_SQL=22, BASE_KAFKA=35,
BASE_LOAD_GENERATOR=12,
BASE_RECOVERY=8 — 72 unique files)
upsert_sources.py prototype workload helper
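
The `td_runner.py` behaviour listed above (subprocess + tolerated-failure retry + `reachable()` on success and on tolerated failure) could look roughly like the sketch below. This is an illustration under stated assumptions, not the branch's code: `run_td`, `TOLERATED_MARKERS`, the retry parameters, and the local `reachable` stub are all hypothetical names, and the real runner presumably routes `reachable()` through the SDK wrapper in `sdk.py`.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical: output substrings treated as fault-induced, tolerated failures.
TOLERATED_MARKERS = ("connection refused", "connection reset")


def reachable(message: str) -> None:
    """Local stand-in for the SDK wrapper's reachability marker (assumption)."""
    print(f"reachable: {message}")


def run_td(
    cmd: Sequence[str],
    retries: int = 3,
    delay_s: float = 0.0,
    mark: Callable[[str], None] = reachable,
) -> bool:
    """Run cmd, retrying tolerated failures.

    Returns True on success and False if every attempt failed in a tolerated
    way. Raises RuntimeError on the first failure that is not tolerated.
    """
    for attempt in range(1, retries + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            mark("td succeeded")
            return True
        output = (proc.stdout + proc.stderr).lower()
        if not any(marker in output for marker in TOLERATED_MARKERS):
            raise RuntimeError(
                f"untolerated failure on attempt {attempt}: {output.strip()}"
            )
        # Tolerated failures are still recorded, so both outcomes stay visible.
        mark("td hit a tolerated failure")
        time.sleep(delay_s)
    return False
```

Marking both the success path and the tolerated-failure path means a run where every `.td` fails tolerably is distinguishable from one where the runner never executed at all.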
Why per-scenario configs
------------------------
`antithesis/scratchbook/scenario-strategy.md` documents the design with
citations to the Antithesis docs (snouty docs CLI). In short: incompatible
topologies (e.g. base SQL+Kafka vs MySQL CDC with multithreaded replicas)
get separate config dirs and separate `snouty launch` invocations;
incompatible workloads on the same topology coexist as multiple test
templates inside one config (Antithesis selects exactly one template per
execution history). The base scenario is the only one wired up here;
`mysql_mt_replicas` for the SS-95 ticket is the planned next addition.
Why no per-area eventually_* recovery checks
--------------------------------------------
Earlier drafts of this branch had per-area `eventually_*` commands. They
either reduced to tautologies (`SELECT count(*) >= 0` always passes if
pgwire is up) or to a generic CREATE/INSERT/SELECT/DROP round-trip that
was only weakly correlated with the chosen .td's actual semantics. When
the singleton picks a random .td you can't write a useful recovery
property without knowing what state was created. Real recovery
properties belong either in SUT-side Rust assertions or in
scenario-specific `eventually_*` commands tied to a specific workload —
which is exactly the shape `upsert_sources/eventually_*` already has
(it writes a sentinel and waits for it). The scratchbook entry warns
against re-adding generic per-area recovery checks.
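
To make the contrast concrete: a check like `SELECT count(*) >= 0` only proves the connection works, whereas a sentinel check proves that a specific write became visible. The sketch below shows the sentinel shape in isolation. It is a hedged illustration, not the `upsert_sources/eventually_*` implementation: `wait_for_sentinel` and its parameters are hypothetical, and `read_values` stands in for whatever query the real command issues.

```python
import time
from typing import Callable, Iterable


def wait_for_sentinel(
    read_values: Callable[[], Iterable[str]],
    sentinel: str,
    timeout_s: float = 5.0,
    poll_s: float = 0.01,
) -> bool:
    """Poll read_values() until the sentinel is visible or the timeout elapses.

    Connection errors are swallowed while polling, on the assumption that the
    system under test may still be recovering from an injected fault.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if sentinel in set(read_values()):
                return True
        except ConnectionError:
            pass  # still recovering; keep polling until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Because the property is tied to a value the workload itself wrote, a pass says something about the workload's state, which a generic round-trip against a randomly chosen `.td` cannot.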
Upstream fixes pulled in (could be split out later)
---------------------------------------------------
* misc/python/materialize/cli/mzcompose.py — the shtab-Enum-choices
workaround broke `--arch` and `--sanitizer` on Python 3.13. Argparse's
post-conversion `member in choices` check failed because choices were
member names while `type=Enum` returned member objects. Switched to
`list(action.choices)` (Enum members) so argparse and shtab are both
happy. Includes the regenerated bash/zsh shell completions.
* misc/python/materialize/mzbuild.py — the Copy pre-image plugin was
unused upstream and crashed when used: `Copy.inputs()` returned paths
relative to the source dir, but the mzbuild fingerprinter expected
paths relative to the repo root, so `os.lstat(rd.root / rel_path)` hit
FileNotFoundError. Made `Copy.inputs()` repo-root-relative and
`Copy.run()` strip the source prefix before computing the destination
inside the build context.
* misc/python/materialize/mzcompose/service.py — added `container_name`
to `ServiceConfig` (a real Compose field). Antithesis requires it for
log/fault attribution.
* ci/test/lint-main/checks/check-mzcompose-files.sh — exclude
`antithesis/configs/*/mzcompose.py` from the 'unused in any CI
pipeline file' check; Antithesis runs are submitted via snouty, not
Buildkite.
* ci/builder/requirements.txt — added `antithesis==0.2.0` so the SDK
resolves locally for type-checking and ad-hoc imports. Coexists with
our `antithesis/` scaffolding directory because both are namespace
packages and the merge picks up submodules from site-packages.
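
The argparse mismatch in the first bullet can be reproduced in isolation. The sketch below is a minimal stand-alone demo, not the mzcompose code: the `Arch` enum and the helper names are made up for illustration. argparse applies `type` first and then tests the converted value against `choices`, so string choices never match an Enum member.

```python
import argparse
import enum


class Arch(enum.Enum):
    # Hypothetical enum for the demo; not the real mzcompose definition.
    X86_64 = "x86_64"
    AARCH64 = "aarch64"


def parse_arch(choices, argv):
    """Parse --arch with type=Arch and the given choices."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--arch", type=Arch, choices=choices)
    return parser.parse_args(argv).arch


def try_parse(choices, argv):
    """Return the parsed member, or None if argparse rejected the input."""
    try:
        return parse_arch(choices, argv)
    except SystemExit:  # argparse reports the invalid choice and exits
        return None


# Choices as member names: the converted Enum member is not in the list.
names_result = try_parse([m.name for m in Arch], ["--arch", "x86_64"])

# Choices as Enum members: the converted member matches.
members_result = try_parse(list(Arch), ["--arch", "x86_64"])
```

Here `names_result` is `None` because the membership check compares an `Arch` member against plain strings, while `members_result` is `Arch.X86_64`.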
Known limitations
-----------------
* On Apple Silicon, `snouty validate antithesis/configs/base` fails
end-to-end because the amd64 `materialized` image's `clusterd` child
segfaults under Rosetta during lgalloc init
(`unix_wait_status(11)` -> container `Exited (139)`). Run validate on
Linux/x86 instead. Documented in antithesis/AGENTS.md.
* On Apple Silicon, `bin/mzimage acquire --arch x86_64` currently fails
to link with `ld.lld: error: undefined symbol: getauxval`: the
`materializeinc/crosstools/x86_64-unknown-linux-gnu` homebrew formula
ships glibc 2.12.1, but Rust 1.95's stdlib references getauxval which
needs glibc 2.16+. Workaround:
`bin/ci-builder run stable bin/mzimage acquire --arch x86_64 antithesis-test-driver`
uses the Docker builder's current glibc. The homebrew formula needs an
upstream update; CI is unaffected (Linux hosts route through ci-builder
by default).
Next steps
----------
* SS-95: `mysql_mt_replicas` scenario per scratchbook/test-driver-integration.md
* Tier 2 scenarios: pg_cdc, mysql_cdc, sql_server_cdc, s3_copy
* Tier 3 structural refactors: parallel-workload regression scenario
(gate Database.create() Kafka/CSR/AWS/PG/MySQL/SQLServer/Iceberg
CONNECTIONs on flags), zippy execution adapter
* Per-scenario template gating via `ANTITHESIS_SCENARIO` env in the
test-driver entrypoint, so a scenario only sees its compatible
templates
* SUT-side Rust assertions where they're justified (rare/dangerous
internal states, branch outcomes)
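
The planned `ANTITHESIS_SCENARIO` gating could be as small as a lookup in the test-driver entrypoint. The sketch below is one possible shape under loud assumptions: `SCENARIO_TEMPLATES`, `templates_for`, the `mysql_mt_replicas_workload` template name, and the default of `base` when the variable is unset are all hypothetical. Only the `base` template directory names come from the layout above.

```python
import os
from typing import Mapping, Tuple

# Hypothetical mapping from scenario name to compatible template directories.
SCENARIO_TEMPLATES = {
    "base": (
        "testdrive_sql",
        "testdrive_kafka",
        "testdrive_load_generator",
        "testdrive_recovery",
        "upsert_sources",
    ),
    # Placeholder entry; the real template name is not decided in this PR.
    "mysql_mt_replicas": ("mysql_mt_replicas_workload",),
}


def templates_for(env: Mapping[str, str] = os.environ) -> Tuple[str, ...]:
    """Return the templates a scenario may see, based on ANTITHESIS_SCENARIO.

    Falls back to "base" when the variable is unset (an assumption) and
    rejects unknown scenarios instead of silently exposing every template.
    """
    scenario = env.get("ANTITHESIS_SCENARIO", "base")
    try:
        return SCENARIO_TEMPLATES[scenario]
    except KeyError:
        raise ValueError(f"unknown ANTITHESIS_SCENARIO: {scenario!r}") from None
```

Failing loudly on an unknown scenario keeps a typo in the compose environment from quietly running incompatible templates.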
Brings up the Antithesis Test Composer harness for Materialize. The branch covers research artifacts, the `test-driver` mzbuild image, scenario infrastructure, workload helpers (testdrive runner, corpus lists, the `upsert_sources` prototype), and a few small upstream fixes that were prerequisites.

This is a draft because validation still needs to run on real x86 (see Known limitations) and we still need to wire `ANTITHESIS_REPOSITORY` into the launch pipeline. Functionally, the harness is ready to ship to Antithesis as the first base scenario.

What's in here
--------------
antithesis/

misc/python/materialize/antithesis/
* `sdk.py`: SDK wrapper with local fallbacks; the `antithesis` package coexists with our scaffolding directory because both are namespace packages.
* `testdrive_config.py`: shared `TestdriveConfig` dataclass (env-driven).
* `td_runner.py`: generic `.td` runner: subprocess + tolerated-failure retry + `reachable()` on success and on tolerated failure.
* `testdrive_corpus.py`: curated lists of base-compatible `.td` files split into 4 area buckets: `BASE_SQL` (22), `BASE_KAFKA` (35), `BASE_LOAD_GENERATOR` (12), `BASE_RECOVERY` (8); 72 unique files.
* `upsert_sources.py`: prototype workload helper.

Why per-scenario configs
------------------------
`antithesis/scratchbook/scenario-strategy.md` documents the design with citations to the Antithesis docs (snouty docs CLI). Short version: incompatible topologies (e.g. base SQL+Kafka vs MySQL CDC with multithreaded replicas) get separate config dirs and separate `snouty launch` invocations; incompatible workloads on the same topology coexist as multiple test templates inside one config (Antithesis selects exactly one template per execution history).

The base scenario is the only one wired up here; `mysql_mt_replicas` for the SS-95 ticket is the planned next addition.

Why no per-area `eventually_*` recovery checks
----------------------------------------------
Earlier drafts of this branch had per-area `eventually_*` commands. They either reduced to tautologies (`SELECT count(*) >= 0` always passes if pgwire is up) or to a generic CREATE/INSERT/SELECT/DROP round-trip that was only weakly correlated with the chosen `.td`'s actual semantics. When the singleton picks a random `.td` you can't write a useful recovery property without knowing what state was created.

Real recovery properties belong either in SUT-side Rust assertions or in scenario-specific `eventually_*` commands tied to a specific workload, which is exactly the shape `upsert_sources/eventually_*` already has (it writes a sentinel and waits for it). The scratchbook entry warns against re-adding generic per-area recovery checks.

Upstream fixes pulled in
------------------------
These could ideally be split out as their own PRs. They were prerequisites for the Antithesis work to function and are limited in scope.

* `misc/python/materialize/cli/mzcompose.py`: the `shtab` Enum-choices workaround broke `--arch` and `--sanitizer` on Python 3.13. argparse's post-conversion `member in choices` check failed because choices were member names while `type=Enum` returned member objects. Switched to `list(action.choices)` (Enum members) so argparse and shtab are both happy. Includes the regenerated bash/zsh shell completions.
* `misc/python/materialize/mzbuild.py`: the `Copy` pre-image plugin was unused upstream and crashed when used: `Copy.inputs()` returned paths relative to the source dir, but the mzbuild fingerprinter expected paths relative to the repo root, so `os.lstat(rd.root / rel_path)` hit `FileNotFoundError`. Made `Copy.inputs()` repo-root-relative and `Copy.run()` strip the source prefix before computing the destination inside the build context.
* `misc/python/materialize/mzcompose/service.py`: added `container_name` to the `ServiceConfig` TypedDict (a real Compose field). Antithesis requires it for log/fault attribution.
* `ci/test/lint-main/checks/check-mzcompose-files.sh`: exclude `antithesis/configs/*/mzcompose.py` from the "unused in any CI pipeline file" check; Antithesis runs are submitted via snouty, not Buildkite.
* `ci/builder/requirements.txt`: added `antithesis==0.2.0` so the SDK resolves locally for type-checking and ad-hoc imports.

Known limitations
-----------------
* On Apple Silicon, `snouty validate antithesis/configs/base` fails end-to-end because the amd64 `materialized` image's `clusterd` child segfaults under Rosetta during lgalloc init (`unix_wait_status(11)` → container `Exited (139)`). Run validate on Linux/x86 instead. Documented in `antithesis/AGENTS.md`.
* On Apple Silicon, `bin/mzimage acquire --arch x86_64` currently fails to link with `ld.lld: error: undefined symbol: getauxval`: the `materializeinc/crosstools/x86_64-unknown-linux-gnu` homebrew formula ships glibc 2.12.1, but Rust 1.95's stdlib references `getauxval`, which needs glibc 2.16+. Workaround: `bin/ci-builder run stable bin/mzimage acquire --arch x86_64 antithesis-test-driver` uses the Docker builder's current glibc. The homebrew formula needs an upstream update; CI is unaffected (Linux hosts route through `ci-builder` by default).

Next steps
----------
* SS-95: `mysql_mt_replicas` scenario per `scratchbook/test-driver-integration.md`.
* Tier 2 scenarios: `pg_cdc`, `mysql_cdc`, `sql_server_cdc`, `s3_copy`.
* Tier 3 structural refactors: `parallel-workload` regression scenario (gate `Database.create()` Kafka/CSR/AWS/PG/MySQL/SQLServer/Iceberg `CREATE CONNECTION`s on flags), `zippy` execution adapter.
* Per-scenario template gating via `ANTITHESIS_SCENARIO` env in the test-driver entrypoint, so a scenario only sees its compatible templates.

Tests
-----
Lint passes locally (`bin/lint`); the audited test commands `bin/pyactivate`-import cleanly. The compose YAML validates via `docker compose config --quiet`. End-to-end `snouty validate` is blocked on the local toolchain limitations above.

🤖 Generated with Claude Code