bazel: add explicit rust test shard labels by bolinfest · Pull Request #17998 · openai/codex

bolinfest · 2026-04-15T20:53:19Z

Why

The larger Rust test targets are expensive as single Bazel test actions. Native Bazel sharding helps split execution, but it still reports through the same test target label. That is not enough for the Codex workflow: we want BuildBuddy to show timing and flakiness per shard label, and we want a failed or flaky shard to be rerunnable as a concrete Bazel target instead of treating the whole aggregate test as one opaque target.

This PR therefore moves the selected large Rust tests to explicit generated shard targets. It also incorporates the rules_rust behavior from hermeticbuild/rules_rust#14 so each test name maps to a stable shard bucket by hash, rather than by list position.
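The difference between hashing by name and bucketing by list position can be sketched in a few lines of Python. This is an illustration only, not the rules_rust implementation: the choice of 32-bit FNV-1a, the helper names, and the sample test names are all assumptions made for the example.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash (assumed variant, chosen for illustration)."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def shard_by_hash(test_names, total_shards):
    """Map each test name to a shard using only the name itself."""
    return {n: fnv1a_32(n.encode("utf-8")) % total_shards for n in test_names}


def shard_by_position(test_names, total_shards):
    """Map each test name to a shard using its index in the sorted list."""
    return {n: i % total_shards for i, n in enumerate(sorted(test_names))}


def moved(before, after):
    """Names present in both assignments whose shard changed."""
    return sorted(n for n in before if n in after and before[n] != after[n])


# Hypothetical test names; adding "a::alpha" shifts every later list position.
tests = ["a::one", "b::two", "c::three", "d::four"]
grown = ["a::alpha"] + tests

print(moved(shard_by_hash(tests, 8), shard_by_hash(grown, 8)))
print(moved(shard_by_position(tests, 8), shard_by_position(grown, 8)))
```

With name hashing, adding a test cannot move any existing test to a different shard, because each bucket depends only on that test's own name. With positional bucketing, every test sorted after the new one shifts.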

What

Extends codex_rust_crate with test_shard_counts.
For configured targets, keeps the original test target name as a test_suite aggregate and generates one concrete test rule per shard.
For unit tests, keeps the existing workspace_root_test wrapper shape around one *-unit-tests-bin rust_test.
For integration tests, compiles one manual *-all-test-bin rust_test and makes each shard label a lightweight test_binary_test wrapper around that binary. This preserves distinct labels without compiling the Rust test crate once per shard.
Sets RULES_RUST_TEST_TOTAL_SHARDS / RULES_RUST_TEST_SHARD_INDEX on each generated shard target so the rules_rust wrapper can run the correct subset without using Bazel-reserved TEST_* env vars.
Adds patches/rules_rust_stable_explicit_test_shards.patch, which mirrors hermeticbuild/rules_rust#14 ("rust_test: shard by stable name hash") until Codex can bump to a merged rules_rust commit containing that support.
Adds a Windows manifest fallback for nested test launchers so the rules_rust sharding wrapper can find the real test binary when it is run from another test rule's runfiles tree.
Uses explicit decimal UInt64 constants in the Windows PowerShell FNV hash expression so the 32-bit mask cannot be interpreted as -1.
Uses TEST_TMPDIR plus a per-wrapper temp directory in the Windows sharding wrapper so parallel shards do not collide on shared %TEMP%\rust_test_list_*.txt files.
Configures 8 shards for:
- //codex-rs/core:core-all-test
- //codex-rs/core:core-unit-tests
- //codex-rs/app-server:app-server-all-test
- //codex-rs/app-server:app-server-unit-tests
- //codex-rs/tui:tui-unit-tests
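The mask item in the list above is easier to see with numbers. The following Python sketch models what goes wrong when a 32-bit mask is read as a signed value of -1 while the hash arithmetic happens in a wider integer type. It is a model of the failure mode, not the PowerShell code from the patch; the FNV-1a-style step and the helper names are assumptions.

```python
import struct

MASK_U32 = 4294967295  # 0xFFFFFFFF written as an explicit decimal constant


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def fnv_step(h: int, byte: int, mask: int) -> int:
    """One FNV-1a-style step: xor in a byte, multiply by the prime, apply mask."""
    return ((h ^ byte) * 0x01000193) & mask


# What the mask becomes if the literal is parsed as a signed 32-bit value.
broken_mask = as_int32(0xFFFFFFFF)

good = fnv_step(0x811C9DC5, ord("a"), MASK_U32)
bad = fnv_step(0x811C9DC5, ord("a"), broken_mask)

print(broken_mask)       # -1
print(hex(good))         # 0xe40c292c
print(bad > 0xFFFFFFFF)  # True: AND with -1 keeps every bit, so nothing is truncated
```

Because `x & -1 == x`, a mask that collapses to -1 silently stops truncating the hash to 32 bits, so different platforms could disagree on which shard a test belongs to.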

Example Labels

The aggregate label remains available:

//codex-rs/core:core-all-test

but it now expands to explicit shard labels:

//codex-rs/core:core-all-test-shard-1-of-8
//codex-rs/core:core-all-test-shard-2-of-8
...
//codex-rs/core:core-all-test-shard-8-of-8

For integration tests, those shard labels point at one compiled test binary:

rust_test(
    name = "core-all-test-bin",
    experimental_enable_sharding = True,
    tags = ["manual", "no-sandbox"],
    ...
)

test_binary_test(
    name = "core-all-test-shard-1-of-8",
    test_bin = ":core-all-test-bin",
    env = {
        "RULES_RUST_TEST_TOTAL_SHARDS": "8",
        "RULES_RUST_TEST_SHARD_INDEX": "0",
    },
    tags = ["no-sandbox"],
)

That means BuildBuddy sees labels such as //codex-rs/core:core-all-test-shard-8-of-8, but Bazel still has only one underlying rust_test rule for the integration test binary: //codex-rs/core:core-all-test-bin.

Unit tests use the same explicit shard label pattern while still running through the workspace-root launcher:

//codex-rs/core:core-unit-tests
//codex-rs/core:core-unit-tests-bin
//codex-rs/core:core-unit-tests-shard-1-of-8
//codex-rs/core:core-unit-tests-shard-8-of-8

The label is one-indexed for readability (shard-1-of-8), while the env value is the zero-indexed shard index consumed by the wrapper.
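The naming and indexing convention can be summarized with a small Python sketch. `shard_targets` is a hypothetical helper written for this example; it is not the Starlark macro inside `codex_rust_crate`, and it only reproduces the label pattern and env values shown above.

```python
def shard_targets(name: str, total_shards: int):
    """Return (label_name, env) pairs for each explicit shard of `name`.

    Labels are one-indexed for readability; the env value is the
    zero-indexed shard index consumed by the wrapper.
    """
    targets = []
    for index in range(total_shards):
        label = f"{name}-shard-{index + 1}-of-{total_shards}"
        env = {
            "RULES_RUST_TEST_TOTAL_SHARDS": str(total_shards),
            "RULES_RUST_TEST_SHARD_INDEX": str(index),
        }
        targets.append((label, env))
    return targets


for label, env in shard_targets("core-all-test", 8):
    print(label, env["RULES_RUST_TEST_SHARD_INDEX"])
```

The last label, `core-all-test-shard-8-of-8`, therefore carries `RULES_RUST_TEST_SHARD_INDEX` set to `"7"`.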

Verification

just bazel-lock-check
bazel query 'kind("rust_test rule", //codex-rs/core:*)'
bazel query 'kind(".* rule", //codex-rs/core:core-all-test-shard-8-of-8 + //codex-rs/core:core-all-test-bin + //codex-rs/app-server:app-server-all-test-shard-8-of-8 + //codex-rs/tui:tui-unit-tests-bin + //codex-rs/tui:tui-unit-tests-shard-1-of-8)' --output=build
bazel test --test_output=errors //codex-rs/core:core-all-test-shard-1-of-8 //codex-rs/core:core-all-test-shard-8-of-8 //codex-rs/tui:tui-unit-tests-shard-1-of-8
bazel test --test_output=errors //codex-rs/core:core-all-test-shard-7-of-8
bazel test --test_output=errors //codex-rs/app-server:app-server-unit-tests-shard-3-of-8 //codex-rs/core:core-all-test-shard-7-of-8

Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets.

For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard.

Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, Windows manifest fallback, Windows-safe PowerShell UInt32 masking, and isolated Windows shard temp files from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it.

Co-authored-by: Codex <noreply@openai.com>

## Why

The large Rust test suites are slow and include some of our flakiest tests, so we want to run them with Bazel native sharding while keeping shard membership stable between runs.

This is the simpler follow-up to the explicit-label experiment in #17998. Since #18397 upgraded Codex to `rules_rs` `0.0.58`, which includes the stable test-name hashing support from hermeticbuild/rules_rust#14, this PR only needs to wire Codex's Bazel macros into that support.

Using native sharding preserves BuildBuddy's sharded-test UI and Bazel's per-shard test action caching. Using stable name hashing avoids reshuffling every test when one test is added or removed.

## What Changed

`codex_rust_crate` now accepts `test_shard_counts` and applies the right Bazel/rules_rust attributes to generated unit and integration test rules. Matched tests are also marked `flaky = True`, giving them Bazel's default three attempts.

This PR shards these labels 8 ways:

```text
//codex-rs/core:core-all-test
//codex-rs/core:core-unit-tests
//codex-rs/app-server:app-server-all-test
//codex-rs/app-server:app-server-unit-tests
//codex-rs/tui:tui-unit-tests
```

## Verification

`bazel query --output=build` over the selected public labels and their inner unit-test binaries confirmed the expected `shard_count = 8`, `flaky = True`, and `experimental_enable_sharding = True` attributes.

Also verified that we see the shards as expected in BuildBuddy so they can be analyzed independently.

Co-authored-by: Codex <noreply@openai.com>

bolinfest · 2026-04-21T16:35:34Z

#18082 took care of this

bolinfest force-pushed the pr17998 branch 4 times, most recently from 58f942a to 94665c4 Compare April 15, 2026 21:57

bolinfest changed the title ~~bazel: shard core rust tests~~ bazel: shard selected rust tests Apr 15, 2026

bolinfest force-pushed the pr17998 branch from 94665c4 to 90fa878 Compare April 15, 2026 22:51

bolinfest changed the title ~~bazel: shard selected rust tests~~ bazel: add explicit rust test shard labels Apr 15, 2026

bolinfest force-pushed the pr17998 branch 3 times, most recently from 86d1bd8 to decf801 Compare April 15, 2026 23:38

starr-openai approved these changes Apr 16, 2026

bolinfest force-pushed the pr17998 branch from decf801 to 67a769d Compare April 16, 2026 05:10

bolinfest mentioned this pull request Apr 16, 2026

bazel: use native rust test sharding #18082

Merged

bolinfest closed this Apr 21, 2026

bolinfest wants to merge 1 commit into main from pr17998
