Conversation
58f942a to
94665c4
Compare
86d1bd8 to
decf801
Compare
starr-openai
approved these changes
Apr 16, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets. For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard. Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, Windows manifest fallback, Windows-safe PowerShell UInt32 masking, and isolated Windows shard temp files from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it. Co-authored-by: Codex <noreply@openai.com>
bolinfest
added a commit
that referenced
this pull request
Apr 17, 2026
## Why The large Rust test suites are slow and include some of our flakiest tests, so we want to run them with Bazel native sharding while keeping shard membership stable between runs. This is the simpler follow-up to the explicit-label experiment in #17998. Since #18397 upgraded Codex to `rules_rs` `0.0.58`, which includes the stable test-name hashing support from hermeticbuild/rules_rust#14, this PR only needs to wire Codex's Bazel macros into that support. Using native sharding preserves BuildBuddy's sharded-test UI and Bazel's per-shard test action caching. Using stable name hashing avoids reshuffling every test when one test is added or removed. ## What Changed `codex_rust_crate` now accepts `test_shard_counts` and applies the right Bazel/rules_rust attributes to generated unit and integration test rules. Matched tests are also marked `flaky = True`, giving them Bazel's default three attempts. This PR shards these labels 8 ways: ```text //codex-rs/core:core-all-test //codex-rs/core:core-unit-tests //codex-rs/app-server:app-server-all-test //codex-rs/app-server:app-server-unit-tests //codex-rs/tui:tui-unit-tests ``` ## Verification `bazel query --output=build` over the selected public labels and their inner unit-test binaries confirmed the expected `shard_count = 8`, `flaky = True`, and `experimental_enable_sharding = True` attributes. Also verified that we see the shards as expected in BuildBuddy so they can be analyzed independently. Co-authored-by: Codex <noreply@openai.com>
Collaborator
Author
|
#18082 took care of this |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The larger Rust test targets are expensive as single Bazel test actions. Native Bazel sharding helps split execution, but it still reports through the same test target label. That is not enough for the Codex workflow: we want BuildBuddy to show timing and flakiness per shard label, and we want a failed or flaky shard to be rerunnable as a concrete Bazel target instead of treating the whole aggregate test as one opaque target.
This PR therefore moves the selected large Rust tests to explicit generated shard targets. It also incorporates the rules_rust behavior from hermeticbuild/rules_rust#14 so each test name maps to a stable shard bucket by hash, rather than by list position.
What
codex_rust_cratewithtest_shard_counts.test_suiteaggregate and generates one concrete test rule per shard.workspace_root_testwrapper shape around one*-unit-tests-binrust_test.*-all-test-binrust_testand makes each shard label a lightweighttest_binary_testwrapper around that binary. This preserves distinct labels without compiling the Rust test crate once per shard.RULES_RUST_TEST_TOTAL_SHARDS/RULES_RUST_TEST_SHARD_INDEXon each generated shard target so the rules_rust wrapper can run the correct subset without using Bazel-reservedTEST_*env vars.patches/rules_rust_stable_explicit_test_shards.patch, which mirrors rust_test: shard by stable name hash hermeticbuild/rules_rust#14 until Codex can bump to a mergedrules_rustcommit containing that support.UInt64constants in the Windows PowerShell FNV hash expression so the 32-bit mask cannot be interpreted as-1.TEST_TMPDIRplus a per-wrapper temp directory in the Windows sharding wrapper so parallel shards do not collide on shared%TEMP%\rust_test_list_*.txtfiles.//codex-rs/core:core-all-test//codex-rs/core:core-unit-tests//codex-rs/app-server:app-server-all-test//codex-rs/app-server:app-server-unit-tests//codex-rs/tui:tui-unit-testsExample Labels
The aggregate label remains available:
but it now expands to explicit shard labels:
For integration tests, those shard labels point at one compiled test binary:
That means BuildBuddy sees labels such as
//codex-rs/core:core-all-test-shard-8-of-8, but Bazel still has only one underlyingrust_testrule for the integration test binary://codex-rs/core:core-all-test-bin.Unit tests use the same explicit shard label pattern while still running through the workspace-root launcher:
The label is one-indexed for readability (
shard-1-of-8), while the env value is the zero-indexed shard index consumed by the wrapper.Verification
just bazel-lock-checkbazel query 'kind("rust_test rule", //codex-rs/core:*)'bazel query 'kind(".* rule", //codex-rs/core:core-all-test-shard-8-of-8 + //codex-rs/core:core-all-test-bin + //codex-rs/app-server:app-server-all-test-shard-8-of-8 + //codex-rs/tui:tui-unit-tests-bin + //codex-rs/tui:tui-unit-tests-shard-1-of-8)' --output=buildbazel test --test_output=errors //codex-rs/core:core-all-test-shard-1-of-8 //codex-rs/core:core-all-test-shard-8-of-8 //codex-rs/tui:tui-unit-tests-shard-1-of-8bazel test --test_output=errors //codex-rs/core:core-all-test-shard-7-of-8bazel test --test_output=errors //codex-rs/app-server:app-server-unit-tests-shard-3-of-8 //codex-rs/core:core-all-test-shard-7-of-8