bazel: make build byte-for-byte reproducible #30187

Merged
travisdowns merged 2 commits into redpanda-data:dev from travisdowns:td-hermetic
Apr 23, 2026

@travisdowns (Member) commented Apr 16, 2026

Make the Bazel build of //:redpanda byte-for-byte reproducible across
builds on the same host (with different output bases or sandbox instances).

Five third-party dependencies were embedding sandbox-specific absolute paths
or timestamps into compiled artifacts. Each commit fixes one dependency:

  • hwloc: -ffile-prefix-map to normalize __FILE__ in inlined asserts,
    --runstatedir to avoid baking the sandbox install prefix into HWLOC_RUN_DIR
  • openssl: SOURCE_DATE_EPOCH=0 to pin the build timestamp, patch
    mkbuildinf.pl to strip $EXT_BUILD_ROOT from the compiler info string
  • libxml2: --sysconfdir=/etc to fix the XML catalog path to the standard
    Linux location (instead of a sandbox-derived prefix), -ffile-prefix-map for
    __FILE__ normalization
  • cranelift-codegen: patch make_isle_source_path_relative() to compute
    relative paths via common ancestor when strip_prefix fails (i.e., when
    OUT_DIR is not under CWD, as in sandboxed builds)
  • cranelift-assembler-x64: patch build.rs to relativize OUT_DIR paths
    in generated-files.rs before they get compiled into the library
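For the cranelift-codegen item, the fallback amounts to computing a relative path through the deepest directory the two paths share. The Rust sketch below is a minimal illustration, assuming both inputs are absolute, already-normalized paths (no `..` components, no symlinks to resolve); `relative_via_common_ancestor` is a hypothetical name, not the function in the actual patch.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: express `target` relative to `base`.
/// Assumes both paths are absolute and normalized (no `.`/`..`, no symlinks).
pub fn relative_via_common_ancestor(target: &Path, base: &Path) -> PathBuf {
    // Fast path: target lives under base, so strip_prefix succeeds.
    if let Ok(rest) = target.strip_prefix(base) {
        return rest.to_path_buf();
    }
    let t: Vec<Component> = target.components().collect();
    let b: Vec<Component> = base.components().collect();
    // Number of leading components shared by both paths (the common ancestor).
    let common = t.iter().zip(b.iter()).take_while(|(x, y)| x == y).count();
    let mut out = PathBuf::new();
    // Climb from base up to the common ancestor...
    for _ in common..b.len() {
        out.push("..");
    }
    // ...then descend into the remainder of target.
    for c in &t[common..] {
        out.push(c.as_os_str());
    }
    out
}
```

Because the result depends only on the components below the common ancestor, two sandboxes that differ only in an action-id directory yield the same relative string, which is what keeps the compiled-in path stable.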

Verified: building //:redpanda twice with separate --output_base dirs
produces bit-for-bit identical binaries.
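Checking the bit-for-bit result comes down to a byte comparison of the two outputs, for which `cmp` or a checksum is the usual tool. As a hedged sketch of that check in Rust, the helpers below (hypothetical names, not part of the PR) also report the first differing offset, which helps locate a leaked path or timestamp in a hex dump.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns `None` when `a` and `b` are byte-for-byte identical; otherwise the
/// offset of the first differing byte (or the shorter length, if one input is
/// a strict prefix of the other).
pub fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    if let Some(i) = a.iter().zip(b.iter()).position(|(x, y)| x != y) {
        return Some(i);
    }
    if a.len() == b.len() {
        None
    } else {
        Some(a.len().min(b.len()))
    }
}

/// Reads two build outputs (e.g. the same target produced under two
/// different `--output_base` directories) and compares them.
pub fn compare_files(a: &Path, b: &Path) -> io::Result<Option<usize>> {
    let bytes_a = fs::read(a)?;
    let bytes_b = fs::read(b)?;
    Ok(first_difference(&bytes_a, &bytes_b))
}
```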

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

Copilot AI review requested due to automatic review settings April 16, 2026 03:47

Copilot AI left a comment

Pull request overview

This PR aims to make Bazel builds of //:redpanda byte-for-byte reproducible on the same host by removing sandbox-specific timestamps and absolute paths from third-party build outputs.

Changes:

  • Pin/normalize build metadata for OpenSSL (timestamp + build-root path stripping) and apply the patch via http_archive.
  • Normalize embedded paths for libxml2 and hwloc via configure options and compiler prefix-map flags.
  • Patch cranelift build scripts to emit deterministic, relative paths; wire these patches through MODULE.bazel / lockfile updates.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Summary per file:

  • bazel/thirdparty/openssl.BUILD: Sets SOURCE_DATE_EPOCH to stabilize OpenSSL build metadata.
  • bazel/thirdparty/openssl-reproducible-buildinf.patch: Strips sandbox/build-root paths from OpenSSL mkbuildinf.pl output.
  • bazel/thirdparty/libxml2.BUILD: Attempts to prevent sandbox-derived paths via fixed sysconfdir and prefix-map flags.
  • bazel/thirdparty/hwloc.BUILD: Attempts to prevent sandbox-derived paths via fixed runstatedir and prefix-map flags.
  • bazel/thirdparty/cranelift-codegen-reproducible.patch: Adjusts cranelift codegen build script to compute relative paths when sandboxed.
  • bazel/thirdparty/cranelift-assembler-x64-reproducible.patch: Adjusts assembler build script to write relative paths into generated Rust code.
  • bazel/repositories.bzl: Applies the new OpenSSL patch when fetching the dependency.
  • MODULE.bazel: Adds crate annotations to apply cranelift patches under bzlmod.
  • MODULE.bazel.lock: Records repository rule attribute changes (patches/args) for reproducibility.

Comment thread bazel/thirdparty/libxml2.BUILD Outdated
Comment thread bazel/thirdparty/libxml2.BUILD Outdated
Comment thread bazel/thirdparty/hwloc.BUILD Outdated
Comment thread bazel/thirdparty/cranelift-codegen-reproducible.patch Outdated
Comment thread bazel/thirdparty/cranelift-assembler-x64-reproducible.patch Outdated
@vbotbuildovich
Collaborator

CI test results

test results on build#83219
  • test_status: FLAKY(PASS)
  • test_class: ShardPlacementTest
  • test_method: test_core_count_change
  • test_arguments: null
  • test_kind: integration
  • job_url: https://buildkite.com/redpanda/redpanda/builds/83219#019d94a0-6699-4521-be08-aa0703589bfb
  • passed: 10/11
  • reason: Test PASSES after retries. No significant increase in flaky rate (baseline=0.0000, p0=1.0000, reject_threshold=0.0100, adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)
  • test_history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShardPlacementTest&test_method=test_core_count_change

Contributor

do we need to repeat this for arm64 assembler?

Member Author

I think so.

Comment thread MODULE.bazel Outdated

# CORE-16110: make cranelift build scripts produce deterministic output.
# The cranelift-codegen patch can be removed after upgrading to a wasmtime
# version that includes 0694bba38 ("Strip prefixes in file names").

Contributor

this was merged a while ago and released. Can we just upgrade wasmtime?

Member Author

Oh, yes we can. Claude led me astray telling me it wasn't released yet, but I see that is just wrong.

Member Author

@rockwotj looks like it requires a bump from v32 to v42 at a minimum to get the fixes for both issues. I'm trying that out now, but is that the kind of jump you'd be OK with if all existing tests pass?

Contributor

yes, should be - I can double check the release notes and new config knobs to make sure.

Member Author

Here's an effort at this:

https://github.com/travisdowns/redpanda/tree/td-hermetic-wasmtime42

It was a bit convoluted due to this allocator mangling change. So there's this workaround to link in this empty file. There is an upstream fix in rules_rust but it requires switching to unstable rustc.

I'm wondering whether you think any of that is worth it, or whether this patch on v32 looks better now as the low-resistance, less-churn path.

Contributor

That change looks good. I wonder if there is a version there that doesn't require the allocator hack but does have the fix applied. I personally prefer the allocator hack with a big TODO over this level of hacks in code I really have no clue about, because the hack is fairly clear about what is going on.

Member Author

Actually I was able to remove the allocator hack with a newer rules_rust + a mangling option.

+# across different Bazel output bases and sandbox instances.
+my $ebr = $ENV{'EXT_BUILD_ROOT'} // '';
+if ($ebr ne '') {
+ $cflags =~ s/\Q$ebr\E\/?/./g;

Contributor

this feels sort of sketchy 😀

I suppose the alternative is to use the Bazel build for OpenSSL, which is undesirable for other reasons? I am not sure if there is a way to achieve this without the regex replace... Do you know which cflags in practice have the build root embedded?

Member Author

Just to be clear, since this also sketched me out: I checked, and this "cflags" isn't used for compilation (after this point). It's only used to embed a copy of cflags in the binary, probably for diagnostic output (e.g., so --version can tell you what flags you compiled with).

Contributor

Ah, a comment to that effect would be helpful TBH.

Member Author

Added a comment in the patch clarifying that the modified cflags are only embedded as a diagnostic string (shown by openssl version -a), not used for compilation.

@StephanDollberg
Member

Can we push on #28975? That gets rid of libxml.

@StephanDollberg
Member

to the sandbox-absolute include path

I don't understand this pattern. Why is the sandbox not identical everywhere?

@@ -0,0 +1,46 @@
--- a/build.rs 2026-04-15 18:44:30.930898497 -0400
+++ b/build.rs 2026-04-15 18:44:46.447314373 -0400

Member

Can't we just pin to a commit instead of a released version?

Member Author

In order to pick up the upstream fix? Probably.

Member Author

This is already in a released version upstream, see also my thread with Tyler.

@travisdowns
Member Author

I don't understand this pattern. Why is the sandbox not identical everywhere?

Apparently "no": the sandboxes have paths like <output_base>/sandbox/linux-sandbox/<action-id>/execroot/<workspace_name>/ where action-id is essentially a random number. There are apparently a few reasons why it can't use the fixed chroot approach, e.g., symlinks which point outside the sandbox, and ensuring paths into the sandbox are resolvable (e.g., for debugging with keep sandbox).
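Since `<action-id>` differs per sandbox, the same header reached through two sandboxes has two different absolute paths, and `-ffile-prefix-map=OLD=NEW` works by rewriting a leading `OLD` to `NEW` in the file names the compiler records. The sketch below models that rewrite as a plain string-prefix replacement rather than a path-component match; `sandbox_execroot` is a hypothetical helper built only from the path pattern quoted above, and it assumes the mapped root is the sandbox execroot.

```rust
/// Models the rewrite `-ffile-prefix-map=OLD=NEW` applies to recorded file
/// names such as __FILE__: a plain string-prefix match, not component aware.
pub fn remap_prefix(path: &str, old: &str, new: &str) -> String {
    match path.strip_prefix(old) {
        Some(rest) => format!("{new}{rest}"),
        None => path.to_string(),
    }
}

/// Hypothetical helper following the pattern
/// `<output_base>/sandbox/linux-sandbox/<action-id>/execroot/<workspace_name>`.
pub fn sandbox_execroot(output_base: &str, action_id: u64, workspace: &str) -> String {
    format!("{output_base}/sandbox/linux-sandbox/{action_id}/execroot/{workspace}")
}
```

Paths outside the mapped root (system headers, for example) pass through unchanged, so only the sandbox-specific part of each string is normalized.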

@travisdowns
Member Author

Addressed — added a comment to the patch clarifying that the cflags modified here are only embedded as a diagnostic string (shown by openssl version -a), not used for actual compilation.

@travisdowns (Member Author) left a comment

review comments

@StephanDollberg
Member

review comments

You wanted to push something here?

The hwloc configure_make build embeds sandbox-absolute paths into
compiled objects, making the output non-deterministic across builds
with different --output_base directories.

Two sources of path leakage:

1. Inlined assert() macros in hwloc headers (helper.h, plugins.h)
   expand __FILE__ to the sandbox-absolute include path, which ends
   up in .rodata of 7 object files and propagates into libhwloc.a.
   Fix: add -ffile-prefix-map=$EXT_BUILD_ROOT=. to CFLAGS/CXXFLAGS
   via the env dict, remapping the sandbox root to "." in all
   __FILE__ expansions.

2. Autoconf derives runstatedir from --prefix, which points into
   the sandbox install directory. This path gets compiled into
   topology-linux.o as a string literal.
   Fix: pass --runstatedir=/var/run/hwloc to configure, overriding
   the prefix-derived default with a fixed path.
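The second hwloc fix follows from how Autoconf chains its installation directories. Below is a hedged Rust model of the default derivation, assuming Autoconf 2.70 or later (where `--runstatedir` exists) and ignoring any other overrides; the function is hypothetical and only illustrates why a sandbox `--prefix` leaks into the result while an explicit value does not.

```rust
/// Sketch of Autoconf's default directory chain (Autoconf 2.70 and later):
///   localstatedir = ${prefix}/var
///   runstatedir   = ${localstatedir}/run
/// An explicit `--runstatedir=DIR` replaces the derived value outright.
pub fn runstatedir(prefix: &str, explicit: Option<&str>) -> String {
    match explicit {
        Some(dir) => dir.to_string(),
        None => {
            let localstatedir = format!("{prefix}/var");
            format!("{localstatedir}/run")
        }
    }
}
```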

With both fixes, libhwloc.a and all object files are bit-for-bit
identical across independent builds.
@travisdowns
Member Author

Rebased onto latest dev. Two of the original commits are no longer part of this PR:

What remains here is the hwloc + openssl reproducibility fixes. The openssl commit picked up a small conflict resolution against the 3.5.6 CVE bump — patch still applies cleanly since it only touches util/mkbuildinf.pl.

Two sources of non-determinism in the openssl configure_make build:

1. util/mkbuildinf.pl captures the full CC and CFLAGS strings and
   embeds them into crypto/buildinf.h as a compiled-in constant
   (the compiler_flags[] array). Since CC and CFLAGS contain
   $EXT_BUILD_ROOT-relative paths (compiler wrapper, --sysroot),
   the sandbox-absolute path leaks into libcrypto. This string is
   only used by the informational OpenSSL_version(OPENSSL_CFLAGS)
   API (i.e., `openssl version -f`).
   Fix: patch mkbuildinf.pl to strip $EXT_BUILD_ROOT prefixes
   from the compiler string before embedding it, using the
   EXT_BUILD_ROOT environment variable already exported by
   rules_foreign_cc.

2. mkbuildinf.pl also embeds a DATE macro using the current
   wall-clock time via time(), causing the binary to differ between
   runs even with identical inputs. OpenSSL already supports the
   SOURCE_DATE_EPOCH convention for reproducible builds.
   Fix: set SOURCE_DATE_EPOCH=0 in the env dict, pinning the
   embedded timestamp to the Unix epoch.
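Both openssl fixes are small decisions made when the build-info header is generated. The Rust sketch below mirrors the idea rather than porting the Perl patch: `build_epoch` prefers `SOURCE_DATE_EPOCH` whenever it is set and parses (including `0`) and only otherwise falls back to the wall clock, and `strip_build_root` rewrites each occurrence of the build root to `.` while keeping the following `/`. Both function names and the sample paths are hypothetical.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical: choose the timestamp to embed in the build-info header.
pub fn build_epoch(source_date_epoch: Option<&str>) -> u64 {
    // Honor SOURCE_DATE_EPOCH whenever it is set and parses, even if it is "0".
    if let Some(v) = source_date_epoch {
        if let Ok(secs) = v.trim().parse::<u64>() {
            return secs;
        }
    }
    // Otherwise fall back to wall-clock time: the non-reproducible default.
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

/// Hypothetical: scrub the build root out of the embedded compiler string.
pub fn strip_build_root(compiler: &str, build_root: &str) -> String {
    if build_root.is_empty() {
        return compiler.to_string();
    }
    // Replace every occurrence, since both CC and CFLAGS may mention it.
    compiler.replace(build_root, ".")
}
```

In a real build script the first argument would come from `std::env::var("SOURCE_DATE_EPOCH").ok().as_deref()`; treating `0` as a valid value matters here because the fix pins the timestamp to exactly the epoch.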

With both fixes, libcrypto.so.3, libssl.so.3, libcrypto.a, and the
openssl binary are all bit-for-bit identical across independent
builds. The only remaining diff is Configure.log (a text build log).

Refs:
  https://reproducible-builds.org/docs/source-date-epoch/
  https://wiki.openssl.org/index.php/Compilation_and_Installation
@travisdowns
Member Author

YOu wanted to push something here?

No, I needed to write something because I made comments on this PR in "review mode", which requires an overall comment to submit them (or something like that, not sure).

travisdowns merged commit 3918552 into redpanda-data:dev on Apr 23, 2026
18 checks passed

5 participants