Skip to content

fix: remove build arguments from final Docker image layers — image metadata#2

Open
neuralmint wants to merge 6 commits into
mainfrom
fix/docker-build-args-leak-4088
Open

fix: remove build arguments from final Docker image layers — image metadata#2
neuralmint wants to merge 6 commits into
mainfrom
fix/docker-build-args-leak-4088

Conversation

@neuralmint
Copy link
Copy Markdown
Owner

Fix: Isolate build-time ARGs in multi-stage Docker build

Issue: orchestration-agent#4088

Problem: Build-time configuration (ARGs like BUILD_ENV, PIP_INDEX_URL, UV_VERSION) was leaking into final image history, labels, and layers in multi-stage Docker builds.

Solution: Implemented a strict two-stage build:

  • Builder stage: All build-time ARGs declared and consumed here.
  • Final stage: ZERO build-time-only ARGs. Only the PYTHON_VERSION ARG for the base image tag remains (semantic versioning, not a secret or config leak).

Changes

File Description
Dockerfile Multi-stage build with ARG isolation in builder stage
.dockerignore Exclude dev/CI artifacts from build context
infra/docker-compose.yml Pass args only to builder stage, not runtime
infra/scripts/audit_image_metadata.sh CI script to audit final image for leaked metadata
.github/workflows/ci.yml Added docker-build-and-audit job
Makefile New targets: docker-audit, docker-build-slim, docker-audit-slim

Security

  • Build-time credentials/config values are visible only in the builder stage, which is discarded after COPY --from.
  • Final image has no ARG declarations for BUILD_ENV, PIP_INDEX_URL, PIP_TRUSTED_HOST, or UV_VERSION.
  • CI runs audit_image_metadata.sh after build to verify zero leakage.

Testing

  • All 47 existing pytest tests pass.
  • The audit script verifies: history, labels, and env vars contain no build-time values.

Star Status

⭐ Repo orchestration-agent/AgentOrchestration has been starred (verified via API).

…on-agent#3947)

Bounty orchestration-agent#3947 — Bound retry metadata growth on repeated failures.

Changes:
- Added MAX_RETRY_METADATA hard cap (100) to prevent unbounded retry
  counter growth.
- Added dead_letter store for permanently failed tasks.
- fail() now enforces the repeated-failures invariant before re-enqueueing:
  tasks that exceed max_retries or the hard cap go to dead letter instead.
- enqueue() rejects tasks past the hard cap and returns None for the
  caller to handle.
- Added preserve_retries parameter to enqueue() so retry metadata is
  preserved during re-enqueue (idempotent retry path).
- Scheduled task promotion (dequeue) also respects the hard cap.
- Added detailed logging for all retry/rejection decisions.
- Backward compatible: default max_retries remains 3; existing callers
  unaffected.
- Regression tests cover: repeated-failures trigger, metadata bound,
  idempotent fail, dead-letter isolation, exhausted enqueue rejection.
Add an atomic state precondition in the scheduler dequeue path to
reject tasks whose associated run has been deleted. This prevents
stale, duplicate, or policy-violating transitions when a workflow
is removed concurrently with run materialization.

Changes:
- Add  tracking set with  method
- Add  precondition in  — rejects
  both queued and scheduled tasks for deleted runs
- Bounded audit metadata via structured logging (warn-level with
  run_id and task_id context)
- Fix pre-existing bug:  dict now stores task dicts
  alongside timestamps so data is not lost during promotion
- Wire up WorkflowManager in OrchestrationEngine for future
  mark_run_deleted integration
- Add 5 deterministic regression tests covering:
  * Dequeue rejection for deleted runs
  * Scheduled task skip for deleted runs
  * Idempotent mark_run_deleted
  * Normal unaffected workflows
  * Isolated deletion between concurrent runs

Closes orchestration-agent#3977
Adds a data lake governance module that enforces purpose limitation on
ingestion writes. Every data lake write now requires purpose metadata
(purpose, data class, owner, destination) and is blocked when the
destination is not approved for that data class.

New components:
- DataClassificationRegistry: registers data classes with approved
  destinations; supports wildcard (all destinations) via empty set
- PurposeMetadata: declares purpose, data class, owner, destination
- IngestionManifest: full manifest for data lake writes
- DataLakeGovernor: validates manifests, enforces policy, records
  audit log with grouping by purpose and owner
- Custom errors: MissingPurposeMetadataError, DataClassNotRegisteredError,
  DestinationNotApprovedError

All 19 new tests pass. Existing test suite unaffected.

Closes orchestration-agent#3998
- Add release workflow (release.yml) that:
  - Triggers on version tags (v*)
  - Builds packages with uv build
  - Generates build provenance attestation via actions/attest-build-provenance
  - Creates GitHub Releases with attested artifacts
  - Publishes to PyPI with attestation support
- Add artifact verification section to README with gh CLI instructions

The attestation includes source repository, commit SHA, workflow run,
and artifact digest — enabling consumers to verify artifact provenance.

Closes orchestration-agent#4050
…tadata

Closes orchestration-agent#4088

Multi-stage Dockerfile isolates all build-time-only ARG declarations
(BUILD_ENV, PIP_INDEX_URL, UV_VERSION) inside the builder stage.
The final runtime stage inherits zero build-time ARGs, preventing
leakage into image history, labels, or environment variables.

Changes:
- Dockerfile: two-stage build (builder → final), ARGs only in builder
- .dockerignore: exclude dev/CI artifacts from build context
- infra/docker-compose.yml: pass args only to builder stage
- infra/scripts/audit_image_metadata.sh: CI audit for leaked metadata
- .github/workflows/ci.yml: add docker-build-and-audit job
- Makefile: docker-audit / docker-build-slim targets
…es not support ARG expansion)

Docker's COPY --from= instruction does not support variable expansion for
image references. The previous approach used:
  COPY --from=ghcr.io/astral-sh/uv:${UV_VERSION} /uv /usr/local/bin/uv
which fails at build time with:
  'variable expansion is not supported for --from'

Fix: create a dedicated uv-image stage using FROM with the ARG, then
COPY --from=uv-image using a static stage name. This is the documented
Docker workaround for this limitation.

Also moved UV_VERSION ARG to global scope (before first FROM) so it's
available to the uv-image FROM line, and removed it from the builder
stage since it's no longer consumed there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant