Design prod Vertex/GCP credential injection for extraction agent containers #773

Description

@aredenba-rh

Summary

Extraction agent workloads (sticky session containers and agentic-ci job containers) need Vertex AI credentials to call Claude via the Agent SDK. Local dev has an end-to-end pattern; production does not.

This issue tracks designing and implementing how Vertex + GCP service-account / workload identity credentials are injected at runtime into child agent containers in deployed environments.

Current state

Established (dev + platform auth)

  • Kartograph workload tokens: ScopedWorkloadCredentialIssuer mints short-lived scoped JWTs and injects them via ContainerRunSpec.env (KARTOGRAPH_WORKLOAD_TOKEN, etc.). Tested and spec'd.
  • Vertex SDK env vars: build_vertex_container_env() sets CLAUDE_CODE_USE_VERTEX, ANTHROPIC_VERTEX_PROJECT_ID, CLOUD_ML_REGION, VERTEXAI_LOCATION.
  • GCP ADC mount hook: KARTOGRAPH_GCLOUD_CONFIG_MOUNT + _gcloud_adc_env() bind-mount a credential directory and set GOOGLE_APPLICATION_CREDENTIALS / CLOUDSDK_CONFIG.
  • Dev wiring: compose.dev.yaml mounts ${HOME}/.config/gcloud and enables Vertex for the API process, which spawns sibling containers via Docker-out-of-Docker (/var/run/docker.sock).
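
As a rough illustration of the two env-var pieces listed above, the following sketch assembles the Vertex variables and the ADC-related variables for a child container. It is not the repository's code: the signatures of build_vertex_container_env() and _gcloud_adc_env() are not shown in this issue, so the function names, parameters, and the in-container mount path below are hypothetical. It also assumes VERTEXAI_LOCATION carries the same value as CLOUD_ML_REGION, which the issue does not state.

```python
from __future__ import annotations

import posixpath


def sketch_vertex_env(project_id: str, region: str) -> dict[str, str]:
    """Return the Vertex-related variables named in this issue.

    Hypothetical helper; assumes both location variables share one region.
    """
    return {
        "CLAUDE_CODE_USE_VERTEX": "1",
        "ANTHROPIC_VERTEX_PROJECT_ID": project_id,
        "CLOUD_ML_REGION": region,
        "VERTEXAI_LOCATION": region,
    }


def sketch_adc_env(container_mount: str) -> dict[str, str]:
    """Point gcloud and ADC lookups at a directory mounted into the container.

    `container_mount` is an assumed in-container path for the bind mount.
    """
    adc_file = posixpath.join(container_mount, "application_default_credentials.json")
    return {
        "CLOUDSDK_CONFIG": container_mount,
        "GOOGLE_APPLICATION_CREDENTIALS": adc_file,
    }


def sketch_child_env(project_id: str, region: str, container_mount: str) -> dict[str, str]:
    """Merge both pieces into one mapping suitable for a run spec's env."""
    return {**sketch_vertex_env(project_id, region), **sketch_adc_env(container_mount)}
```

In dev, the mounted directory would be the developer's ${HOME}/.config/gcloud; the open question in this issue is what replaces that directory in production.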

Relevant code:

  • src/api/extraction/infrastructure/container_workload_runtime.py
  • src/api/extraction/infrastructure/agentic_ci_extraction_job_runner.py
  • src/api/extraction/infrastructure/vertex_runtime_env.py
  • src/api/extraction/infrastructure/workload_runtime_settings.py
  • compose.dev.yaml

Not established (production)

  • No K8s/OpenShift implementation of IContainerRuntime (only CliContainerRuntime via docker/podman CLI).
  • No workload identity federation, GCP SA binding, or projected token/secret volume wiring for agent containers.
  • Stage/prod deploy manifests do not configure extraction runtime backend, Vertex env vars, or GCP credentials.
  • Dev's Docker-socket + host ~/.config/gcloud bind-mount pattern does not transfer to OpenShift pods.
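
To make the gap concrete, here is a minimal sketch of what a Kubernetes-backed runtime might emit in place of a docker/podman CLI call: a Pod manifest that carries the run spec's env and mounts a projected service-account token instead of a host directory. Everything here is an assumption for discussion. RunSpecSketch is a hypothetical stand-in for ContainerRunSpec (whose real fields are not shown in this issue), and the volume name, mount path, and token lifetime are placeholders. Only the manifest field names follow the Kubernetes Pod API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class RunSpecSketch:
    """Hypothetical stand-in for ContainerRunSpec; real fields may differ."""

    name: str
    image: str
    env: dict[str, str] = field(default_factory=dict)


def pod_manifest(
    spec: RunSpecSketch,
    service_account: str,
    token_audience: str,
    token_dir: str = "/var/run/secrets/workload-identity",  # assumed path
) -> dict[str, Any]:
    """Build a Pod manifest with a projected service-account token volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": spec.name},
        "spec": {
            "serviceAccountName": service_account,
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "agent",
                    "image": spec.image,
                    # Sorted for deterministic output in tests and diffs.
                    "env": [{"name": k, "value": v} for k, v in sorted(spec.env.items())],
                    "volumeMounts": [
                        {"name": "gcp-token", "mountPath": token_dir, "readOnly": True}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "gcp-token",
                    "projected": {
                        "sources": [
                            {
                                "serviceAccountToken": {
                                    "audience": token_audience,
                                    "expirationSeconds": 3600,
                                    "path": "token",
                                }
                            }
                        ]
                    },
                }
            ],
        },
    }
```

A real backend would probably pass sensitive values such as KARTOGRAPH_WORKLOAD_TOKEN through a Secret reference rather than a literal env value, since literal values are visible in the Pod spec.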

Problem

In dev, the API container launches sibling containers and bind-mounts the developer's gcloud ADC. In production, we need an equivalent way for agent containers to authenticate to Vertex without baking secrets into images or repo files (per specs/nfr/workload-execution.spec.md).

The injection contract exists (ContainerRunSpec.env + optional credential binds), but the production identity source and container orchestration are undefined.

Design questions to resolve

  1. Container runtime backend for prod: K8s Job/Pod API vs another orchestrator? (Spec says workloads run in "pod containers managed by the platform".)
  2. GCP identity model:
    • Per-agent-pod K8s SA + GCP workload identity federation?
    • API pod mints short-lived tokens and mounts them into child pods?
    • K8s Secret with a GCP service-account key JSON (less ideal: a long-lived key that must be stored and rotated)?
  3. Config surface: Generalize gcloud_config_mount beyond host paths, or add a distinct prod credential source setting?
  4. Deploy/gitops: Where do Vertex project/region and identity bindings live? (deploy/ in this repo is deprecated in favor of hp-fleet-gitops.)
  5. Parity across runtimes: Sticky session containers (kartograph-agent-runtime) and agentic-ci job containers (ai-helpers) use slightly different credential mount targets — ensure one prod pattern covers both.
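
For the workload identity federation option in question 2, one possible shape is to point GOOGLE_APPLICATION_CREDENTIALS at an "external_account" credential configuration file rather than at a developer's ADC file. The sketch below builds such a configuration as a dict. The function name and parameters are hypothetical, and the pool, provider, and token path would come from whatever the platform provides. It assumes the Vertex client used by the agent resolves credentials through Google's standard ADC lookup, which this issue does not confirm for both runtimes.

```python
from __future__ import annotations

import json
from typing import Any


def external_account_config(
    project_number: str,
    pool_id: str,
    provider_id: str,
    token_path: str,
    service_account_email: str | None = None,
) -> dict[str, Any]:
    """Build a GCP workload identity federation credential configuration.

    `token_path` is where the pod's projected service-account token is mounted.
    """
    audience = (
        f"//iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool_id}"
        f"/providers/{provider_id}"
    )
    config: dict[str, Any] = {
        "type": "external_account",
        "audience": audience,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "token_url": "https://sts.googleapis.com/v1/token",
        "credential_source": {"file": token_path},
    }
    if service_account_email:
        # Optional: exchange the federated token for a service-account token.
        config["service_account_impersonation_url"] = (
            "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
            f"{service_account_email}:generateAccessToken"
        )
    return config


def render_config(config: dict[str, Any]) -> str:
    """Serialize the configuration for a ConfigMap or mounted file."""
    return json.dumps(config, indent=2, sort_keys=True)
```

The file contains no secret material, so it could live in a ConfigMap; the only sensitive input is the short-lived projected token it references. That property is the main argument for this option over a service-account key in a K8s Secret.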

Suggested acceptance criteria (draft)

  • Document chosen prod credential injection approach (ADR or spec update).
  • Implement prod container runtime backend (or integrate with platform job runner) that can launch agent workloads without docker.sock.
  • Inject Vertex env vars + GCP credentials into child containers at runtime (no secrets in images/repo).
  • Wire stage/prod deployment config (gitops) for Vertex project, region, and identity.
  • Integration test or smoke path validating Claude/Vertex auth in a non-dev environment.
  • Clarify fallback behavior when ANTHROPIC_API_KEY is set vs Vertex mode (agentic-ci harness precedence).
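
The last criterion could be supported by a small preflight check that reports which auth mode a child container's environment implies. The sketch below deliberately does not decide which setting wins when both ANTHROPIC_API_KEY and Vertex mode are present, since that precedence is exactly what this issue asks to clarify; it only flags the combination. The function name, the returned labels, and the rule for treating CLAUDE_CODE_USE_VERTEX as enabled are assumptions.

```python
from __future__ import annotations

from collections.abc import Mapping

# Variables this sketch treats as required for Vertex mode (an assumption).
REQUIRED_VERTEX_VARS = ("ANTHROPIC_VERTEX_PROJECT_ID", "CLOUD_ML_REGION")


def describe_auth_mode(env: Mapping[str, str]) -> str:
    """Classify an environment as one of:

    "ambiguous", "vertex", "vertex-incomplete", "api-key", or "none".
    """
    flag = env.get("CLAUDE_CODE_USE_VERTEX", "").strip().lower()
    vertex_enabled = flag not in ("", "0", "false")
    has_api_key = bool(env.get("ANTHROPIC_API_KEY"))

    if vertex_enabled and has_api_key:
        # Precedence is undefined in this issue, so surface it for review.
        return "ambiguous"
    if vertex_enabled:
        missing = [name for name in REQUIRED_VERTEX_VARS if not env.get(name)]
        return "vertex-incomplete" if missing else "vertex"
    return "api-key" if has_api_key else "none"
```

A smoke path in a non-dev environment could run this against the env it is about to inject and fail fast on anything other than "vertex".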

References

  • specs/nfr/workload-execution.spec.md — runtime credential injection requirements
  • specs/extraction/sticky-session-runtime.spec.md — scoped credentials for sticky containers
  • Prior discussion: dev uses Vertex + host gcloud ADC; prod path is not yet implemented.

Priority

Deferred — track design for a later milestone. Dev workflow is unblocked via compose.dev.yaml.
