fix(ops-controller): inject decrypted runtime secrets into compose subprocesses#60

Merged: AlienWalker1995 merged 1 commit into main from fix/ops-controller-compose-secret-injection on Jun 26, 2026

@AlienWalker1995 (Owner)

Problem

The ops-controller recreates services via its own docker-compose subprocess (dashboard "recreate", POST /compose/*, /services/*/recreate). Those subprocesses only ever saw the auto-loaded .env, never ~/.ai-toolkit/runtime/.env. As a result, any secret-dependent service that ops-controller recreated came up with its secrets unset and crash-looped:

[main.go:67] invalid configuration:
  cookie_secret must be 16, 24, or 32 bytes ... but is 11 bytes

This is the root cause behind the 2026-06-26 oauth2-proxy outage (and the bandaids that followed: placeholder secrets in .env, empty stub files, compose-path rewrites).

Fix (least privilege)

  • Mount the already-decrypted runtime/.env read-only into ops-controller (/run/runtime.env) and inject it into the compose subprocess env via a shared _compose_env() helper (which also de-duplicates 3 inline env constructions). docker-compose interpolates ${VAR} from the process env, so recreated secret services now get real values.
  • ops-controller gets the decrypted env only — never the age key. Decryption stays a host-only operation (make decrypt-secrets). A compromised ops-controller (already docker-socket-privileged) leaks current secrets but cannot decrypt .sops history or re-derive the key.
  • caddy/oauth2-proxy/searxng added to ALLOWED_SERVICES — now safe to recreate. (The self-heal watchdog already covers them via its exclude-list model; the "hermes-only" comment was stale and is corrected.)
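
The load-and-merge behaviour described above can be sketched roughly as follows. This is an illustrative re-creation, not the merged code: the function names `load_runtime_env` and `compose_env`, the quote handling, and the exact parsing rules are assumptions. Only the precedence order (process env, then runtime env, then per-call extras) and the degrade-to-empty behaviour on a missing file or directory come from the test list in this PR.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def load_runtime_env(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; return {} if the file cannot be read."""
    env: dict[str, str] = {}
    try:
        text = Path(path).read_text(encoding="utf-8")
    except OSError:
        # Covers a missing file and a directory sitting at the mount path.
        return env
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Assumption: drop one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        if key:
            env[key] = value
    return env


def compose_env(
    runtime_env_file: str,
    extra: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Build the subprocess env: process env < runtime env < extra."""
    env = dict(os.environ)
    env.update(load_runtime_env(runtime_env_file))
    if extra:
        env.update(extra)
    return env
```

In the container, the path argument would presumably come from `RUNTIME_ENV_FILE` (pointing at `/run/runtime.env`); how the real helper reads that variable is not shown in this PR text.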

Changes

  • ops-controller/main.py: _load_runtime_env() + _compose_env(); used in _recreate_service, _run_compose, and the /services/*/recreate endpoint. ALLOWED_SERVICES += caddy/oauth2-proxy/searxng.
  • docker-compose.yml: read-only runtime/.env mount + RUNTIME_ENV_FILE; corrected watchdog comment.
  • docs/runbooks/secrets.md: new "How services receive secrets at runtime" section (two --env-file model, ops-controller injection, local-only boundary). README + secrets/README.md updated to match.
  • tests/test_ops_controller_compose_env.py: parsing, missing-file/dir degradation, runtime-overrides-process-env, extra-overrides-all (fabricated values only).
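
To illustrate why passing the merged env to the subprocess is enough, here is a minimal sketch of env injection with `subprocess.run`. The helper name `run_with_env` and the variable `FAKE_COOKIE_SECRET` are hypothetical, and a small Python child process stands in for docker-compose so the example runs without Docker. It reports only the length of the value, never the value itself.

```python
import os
import subprocess
import sys


def run_with_env(cmd: list[str], injected: dict[str, str]) -> str:
    """Run cmd with the current process env plus injected overrides."""
    env = {**os.environ, **injected}
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Stand-in for a compose invocation: the child prints the length of the
# variable it inherited (0 when the variable is unset).
child = [
    sys.executable,
    "-c",
    "import os; print(len(os.environ.get('FAKE_COOKIE_SECRET', '')))",
]
```

Calling `run_with_env(child, {"FAKE_COOKIE_SECRET": "x" * 32})` lets the child see a 32-character value; without the injected mapping, the child sees only what the parent process already had.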

No secret values are committed — only path references and architecture.

Validation

  • End-to-end: from docker exec ops-controller, POST /compose/up {"service":"oauth2-proxy"} (the previously-broken path) → oauth2-proxy comes up healthy in ~31s, no cookie_secret error. Verified the 32-byte cookie secret is read inside ops-controller.
  • 36 ops-controller tests pass (5 new + existing audit/auth suites); ruff clean.
  • Full stack healthy after the change.
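
A length check like the one used during validation can be done without printing the secret. The sketch below is an assumption about how such a check might look, not the command actually run. It mirrors only the raw byte-length rule quoted in the error message (16, 24, or 32 bytes); oauth2-proxy's own validation may accept other encodings that this does not model. The function name is hypothetical.

```python
VALID_COOKIE_SECRET_LENGTHS = (16, 24, 32)


def cookie_secret_length_ok(secret: str) -> bool:
    """Return True if the secret's UTF-8 byte length is 16, 24, or 32."""
    return len(secret.encode("utf-8")) in VALID_COOKIE_SECRET_LENGTHS
```

An 11-byte value, as in the crash-loop log above, fails this check, while a 32-byte value passes.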

🤖 Generated with Claude Code

…bprocesses

The ops-controller recreates services via its own docker-compose subprocess
(dashboard recreate, POST /compose/*, /services/*/recreate), but those only saw
the auto-loaded `.env` — never `~/.ai-toolkit/runtime/.env`. So any
secret-dependent service it recreated came up with secrets unset and crash-looped
(oauth2-proxy: `cookie_secret must be 16, 24, or 32 bytes`). This is the root
cause behind the 2026-06-26 outage and the bandaids that followed it.

Fix (least privilege): mount the already-decrypted `runtime/.env` read-only into
ops-controller and inject it into the compose subprocess env via a shared
`_compose_env()` helper (also de-duplicates 3 inline env constructions). compose
interpolates `${VAR}` from the process env, so recreated secret services now get
real values. ops-controller gets the decrypted env only — never the age key;
decryption stays host-only.

- ops-controller/main.py: add `_load_runtime_env()` + `_compose_env()`; use in
  `_recreate_service`, `_run_compose`, and the `/services/*/recreate` endpoint.
  Add caddy/oauth2-proxy/searxng to ALLOWED_SERVICES (now safe to recreate).
- docker-compose.yml: mount `${HOME}/.ai-toolkit/runtime/.env:/run/runtime.env:ro`
  + `RUNTIME_ENV_FILE`; correct the stale watchdog comment.
- docs: secrets runbook gains a "how services receive secrets at runtime" section
  (two --env-file model + ops-controller injection + local-only boundary);
  README + secrets/README updated to match.
- tests: parsing, missing-file/dir degradation, runtime-overrides-process-env,
  extra-overrides-all (fabricated values only).

No secret values are committed — only path references and architecture.

Validated end-to-end: ops-controller `/compose/up oauth2-proxy` (the previously
broken path) brings it up healthy with no cookie_secret error; 36 ops-controller
tests pass; ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AlienWalker1995 AlienWalker1995 merged commit e026609 into main Jun 26, 2026
5 checks passed
@AlienWalker1995 AlienWalker1995 deleted the fix/ops-controller-compose-secret-injection branch June 26, 2026 14:02