Add Docker image and Helm chart for Kubernetes deployment #57

Open
gustcol wants to merge 1 commit into facebookresearch:main from gustcol:feature/docker-helm

gustcol (Contributor) commented Feb 24, 2026

Summary

  • Add multi-stage Dockerfile for the Python components (gcm + health_checks) with multi-platform support (linux/amd64, linux/arm64)
  • Add Helm chart (deploy/helm/gcm/) with DaemonSet for monitoring collectors and CronJob for periodic health checks
  • Add CI workflow for Docker build validation and Helm lint across platforms

Security hardening

  • Non-root container execution (UID 65532)
  • Read-only root filesystem with emptyDir for /tmp
  • All Linux capabilities dropped (drop: ALL)
  • allowPrivilegeEscalation: false enforced
  • No shell login for runtime user
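
The hardening items above map onto standard Kubernetes `securityContext` fields, so they can be checked mechanically against rendered manifests. The following is a minimal sketch, not code from this PR: `hardening_violations` is a hypothetical helper, and it assumes every setting is declared on the container-level `securityContext` (a chart may instead set `runAsUser` or `runAsNonRoot` at pod level). It does not cover the `/tmp` emptyDir mount or the no-shell-login item, which is a property of the image rather than the manifest.

```python
# Hypothetical checker for the hardening items listed in the PR description.
# Input is one container spec as a plain dict, e.g. parsed from rendered YAML.

EXPECTED_UID = 65532  # non-root UID stated in the PR description


def hardening_violations(container: dict) -> list[str]:
    """Return a list of human-readable problems; empty means all checks pass."""
    # Assumption: settings live on the container-level securityContext.
    sc = container.get("securityContext", {})
    problems = []

    if sc.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot is not true")
    if sc.get("runAsUser") != EXPECTED_UID:
        problems.append(f"runAsUser is not {EXPECTED_UID}")
    if sc.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem is not true")
    if sc.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation is not false")
    if "ALL" not in sc.get("capabilities", {}).get("drop", []):
        problems.append("capabilities.drop does not include ALL")

    return problems


# Example container spec that satisfies every check above.
hardened = {
    "name": "gcm",
    "securityContext": {
        "runAsNonRoot": True,
        "runAsUser": 65532,
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    },
}

print(hardening_violations(hardened))
```

Returning a list of problems instead of a boolean keeps the output usable in CI logs, where a reviewer needs to see which setting regressed.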

Test plan

  • helm lint deploy/helm/gcm passes
  • helm template renders valid Kubernetes manifests with all security contexts
  • Helm test pod validates both gcm --help and health_checks --help entry points
  • Docker build on linux/amd64 and linux/arm64 via CI
  • Deploy to a test Kubernetes cluster with Slurm nodes
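
The first two items of the test plan can be reproduced locally with a small wrapper. This is a sketch under stated assumptions and is not part of the PR: the release name `gcm-test` and both function names are hypothetical, while the chart path comes from the summary above. It assumes `helm` is installed and on `PATH`.

```python
import shlex
import subprocess

CHART_DIR = "deploy/helm/gcm"  # chart path from the PR summary
RELEASE = "gcm-test"           # hypothetical release name, used only for rendering


def local_check_commands(chart_dir: str = CHART_DIR, release: str = RELEASE) -> list[list[str]]:
    """Build the argv lists for the lint and render steps of the test plan."""
    return [
        ["helm", "lint", chart_dir],
        ["helm", "template", release, chart_dir],
    ]


def run_checks(commands: list[list[str]], runner=subprocess.run) -> bool:
    """Run each command in order; stop and return False on the first failure."""
    for argv in commands:
        print("$", shlex.join(argv))
        if runner(argv).returncode != 0:
            return False
    return True
```

Calling `run_checks(local_check_commands())` from the repository root runs both steps. The `runner` parameter exists so the sequencing can be exercised without Helm installed, by passing a stand-in that returns an object with a `returncode` attribute.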

github-actions commented

CI Commands

The following CI workflows run automatically on every push and pull request:

| Workflow | What it runs |
| --- | --- |
| GPU Cluster Monitoring Python CI | lint, tests, typecheck, format, deb build, pyoxidizer builds |
| Go packages CI | shelper tests, format, lint |

The following commands can be used by maintainers to trigger additional tests that require access to secrets:

| Command | Description | Requires approval? |
| --- | --- | --- |
| /metaci tests | Runs Meta internal integration tests (pytest) | Yes; a maintainer must trigger the command and approve the deployment request |
| /metaci integration tests | Same as above (alias) | Yes |

Note: Only repository maintainers (OWNER association) can trigger /metaci commands. After commenting the command, a maintainer must also navigate to the Actions tab and approve the deployment to the graph-api-access environment before the jobs will run. See the approval guidelines for what to approve or reject.

gustcol force-pushed the feature/docker-helm branch 2 times, most recently from 0b4e9f6 to 35e7ac0 on February 25, 2026 00:52
Provide container packaging and Helm chart for deploying GCM
monitoring collectors and health checks on Kubernetes clusters.

Includes multi-platform Docker build (amd64/arm64), non-root
container execution, read-only root filesystem, dropped capabilities,
Helm tests, CI workflow, and deployment documentation.
gustcol force-pushed the feature/docker-helm branch from 35e7ac0 to f401200 on February 25, 2026 01:53