DeployMate Production Runbook

This runbook is the operator-facing reference for updating a production DeployMate instance.

Assumptions

  • the repository is checked out on the deployment host at /opt/deploymate
  • the deployment host is reachable as ssh <deploy-host>
  • production uses docker-compose.prod.yml with .env.production

Fast checks

ssh <deploy-host>
cd /opt/deploymate
docker compose -f docker-compose.prod.yml --env-file .env.production ps
docker compose -f docker-compose.prod.yml --env-file .env.production logs --tail=50 proxy
docker compose -f docker-compose.prod.yml --env-file .env.production logs --tail=50 backend
docker compose -f docker-compose.prod.yml --env-file .env.production logs --tail=50 frontend
curl -I https://your-domain
curl -I https://your-domain/app
curl -I https://your-domain/app/server-review
curl -I https://your-domain/api/health

Before any release from the workstation:

./scripts/preflight.sh

For the fastest local loop, use the new lightweight commands:

make changed
make profile-changed
make profile-frontend
make profile-backend
make profile-fast
make profile-frontend-hot
make profile-fast-hot
make frontend-smoke-server-status
make frontend-smoke-server-stop
make audit-cache-clear
make frontend
make frontend-hot
make backend
make fast
make fast-hot

These commands:

  1. detect the changed release surface locally when needed
  2. run the smaller --fast local gate instead of the full release gate
  3. skip the production frontend build in preflight fast mode
  4. keep backend verification on a targeted test set when changed files map cleanly, otherwise fall back to the focused safety suite
  5. skip the backend fast suite entirely when a mixed local diff does not actually touch backend or release-runtime backend contract
  6. narrow backend syntax in preflight to changed backend Python files when possible, and skip it entirely for frontend-only local diffs
  7. skip the frontend fast smokes entirely when a mixed local diff does not actually touch frontend or frontend delivery contract
  8. keep frontend verification on targeted fast smokes when changed files map cleanly, otherwise fall back to the default auth + ops + runtime
  9. auto-derive the same local diff context for explicit surface commands like make frontend, make backend, make profile-frontend, and make profile-backend
  10. keep release_workflow_audit enabled for release-contract diffs while still letting local security_audit stay on changed-file scope when a full tracked-file scan is unnecessary
  11. keep experimental persistent frontend smoke-server controls available, but leave the default fast loop on the safer per-command lifecycle unless FRONTEND_SMOKE_PERSIST_SERVER=1 is set explicitly
  12. reuse one shared frontend smoke dev server in fast mode instead of starting a new next dev process for each smoke
  13. reuse shared frontend smoke servers in the heavier full gate too, so the main frontend smoke pack no longer starts a separate next dev process per script
  14. cache repeated local audit steps inside one gate run, so nested security/runtime audits do not re-run the same expensive checks twice
  15. skip runtime-oriented local audits automatically when the current diff does not touch runtime or deploy contract files
  16. narrow local security_audit to changed files and skip nested release or credentials audits unless the diff touches those contracts
  17. split local security_audit into secret scan and runtime-policy scan, so docs or release-contract diffs still keep the right checks without scanning risky runtime defaults unnecessarily
  18. persist successful local secret-scan and runtime-policy results by fingerprint, so repeated commands on the same diff do not re-run them unnecessarily
  19. reuse fingerprint-cached results for repeated local release-contract and runtime-contract audits when their inputs stay the same
  20. print a timing summary for local preflight and release phases so the slowest step is visible immediately after each run
  21. print a short cache-hit summary after local preflight, release, and profile commands so saved reruns are visible immediately
  22. reuse phase-level fingerprint caches for repeated fast frontend smoke targets and backend fast test modules when the diff and inputs stay the same
  23. reuse phase-level fingerprint caches for repeated preflight backend syntax checks and local frontend builds when their inputs stay the same
  24. reuse a phase-level fingerprint cache for repeated security_audit blocks when the diff, scopes, and nested audit inputs stay the same
  25. reuse per-file fingerprints for changed-file security_audit secret and runtime-policy scans, so one extra file in the diff does not invalidate every unchanged security target
  26. reuse per-file fingerprints for repeated local runtime static-contract checks, so one changed runtime/deploy file does not force rechecking every unchanged runtime contract file
  27. reuse per-file extracted contract lists for release_workflow_audit, so changing one of release.yml, staging.yml, or RUNBOOK.md does not force reparsing the other two
  28. print family-level cache savings for security, release_contract, and runtime, so repeated local runs show which verification layer still dominates cost
  29. keep local diff-context derivation stable even when there are no changed files and print a family bottleneck hint, so explicit surface commands stay predictable and still show which verification family is dominating misses
  30. prefer per-file fingerprint reuse for security_audit secret and runtime-policy scans even in wider scopes when the file set is still manageable, so repeat full-scope checks stop rescanning the entire repo
  31. recommend the cheapest useful local loop for the current diff via make recommend-local-mode, so you spend less time choosing between make changed, make backend, make frontend-hot, or make profile-changed
  32. narrow make changed mixed diffs down to an effective frontend or backend fast surface when the other side already resolves to skip, so shared-but-one-sided changes stop paying for both halves
  33. execute the recommended local loop directly via make auto-local, including automatic switching between fast and profile modes when the diff is expensive enough to justify profiling context
  34. remember the last successful auto-local loop per diff family and print a cheaper follow-up command for the next tweak, so second-pass iterations shrink automatically
  35. append each local timing phase into .logs/local_gate_timing.csv so repeated runs can be compared over time
  36. keep project-specific path and route assumptions inside scripts/project_automation_config.sh, so the automation core can be ported to another repo without rewriting every script first
  37. keep project-specific path-to-target and path-to-scope rules inside scripts/project_automation_targets.sh, so the detect_* layer is portable too
  38. export the reusable automation layer with make export-automation-core when you want to move the core into a separate private repository
  39. keep frontend smoke assertions inside scripts/project_automation_smoke_checks.sh, so both the fast and heavier smoke runners stay reusable across projects
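
Several items above rely on per-file fingerprint reuse. The idea can be sketched in Python as follows. This is a minimal illustration, not the code used by the repository scripts: the function names, the in-memory cache dictionary, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """Return a stable fingerprint for one file's content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def files_to_recheck(files: Dict[str, bytes], cache: Dict[str, str]) -> List[str]:
    """Return the sorted paths whose content differs from the cached fingerprint."""
    stale = []
    for path, content in files.items():
        if cache.get(path) != fingerprint(content):
            stale.append(path)
    return sorted(stale)


def record_success(files: Dict[str, bytes], cache: Dict[str, str]) -> None:
    """Store fingerprints after a successful check so unchanged files are skipped next time."""
    for path, content in files.items():
        cache[path] = fingerprint(content)
```

With this shape, adding one file to the diff only marks that file as stale, so the checks for unchanged files can be skipped.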

To inspect the latest local timings quickly:

make timing-history
make timing-stats
make timing-hint

For repeated frontend iterations, the faster hot loop is:

make frontend-hot
make profile-frontend-hot
make frontend-smoke-server-status
make frontend-smoke-server-stop

Or run the broader local release gate:

bash scripts/release_workflow.sh --surface full

Fast mode is also available directly:

bash scripts/release_workflow.sh --surface frontend --fast
bash scripts/release_workflow.sh --surface backend --fast
bash scripts/release_workflow.sh --surface full --fast

For template-heavy frontend changes, also run:

npm --prefix frontend run smoke:templates

The full local release gate already includes this templates smoke alongside the admin and runtime frontend smokes.

For ops-overview focused frontend changes, also run:

npm --prefix frontend run smoke:ops

For auth-surface frontend changes, also run:

npm --prefix frontend run smoke:auth

For admin interaction changes around saved views, audit filters, or bulk actions, also run:

npm --prefix frontend run smoke:admin-interactions

For backup / restore dry-run workflow changes, also run:

npm --prefix frontend run smoke:restore

For server-management frontend changes, also run:

npm --prefix frontend run smoke:servers

That smoke covers the dedicated /app/server-review workspace, which is now the main UI path for create/edit/test/diagnostics/delete server actions.

For beginner-path changes across /app, /app/server-review, or /app/deployment-workflow, also run:

npm --prefix frontend run smoke:beginner

That smoke checks the first-time admin path and the member remote-only blocked path.

For a single remote deploy command that also runs post-deploy smoke:

bash scripts/remote_release.sh \
  --host <deploy-host> \
  --surface full \
  --base-url https://your-domain \
  --admin-username admin \
  --admin-password '<secret>'

That helper now runs a fast smoke-credentials precheck before the remote rebuild. If the configured admin smoke credentials already return 401 or 403, the release stops immediately instead of spending another 10-20 minutes on a deploy that will fail in post-deploy smoke anyway. It also compares the provided smoke credentials against the target runtime env file over SSH before deploy, so GitHub environment drift now fails as a fast contract error instead of a delayed post-deploy surprise.

If you want the same flow from GitHub instead of a workstation shell, use the manual workflow in .github/workflows/release.yml after configuring repository secrets for the deploy host, deploy SSH key, pinned known_hosts contents, base URL, and admin smoke credentials.

For a short operator-only secret drift check without a deploy, use .github/workflows/release-secrets-audit.yml and choose production or staging. The audit workflow also supports an incident_self_test mode with open, update, and resolve actions, so operators can verify the incident automation path without waiting for the nightly schedule and without forcing a real target failure.

The audit workflow also runs every day at 02:17 UTC (09:17 in Novosibirsk) for both environments and sends a best-effort webhook notification when DEPLOY_NOTIFICATION_WEBHOOK is configured. If a scheduled audit fails, GitHub automatically opens or updates one environment-specific incident issue so the failure does not disappear in webhook history alone. That issue gets incident plus severity labels, and severity is raised to severity:high after the configured number of consecutive scheduled failures. When the next scheduled audit for that environment succeeds, the workflow comments on the issue and closes it automatically. The manual self-test flow uses a separate [release-secrets-audit:self-test] ... issue title and the incident:test label, so it does not interfere with real scheduled incidents.

The incident triage logic lives in scripts/release_audit_incident.js, and its local regression path is node --test scripts/release_audit_incident.test.js. The workflow calls that helper through .github/actions/release-audit-incident/action.yml, so scheduled audits and manual self-tests share one incident wiring path. The self-test effective status derivation lives in scripts/release_audit_mode.js, which keeps the YAML step wiring thin; its local regression path is node --test scripts/release_audit_mode.test.js.

Recommended promotion order:

  1. open and review a PR into develop
  2. merge the reviewed PR into develop
  3. .github/workflows/ci.yml runs the release gate and then auto-deploys the same commit to staging
  4. production deploy stays behind .github/workflows/release.yml or a manual scripts/remote_release.sh run

The CI and release workflows call the same reusable composite action in .github/actions/remote-release/action.yml, so staging and production deploy behavior stays aligned with scripts/remote_release.sh. Those workflows pass the exact checked-out commit SHA into the remote helper, so the release host deploys the reviewed commit rather than whichever newer commit happens to land on the branch later.

For the next project, the fastest non-empty starting point is now the product starter:

make bootstrap-product-starter TARGET_DIR=/absolute/path/to/project PRODUCT_STARTER_FLAGS="--project-name MyApp --app-slug myapp --contact-email founder@example.com --frontend-dir web --backend-dir api"

That gives the new repo:

  • starter frontend shell
  • starter backend shell
  • starter docs
  • reusable automation core

Then immediately scaffold the first product slice:

make scaffold-product-resource TARGET_DIR=/absolute/path/to/project RESOURCE_FLAGS="--name Projects --slug projects --frontend-dir web --backend-dir api"

Recommended PR-first daily flow:

git switch develop
git pull --ff-only origin develop
make start-pr-branch SLUG=my-change
make git-doctor
make ship-pr SLUG=my-change MESSAGE="Describe the change"

Then:

  1. commit the change on the feature branch
  2. run make pr-ready
  3. push with git push -u origin $(git branch --show-current)
  4. open the PR with make pr-open
  5. use make pr-status while the PR is under review
  6. wait with make pr-watch
  7. merge with make pr-land

If you want to compress even more of the Git overhead into one path:

make ship-pr SLUG=my-change MESSAGE="Describe the change"
make pr-watch
make pr-land-sync

Preferred cadence:

  1. finish one logical slice
  2. run the cheapest relevant local verification
  3. commit that slice cleanly
  4. push when one clean commit or a short series of 2-3 related commits is ready

Use this to keep GitHub readable:

  • do not push every tiny fix
  • do not mix unrelated scaffold, backend, and UI work into one opaque commit
  • prefer a small number of coherent commits that tell a clear story in PR review

Notes:

  • PR CI already runs the same release gate as direct develop pushes
  • auto-staging still happens only after merge into develop
  • .github/pull_request_template.md keeps the PR body short and predictable
  • make pr-ready uses the same recommendation and auto-local logic as the normal local loop, so the pre-PR check is not a second parallel process
  • make pr-doctor prints branch cleanliness, upstream state, PR state, local-loop freshness, and a PR size class so oversized branches get caught before review
  • make pr-doctor also reads the current PR check state from GitHub and suggests a likely split direction from the diff mix when a branch has grown too large
  • make pr-doctor now also compares the current local commit, the last locally verified commit, and the PR head SHA on GitHub so stale local green runs or unpushed commits are obvious before review
  • make pr-watch waits on GitHub checks and then refreshes doctor output
  • make pr-land refuses to merge unless doctor is clean, local HEAD matches the PR head SHA, and PR checks are green
  • make pr-land-sync merges the PR and then fast-forwards main from develop
  • make dev-doctor gives one compact local summary of recommended loop, timing bottleneck, and PR doctor state
  • make git-doctor gives one compact Git-only summary of branch cleanliness, upstream drift, stale lock state, and the most useful next Git command
  • doctor commands now also support --format shell, so future repos can automate around them without parsing human-oriented text
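
The merge gate described for make pr-land can be pictured as three independent conditions. The sketch below is only an illustration in Python: the PrState class, its field names, and the blocker messages are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrState:
    """Hypothetical snapshot of what a doctor run would report."""
    doctor_clean: bool
    local_head: str
    pr_head: str
    checks_green: bool


def land_blockers(state: PrState) -> List[str]:
    """Return every reason the merge should be refused; an empty list means safe to land."""
    blockers = []
    if not state.doctor_clean:
        blockers.append("doctor reports problems")
    if state.local_head != state.pr_head:
        blockers.append("local HEAD differs from PR head SHA")
    if not state.checks_green:
        blockers.append("PR checks are not green")
    return blockers
```

Collecting all blockers instead of stopping at the first one lets a single run show everything that needs fixing before the next attempt.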

For daily iteration speed on staging:

git push origin develop

After the push:

  1. CI detects the changed surface once and uses that same decision for the release gate and the staging deploy
  2. if the commit changes only frontend/, staging deploy rebuilds frontend
  3. if the commit changes only backend/, staging deploy rebuilds backend
  4. mixed or shared changes fall back to a full staging deploy
  5. docs-only or workflow-only changes skip both the release gate and staging deploy entirely
  6. older in-progress CI runs on the same branch are cancelled automatically so only the newest iteration keeps running
  7. the release gate now skips unnecessary dependency installs too, so frontend-only changes do not install backend requirements and backend-only changes do not install frontend packages
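
The surface decision in the steps above can be sketched as a small classifier. This is a minimal Python illustration, not the actual CI logic: the function name and the exact rules for what counts as docs-only or workflow-only (here, .md files and anything under .github/workflows/) are assumptions.

```python
from typing import Iterable


def detect_surface(changed_paths: Iterable[str]) -> str:
    """Return 'frontend', 'backend', 'full', or 'skip' for a list of changed paths."""
    surfaces = set()
    for path in changed_paths:
        if path.startswith("frontend/"):
            surfaces.add("frontend")
        elif path.startswith("backend/"):
            surfaces.add("backend")
        elif path.endswith(".md") or path.startswith(".github/workflows/"):
            # Assumption: docs and workflow files never require a rebuild.
            continue
        else:
            # Shared or unclassified files fall back to the conservative full surface.
            surfaces.add("full")
    if not surfaces:
        return "skip"
    if len(surfaces) == 1:
        return surfaces.pop()
    return "full"
```

Computing the decision once and passing the result to both the release gate and the staging deploy keeps the two steps from disagreeing about the same commit.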

Use .github/workflows/staging.yml only as a manual fallback when you need to redeploy staging on demand. That manual fallback now also supports skip_smoke when you need a faster redeploy for operator-only checks.

For current DeployMate feature work, do not start every new admin surface from a blank file. Use:

make scaffold-deploymate-surface SURFACE_FLAGS="--name Review Inbox --slug review-inbox"

That gives you the frontend page shell, backend route/service stub, backend API flow test, and backend/app/main.py registration in one pass.

On the frontend, the generated page uses shared review-shell blocks from frontend/app/app/admin-ui.js, so new admin surfaces start from the same summary-and-queue pattern instead of ad hoc JSX. On the backend, the scaffold adds typed response models in backend/app/schemas.py, a built-in q filter path, and a generated API flow test for both default and filtered list responses.

Add --with-table, --with-saved-views, --with-audit, and --with-export when the first useful version of the surface should already include those richer review sections. --with-table adds a denser starter table over the same queue data, so the first real operator pass can compare status, context, and workflow slice without hand-building a second view. These flags also generate real starter wiring for URL state, filter chips, saved-views manager hooks, audit filtering, and local JSON/CSV exports, so the surface starts closer to a live DeployMate workflow than a blank mock.

Add --preset users, --preset upgrade-requests, or --preset servers when the feature already clearly matches one of those common DeployMate surface families. Presets also change the starter action flow itself, so the generated surface already includes the first local decision pattern for that family instead of a queue-only mock. The generated page also gets a preset-specific segment/workflow filter and richer card context fields, so the starter slice is closer to the actual entity shape you will build next.

The scaffold also includes a starter bulk-action panel and a mutation payload preview, so the first write contract is visible immediately instead of being invented ad hoc later.

New surfaces generate modular frontend starter files (page.js, starter-data.js, starter-actions.js), so the scaffold output is easier to extend without turning the page file into a dump of preset constants. The scaffold also generates starter-smoke.js plus a backend *_starter.py helper, so route smoke placeholders and backend starter contracts are modular too. It lays down a typed backend starter-action endpoint and matching test, so the first mutation path starts from a real API contract instead of only a frontend placeholder. Finally, it emits starter-api.js, so switching a generated surface from local scaffold mode to API-backed starter mode no longer requires inventing the client bridge from scratch.

The CI, staging, and production workflows now write a short GitHub job summary with the chosen surface, smoke mode, requested commit SHA, deployed SHA, and target URL so the result is readable without opening raw logs.

To verify that the release workflows and the documented GitHub secret contract still match:

bash scripts/release_workflow_audit.sh

Before the first deploy of encrypted server credentials, or before enabling remote server management on a fresh environment:

python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# store the value in .env.production as DEPLOYMATE_SERVER_CREDENTIALS_KEY

Before switching SSH host verification to strict pinned mode:

bash scripts/prepare_known_hosts.sh --host <target-host> --port 22
# store the resulting path in .env.production as DEPLOYMATE_SSH_KNOWN_HOSTS_FILE
# then set DEPLOYMATE_SSH_HOST_KEY_CHECKING=yes

Do not rotate DEPLOYMATE_SERVER_CREDENTIALS_KEY casually. Existing stored server credentials depend on it for decryption.

To audit the current database state for server credential encryption before a release:

bash scripts/server_credentials_audit.sh

To verify that the repo still keeps local Docker execution behind an explicit opt-in boundary:

bash scripts/local_runtime_audit.sh

To verify that production frontend and backend runtime capability flags are still aligned:

bash scripts/runtime_capability_audit.sh

This audit checks:

  • frontend/Dockerfile production default
  • docker-compose.prod.yml production build and runtime defaults
  • .env.production.example
  • .env.production when it exists on the workstation or deployment host

If .env.production sets DEPLOYMATE_LOCAL_DOCKER_ENABLED=false, then NEXT_PUBLIC_LOCAL_DEPLOYMENTS_ENABLED must be 0. If backend local runtime is explicitly enabled, the frontend flag must be 1.
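
The alignment rule above amounts to checking that both flags agree. The following Python sketch shows one way to express it. It is an illustration only and not the contents of scripts/runtime_capability_audit.sh: the helper names are hypothetical, and treating a missing flag as the remote-only default is an assumption.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def flags_aligned(env: Dict[str, str]) -> bool:
    """True when backend and frontend agree on whether local deployments are enabled."""
    # Assumption: a missing flag means the remote-only default (false / 0).
    backend_on = env.get("DEPLOYMATE_LOCAL_DOCKER_ENABLED", "false").lower() == "true"
    frontend_on = env.get("NEXT_PUBLIC_LOCAL_DEPLOYMENTS_ENABLED", "0") == "1"
    return backend_on == frontend_on
```

For example, an env file that sets DEPLOYMATE_LOCAL_DOCKER_ENABLED=false together with NEXT_PUBLIC_LOCAL_DEPLOYMENTS_ENABLED=1 would be reported as misaligned.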

To verify that production env security defaults still match the hardened contract:

bash scripts/production_env_audit.sh --env-file .env.production

On the deployment host, run the stricter form before docker compose up so the pinned known_hosts file must already exist and be non-empty:

bash scripts/production_env_audit.sh --env-file .env.production --require-runtime-files

Frontend-only deploy

Local:

npm --prefix frontend run smoke:beginner
npm --prefix frontend run smoke:admin
npm --prefix frontend run smoke:runtime
npm --prefix frontend run build
git status --short
git add frontend
git commit -m "Describe the frontend change"
git push origin develop

Host:

ssh <deploy-host>
cd /opt/deploymate
git fetch origin
git switch develop
git pull --ff-only origin develop
bash scripts/production_env_audit.sh --env-file .env.production --require-runtime-files
docker compose -f docker-compose.prod.yml --env-file .env.production up -d --build --no-deps frontend
docker compose -f docker-compose.prod.yml --env-file .env.production ps frontend
curl -I https://your-domain/app

Single-command alternative from the workstation:

bash scripts/remote_release.sh \
  --host <deploy-host> \
  --surface frontend \
  --base-url https://your-domain \
  --admin-username admin \
  --admin-password '<secret>'

Backend-only deploy

Local:

python3 -m py_compile backend/app/main.py backend/app/routes/*.py backend/app/services/*.py backend/app/db.py backend/app/schemas.py
PYTHONPATH=backend backend/venv/bin/python -m unittest discover -s backend/tests -p 'test_*.py'
PYTHONPATH=backend backend/venv/bin/python -m unittest backend.tests.test_server_credentials -v
bash scripts/security_audit.sh
git status --short
git add backend
git commit -m "Describe the backend change"
git push origin develop

Host:

ssh <deploy-host>
cd /opt/deploymate
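# confirm the key is set without printing its value to the terminal
grep -q '^DEPLOYMATE_SERVER_CREDENTIALS_KEY=.' .env.production && echo "DEPLOYMATE_SERVER_CREDENTIALS_KEY is set"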
git fetch origin
git switch develop
git pull --ff-only origin develop
bash scripts/production_env_audit.sh --env-file .env.production --require-runtime-files
docker compose -f docker-compose.prod.yml --env-file .env.production up -d --build --no-deps backend
docker compose -f docker-compose.prod.yml --env-file .env.production ps backend
curl -I https://your-domain/api/health

Single-command alternative from the workstation:

bash scripts/remote_release.sh \
  --host <deploy-host> \
  --surface backend \
  --base-url https://your-domain \
  --admin-username admin \
  --admin-password '<secret>'

If this release introduces encrypted server credentials and production already has existing server records, the backend startup path will migrate any plaintext records to encrypted form after boot as long as DEPLOYMATE_SERVER_CREDENTIALS_KEY is present.

Full stack deploy

Use a full rebuild when backend, frontend build args, or production compose settings changed.

Local:

./scripts/preflight.sh
npm --prefix frontend run smoke:admin
npm --prefix frontend run build
PYTHONPATH=backend backend/venv/bin/python -m unittest discover -s backend/tests -p 'test_*.py'
git status --short
git add .
git commit -m "Describe the release change"
git push origin develop

Host:

ssh <deploy-host>
cd /opt/deploymate
git fetch origin
git switch develop
git pull --ff-only origin develop
bash scripts/production_env_audit.sh --env-file .env.production --require-runtime-files
docker compose -f docker-compose.prod.yml --env-file .env.production up -d --build
docker compose -f docker-compose.prod.yml --env-file .env.production ps
curl -I https://your-domain
curl -I https://your-domain/app
curl -I https://your-domain/api/health

Single-command alternative from the workstation:

bash scripts/remote_release.sh \
  --host <deploy-host> \
  --surface full \
  --base-url https://your-domain \
  --admin-username admin \
  --admin-password '<secret>'

scripts/remote_release.sh now runs both runtime_capability_audit.sh --env-file <remote-env> and production_env_audit.sh --env-file <remote-env> --require-runtime-files on the target host before it rebuilds the stack.

Post-deploy smoke

DEPLOYMATE_BASE_URL=https://your-domain \
DEPLOYMATE_ADMIN_USERNAME=admin \
DEPLOYMATE_ADMIN_PASSWORD='<secret>' \
bash scripts/post_deploy_smoke.sh

Optional runtime coverage can be enabled when you want the smoke to create and remove a real test deployment:

DEPLOYMATE_BASE_URL=https://your-domain \
DEPLOYMATE_ADMIN_USERNAME=admin \
DEPLOYMATE_ADMIN_PASSWORD='<secret>' \
DEPLOYMATE_SMOKE_RUNTIME_ENABLED=1 \
DEPLOYMATE_SMOKE_SERVER_ID='<server-id>' \
bash scripts/post_deploy_smoke.sh

Or create a temporary smoke target on the fly from an SSH key file:

DEPLOYMATE_BASE_URL=https://your-domain \
DEPLOYMATE_ADMIN_USERNAME=admin \
DEPLOYMATE_ADMIN_PASSWORD='<secret>' \
DEPLOYMATE_SMOKE_RUNTIME_ENABLED=1 \
DEPLOYMATE_SMOKE_SERVER_HOST='203.0.113.10' \
DEPLOYMATE_SMOKE_SERVER_USERNAME='root' \
DEPLOYMATE_SMOKE_SSH_KEY_FILE="$HOME/.ssh/id_ed25519" \
bash scripts/post_deploy_smoke.sh

The same runtime env vars can be passed through scripts/remote_release.sh so a remote deploy can immediately run the deeper runtime smoke in one command:

DEPLOYMATE_SMOKE_RUNTIME_ENABLED=1 \
DEPLOYMATE_SMOKE_SERVER_HOST='203.0.113.10' \
DEPLOYMATE_SMOKE_SERVER_USERNAME='root' \
DEPLOYMATE_SMOKE_SSH_KEY_FILE="$HOME/.ssh/id_ed25519" \
bash scripts/remote_release.sh \
  --host <deploy-host> \
  --surface full \
  --base-url https://your-domain \
  --admin-username admin \
  --admin-password '<secret>'

GitHub Actions release workflow secrets for runtime smoke:

  • RUNTIME_SMOKE_SERVER_ID for a pre-saved smoke target, or
  • RUNTIME_SMOKE_SERVER_HOST, RUNTIME_SMOKE_SERVER_USERNAME, and RUNTIME_SMOKE_SSH_PRIVATE_KEY for a temporary target
  • optional RUNTIME_SMOKE_SERVER_PORT, RUNTIME_SMOKE_SERVER_NAME, RUNTIME_SMOKE_IMAGE, RUNTIME_SMOKE_INTERNAL_PORT, RUNTIME_SMOKE_EXTERNAL_PORT, RUNTIME_SMOKE_START_PORT, and RUNTIME_SMOKE_HEALTH_TIMEOUT

Required GitHub Actions release workflow secrets:

  • DEPLOY_HOST
  • DEPLOY_SSH_PRIVATE_KEY
  • DEPLOY_SSH_KNOWN_HOSTS
  • DEPLOYMATE_BASE_URL
  • DEPLOYMATE_ADMIN_USERNAME
  • DEPLOYMATE_ADMIN_PASSWORD

Optional GitHub Actions release workflow secrets:

  • DEPLOY_REPO_DIR
  • DEPLOY_BRANCH
  • DEPLOY_ENV_FILE
  • DEPLOY_NOTIFICATION_WEBHOOK for best-effort Slack/Discord-compatible deploy notifications

The staging workflow uses the same secret names, but scoped under the staging environment instead of production. If DEPLOY_NOTIFICATION_WEBHOOK is unset, the workflows simply skip notifications.

Required GitHub Actions release secrets audit workflow secrets:

  • DEPLOY_HOST
  • DEPLOY_SSH_PRIVATE_KEY
  • DEPLOY_SSH_KNOWN_HOSTS
  • DEPLOY_REPO_DIR
  • DEPLOY_ENV_FILE
  • DEPLOYMATE_ADMIN_USERNAME
  • DEPLOYMATE_ADMIN_PASSWORD

Optional GitHub Actions release secrets audit workflow secrets:

  • DEPLOY_NOTIFICATION_WEBHOOK for best-effort drift audit notifications

The audit workflow also needs GitHub Actions issue-write permission so scheduled failures can open or update an incident issue in the repository.

If the audit fails with REMOTE HOST IDENTIFICATION HAS CHANGED, treat it as a trust-anchor incident, not a normal CI flake:

  1. Confirm out of band that the target host was intentionally rebuilt, reinstalled, or rotated.
  2. Capture the current host key fingerprints from a trusted workstation:
bash scripts/prepare_known_hosts.sh --host <target-host> --port 22 --output /tmp/deploymate_known_hosts
cat /tmp/deploymate_known_hosts
  3. Compare the printed fingerprints with the provider console, host console, or another trusted owner-controlled path.
  4. Only after the new fingerprint is confirmed, update the GitHub environment secret DEPLOY_SSH_KNOWN_HOSTS with the full known_hosts contents for the same environment.
  5. Re-run Release Secrets Audit manually for that environment and close the incident issue only after the manual run succeeds.

Runtime smoke notes:

  • if DEPLOYMATE_SMOKE_SERVER_ID is set, the script asks /servers/{server_id}/suggested-ports for a free external port
  • if DEPLOYMATE_SMOKE_SERVER_ID is empty but DEPLOYMATE_SMOKE_SERVER_HOST, DEPLOYMATE_SMOKE_SERVER_USERNAME, and DEPLOYMATE_SMOKE_SSH_KEY_FILE are set, the script creates and later deletes a temporary server target automatically
  • if DEPLOYMATE_SMOKE_SERVER_ID is not set, provide DEPLOYMATE_SMOKE_EXTERNAL_PORT explicitly
  • production can keep runtime smoke disabled when running in remote-only mode without a preconfigured smoke target
  • the script always attempts to delete the temporary smoke deployment before exit
  • if it created a temporary smoke server target, it also deletes that target before exit
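
The target selection described in these notes can be summarized as a small decision function. The Python sketch below is illustrative and is not taken from scripts/post_deploy_smoke.sh: the function name and the returned mode strings are hypothetical, and the external port handling is deliberately left out.

```python
from typing import Dict

# Variables needed to create a temporary smoke target, as listed in the notes above.
TEMP_TARGET_VARS = (
    "DEPLOYMATE_SMOKE_SERVER_HOST",
    "DEPLOYMATE_SMOKE_SERVER_USERNAME",
    "DEPLOYMATE_SMOKE_SSH_KEY_FILE",
)


def smoke_target_mode(env: Dict[str, str]) -> str:
    """Classify how the runtime smoke would pick its server target."""
    if env.get("DEPLOYMATE_SMOKE_RUNTIME_ENABLED") != "1":
        return "disabled"
    if env.get("DEPLOYMATE_SMOKE_SERVER_ID"):
        # A saved target takes priority over temporary-target settings.
        return "saved-target"
    if all(env.get(name) for name in TEMP_TARGET_VARS):
        return "temporary-target"
    return "misconfigured"
```

A result of "misconfigured" means runtime smoke was requested but neither a saved server ID nor a complete set of temporary-target variables was provided.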

Optional GitHub repository variables for scheduled audit incident triage:

  • RELEASE_AUDIT_INCIDENT_ASSIGNEE to auto-assign the incident issue to one GitHub login
  • RELEASE_AUDIT_INCIDENT_FAILURE_THRESHOLD to control after how many consecutive scheduled failures severity escalates to severity:high (default: 3)
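
The escalation rule controlled by RELEASE_AUDIT_INCIDENT_FAILURE_THRESHOLD can be sketched as follows. The real logic lives in scripts/release_audit_incident.js; this Python version is only an illustration. The function name is hypothetical, the severity:normal label is a placeholder that does not come from this runbook, and escalating at exactly the threshold (rather than one failure later) is an assumption.

```python
from typing import List


def incident_labels(consecutive_failures: int, threshold: int = 3) -> List[str]:
    """Return the labels an incident issue would carry after N consecutive failures."""
    labels = ["incident"]
    if consecutive_failures >= threshold:
        labels.append("severity:high")
    else:
        # Placeholder label name; the runbook only names severity:high.
        labels.append("severity:normal")
    return labels
```

With the default threshold of 3, the first two scheduled failures keep the lower severity and the third raises it.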

Recommended operator self-test sequence:

  1. run .github/workflows/release-secrets-audit.yml with incident_self_test=open
  2. re-run it with incident_self_test=update
  3. confirm the same issue was reused and severity/labels updated as expected
  4. re-run it with incident_self_test=resolve
  5. confirm the self-test issue was commented and closed

Deploy notifications

Release workflows can send a best-effort notification when DEPLOY_NOTIFICATION_WEBHOOK is configured in the target GitHub environment.

Compatible receivers:

  • Slack Incoming Webhooks
  • Discord channel webhooks
  • any endpoint that accepts a JSON body with either text or content
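
Slack Incoming Webhooks read the message from a text field, while Discord webhooks read it from content. A minimal Python sketch of choosing the field is shown below. It is not the behavior of scripts/send_workflow_notification.sh, which may build its body differently; the function name and the URL substring check are assumptions.

```python
import json


def build_payload(webhook_url: str, message: str) -> str:
    """Return a JSON body using 'content' for Discord-style URLs and 'text' otherwise."""
    # Assumption: Discord webhooks are recognized by this URL fragment.
    key = "content" if "discord.com/api/webhooks" in webhook_url else "text"
    return json.dumps({key: message})
```

A receiver that accepts either field would work with both payload shapes.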

Minimal Slack setup:

  1. Open Slack -> Apps -> Incoming Webhooks.
  2. Create a webhook for the channel that should receive deploy results.
  3. Copy the webhook URL into the GitHub environment secret DEPLOY_NOTIFICATION_WEBHOOK.

Minimal Discord setup:

  1. Open the target channel settings.
  2. Create a channel webhook.
  3. Copy the webhook URL into the GitHub environment secret DEPLOY_NOTIFICATION_WEBHOOK.

Local receiver test:

bash scripts/send_workflow_notification.sh \
  --webhook-url 'https://hooks.slack.com/services/...' \
  --workflow 'Auto staging deploy' \
  --environment staging \
  --status success \
  --surface frontend \
  --smoke 'runtime enabled' \
  --commit 4ac9f9e94edae459bd97b3572ac15bfdaaa547ed \
  --ref develop \
  --run-url 'https://github.com/AlexGerlitz/deploymate/actions/runs/23931028894' \
  --details 'frontend-only change detected'

Expected message shape:

  • status line with workflow name
  • environment, surface, and smoke mode
  • short commit SHA and ref
  • direct link to the workflow run

If DEPLOY_NOTIFICATION_WEBHOOK is unset or the receiver is temporarily unavailable, deploys still continue because notification steps are best-effort.

The scripted smoke currently validates:

  • /login
  • /app
  • /api/health
  • admin login
  • /api/auth/me
  • backup bundle download
  • restore dry-run
  • optional create -> health -> diagnostics -> logs -> activity -> delete deployment flow
  • logout and session invalidation

Backup and restore dry-run

curl -sS -b "<cookie jar>" https://your-domain/api/admin/backup-bundle

curl -sS -b "<cookie jar>" \
  -H "Content-Type: application/json" \
  -X POST https://your-domain/api/admin/restore/dry-run \
  --data-binary @restore-dry-run-payload.json

Dry-run result meanings:

ok      section looks safe to import later
warn    review is required before any future restore
error   blockers exist and the payload should not be applied
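
When a dry-run reports several sections, the overall verdict is the most severe section result. The Python sketch below shows that reduction. The shape of the input (a mapping from section name to status) is an assumption and is not the documented API response format.

```python
from typing import Dict

# Ordering of the three documented result values, from safest to most severe.
SEVERITY = {"ok": 0, "warn": 1, "error": 2}


def overall_status(section_results: Dict[str, str]) -> str:
    """Return the most severe status across all sections ('ok' when there are none)."""
    if not section_results:
        return "ok"
    return max(section_results.values(), key=SEVERITY.__getitem__)
```

Any single error makes the whole payload unsafe to apply, and any warn without an error means a manual review is still required.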

Remote-only production defaults

Standard production is intentionally configured as:

DEPLOYMATE_LOCAL_DOCKER_ENABLED=false
NEXT_PUBLIC_LOCAL_DEPLOYMENTS_ENABLED=0

If either capability flag changes, rebuild both backend and frontend with the full stack flow.

Fallback procedure

  1. Identify the last known good commit.
  2. Switch the deployment host to that commit in detached mode.
  3. Rebuild the smallest affected surface.
  4. Re-run the smoke check immediately.

Example:

ssh <deploy-host>
cd /opt/deploymate
git log --oneline -n 10
git switch --detach <last_known_good_commit>
docker compose -f docker-compose.prod.yml --env-file .env.production up -d --build
curl -I https://your-domain
curl -I https://your-domain/app
curl -I https://your-domain/api/health

Notes

  • prefer develop as the release branch and deploy from Git, not by editing live files
  • use --no-deps for frontend-only and backend-only deploys
  • on the production host, port 80 is already occupied by DeployMate itself, so app deployments should use other external ports