This runbook is the operator-facing reference for updating a production DeployMate instance.
- the repository is checked out on the deployment host at /opt/deploymate
- the deployment host is reachable as ssh <deploy-host>
- production uses docker-compose.prod.yml with .env.production
ssh <deploy-host>
cd /opt/deploymate
docker compose -f docker-compose.prod.yml --env-file .env.production ps
docker compose -f docker-compose.prod.yml --env-file .env.production logs --tail=50 proxy
docker compose -f docker-compose.prod.yml --env-file .env.production logs --tail=50 backend
docker compose -f docker-compose.prod.yml --env-file .env.production logs --tail=50 frontend
curl -I https://your-domain
curl -I https://your-domain/app
curl -I https://your-domain/app/server-review
curl -I https://your-domain/api/health
Before any release from the workstation:
./scripts/preflight.sh
For the fastest local loop, use the new lightweight commands:
make changed
make profile-changed
make profile-frontend
make profile-backend
make profile-fast
make profile-frontend-hot
make profile-fast-hot
make frontend-smoke-server-status
make frontend-smoke-server-stop
make audit-cache-clear
make frontend
make frontend-hot
make backend
make fast
make fast-hot
These commands:
- detect the changed release surface locally when needed
- run the smaller --fast local gate instead of the full release gate
- skip the production frontend build in preflight fast mode
- keep backend verification on a targeted test set when changed files map cleanly, otherwise fall back to the focused safety suite
- skip the backend fast suite entirely when a mixed local diff does not actually touch backend or release-runtime backend contract
- narrow backend syntax in preflight to changed backend Python files when possible, and skip it entirely for frontend-only local diffs
- skip the frontend fast smokes entirely when a mixed local diff does not actually touch frontend or frontend delivery contract
- keep frontend verification on targeted fast smokes when changed files map cleanly, otherwise fall back to the default auth + ops + runtime
- auto-derive the same local diff context for explicit surface commands like make frontend, make backend, make profile-frontend, and make profile-backend
- keep release_workflow_audit enabled for release-contract diffs while still letting local security_audit stay on changed-file scope when a full tracked-file scan is unnecessary
- keep experimental persistent frontend smoke-server controls available, but leave the default fast loop on the safer per-command lifecycle unless FRONTEND_SMOKE_PERSIST_SERVER=1 is set explicitly
- reuse one shared frontend smoke dev server in fast mode instead of starting a new next dev process for each smoke
- reuse shared frontend smoke servers in the heavier full gate too, so the main frontend smoke pack no longer starts a separate next dev process per script
- cache repeated local audit steps inside one gate run, so nested security/runtime audits do not re-run the same expensive checks twice
- skip runtime-oriented local audits automatically when the current diff does not touch runtime or deploy contract files
- narrow local security_audit to changed files and skip nested release or credentials audits unless the diff touches those contracts
- split local security_audit into secret scan and runtime-policy scan, so docs or release-contract diffs still keep the right checks without scanning risky runtime defaults unnecessarily
- persist successful local secret-scan and runtime-policy results by fingerprint, so repeated commands on the same diff do not re-run them unnecessarily
- reuse fingerprint-cached results for repeated local release-contract and runtime-contract audits when their inputs stay the same
- print a timing summary for local preflight and release phases so the slowest step is visible immediately after each run
- print a short cache-hit summary after local preflight, release, and profile commands so saved reruns are visible immediately
- reuse phase-level fingerprint caches for repeated fast frontend smoke targets and backend fast test modules when the diff and inputs stay the same
- reuse phase-level fingerprint caches for repeated preflight backend syntax checks and local frontend builds when their inputs stay the same
- reuse a phase-level fingerprint cache for repeated security_audit blocks when the diff, scopes, and nested audit inputs stay the same
- reuse per-file fingerprints for changed-file security_audit secret and runtime-policy scans, so one extra file in the diff does not invalidate every unchanged security target
- reuse per-file fingerprints for repeated local runtime static-contract checks, so one changed runtime/deploy file does not force rechecking every unchanged runtime contract file
- reuse per-file extracted contract lists for release_workflow_audit, so changing one of release.yml, staging.yml, or RUNBOOK.md does not force reparsing the other two
- print family-level cache savings for security, release_contract, and runtime, so repeated local runs show which verification layer still dominates cost
- keep local diff-context derivation stable even when there are no changed files and print a family bottleneck hint, so explicit surface commands stay predictable and still show which verification family is dominating misses
- prefer per-file fingerprint reuse for security_audit secret and runtime-policy scans even in wider scopes when the file set is still manageable, so repeat full-scope checks stop rescanning the entire repo
- recommend the cheapest useful local loop for the current diff via make recommend-local-mode, so you spend less time choosing between make changed, make backend, make frontend-hot, or make profile-changed
- narrow make changed mixed diffs down to an effective frontend or backend fast surface when the other side already resolves to skip, so shared-but-one-sided changes stop paying for both halves
- execute the recommended local loop directly via make auto-local, including automatic switching between fast and profile modes when the diff is expensive enough to justify profiling context
- remember the last successful auto-local loop per diff family and print a cheaper follow-up command for the next tweak, so second-pass iterations shrink automatically
- append each local timing phase into .logs/local_gate_timing.csv so repeated runs can be compared over time
- keep project-specific path and route assumptions inside scripts/project_automation_config.sh, so the automation core can be ported to another repo without rewriting every script first
- keep project-specific path-to-target and path-to-scope rules inside scripts/project_automation_targets.sh, so the detect_* layer is portable too
- export the reusable automation layer with make export-automation-core when you want to move the core into a separate private repository
- keep frontend smoke assertions inside scripts/project_automation_smoke_checks.sh, so both the fast and heavier smoke runners stay reusable across projects
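Several items in the list above rely on fingerprint caching: hash the inputs, and skip a step when a success marker for that hash already exists. The sketch below shows the general idea only. The function names, the cache directory layout, and the use of sha256sum (GNU coreutils) are assumptions for illustration, not the actual helpers in scripts/.

```shell
# Hypothetical sketch of fingerprint caching; not the repo's real implementation.

# Print one fingerprint for a set of files: hash each file, then hash the sorted list.
fingerprint_files() {
  sha256sum "$@" | sort | sha256sum | cut -d' ' -f1
}

# Run a command only when no success marker exists for this fingerprint.
# Usage: run_cached <cache-dir> <fingerprint> <command> [args...]
run_cached() {
  local cache_dir="$1" fp="$2"
  shift 2
  mkdir -p "$cache_dir"
  if [ -e "$cache_dir/$fp.ok" ]; then
    echo "cache hit"
    return 0
  fi
  "$@" || return $?            # a failed step leaves no marker behind
  touch "$cache_dir/$fp.ok"    # persist only successful results
  echo "cache miss"
}
```

Because only successful runs write a marker, a failing check is always re-run on the next invocation, which matches the "persist successful ... results by fingerprint" wording above.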
To inspect the latest local timings quickly:
make timing-history
make timing-stats
make timing-hint
For repeated frontend iterations, the faster hot loop is:
make frontend-hot
make profile-frontend-hot
make frontend-smoke-server-status
make frontend-smoke-server-stop
Or run the broader local release gate:
bash scripts/release_workflow.sh --surface full
Fast mode is also available directly:
bash scripts/release_workflow.sh --surface frontend --fast
bash scripts/release_workflow.sh --surface backend --fast
bash scripts/release_workflow.sh --surface full --fast
For template-heavy frontend changes, also run:
npm --prefix frontend run smoke:templates
The full local release gate already includes this templates smoke alongside the admin and runtime frontend smokes.
For ops-overview focused frontend changes, also run:
npm --prefix frontend run smoke:ops
For auth-surface frontend changes, also run:
npm --prefix frontend run smoke:auth
For admin interaction changes around saved views, audit filters, or bulk actions, also run:
npm --prefix frontend run smoke:admin-interactions
For backup / restore dry-run workflow changes, also run:
npm --prefix frontend run smoke:restore
For server-management frontend changes, also run:
npm --prefix frontend run smoke:servers
That smoke covers the dedicated /app/server-review workspace, which is now the main UI path for create/edit/test/diagnostics/delete server actions.
For beginner-path changes across /app, /app/server-review, or /app/deployment-workflow, also run:
npm --prefix frontend run smoke:beginner
That smoke checks the first-time admin path and the member remote-only blocked path.
For a single remote deploy command that also runs post-deploy smoke:
bash scripts/remote_release.sh \
--host <deploy-host> \
--surface full \
--base-url https://your-domain \
--admin-username admin \
--admin-password '<secret>'
That helper now runs a fast smoke-credentials precheck before the remote rebuild. If the configured admin smoke credentials already return 401 or 403, the release stops immediately instead of spending another 10-20 minutes on a deploy that will fail in post-deploy smoke anyway.
It also compares the provided smoke credentials against the target runtime env file over SSH before deploy, so GitHub environment drift now fails as a fast contract error instead of a delayed post-deploy surprise.
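The 401/403 precheck described above can be pictured as a small status-to-decision mapping. This is a minimal sketch, not the code in scripts/remote_release.sh: the function name is hypothetical, and the commented probe assumes a JSON login endpoint whose path and field names are not confirmed by this runbook.

```shell
# Hypothetical helper: map the HTTP status of an admin login probe to a go/no-go decision.
precheck_verdict() {
  case "$1" in
    401|403) echo "stop" ;;       # credentials rejected: abort before the rebuild
    *)       echo "continue" ;;   # this sketch lets every other status pass through
  esac
}

# Assumed probe; the endpoint path and payload shape are illustrative only.
# status="$(curl -sS -o /dev/null -w '%{http_code}' \
#   -H 'Content-Type: application/json' \
#   -d '{"username":"admin","password":"<secret>"}' \
#   https://your-domain/api/auth/login)"
# [ "$(precheck_verdict "$status")" = "continue" ] || exit 1
```

Only 401 and 403 stop the release here, because those are the two statuses the text names; how the real helper treats other failures is not stated.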
If you want the same flow from GitHub instead of a workstation shell, use the manual workflow in .github/workflows/release.yml after configuring repository secrets for the deploy host, deploy SSH key, pinned known_hosts contents, base URL, and admin smoke credentials.
For a short operator-only secret drift check without a deploy, use .github/workflows/release-secrets-audit.yml and choose production or staging.
That manual workflow now also supports an incident_self_test mode with open, update, and resolve actions so operators can verify the incident automation path without waiting for the nightly schedule and without forcing a real target failure.
That workflow also runs every day at 02:17 UTC (09:17 in Novosibirsk) for both environments and sends a best-effort webhook notification when DEPLOY_NOTIFICATION_WEBHOOK is configured.
If a scheduled audit fails, GitHub automatically opens or updates one environment-specific incident issue so the failure does not disappear in webhook history alone.
That incident now gets incident plus severity labels, and severity is raised to severity:high after the configured number of consecutive scheduled failures.
When the next scheduled audit for that environment succeeds, the workflow comments on the issue and closes it automatically.
The manual self-test flow uses a separate [release-secrets-audit:self-test] ... issue title and the incident:test label, so it does not interfere with real scheduled incidents.
The incident triage logic itself now lives in scripts/release_audit_incident.js, and the local regression path for it is node --test scripts/release_audit_incident.test.js.
The workflow calls that helper through .github/actions/release-audit-incident/action.yml, so scheduled audits and manual self-tests share one incident wiring path.
The workflow self-test effective status derivation now lives in scripts/release_audit_mode.js, so the YAML step wiring stays thin and the local regression path for it is node --test scripts/release_audit_mode.test.js.
Recommended promotion order:
- open and review a PR into develop
- merge the reviewed PR into develop
- .github/workflows/ci.yml runs the release gate and then auto-deploys the same commit to staging
- production deploy stays behind .github/workflows/release.yml or a manual scripts/remote_release.sh run
The CI and release workflows call the same reusable composite action in .github/actions/remote-release/action.yml, so staging and production deploy behavior stays aligned with scripts/remote_release.sh.
Those workflows pass the exact checked-out commit SHA into the remote helper, so the release host deploys the reviewed commit rather than whichever newer commit happens to land on the branch later.
For the next project, the fastest non-empty starting point is now the product starter:
make bootstrap-product-starter TARGET_DIR=/absolute/path/to/project PRODUCT_STARTER_FLAGS="--project-name MyApp --app-slug myapp --contact-email founder@example.com --frontend-dir web --backend-dir api"
That gives the new repo:
- starter frontend shell
- starter backend shell
- starter docs
- reusable automation core
Then immediately scaffold the first product slice:
make scaffold-product-resource TARGET_DIR=/absolute/path/to/project RESOURCE_FLAGS="--name Projects --slug projects --frontend-dir web --backend-dir api"
Recommended PR-first daily flow:
git switch develop
git pull --ff-only origin develop
make start-pr-branch SLUG=my-change
make git-doctor
make ship-pr SLUG=my-change MESSAGE="Describe the change"
Then:
- commit the change on the feature branch
- run make pr-ready
- push with git push -u origin $(git branch --show-current)
- open the PR with make pr-open
- use make pr-status while the PR is under review
- wait with make pr-watch
- merge with make pr-land
If you want to compress even more of the Git overhead into one path:
make ship-pr SLUG=my-change MESSAGE="Describe the change"
make pr-watch
make pr-land-sync
Preferred cadence:
- finish one logical slice
- run the cheapest relevant local verification
- commit that slice cleanly
- push when one clean commit or a short series of 2-3 related commits is ready
Use this to keep GitHub readable:
- do not push every tiny fix
- do not mix unrelated scaffold, backend, and UI work into one opaque commit
- prefer a small number of coherent commits that tell a clear story in PR review
Notes:
- PR CI already runs the same release gate as direct develop pushes
- auto-staging still happens only after merge into develop
- .github/pull_request_template.md keeps the PR body short and predictable
- make pr-ready uses the same recommendation and auto-local logic as the normal local loop, so the pre-PR check is not a second parallel process
- make pr-doctor prints branch cleanliness, upstream state, PR state, local-loop freshness, and a PR size class so oversized branches get caught before review
- make pr-doctor also reads the current PR check state from GitHub and suggests a likely split direction from the diff mix when a branch has grown too large
- make pr-doctor now also compares the current local commit, the last locally verified commit, and the PR head SHA on GitHub so stale local green runs or unpushed commits are obvious before review
- make pr-watch waits on GitHub checks and then refreshes doctor output
- make pr-land refuses to merge unless doctor is clean, local HEAD matches the PR head SHA, and PR checks are green
- make pr-land-sync merges the PR and then fast-forwards main from develop
- make dev-doctor gives one compact local summary of recommended loop, timing bottleneck, and PR doctor state
- make git-doctor gives one compact Git-only summary of branch cleanliness, upstream drift, stale lock state, and the most useful next Git command
- doctor commands now also support --format shell, so future repos can automate around them without parsing human-oriented text
For daily iteration speed on staging:
git push origin develop
After the push:
- CI detects the changed surface once and uses that same decision for the release gate and the staging deploy
- if the commit changes only frontend/, staging deploy rebuilds frontend
- if the commit changes only backend/, staging deploy rebuilds backend
- mixed or shared changes fall back to a full staging deploy
- docs-only or workflow-only changes skip both the release gate and staging deploy entirely
- older in-progress CI runs on the same branch are cancelled automatically so only the newest iteration keeps running
- the release gate now skips unnecessary dependency installs too, so frontend-only changes do not install backend requirements and backend-only changes do not install frontend packages
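The surface decision in the list above can be sketched as a classifier over changed paths. This is an illustration under stated assumptions, not the CI implementation: the function name is hypothetical, and the patterns treated as docs-only or workflow-only (*.md, docs/, .github/workflows/) are guesses, since the real rules live in the project's own detection scripts.

```shell
# Hypothetical sketch: read changed paths on stdin (one per line)
# and print one of: frontend, backend, full, skip.
detect_surface() {
  local path frontend=0 backend=0 shared=0
  while IFS= read -r path; do
    case "$path" in
      "")                              ;;  # ignore blank lines
      frontend/*)                      frontend=1 ;;
      backend/*)                       backend=1 ;;
      *.md|docs/*|.github/workflows/*) ;;  # assumed docs/workflow patterns
      *)                               shared=1 ;;
    esac
  done
  if [ "$shared" -eq 1 ] || { [ "$frontend" -eq 1 ] && [ "$backend" -eq 1 ]; }; then
    echo "full"       # mixed or shared changes
  elif [ "$frontend" -eq 1 ]; then
    echo "frontend"
  elif [ "$backend" -eq 1 ]; then
    echo "backend"
  else
    echo "skip"       # nothing but docs or workflow files changed
  fi
}

# Possible usage: git diff --name-only origin/develop...HEAD | detect_surface
```

Computing the decision once and passing the result to both the release gate and the deploy step is what keeps the two from disagreeing about the same commit.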
Use .github/workflows/staging.yml only as a manual fallback when you need to redeploy staging on demand.
That manual fallback now also supports skip_smoke when you need a faster redeploy for operator-only checks.
For current DeployMate feature work, do not start every new admin surface from a blank file. Use:
make scaffold-deploymate-surface SURFACE_FLAGS="--name Review Inbox --slug review-inbox"
That gives you the frontend page shell, backend route/service stub, backend API flow test, and backend/app/main.py registration in one pass.
The generated page now also uses shared review-shell blocks from frontend/app/app/admin-ui.js, so new admin surfaces start from the same summary-and-queue pattern instead of ad hoc JSX.
The backend side now also gets typed response models in backend/app/schemas.py, a built-in q filter path, and a generated API flow test for both default and filtered list responses.
Add --with-table, --with-saved-views, --with-audit, and --with-export when the first useful version of the surface should already include those richer review sections.
--with-table adds a denser starter table over the same queue data, so the first real operator pass can compare status, context, and workflow slice without hand-building a second view.
Those richer flags now also generate real starter wiring for URL state, filter chips, saved-views manager hooks, audit filtering, and local JSON/CSV exports, so the surface starts closer to a live DeployMate workflow than a blank mock.
Add --preset users, --preset upgrade-requests, or --preset servers when the feature already clearly matches one of those common DeployMate surface families.
Presets now also change the starter action flow itself, so the generated surface already includes the first local decision pattern for that family instead of a queue-only mock.
The generated page now also gets a preset-specific segment/workflow filter and richer card context fields, so the starter slice is closer to the actual entity shape you will build next.
The scaffold now also includes a starter bulk-action panel and a mutation payload preview, so the first write contract is visible immediately instead of being invented ad hoc later.
New surfaces now also generate modular frontend starter files (page.js, starter-data.js, starter-actions.js) so the scaffold output is easier to extend without turning the page file into a dump of preset constants.
The scaffold now also generates starter-smoke.js plus a backend *_starter.py helper, so route smoke placeholders and backend starter contracts are modular too.
It now also lays down a typed backend starter-action endpoint and matching test, so the first mutation path starts from a real API contract instead of only a frontend placeholder.
The scaffold also now emits starter-api.js, so switching a generated surface from local scaffold mode to API-backed starter mode no longer requires inventing the client bridge from scratch.
The CI, staging, and production workflows now write a short GitHub job summary with the chosen surface, smoke mode, requested commit SHA, deployed SHA, and target URL so the result is readable without opening raw logs.
To verify that the release workflows and the documented GitHub secret contract still match:
bash scripts/release_workflow_audit.sh
Before the first deploy of encrypted server credentials, or before enabling remote server management on a fresh environment:
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# store the value in .env.production as DEPLOYMATE_SERVER_CREDENTIALS_KEY
Before switching SSH host verification to strict pinned mode:
bash scripts/prepare_known_hosts.sh --host <target-host> --port 22
# store the resulting path in .env.production as DEPLOYMATE_SSH_KNOWN_HOSTS_FILE
# then set DEPLOYMATE_SSH_HOST_KEY_CHECKING=yes
Do not rotate DEPLOYMATE_SERVER_CREDENTIALS_KEY casually. Existing stored server credentials depend on it for decryption.
To audit the current database state for server credential encryption before a release:
bash scripts/server_credentials_audit.sh
To verify that the repo still keeps local Docker execution behind an explicit opt-in boundary:
bash scripts/local_runtime_audit.sh
To verify that production frontend and backend runtime capability flags are still aligned:
bash scripts/runtime_capability_audit.sh
This audit checks:
- frontend/Dockerfile production default
- docker-compose.prod.yml production build and runtime defaults
- .env.production.example
- .env.production when it exists on the workstation or deployment host
If .env.production sets DEPLOYMATE_LOCAL_DOCKER_ENABLED=false, then NEXT_PUBLIC_LOCAL_DEPLOYMENTS_ENABLED must be 0. If backend local runtime is explicitly enabled, the frontend flag must be 1.
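The alignment rule above can be expressed as a small check over an env file. This is a minimal sketch and not the logic of scripts/runtime_capability_audit.sh: the function names are hypothetical, and it assumes unquoted KEY=value lines with the backend flag written as literal true or false.

```shell
# Hypothetical helper: print the last KEY=value assignment in an env file, or nothing.
env_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Usage: check_capability_flags <env-file>; returns 0 only when the flags agree.
check_capability_flags() {
  local backend frontend
  backend="$(env_value DEPLOYMATE_LOCAL_DOCKER_ENABLED "$1")"
  frontend="$(env_value NEXT_PUBLIC_LOCAL_DEPLOYMENTS_ENABLED "$1")"
  case "$backend:$frontend" in
    false:0|true:1) echo "aligned" ;;
    *) echo "mismatch: backend=$backend frontend=$frontend"; return 1 ;;
  esac
}
```

A missing flag counts as a mismatch in this sketch, which is stricter than the rule as written; whether the real audit applies defaults for unset values is not stated here.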
To verify that production env security defaults still match the hardened contract:
bash scripts/production_env_audit.sh --env-file .env.production
On the deployment host, run the stricter form before docker compose up so the pinned known_hosts file must already exist and be non-empty:
bash scripts/production_env_audit.sh --env-file .env.production --require-runtime-files
Local:
npm --prefix frontend run smoke:beginner
npm --prefix frontend run smoke:admin
npm --prefix frontend run smoke:runtime
npm --prefix frontend run build
git status --short
git add frontend
git commit -m "Describe the frontend change"
git push origin develop
Host:
ssh <deploy-host>
cd /opt/deploymate
git fetch origin
git switch develop
git pull --ff-only origin develop
bash scripts/production_env_audit.sh --env-file .env.production --require-runtime-files
docker compose -f docker-compose.prod.yml --env-file .env.production up -d --build --no-deps frontend
docker compose -f docker-compose.prod.yml --env-file .env.production ps frontend
curl -I https://your-domain/app
Single-command alternative from the workstation:
bash scripts/remote_release.sh \
--host <deploy-host> \
--surface frontend \
--base-url https://your-domain \
--admin-username admin \
--admin-password '<secret>'
Local:
python3 -m py_compile backend/app/main.py backend/app/routes/*.py backend/app/services/*.py backend/app/db.py backend/app/schemas.py
PYTHONPATH=backend backend/venv/bin/python -m unittest discover -s backend/tests -p 'test_*.py'
PYTHONPATH=backend backend/venv/bin/python -m unittest backend.tests.test_server_credentials -v
bash scripts/security_audit.sh
git status --short
git add backend
git commit -m "Describe the backend change"
git push origin develop
Host:
ssh <deploy-host>
cd /opt/deploymate
grep '^DEPLOYMATE_SERVER_CREDENTIALS_KEY=' .env.production
git fetch origin
git switch develop
git pull --ff-only origin develop
bash scripts/production_env_audit.sh --env-file .env.production --require-runtime-files
docker compose -f docker-compose.prod.yml --env-file .env.production up -d --build --no-deps backend
docker compose -f docker-compose.prod.yml --env-file .env.production ps backend
curl -I https://your-domain/api/health
Single-command alternative from the workstation:
bash scripts/remote_release.sh \
--host <deploy-host> \
--surface backend \
--base-url https://your-domain \
--admin-username admin \
--admin-password '<secret>'
If this release introduces encrypted server credentials and production already has existing server records, the backend startup path will migrate any plaintext records to encrypted form after boot as long as DEPLOYMATE_SERVER_CREDENTIALS_KEY is present.
Use a full rebuild when backend, frontend build args, or production compose settings changed.
Local:
./scripts/preflight.sh
npm --prefix frontend run smoke:admin
npm --prefix frontend run build
PYTHONPATH=backend backend/venv/bin/python -m unittest discover -s backend/tests -p 'test_*.py'
git status --short
git add .
git commit -m "Describe the release change"
git push origin develop
Host:
ssh <deploy-host>
cd /opt/deploymate
git fetch origin
git switch develop
git pull --ff-only origin develop
bash scripts/production_env_audit.sh --env-file .env.production --require-runtime-files
docker compose -f docker-compose.prod.yml --env-file .env.production up -d --build
docker compose -f docker-compose.prod.yml --env-file .env.production ps
curl -I https://your-domain
curl -I https://your-domain/app
curl -I https://your-domain/api/health
Single-command alternative from the workstation:
bash scripts/remote_release.sh \
--host <deploy-host> \
--surface full \
--base-url https://your-domain \
--admin-username admin \
--admin-password '<secret>'
scripts/remote_release.sh now runs both runtime_capability_audit.sh --env-file <remote-env> and production_env_audit.sh --env-file <remote-env> --require-runtime-files on the target host before it rebuilds the stack.
DEPLOYMATE_BASE_URL=https://your-domain \
DEPLOYMATE_ADMIN_USERNAME=admin \
DEPLOYMATE_ADMIN_PASSWORD='<secret>' \
bash scripts/post_deploy_smoke.sh
Optional runtime coverage can be enabled when you want the smoke to create and remove a real test deployment:
DEPLOYMATE_BASE_URL=https://your-domain \
DEPLOYMATE_ADMIN_USERNAME=admin \
DEPLOYMATE_ADMIN_PASSWORD='<secret>' \
DEPLOYMATE_SMOKE_RUNTIME_ENABLED=1 \
DEPLOYMATE_SMOKE_SERVER_ID='<server-id>' \
bash scripts/post_deploy_smoke.sh
Or create a temporary smoke target on the fly from an SSH key file:
DEPLOYMATE_BASE_URL=https://your-domain \
DEPLOYMATE_ADMIN_USERNAME=admin \
DEPLOYMATE_ADMIN_PASSWORD='<secret>' \
DEPLOYMATE_SMOKE_RUNTIME_ENABLED=1 \
DEPLOYMATE_SMOKE_SERVER_HOST='203.0.113.10' \
DEPLOYMATE_SMOKE_SERVER_USERNAME='root' \
DEPLOYMATE_SMOKE_SSH_KEY_FILE="$HOME/.ssh/id_ed25519" \
bash scripts/post_deploy_smoke.sh
The same runtime env vars can be passed through scripts/remote_release.sh so a remote deploy can immediately run the deeper runtime smoke in one command:
DEPLOYMATE_SMOKE_RUNTIME_ENABLED=1 \
DEPLOYMATE_SMOKE_SERVER_HOST='203.0.113.10' \
DEPLOYMATE_SMOKE_SERVER_USERNAME='root' \
DEPLOYMATE_SMOKE_SSH_KEY_FILE="$HOME/.ssh/id_ed25519" \
bash scripts/remote_release.sh \
--host <deploy-host> \
--surface full \
--base-url https://your-domain \
--admin-username admin \
--admin-password '<secret>'
GitHub Actions release workflow secrets for runtime smoke:
- RUNTIME_SMOKE_SERVER_ID for a pre-saved smoke target, or RUNTIME_SMOKE_SERVER_HOST, RUNTIME_SMOKE_SERVER_USERNAME, and RUNTIME_SMOKE_SSH_PRIVATE_KEY for a temporary target
- optional RUNTIME_SMOKE_SERVER_PORT, RUNTIME_SMOKE_SERVER_NAME, RUNTIME_SMOKE_IMAGE, RUNTIME_SMOKE_INTERNAL_PORT, RUNTIME_SMOKE_EXTERNAL_PORT, RUNTIME_SMOKE_START_PORT, and RUNTIME_SMOKE_HEALTH_TIMEOUT
Required GitHub Actions release workflow secrets:
- DEPLOY_HOST
- DEPLOY_SSH_PRIVATE_KEY
- DEPLOY_SSH_KNOWN_HOSTS
- DEPLOYMATE_BASE_URL
- DEPLOYMATE_ADMIN_USERNAME
- DEPLOYMATE_ADMIN_PASSWORD
Optional GitHub Actions release workflow secrets:
- DEPLOY_REPO_DIR
- DEPLOY_BRANCH
- DEPLOY_ENV_FILE
- DEPLOY_NOTIFICATION_WEBHOOK for best-effort Slack/Discord-compatible deploy notifications
The staging workflow uses the same secret names, but scoped under the staging environment instead of production.
If DEPLOY_NOTIFICATION_WEBHOOK is unset, the workflows simply skip notifications.
Required GitHub Actions release secrets audit workflow secrets:
- DEPLOY_HOST
- DEPLOY_SSH_PRIVATE_KEY
- DEPLOY_SSH_KNOWN_HOSTS
- DEPLOY_REPO_DIR
- DEPLOY_ENV_FILE
- DEPLOYMATE_ADMIN_USERNAME
- DEPLOYMATE_ADMIN_PASSWORD
Optional GitHub Actions release secrets audit workflow secrets:
- DEPLOY_NOTIFICATION_WEBHOOK for best-effort drift audit notifications
The audit workflow also needs GitHub Actions issue-write permission so scheduled failures can open or update an incident issue in the repository.
If the audit fails with REMOTE HOST IDENTIFICATION HAS CHANGED, treat it as a trust-anchor
incident, not a normal CI flake:
- Confirm out of band that the target host was intentionally rebuilt, reinstalled, or rotated.
- Capture the current host key fingerprints from a trusted workstation:
bash scripts/prepare_known_hosts.sh --host <target-host> --port 22 --output /tmp/deploymate_known_hosts
cat /tmp/deploymate_known_hosts
- Compare the printed fingerprints with the provider console, host console, or another trusted owner-controlled path.
- Only after the new fingerprint is confirmed, update the GitHub environment secret DEPLOY_SSH_KNOWN_HOSTS with the full known_hosts contents for the same environment.
- Re-run Release Secrets Audit manually for that environment and close the incident issue only after the manual run succeeds.
Runtime smoke notes:
- if DEPLOYMATE_SMOKE_SERVER_ID is set, the script asks /servers/{server_id}/suggested-ports for a free external port
- if DEPLOYMATE_SMOKE_SERVER_ID is empty but DEPLOYMATE_SMOKE_SERVER_HOST, DEPLOYMATE_SMOKE_SERVER_USERNAME, and DEPLOYMATE_SMOKE_SSH_KEY_FILE are set, the script creates and later deletes a temporary server target automatically
- if DEPLOYMATE_SMOKE_SERVER_ID is not set, provide DEPLOYMATE_SMOKE_EXTERNAL_PORT explicitly
- production can keep runtime smoke disabled when running in remote-only mode without a preconfigured smoke target
- the script always attempts to delete the temporary smoke deployment before exit
- if it created a temporary smoke server target, it also deletes that target before exit
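The target selection rules in these notes can be summarized as a small decision function. This is only a sketch of the documented behavior, not an excerpt from scripts/post_deploy_smoke.sh; the function names and the printed mode labels are hypothetical.

```shell
# Hypothetical sketch: decide which kind of smoke target the current env describes.
smoke_target_mode() {
  if [ -n "${DEPLOYMATE_SMOKE_SERVER_ID:-}" ]; then
    echo "saved-target"        # external port comes from /servers/{server_id}/suggested-ports
  elif [ -n "${DEPLOYMATE_SMOKE_SERVER_HOST:-}" ] &&
       [ -n "${DEPLOYMATE_SMOKE_SERVER_USERNAME:-}" ] &&
       [ -n "${DEPLOYMATE_SMOKE_SSH_KEY_FILE:-}" ]; then
    echo "temporary-target"    # created for the run and deleted before exit
  else
    echo "no-target"
  fi
}

# Returns 0 when the external-port requirement from the notes is satisfied:
# either a saved server ID is present, or the port is given explicitly.
smoke_port_ok() {
  [ -n "${DEPLOYMATE_SMOKE_SERVER_ID:-}" ] || [ -n "${DEPLOYMATE_SMOKE_EXTERNAL_PORT:-}" ]
}
```

Checking the saved server ID first mirrors the order of the notes, so a configured ID wins even when the temporary-target variables are also set.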
Optional GitHub repository variables for scheduled audit incident triage:
- RELEASE_AUDIT_INCIDENT_ASSIGNEE to auto-assign the incident issue to one GitHub login
- RELEASE_AUDIT_INCIDENT_FAILURE_THRESHOLD to control after how many consecutive scheduled failures severity escalates to severity:high (default: 3)
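The escalation rule behind RELEASE_AUDIT_INCIDENT_FAILURE_THRESHOLD can be sketched as a single comparison. The real logic lives in scripts/release_audit_incident.js; the shell function below is a hypothetical illustration that assumes escalation happens once the failure count reaches the threshold, and it uses a placeholder for the lower severity label because this runbook does not name it.

```shell
# Hypothetical sketch. Usage: incident_severity <consecutive-failures> [threshold]
incident_severity() {
  local failures="$1" threshold="${2:-3}"   # default of 3 as documented above
  if [ "$failures" -ge "$threshold" ]; then
    echo "severity:high"
  else
    echo "below-threshold"   # placeholder: the lower label is not named in this runbook
  fi
}
```

With the default threshold, the third consecutive scheduled failure would be the first to carry severity:high under this reading.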
Recommended operator self-test sequence:
- run .github/workflows/release-secrets-audit.yml with incident_self_test=open
- re-run it with incident_self_test=update
- confirm the same issue was reused and severity/labels updated as expected
- re-run it with incident_self_test=resolve
- confirm the self-test issue was commented and closed
Release workflows can send a best-effort notification when DEPLOY_NOTIFICATION_WEBHOOK is configured in the target GitHub environment.
Compatible receivers:
- Slack Incoming Webhooks
- Discord channel webhooks
- any endpoint that accepts a JSON body with either text or content
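For a custom receiver, the body shape is a one-key JSON object. The sketch below builds such a payload by hand; it is not the code in scripts/send_workflow_notification.sh, the function names are hypothetical, and the escaping covers only backslashes and double quotes. Slack reads the text key and Discord reads the content key.

```shell
# Hypothetical helper: escape backslashes and double quotes for use inside a JSON string.
# Newlines and other control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: build_payload <text|content> <message>
build_payload() {
  printf '{"%s":"%s"}\n' "$1" "$(json_escape "$2")"
}

# Best-effort send, so a failed notification never fails the deploy:
# curl -sS -X POST -H 'Content-Type: application/json' \
#   -d "$(build_payload text 'staging deploy succeeded')" \
#   "$DEPLOY_NOTIFICATION_WEBHOOK" || true
```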
Minimal Slack setup:
- Open Slack -> Apps -> Incoming Webhooks.
- Create a webhook for the channel that should receive deploy results.
- Copy the webhook URL into the GitHub environment secret DEPLOY_NOTIFICATION_WEBHOOK.
Minimal Discord setup:
- Open the target channel settings.
- Create a channel webhook.
- Copy the webhook URL into the GitHub environment secret DEPLOY_NOTIFICATION_WEBHOOK.
Local receiver test:
bash scripts/send_workflow_notification.sh \
--webhook-url 'https://hooks.slack.com/services/...' \
--workflow 'Auto staging deploy' \
--environment staging \
--status success \
--surface frontend \
--smoke 'runtime enabled' \
--commit 4ac9f9e94edae459bd97b3572ac15bfdaaa547ed \
--ref develop \
--run-url 'https://github.com/AlexGerlitz/deploymate/actions/runs/23931028894' \
--details 'frontend-only change detected'
Expected message shape:
- status line with workflow name
- environment, surface, and smoke mode
- short commit SHA and ref
- direct link to the workflow run
If DEPLOY_NOTIFICATION_WEBHOOK is unset or the receiver is temporarily unavailable, deploys still continue because notification steps are best-effort.
The scripted smoke currently validates:
- /login
- /app
- /api/health
- admin login
- /api/auth/me
- backup bundle download
- restore dry-run
- optional create -> health -> diagnostics -> logs -> activity -> delete deployment flow
- logout and session invalidation
curl -sS -b "<cookie jar>" https://your-domain/api/admin/backup-bundle
curl -sS -b "<cookie jar>" \
-H "Content-Type: application/json" \
-X POST https://your-domain/api/admin/restore/dry-run \
--data-binary @restore-dry-run-payload.json
Dry-run result meanings:
ok section looks safe to import later
warn review is required before any future restore
error blockers exist and the payload should not be applied
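When a dry-run response reports one status per section, an operator usually wants a single overall answer. The sketch below takes the worst status across sections; that aggregation rule, the function name, and the idea that statuses arrive as plain words are all assumptions, since this runbook does not describe the response format.

```shell
# Hypothetical sketch. Usage: restore_verdict <status>...
# Prints the worst status among the given sections: error > warn > ok.
restore_verdict() {
  local status verdict="ok"
  for status in "$@"; do
    case "$status" in
      error) verdict="error" ;;
      warn)  [ "$verdict" = "error" ] || verdict="warn" ;;
      ok)    ;;
      *)     echo "unknown status: $status" >&2; return 2 ;;
    esac
  done
  echo "$verdict"
}
```

For example, sections reporting ok, warn, and ok would yield warn, meaning the payload needs review before any future restore.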
Standard production is intentionally configured as:
DEPLOYMATE_LOCAL_DOCKER_ENABLED=false
NEXT_PUBLIC_LOCAL_DEPLOYMENTS_ENABLED=0
If either capability flag changes, rebuild both backend and frontend with the full stack flow.
- Identify the last known good commit.
- Switch the deployment host to that commit in detached mode.
- Rebuild the smallest affected surface.
- Re-run the smoke check immediately.
Example:
ssh <deploy-host>
cd /opt/deploymate
git log --oneline -n 10
git switch --detach <last_known_good_commit>
docker compose -f docker-compose.prod.yml --env-file .env.production up -d --build
curl -I https://your-domain
curl -I https://your-domain/app
curl -I https://your-domain/api/health
- prefer develop as the release branch and deploy from Git, not by editing live files
- use --no-deps for frontend-only and backend-only deploys
- on the production host, port 80 is already occupied by DeployMate itself, so app deployments should use other external ports