
Harden production deployment configuration #23

Open
fakechris wants to merge 2 commits into `main` from `fix/production-deploy-hardening`

Conversation

@fakechris
Owner

@fakechris fakechris commented Apr 21, 2026

Summary

  • standardize production image compose around .env.production and direct DATABASE_URL
  • add Google OAuth fail-fast validation and production smoke checks
  • add Ansible Vault examples and deploy-time OAuth smoke validation
  • update production docs and backup script defaults

Verification

  • pnpm typecheck
  • pnpm test --run
  • docker compose --env-file .env.production.example -f docker-compose.prod.images.yml config
  • Ansible syntax check for ops/ansible/playbooks/deploy.yml
  • pre-commit secret scan for long token/key patterns: 0 findings
  • production smoke passed for https://involute.edai100.com

Summary by CodeRabbit

  • New Features

    • Added configurable Google OAuth requirement flag for production deployments.
    • Introduced production smoke testing to verify deployment health.
  • Documentation

    • Updated deployment runbooks with production environment configuration guidance.
  • Chores

    • Enhanced production deployment infrastructure with improved Docker image support and Ansible automation.
    • Added parameterized volume and service binding configuration for production environments.

@coderabbitai

coderabbitai Bot commented Apr 21, 2026

Warning

Rate limit exceeded

@fakechris has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 50 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 53 minutes and 50 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 15b7e3d3-7da7-45fe-96b7-d7f9fdbdb6dc

📥 Commits

Reviewing files that changed from the base of the PR and between 47b3999 and 046bc2d.

📒 Files selected for processing (1)
  • ops/ansible/playbooks/deploy.yml
📝 Walkthrough


This PR extends production deployment infrastructure by introducing database URL and bind-address configuration, Docker image deployment identifiers, and a Google OAuth requirement flag. It updates deployment automation (GitHub Actions, Ansible) to conditionally validate OAuth credentials, refactors Docker Compose production files to use pre-built images, and adds production smoke testing to verify health and OAuth configuration.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Environment Configuration: `.env.production.example`, `ops/ansible/templates/env.production.j2` | Added `DATABASE_URL`, `SERVER_BIND_ADDRESS`, `WEB_BIND_ADDRESS`, image registry/namespace/tag variables, and the `REQUIRE_GOOGLE_OAUTH` flag. Updated `POSTGRES_PASSWORD` placeholder guidance and made Google OAuth credential placeholders explicit. |
| Docker Compose Production: `docker-compose.prod.images.yml`, `package.json` | Externalized `DATABASE_URL` from hardcoded construction to a required environment variable; added `REQUIRE_GOOGLE_OAUTH` and bind-address port mappings; parameterized volume names; added a `caddy` service profile. Removed the `compose:prod:build` script and updated compose commands to target `docker-compose.prod.images.yml` without `--build`. |
| Ansible Deployment Infrastructure: `ops/ansible/playbooks/deploy.yml`, `ops/ansible/inventory/hosts.yml.example`, `ops/ansible/group_vars/all/...` | Changed the default stack profile from `tailscale` to `production` and database seeding from `true` to `false`. Added conditional Google OAuth validation; updated compose file selection to `.images.yml` variants; added smoke-test URL verification and post-deploy OAuth configuration checks. Added a vault example template and `group_vars` structure. |
| GitHub Actions Deployment: `.github/workflows/deploy.yml` | Added `INVOLUTE_IMAGE_TAG` and `INVOLUTE_REQUIRE_GOOGLE_OAUTH` environment variables with conditional secret validation for OAuth credentials when Google OAuth is required. |
| Server Environment Parsing: `packages/server/src/environment.ts`, `packages/server/src/environment.test.ts` | Added a `requireGoogleOAuth` boolean field to `ServerEnvironment`; implemented parsing of `GOOGLE_OAUTH_*` credentials with validation that all OAuth fields are present when the requirement is enabled. Added test cases for the OAuth validation logic. |
| Production Smoke Testing: `scripts/prod-smoke.sh`, `README.md` | Added a production smoke test script that validates service health, OAuth session configuration, and the Google OAuth redirect; updated the README with a deployment runbook for images-based compose, vault setup instructions, and updated smoke-check commands. |
| Git & Package Configuration: `.gitignore`, `scripts/postgres-backup.sh` | Added patterns to ignore Ansible vault files and `group_vars`; updated the backup script to target `docker-compose.prod.images.yml`. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Add automated deployment tooling #9: Introduces the Ansible deployment and Docker Compose production infrastructure that this PR extends with database URL, bind-address, image configuration, and OAuth requirement handling.
  • Add Google OAuth sessions and team access UI #8: Adds Google OAuth environment validation and parsing logic in packages/server/src/environment.ts, which this PR builds upon with the requireGoogleOAuth flag and conditional validation.
  • Harden auth and add CI merge gates #3: Modifies packages/server/src/environment.ts to add new ServerEnvironment fields, directly related to the OAuth environment parsing changes in this PR.

Poem

🐰 A hop through the ops, with profiles galore,
Database URLs and OAuth's encore,
From Tailscale to production, the stack profile spins,
With smoke tests ascending—let deployment begin!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Harden production deployment configuration' directly summarizes the main objective of the PR: improving the security and robustness of production deployment through standardized compose files, environment configuration, validation checks, and documentation updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request standardizes the production deployment process by introducing Ansible Vault for secret management, implementing automated smoke checks, and refining the Docker Compose configuration. Key changes include the addition of a DATABASE_URL environment variable, a new REQUIRE_GOOGLE_OAUTH flag with associated server-side validation, and a dedicated smoke test script. Feedback focuses on improving Ansible deployment reliability by adding descriptive failure messages to credential assertions and using more robust file existence checks when cleaning up legacy stacks.

Comment on lines +81 to +100
```yaml
- >-
  (
    (involute_stack_profile | default(involute_default_stack_profile)) != 'production'
  ) or (
    involute_postgres_password is defined
    and involute_postgres_password | length > 0
    and (involute_postgres_password is match('^[A-Za-z0-9._~-]+$'))
  )
- >-
  (
    (involute_stack_profile | default(involute_default_stack_profile)) != 'production'
    or not (involute_require_google_oauth | default(true) | bool)
  ) or (
    involute_google_oauth_client_id is defined
    and involute_google_oauth_client_id | length > 0
    and involute_google_oauth_client_secret is defined
    and involute_google_oauth_client_secret | length > 0
    and involute_google_oauth_redirect_uri is defined
    and involute_google_oauth_redirect_uri | length > 0
  )
```

Severity: medium

The validation logic for involute_postgres_password and Google OAuth credentials is a significant improvement for deployment reliability. However, the ansible.builtin.assert task lacks a fail_msg. If the regex check for the password fails (e.g., if it contains characters like # or !), the operator will see a generic 'Assertion failed' error. Adding a descriptive fail_msg would help the operator understand that only URL-safe characters are permitted for the password to ensure DATABASE_URL consistency.
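For illustration, a `fail_msg` along those lines might look like this (the task name and message wording are hypothetical; only the variable names come from the snippet above):

```yaml
- name: Validate production deployment inputs
  ansible.builtin.assert:
    that:
      - involute_postgres_password is match('^[A-Za-z0-9._~-]+$')
    fail_msg: >-
      involute_postgres_password must contain only URL-safe characters
      (letters, digits, '.', '_', '~', '-') so the derived DATABASE_URL stays
      parseable, and the Google OAuth client id/secret/redirect URI must all
      be set when involute_require_google_oauth is true.
```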

Owner Author


Fixed in 046bc2d by adding an explicit fail_msg that calls out the URL-safe Postgres password requirement, required Google OAuth variables, and Tailscale admin prerequisites.

Comment on lines +143 to +159
```yaml
- name: Stop legacy source-build stack if present
  ansible.builtin.shell: >
    docker compose
    --env-file .env
    -f docker-compose.yml
    down --remove-orphans || true
  args:
    chdir: "{{ involute_deploy_path | default(involute_default_deploy_path) }}"

- name: Stop legacy production source-build stack if present
  ansible.builtin.shell: >
    docker compose
    --env-file .env.production
    -f docker-compose.prod.yml
    down --remove-orphans || true
  args:
    chdir: "{{ involute_deploy_path | default(involute_default_deploy_path) }}"
```

Severity: medium

These tasks use shell with || true to clean up legacy stacks. This approach can be noisy in Ansible logs if the files are missing. A more robust and cleaner pattern is to use the ansible.builtin.stat module to check for the existence of docker-compose.yml and docker-compose.prod.yml and then use the when conditional to run the cleanup only when necessary.
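A minimal sketch of that stat-plus-when pattern, assuming the same path variables as the tasks above (task and register names are illustrative):

```yaml
- name: Check whether the legacy compose file exists
  ansible.builtin.stat:
    path: "{{ involute_deploy_path | default(involute_default_deploy_path) }}/docker-compose.yml"
  register: legacy_compose

- name: Stop legacy source-build stack
  ansible.builtin.command:
    cmd: docker compose --env-file .env -f docker-compose.yml down --remove-orphans
    chdir: "{{ involute_deploy_path | default(involute_default_deploy_path) }}"
  when: legacy_compose.stat.exists
```

The `|| true` suffix becomes unnecessary because the task simply does not run when the file is absent.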

Owner Author


Fixed in 046bc2d by adding stat checks for the legacy compose files and running the cleanup tasks conditionally instead of relying on || true.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

297-303: ⚠️ Potential issue | 🟡 Minor

Document the actual GitHub Actions OAuth toggle name.

The workflow reads vars.INVOLUTE_REQUIRE_GOOGLE_OAUTH, but this section refers to REQUIRE_GOOGLE_OAUTH=true. Operators trying to disable the requirement via REQUIRE_GOOGLE_OAUTH=false will still get the workflow default.

Docs patch
```diff
-- `INVOLUTE_GOOGLE_OAUTH_CLIENT_ID`, `INVOLUTE_GOOGLE_OAUTH_CLIENT_SECRET`, `INVOLUTE_GOOGLE_OAUTH_REDIRECT_URI` when `REQUIRE_GOOGLE_OAUTH=true`
+- `INVOLUTE_GOOGLE_OAUTH_CLIENT_ID`, `INVOLUTE_GOOGLE_OAUTH_CLIENT_SECRET`, `INVOLUTE_GOOGLE_OAUTH_REDIRECT_URI` when `INVOLUTE_REQUIRE_GOOGLE_OAUTH=true`
 - optional: `INVOLUTE_ADMIN_EMAIL_ALLOWLIST`, `INVOLUTE_IMAGE_TAG`

 Recommended repository variables:

 - `INVOLUTE_DEPLOY_ON_MAIN=false` to keep deploy manual by default
 - `INVOLUTE_DEPLOY_PROFILE=tailscale` for the current private test phase
+- `INVOLUTE_REQUIRE_GOOGLE_OAUTH=true` for production OAuth enforcement, or `false` only for deployments that intentionally disable Google sign-in
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 297 - 303, The README references REQUIRE_GOOGLE_OAUTH
but the GitHub Actions workflow uses vars.INVOLUTE_REQUIRE_GOOGLE_OAUTH; update
the documentation to mention INVOLUTE_REQUIRE_GOOGLE_OAUTH (and its expected
true/false values) wherever REQUIRE_GOOGLE_OAUTH appears (including the list of
required env vars and any examples), so the documented toggle matches the
workflow variable name and operators can correctly override the default.
🧹 Nitpick comments (5)
ops/ansible/group_vars/all/.gitkeep (1)

1-1: LGTM! The .gitkeep file correctly preserves the directory.

The file serves its purpose of keeping the ops/ansible/group_vars/all/ directory in Git. While a single blank line is functional, you might optionally consider either leaving it completely empty or adding a brief comment explaining its purpose (e.g., # Keep this directory for Ansible group vars), but the current approach works fine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ops/ansible/group_vars/all/.gitkeep` at line 1, The .gitkeep file is correct
and no change is required; keep the existing file as-is to preserve the
directory, or optionally replace the single blank line with a short explanatory
comment in the .gitkeep to document its purpose (e.g., indicating it preserves
the directory for Ansible group vars).
scripts/prod-smoke.sh (1)

15-30: Temp file can leak on failure.

SESSION_RESPONSE is created at line 15, but with set -eu any failure between the mktemp and line 30 (e.g. curl connection error) will abort the script without running the later rm -f. Register a trap so the file is always removed.

🧹 Suggested cleanup
```diff
 SESSION_RESPONSE="$(mktemp)"
+trap 'rm -f "$SESSION_RESPONSE"' EXIT
 SESSION_STATUS="$(
```

Then drop the explicit rm -f "$SESSION_RESPONSE" calls on lines 25 and 30.
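As a self-contained illustration of that trap pattern (the file name and contents here are hypothetical, not taken from prod-smoke.sh):

```shell
# Create a temp file and guarantee cleanup on every exit path,
# including early aborts under `set -eu`.
set -eu
TMP_RESPONSE="$(mktemp)"
trap 'rm -f "$TMP_RESPONSE"' EXIT

printf 'session-body' > "$TMP_RESPONSE"
CONTENT="$(cat "$TMP_RESPONSE")"
echo "$CONTENT"
```

The EXIT trap fires whether the script finishes normally or a command fails partway, so no explicit `rm -f` calls are needed afterward.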

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/prod-smoke.sh` around lines 15 - 30, The temporary file created in
SESSION_RESPONSE via mktemp can leak on errors; after creating SESSION_RESPONSE
call mktemp, immediately register a cleanup trap (e.g. trap 'rm -f
"$SESSION_RESPONSE"' EXIT) so the file is removed on script exit, and then
remove the later explicit rm -f "$SESSION_RESPONSE" calls; ensure the trap is
set while SESSION_RESPONSE is defined and before running curl (referencing
SESSION_RESPONSE, mktemp, and the curl/SESSION_STATUS block).
ops/ansible/templates/env.production.j2 (2)

7-7: DATABASE_URL relies on password being URL-safe.

The default embeds involute_postgres_password verbatim without percent-encoding. This is currently safe because deploy.yml enforces ^[A-Za-z0-9._~-]+$ for the production profile, but the coupling is implicit. Consider either:

  • Documenting in a template comment that involute_postgres_password must be URL-safe when involute_database_url is not overridden; or
  • Applying Jinja's urlencode filter to the password so the default is robust even if the regex constraint is ever relaxed.
♻️ Suggested robustness tweak
```diff
-DATABASE_URL={{ involute_database_url | default('postgresql://' ~ (involute_postgres_user | default('involute')) ~ ':' ~ involute_postgres_password ~ '@db:5432/' ~ (involute_postgres_db | default('involute')) ~ '?schema=public') }}
+DATABASE_URL={{ involute_database_url | default('postgresql://' ~ (involute_postgres_user | default('involute') | urlencode) ~ ':' ~ (involute_postgres_password | urlencode) ~ '@db:5432/' ~ (involute_postgres_db | default('involute')) ~ '?schema=public') }}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ops/ansible/templates/env.production.j2` at line 7, DATABASE_URL default
currently embeds involute_postgres_password verbatim which assumes the password
is URL-safe; update the template so the password is percent-encoded to be
robust: change the default expression that builds DATABASE_URL (the one using
involute_database_url | default('postgresql://' ~ (involute_postgres_user |
default('involute')) ~ ':' ~ involute_postgres_password ~ '@db:5432/' ~
(involute_postgres_db | default('involute')) ~ '?schema=public')) to apply
Jinja's urlencode filter to involute_postgres_password (e.g.,
involute_postgres_password|urlencode) so the generated URL safely encodes
special characters, or alternatively add a clear template comment above
DATABASE_URL stating that involute_postgres_password must be URL-safe when not
overriding involute_database_url.

11-13: Missing volume-name variables emitted by template.

docker-compose.prod.images.yml consumes POSTGRES_VOLUME_NAME, CADDY_DATA_VOLUME_NAME, and CADDY_CONFIG_VOLUME_NAME (lines 129-133), but this template does not emit them. Ansible-rendered .env.production will therefore always fall back to compose defaults, whereas operators editing .env.production.example can override them — the two paths diverge. Either emit these with matching defaults here, or drop them from .env.production.example to keep parity.

♻️ Suggested addition
```diff
 INVOLUTE_IMAGE_TAG={{ involute_image_tag | default('latest') }}
+POSTGRES_VOLUME_NAME={{ involute_postgres_volume_name | default('involute_postgres-prod-data') }}
+CADDY_DATA_VOLUME_NAME={{ involute_caddy_data_volume_name | default('involute_caddy-data') }}
+CADDY_CONFIG_VOLUME_NAME={{ involute_caddy_config_volume_name | default('involute_caddy-config') }}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ops/ansible/templates/env.production.j2` around lines 11 - 13, The template
is missing the volume name variables used by docker-compose.prod.images.yml: add
POSTGRES_VOLUME_NAME, CADDY_DATA_VOLUME_NAME, and CADDY_CONFIG_VOLUME_NAME to
the env.production.j2 template with sensible defaults (e.g., match the defaults
used in .env.production.example or compose defaults) so rendered .env.production
contains the same keys; locate where INVOLUTE_IMAGE_REGISTRY /
INVOLUTE_IMAGE_NAMESPACE / INVOLUTE_IMAGE_TAG are defined and add analogous
lines emitting POSTGRES_VOLUME_NAME={{ postgres_volume_name | default('...') }},
CADDY_DATA_VOLUME_NAME={{ caddy_data_volume_name | default('...') }}, and
CADDY_CONFIG_VOLUME_NAME={{ caddy_config_volume_name | default('...') }} to
restore parity.
.env.production.example (1)

6-8: Keep POSTGRES_PASSWORD and DATABASE_URL in sync.

The password placeholder appears in two places, so operators must remember to update both or the DB connection will silently fail. Consider a short inline comment to call this out.

✏️ Suggested comment
```diff
 POSTGRES_PASSWORD=replace-with-a-long-url-safe-random-password
+# Keep password in sync with POSTGRES_PASSWORD above; characters must be URL-safe.
 DATABASE_URL=postgresql://involute:replace-with-a-long-url-safe-random-password@db:5432/involute?schema=public
```
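One way to make the sync mechanical is to derive the URL from a single password variable at deploy time; a hedged shell sketch (the values are the example placeholders, and the helper itself is not part of the PR):

```shell
set -eu
POSTGRES_USER="involute"
POSTGRES_PASSWORD="replace-with-a-long-url-safe-random-password"
POSTGRES_DB="involute"
# The URL is derived, never hand-edited, so the two values cannot drift.
DATABASE_URL="postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}?schema=public"
echo "$DATABASE_URL"
```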
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.env.production.example around lines 6 - 8, The POSTGRES_PASSWORD and
DATABASE_URL placeholders are duplicated and can fall out of sync; update
.env.production.example so operators don't have to edit two places—either change
DATABASE_URL to reference the POSTGRES_PASSWORD variable (so the password is
sourced from a single env var) or add a short inline comment next to
POSTGRES_PASSWORD and/or DATABASE_URL (mentioning POSTGRES_PASSWORD and
DATABASE_URL by name) explicitly telling maintainers to keep them in sync if
they manually replace the placeholder.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/deploy.yml:
- Line 43: The workflow currently falls back to a mutable default by setting
INVOLUTE_IMAGE_TAG: ${{ vars.INVOLUTE_IMAGE_TAG || 'latest' }}, so update the
deploy job to disallow using 'latest' for production: remove the 'latest'
fallback and add a validation step (or job-level if check) that fails when the
environment is production and INVOLUTE_IMAGE_TAG is empty or equals 'latest';
alternatively set INVOLUTE_IMAGE_TAG from the image-publish job output (e.g., a
sha-<short-sha> tag) before running the deploy job. Ensure the check references
the INVOLUTE_IMAGE_TAG variable and the production environment name so deploys
cannot proceed with a mutable tag.

In `@ops/ansible/inventory/hosts.yml.example`:
- Around line 3-13: The example inventory's production profile is inconsistent:
remove or replace the Tailscale CGNAT values so templates don't set
SERVER_BIND_ADDRESS/WEB_BIND_ADDRESS to 100.x.y.z; either delete
involute_bind_address from the involute_production block (letting
env.production.j2 fall back to 127.0.0.1) or add explicit keys
involute_server_bind_address and involute_web_bind_address with
loopback/realistic production values, and change ansible_host from
tailnet-host.example.ts.net to a realistic host name or IP; update the
involute_production block accordingly so env.production.j2 and
.env.production.example are consistent.

In `@ops/ansible/playbooks/deploy.yml`:
- Around line 218-236: The two unconditional smoke tasks ("Verify auth session
reports Google OAuth configured" and "Verify Google OAuth start redirects to
Google") must be gated by the deployment flag so deployments with
involute_require_google_oauth=false don't fail; update both tasks to include a
when condition such as using the involute_require_google_oauth variable (e.g.
when: involute_require_google_oauth | default(true) | bool) so they run only
when OAuth is required, leaving the existing register names
involute_auth_session_smoke and involute_google_start_smoke and their
failed_when logic unchanged.

In `@package.json`:
- Line 12: The production smoke script invoked by the "smoke:prod" npm task
calls scripts/prod-smoke.sh which currently always asserts googleOAuthConfigured
and the /auth/google/start redirect; change scripts/prod-smoke.sh to gate those
OAuth-specific assertions behind an environment variable (e.g.,
REQUIRE_GOOGLE_OAUTH="${REQUIRE_GOOGLE_OAUTH:-true}") so the script always
performs generic checks (/health and /auth/session shape/status) but only
verifies googleOAuthConfigured=true and the /auth/google/start redirect if
REQUIRE_GOOGLE_OAUTH="true"; update deployment invocation to pass the rendered
REQUIRE_GOOGLE_OAUTH value (or rely on the default) so no-OAuth deployments can
pass the smoke test.

In `@README.md`:
- Around line 279-281: The README is missing the step to create the
ANSIBLE_VAULT_PASSWORD_FILE used by the deploy command; add an explicit
instruction to create ops/ansible/vault-password.txt containing the same vault
password (or a secure symlink to your password manager output) and set strict
permissions (e.g., chmod 600) before running
ANSIBLE_VAULT_PASSWORD_FILE=ops/ansible/vault-password.txt pnpm deploy:prod so
the deploy can read the vault password; reference the vault file creation step
(cp ops/ansible/group_vars/all/vault.yml.example ...) and the
ANSIBLE_VAULT_PASSWORD_FILE environment variable in the same block.

---

Outside diff comments:
In `@README.md`:
- Around line 297-303: The README references REQUIRE_GOOGLE_OAUTH but the GitHub
Actions workflow uses vars.INVOLUTE_REQUIRE_GOOGLE_OAUTH; update the
documentation to mention INVOLUTE_REQUIRE_GOOGLE_OAUTH (and its expected
true/false values) wherever REQUIRE_GOOGLE_OAUTH appears (including the list of
required env vars and any examples), so the documented toggle matches the
workflow variable name and operators can correctly override the default.

---

Nitpick comments:
In @.env.production.example:
- Around line 6-8: The POSTGRES_PASSWORD and DATABASE_URL placeholders are
duplicated and can fall out of sync; update .env.production.example so operators
don't have to edit two places—either change DATABASE_URL to reference the
POSTGRES_PASSWORD variable (so the password is sourced from a single env var) or
add a short inline comment next to POSTGRES_PASSWORD and/or DATABASE_URL
(mentioning POSTGRES_PASSWORD and DATABASE_URL by name) explicitly telling
maintainers to keep them in sync if they manually replace the placeholder.

In `@ops/ansible/group_vars/all/.gitkeep`:
- Line 1: The .gitkeep file is correct and no change is required; keep the
existing file as-is to preserve the directory, or optionally replace the single
blank line with a short explanatory comment in the .gitkeep to document its
purpose (e.g., indicating it preserves the directory for Ansible group vars).

In `@ops/ansible/templates/env.production.j2`:
- Line 7: DATABASE_URL default currently embeds involute_postgres_password
verbatim which assumes the password is URL-safe; update the template so the
password is percent-encoded to be robust: change the default expression that
builds DATABASE_URL (the one using involute_database_url |
default('postgresql://' ~ (involute_postgres_user | default('involute')) ~ ':' ~
involute_postgres_password ~ '@db:5432/' ~ (involute_postgres_db |
default('involute')) ~ '?schema=public')) to apply Jinja's urlencode filter to
involute_postgres_password (e.g., involute_postgres_password|urlencode) so the
generated URL safely encodes special characters, or alternatively add a clear
template comment above DATABASE_URL stating that involute_postgres_password must
be URL-safe when not overriding involute_database_url.
- Around line 11-13: The template is missing the volume name variables used by
docker-compose.prod.images.yml: add POSTGRES_VOLUME_NAME,
CADDY_DATA_VOLUME_NAME, and CADDY_CONFIG_VOLUME_NAME to the env.production.j2
template with sensible defaults (e.g., match the defaults used in
.env.production.example or compose defaults) so rendered .env.production
contains the same keys; locate where INVOLUTE_IMAGE_REGISTRY /
INVOLUTE_IMAGE_NAMESPACE / INVOLUTE_IMAGE_TAG are defined and add analogous
lines emitting POSTGRES_VOLUME_NAME={{ postgres_volume_name | default('...') }},
CADDY_DATA_VOLUME_NAME={{ caddy_data_volume_name | default('...') }}, and
CADDY_CONFIG_VOLUME_NAME={{ caddy_config_volume_name | default('...') }} to
restore parity.

In `@scripts/prod-smoke.sh`:
- Around line 15-30: The temporary file created in SESSION_RESPONSE via mktemp
can leak on errors; after creating SESSION_RESPONSE call mktemp, immediately
register a cleanup trap (e.g. trap 'rm -f "$SESSION_RESPONSE"' EXIT) so the file
is removed on script exit, and then remove the later explicit rm -f
"$SESSION_RESPONSE" calls; ensure the trap is set while SESSION_RESPONSE is
defined and before running curl (referencing SESSION_RESPONSE, mktemp, and the
curl/SESSION_STATUS block).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fb49f198-f2ed-482a-8de0-67f3943ffcd7

📥 Commits

Reviewing files that changed from the base of the PR and between 30942c5 and 47b3999.

📒 Files selected for processing (15)
  • .env.production.example
  • .github/workflows/deploy.yml
  • .gitignore
  • README.md
  • docker-compose.prod.images.yml
  • ops/ansible/group_vars/all/.gitkeep
  • ops/ansible/group_vars/all/vault.yml.example
  • ops/ansible/inventory/hosts.yml.example
  • ops/ansible/playbooks/deploy.yml
  • ops/ansible/templates/env.production.j2
  • package.json
  • packages/server/src/environment.test.ts
  • packages/server/src/environment.ts
  • scripts/postgres-backup.sh
  • scripts/prod-smoke.sh

```yaml
INVOLUTE_GOOGLE_OAUTH_CLIENT_ID: ${{ secrets.INVOLUTE_GOOGLE_OAUTH_CLIENT_ID }}
INVOLUTE_GOOGLE_OAUTH_CLIENT_SECRET: ${{ secrets.INVOLUTE_GOOGLE_OAUTH_CLIENT_SECRET }}
INVOLUTE_GOOGLE_OAUTH_REDIRECT_URI: ${{ secrets.INVOLUTE_GOOGLE_OAUTH_REDIRECT_URI }}
INVOLUTE_IMAGE_TAG: ${{ vars.INVOLUTE_IMAGE_TAG || 'latest' }}
```

⚠️ Potential issue | 🟠 Major

Avoid defaulting production deploys to mutable latest.

Line 43 makes the deployed image depend on whatever latest points to at deploy time, which can race Docker publishing or redeploy an unintended build. Prefer requiring INVOLUTE_IMAGE_TAG for production, or deploy an immutable tag such as the published sha-<short-sha> tag.

Suggested direction
```diff
-      INVOLUTE_IMAGE_TAG: ${{ vars.INVOLUTE_IMAGE_TAG || 'latest' }}
+      INVOLUTE_IMAGE_TAG: ${{ vars.INVOLUTE_IMAGE_TAG }}
```

Then fail validation for production when it is unset, or set it from a workflow that runs after the image publish completes.
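A hedged sketch of such a validation step in the workflow (the step name and exact condition are illustrative, not from the PR):

```yaml
- name: Refuse mutable image tags for production
  if: ${{ env.INVOLUTE_IMAGE_TAG == '' || env.INVOLUTE_IMAGE_TAG == 'latest' }}
  run: |
    echo "INVOLUTE_IMAGE_TAG must be an immutable tag (e.g. a sha-<short-sha> tag) for production" >&2
    exit 1
```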

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/deploy.yml at line 43, The workflow currently falls back
to a mutable default by setting INVOLUTE_IMAGE_TAG: ${{ vars.INVOLUTE_IMAGE_TAG
|| 'latest' }}, so update the deploy job to disallow using 'latest' for
production: remove the 'latest' fallback and add a validation step (or job-level
if check) that fails when the environment is production and INVOLUTE_IMAGE_TAG
is empty or equals 'latest'; alternatively set INVOLUTE_IMAGE_TAG from the
image-publish job output (e.g., a sha-<short-sha> tag) before running the deploy
job. Ensure the check references the INVOLUTE_IMAGE_TAG variable and the
production environment name so deploys cannot proceed with a mutable tag.

Comment on lines +3 to +13
```diff
 involute_production:
   ansible_host: tailnet-host.example.ts.net
   ansible_user: root
+  involute_stack_profile: production
+  involute_bind_address: 100.x.y.z
-  involute_stack_profile: tailscale
-  involute_app_origin: http://100.x.y.z:4201
-  involute_seed_database: true
   involute_admin_email_allowlist: "first-admin@example.com"
   involute_auth_token: change-me
   involute_viewer_assertion_secret: change-me
+  involute_app_domain: involute.example.com
+  involute_app_origin: https://involute.example.com
+  involute_smoke_base_url: https://involute.example.com
+  involute_seed_database: false
+  involute_require_google_oauth: true
+  # Put real secrets in ops/ansible/group_vars/all/vault.yml.
```

⚠️ Potential issue | 🟡 Minor

Example still carries Tailscale-style bind address under the production profile.

ansible_host: tailnet-host.example.ts.net and involute_bind_address: 100.x.y.z are CGNAT/Tailscale values, but the profile is now production with https://involute.example.com as origin. Via the template fallback chain in env.production.j2, this will set both SERVER_BIND_ADDRESS and WEB_BIND_ADDRESS to 100.x.y.z, which conflicts with the 127.0.0.1 defaults shown in .env.production.example and with a typical production layout (Caddy fronting loopback-bound services).

Either drop involute_bind_address here (so the template falls through to 127.0.0.1) or set involute_server_bind_address/involute_web_bind_address explicitly and use a realistic ansible_host, so the example isn't self-contradictory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ops/ansible/inventory/hosts.yml.example` around lines 3 - 13, The example
inventory's production profile is inconsistent: remove or replace the Tailscale
CGNAT values so templates don't set SERVER_BIND_ADDRESS/WEB_BIND_ADDRESS to
100.x.y.z; either delete involute_bind_address from the involute_production
block (letting env.production.j2 fall back to 127.0.0.1) or add explicit keys
involute_server_bind_address and involute_web_bind_address with
loopback/realistic production values, and change ansible_host from
tailnet-host.example.ts.net to a realistic host name or IP; update the
involute_production block accordingly so env.production.j2 and
.env.production.example are consistent.
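Following that prompt, a self-consistent production example might look like the block below. This is a sketch under the prompt's assumptions: the host name and the explicit `involute_server_bind_address`/`involute_web_bind_address` keys are illustrative, chosen so the rendered env file matches the `127.0.0.1` defaults in `.env.production.example`.

```yaml
involute_production:
  ansible_host: prod-host.example.com
  ansible_user: root
  involute_stack_profile: production
  # Bind services to loopback; Caddy fronts them publicly.
  involute_server_bind_address: 127.0.0.1
  involute_web_bind_address: 127.0.0.1
  involute_app_domain: involute.example.com
  involute_app_origin: https://involute.example.com
  involute_smoke_base_url: https://involute.example.com
  involute_seed_database: false
  involute_require_google_oauth: true
```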

Comment on lines +218 to +236
- name: Verify auth session reports Google OAuth configured
ansible.builtin.uri:
url: "{{ involute_smoke_base_url }}/auth/session"
method: GET
return_content: true
status_code:
- 200
- 401
register: involute_auth_session_smoke
failed_when: not (involute_auth_session_smoke.json.googleOAuthConfigured | default(false) | bool)

- name: Verify Google OAuth start redirects to Google
ansible.builtin.uri:
url: "{{ involute_smoke_base_url }}/auth/google/start"
method: GET
follow_redirects: none
status_code: 302
register: involute_google_start_smoke
failed_when: "'accounts.google.com' not in (involute_google_start_smoke.location | default(''))"

⚠️ Potential issue | 🟠 Major

Smoke tasks contradict the require_google_oauth=false escape hatch.

The validation block at lines 89-100 explicitly permits production deployments with involute_require_google_oauth=false, and docker-compose.prod.images.yml (line 62) + environment.ts honor that flag. However, both smoke tasks here are unconditional and assert googleOAuthConfigured=true plus a redirect to accounts.google.com, so any deployment that intentionally disables OAuth will fail the playbook at this step.

Gate both tasks on the same flag, e.g.:

🛠️ Suggested fix
     - name: Verify auth session reports Google OAuth configured
       ansible.builtin.uri:
         url: "{{ involute_smoke_base_url }}/auth/session"
         method: GET
         return_content: true
         status_code:
           - 200
           - 401
       register: involute_auth_session_smoke
       failed_when: not (involute_auth_session_smoke.json.googleOAuthConfigured | default(false) | bool)
+      when: involute_require_google_oauth | default(true) | bool

     - name: Verify Google OAuth start redirects to Google
       ansible.builtin.uri:
         url: "{{ involute_smoke_base_url }}/auth/google/start"
         method: GET
         follow_redirects: none
         status_code: 302
       register: involute_google_start_smoke
       failed_when: "'accounts.google.com' not in (involute_google_start_smoke.location | default(''))"
+      when: involute_require_google_oauth | default(true) | bool

Note the same logic is effectively baked into scripts/prod-smoke.sh, which is fine for its intended default use against production where OAuth is required, but consider documenting that limitation alongside the script if the flag gains broader use.

🧰 Tools
🪛 Checkov (3.2.519)

[medium] 218-230: Ensure that HTTPS url is used with uri

(CKV2_ANSIBLE_1)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ops/ansible/playbooks/deploy.yml` around lines 218 - 236, The two
unconditional smoke tasks ("Verify auth session reports Google OAuth configured"
and "Verify Google OAuth start redirects to Google") must be gated by the
deployment flag so deployments with involute_require_google_oauth=false don't
fail; update both tasks to include a when condition such as using the
involute_require_google_oauth variable (e.g. when: involute_require_google_oauth
| default(true) | bool) so they run only when OAuth is required, leaving the
existing register names involute_auth_session_smoke and
involute_google_start_smoke and their failed_when logic unchanged.

Comment thread package.json
"deploy:bootstrap": "sh scripts/ansible-playbook.sh ops/ansible/playbooks/bootstrap-host.yml",
"deploy:prod": "INVOLUTE_STACK_PROFILE=production sh scripts/ansible-playbook.sh ops/ansible/playbooks/deploy.yml",
"deploy:tailscale": "INVOLUTE_STACK_PROFILE=tailscale sh scripts/ansible-playbook.sh ops/ansible/playbooks/deploy.yml",
"smoke:prod": "scripts/prod-smoke.sh",

⚠️ Potential issue | 🟠 Major

Make production smoke respect disabled Google OAuth.

smoke:prod currently runs scripts/prod-smoke.sh, and that script always requires googleOAuthConfigured=true plus a Google redirect. That conflicts with the new INVOLUTE_REQUIRE_GOOGLE_OAUTH=false / REQUIRE_GOOGLE_OAUTH=false path, so a valid no-OAuth deployment will still fail smoke.

Suggested direction

Gate the OAuth-specific checks in scripts/prod-smoke.sh on an env var, for example:

REQUIRE_GOOGLE_OAUTH="${REQUIRE_GOOGLE_OAUTH:-true}"

# Always check /health and /auth/session shape/status.
# Only require googleOAuthConfigured=true and /auth/google/start redirect when:
if [ "$REQUIRE_GOOGLE_OAUTH" = "true" ]; then
  # existing googleOAuthConfigured and redirect assertions
fi

Then call it with the rendered production flag during Ansible/GitHub deploy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package.json` at line 12, The production smoke script invoked by the
"smoke:prod" npm task calls scripts/prod-smoke.sh which currently always asserts
googleOAuthConfigured and the /auth/google/start redirect; change
scripts/prod-smoke.sh to gate those OAuth-specific assertions behind an
environment variable (e.g.,
REQUIRE_GOOGLE_OAUTH="${REQUIRE_GOOGLE_OAUTH:-true}") so the script always
performs generic checks (/health and /auth/session shape/status) but only
verifies googleOAuthConfigured=true and the /auth/google/start redirect if
REQUIRE_GOOGLE_OAUTH="true"; update deployment invocation to pass the rendered
REQUIRE_GOOGLE_OAUTH value (or rely on the default) so no-OAuth deployments can
pass the smoke test.
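A minimal sketch of that gating in POSIX sh, assuming the `REQUIRE_GOOGLE_OAUTH` variable name from the suggestion above; the HTTP assertions themselves are stubbed out as comments, since only the gating logic is at issue here.

```shell
#!/bin/sh
# Sketch of gating OAuth-specific assertions in prod-smoke.sh.
# REQUIRE_GOOGLE_OAUTH defaults to true, matching the current behavior.
REQUIRE_GOOGLE_OAUTH="${REQUIRE_GOOGLE_OAUTH:-true}"

oauth_checks_enabled() {
  [ "$REQUIRE_GOOGLE_OAUTH" = "true" ]
}

# Generic checks (/health status, /auth/session shape) would always run here.

if oauth_checks_enabled; then
  # googleOAuthConfigured=true and /auth/google/start redirect assertions
  # would go here in the real script.
  echo "oauth checks: enabled"
else
  echo "oauth checks: skipped"
fi
```

The deploy pipeline would then export the rendered flag (e.g. `REQUIRE_GOOGLE_OAUTH=false scripts/prod-smoke.sh`) so a deliberate no-OAuth deployment can still pass smoke.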

Comment thread README.md
Comment on lines +279 to +281
cp ops/ansible/group_vars/all/vault.yml.example ops/ansible/group_vars/all/vault.yml
ansible-vault encrypt ops/ansible/group_vars/all/vault.yml
ANSIBLE_VAULT_PASSWORD_FILE=ops/ansible/vault-password.txt pnpm deploy:prod

⚠️ Potential issue | 🟡 Minor

Include the vault password-file creation step.

This flow encrypts vault.yml, then deploys with ANSIBLE_VAULT_PASSWORD_FILE=ops/ansible/vault-password.txt, but never tells the operator to create that file with the same vault password. The deploy command will fail if the file is missing or contains a different password.

Docs patch
 cp ops/ansible/group_vars/all/vault.yml.example ops/ansible/group_vars/all/vault.yml
-ansible-vault encrypt ops/ansible/group_vars/all/vault.yml
+printf '%s\n' '<choose-a-vault-password>' > ops/ansible/vault-password.txt
+chmod 600 ops/ansible/vault-password.txt
+ANSIBLE_VAULT_PASSWORD_FILE=ops/ansible/vault-password.txt \
+  ansible-vault encrypt ops/ansible/group_vars/all/vault.yml
 ANSIBLE_VAULT_PASSWORD_FILE=ops/ansible/vault-password.txt pnpm deploy:prod
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-cp ops/ansible/group_vars/all/vault.yml.example ops/ansible/group_vars/all/vault.yml
-ansible-vault encrypt ops/ansible/group_vars/all/vault.yml
-ANSIBLE_VAULT_PASSWORD_FILE=ops/ansible/vault-password.txt pnpm deploy:prod
+cp ops/ansible/group_vars/all/vault.yml.example ops/ansible/group_vars/all/vault.yml
+printf '%s\n' '<choose-a-vault-password>' > ops/ansible/vault-password.txt
+chmod 600 ops/ansible/vault-password.txt
+ANSIBLE_VAULT_PASSWORD_FILE=ops/ansible/vault-password.txt \
+  ansible-vault encrypt ops/ansible/group_vars/all/vault.yml
+ANSIBLE_VAULT_PASSWORD_FILE=ops/ansible/vault-password.txt pnpm deploy:prod
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 279 - 281, The README is missing the step to create
the ANSIBLE_VAULT_PASSWORD_FILE used by the deploy command; add an explicit
instruction to create ops/ansible/vault-password.txt containing the same vault
password (or a secure symlink to your password manager output) and set strict
permissions (e.g., chmod 600) before running
ANSIBLE_VAULT_PASSWORD_FILE=ops/ansible/vault-password.txt pnpm deploy:prod so
the deploy can read the vault password; reference the vault file creation step
(cp ops/ansible/group_vars/all/vault.yml.example ...) and the
ANSIBLE_VAULT_PASSWORD_FILE environment variable in the same block.


@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.


Comment on lines +237 to +255
- name: Verify auth session reports Google OAuth configured
  ansible.builtin.uri:
    url: "{{ involute_smoke_base_url }}/auth/session"
    method: GET
    return_content: true
    status_code:
      - 200
      - 401
  register: involute_auth_session_smoke
  failed_when: not (involute_auth_session_smoke.json.googleOAuthConfigured | default(false) | bool)

- name: Verify Google OAuth start redirects to Google
  ansible.builtin.uri:
    url: "{{ involute_smoke_base_url }}/auth/google/start"
    method: GET
    follow_redirects: none
    status_code: 302
  register: involute_google_start_smoke
  failed_when: "'accounts.google.com' not in (involute_google_start_smoke.location | default(''))"

🔴 Google OAuth smoke tests run unconditionally, breaking tailscale deployments

The two new smoke-test tasks ("Verify auth session reports Google OAuth configured" and "Verify Google OAuth start redirects to Google") have no when guard, so they execute for every deployment profile including tailscale. Tailscale deployments intentionally leave Google OAuth unconfigured (env.tailscale.j2 sets all three OAuth env vars to empty strings), which means:

  1. /auth/session returns googleOAuthConfigured: false → failed_when at line 246 triggers.
  2. /auth/google/start returns HTTP 503 (see packages/server/src/auth-routes.ts:96-101) → status_code: 302 at line 253 fails the task before failed_when is even evaluated.

Every other OAuth-conditional check in this PR is correctly gated: the Ansible variable validation (ops/ansible/playbooks/deploy.yml:89-100), the GitHub Actions secret check (.github/workflows/deploy.yml:73-80), and the server startup guard (packages/server/src/environment.ts:57-61). These two tasks are the only ones missing the condition.

Prompt for agents
The two Google OAuth smoke-test tasks at lines 237-255 of ops/ansible/playbooks/deploy.yml need a `when` condition to only run when Google OAuth is expected. They should mirror the same condition used in the variable validation block at lines 89-100.

Add the following to both tasks:

  when: >
    (involute_stack_profile | default(involute_default_stack_profile)) == 'production'
    and (involute_require_google_oauth | default(true) | bool)

This ensures the smoke tests only run for production deployments with involute_require_google_oauth enabled, which is consistent with the tailscale env template (env.tailscale.j2) leaving Google OAuth env vars empty, and with the GitHub Actions validation (.github/workflows/deploy.yml:66-80) which gates OAuth checks on DEPLOY_PROFILE=production and INVOLUTE_REQUIRE_GOOGLE_OAUTH=true.

