infra PR3: add Terraform env stacks, CI validation, and runbook docs #18

Merged: hudsonaikins merged 13 commits into main from codex/infra-pr3-terraform-envs-ci-docs on Feb 26, 2026

@hudsonaikins hudsonaikins commented Feb 26, 2026

Summary

  • add Terraform environment stacks:
    • infra/terraform/environments/dev
    • infra/terraform/environments/prod
  • add Terraform CI workflow to run fmt/init/validate on root, modules, and environments
  • add Terraform operations runbook in docs (workflows/terraform-runbook)
  • update architecture/infrastructure/promotion docs and docs navigation to include IaC operations guidance

Notes

  • This PR is intentionally stacked on codex/infra-pr2-terraform-gcp-modules.

Validation

  • terraform -chdir=infra/terraform init -backend=false
  • terraform -chdir=infra/terraform fmt -check -recursive
  • terraform -chdir=infra/terraform validate
  • terraform -chdir=infra/terraform/environments/dev init -backend=false && terraform -chdir=infra/terraform/environments/dev validate
  • terraform -chdir=infra/terraform/environments/prod init -backend=false && terraform -chdir=infra/terraform/environments/prod validate
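
The validation commands above could be wrapped in one helper for local runs. This is only a sketch: the function name `validate_all` and the `TERRAFORM` override variable are illustrative assumptions and are not part of the PR.

```shell
#!/usr/bin/env bash
# Sketch: run the PR's validation commands for the root stack and both
# environment stacks. TERRAFORM is a hypothetical override for the binary,
# which makes a dry run possible (for example TERRAFORM=echo).
validate_all() {
  local tf="${TERRAFORM:-terraform}" root="infra/terraform" dir
  # Formatting is checked once, recursively, from the root.
  "$tf" -chdir="$root" fmt -check -recursive || return 1
  for dir in "$root" "$root/environments/dev" "$root/environments/prod"; do
    # -backend=false skips remote state setup, so no credentials are needed.
    "$tf" -chdir="$dir" init -backend=false || return 1
    "$tf" -chdir="$dir" validate || return 1
  done
}
```

Calling `TERRAFORM=echo validate_all` prints the commands instead of running them, which is a cheap way to review the sequence.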

Greptile Summary

This PR establishes the foundational Infrastructure as Code (IaC) layer for Neural by adding complete Terraform environment stacks for dev and prod, CI validation, and operational documentation.

Major additions:

  • Environment stacks in infra/terraform/environments/{dev,prod} that compose network, runner VM, secrets, and observability modules
  • GitHub Actions workflow validating Terraform fmt/init/validate across root, modules, and environments on every PR
  • Comprehensive deployment runbook (workflows/terraform-runbook) covering bootstrap, state management, secret injection, and troubleshooting
  • Documentation updates linking the new Terraform operations guide throughout architecture, infrastructure, and promotion docs

Environment configuration:

  • Dev: e2-standard-2 machine, 10.30.0.0/24 subnet, alerting disabled by default
  • Prod: e2-standard-4 machine, 10.40.0.0/24 subnet, alerting enabled, additional neural-runtime-env secret
  • Both environments use GCS remote state backends and IAP-based SSH access (35.235.240.0/20)

Note: The service account collision issue mentioned in previous comments remains: both environments use the default neural-runner service account ID. If both are deployed to the same GCP project, override service_account_id in one environment to avoid conflicts.

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations for multi-environment deployment
  • The infrastructure code is well-structured, follows Terraform best practices, and includes comprehensive validation through CI. The main deduction is for the service account naming collision risk when deploying both dev and prod to the same GCP project (already noted in previous comments). All other previously identified issues have been addressed through updates to the PR.
  • The environment main.tf files need attention if dev and prod are deployed to the same GCP project: consider passing explicit service_account_id overrides to avoid service account name collisions

Important Files Changed

| Filename | Overview |
|---|---|
| .github/workflows/terraform.yml | added CI workflow to validate Terraform formatting, init, and validate for root, modules, and environments |
| infra/terraform/environments/dev/main.tf | created dev environment stack with network, runner, secrets, and observability modules; uses GCS backend |
| infra/terraform/environments/prod/main.tf | created prod environment stack mirroring dev with env-prod tags; service account collision risk if deployed to same GCP project |
| infra/terraform/environments/dev/variables.tf | defined dev variables with e2-standard-2 machine, 10.30.0.0/24 subnet, IAP SSH access, and alerting disabled by default |
| infra/terraform/environments/prod/variables.tf | defined prod variables with e2-standard-4 machine, 10.40.0.0/24 subnet, additional neural-runtime-env secret, and alerting enabled by default |
| docs/workflows/terraform-runbook.mdx | added comprehensive Terraform operations guide covering bootstrap, deployment, secret injection, and troubleshooting |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Environment Stack<br/>dev/prod] --> B[Network Module]
    A --> C[Runner VM Module]
    A --> D[Secrets Module]
    A --> E[Observability Module]
    
    B --> F[VPC & Subnet<br/>10.30.0.0/24 dev<br/>10.40.0.0/24 prod]
    B --> G[Firewall Rules<br/>IAP SSH Access]
    
    C --> H[Compute Engine VM<br/>e2-standard-2 dev<br/>e2-standard-4 prod]
    C --> I[Service Account<br/>neural-runner]
    C --> J[Startup Script<br/>Docker Bootstrap]
    
    D --> K[Secret Manager<br/>kalshi-api-key-id<br/>kalshi-private-key-pem]
    D --> L[IAM Binding<br/>secretAccessor role]
    
    E --> M[Log Metrics<br/>Error Alerting]
    E --> N[Alert Policy<br/>enabled in prod]
    
    I -.grants access.-> L
    
    style A fill:#e1f5ff
    style I fill:#fff3cd
    style K fill:#d4edda

Last reviewed commit: 6e56b85

@greptile-apps greptile-apps Bot left a comment

19 files reviewed, 8 comments

Comment on lines +44 to +50
module "secrets" {
  source = "../../modules/secrets"

  project_id                   = var.project_id
  secret_ids                   = var.secret_ids
  runner_service_account_email = module.runner.service_account_email
}

Missing secrets module breaks init/validate

Both the dev and prod environment stacks reference ../../modules/secrets, but this module does not exist anywhere in the repository — not in the base branch (codex/infra-pr2-terraform-gcp-modules), not at HEAD, and not introduced in this PR. The base branch only contains network, runner_vm, and observability modules.

This will cause terraform init to fail with a "module not found" error, which means the CI workflow's "Terraform validate environments" step will also fail. The PR's own validation steps (terraform -chdir=infra/terraform/environments/dev init -backend=false && terraform -chdir=infra/terraform/environments/dev validate) cannot pass without it.

Either add the secrets module in this PR or remove the module "secrets" references from the environment stacks until the module is introduced.
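
A quick pre-flight check along these lines could catch a dangling local module reference before `terraform init` does. It is a sketch only: the helper name `missing_local_modules` is hypothetical, and it assumes module sources are written as quoted relative paths on a single line.

```shell
#!/usr/bin/env bash
# Sketch: print every relative module source referenced by the *.tf files in
# a stack directory that does not resolve to an existing directory.
# missing_local_modules is an illustrative name, not something from the PR.
missing_local_modules() {
  local stack_dir="$1" src
  # Match lines such as: source = "../../modules/secrets"
  grep -hoE 'source[[:space:]]*=[[:space:]]*"\.{1,2}/[^"]*"' "$stack_dir"/*.tf 2>/dev/null |
    sed -E 's/^[^"]*"([^"]*)"$/\1/' |
    sort -u |
    while IFS= read -r src; do
      # Relative sources are resolved against the stack directory itself.
      [ -d "$stack_dir/$src" ] || printf '%s\n' "$src"
    done
}
```

Running `missing_local_modules infra/terraform/environments/dev` from the repository root would list each unresolved source path, and print nothing when every referenced module directory exists.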

Path: infra/terraform/environments/dev/main.tf (lines 44-50)

Comment on lines +1 to +10
terraform {
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

Dev stack missing remote backend block

The prod environment declares backend "gcs" {} for remote state, but the dev environment has no backend block and will default to local state. This means the dev state file won't be shared or persisted across team members or CI runs.

If this is intentional (e.g., dev is always local-only), it may be worth adding a comment to document the choice. Otherwise, consider adding the same backend "gcs" {} block so that dev state can also be managed remotely with -backend-config flags, matching the pattern described in the runbook.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
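
If dev adopts the same empty `backend "gcs" {}` block, its init call could mirror the prod pattern quoted from the runbook. The sketch below only builds the argument list. The helper name `tf_init_args` is hypothetical, and the dev bucket and prefix are assumptions extrapolated from the prod values (`neural-tf-state-prod`, `neural/prod`).

```shell
#!/usr/bin/env bash
# Sketch: print the `terraform` arguments for initializing one environment
# against a GCS backend, one argument per line.
# ASSUMPTION: bucket and prefix follow the prod naming seen in the runbook;
# the dev names are not confirmed anywhere in this PR.
tf_init_args() {
  local env="$1"
  printf '%s\n' \
    "-chdir=infra/terraform/environments/${env}" \
    "init" \
    "-backend-config=bucket=neural-tf-state-${env}" \
    "-backend-config=prefix=neural/${env}"
}

# Usage (runs Terraform, so it is left commented out):
#   terraform $(tf_init_args dev)
```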

Path: infra/terraform/environments/dev/main.tf (lines 1-10)

Comment on lines +36 to +41
      - name: Terraform validate modules
        run: |
          for module in network runner_vm secrets observability; do
            terraform -chdir=infra/terraform/modules/$module init -backend=false
            terraform -chdir=infra/terraform/modules/$module validate
          done

CI validates a secrets module that doesn't exist

The secrets module is listed in the loop (for module in network runner_vm secrets observability), but infra/terraform/modules/secrets/ does not exist in the repository. This will cause the "Terraform validate modules" step to fail with a directory-not-found error.

Remove secrets from this loop until the module is added, or add the module in this PR.

Suggested change
        - name: Terraform validate modules
          run: |
-           for module in network runner_vm secrets observability; do
+           for module in network runner_vm observability; do
              terraform -chdir=infra/terraform/modules/$module init -backend=false
              terraform -chdir=infra/terraform/modules/$module validate
            done
Path: .github/workflows/terraform.yml (lines 36-41)

@greptile-apps greptile-apps Bot left a comment

16 files reviewed, 7 comments

      - name: Terraform validate modules
        run: |
          for module in network runner_vm secrets observability; do

secrets module doesn't exist in infra/terraform/modules/ - this validation step will fail

Suggested change
-           for module in network runner_vm secrets observability; do
+           for module in network runner_vm observability; do
Path: .github/workflows/terraform.yml (line 38)

Comment on lines +44 to +49
module "secrets" {
  source = "../../modules/secrets"

  project_id                   = var.project_id
  secret_ids                   = var.secret_ids
  runner_service_account_email = module.runner.service_account_email

../../modules/secrets doesn't exist - terraform init will fail. The base branch is missing this module despite the commit message mentioning it.

Path: infra/terraform/environments/dev/main.tf (lines 44-49)

Comment on lines +49 to +55
module "secrets" {
  source = "../../modules/secrets"

  project_id                   = var.project_id
  secret_ids                   = var.secret_ids
  runner_service_account_email = module.runner.service_account_email
}

../../modules/secrets doesn't exist - terraform init will fail. Need to either add the secrets module or remove this block.

Path: infra/terraform/environments/prod/main.tf (lines 49-55)

Comment thread docs/workflows/terraform-runbook.mdx Outdated
Comment on lines +33 to +34
  -backend-config=\"bucket=neural-tf-state-prod\" \
  -backend-config=\"prefix=neural/prod\"

Escaped quotes in bash command - these should be unescaped for copy-paste usability

Suggested change
-   -backend-config=\"bucket=neural-tf-state-prod\" \
-   -backend-config=\"prefix=neural/prod\"
+   -backend-config="bucket=neural-tf-state-prod" \
+   -backend-config="prefix=neural/prod"
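
The effect of the escaped quotes can be seen by printing the arguments the shell actually passes on. This is a small sketch, and `show_args` is a hypothetical helper used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print each argument on its own line, wrapped in [] so that stray
# characters are easy to spot. show_args is an illustrative helper.
show_args() {
  local arg
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Backslash-escaped quotes survive as literal characters in the argument.
escaped=$(show_args -backend-config=\"bucket=neural-tf-state-prod\")
# Plain quotes are consumed by the shell and only group the value.
plain=$(show_args -backend-config="bucket=neural-tf-state-prod")

printf '%s\n%s\n' "$escaped" "$plain"
```

In the escaped form the quote characters become part of the value handed to the command, which is unlikely to be what a reader copying the runbook command intends.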

Path: docs/workflows/terraform-runbook.mdx (lines 33-34)

Comment thread docs/workflows/terraform-runbook.mdx Outdated
Comment on lines +62 to +63
echo -n \"<KALSHI_API_KEY_ID>\" | gcloud secrets versions add kalshi-api-key-id --data-file=-
echo -n \"<KALSHI_PRIVATE_KEY_PEM>\" | gcloud secrets versions add kalshi-private-key-pem --data-file=-

Escaped quotes in bash commands - remove backslashes for proper copy-paste

Suggested change
- echo -n \"<KALSHI_API_KEY_ID>\" | gcloud secrets versions add kalshi-api-key-id --data-file=-
- echo -n \"<KALSHI_PRIVATE_KEY_PEM>\" | gcloud secrets versions add kalshi-private-key-pem --data-file=-
+ echo -n "<KALSHI_API_KEY_ID>" | gcloud secrets versions add kalshi-api-key-id --data-file=-
+ echo -n "<KALSHI_PRIVATE_KEY_PEM>" | gcloud secrets versions add kalshi-private-key-pem --data-file=-

Path: docs/workflows/terraform-runbook.mdx (lines 62-63)

Comment on lines +16 to +19
output "secret_ids" {
  value       = module.secrets.secret_ids
  description = "Provisioned Secret Manager IDs"
}

references module.secrets.secret_ids but secrets module doesn't exist

Path: infra/terraform/environments/dev/outputs.tf (lines 16-19)

Comment on lines +16 to +19
output "secret_ids" {
  value       = module.secrets.secret_ids
  description = "Provisioned Secret Manager IDs"
}

references module.secrets.secret_ids but secrets module doesn't exist

Path: infra/terraform/environments/prod/outputs.tf (lines 16-19)

Comment on lines +1 to +5
project_id = "my-gcp-project"
region     = "us-central1"
zone       = "us-central1-f"

notification_channels = ["projects/my-gcp-project/notificationChannels/1234567890"]

prod example missing allow_ssh_cidrs and enable_alert_policy that are shown in dev example. Consider adding these for consistency:

Suggested change
  project_id = "my-gcp-project"
  region     = "us-central1"
  zone       = "us-central1-f"

+ allow_ssh_cidrs = ["35.235.240.0/20"] # IAP TCP tunnel range
+
+ enable_alert_policy   = true
  notification_channels = ["projects/my-gcp-project/notificationChannels/1234567890"]

Path: infra/terraform/environments/prod/terraform.tfvars.example (lines 1-5)

Comment on lines +30 to +34
variable "allow_ssh_cidrs" {
  description = "CIDR blocks allowed for SSH"
  type        = list(string)
  default     = []
}

dev defaults to no SSH access [] while prod defaults to IAP range ["35.235.240.0/20"]. Typically dev needs easier access for debugging. Consider aligning defaults or documenting the rationale.

Path: infra/terraform/environments/dev/variables.tf (lines 30-34)

output "secret_ids" {
  value       = module.secrets.secret_ids
  description = "Provisioned Secret Manager IDs"
}

dev outputs missing log_metric_type that prod has - consider adding for consistency:

Suggested change
    description = "Provisioned Secret Manager IDs"
  }
+
+ output "log_metric_type" {
+   value       = module.observability.log_metric_type
+   description = "Log metric used for error alerting"
+ }

Path: infra/terraform/environments/dev/outputs.tf (line 19)

@@ -1,3 +1,5 @@
"""Helpers for resolving Polymarket US auth credentials from env/files."""

polymarket docstrings and formatting changes are unrelated to the infrastructure PR scope - consider separating unrelated changes into their own PRs

Path: neural/auth/polymarket_us_env.py (line 1)

Comment on lines +35 to +47
module "runner" {
  source = "../../modules/runner_vm"

  project_id           = var.project_id
  zone                 = var.zone
  instance_name        = "${var.stack_name}-runner"
  machine_type         = var.machine_type
  network_self_link    = module.network.network_self_link
  subnetwork_self_link = module.network.subnetwork_self_link

  startup_script = var.startup_script
  tags           = ["neural-runner", "env-dev"]
}

Service account name collision across environments

Both dev and prod runner modules rely on the runner_vm module's default service_account_id = "neural-runner" (see infra/terraform/modules/runner_vm/variables.tf:41), since neither environment passes an explicit override. If both environments are deployed to the same GCP project, terraform apply on the second environment will fail with a "service account already exists" error because GCP service account IDs are project-unique.

Consider passing a per-environment service_account_id to disambiguate:

Suggested change
  module "runner" {
    source = "../../modules/runner_vm"

    project_id           = var.project_id
    zone                 = var.zone
    instance_name        = "${var.stack_name}-runner"
    machine_type         = var.machine_type
    network_self_link    = module.network.network_self_link
    subnetwork_self_link = module.network.subnetwork_self_link
+   service_account_id   = "${var.stack_name}-runner"

    startup_script = var.startup_script
    tags           = ["neural-runner", "env-dev"]
  }

The same change is needed in infra/terraform/environments/prod/main.tf:35-47.
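
Because the suggested ID is derived from `var.stack_name`, it is worth confirming that the result still fits GCP's service account ID rules (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter). The sketch below does that check offline. The helper name `runner_sa_id` and the example stack names are assumptions, since the actual `stack_name` values are not shown in this PR.

```shell
#!/usr/bin/env bash
# Sketch: derive "<stack_name>-runner" and check it against the GCP service
# account ID format. runner_sa_id is a hypothetical helper.
runner_sa_id() {
  local id="${1}-runner"
  # 1 leading letter + 4..28 middle characters + 1 trailing alphanumeric
  # gives a total length of 6..30.
  if printf '%s\n' "$id" | grep -qE '^[a-z][-a-z0-9]{4,28}[a-z0-9]$'; then
    printf '%s\n' "$id"
  else
    printf 'invalid service account ID: %s\n' "$id" >&2
    return 1
  fi
}

# Usage with assumed stack names:
#   runner_sa_id neural-dev    # prints neural-dev-runner
#   runner_sa_id neural-prod   # prints neural-prod-runner
```

Distinct stack names yield distinct IDs, which is what removes the collision when both environments share one project.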

Path: infra/terraform/environments/dev/main.tf (lines 35-47)

…-pr3-terraform-envs-ci-docs

# Conflicts:
#	docs/basics/infrastructure.mdx
#	docs/mint.json
greptile-apps Bot commented Feb 26, 2026

Additional Comments (1)

.gitignore, line 1
Missing Terraform-specific entries that should be ignored:

+ # Terraform
+ .terraform/
+ *.tfstate
+ *.tfstate.*
+ *.tfvars
+ !*.tfvars.example
+ override.tf
+ override.tf.json
+ *_override.tf
+ *_override.tf.json
+ .terraformrc
+ terraform.rc
+
  # Python

Without these, developers risk committing:

  • .terraform/ provider binaries (large files)
  • *.tfstate files containing sensitive infrastructure state and secrets
  • *.tfvars files with actual credentials

@hudsonaikins hudsonaikins changed the base branch from codex/infra-pr2-terraform-gcp-modules to main February 26, 2026 03:06
greptile-apps Bot commented Feb 26, 2026

Additional Comments (1)

infra/terraform/environments/dev/main.tf, line 47
Runner VM OAuth scopes block Secret Manager access at runtime

The runner_vm module defaults service_account_scopes to only logging.write, monitoring.write, and devstorage.read_only (see infra/terraform/modules/runner_vm/variables.tf:50-58). Neither the dev nor prod environment overrides this.

While the secrets module correctly grants roles/secretmanager.secretAccessor at the IAM level, GCE OAuth scopes act as an additional authorization layer — API calls from the VM are restricted to the intersection of IAM permissions and OAuth scopes. Without cloud-platform (or a Secret Manager-specific scope), runtime calls to Secret Manager from this VM will fail with a 403, even though the IAM binding exists.

This directly contradicts the runbook's section 4 ("resolve secrets at startup using the runner service account").

Add service_account_scopes to both environment stacks to include cloud-platform:

module "runner" {
  source = "../../modules/runner_vm"

  project_id           = var.project_id
  zone                 = var.zone
  instance_name        = "${var.stack_name}-runner"
  machine_type         = var.machine_type
  network_self_link    = module.network.network_self_link
  subnetwork_self_link = module.network.subnetwork_self_link

  startup_script         = var.startup_script
  tags                   = ["neural-runner", "env-dev"]
  service_account_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
}

The same change is needed in infra/terraform/environments/prod/main.tf:35-47.
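
One way to confirm the scopes on a running VM is to read them from the GCE metadata server and look for `cloud-platform`. The sketch below keeps the check itself offline-testable. The function name `has_cloud_platform_scope` is hypothetical, and the commented `curl` call only works from inside a Compute Engine instance.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the newline-separated scope list on stdin contains the
# broad cloud-platform scope. The function name is illustrative.
has_cloud_platform_scope() {
  # -x matches whole lines, -F treats the URL as a fixed string.
  grep -qxF 'https://www.googleapis.com/auth/cloud-platform'
}

# On the runner VM, the attached scopes can be read from the metadata server:
#   curl -s -H 'Metadata-Flavor: Google' \
#     'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes' \
#     | has_cloud_platform_scope && echo "cloud-platform scope present"
```

With the module defaults described above (logging.write, monitoring.write, devstorage.read_only) this check would fail, which matches the 403 risk the comment describes.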

@github-actions

📚 Documentation Status

✅ Documentation changes detected

  • Documentation structure validated
  • Links checked for broken references

This comment is automatically generated by the documentation workflow.

@hudsonaikins hudsonaikins merged commit ba7d68e into main Feb 26, 2026
22 checks passed
@hudsonaikins hudsonaikins deleted the codex/infra-pr3-terraform-envs-ci-docs branch March 8, 2026 18:34