Merged
4 changes: 3 additions & 1 deletion .gitignore
@@ -70,6 +70,8 @@ dmyp.json

# Secrets and Credentials (CRITICAL - NEVER COMMIT)
secrets/
!infra/terraform/modules/secrets/
!infra/terraform/modules/secrets/*.tf
*.pem
*.key
*_private_key.pem
@@ -121,4 +123,4 @@ site/
# Temporary files
tmp/
temp/
*.tmp
*.tmp
22 changes: 22 additions & 0 deletions infra/terraform/.terraform.lock.hcl

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

94 changes: 94 additions & 0 deletions infra/terraform/README.md
@@ -0,0 +1,94 @@
# Neural Terraform Baseline (GCP)

This directory contains a reference Terraform baseline for running Neural bots on GCP.
It is designed as a starting point for teams that want reproducible infrastructure around
Docker-based execution.

## Modules

- `modules/network`: VPC, subnet, and baseline firewall rules.
- `modules/runner_vm`: Compute Engine runner VM with Docker-friendly bootstrap hooks.
- `modules/secrets`: Secret Manager secret containers and runner service account access grants.
Review comment:
The modules/secrets module is documented here but doesn't exist in this PR. Either remove this documentation or add the missing module.

Suggested change
- `modules/secrets`: Secret Manager secret containers and runner service account access grants.
- `modules/observability`: Log-based metric and optional alert policy for runtime errors.

Review comment:
modules/secrets is documented but doesn't exist in this PR - remove from list or add the module

- `modules/observability`: Log-based metric and optional alert policy for runtime errors.

## Module contracts

### `network`
Inputs:
- `project_id` (string)
- `region` (string)
- `network_name` (string)
- `subnet_name` (string)
- `subnet_cidr` (string)
- `enable_private_google_access` (bool, default `true`)
- `allow_ssh_cidrs` (list(string), default `[]`)
Review comment:
Missing enable_private_google_access input (bool, default true) from the module contract

- `internal_tcp_ports` (list(string), default `[]`)
- `internal_udp_ports` (list(string), default `[]`)

Outputs:
- `network_name`
- `network_self_link`
- `subnetwork_name`
- `subnetwork_self_link`
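
As an illustration of the `network` contract above, a stack might call the module roughly as follows. This is a sketch only: the `source` path, project ID, names, and CIDR ranges are placeholders rather than values from this PR.

```hcl
# Hypothetical environment stack; every value below is a placeholder.
module "network" {
  source = "../../modules/network" # assumed path relative to the stack directory

  project_id   = "example-project"
  region       = "us-central1"
  network_name = "neural-dev"
  subnet_name  = "neural-dev-subnet"
  subnet_cidr  = "10.10.0.0/24"

  # The SSH firewall rule is only created when this list is non-empty.
  allow_ssh_cidrs = ["203.0.113.0/24"]

  # East-west ports inside the subnet. With both lists empty,
  # the internal rule allows ICMP only.
  internal_tcp_ports = ["8080"]
  internal_udp_ports = []
}
```

Leaving `enable_private_google_access` unset keeps its documented default of `true`.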
Comment on lines +14 to +32
enable_private_google_access is missing from the documented inputs (used in modules/network/variables.tf and modules/network/main.tf:14)


### `runner_vm`
Inputs:
- `project_id` (string)
- `zone` (string)
- `instance_name` (string)
- `machine_type` (string, default `e2-standard-2`)
- `network_self_link` (string)
- `subnetwork_self_link` (string)
- `create_service_account` (bool, default `true`)
- `service_account_id` (string, default `neural-runner`)
- `service_account_email` (string, required when `create_service_account=false`)
- `service_account_scopes` (list(string), default logging/monitoring/container-pull scopes; add Secret Manager scope if needed)
- `assign_public_ip` (bool, default `true`)
- `startup_script` (string, optional)
- `metadata` (map(string), default `{}`)
- `tags` (list(string), default `["neural-runner"]`)
- `boot_image` (string, default Debian 12 family image)
- `boot_disk_size_gb` (number, default `50`)
- `boot_disk_type` (string, default `pd-balanced`)

Comment on lines +34 to +53
Missing several inputs from the module contract: service_account_id, metadata, tags, boot_image, boot_disk_size_gb, boot_disk_type. These are all optional with defaults but should be documented for completeness.

Outputs:
- `instance_name`
- `instance_self_link`
- `instance_external_ip`
Comment on lines +34 to +57
Missing several variables from documentation: service_account_id, metadata, tags, boot_image, boot_disk_size_gb, boot_disk_type (all defined in modules/runner_vm/variables.tf)

- `service_account_email`
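
The `runner_vm` contract can be exercised with a call like the one below. It is a hedged sketch that assumes a `module.network` call exists in the same stack and that the runner uses a pre-existing service account; the `source` path, zone, names, and email are placeholders.

```hcl
# Hypothetical stack fragment; assumes module.network is declared elsewhere in the stack.
module "runner_vm" {
  source = "../../modules/runner_vm" # assumed path relative to the stack directory

  project_id    = "example-project"
  zone          = "us-central1-a"
  instance_name = "neural-runner-dev"

  # Attach the VM to the VPC and subnet created by the network module.
  network_self_link    = module.network.network_self_link
  subnetwork_self_link = module.network.subnetwork_self_link

  # Bring-your-own service account: per the contract, the email is
  # required once create_service_account is false.
  create_service_account = false
  service_account_email  = "runner@example-project.iam.gserviceaccount.com"

  # No external IP in this example.
  assign_public_ip = false
}
```

All other inputs fall back to the defaults listed in the contract.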

### `secrets`
Inputs:
- `project_id` (string)
- `secret_ids` (set(string))
- `runner_service_account_email` (string)

Outputs:
- `secret_ids`
- `secret_resource_ids`
Comment on lines +62 to +68
The secrets module is documented but wasn't actually added in this PR

Suggested change
- `project_id` (string)
- `secret_ids` (set(string))
- `runner_service_account_email` (string)
Outputs:
- `secret_ids`
- `secret_resource_ids`

Comment on lines +60 to +68
entire secrets module section should be removed - the module doesn't exist in this PR


Comment on lines +60 to +69
Documentation for secrets module exists but the module itself is missing from this PR. Remove this entire section or implement the module.

Suggested change
### `secrets`
Inputs:
- `project_id` (string)
- `secret_ids` (set(string))
- `runner_service_account_email` (string)
Outputs:
- `secret_ids`
- `secret_resource_ids`
### `observability`

### `observability`
Inputs:
- `project_id` (string)
- `metric_name` (string, default `neural_runner_error_count`)
- `instance_name` (string)
- `enable_alert_policy` (bool, default `false`)
- `notification_channels` (list(string), default `[]`)

Outputs:
- `log_metric_name`
- `log_metric_type`
- `alert_policy_id`
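
A call that opts in to alerting could look like the following sketch. The `source` path, instance name, and notification channel ID are placeholders, and the `runner_alert_policy_id` output name is hypothetical.

```hcl
# Hypothetical stack fragment; every value below is a placeholder.
module "observability" {
  source = "../../modules/observability" # assumed path relative to the stack directory

  project_id    = "example-project"
  instance_name = "neural-runner-dev" # should match the runner VM's instance name

  # The alert policy is off by default; enable it and route it to a channel.
  enable_alert_policy = true
  notification_channels = [
    "projects/example-project/notificationChannels/1234567890",
  ]
}

# alert_policy_id is null when enable_alert_policy is false.
output "runner_alert_policy_id" {
  value = module.observability.alert_policy_id
}
```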

## Usage

Wire these modules from environment stacks (added in PR-3) and run:

```bash
terraform init
terraform fmt -check -recursive
terraform validate
```

This baseline intentionally avoids provider-specific app deployment logic so teams can swap
the runtime bootstrap (Docker, private providers, or orchestrators) without rewriting core IaC.
58 changes: 58 additions & 0 deletions infra/terraform/modules/network/main.tf
@@ -0,0 +1,58 @@
resource "google_compute_network" "this" {
  project                 = var.project_id
  name                    = var.network_name
  auto_create_subnetworks = false
  routing_mode            = "GLOBAL"
}

resource "google_compute_subnetwork" "this" {
  project                  = var.project_id
  region                   = var.region
  name                     = var.subnet_name
  ip_cidr_range            = var.subnet_cidr
  network                  = google_compute_network.this.id
  private_ip_google_access = var.enable_private_google_access
}

resource "google_compute_firewall" "allow_internal" {
  project = var.project_id
  name    = "${var.network_name}-allow-internal"
  network = google_compute_network.this.name

  dynamic "allow" {
    for_each = length(var.internal_tcp_ports) > 0 ? [1] : []
    content {
      protocol = "tcp"
      ports    = var.internal_tcp_ports
    }
  }

  dynamic "allow" {
    for_each = length(var.internal_udp_ports) > 0 ? [1] : []
    content {
      protocol = "udp"
      ports    = var.internal_udp_ports
    }
  }

  allow {
    protocol = "icmp"
  }

  source_ranges = [var.subnet_cidr]
Comment on lines +17 to +42
firewall rule allows ALL TCP and UDP ports from subnet CIDR without restrictions - consider limiting to specific ports needed for internal services

Suggested change
-resource "google_compute_firewall" "allow_internal" {
-  project = var.project_id
-  name    = "${var.network_name}-allow-internal"
-  network = google_compute_network.this.name
-
-  allow {
-    protocol = "tcp"
-  }
-
-  allow {
-    protocol = "udp"
-  }
-
-  allow {
-    protocol = "icmp"
-  }
-
-  source_ranges = [var.subnet_cidr]
+resource "google_compute_firewall" "allow_internal" {
+  project = var.project_id
+  name    = "${var.network_name}-allow-internal"
+  network = google_compute_network.this.name
+
+  allow {
+    protocol = "icmp"
+  }
+
+  # Add specific port ranges as needed, e.g.:
+  # allow {
+  #   protocol = "tcp"
+  #   ports    = ["80", "443", "8080"]
+  # }
+
+  source_ranges = [var.subnet_cidr]
+}

}

resource "google_compute_firewall" "allow_ssh" {
  count = length(var.allow_ssh_cidrs) > 0 ? 1 : 0

  project = var.project_id
  name    = "${var.network_name}-allow-ssh"
  network = google_compute_network.this.name

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  source_ranges = var.allow_ssh_cidrs
}
Comment on lines +45 to +58
SSH firewall rule applies to all instances in the VPC - add target_tags to restrict SSH access to only tagged instances (e.g., instances with the "neural-runner" tag)
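
One way to apply this suggestion is sketched below. It assumes a new `ssh_target_tags` variable, which is hypothetical and not part of this PR; `target_tags` itself is an existing argument of `google_compute_firewall`. The default of `["neural-runner"]` mirrors the default `tags` value documented for the `runner_vm` module.

```hcl
# Hypothetical variable, not defined anywhere in this PR.
variable "ssh_target_tags" {
  description = "Network tags of instances that should accept SSH"
  type        = list(string)
  default     = ["neural-runner"]
}

resource "google_compute_firewall" "allow_ssh" {
  count = length(var.allow_ssh_cidrs) > 0 ? 1 : 0

  project = var.project_id
  name    = "${var.network_name}-allow-ssh"
  network = google_compute_network.this.name

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  source_ranges = var.allow_ssh_cidrs

  # Restrict the rule to instances carrying these network tags
  # instead of every instance in the VPC.
  target_tags = var.ssh_target_tags
}
```

With this change, instances in the VPC that do not carry one of the listed tags no longer match the SSH rule.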

19 changes: 19 additions & 0 deletions infra/terraform/modules/network/outputs.tf
@@ -0,0 +1,19 @@
output "network_name" {
  description = "VPC network name"
  value       = google_compute_network.this.name
}

output "network_self_link" {
  description = "VPC network self link"
  value       = google_compute_network.this.self_link
}

output "subnetwork_name" {
  description = "Subnet name"
  value       = google_compute_subnetwork.this.name
}

output "subnetwork_self_link" {
  description = "Subnet self link"
  value       = google_compute_subnetwork.this.self_link
}
48 changes: 48 additions & 0 deletions infra/terraform/modules/network/variables.tf
@@ -0,0 +1,48 @@
variable "project_id" {
  description = "GCP project ID"
  type        = string
}

variable "region" {
  description = "GCP region for the subnet"
  type        = string
}

variable "network_name" {
  description = "VPC network name"
  type        = string
}

variable "subnet_name" {
  description = "Subnet name"
  type        = string
}

variable "subnet_cidr" {
  description = "CIDR range for the subnet"
  type        = string
}

variable "enable_private_google_access" {
  description = "Whether Private Google Access is enabled on the subnet"
  type        = bool
  default     = true
}

variable "allow_ssh_cidrs" {
  description = "CIDR blocks allowed to SSH into tagged instances"
  type        = list(string)
  default     = []
}

variable "internal_tcp_ports" {
  description = "TCP ports allowed for east-west traffic inside the subnet"
  type        = list(string)
  default     = []
}

variable "internal_udp_ports" {
  description = "UDP ports allowed for east-west traffic inside the subnet"
  type        = list(string)
  default     = []
}
57 changes: 57 additions & 0 deletions infra/terraform/modules/observability/main.tf
@@ -0,0 +1,57 @@
resource "google_logging_metric" "runner_errors" {
  project = var.project_id
  name    = var.metric_name

  filter = <<-EOT
    resource.type="gce_instance"
    resource.labels.instance_id:*
    labels."compute.googleapis.com/resource_name"="${var.instance_name}"
    severity>=ERROR
  EOT

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
    unit        = "1"
    labels {
      key         = "instance_name"
      value_type  = "STRING"
      description = "Runner instance name"
    }
  }

  label_extractors = {
    instance_name = "EXTRACT(labels.\"compute.googleapis.com/resource_name\")"
  }
}

resource "google_monitoring_alert_policy" "runner_error_alert" {
  count   = var.enable_alert_policy ? 1 : 0
  project = var.project_id

  display_name = "Neural Runner Error Alert"
  combiner     = "OR"
  enabled      = true

  conditions {
    display_name = "Runner emits error logs"

    condition_threshold {
      filter          = "metric.type=\"logging.googleapis.com/user/${google_logging_metric.runner_errors.name}\""
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      duration        = "60s"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_DELTA"
      }

      trigger {
        count = 1
      }
    }
  }

  notification_channels = var.notification_channels
}
14 changes: 14 additions & 0 deletions infra/terraform/modules/observability/outputs.tf
@@ -0,0 +1,14 @@
output "log_metric_name" {
  description = "Name of the log-based metric"
  value       = google_logging_metric.runner_errors.name
}

output "log_metric_type" {
  description = "Fully qualified metric type"
  value       = "logging.googleapis.com/user/${google_logging_metric.runner_errors.name}"
}

output "alert_policy_id" {
  description = "Alert policy ID when enabled"
  value       = var.enable_alert_policy ? google_monitoring_alert_policy.runner_error_alert[0].id : null
}
27 changes: 27 additions & 0 deletions infra/terraform/modules/observability/variables.tf
@@ -0,0 +1,27 @@
variable "project_id" {
  description = "GCP project ID"
  type        = string
}

variable "metric_name" {
  description = "Log-based metric name"
  type        = string
  default     = "neural_runner_error_count"
}

variable "instance_name" {
  description = "Runner VM instance name used in log filters"
  type        = string
}

variable "enable_alert_policy" {
  description = "Whether to create an alert policy for runtime errors"
  type        = bool
  default     = false
}

variable "notification_channels" {
  description = "Notification channel IDs used by alert policy"
  type        = list(string)
  default     = []
}