
Syncing development with main after pipeline finalization#43

Closed
NickSavino wants to merge 35 commits into development from main

Conversation

@NickSavino
Contributor

@NickSavino NickSavino commented Jan 30, 2026

Summary by CodeRabbit

  • Chores
    • Enhanced deployment workflow with improved health verification after deployments
    • Optimized deployment configuration handling for better reliability
    • Updated infrastructure provider dependencies to latest compatible versions


NickSavino and others added 30 commits January 27, 2026 18:11
HOTFIX: add backend.tf file to track remote state
Added VPC to allow SSM access to EC2
@coderabbitai

coderabbitai bot commented Jan 30, 2026

📝 Walkthrough

Walkthrough

The pull request refactors the CI/CD deployment workflow to eliminate build-step output dependencies, restructures AWS SSM command parameters into a JSON block, and introduces a new health-check verification step that polls the application endpoint post-deployment. Additionally, Terraform provider dependencies are updated.

Changes

CI/CD Workflow Restructuring: .github/workflows/deploy.yml
  Removed the job-output propagation of image_uri and shifted IMAGE_URI computation into the deploy step. Restructured the inline AWS SSM parameters into a JSON COMMANDS block. Added a health-check verification step that fetches the instance's public IP and polls the health endpoint. Introduced environment-setup logic for Docker installation, ECR login configuration, and runtime secret retrieval.

Terraform Provider Updates: infra/env/prod/.terraform.lock.hcl
  Added new provider blocks for local (v2.6.1) and tls (v4.1.0). Updated the random provider from v3.8.0 to v3.8.1 with refreshed hashes.
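The "JSON COMMANDS block" described above can be built with jq instead of hand-quoting an inline array, which avoids the nested-quote pitfalls that the review flags below. This is a minimal sketch, not the workflow's actual step; the IMAGE_URI and CONTAINER_NAME values and the exact command list are hypothetical placeholders:

```shell
#!/usr/bin/env bash
# Sketch: build the COMMANDS JSON array later passed to
#   aws ssm send-command ... --parameters "commands=${COMMANDS}"
# Values below are hypothetical placeholders.
set -euo pipefail

IMAGE_URI="123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest"
CONTAINER_NAME="app"

# jq string interpolation JSON-encodes each command line, so shell
# variables are embedded without manual escaping.
COMMANDS=$(jq -n -c --arg image "$IMAGE_URI" --arg name "$CONTAINER_NAME" '[
  "set -e",
  "sudo docker pull \($image)",
  "sudo docker rm -f \($name) || true",
  "sudo docker run -d --restart unless-stopped --name \($name) -p 8080:8080 \($image)",
  "curl -fsS http://localhost:8080/health"
]')

echo "$COMMANDS"
```

Because jq emits a single compact JSON array, the resulting string can be passed directly as the `commands` parameter value without worrying about embedded quotes in individual commands.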

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant SSM as AWS SSM
    participant EC2 as EC2 Instance
    participant App as Application

    GHA->>SSM: Send command with JSON COMMANDS block<br/>(Docker setup, ECR login, deploy)
    SSM->>EC2: Execute remote commands
    EC2->>EC2: Install Docker, configure ECR,<br/>fetch secrets, deploy container
    GHA->>EC2: Retrieve public IP
    GHA->>App: Poll health endpoint<br/>(retry until success/timeout)
    App-->>GHA: Health check response
    GHA->>GHA: Deployment verified
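The "poll health endpoint (retry until success/timeout)" step in the diagram can be sketched as a small shell helper. The endpoint URL, retry count, and delay here are assumptions for illustration, not the workflow's exact values:

```shell
#!/usr/bin/env bash
# Sketch: retry a health endpoint until it responds with success,
# and return non-zero after exhausting all attempts so a CI step
# using it is marked failed.

check_health() {
  local url="$1" attempts="$2" delay="${3:-3}"
  local i
  for i in $(seq 1 "$attempts"); do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "healthy"
      return 0
    fi
    echo "Health not ready yet (attempt $i/$attempts); retrying..." >&2
    sleep "$delay"
  done
  echo "unhealthy"
  return 1  # non-zero exit so the workflow step fails
}

# Usage (PUBLIC_IP would come from an EC2 lookup in the workflow):
# check_health "http://${PUBLIC_IP}:8080/health" 20 3
```

Returning non-zero on exhaustion is the crucial detail; as the review below points out, a bare `echo "Health check failed"` leaves the step's exit status at 0.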

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly Related PRs

Poem

🐰 Hops of joy! Commands now flow as JSON streams,
Health checks bloom where deployments dream,
No more output chains to chase and cite,
Docker preps and polls through the night,
Robust automation takes its flight! 🚀

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
- Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
- Title Check: ✅ Passed. The title accurately describes the main purpose: syncing the development branch with main after pipeline finalization, which aligns with the deployment workflow and infrastructure updates in the changeset.
- Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.



Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.github/workflows/deploy.yml:
- Around line 118-122: The health-check loop currently prints "Health check
failed" but does not return a non-zero status, so the job can still succeed;
update the failing branch after the loop (the block that contains the echo
"Health check failed") to exit with a non-zero code (e.g., add an exit 1
immediately after the echo) so the workflow step fails when the health check
exhausts all retries.
- Around line 90-96: The workflow captures COMMAND_ID from aws ssm send-command
but never waits for it to finish; update the deploy step to poll SSM for
completion using the COMMAND_ID (e.g., call aws ssm list-command-invocations or
aws ssm get-command-invocation with --command-id "${COMMAND_ID}" and
--instance-id "${EC2_INSTANCE_ID}") in a loop, checking the invocation
Status/StatusDetails until it reaches a terminal state (Success, Failed,
TimedOut, Cancelled), break when terminal, and fail the job (exit non-zero) if
the final status is not Success so the subsequent health check doesn't run on a
failed deploy. Ensure the polling includes a sleep/backoff and a reasonable
timeout to avoid infinite loops.
🧹 Nitpick comments (1)
.github/workflows/deploy.yml (1)

108-108: Inconsistent indentation.

Line 108 has extra leading whitespace compared to surrounding code. While this doesn't affect bash execution, it reduces readability.

🔧 Fix indentation
-            test "${PUBLIC_IP}" != "None"
+          test "${PUBLIC_IP}" != "None"

Comment on lines 90 to 96

          COMMAND_ID=$(aws ssm send-command \
            --region "${AWS_REGION}" \
            --instance-ids "${EC2_INSTANCE_ID}" \
            --document-name "AWS-RunShellScript" \
-           --parameters commands="[
-             "set -e",
-             "AWS_REGION='${AWS_REGION}'",
-             "IMAGE_URI='${IMAGE_URI}'",
-             "CONTAINER_NAME='${CONTAINER_NAME}'",
-             "",
-             "if ! command -v docker >/dev/null 2>&1; then sudo dnf -y install docker; sudo systemctl enable --now docker; fi",
-             "",
-             "sudo usermod -aG docker ec2-user || true",
-             "",
-             "# Login to ECR",
-             "aws ecr get-login-password --region ${AWS_REGION} | sudo docker login --username AWS --password-stdin $(echo ${IMAGE_URI} | cut -d/ -f1)",
-             "",
-             "# Fetch runtime secrets from SSM",
-             "DBURL=$(aws ssm get-parameter --region ${AWS_REGION} --with-decryption --name /cgc-2026-prod/api/database_url --query Parameter.Value --output text)",
-             "CLERK_SECRET=$(aws ssm get-parameter --region ${AWS_REGION} --with-decryption --name /cgc-2026-prod/api/clerk_secret_key --query Parameter.Value --output text)",
-             "",
-             "# Pull new image",
-             "",
-             "sudo docker pull ${IMAGE_URI}",
-             "# Stop existing container (if any)",
-             "sudo docker rm -f ${CONTAINER_NAME} || true",
-             "",
-             "# Run container",
-             "sudo docker run -d --restart unless-stopped --name \"${CONTAINER_NAME}\" -p 8080:8080 -e PORT=8080 -e DATABASE_URL=\"${DBURL}\" -e CLERK_SECRET_KEY=\"${CLERK_SECRET}\" \"${IMAGE_URI}\"",
-             "",
-             "# Smoke check",
-             "sleep 2",
-             "curl -fsS http://localhost:8080/health"
-           ]" \
+           --parameters "commands=${COMMANDS}" \
            --query 'Command.CommandId' \
            --output text)


⚠️ Potential issue | 🔴 Critical

SSM command is sent but never awaited — deployment success/failure is unknown.

The step captures COMMAND_ID but never uses it. The workflow proceeds immediately without waiting for the SSM command to complete or checking its exit status. The deployment could fail silently while the health check runs.

Add a wait loop to poll for command completion:

🛠️ Proposed fix to wait for SSM command completion
          COMMAND_ID=$(aws ssm send-command \
            --region "${AWS_REGION}" \
            --instance-ids "${EC2_INSTANCE_ID}" \
            --document-name "AWS-RunShellScript" \
            --parameters "commands=${COMMANDS}" \
            --query 'Command.CommandId' \
            --output text)
+
+          echo "Waiting for SSM command ${COMMAND_ID} to complete..."
+          for i in {1..60}; do
+            STATUS=$(aws ssm get-command-invocation \
+              --region "${AWS_REGION}" \
+              --command-id "${COMMAND_ID}" \
+              --instance-id "${EC2_INSTANCE_ID}" \
+              --query 'Status' \
+              --output text 2>/dev/null || echo "Pending")
+            
+            case "${STATUS}" in
+              Success)
+                echo "SSM command succeeded"
+                break
+                ;;
+              Failed|Cancelled|TimedOut)
+                echo "SSM command failed with status: ${STATUS}"
+                aws ssm get-command-invocation \
+                  --region "${AWS_REGION}" \
+                  --command-id "${COMMAND_ID}" \
+                  --instance-id "${EC2_INSTANCE_ID}" \
+                  --query 'StandardErrorContent' \
+                  --output text
+                exit 1
+                ;;
+              *)
+                echo "Status: ${STATUS} (attempt $i/60)"
+                sleep 5
+                ;;
+            esac
+          done

Comment on lines +118 to +122
echo "Health not ready yet (attempt $i/20) — retrying..."
sleep 3
done

echo "Health check failed"


⚠️ Potential issue | 🔴 Critical

Missing exit 1 — workflow succeeds even when health check fails.

When the loop exhausts all 20 retries, the step prints "Health check failed" but exits with status 0 (the exit code of echo). The workflow will incorrectly report success.

🐛 Proposed fix to exit non-zero on failure
          echo "Health check failed"
+          exit 1

@NickSavino NickSavino closed this Jan 31, 2026
