Syncing development with main after pipeline finalization #43

NickSavino wants to merge 35 commits into development from
Conversation
Deployment to main
Deployment hotfix
Fixed deploy.yml script
HOTFIX: add backend.tf file to track remote state
deployment debugging
Deployment debugging
EC2 ssm changes
Updated deploy.yml
fixed deploy.yml
Added VPC to allow SSM access to EC2
changes to deploy.yml
deploy.yml changes
yaml changes
Hotfix/cloud infra
yaml changes
yml changes
deploy.yml changes
yml changes
yml changes
yml changes
📝 Walkthrough

The pull request refactors the CI/CD deployment workflow to eliminate build-step output dependencies, restructures the AWS SSM command parameters into a JSON block, and introduces a new health-check verification step that polls the application endpoint post-deployment. Additionally, Terraform provider dependencies are updated.
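The "JSON block" restructuring mentioned in the walkthrough can be sketched as follows. This is a hedged illustration only: the `COMMANDS` variable name matches the reviewed diff, but the placeholder values and the abbreviated command list are invented for this example — the real list (Docker install, ECR login, secret fetch, deploy) lives in deploy.yml.

```shell
#!/usr/bin/env bash
# Sketch: assemble a JSON array of remote commands into COMMANDS, then
# pass it to SSM as --parameters "commands=${COMMANDS}".
IMAGE_URI="123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest"  # placeholder
CONTAINER_NAME="app"                                                 # placeholder

COMMANDS=$(cat <<EOF
[
  "set -e",
  "sudo docker pull ${IMAGE_URI}",
  "sudo docker rm -f ${CONTAINER_NAME} || true",
  "sudo docker run -d --restart unless-stopped --name ${CONTAINER_NAME} -p 8080:8080 ${IMAGE_URI}",
  "curl -fsS http://localhost:8080/health"
]
EOF
)
echo "${COMMANDS}"
```

Collecting the commands in one variable keeps the `aws ssm send-command` invocation short and makes the remote script readable as a unit.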
Sequence Diagram

```mermaid
sequenceDiagram
    participant GHA as GitHub Actions
    participant SSM as AWS SSM
    participant EC2 as EC2 Instance
    participant App as Application
    GHA->>SSM: Send command with JSON COMMANDS block<br/>(Docker setup, ECR login, deploy)
    SSM->>EC2: Execute remote commands
    EC2->>EC2: Install Docker, configure ECR,<br/>fetch secrets, deploy container
    GHA->>EC2: Retrieve public IP
    GHA->>App: Poll health endpoint<br/>(retry until success/timeout)
    App-->>GHA: Health check response
    GHA->>GHA: Deployment verified
```
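The "poll health endpoint" step in the diagram can be sketched as a small retry loop. This is a hedged illustration, not the actual workflow step: the helper name `check_health` and its arguments are assumptions, while the retry message and `sleep 3` mirror fragments visible in the reviewed diff.

```shell
#!/usr/bin/env bash
# Sketch of a post-deployment health check: poll an endpoint until curl
# succeeds or the retries run out. check_health is a hypothetical helper,
# not a name taken from the workflow.
check_health() {
  local url="$1" max_attempts="$2"
  local i
  for ((i = 1; i <= max_attempts; i++)); do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "Healthy after $i attempt(s)"
      return 0
    fi
    echo "Health not ready yet (attempt $i/$max_attempts) — retrying..."
    sleep 3
  done
  echo "Health check failed"
  return 1  # the non-zero exit the review notes is missing in the workflow
}
```

Called as `check_health "http://${PUBLIC_IP}:8080/health" 20`, the non-zero return propagates as the step's exit status, which is exactly the behavior the review asks for.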
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.github/workflows/deploy.yml:
- Around line 118-122: The health-check loop currently prints "Health check
failed" but does not return a non-zero status, so the job can still succeed;
update the failing branch after the loop (the block that contains the echo
"Health check failed") to exit with a non-zero code (e.g., add an exit 1
immediately after the echo) so the workflow step fails when the health check
exhausts all retries.
- Around line 90-96: The workflow captures COMMAND_ID from aws ssm send-command
but never waits for it to finish; update the deploy step to poll SSM for
completion using the COMMAND_ID (e.g., call aws ssm list-command-invocations or
aws ssm get-command-invocation with --command-id "${COMMAND_ID}" and
--instance-id "${EC2_INSTANCE_ID}") in a loop, checking the invocation
Status/StatusDetails until it reaches a terminal state (Success, Failed,
TimedOut, Cancelled), break when terminal, and fail the job (exit non-zero) if
the final status is not Success so the subsequent health check doesn't run on a
failed deploy. Ensure the polling includes a sleep/backoff and a reasonable
timeout to avoid infinite loops.
🧹 Nitpick comments (1)

.github/workflows/deploy.yml (1)

108-108: Inconsistent indentation. Line 108 has extra leading whitespace compared to the surrounding code. While this doesn't affect bash execution, it reduces readability.

🔧 Fix indentation

```diff
-            test "${PUBLIC_IP}" != "None"
+          test "${PUBLIC_IP}" != "None"
```
```diff
 COMMAND_ID=$(aws ssm send-command \
   --region "${AWS_REGION}" \
   --instance-ids "${EC2_INSTANCE_ID}" \
   --document-name "AWS-RunShellScript" \
-  --parameters commands="[
-    "set -e",
-    "AWS_REGION='${AWS_REGION}'",
-    "IMAGE_URI='${IMAGE_URI}'",
-    "CONTAINER_NAME='${CONTAINER_NAME}'",
-    "",
-    "if ! command -v docker >/dev/null 2>&1; then sudo dnf -y install docker; sudo systemctl enable --now docker; fi",
-    "",
-    "sudo usermod -aG docker ec2-user || true",
-    "",
-    "# Login to ECR",
-    "aws ecr get-login-password --region ${AWS_REGION} | sudo docker login --username AWS --password-stdin $(echo ${IMAGE_URI} | cut -d/ -f1)",
-    "",
-    "# Fetch runtime secrets from SSM",
-    "DBURL=$(aws ssm get-parameter --region ${AWS_REGION} --with-decryption --name /cgc-2026-prod/api/database_url --query Parameter.Value --output text)",
-    "CLERK_SECRET=$(aws ssm get-parameter --region ${AWS_REGION} --with-decryption --name /cgc-2026-prod/api/clerk_secret_key --query Parameter.Value --output text)",
-    "",
-    "# Pull new image",
-    "",
-    "sudo docker pull ${IMAGE_URI}",
-    "# Stop existing container (if any)",
-    "sudo docker rm -f ${CONTAINER_NAME} || true",
-    "",
-    "# Run container",
-    "sudo docker run -d --restart unless-stopped --name \"${CONTAINER_NAME}\" -p 8080:8080 -e PORT=8080 -e DATABASE_URL=\"${DBURL}\" -e CLERK_SECRET_KEY=\"${CLERK_SECRET}\" \"${IMAGE_URI}\"",
-    "",
-    "# Smoke check",
-    "sleep 2",
-    "curl -fsS http://localhost:8080/health"
-  ]" \
+  --parameters "commands=${COMMANDS}" \
   --query 'Command.CommandId' \
   --output text)
```
SSM command is sent but never awaited — deployment success/failure is unknown.
The step captures COMMAND_ID but never uses it. The workflow proceeds immediately without waiting for the SSM command to complete or checking its exit status. The deployment could fail silently while the health check runs.
Add a wait loop to poll for command completion:
🛠️ Proposed fix to wait for SSM command completion
```diff
   COMMAND_ID=$(aws ssm send-command \
     --region "${AWS_REGION}" \
     --instance-ids "${EC2_INSTANCE_ID}" \
     --document-name "AWS-RunShellScript" \
     --parameters "commands=${COMMANDS}" \
     --query 'Command.CommandId' \
     --output text)
+
+  echo "Waiting for SSM command ${COMMAND_ID} to complete..."
+  for i in {1..60}; do
+    STATUS=$(aws ssm get-command-invocation \
+      --region "${AWS_REGION}" \
+      --command-id "${COMMAND_ID}" \
+      --instance-id "${EC2_INSTANCE_ID}" \
+      --query 'Status' \
+      --output text 2>/dev/null || echo "Pending")
+
+    case "${STATUS}" in
+      Success)
+        echo "SSM command succeeded"
+        break
+        ;;
+      Failed|Cancelled|TimedOut)
+        echo "SSM command failed with status: ${STATUS}"
+        aws ssm get-command-invocation \
+          --region "${AWS_REGION}" \
+          --command-id "${COMMAND_ID}" \
+          --instance-id "${EC2_INSTANCE_ID}" \
+          --query 'StandardErrorContent' \
+          --output text
+        exit 1
+        ;;
+      *)
+        echo "Status: ${STATUS} (attempt $i/60)"
+        sleep 5
+        ;;
+    esac
+  done
```
```sh
      echo "Health not ready yet (attempt $i/20) — retrying..."
      sleep 3
    done

    echo "Health check failed"
```
Missing exit 1 — workflow succeeds even when health check fails.
When the loop exhausts all 20 retries, the step prints "Health check failed" but exits with status 0 (the exit code of echo). The workflow will incorrectly report success.
🐛 Proposed fix to exit non-zero on failure

```diff
     echo "Health check failed"
+    exit 1
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```sh
      echo "Health not ready yet (attempt $i/20) — retrying..."
      sleep 3
    done
    echo "Health check failed"
    exit 1
```
Summary by CodeRabbit