Generalize SageMaker launcher IAM roles for classifier and segmentation#691
Conversation
Adds create_segmentation_jobs_repo() method and wires it into __init__, provisioning a mermaid-segmentation-jobs ECR repo with image scanning, RETAIN removal policy, 10-image lifecycle rule, and a CfnOutput export. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renames `create_classifier_training_repo` → `create_classifier_jobs_repo`, construct ID `MermaidClassifierTrainingRepo` → `MermaidClassifierJobsRepo`, `repository_name` `mermaid-classifier-training` → `mermaid-classifier-jobs`, CfnOutput logical ID/export `MermaidClassifierTrainingRepoUri` → `MermaidClassifierJobsRepoUri`, and `self.classifier_training_repo` → `self.classifier_jobs_repo`. The old repo has RETAIN policy so it will be orphaned (not deleted) and old images remain accessible.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rename inline-policy ID MermaidClassifierLauncherPassRolePolicy → MermaidSagemakerLauncherPassRolePolicy so all four inline policies on the launcher role use the same MermaidSagemakerLauncher* prefix. Also update the adjacent comment from CreateTrainingJob to CreateTraining/ProcessingJob to reflect the role's actual scope. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Warning Review limit reached
Your plan includes 1 review of capacity. Refill in 39 minutes and 34 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe PR consolidates SageMaker launcher infrastructure by replacing classifier-specific training roles and repositories with shared launcher role and dual ECR repositories (classifier and segmentation jobs), updating IAM policies to cover both job types with CloudWatch observability, and adding launcher conventions documentation as a cross-repo contract. ChangesSageMaker launcher infrastructure consolidation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
cdk-nag report✅ No unsuppressed errors. See |
There was a problem hiding this comment.
Pull request overview
This PR updates the IaC for SageMaker “launcher” workflows so a single shared operator role can submit both Training and Processing jobs for mermaid-classifier and mermaid-segmentation, and adds a cross-repo convention document describing the launcher contract.
Changes:
- Provision a new
mermaid-segmentation-jobsECR repo and rename the classifier ECR repo tomermaid-classifier-jobs. - Replace the classifier-only launcher role with a shared
mermaid-sagemaker-launcher-rolethat includes ProcessingJob + CloudWatch Logs permissions. - Add
iac/sagemaker-launcher-convention.mddocumenting CLI/YAML/role/ECR tagging conventions intended to be used by both model repos.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| iac/stacks/sagemaker.py | Adds segmentation ECR repo, renames classifier repo, and creates a shared launcher role with expanded SageMaker + Logs permissions. |
| iac/sagemaker-launcher-convention.md | Adds cross-repo contract for launcher scripts, including IAM/SSO setup, YAML schema, and tagging conventions. |
| iac/nag_suppressions.py | Updates cdk-nag suppression paths/reasons to match the renamed shared launcher role and its inline policies. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
iac/stacks/sagemaker.py (1)
393-406:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove
ecr:DeleteRepositoryfrom the shared launcher role (least privilege)
iac/stacks/sagemaker.pyincludes"ecr:DeleteRepository"in the shared ECR permissions; no other code or docs reference ECR repository deletion in this repo. Granting it lets any launcher operator delete the sharedmermaid-*-jobsrepositories and break subsequent launches. Removeecr:DeleteRepositorywhile keeping image-level permissions.Minimal fix
actions=[ "ecr:BatchCheckLayerAvailability", "ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer", "ecr:DescribeImages", "ecr:DescribeRepositories", "ecr:ListImages", "ecr:InitiateLayerUpload", "ecr:UploadLayerPart", "ecr:CompleteLayerUpload", "ecr:PutImage", "ecr:BatchDeleteImage", - "ecr:DeleteRepository", ],🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@iac/stacks/sagemaker.py` around lines 393 - 406, The IAM policy for the shared launcher role in iac/stacks/sagemaker.py currently includes the overly-broad action "ecr:DeleteRepository"; remove "ecr:DeleteRepository" from the actions list in the ECR permissions (the actions=[... ] block used to build the shared launcher role policy) so the role retains image-level permissions like PutImage/BatchGetImage but cannot delete repositories. Ensure only repository-deleting actions are removed and leave all other ecr:* image-layer actions intact.
🧹 Nitpick comments (1)
iac/nag_suppressions.py (1)
556-575: ⚡ Quick winPin these IAM5 suppressions to the current findings.
Without
applies_to, any future wildcard added inside these existing launcher policies will be silently blessed by the same suppression. Please scope each suppression to the exact current IAM5 entries so cdk-nag keeps catching later policy growth.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@iac/nag_suppressions.py` around lines 556 - 575, The NagPackSuppression calls created via _suppress_by_path for the policy paths (e.g., "MermaidSagemakerLauncherEcrPolicy/Resource", "MermaidSagemakerLauncherSagemakerPolicy/Resource", "MermaidSagemakerLauncherLogsPolicy/Resource", "MermaidSagemakerLauncherPassRolePolicy/Resource", and f"{prefix}MermaidSagemakerLauncherRole/DefaultPolicy/Resource") must include an applies_to list scoped to the exact current IAM5 findings; update each NagPackSuppression instantiation in this loop (the one constructed inside _suppress_by_path) to add an applies_to argument that contains the specific statement IDs or JSON paths that correspond to the current wildcard entries (so CDK-Nag will still surface any new wildcard additions), keeping the existing id ("AwsSolutions-IAM5") and reason (using ACCEPTED) otherwise.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@iac/stacks/sagemaker.py`:
- Around line 393-406: The IAM policy for the shared launcher role in
iac/stacks/sagemaker.py currently includes the overly-broad action
"ecr:DeleteRepository"; remove "ecr:DeleteRepository" from the actions list in
the ECR permissions (the actions=[... ] block used to build the shared launcher
role policy) so the role retains image-level permissions like
PutImage/BatchGetImage but cannot delete repositories. Ensure only
repository-deleting actions are removed and leave all other ecr:* image-layer
actions intact.
---
Nitpick comments:
In `@iac/nag_suppressions.py`:
- Around line 556-575: The NagPackSuppression calls created via
_suppress_by_path for the policy paths (e.g.,
"MermaidSagemakerLauncherEcrPolicy/Resource",
"MermaidSagemakerLauncherSagemakerPolicy/Resource",
"MermaidSagemakerLauncherLogsPolicy/Resource",
"MermaidSagemakerLauncherPassRolePolicy/Resource", and
f"{prefix}MermaidSagemakerLauncherRole/DefaultPolicy/Resource") must include an
applies_to list scoped to the exact current IAM5 findings; update each
NagPackSuppression instantiation in this loop (the one constructed inside
_suppress_by_path) to add an applies_to argument that contains the specific
statement IDs or JSON paths that correspond to the current wildcard entries (so
CDK-Nag will still surface any new wildcard additions), keeping the existing id
("AwsSolutions-IAM5") and reason (using ACCEPTED) otherwise.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ffe8a61c-9422-421a-8e3a-e27cafd7a193
📒 Files selected for processing (3)
iac/nag_suppressions.pyiac/sagemaker-launcher-convention.mdiac/stacks/sagemaker.py
Signed-off-by: Greg Nieuwenhuis <26285069+gnieuwenhuis@users.noreply.github.com>
michaelconnor00
left a comment
There was a problem hiding this comment.
LGTM. Added some comments for reference.
| self, | ||
| f"{self.prefix}MermaidClassifierTrainingRepo", | ||
| repository_name="mermaid-classifier-training", | ||
| f"{self.prefix}MermaidClassifierJobsRepo", |
There was a problem hiding this comment.
Changing the id and the name will cause a delete and re-create. However, this should just abandon the old one and the removal_policy is RETAIN.
| role = iam.Role( | ||
| self, | ||
| f"{self.prefix}MermaidClassifierLauncherRole", | ||
| f"{self.prefix}MermaidSagemakerLauncherRole", |
There was a problem hiding this comment.
This will cause a delete and re-create. Typically with IAM it is not a big deal. Pointing it out incase this resources is referenced anywhere.
As part of data-mermaid/mermaid-classifier#43, we are setting up generalized access so that data scientists can submit Sagemaker training and processing jobs as needed from their local machines.
The setup matches what was already being done for the mermaid-classifier, except generalizes it so that the same IAM Roles can be used for both the classifier and the segmentation models.
Included in this PR is also a
launcher_conventiondocument that outlines the contract which is to be used by both repos. Since we don't have a single souce of truth for this document to live in, I put it here alongside the permissions changes so that both models can reference it here.Summary by CodeRabbit
Chores
Documentation