feat: add eks_scale_node_group playbook action (#223)#2089
feat: add eks_scale_node_group playbook action (#223)#2089gerardrecinto wants to merge 2 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR adds an eks_scale_node_group remediation action (params + implementation), tests covering success/no-op/error/credential wiring, a remediation docs entry, Rootly and Sentry send-events docs, private registry proxy guidance, and a Helm dependency version bump. ChangesEKS Node Group Scaling
Documentation and Helm updates
Sequence Diagram(s)sequenceDiagram
participant eks_scale_node_group
participant boto3_EKS_client
participant EKS_Service
eks_scale_node_group->>boto3_EKS_client: describe_nodegroup(clusterName, nodegroupName)
boto3_EKS_client->>EKS_Service: DescribeNodegroup API call
EKS_Service-->>boto3_EKS_client: returns scaling.config (minSize,maxSize,desiredSize)
eks_scale_node_group->>eks_scale_node_group: compare new_max_size with maxSize
alt new_max_size > maxSize
eks_scale_node_group->>boto3_EKS_client: update_nodegroup_config(set new maxSize, preserve min/desired)
boto3_EKS_client->>EKS_Service: UpdateNodegroupConfig API call
EKS_Service-->>boto3_EKS_client: update response
eks_scale_node_group-->>eks_scale_node_group: create Finding and enrich event
else
eks_scale_node_group-->>eks_scale_node_group: enrich event with "no change"
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
playbooks/robusta_playbooks/aws_node_group_actions.py (1)
61-65: ⚡ Quick winPreserve original exception context when re-raising.
Use explicit exception chaining in both
except ClientError as eblocks (raise ... from e) so the root AWS error remains visible in tracebacks.Proposed patch
except ClientError as e: raise ActionException( ErrorCodes.ACTION_UNEXPECTED_ERROR, f"Failed to describe node group '{params.node_group_name}' " f"in cluster '{params.cluster_name}': {e}", - ) + ) from e @@ except ClientError as e: raise ActionException( ErrorCodes.ACTION_UNEXPECTED_ERROR, f"Failed to update node group '{params.node_group_name}': {e}", - ) + ) from eAlso applies to: 94-97
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@playbooks/robusta_playbooks/aws_node_group_actions.py` around lines 61 - 65, The except ClientError as e handlers that raise ActionException should preserve the original exception context by using explicit exception chaining; locate the raise statements that construct ActionException with ErrorCodes.ACTION_UNEXPECTED_ERROR (the blocks referencing params.node_group_name and params.cluster_name and the later similar block at lines 94-97) and change the re-raise to use "raise ActionException(... ) from e" so the original AWS ClientError is retained in the traceback.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@playbooks/robusta_playbooks/aws_node_group_actions.py`:
- Around line 61-65: The except ClientError as e handlers that raise
ActionException should preserve the original exception context by using explicit
exception chaining; locate the raise statements that construct ActionException
with ErrorCodes.ACTION_UNEXPECTED_ERROR (the blocks referencing
params.node_group_name and params.cluster_name and the later similar block at
lines 94-97) and change the re-raise to use "raise ActionException(... ) from e"
so the original AWS ClientError is retained in the traceback.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dd5c2ac7-227d-4783-8f36-07774d6f876d
📒 Files selected for processing (2)
playbooks/robusta_playbooks/aws_node_group_actions.pytests/test_aws_node_group_actions.py
Preserves original ClientError in tracebacks when re-raising as ActionException, per CodeRabbit review on PR robusta-dev#2089.
|
@aantn could you please review when you have a chance, and approve the pending fork workflows if this looks ok? I updated the branch with latest Changes made:
Why Local validation I could run here:
I could not run the targeted pytest locally because this shell does not have the repo's Python 3.11/Poetry environment installed; default |
08ab23c to
62b7a3f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_aws_node_group_actions.py`:
- Around line 148-158: The test asserts a hardcoded secret in the
EksNodeGroupParams call and can trigger Ruff/flake8-bandit S106; locate the
assertion that calls mock_boto_client.assert_called_once_with(...) after
invoking eks_scale_node_group and append an inline suppression comment for S106
(e.g., add " # noqa: S106" to the assertion line) so the linter ignores this
test-only hardcoded secret; keep references to EksNodeGroupParams,
eks_scale_node_group, and mock_boto_client.assert_called_once_with when editing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1a970722-a258-43c8-a871-96e0d82ee659
⛔ Files ignored due to path filters (1)
helm/robusta/Chart.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
docs/configuration/exporting/send-events-api.rstdocs/configuration/exporting/send-events/rootly.rstdocs/configuration/exporting/send-events/sentry.rstdocs/playbook-reference/actions/remediation.rstdocs/setup-robusta/proxies.rsthelm/robusta/Chart.yamlhelm/robusta/charts/holmes-0.30.0.tgzhelm/robusta/charts/holmes-0.31.0.tgzplaybooks/robusta_playbooks/aws_node_group_actions.pytests/test_aws_node_group_actions.py
✅ Files skipped from review due to trivial changes (5)
- docs/configuration/exporting/send-events/rootly.rst
- docs/setup-robusta/proxies.rst
- docs/configuration/exporting/send-events-api.rst
- helm/robusta/Chart.yaml
- docs/playbook-reference/actions/remediation.rst
🚧 Files skipped from review as they are similar to previous changes (1)
- playbooks/robusta_playbooks/aws_node_group_actions.py
| params = EksNodeGroupParams( | ||
| **{**BASE_PARAMS, "aws_access_key_id": "AKID", "aws_secret_access_key": "SECRET"} | ||
| ) | ||
| eks_scale_node_group(mock_event, params) | ||
|
|
||
| mock_boto_client.assert_called_once_with( | ||
| "eks", | ||
| region_name="us-west-2", | ||
| aws_access_key_id="AKID", | ||
| aws_secret_access_key="SECRET", | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate ruff config and inspect select/ignore and per-file-ignores for tests
fd -H -t f -e toml -e cfg 'pyproject|ruff|setup' | xargs -r grep -nEi 'ruff|S106|flake8-bandit' 2>/dev/null
fd -H 'ruff.toml|.ruff.toml' --type f
rg -nP -C3 '\[tool\.ruff' -g 'pyproject.toml' || true
rg -nP -C3 'per-file-ignores|extend-select|^\s*select|^\s*ignore' -g 'pyproject.toml' -g 'ruff.toml' -g '.ruff.toml' || trueRepository: robusta-dev/robusta
Length of output: 813
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Print relevant parts of ruff.toml to confirm whether S (flake8-bandit) rules are selected.
ls -la ruff.toml 2>/dev/null || true
sed -n '1,120p' ruff.tomlRepository: robusta-dev/robusta
Length of output: 1229
Suppress Ruff S106 for the hardcoded test secret
ruff.toml enables flake8-bandit ("S"), and only S101 is ignored; S106 is not ignored globally, so aws_secret_access_key="SECRET" may fail lint. Suppress it at the assertion line:
🔧 Suppress the false positive
mock_boto_client.assert_called_once_with(
"eks",
region_name="us-west-2",
aws_access_key_id="AKID",
- aws_secret_access_key="SECRET",
+ aws_secret_access_key="SECRET", # noqa: S106
)🧰 Tools
🪛 Ruff (0.15.15)
[error] 157-157: Possible hardcoded password assigned to argument: "aws_secret_access_key"
(S106)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_aws_node_group_actions.py` around lines 148 - 158, The test
asserts a hardcoded secret in the EksNodeGroupParams call and can trigger
Ruff/flake8-bandit S106; locate the assertion that calls
mock_boto_client.assert_called_once_with(...) after invoking
eks_scale_node_group and append an inline suppression comment for S106 (e.g.,
add " # noqa: S106" to the assertion line) so the linter ignores this test-only
hardcoded secret; keep references to EksNodeGroupParams, eks_scale_node_group,
and mock_boto_client.assert_called_once_with when editing.
f054ba2 to
90c4f23
Compare
Summary
Closes #223
Adds
eks_scale_node_group— a playbook action that increases themaxSizeof an EKS managed node group. Intended as a remediation step when the cluster autoscaler is blocked because the node group has reached its configured maximum.New files:
playbooks/robusta_playbooks/aws_node_group_actions.py— action + params modeltests/test_aws_node_group_actions.py— 6 pytest unit testsHow it works
The action calls
eks:DescribeNodegroupto read the current scaling config, then callseks:UpdateNodegroupConfigto raisemaxSize.minSizeanddesiredSizeare left unchanged.Params:
cluster_nameregionus-east-1)node_group_namenew_max_sizeaws_access_key_idaws_secret_access_keyExample playbook config:
Required IAM permissions:
eks:DescribeNodegroup,eks:UpdateNodegroupConfigTest plan
test_scale_up_succeeds— verifiesupdate_nodegroup_configcalled with correct args and finding emittedtest_no_op_when_new_max_not_larger— new_max_size < current max → no update, enrichment message returnedtest_no_op_when_new_max_is_equal— new_max_size == current max → no updatetest_raises_on_describe_failure—ClientErroron describe →ActionExceptionraised, update never calledtest_raises_on_update_failure—ClientErroron update →ActionExceptionraisedtest_boto_client_uses_explicit_credentials— explicit key/secret passed through to boto3