fix(eval): chunk + paginate rolling_mean GetMetricData (>500 MetricDataQueries cap — Sat-SF EvalRollingMean ERROR)#201
Merged
…taQueries cap — Sat-SF EvalRollingMean ERROR)
Root cause: evals/rolling_mean.py issued a single cw.get_metric_data
call with one MetricDataQuery per (rubric x dimension) combo. AWS
GetMetricData hard-caps MetricDataQueries at 500 per call. As the
rubric/dimension matrix grew, len(queries) exceeded 500, so the call
raised "ValidationError: The collection MetricDataQueries must not have
a size greater than 500." This made the Saturday-SF EvalRollingMean
state return {"status":"ERROR"}, contributing to the terminal
PipelineFailure on the 2026-05-17 weekend run.
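
To illustrate how the query count crosses the cap, here is a minimal sketch of building one MetricDataQuery per (rubric x dimension) combo. The rubric and dimension lists, the namespace, the metric name, and the CloudWatch dimension names are hypothetical placeholders, not the values used in evals/rolling_mean.py; only the m{idx} Id scheme comes from the description above.

```python
from itertools import product

# Hypothetical inputs for illustration only.
rubrics = [f"rubric_{i}" for i in range(30)]
dimensions = [f"dim_{j}" for j in range(20)]

# One combo per (rubric, dimension) pair.
combos = list(product(rubrics, dimensions))

# One query per combo, with an Id that is unique across the whole list.
queries = [
    {
        "Id": f"m{idx}",
        "MetricStat": {
            "Metric": {
                "Namespace": "Example/Evals",  # assumed namespace
                "MetricName": "Score",         # assumed metric name
                "Dimensions": [
                    {"Name": "Rubric", "Value": rubric},
                    {"Name": "Dimension", "Value": dimension},
                ],
            },
            "Period": 86400,
            "Stat": "Average",
        },
        "ReturnData": True,
    }
    for idx, (rubric, dimension) in enumerate(combos)
]

print(len(queries))  # 600, which is over the 500-query cap
```

With 30 rubrics and 20 dimensions the list already holds 600 queries, so passing it to a single get_metric_data call would be rejected.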
Fix: introduce _get_metric_data_all(cw, queries, start, end) ->
list[dict] which chunks queries into <=500-query batches (named
constant _GET_METRIC_DATA_MAX_QUERIES = 500), follows NextToken
pagination to exhaustion within each chunk, and flat-merges all
MetricDataResults. The flat merge is correct because every query Id
(m{idx} over the whole combos list) is globally unique, so the
downstream by_id = {r["Id"]: r for r in ...} mapping is unchanged.
The <=500 case is behaviourally identical (single chunk, paginated
only if AWS returns a NextToken).
AWS cap citation: GetMetricData MetricDataQueries collection size is
hard-limited to 500 per request
(https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_GetMetricData.html).
Tests: +7 in tests/test_eval_rolling_mean.py — cap constant is 500;
>500 queries split into multiple <=500 calls with all results merged
in order; NextToken pagination followed within a chunk; <=500
single-call path unchanged; end-to-end >500-combo run maps every
result back to its combo via the unique m{idx} Id scheme. Full
research suite: 1343 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause
evals/rolling_mean.py issued a single cw.get_metric_data call with one MetricDataQuery per (rubric × dimension) combo. AWS GetMetricData hard-caps MetricDataQueries at 500 per call. As the rubric/dimension matrix grew, len(queries) exceeded 500, so the call raised:
ValidationError: The collection MetricDataQueries must not have a size greater than 500.
This made the Saturday-SF EvalRollingMean state return {"status":"ERROR"}, contributing to the terminal PipelineFailure on the 2026-05-17 weekend run.
Fix
Introduce a small private helper, _get_metric_data_all(cw, queries, start, end) -> list[dict], which:
- Chunks queries into ≤500-query batches via the named constant _GET_METRIC_DATA_MAX_QUERIES = 500.
- Follows NextToken pagination to exhaustion within each chunk (response.get("NextToken") → re-call with NextToken=...).
- Flat-merges MetricDataResults across chunks/pages into one list.
The flat merge is correct because every query Id (m{idx} over the whole combos list) is globally unique, so the downstream by_id = {r["Id"]: r for r in ...} mapping is unchanged. The ≤500 case is behaviourally identical (single chunk, paginated only if AWS returns a NextToken).
AWS 500 cap citation
The GetMetricData MetricDataQueries collection size is hard-limited to 500 per request; see the AWS GetMetricData API reference (https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_GetMetricData.html).
Test additions
+7 tests in tests/test_eval_rolling_mean.py (new TestGetMetricDataAll class), mirroring the existing MagicMock-CloudWatch style:
- cap constant is 500
- >500 queries split into multiple ≤500 calls, with all results merged in order
- NextToken pagination followed within a chunk
- ≤500 single-call path unchanged
- end-to-end >500-combo compute_and_emit_4w_mean run maps every result back to its combo via the unique m{idx} Id scheme
Full research suite: 1343 passed.
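
The Id-to-combo mapping that the end-to-end test relies on can be sketched as follows. The combos and result values here are made up for illustration, and the mean calculation is only a placeholder for whatever compute_and_emit_4w_mean actually does with each result.

```python
# Hypothetical combos; position in this list determines the m{idx} Id.
combos = [("accuracy", "sf"), ("accuracy", "nyc"), ("tone", "sf")]

# Flat-merged results may arrive in any order across chunks and pages.
results = [
    {"Id": "m2", "Values": [0.5, 1.0]},
    {"Id": "m0", "Values": [0.5]},
    {"Id": "m1", "Values": []},
]

# Unique Ids mean no entry overwrites another combo's result.
by_id = {r["Id"]: r for r in results}

means = {}
for idx, combo in enumerate(combos):
    values = by_id.get(f"m{idx}", {}).get("Values", [])
    # A combo with no datapoints gets None rather than a misleading zero.
    means[combo] = sum(values) / len(values) if values else None

print(means)
```

Because the lookup key is derived from the combo's index rather than from arrival order, the mapping holds however many chunks the queries were split into.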
🤖 Generated with Claude Code