
test: add integ tests for batch evaluation and evaluator lifecycle#1110

Open
notgitika wants to merge 2 commits into preview from test/batch-eval-integ-preview

Conversation

@notgitika
Contributor

Summary

Adds 20 integration tests covering the batch evaluation, evaluator, and online eval config CLI lifecycles:

  • Batch eval flag validation: --runtime required, --evaluator required
  • Run eval flag validation: --evaluator required
  • Evaluator lifecycle: SESSION/TRACE/TOOL_CALL levels, code-based (external lambda + managed), config file import
  • Online eval config lifecycle: builtin evaluator refs, enable-on-create, multiple evaluators, sampling rate
  • Evaluator validation edge cases: invalid level, --model with code-based, --lambda-arn without code-based
  • Ground truth file parsing: malformed JSON, wrong structure, valid array format, valid object format with sessionMetadata key

Originally from private-agentcore-cli-staging PR #104. Only the test file is needed — the code changes from that PR (remove code-based evaluator blocks, add test utils) are already on preview.

Test plan

  • npm run test:integ passes (20 new tests)
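For context on the ground-truth cases above, here is a minimal sketch of the four input shapes those tests exercise. The function name (validateGroundTruth), the record type, and the exact accepted schemas are illustrative assumptions, not the CLI's actual API — only the broad categories (malformed JSON, wrong structure, array format, object format with a sessionMetadata key) come from the test list.

```typescript
// Hypothetical sketch — not the repo's real parser. Shows the four input
// categories the ground-truth tests cover, under assumed schemas.
interface GroundTruthRecord {
  sessionId: string;
  [key: string]: unknown;
}

function validateGroundTruth(raw: string): GroundTruthRecord[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw); // case 1: malformed JSON is rejected here
  } catch {
    throw new Error('Ground truth file is not valid JSON');
  }
  // Case 3: valid array format — a top-level array of records.
  if (Array.isArray(parsed)) {
    return parsed as GroundTruthRecord[];
  }
  // Case 4: valid object format — records keyed under sessionMetadata
  // (the key name is from the test list; the nesting is an assumption).
  if (parsed !== null && typeof parsed === 'object' && 'sessionMetadata' in parsed) {
    const meta = (parsed as Record<string, unknown>).sessionMetadata;
    return Object.values(meta as Record<string, GroundTruthRecord>);
  }
  // Case 2: wrong structure — parses as JSON but matches neither shape.
  throw new Error('Ground truth file has unexpected structure');
}
```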

@notgitika notgitika requested a review from a team May 4, 2026 18:48
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label May 4, 2026
@github-actions
Contributor

github-actions Bot commented May 4, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 43.45% | 9904 / 22794 |
| 🔵 | Statements | 42.72% | 10519 / 24619 |
| 🔵 | Functions | 40.36% | 1667 / 4130 |
| 🔵 | Branches | 40.16% | 6401 / 15935 |
Generated in workflow #2369 for commit 62f9d80 by the Vitest Coverage Report Action

@agentcore-cli-automation

Reviewed the new integ test file. There are a few issues that I think will make several tests fail (or silently pass without testing anything) on preview:

1. run batch-evaluation subcommand doesn't exist on preview

I can't find a batch-evaluation subcommand registered anywhere under src/cli/commands/run/ or in the primitives — the only run subcommand is run eval (see src/cli/commands/run/command.tsx). Same for --ground-truth — I can't find that flag defined on any command (only groundTruth in TUI wizard types).

The PR description says the code changes from staging PR #104 are "already on preview," but it looks like run batch-evaluation and --ground-truth are part of those code changes and are not on preview yet. The following tests exercise run batch-evaluation and will hit commander's "unknown command" path:

  • run batch-evaluation requires flags > requires --runtime
  • run batch-evaluation requires flags > requires --evaluator
  • all four tests under ground truth file parsing

Options:

  • (a) Drop the run batch-evaluation / ground-truth tests from this PR and add them back when the command lands on preview.
  • (b) Land the run batch-evaluation command + --ground-truth flag on preview first, then this PR.
  • (c) Rewrite these tests against run eval if that was the intent (but run eval has no --ground-truth flag today either).

2. Several assertions in those tests won't hold even if exit codes happen to match

Even though commander exits with code 1 for an unknown subcommand (which would satisfy expect(result.exitCode).toBe(1)), the downstream assertions won't pass:

  • expect(result.stderr).toContain('--runtime') / '--evaluator' — commander will print "error: unknown command 'batch-evaluation'", not a flag-specific validation message.
  • parseJsonOutput(result.stdout) in accepts valid ground truth file (array format) and (object format...) will throw "Empty output, cannot parse JSON" because commander writes the unknown-command error to stderr and stdout is empty.
  • expect(json.error).toContain('deployed') — same issue; there's no JSON to parse.
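The empty-stdout failure mode can be made concrete with a stand-in for the parseJsonOutput helper. Only its quoted behavior ("Empty output, cannot parse JSON") is taken from the review; the rest is a guessed minimal implementation, not the repo's actual test util.

```typescript
// Hypothetical stand-in for the parseJsonOutput test helper. The empty-input
// error message is quoted from the review; everything else is assumed.
function parseJsonOutput(stdout: string): unknown {
  const trimmed = stdout.trim();
  if (trimmed.length === 0) {
    // The failure mode the review predicts: commander writes the
    // unknown-command error to stderr, leaving stdout empty.
    throw new Error('Empty output, cannot parse JSON');
  }
  return JSON.parse(trimmed);
}
```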

3. Silent no-op in the "rejects code-based evaluator in online eval config" test

const codeName = project
  ? (await readProjectConfig(project.projectPath)).evaluators.find(e => e.config.codeBased)?.name
  : undefined;

if (!codeName) return;

If the prior "adds a code-based evaluator" tests failed or ran in a different order, this test returns early with zero assertions and appears to pass. That defeats the purpose of the test. Worth either:

  • dropping the early return and letting it fail loudly (e.g. expect(codeName).toBeDefined()), or
  • adding a code-based evaluator inline in this test so it doesn't depend on prior test state.

Also, the project ? ternary is always truthy (project is assigned in beforeAll), so it can be simplified — but the early-return is the real issue.

Happy to re-review once these are sorted out.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 4, 2026
The code-based evaluator restriction in OnlineEvalConfigPrimitive was
already removed, so this test expected a rejection that no longer occurs.
@notgitika notgitika force-pushed the test/batch-eval-integ-preview branch from eaca03a to 62f9d80 Compare May 4, 2026 19:03
@notgitika
Copy link
Copy Markdown
Contributor Author

addressed comments
