
test: add integ tests for batch evaluation and evaluator lifecycle#1109

Open
notgitika wants to merge 3 commits into main from test/batch-eval-integ-main

Conversation

@notgitika
Contributor

Summary

Adds 20 integration tests covering batch evaluation, evaluator, and online eval config CLI lifecycle:

  • Batch eval flag validation: --runtime required, --evaluator required
  • Run eval flag validation: --evaluator required
  • Evaluator lifecycle: SESSION/TRACE/TOOL_CALL levels, code-based (external lambda + managed), config file import
  • Online eval config lifecycle: builtin evaluator refs, enable-on-create, multiple evaluators, sampling rate
  • Evaluator validation edge cases: invalid level, --model with code-based, --lambda-arn without code-based
  • Ground truth file parsing: malformed JSON, wrong structure, valid array format, valid object format
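
The flag-validation cases boil down to asserting that the command fails with a non-zero exit code and a clear message when a required flag is missing. A minimal self-contained sketch of what such a test asserts (the `validateBatchEvalFlags` helper and its error strings are hypothetical stand-ins, not the project's real code):

```typescript
// Hypothetical stand-in for the CLI's batch-eval flag validation;
// the real command and its error messages may differ.
interface BatchEvalFlags {
  runtime?: string;
  evaluator?: string;
}

function validateBatchEvalFlags(flags: BatchEvalFlags): { exitCode: number; error?: string } {
  if (!flags.runtime) return { exitCode: 1, error: "--runtime is required" };
  if (!flags.evaluator) return { exitCode: 1, error: "--evaluator is required" };
  return { exitCode: 0 };
}

// Missing --runtime should fail, complete flags should succeed.
console.log(validateBatchEvalFlags({ evaluator: "my-eval" }));
console.log(validateBatchEvalFlags({ runtime: "my-runtime", evaluator: "my-eval" }));
```

The integ tests presumably spawn the real CLI and assert on its exit code and stderr; this sketch only illustrates the contract being checked.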

Originally from private-agentcore-cli-staging PR #104. Only the test file is needed — the code changes from that PR (remove code-based evaluator blocks, add test utils) are already on main.

Test plan

  • `npm run test:integ` passes (20 new tests)

Adds 20 integration tests covering:
- batch-evaluation CLI flag validation (requires --runtime, --evaluator)
- run eval CLI flag validation
- Evaluator lifecycle: SESSION, TRACE, TOOL_CALL levels, code-based
  (external lambda + managed), config file import
- Online eval config lifecycle: builtin evaluator refs, enable-on-create,
  multiple evaluators, sampling rate
- Evaluator validation edge cases: invalid level, --model with code-based,
  --lambda-arn without code-based
- Ground truth file parsing: malformed JSON, wrong structure, valid array
  format, valid object format with sessionMetadata key

Originally from private-agentcore-cli-staging PR #104.
@notgitika notgitika requested a review from a team May 4, 2026 18:48
@github-actions github-actions Bot added size/l PR size: L agentcore-harness-reviewing AgentCore Harness review in progress labels May 4, 2026
@github-actions
Contributor

github-actions Bot commented May 4, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 42.89% 8936 / 20830
🔵 Statements 42.17% 9485 / 22491
🔵 Functions 39.66% 1537 / 3875
🔵 Branches 39.82% 5769 / 14486
Generated in workflow #2370 for commit d549817 by the Vitest Coverage Report Action

@agentcore-cli-automation

The test `rejects code-based evaluator in online eval config` (lines 195–223) appears to assert behavior that doesn't exist in the code. I searched `OnlineEvalConfigPrimitive`, `OnlineEvalConfigSchema`, the project-level `superRefine` in `agentcore-project.ts`, and the post-deploy online-eval code — nothing rejects a code-based evaluator reference in an online eval config. The test expects `exitCode === 1` and the error to contain `'Code-based'`, but the `add online-eval` command will currently succeed when given a code-based evaluator name, so this test should fail against main.

A couple of other things worth noting about that same test:

  1. `const codeName = project ? ... : undefined;` is dead code: `project` is always defined after `beforeAll`, so the ternary's `undefined` branch can never be taken.
  2. `if (!codeName) return;` silently no-ops with no assertions if no code-based evaluator is found, which would quietly hide regressions if the earlier add-evaluator tests ever change.

Options to fix:

  • (a) If enforcement is intended, add the validation in `OnlineEvalConfigPrimitive.createOnlineEvalConfig` (look up each evaluator name in `project.evaluators` and reject if `config.codeBased` is set), with an error message starting with "Code-based". Then this test will pass as-is.
  • (b) If code-based evaluators are actually allowed in online eval configs, drop this test case (or flip it to assert success). The PR description mentions "remove code-based evaluator blocks ... are already on main", which reads like option (b) may be the intent — in which case this assertion is stale.
  • (c) At minimum, replace the silent `if (!codeName) return;` with an explicit `expect(codeName).toBeDefined()` so the test can't pass by accident.

Could you confirm which direction is intended and update the test accordingly?
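
Option (c) amounts to a one-line change in the test. As a self-contained sketch (the `EvaluatorConfig` shape and evaluator names here are hypothetical, and the real test would use vitest's `expect(codeName).toBeDefined()` rather than a manual throw):

```typescript
// Hypothetical evaluator shape; the real project's config types differ.
interface EvaluatorConfig {
  name: string;
  codeBased?: { lambdaArn?: string };
}

// Find the first code-based evaluator left in the project by earlier tests.
function findCodeBasedEvaluatorName(evaluators: EvaluatorConfig[]): string | undefined {
  return evaluators.find((e) => e.codeBased !== undefined)?.name;
}

// Example project state (hypothetical names):
const evaluators: EvaluatorConfig[] = [
  { name: "builtin-helpfulness" },
  { name: "my-lambda-eval", codeBased: { lambdaArn: "arn:aws:lambda:..." } },
];

const codeName = findCodeBasedEvaluatorName(evaluators);
// Instead of `if (!codeName) return;`, fail loudly so a regression in the
// earlier add-evaluator tests cannot silently skip this case. In vitest:
//   expect(codeName).toBeDefined();
if (codeName === undefined) throw new Error("expected a code-based evaluator in the project");
console.log(codeName); // → "my-lambda-eval"
```

The point of the explicit assertion is that a broken precondition becomes a test failure rather than a silently skipped test body.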

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 4, 2026
The code-based evaluator restriction in `OnlineEvalConfigPrimitive` was already removed, so this test expected a rejection that no longer occurs.
@github-actions github-actions Bot added size/l PR size: L and removed size/l PR size: L labels May 4, 2026
@notgitika
Contributor Author

addressed comments

@github-actions github-actions Bot added size/m PR size: M and removed size/l PR size: L labels May 4, 2026