
test: add integ tests for batch evaluation and evaluator lifecycle#1109

Open
notgitika wants to merge 3 commits into main from test/batch-eval-integ-main

Conversation

@notgitika
Contributor

Summary

Adds 20 integration tests covering batch evaluation, evaluator, and online eval config CLI lifecycle:

  • Batch eval flag validation: --runtime required, --evaluator required
  • Run eval flag validation: --evaluator required
  • Evaluator lifecycle: SESSION/TRACE/TOOL_CALL levels, code-based (external lambda + managed), config file import
  • Online eval config lifecycle: builtin evaluator refs, enable-on-create, multiple evaluators, sampling rate
  • Evaluator validation edge cases: invalid level, --model with code-based, --lambda-arn without code-based
  • Ground truth file parsing: malformed JSON, wrong structure, valid array format, valid object format
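
The flag-validation cases boil down to asserting that the command fails with a non-zero exit code and a clear message when a required flag is missing. A minimal self-contained sketch of what such a test asserts (the `validateBatchEvalFlags` helper and its error strings are hypothetical stand-ins, not the project's real code):

```typescript
// Hypothetical stand-in for the CLI's batch-eval flag validation;
// the real command and its error messages may differ.
interface BatchEvalFlags {
  runtime?: string;
  evaluator?: string;
}

function validateBatchEvalFlags(flags: BatchEvalFlags): { exitCode: number; error?: string } {
  if (!flags.runtime) return { exitCode: 1, error: "--runtime is required" };
  if (!flags.evaluator) return { exitCode: 1, error: "--evaluator is required" };
  return { exitCode: 0 };
}

// Missing --runtime should fail, complete flags should succeed.
console.log(validateBatchEvalFlags({ evaluator: "my-eval" }));
console.log(validateBatchEvalFlags({ runtime: "my-runtime", evaluator: "my-eval" }));
```

The integ tests presumably spawn the real CLI and assert on its exit code and stderr; this sketch only illustrates the contract being checked.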

Originally from private-agentcore-cli-staging PR #104. Only the test file is needed — the code changes from that PR (remove code-based evaluator blocks, add test utils) are already on main.

Test plan

  • `npm run test:integ` passes (20 new tests)

Adds 20 integration tests covering:
- batch-evaluation CLI flag validation (requires --runtime, --evaluator)
- run eval CLI flag validation
- Evaluator lifecycle: SESSION, TRACE, TOOL_CALL levels, code-based
  (external lambda + managed), config file import
- Online eval config lifecycle: builtin evaluator refs, enable-on-create,
  multiple evaluators, sampling rate
- Evaluator validation edge cases: invalid level, --model with code-based,
  --lambda-arn without code-based
- Ground truth file parsing: malformed JSON, wrong structure, valid array
  format, valid object format with sessionMetadata key

Originally from private-agentcore-cli-staging PR #104.
@notgitika notgitika requested a review from a team May 4, 2026 18:48
@github-actions github-actions Bot added size/l PR size: L agentcore-harness-reviewing AgentCore Harness review in progress labels May 4, 2026
@github-actions
Contributor

github-actions Bot commented May 4, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 42.89% 8936 / 20830
🔵 Statements 42.17% 9485 / 22491
🔵 Functions 39.66% 1537 / 3875
🔵 Branches 39.82% 5769 / 14486
Generated in workflow #2370 for commit d549817 by the Vitest Coverage Report Action

@agentcore-cli-automation

The test `rejects code-based evaluator in online eval config` (lines 195–223) appears to assert behavior that doesn't exist in the code. I searched `OnlineEvalConfigPrimitive`, `OnlineEvalConfigSchema`, the project-level `superRefine` in `agentcore-project.ts`, and the post-deploy online-eval code — nothing rejects a code-based evaluator reference in an online eval config. The test expects `exitCode === 1` and the error to contain `'Code-based'`, but the `add online-eval` command will currently succeed when given a code-based evaluator name, so this test should fail against main.

A couple of other things worth noting about that same test:

  1. `const codeName = project ? ... : undefined;` is dead code: `project` is always defined after `beforeAll`, so the ternary's `undefined` branch can never be taken.
  2. `if (!codeName) return;` silently no-ops with no assertions if no code-based evaluator is found, which would quietly hide regressions if the earlier add-evaluator tests ever change.

Options to fix:

  • (a) If enforcement is intended, add the validation in `OnlineEvalConfigPrimitive.createOnlineEvalConfig` (look up each evaluator name in `project.evaluators` and reject if `config.codeBased` is set), with an error message starting with "Code-based". Then this test will pass as-is.
  • (b) If code-based evaluators are actually allowed in online eval configs, drop this test case (or flip it to assert success). The PR description mentions "remove code-based evaluator blocks ... are already on main", which reads like option (b) may be the intent — in which case this assertion is stale.
  • (c) At minimum, replace the silent `if (!codeName) return;` with an explicit `expect(codeName).toBeDefined()` so the test can't pass by accident.

Could you confirm which direction is intended and update the test accordingly?
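
Option (c) amounts to a one-line change in the test. As a self-contained sketch (the `EvaluatorConfig` shape and evaluator names here are hypothetical, and the real test would use vitest's `expect(codeName).toBeDefined()` rather than a manual throw):

```typescript
// Hypothetical evaluator shape; the real project's config types differ.
interface EvaluatorConfig {
  name: string;
  codeBased?: { lambdaArn?: string };
}

// Find the first code-based evaluator left in the project by earlier tests.
function findCodeBasedEvaluatorName(evaluators: EvaluatorConfig[]): string | undefined {
  return evaluators.find((e) => e.codeBased !== undefined)?.name;
}

// Example project state (hypothetical names):
const evaluators: EvaluatorConfig[] = [
  { name: "builtin-helpfulness" },
  { name: "my-lambda-eval", codeBased: { lambdaArn: "arn:aws:lambda:..." } },
];

const codeName = findCodeBasedEvaluatorName(evaluators);
// Instead of `if (!codeName) return;`, fail loudly so a regression in the
// earlier add-evaluator tests cannot silently skip this case. In vitest:
//   expect(codeName).toBeDefined();
if (codeName === undefined) throw new Error("expected a code-based evaluator in the project");
console.log(codeName); // → "my-lambda-eval"
```

The point of the explicit assertion is that a broken precondition becomes a test failure rather than a silently skipped test body.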

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 4, 2026
The code-based evaluator restriction in `OnlineEvalConfigPrimitive` was already removed, so this test expected a rejection that no longer occurs.
@github-actions github-actions Bot added size/l PR size: L and removed size/l PR size: L labels May 4, 2026
@notgitika
Contributor Author

addressed comments

@github-actions github-actions Bot added size/m PR size: M and removed size/l PR size: L labels May 4, 2026