Skip to content

Kass/test cases batch 2#695

Open
thrialectics wants to merge 4 commits into
mainfrom
kass/test-cases-batch-2
Open

Kass/test cases batch 2#695
thrialectics wants to merge 4 commits into
mainfrom
kass/test-cases-batch-2

Conversation

@thrialectics
Copy link
Copy Markdown
Contributor

@thrialectics thrialectics commented May 15, 2026

Three L1 failure-mode reproducers for the unified runner (DEV-1698):

  • misinterpretation_correction_creates_orphan_fact — synthetic
  • deriver_no_memory_signal — 8 SWE-chat sessions
  • dreamer_no_dedup — 3 SWE-chat sessions

Runner additions: AddMessagesFromFixtureAction, SaveArtifactAction, continue_on_failure, and _run_metadata.json for replay (procedure in tests/unified/README.md).

All gates start at pass_if=true per verify-first methodology. SWE-chat sample under data/ is SALT-NLP/SWE-chat (ODC-BY); attribution embedded in the file.

Summary by CodeRabbit

  • Tests

    • Enhanced unified test infrastructure to capture per-test run metadata and artifacts for better test reproducibility and debugging.
    • Added support for additional test step types including message fixture loading.
    • Added multiple new test fixtures for validating specific failure modes and system behaviors.
  • Documentation

    • Updated test documentation with guidance on failure-mode reproducers, artifact inspection, and manual snapshot replay procedures.
  • Chores

    • Updated configuration to properly manage test artifact directories.

Review Change Stack

Three orthogonal additions that together let the unified runner host
tests over real conversational data:

- `AddMessagesFromFixtureAction` -- ingest messages from a fixture JSON
  under tests/unified/test_cases/data/ rather than inline. Maps fixture
  role=user / role=assistant to configured peer IDs; preserves per-
  message timestamps when present.

- `continue_on_failure` flag on TestDefinition -- when set, per-step
  assertion failures collect rather than abort, so suites with N
  independent scenarios aggregate per-session pass/fail. Infrastructure
  errors still abort.

- `SaveArtifactAction` -- write a query result (get_representation /
  get_peer_card / get_context) to a named JSON file under a per-test
  artifact dir. No assertion coupling, so offline analysis is
  decoupled from the pass/fail gate.

Per-test artifact dir is at tests/unified/artifacts/<test_name>_<utc_ts>/
(override with UNIFIED_TEST_ARTIFACTS_DIR; gitignored). Each test run
also writes _run_metadata.json capturing honcho git SHA + dirty status,
transport + model for each LLM-using agent, SHA-256 hashes of relevant
prompt source files, judge model, and timestamp. Manual snapshot-replay
procedure documented in tests/unified/README.md; automatic replay
deferred.
L1 reproducer for the deriver_no_memory_signal failure mode: the
deriver writes observations that don't add signal about the user or
their work -- bare acknowledgments paraphrased as facts ("user said
yes"), shell-history fragments ("user executed cd ~/foo"), instruction-
event-records that look like durable user attributes but aren't.

Inputs: 8 sessions sampled from SALT-NLP/SWE-chat (ODC-BY,
https://arxiv.org/abs/2604.20779), persona-balanced across Expert
Nitpicker / Vague Requester / Mind Changer. Two sessions from the
underlying extract are excluded for data quality (one Japanese with
a heavily skewed user/assistant role split; one near-monologue);
documented in the fixture description.

Per session: ingest messages, wait for queue empty, run an llm_judge
asking whether the user's representation contains any Tier 1 bare-
speech-act observation, then save_artifact-dump the representation
for offline analysis. The suite uses continue_on_failure to aggregate
per-session pass/fail. Judge prompt is calibrated with annotated
example observations across three tiers (Tier 1 / Tier 2 / Signal) to
distinguish bare-event records from specifics-anchored observations.

Inputs file at tests/unified/test_cases/data/swe_chat_sample_persona_balanced.json
-- the committed dataset extract. Metadata embedded in the file
documents its provenance (SALT-NLP/SWE-chat, ODC-BY, selection
criteria, extraction timestamp).
L1 reproducer for the dreamer's failure to consolidate duplicate
observations. The dreamer's documented role is consolidation:
rewriting / merging semantically related observations into better-
stated ones, and deleting redundant observations. This fixture
asserts the consolidation specifically against duplicates --
exact-text duplicates, near-duplicates (varying only in trivial
details), and semantic duplicates (the same fact stated from
different angles).

Inputs: 3 sessions sampled from SALT-NLP/SWE-chat (indices 5, 6, 7
from swe_chat_sample_persona_balanced.json). Each session is
ingested through the deriver, then dreamed, then queried for
residual duplicates. Two-gate structure per session:

  - PRE-DREAM SETUP GATE: assert >=2 duplicate observations exist
    before the dream is scheduled. If the deriver did not produce
    duplicates, the test errors at the setup gate rather than
    silently passing the post-dream gate on no inputs.

  - POST-DREAM MAIN GATE (strict v1): assert ZERO duplicate
    observations remain after the dream. Strict gate; relative
    thresholds (post-dream count < pre-dream count) could be added
    later via runner-level cross-step state.

Judge prompts are calibrated with three flavors of duplicate
patterns (exact-content duplicates, near-twins of a single shell
command differing only by embedded timestamp, and opposite-polarity
semantic twins describing the same convention).
…ixture

L1 reproducer for the failure mode in the wiki failure-modes catalog:
the agent misinterprets the user's statement; the user explicitly
corrects; both the wrong claim (from the agent's misread) and the
corrected claim end up in storage with no supersession marker. End-
state shape resembles time_drift_storage_no_supersession but the
trigger is agent misinterpretation rather than user mind-change, and
the wrong claim is one the user never literally stated.

Setup: user_a clearly scopes their work to extracting a rate-limiter
subsystem from a Python payment-service monolith into a standalone
Go microservice. Assistant misreads as "full payment service
migration to Go." User immediately corrects and reaffirms the narrow
scope. Subsequent turns develop substantive technical content
(gRPC RPCs, Redis token-bucket, Lua atomicity, circuit breaker
trade-offs, testing strategy). The wrong claim never appears in a
user turn -- only in the assistant's misread.

Two-gate structure: positive control asserts the representation
captures the correct-scope work; negative reproducer asserts the
representation does NOT contain the assistant's misinterpretation
(claims about migrating the whole payment service or rewriting the
Python monolith). Both gates start at pass_if=true (invariant); per
verify-first methodology, polarity flips to pass_if=false only after
observed firing on real code.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 15, 2026

Walkthrough

This PR introduces end-to-end artifact capture and failure-mode reproducer test infrastructure. It extends the unified test schema with fixture-loading and error-continuation capabilities, implements metadata/artifact serialization in the executor, adds new step handlers, refines .gitignore rules, documents the reproducer pattern and replay procedures, and contributes three production test fixtures that validate agent behavior.

Changes

Unified Test Infrastructure and Fixtures

Layer / File(s) Summary
Schema contracts for fixture loading and error continuation
tests/unified/schema.py
AddMessagesFromFixtureAction step type enables loading session messages from external JSON fixtures with peer-ID mapping and optional turn truncation. TestDefinition gains continue_on_failure flag to control whether assertion failures abort or accumulate.
Executor metadata and artifact infrastructure
tests/unified/runner.py
UnifiedTestExecutor gains artifacts_root parameter and implements per-test artifact directory creation, _run_metadata.json serialization (including git SHA, dirty state, agent models, prompt file hashes), JSON artifact writing with flat-path validation and serialization fallback, and failure classification between assertion errors (respecting continue_on_failure) and infrastructure errors.
New step action handlers
tests/unified/runner.py
AddMessagesFromFixtureAction handler loads JSON fixtures, validates session index, maps fixture roles to configured peer IDs, parses optional ISO timestamps, and bulk-adds constructed messages. SaveArtifactAction handler fetches target data and writes serialized JSON artifacts to the test's artifacts directory.
Project configuration and documentation
.gitignore, tests/unified/README.md
.gitignore updated to ignore tests/unified/artifacts/ while explicitly including tests/unified/test_cases/data/ for checked-in fixtures. README documents failure-mode reproducer test pattern, per-test artifact files, and manual replay procedures using recorded metadata.
Failure-mode reproducer test fixtures
tests/unified/test_cases/deriver_no_memory_signal.json, tests/unified/test_cases/dreamer_no_dedup.json, tests/unified/test_cases/misinterpretation_correction_creates_orphan_fact.json
Three new test fixtures: deriver_no_memory_signal exercises Tier 1/2/Signal observation classification across 8 sessions with continue_on_failure aggregation; dreamer_no_dedup verifies duplicate consolidation with pre/post representation capture and two llm_judge gates per session; misinterpretation_correction_creates_orphan_fact captures scripted assistant misinterpretation and user correction, then asserts correct and incorrect observations via two-gate llm_judge pattern.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 A rabbit built a test warren so fine,
With artifacts captured, each snapshot divine,
Fixtures loaded from files, reproduction precise,
Failures can gather like carrots in ice,
From dreamer to deriver, the tests now align!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Kass/test cases batch 2' is vague and generic, referring only to author/branch name and batch numbering without describing the actual changes or improvements being made. Provide a more descriptive title that summarizes the main contribution, such as 'Add failure-mode reproducers and artifact capture to unified test runner' or 'Introduce AddMessagesFromFixtureAction and test run metadata snapshotting'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch kass/test-cases-batch-2

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
.gitignore (1)

10-12: ⚡ Quick win

Unignore fixture contents explicitly, not just the directory.

Given the broad data ignore rule, add an explicit recursive unignore so new fixture files under this directory are always tracked.

✅ Suggested fix
 !tests/unified/test_cases/data/
+!tests/unified/test_cases/data/**
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.gitignore around lines 10 - 12, The .gitignore currently ignores the
top-level data directory but only unignores the directory path
'!tests/unified/test_cases/data/' which does not unignore files inside; update
the unignore to be recursive so fixture files are tracked (i.e. replace or add a
recursive unignore for tests/unified/test_cases/data such that files under that
directory are not ignored while keeping the broader 'data' and 'redis-data'
ignore rules intact).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unified/runner.py`:
- Around line 514-520: The code currently builds fixture_path from FIXTURES_DIR
/ step.fixture_path and opens it directly, allowing path traversal (e.g., "../")
to escape the fixture root; fix by resolving the combined path (use
Path.resolve(strict=False) on FIXTURES_DIR / step.fixture_path) and verify the
resolved path is a child of FIXTURES_DIR.resolve() (e.g., compare parents or
string prefix) and raise FileNotFoundError or ValueError if it is outside the
fixture root before proceeding to open the file (references: FIXTURES_DIR,
step.fixture_path, fixture_path, and the with open(...) call).
- Around line 523-531: The runtime should reject negative fixture indices and
non-positive limits: before using step.fixture_session_index against
sessions_in_fixture, add a check that step.fixture_session_index is >= 0 and
raise an IndexError with a clear message if not (currently only upper bound is
checked); similarly, before slicing fixture_messages using step.limit, validate
that if step.limit is not None then step.limit > 0 and raise a ValueError (or
appropriate exception) for limit <= 0 with a descriptive message; update the
checks around fixture_session = sessions_in_fixture[...] and fixture_messages =
fixture_session.get("messages", []) to enforce these validations.
- Around line 346-347: Three f-strings are implicitly concatenated and must be
joined explicitly: replace the adjacent f-strings in the message built with
test_name and assertion_failures so they are joined with +; similarly join the
two adjacent f-strings that reference fixture_path, step.fixture_path, and
FIXTURES_DIR with +; and join the two adjacent f-strings that reference
step.fixture_session_index and sessions_in_fixture (len(sessions_in_fixture))
with +. Locate these by the symbols test_name, assertion_failures, fixture_path,
FIXTURES_DIR, step.fixture_path, step.fixture_session_index, and
sessions_in_fixture in tests/unified/runner.py and change the implicit
concatenations to use the + operator to form a single string expression.

In `@tests/unified/schema.py`:
- Around line 83-95: The fixture_session_index and limit fields currently allow
negative/zero values; update their Pydantic Field declarations to enforce
bounds: change fixture_session_index to require ge=0 (non-negative integer) and
change limit to require ge=1 (positive integer) by adding the appropriate Field
constraints (e.g., Field(..., ge=0, description=...) for fixture_session_index
and Field(None, ge=1, description=...) for limit) so invalid indices or limits
are rejected at validation time; ensure you keep the existing descriptions on
the Field calls.

---

Nitpick comments:
In @.gitignore:
- Around line 10-12: The .gitignore currently ignores the top-level data
directory but only unignores the directory path
'!tests/unified/test_cases/data/' which does not unignore files inside; update
the unignore to be recursive so fixture files are tracked (i.e. replace or add a
recursive unignore for tests/unified/test_cases/data such that files under that
directory are not ignored while keeping the broader 'data' and 'redis-data'
ignore rules intact).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 798cb782-e304-4602-90f0-17df0aeb026e

📥 Commits

Reviewing files that changed from the base of the PR and between 8fcbb54 and 2aac580.

📒 Files selected for processing (8)
  • .gitignore
  • tests/unified/README.md
  • tests/unified/runner.py
  • tests/unified/schema.py
  • tests/unified/test_cases/data/swe_chat_sample_persona_balanced.json
  • tests/unified/test_cases/deriver_no_memory_signal.json
  • tests/unified/test_cases/dreamer_no_dedup.json
  • tests/unified/test_cases/misinterpretation_correction_creates_orphan_fact.json

Comment thread tests/unified/runner.py
Comment on lines +346 to +347
f"Test {test_name} FAILED with {len(assertion_failures)} "
f"assertion failure(s) (continue_on_failure=true):"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the exact warning locations and then recheck after patching.
rg -n "assertion failure\\(s\\)|resolved from|out of range" tests/unified/runner.py

Repository: plastic-labs/honcho

Length of output: 304


🏁 Script executed:

sed -n '345,348p' tests/unified/runner.py

Repository: plastic-labs/honcho

Length of output: 246


🏁 Script executed:

sed -n '516,519p' tests/unified/runner.py

Repository: plastic-labs/honcho

Length of output: 283


🏁 Script executed:

sed -n '524,527p' tests/unified/runner.py

Repository: plastic-labs/honcho

Length of output: 276


Fix implicit string concatenation at all three locations to comply with basedpyright's reportImplicitStringConcatenation check.

These lines use adjacent f-strings without explicit operators, which triggers CI warnings:

  • Lines 346-347: f"Test {test_name} FAILED with {len(assertion_failures)} " f"assertion failure(s) (continue_on_failure=true):"
  • Lines 517-518: f"Fixture not found: {fixture_path} (resolved from " f"fixture_path={step.fixture_path!r}, FIXTURES_DIR={FIXTURES_DIR})"
  • Lines 525-526: f"fixture_session_index={step.fixture_session_index} out of range " f"(fixture has {len(sessions_in_fixture)} sessions)"

Use explicit + operator to join the strings.

🧰 Tools
🪛 GitHub Actions: Static Analysis / 0_basedpyright.txt

[warning] 346-346: basedpyright warning: Implicit string concatenation not allowed (reportImplicitStringConcatenation)

🪛 GitHub Actions: Static Analysis / basedpyright

[warning] 346-346: basedpyright: Implicit string concatenation not allowed (reportImplicitStringConcatenation)

🪛 GitHub Check: basedpyright

[warning] 346-346:
Implicit string concatenation not allowed (reportImplicitStringConcatenation)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unified/runner.py` around lines 346 - 347, Three f-strings are
implicitly concatenated and must be joined explicitly: replace the adjacent
f-strings in the message built with test_name and assertion_failures so they are
joined with +; similarly join the two adjacent f-strings that reference
fixture_path, step.fixture_path, and FIXTURES_DIR with +; and join the two
adjacent f-strings that reference step.fixture_session_index and
sessions_in_fixture (len(sessions_in_fixture)) with +. Locate these by the
symbols test_name, assertion_failures, fixture_path, FIXTURES_DIR,
step.fixture_path, step.fixture_session_index, and sessions_in_fixture in
tests/unified/runner.py and change the implicit concatenations to use the +
operator to form a single string expression.

Comment thread tests/unified/runner.py
Comment on lines +514 to +520
fixture_path = FIXTURES_DIR / step.fixture_path
if not fixture_path.is_file():
raise FileNotFoundError(
f"Fixture not found: {fixture_path} (resolved from "
f"fixture_path={step.fixture_path!r}, FIXTURES_DIR={FIXTURES_DIR})"
)
with open(fixture_path) as f:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate fixture paths stay under tests/unified/test_cases/data.

fixture_path is concatenated directly with FIXTURES_DIR, so values like ../... can escape the fixture root and read arbitrary files.

✅ Suggested fix
         elif isinstance(step, AddMessagesFromFixtureAction):
-            fixture_path = FIXTURES_DIR / step.fixture_path
+            fixture_path = (FIXTURES_DIR / step.fixture_path).resolve()
+            fixtures_root = FIXTURES_DIR.resolve()
+            if fixtures_root not in fixture_path.parents:
+                raise ValueError(
+                    f"Invalid fixture_path {step.fixture_path!r}: must resolve under {fixtures_root}"
+                )
             if not fixture_path.is_file():
                 raise FileNotFoundError(
🧰 Tools
🪛 GitHub Actions: Static Analysis / 0_basedpyright.txt

[warning] 517-517: basedpyright warning: Implicit string concatenation not allowed (reportImplicitStringConcatenation)

🪛 GitHub Actions: Static Analysis / basedpyright

[warning] 517-517: basedpyright: Implicit string concatenation not allowed (reportImplicitStringConcatenation)

🪛 GitHub Check: basedpyright

[warning] 517-517:
Implicit string concatenation not allowed (reportImplicitStringConcatenation)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unified/runner.py` around lines 514 - 520, The code currently builds
fixture_path from FIXTURES_DIR / step.fixture_path and opens it directly,
allowing path traversal (e.g., "../") to escape the fixture root; fix by
resolving the combined path (use Path.resolve(strict=False) on FIXTURES_DIR /
step.fixture_path) and verify the resolved path is a child of
FIXTURES_DIR.resolve() (e.g., compare parents or string prefix) and raise
FileNotFoundError or ValueError if it is outside the fixture root before
proceeding to open the file (references: FIXTURES_DIR, step.fixture_path,
fixture_path, and the with open(...) call).

Comment thread tests/unified/runner.py
Comment on lines +523 to +531
if step.fixture_session_index >= len(sessions_in_fixture):
raise IndexError(
f"fixture_session_index={step.fixture_session_index} out of range "
f"(fixture has {len(sessions_in_fixture)} sessions)"
)
fixture_session = sessions_in_fixture[step.fixture_session_index]
fixture_messages = fixture_session.get("messages", [])
if step.limit is not None:
fixture_messages = fixture_messages[: step.limit]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject negative fixture indices and non-positive limits at runtime.

Line 523 only validates the upper bound; negative fixture_session_index still loads from the end. Line 530 allows limit <= 0, which yields surprising slices.

✅ Suggested fix
             sessions_in_fixture = fixture.get("sessions", [])
+            if step.fixture_session_index < 0:
+                raise IndexError(
+                    f"fixture_session_index must be >= 0; got {step.fixture_session_index}"
+                )
             if step.fixture_session_index >= len(sessions_in_fixture):
                 raise IndexError(
@@
             fixture_messages = fixture_session.get("messages", [])
             if step.limit is not None:
+                if step.limit <= 0:
+                    raise ValueError(f"limit must be > 0; got {step.limit}")
                 fixture_messages = fixture_messages[: step.limit]
🧰 Tools
🪛 GitHub Actions: Static Analysis / 0_basedpyright.txt

[warning] 525-525: basedpyright warning: Implicit string concatenation not allowed (reportImplicitStringConcatenation)

🪛 GitHub Actions: Static Analysis / basedpyright

[warning] 525-525: basedpyright: Implicit string concatenation not allowed (reportImplicitStringConcatenation)

🪛 GitHub Check: basedpyright

[warning] 525-525:
Implicit string concatenation not allowed (reportImplicitStringConcatenation)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unified/runner.py` around lines 523 - 531, The runtime should reject
negative fixture indices and non-positive limits: before using
step.fixture_session_index against sessions_in_fixture, add a check that
step.fixture_session_index is >= 0 and raise an IndexError with a clear message
if not (currently only upper bound is checked); similarly, before slicing
fixture_messages using step.limit, validate that if step.limit is not None then
step.limit > 0 and raise a ValueError (or appropriate exception) for limit <= 0
with a descriptive message; update the checks around fixture_session =
sessions_in_fixture[...] and fixture_messages = fixture_session.get("messages",
[]) to enforce these validations.

Comment thread tests/unified/schema.py
Comment on lines +83 to +95
fixture_session_index: int = Field(
..., description="Index into fixture['sessions'] selecting which session to load"
)
user_peer_id: str = Field(
..., description="Honcho peer ID to attribute fixture role='user' messages to"
)
assistant_peer_id: str | None = Field(
None,
description="Honcho peer ID for fixture role='assistant' messages. If omitted, assistant messages are skipped.",
)
limit: int | None = Field(
None, description="Cap on number of conversational turns to ingest"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add bounds to fixture index and limit fields.

fixture_session_index and limit accept negative/non-positive values right now. At runtime, this can silently select unintended sessions or truncate turns unexpectedly.

✅ Suggested fix
 class AddMessagesFromFixtureAction(TestStep):
@@
-    fixture_session_index: int = Field(
-        ..., description="Index into fixture['sessions'] selecting which session to load"
-    )
+    fixture_session_index: int = Field(
+        ...,
+        ge=0,
+        description="Index into fixture['sessions'] selecting which session to load",
+    )
@@
-    limit: int | None = Field(
-        None, description="Cap on number of conversational turns to ingest"
-    )
+    limit: int | None = Field(
+        None, gt=0, description="Cap on number of conversational turns to ingest"
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unified/schema.py` around lines 83 - 95, The fixture_session_index and
limit fields currently allow negative/zero values; update their Pydantic Field
declarations to enforce bounds: change fixture_session_index to require ge=0
(non-negative integer) and change limit to require ge=1 (positive integer) by
adding the appropriate Field constraints (e.g., Field(..., ge=0,
description=...) for fixture_session_index and Field(None, ge=1,
description=...) for limit) so invalid indices or limits are rejected at
validation time; ensure you keep the existing descriptions on the Field calls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant