Skip to content

[Bug] Test mocks too lenient — contract drift and 409 semantics undetectable #41

@chronoai-shining

Description

@chronoai-shining

Severity: High

Several mocks paper over real failure modes and contract details. Tests pass even when production code is silently broken.

1. aevatar-client mock returns { stream: null } — bypasses the entire event pipeline

sisyphus-api/test/runner/runner.test.ts:6-17:

vi.mock('../../src/runner/aevatar-client.js', () => ({
  startExecution: vi.fn().mockResolvedValue({ executionId: 'exec-123', stream: null }),
  streamEvents: vi.fn().mockResolvedValue(null),
  ...
}));

stream: nullconsumeEventsFromStream early-returns at session-manager.ts:298. The entire event-pipeline (parseSseStream → eventCol.insertOne → broadcastEvent → markCompleted/markFailed) is never executed in any test. A breaking change in AG-UI event shape (e.g. AG renames runFinishedrunComplete) would not fail any test.

Fix: add a fixture SSE stream (new ReadableStream emitting data: {"runFinished":{...}}\n); verify event row inserted, session marked completed, error path with runError marks failed.

2. ornn-client + reference-resolver disagree on response shape

  • sisyphus-api/src/workflows/ornn-client.ts:39-44 reads result.content directly.
  • sisyphus-api/src/workflows/reference-resolver.ts:89 reads result.data?.files.

The mock in test/workflows/workflow.test.ts:6-18 matches ornn-client's assumption only. If the real ornn returns { data: { content, files: ... } } (which reference-resolver suggests), fetchSkillContent returns empty string in production but tests still pass.

Fix: add a contract test that exercises both code paths against ONE fixture mirroring the real ornn response. Investigate which shape ornn actually returns and align both consumers.

3. dto-mappers casing not asserted

test/workflows/workflow.test.ts:222-231 checks workflowYaml contains "mathematical researcher" but never inspects connectorJson[0] shape. The mainnet DTO uses PascalCase (Name, Http.AllowedMethods, TimeoutMs: 60000, …); a regression to camelCase silently breaks every deploy.

Fix: explicit field-name assertions, or use the dto-mappers unit test from the previous issue. Add an mcp-typed connector test (today 100% coverage is http).

4. The "409 already running" assertion is testing a behavior that doesn't exist

test/runner/runner.test.ts:75-83 claims the second startSession("research", …) returns 409. But WorkflowRunController.start only returns 409 when the message contains "already running" (:83) — and session-manager.startSession doesn't throw with that message. Today the mocked startSession happily creates a duplicate; the 409 is not actually validating duplicate prevention.

Fix: either (a) implement real per-type duplicate prevention (see #28 — TOCTOU bug), or (b) drop the bogus assertion. Don't claim to test what isn't tested.

Metadata

Metadata

Labels

bugSomething isn't workingdxDeveloper experience

Type

No type
No fields configured for issues without a type.

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions