
test(b5): Domain E RED tests — apps/perf/lint polish [F#3 F#8 F#9 F#13 F#16]#148

Merged
SyncTekLLC merged 3 commits into main from test/b5-domain-e-apps-perf-lint on May 22, 2026

Conversation

@SyncTekLLC

Contributor

Summary

RED test suite for SimDrive b5 Domain E dogfood findings: 14 tests fail on HEAD and 3 pass as shape-preservation anchors.

  • F#3: apps() must read CFBundleShortVersionString from the app's Info.plist when the simctl listapps plist omits it, and fall back to the build number when it is absent from both
  • F#8: tap(verify_change=True) must capture pre/post screenshots and return screen_changed: bool and ssim_delta: float
  • F#9: perf.snapshot() must sample CPU over a 200 ms window and return a sample_window_ms field instead of an instantaneous 0.0 value
  • F#13: list_replays() must accept a min_steps param (default=1) to filter out 0-step placeholder recordings
  • F#16: LintResult needs a category field; 0-step recordings must be classified 'empty', not 'fail: no requires block'
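The F#3 fallback order can be sketched as follows. This is an illustration only, not the SimDrive implementation: the function names `read_info_plist` and `resolve_version` are hypothetical, and the sketch assumes the build number lives under `CFBundleVersion` and that Info.plist sits at the root of the iOS app bundle.

```python
import plistlib
from pathlib import Path
from typing import Optional
from xml.parsers.expat import ExpatError


def read_info_plist(bundle_path: str) -> dict:
    """Return the parsed Info.plist of an app bundle, or {} if unreadable.

    Hypothetical helper; assumes Info.plist is at the bundle root.
    """
    try:
        with (Path(bundle_path) / "Info.plist").open("rb") as fh:
            data = plistlib.load(fh)
    except (OSError, ValueError, ExpatError):
        # Missing file, unreadable file, or malformed plist.
        return {}
    return data if isinstance(data, dict) else {}


def resolve_version(simctl_entry: dict, bundle_path: Optional[str] = None) -> Optional[str]:
    """Pick a display version: simctl entry, then Info.plist, then build number."""
    version = simctl_entry.get("CFBundleShortVersionString")
    if version:
        return version
    info = read_info_plist(bundle_path) if bundle_path else {}
    version = info.get("CFBundleShortVersionString")
    if version:
        return version
    # Final fallback: the build number from either source (assumed key).
    return simctl_entry.get("CFBundleVersion") or info.get("CFBundleVersion")
```

Returning an empty dict on any read or parse failure keeps the caller's fallback chain linear, so a broken bundle degrades to the build number rather than raising.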

Test plan

  • pytest tests/test_b5_domain_e_apps_perf_lint.py -m "not live" — all 14 RED tests fail on HEAD
  • After CodeAtlas implements each finding, corresponding tests go GREEN
  • 3 passing shape-anchor tests must remain passing throughout
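To make the F#13 and F#16 expectations concrete, here is a minimal sketch of the behaviour the RED tests describe. The `LintResult` fields other than `category`, the `lint_recording` helper, and the in-memory recording shape are all assumptions made for illustration; the real code in recorder.py may differ, and the 'fail' status for the steps-but-no-requires case is inferred from the PR text.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class LintResult:
    """Stand-in for the real LintResult; only `category` comes from the PR."""
    name: str
    status: str
    category: str

    def to_dict(self) -> Dict[str, str]:
        return asdict(self)


def lint_recording(name: str, steps: List[dict], has_requires: bool) -> LintResult:
    """Classify a recording (hypothetical helper)."""
    if not has_requires:
        if not steps:
            # 0-step placeholder: report 'empty' rather than a failure.
            return LintResult(name, "empty", "empty")
        # Real recording without a requires block.
        return LintResult(name, "fail", "missing_state_contract")
    # Assumption: anything with a requires block lints clean in this sketch.
    return LintResult(name, "ok", "ok")


def list_replays(recordings: List[dict], min_steps: int = 1) -> List[dict]:
    """Drop recordings with fewer than `min_steps` steps (default hides placeholders)."""
    return [r for r in recordings if len(r["steps"]) >= min_steps]
```

With this shape, a caller that still wants to see placeholders passes min_steps=0, which matches the opt-out described in the findings.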

🤖 Generated with Claude Code

@SyncTekLLC SyncTekLLC marked this pull request as ready for review May 22, 2026 19:22
@SyncTekLLC SyncTekLLC force-pushed the test/b5-domain-e-apps-perf-lint branch from 1cc6f43 to 8bd89ab Compare May 22, 2026 20:22
@SyncTekLLC

Contributor Author

Coverage filled in via new test file simdrive/tests/test_b5_domain_e_coverage.py (17 tests, production code unchanged).

Before: 89.21% (below --fail-under=90 gate)
After: 91.05% (gate passes)

New tests target the specific new production paths from F#3/F#8/F#9/F#13/F#16:

  • TestComputeSsim (9 tests) — covers _compute_ssim body in server.py (lines 815-874): None paths, missing files, non-PNG bytes, RGB/RGBA PNGs, size mismatches
  • TestToolTapVerifyChange (3 tests) — covers verify_change block in tool_tap (server.py 1169-1173)
  • TestLintOneOsError (2 tests) — covers OSError branch in _lint_one (recorder.py 843-844)
  • TestLintResultCategoryField (3 tests) — exercises to_dict() for the new F#16 category field

All 17 new tests pass. Existing test suite delta: 1529 → 1546 passed, 2 pre-existing failures in test_cloud_quotas.py (unrelated to this PR).
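For readers unfamiliar with the metric behind F#8, the sketch below computes a simplified, single-window SSIM over two equal-length grayscale pixel sequences and derives the two return keys from it. It is not the `_compute_ssim` in server.py: standard SSIM averages over sliding windows, and both the 1 - SSIM definition of `ssim_delta` and the 0.01 threshold are assumptions made here for illustration.

```python
from typing import Dict, Sequence, Union


def global_ssim(a: Sequence[float], b: Sequence[float], max_value: float = 255.0) -> float:
    """Single-window SSIM over flat grayscale pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) * (x - mean_a) for x in a) / n
    var_b = sum((y - mean_b) * (y - mean_b) for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    # Stabilising constants from the usual SSIM definition.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a * mean_a + mean_b * mean_b + c1) * (var_a + var_b + c2)
    return numerator / denominator


def verify_change(
    before: Sequence[float], after: Sequence[float], threshold: float = 0.01
) -> Dict[str, Union[bool, float]]:
    """Return the two F#8 keys; delta definition and threshold are assumed."""
    delta = 1.0 - global_ssim(before, after)
    return {"screen_changed": delta > threshold, "ssim_delta": delta}
```

Identical inputs give an SSIM of 1 and therefore a delta of 0, so an unchanged screen reports screen_changed as False.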

SyncTekLLC and others added 3 commits May 22, 2026 16:25
14 failing RED tests covering Domain E dogfood findings:
- F#3: apps() reads CFBundleShortVersionString from Info.plist when simctl omits it
- F#8: tap verify_change=True returns screen_changed bool + ssim_delta float
- F#9: perf.snapshot windows CPU over 200ms and returns sample_window_ms field
- F#13: list_replays accepts min_steps param; default=1 excludes 0-step placeholders
- F#16: LintResult category field; 0-step recordings classified 'empty' not 'fail'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
F#3: add _read_app_info_plist helper; list_apps() falls back to reading
     Info.plist from bundle when simctl omits CFBundleShortVersionString;
     final fallback to build number when both sources lack it.
F#8: add _compute_ssim() to server; tool_tap() accepts verify_change=true
     to capture pre/post screenshots and return screen_changed bool +
     ssim_delta float; default behaviour (no extra keys) unchanged.
F#9: perf.snapshot() now samples CPU over ~200 ms window (3 samples),
     averages them, and returns sample_window_ms=200 in the result dict.
F#13: list_replays() accepts min_steps=1 default; 0-step placeholders
      filtered out unless caller passes min_steps=0.
F#16: LintResult gains category field; 0-step recordings with no requires
      block get status='empty'/category='empty' instead of 'fail'; real
      recordings with steps-but-no-requires still get category='missing_state_contract'.
      Updated stale test_lint_one_missing_requires to match new semantic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
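The windowed sampling described for F#9 could look roughly like the sketch below. The function name `snapshot_cpu`, the `cpu_percent` key, and the injectable `read_cpu` and `sleep` callables are hypothetical; only the 3-sample average over a ~200 ms window and the `sample_window_ms` field come from the commit message.

```python
import time
from typing import Callable, Dict


def snapshot_cpu(
    read_cpu: Callable[[], float],
    window_ms: int = 200,
    samples: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, float]:
    """Average `samples` CPU readings spread evenly across `window_ms`."""
    if samples < 2:
        raise ValueError("need at least two samples to span a window")
    # Gaps between samples: the first reading is at t=0, the last at t=window.
    interval_s = window_ms / 1000.0 / (samples - 1)
    readings = []
    for i in range(samples):
        if i:
            sleep(interval_s)
        readings.append(read_cpu())
    return {
        "cpu_percent": sum(readings) / len(readings),
        "sample_window_ms": window_ms,
    }
```

Passing the sleep function as a parameter lets a unit test substitute a recorder and avoid real waiting, for example snapshot_cpu(fake_reader, sleep=calls.append).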
…duction lines

Add simdrive/tests/test_b5_domain_e_coverage.py (17 tests) to cover the
new production paths introduced in the feat(b5) commit that dropped coverage
from 90%+ to 89.21%:

- TestComputeSsim (9 tests): exercises _compute_ssim — None paths, missing
  files, non-PNG bytes, identical/different RGB/RGBA PNGs, mismatched dims,
  empty string paths. Covers server.py lines 815-874.

- TestToolTapVerifyChange (3 tests): verify_change=True/False paths in
  tool_tap with monkeypatched _compute_ssim. Covers server.py 1169-1173.

- TestLintOneOsError (2 tests): OSError branch in _lint_one via patched
  Path.read_text. Covers recorder.py lines 843-844.

- TestLintResultCategoryField (3 tests): to_dict() round-trip for category
  values 'ok', 'empty', 'missing_state_contract' (F#16 field). Covers the
  category serialisation path.

Production code unchanged. CI gate: 89.21% → 91.05%.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@SyncTekLLC SyncTekLLC force-pushed the test/b5-domain-e-apps-perf-lint branch from 8bd89ab to 63128a0 Compare May 22, 2026 20:25
@SyncTekLLC SyncTekLLC merged commit ee280d3 into main May 22, 2026
5 of 9 checks passed