test(b5): Domain E RED tests — apps/perf/lint polish [F#3 F#8 F#9 F#13 F#16] #148
Merged
Branch updated from 1cc6f43 to 8bd89ab.
Coverage filled in via new test file.

Before: 89.21% (below the --fail-under=90 gate). New tests target the specific new production paths from F#3/F#8/F#9/F#13/F#16:

All 17 new tests pass. Existing test suite delta: 1529 → 1546 passed, 2 pre-existing failures in test_cloud_quotas.py (unrelated to this PR).
14 failing RED tests covering Domain E dogfood findings:

- F#3: apps() reads CFBundleShortVersionString from Info.plist when simctl omits it
- F#8: tap verify_change=True returns screen_changed bool + ssim_delta float
- F#9: perf.snapshot samples CPU over a 200 ms window and returns a sample_window_ms field
- F#13: list_replays accepts min_steps param; default=1 excludes 0-step placeholders
- F#16: LintResult category field; 0-step recordings classified 'empty', not 'fail'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
F#3: add _read_app_info_plist helper; list_apps() falls back to reading
Info.plist from bundle when simctl omits CFBundleShortVersionString;
final fallback to build number when both sources lack it.
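A minimal sketch of the F#3 fallback chain, for reviewers who want the shape of it. It assumes the `simctl listapps` entry exposes the bundle directory under a `Path` key and that stdlib `plistlib` (which auto-detects XML vs binary plists) is enough to read `Info.plist`. `read_app_info_plist` and `resolve_version` are hypothetical names, not the actual `_read_app_info_plist` / `list_apps()` code.

```python
import plistlib
from pathlib import Path
from typing import Optional
from xml.parsers.expat import ExpatError


def read_app_info_plist(bundle_path: str) -> dict:
    """Hypothetical stand-in for _read_app_info_plist.

    Loads <bundle_path>/Info.plist (binary or XML; plistlib detects the
    format) and returns {} if the file is missing or cannot be parsed.
    """
    plist_path = Path(bundle_path) / "Info.plist"
    try:
        with plist_path.open("rb") as fh:
            data = plistlib.load(fh)
    except (OSError, ValueError, ExpatError):
        # ValueError also covers plistlib.InvalidFileException.
        return {}
    return data if isinstance(data, dict) else {}


def resolve_version(simctl_entry: dict) -> Optional[str]:
    """Fallback chain: simctl entry, then bundle Info.plist, then build number.

    The "Path" key for the bundle directory is an assumption, not taken
    from the PR.
    """
    version = simctl_entry.get("CFBundleShortVersionString")
    if version:
        return version
    bundle = simctl_entry.get("Path")
    info = read_app_info_plist(bundle) if bundle else {}
    version = info.get("CFBundleShortVersionString")
    if version:
        return version
    # Final fallback: the build number (CFBundleVersion) from either source.
    return simctl_entry.get("CFBundleVersion") or info.get("CFBundleVersion")
```

Swallowing parse errors into `{}` keeps a single unreadable bundle from failing the whole `apps()` listing; the caller just drops to the next fallback.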
F#8: add _compute_ssim() to server; tool_tap() accepts verify_change=true
to capture pre/post screenshots and return screen_changed bool +
ssim_delta float; default behaviour (no extra keys) unchanged.
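For context on what `ssim_delta` measures, here is a sketch of single-window (global) SSIM over two equal-size grayscale pixel lists, using the standard constants C1 = (0.01·L)² and C2 = (0.03·L)². This is a simplification: full SSIM averages the statistic over local windows, and the PR does not say which variant `_compute_ssim()` uses. `tap_verify_result`, the `ssim_delta = 1 - SSIM` definition, and the 0.99 threshold are all assumptions for illustration.

```python
from typing import Sequence


def global_ssim(a: Sequence[float], b: Sequence[float], data_range: float = 255.0) -> float:
    """SSIM computed once over the whole image (no sliding window)."""
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and have the same size")
    n = len(a)
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def tap_verify_result(pre: Sequence[float], post: Sequence[float], threshold: float = 0.99) -> dict:
    """Hypothetical shape of the extra keys returned by tap(verify_change=True)."""
    ssim = global_ssim(pre, post)
    return {"screen_changed": ssim < threshold, "ssim_delta": 1.0 - ssim}
```

Identical screenshots give SSIM 1.0 (delta 0.0, `screen_changed` False); a screen whose layout flips scores near -1, so any threshold just below 1.0 flags it.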
F#9: perf.snapshot() now samples CPU over ~200 ms window (3 samples),
averages them, and returns sample_window_ms=200 in the result dict.
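A sketch of the windowed sampling described for F#9, assuming each of the 3 samples is a CPU-time delta over one sub-interval divided by the wall time elapsed. The PR only says the snapshot samples 3 times over ~200 ms and averages. `windowed_cpu_percent`, the `cpu_seconds` callable, and the `cpu_percent` key are hypothetical; the clock and sleep are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable


def windowed_cpu_percent(
    cpu_seconds: Callable[[], float],
    window_ms: int = 200,
    samples: int = 3,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Average CPU% over `samples` equal sub-intervals spanning ~window_ms.

    cpu_seconds returns cumulative CPU time for the measured target; each
    reading is CPU time consumed in one sub-interval over wall time elapsed.
    """
    if samples < 1:
        raise ValueError("samples must be >= 1")
    interval = window_ms / 1000.0 / samples
    readings = []
    prev_cpu, prev_t = cpu_seconds(), clock()
    for _ in range(samples):
        sleep(interval)
        cur_cpu, cur_t = cpu_seconds(), clock()
        elapsed = cur_t - prev_t
        readings.append(100.0 * (cur_cpu - prev_cpu) / elapsed if elapsed > 0 else 0.0)
        prev_cpu, prev_t = cur_cpu, cur_t
    return {"cpu_percent": sum(readings) / len(readings), "sample_window_ms": window_ms}
```

For a quick local check, `windowed_cpu_percent(time.process_time)` measures the calling Python process itself. Reporting `sample_window_ms` alongside the value lets callers tell a windowed average apart from the old instantaneous reading.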
F#13: list_replays() accepts min_steps=1 default; 0-step placeholders
filtered out unless caller passes min_steps=0.
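The F#13 filter reduces to a single comparison. In this sketch recordings are passed in explicitly instead of being read from disk, and `Recording` is a hypothetical stand-in for whatever the recorder stores.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Recording:
    """Hypothetical stand-in for a stored replay recording."""
    name: str
    steps: List[dict] = field(default_factory=list)


def list_replays(recordings: Iterable[Recording], min_steps: int = 1) -> List[Recording]:
    """Return recordings with at least `min_steps` steps.

    The default of 1 hides 0-step placeholders; min_steps=0 returns everything.
    """
    return [r for r in recordings if len(r.steps) >= min_steps]
```

Making `min_steps=0` the explicit opt-out keeps the old "list everything" behaviour reachable without a separate flag.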
F#16: LintResult gains category field; 0-step recordings with no requires
block get status='empty'/category='empty' instead of 'fail'; real
recordings with steps-but-no-requires still get category='missing_state_contract'.
Updated stale test_lint_one_missing_requires to match the new semantics.
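A sketch of the F#16 classification rule. The `LintResult` fields here are a hypothetical mirror of the real dataclass. Two details are assumptions: that steps-without-requires keeps `status='fail'`, and that a 0-step recording *with* a requires block lints as `'ok'` (the PR does not cover that case).

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LintResult:
    """Hypothetical mirror of LintResult with the new category field."""
    name: str
    status: str    # 'ok' | 'empty' | 'fail'
    category: str  # 'ok' | 'empty' | 'missing_state_contract'
    message: Optional[str] = None

    def to_dict(self) -> dict:
        return asdict(self)


def classify(name: str, step_count: int, requires: Optional[dict]) -> LintResult:
    if not requires:
        if step_count == 0:
            # Placeholder recording: nothing to replay, so not a failure.
            return LintResult(name, "empty", "empty")
        # Real steps but no state contract: still reported as a failure.
        return LintResult(name, "fail", "missing_state_contract", "no requires block")
    return LintResult(name, "ok", "ok")
```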
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…duction lines

Add simdrive/tests/test_b5_domain_e_coverage.py (17 tests) to cover the new production paths introduced in the feat(b5) commit that dropped coverage from 90%+ to 89.21%:

- TestComputeSsim (9 tests): exercises _compute_ssim — None paths, missing files, non-PNG bytes, identical/different RGB/RGBA PNGs, mismatched dims, empty string paths. Covers server.py lines 815-874.
- TestToolTapVerifyChange (3 tests): verify_change=True/False paths in tool_tap with monkeypatched _compute_ssim. Covers server.py 1169-1173.
- TestLintOneOsError (2 tests): OSError branch in _lint_one via patched Path.read_text. Covers recorder.py lines 843-844.
- TestLintResultCategoryField (3 tests): to_dict() round-trip for category values 'ok', 'empty', 'missing_state_contract' (F#16 field). Covers the category serialisation path.

Production code unchanged. CI gate: 89.21% → 91.05%.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
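The TestLintOneOsError approach (forcing the `OSError` branch by patching `Path.read_text`) can be illustrated with a self-contained sketch. `lint_one` below is hypothetical, as is its `'unreadable'` category; only the patching technique is the point. `unittest.mock.patch.object` replaces the class attribute, so every `Path` instance raises on `read_text` for the duration of the `with` block.

```python
from pathlib import Path
from unittest import mock


def lint_one(path: Path) -> dict:
    """Hypothetical linter entry point with an explicit OSError branch."""
    try:
        text = path.read_text(encoding="utf-8")
    except OSError as exc:
        return {"status": "fail", "category": "unreadable", "message": str(exc)}
    if not text.strip():
        return {"status": "empty", "category": "empty", "message": None}
    return {"status": "ok", "category": "ok", "message": None}


def test_lint_one_os_error() -> None:
    # Patch the class attribute so any Path instance raises on read_text.
    with mock.patch.object(Path, "read_text", side_effect=OSError("permission denied")):
        result = lint_one(Path("recordings/login.yaml"))
    assert result["status"] == "fail"
    assert "permission denied" in result["message"]
```

Under pytest, `monkeypatch.setattr(Path, "read_text", ...)` achieves the same thing and is undone automatically at test teardown.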
Branch updated from 8bd89ab to 63128a0.
Summary
RED test suite for SimDrive b5 Domain E dogfood findings. 14 tests fail on HEAD, 3 pass as shape-preservation anchors.
- `apps()` must read `CFBundleShortVersionString` from the app's `Info.plist` when the `simctl listapps` plist omits it; fall back to the build number when it is absent from both.
- `tap(verify_change=True)` must return `screen_changed: bool` and `ssim_delta: float` after capturing pre/post screenshots.
- `perf.snapshot()` must sample CPU over a 200 ms window and return a `sample_window_ms` field instead of an instant 0.0 value.
- `list_replays()` must accept a `min_steps` param (default `1`) to filter out 0-step placeholder recordings.
- `LintResult` needs a `category` field; 0-step recordings must be `'empty'`, not `'fail: no requires block'`.

Test plan

- `pytest tests/test_b5_domain_e_apps_perf_lint.py -m "not live"`: all 14 RED tests fail on HEAD

🤖 Generated with Claude Code