test(b5): Domain C text targeting / OCR red tests [F#6 F#5 F#4 F#7 F#18]#149

Merged
SyncTekLLC merged 3 commits into main from test/b5-domain-c-text-targeting
May 22, 2026

@SyncTekLLC

Contributor

Summary

  • Red-test suite for SimDrive b5 Domain C: text targeting and OCR semantics
  • 18 tests total: 10 failing RED (unimplemented behaviors), 8 passing (already-shipped or regression guards)
  • No production code changed — test file only

Failing tests (10 RED)

| Finding | Test | Asserts |
| --- | --- | --- |
| F#5 | `test_stale_text_error_includes_suggestion` | `target_not_found.details` must include `suggestion` with closest fuzzy match |
| F#4 | `test_mark_exposes_alternates_field` | `Mark` must have `alternates: list` attribute |
| F#4 | `test_mark_to_dict_includes_alternates` | `Mark.to_dict()` must include `alternates` key |
| F#4 | `test_alternates_contains_both_ocr_readings` | `alternates` must be a list |
| F#7 | `test_annotate_false_still_returns_nonempty_marks` | `observe(annotate=False)` must call `detect_marks` and return marks |
| F#7 | `test_annotate_false_does_not_call_som_annotate` | `detect_marks` must be called; `som.annotate` must NOT be called |
| F#18 | `test_wifi_label_not_low` | 'Wi-Fi' must not be 'low' band |
| F#18 | `test_bluetooth_label_not_low` | 'Bluetooth' must not be 'low' band |
| F#18 | `test_general_label_not_low` | 'General' must not be 'low' band |
| F#18 | `test_apple_prefs_tech_labels_not_low` | 4 Apple Preferences tech labels must not all be 'low' |

Passing tests (8 GREEN — already shipped or regression guards)

Root causes surfaced

  • F#5: target_not_found details has no fuzzy-match suggestion field; CodeAtlas fix: compute Levenshtein/prefix closest match from available marks and add to details
  • F#4: Mark dataclass has no alternates field; CodeAtlas fix: add alternates: list[str] = field(default_factory=list) + update to_dict()
  • F#7: observe.py:209 `if annotate: marks = detect_marks(...)` couples detection to rendering; fix: always call detect_marks, gate only the som.annotate() draw call on annotate
  • F#18: _ENGLISH_WORDS frozenset missing: bluetooth, wi-fi/wifi, general, privacy, portrait, sound, battery, display (iOS settings vocabulary)
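
The F#4 fix described above can be sketched as follows. This is a minimal illustration, not the actual SimDrive `Mark`: the `id` and `text` fields are hypothetical placeholders, and building `to_dict()` on `dataclasses.asdict` is an assumption. Only the `alternates` field definition is taken from the root-cause note.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Mark:
    # Hypothetical minimal fields; the real Mark carries more than this.
    id: int
    text: str
    # F#4: all OCR readings seen for this mark, not only the chosen one.
    # default_factory gives every instance its own list.
    alternates: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        # asdict() copies the nested list, so callers cannot mutate the mark
        # through the returned dictionary.
        return asdict(self)


a = Mark(id=1, text="Wi-Fi", alternates=["Wi-Fi", "Wi-Fl"])
b = Mark(id=2, text="General")
```

Using `default_factory=list` rather than a shared `[]` default matters here: a mutable default would be rejected by `dataclass` at class-definition time, and a shared list would leak readings between marks.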

Test plan

  • Run pytest tests/test_b5_domain_c_text_targeting.py -m "not live" → 10 RED before CodeAtlas fixes
  • After CodeAtlas implements each fix, re-run → tests go GREEN one domain at a time
  • Full suite must stay green: pytest simdrive/tests/ -m "not live"

🤖 Generated with Claude Code

@SyncTekLLC SyncTekLLC marked this pull request as draft May 22, 2026 19:14
@SyncTekLLC SyncTekLLC marked this pull request as ready for review May 22, 2026 19:23
SyncTekLLC and others added 3 commits May 22, 2026 15:53
Domain C red-test suite for SimDrive b5. 18 tests total (10 failing,
8 passing). Failing tests pin unimplemented behaviors:
- F#5: target_not_found must include fuzzy 'suggestion' field
- F#4: Mark must expose 'alternates' list + to_dict() key
- F#7: observe(annotate=False) must still call detect_marks (Option A)
- F#18: 'Wi-Fi', 'Bluetooth', 'General', 'Privacy' incorrectly 'low'
        due to missing entries in _ENGLISH_WORDS

Passing tests pin already-shipped behaviors (F#6 ambiguous_text_target
implemented in PR #145) and regression guards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
F#4: Add `alternates: list` field to Mark dataclass and include in to_dict()
     so agents can see all OCR readings seen across consecutive observations.
F#5: Inject fuzzy `suggestion` key into target_not_found details when text
     search misses, using difflib.get_close_matches (stdlib, no new deps).
F#7: Decouple detect_marks from annotation drawing in observe(); always call
     detect_marks so marks are populated even when annotate=False — only skip
     the som.annotate() drawing pass, leaving annotated_path=None.
F#18: Expand _ENGLISH_WORDS with iOS settings vocabulary (wi-fi, bluetooth,
      general, privacy, etc.) so Apple Preferences labels land 'medium' not 'low'.
      Non-English gibberish regression guard unaffected.
Also updates test_a12_marks_parity._CANONICAL_MARK_KEYS to include 'alternates'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
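
As a rough sketch of the F#5 change in the commit above: `difflib.get_close_matches` is the stdlib call the commit names, but the helper name `target_not_found_details`, the `query` and `available` keys, the case-insensitive comparison, and the `cutoff` value are all assumptions made for illustration.

```python
import difflib


def target_not_found_details(query: str, mark_texts: list[str]) -> dict:
    """Build hypothetical error details, adding a fuzzy suggestion if one exists."""
    details = {"query": query, "available": list(mark_texts)}
    # Compare case-insensitively, but report the label as it appeared on screen.
    by_lower = {t.lower(): t for t in mark_texts}
    close = difflib.get_close_matches(
        query.lower(), list(by_lower), n=1, cutoff=0.6
    )
    if close:
        details["suggestion"] = by_lower[close[0]]
    return details


hit = target_not_found_details("Bluetoth", ["Wi-Fi", "Bluetooth", "General"])
miss = target_not_found_details("zzzz", ["Wi-Fi", "Bluetooth", "General"])
```

When nothing clears the cutoff, the sketch omits the `suggestion` key instead of setting it to `None`; whether the real implementation does the same is not stated in the commit.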
…ouple contract

The old test asserted detect_marks was NOT called when annotate=False —
that assertion codified the F#7 bug itself. Updated to verify:
- detect_marks IS called regardless of annotate flag
- marks are returned non-empty when present
- annotated_path is None (rendering still skipped)
Renamed test to test_observe_annotate_false_still_returns_marks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@SyncTekLLC SyncTekLLC force-pushed the test/b5-domain-c-text-targeting branch from 8b551b1 to 775b588 Compare May 22, 2026 19:56
@SyncTekLLC

Contributor Author

Fix-up: test updated to match F#7 decouple contract

The halted test `test_observe_annotate_false_skips_marks` was asserting that `detect_marks` is NOT called when `annotate=False` — which is exactly the F#7 bug the production fix addressed. The test codified the old (buggy) behavior.

Changes (commit `775b588`):

  • Renamed to `test_observe_annotate_false_still_returns_marks`
  • Asserts `detect_marks` IS called regardless of `annotate` flag
  • Asserts `marks` is returned non-empty when present
  • Asserts `annotated_path` is `None` (rendering still skipped — only the drawing is gated on `annotate=True`)
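
The contract these assertions pin can be sketched with `unittest.mock`. The `observe` function below is a hypothetical stand-in that receives `detect_marks` and `som_annotate` as parameters; the real `observe()` in `observe.py` has a different signature, and the real test presumably patches those functions instead of injecting them.

```python
from unittest.mock import Mock


def observe(screenshot, detect_marks, som_annotate, annotate=True):
    """Hypothetical stand-in for observe(); the real signature differs."""
    # Detection always runs, so marks are populated either way.
    marks = detect_marks(screenshot)
    annotated_path = None
    if annotate:
        # Only the drawing pass is gated on the flag.
        annotated_path = som_annotate(screenshot, marks)
    return {"marks": marks, "annotated_path": annotated_path}


detect = Mock(return_value=[{"id": 1, "text": "General"}])
draw = Mock(return_value="/tmp/annotated.png")

result = observe("shot.png", detect, draw, annotate=False)

# detect_marks IS called; the drawing call is skipped.
detect.assert_called_once_with("shot.png")
draw.assert_not_called()
```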

Results:

  • `test_observe_module.py`: 10/10 ✓
  • `test_b5_domain_c_text_targeting.py`: 18/18 ✓
  • Full non-live suite: 1524 passed; the 15 failures and 54 errors are pre-existing (cloud/server-coverage tests unrelated to this change, also present on main)

No production code changed — F#7 fix in `observe.py` is untouched.

@SyncTekLLC SyncTekLLC merged commit dc8143e into main May 22, 2026
5 of 9 checks passed