test(b5): Domain C text targeting / OCR red tests [F#6 F#5 F#4 F#7 F#18] #149
Merged
Domain C red-test suite for SimDrive b5. 18 tests total (10 failing,
8 passing). Failing tests pin unimplemented behaviors:
- F#5: target_not_found must include fuzzy 'suggestion' field
- F#4: Mark must expose 'alternates' list + to_dict() key
- F#7: observe(annotate=False) must still call detect_marks (Option A)
- F#18: 'Wi-Fi', 'Bluetooth', 'General', 'Privacy' incorrectly 'low'
due to missing entries in _ENGLISH_WORDS
Passing tests pin already-shipped behaviors (F#6 ambiguous_text_target
implemented in PR #145) and regression guards.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
F#4: Add `alternates: list` field to Mark dataclass and include in to_dict()
so agents can see all OCR readings seen across consecutive observations.
F#5: Inject fuzzy `suggestion` key into target_not_found details when text
search misses, using difflib.get_close_matches (stdlib, no new deps).
F#7: Decouple detect_marks from annotation drawing in observe(); always call
detect_marks so marks are populated even when annotate=False — only skip
the som.annotate() drawing pass, leaving annotated_path=None.
F#18: Expand _ENGLISH_WORDS with iOS settings vocabulary (wi-fi, bluetooth,
general, privacy, etc.) so Apple Preferences labels land 'medium' not 'low'.
Non-English gibberish regression guard unaffected.
Also updates test_a12_marks_parity._CANONICAL_MARK_KEYS to include 'alternates'.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
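The F#5 change can be sketched roughly as follows. This is an illustration, not the shipped code: `build_target_not_found_details`, the `query` key, the case-insensitive comparison, and the `cutoff` value are assumptions. Only `difflib.get_close_matches`, the `suggestion` key, and the `available` list come from the PR.

```python
import difflib


def build_target_not_found_details(query: str, available: list) -> dict:
    """Hypothetical helper: error details with an optional fuzzy suggestion."""
    details = {"query": query, "available": available}
    # Compare case-insensitively, then map the winner back to its original label.
    lowered = {label.lower(): label for label in available}
    # get_close_matches returns at most n candidates scoring >= cutoff, best first.
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=1, cutoff=0.6
    )
    details["suggestion"] = lowered[matches[0]] if matches else None
    return details


details = build_target_not_found_details(
    "Bluetoth", ["Wi-Fi", "Bluetooth", "General"]
)
print(details["suggestion"])
```

When nothing clears the cutoff, `suggestion` is `None` in this sketch; whether the real fix omits the key or sets it to `None` is not stated in the PR.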
…ouple contract

The old test asserted detect_marks was NOT called when annotate=False; that
assertion codified the F#7 bug itself. Updated to verify:
- detect_marks IS called regardless of annotate flag
- marks are returned non-empty when present
- annotated_path is None (rendering still skipped)

Renamed test to test_observe_annotate_false_still_returns_marks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
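The contract the renamed test verifies could look roughly like this. It is a simplified sketch: the `observe` signature below, which takes the detector and the drawing function as parameters, is hypothetical and chosen so the example runs without the real `observe.py` or `som` modules.

```python
from unittest import mock


def observe(screenshot_path, detect_marks, draw_annotations, annotate=True):
    """Hypothetical stand-in for observe(): detection always runs, drawing is gated."""
    marks = detect_marks(screenshot_path)
    annotated_path = draw_annotations(screenshot_path, marks) if annotate else None
    return {"marks": marks, "annotated_path": annotated_path}


def test_observe_annotate_false_still_returns_marks():
    detect = mock.Mock(return_value=[{"text": "Wi-Fi"}])
    draw = mock.Mock(return_value="annotated.png")

    result = observe("screen.png", detect, draw, annotate=False)

    detect.assert_called_once_with("screen.png")  # detection is not skipped
    draw.assert_not_called()                      # rendering is skipped
    assert result["marks"] == [{"text": "Wi-Fi"}]
    assert result["annotated_path"] is None


test_observe_annotate_false_still_returns_marks()
```

The real test presumably patches `detect_marks` and `som.annotate` in place; injecting mocks here only keeps the sketch self-contained.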
8b551b1 to 775b588
Fix-up: test updated to match F#7 decouple contract

The halted test `test_observe_annotate_false_skips_marks` was asserting that `detect_marks` is NOT called when `annotate=False`, which is exactly the F#7 bug the production fix addressed. The test codified the old (buggy) behavior.

Changes (commit `775b588`):

Results:

No production code changed; the F#7 fix in `observe.py` is untouched.
Summary
Failing tests (10 RED)
| Test | Pins |
| --- | --- |
| `test_stale_text_error_includes_suggestion` | `target_not_found.details` must include `suggestion` with closest fuzzy match |
| `test_mark_exposes_alternates_field` | `Mark` must have `alternates: list` attribute |
| `test_mark_to_dict_includes_alternates` | `Mark.to_dict()` must include `alternates` key |
| `test_alternates_contains_both_ocr_readings` | `alternates` must be a list |
| `test_annotate_false_still_returns_nonempty_marks` | `observe(annotate=False)` must call `detect_marks` and return marks |
| `test_annotate_false_does_not_call_som_annotate` | `detect_marks` must be called; `som.annotate` must NOT be called |
| `test_wifi_label_not_low` | `'Wi-Fi'` must not be `'low'` band |
| `test_bluetooth_label_not_low` | `'Bluetooth'` must not be `'low'` band |
| `test_general_label_not_low` | `'General'` must not be `'low'` band |
| `test_apple_prefs_tech_labels_not_low` | `'low'` |

Passing tests (8 GREEN: already shipped or regression guards)
- `ambiguous_text_target` is already implemented (PR #145, "fix(simdrive): tap({text}) raises ambiguous_text_target on duplicate labels (F#6)"); tests pin the behavior
- `target_not_found` already returns `available` list; tests confirm this
- `annotated_path=None` when `annotate=False`: already correct
- `'low'`: regression guard

Root causes surfaced
- `target_not_found` details has no fuzzy-match `suggestion` field; CodeAtlas fix: compute Levenshtein/prefix closest match from `available` marks and add to details
- `Mark` dataclass has no `alternates` field; CodeAtlas fix: add `alternates: list[str] = field(default_factory=list)` + update `to_dict()`
- `observe.py:209`: `if annotate: marks = detect_marks(...)` couples detection to rendering; fix: always call `detect_marks`, gate only the `som.annotate()` draw call on `annotate`
- `_ENGLISH_WORDS` frozenset missing: `bluetooth`, `wi-fi`/`wifi`, `general`, `privacy`, `portrait`, `sound`, `battery`, `display` (iOS settings vocabulary)

Test plan
- `pytest tests/test_b5_domain_c_text_targeting.py -m "not live"` → 10 RED before CodeAtlas fixes
- `pytest simdrive/tests/ -m "not live"`

🤖 Generated with Claude Code
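For the `Mark` root cause, a minimal sketch of the proposed field, assuming a much smaller dataclass than the real one. The `id` and `text` fields are placeholders; only `alternates: list[str] = field(default_factory=list)` and the `to_dict()` key are taken from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Mark:
    # Placeholder fields for illustration; the real Mark has its own schema.
    id: int
    text: str
    # default_factory gives every Mark its own list instead of a shared default.
    alternates: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Serialize the mark, including the alternates key."""
        return {
            "id": self.id,
            "text": self.text,
            "alternates": list(self.alternates),
        }


first = Mark(id=1, text="Wi-Fi", alternates=["Wi-Fi", "WI-FI"])
second = Mark(id=2, text="General")
print(first.to_dict())
print(second.to_dict())
```

Using `default_factory` rather than a literal `[]` matters because dataclasses reject mutable defaults, and a shared list would leak OCR readings between marks.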