fix(types): prevent false-positive lexical substitution of ambiguous slang terms#133

Merged
simongonzalezdc merged 1 commit into main from fix/ambiguous-lexical-substitution on May 11, 2026

@simongonzalezdc simongonzalezdc commented May 11, 2026

Summary

  • Fix the lexical substitution engine replacing common words ("botón", "agente", "cuero") with slang equivalents because they are listed as regional variants in the cop_informal and money_slang dictionary concepts
  • Add ambiguous flag to the Variant interface so dictionary entries can mark variants that have common non-slang meanings and should be excluded from avoid-term generation
  • Mark 4 variant entries as ambiguous: botón (es-UY/cop), agente (es-EC/cop), cuero (es-DO/cop), cuero (es-DO/money)
  • Add 3 regression tests for the false-positive cases
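
The flag described above could look roughly like the following. This is a minimal sketch, not the repository's actual code: only the `ambiguous` flag on `Variant` comes from this PR. The `Concept` shape, the `buildAvoidTerms` helper, the rule that avoid-terms are the variants of other dialects, and the dialect assigned to "paco" are all assumptions made for illustration.

```typescript
// Hypothetical shapes; only the `ambiguous` flag is taken from the PR description.
interface Variant {
  term: string;
  dialect: string; // e.g. "es-UY"
  ambiguous?: boolean; // true when the term also has a common non-slang meaning
}

interface Concept {
  id: string;
  variants: Variant[];
}

// Assumed rule: avoid-terms for a target dialect are the variants that belong
// to other dialects, minus any variant flagged as ambiguous.
function buildAvoidTerms(concept: Concept, targetDialect: string): string[] {
  return concept.variants
    .filter((v) => v.dialect !== targetDialect)
    .filter((v) => !v.ambiguous)
    .map((v) => v.term);
}

const copInformal: Concept = {
  id: "cop_informal",
  variants: [
    { term: "tomba", dialect: "es-CO" },
    { term: "paco", dialect: "es-CL" }, // dialect assumed for illustration
    { term: "botón", dialect: "es-UY", ambiguous: true },
    { term: "agente", dialect: "es-EC", ambiguous: true },
  ],
};

const avoidTerms = buildAvoidTerms(copInformal, "es-CO");
console.log(avoidTerms); // ["paco"]
```

Making the flag optional means existing dictionary entries need no changes; only the four ambiguous variants have to be annotated.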

Root cause

The cop_informal concept lists "botón" as the es-UY slang for "police officer". For an es-CO target, "botón" therefore became an avoid-term, and the engine swapped "el botón" (button, as in a UI element) → "la tomba" (cop slang). The LLM translation was correct; the corruption happened in post-processing.
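
To make the failure mode concrete, here is a small sketch of a token-level substitution pass. The `substitute` function and its signature are hypothetical and much simpler than a real engine; in particular it does not adjust articles ("el" → "la") the way the output quoted above does.

```typescript
// Hypothetical post-processing step: any token found in the avoid list is
// replaced with the target dialect's preferred term.
function substitute(text: string, avoidTerms: string[], preferred: string): string {
  // Split on whitespace and common punctuation; the capture group keeps separators.
  const parts = text.split(/(\s+|[.,;:!?¡¿"()]+)/);
  return parts
    .map((part) => (avoidTerms.indexOf(part.toLowerCase()) >= 0 ? preferred : part))
    .join("");
}

const translated = "Haga clic en el botón para obtener su informe.";

// Before the fix: "botón" is in the avoid list, so a correct translation is corrupted.
const before = substitute(translated, ["paco", "botón"], "tomba");

// After the fix: ambiguous variants never reach the avoid list.
const after = substitute(translated, ["paco"], "tomba");

console.log(before); // "Haga clic en el tomba para obtener su informe."
console.log(after); // "Haga clic en el botón para obtener su informe."
```

The substitution step has no way to tell the UI sense of "botón" from the slang sense, which is why the exclusion has to happen when the avoid list is built.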

Test plan

  • All 569 existing tests pass
  • 3 new regression tests cover botón, agente, cuero false positives
  • Verified real slang (paco, yuta) still gets substituted correctly
  • E2E test: "Click the button to grab your report." → "Haga clic en el botón para obtener su informe." (correct)
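
The regression cases listed above can be expressed as a table-driven check. This is only a sketch against a miniature stand-in for the engine: the `localize` function, the sample sentences, and the dialects assigned to "paco" and "yuta" are assumptions, and the real tests in the repository may be structured differently.

```typescript
// Hypothetical miniature of the engine, used only to illustrate the test cases.
interface Variant {
  term: string;
  dialect: string;
  ambiguous?: boolean;
}

const variants: Variant[] = [
  { term: "tomba", dialect: "es-CO" },
  { term: "paco", dialect: "es-CL" }, // dialect assumed
  { term: "yuta", dialect: "es-AR" }, // dialect assumed
  { term: "botón", dialect: "es-UY", ambiguous: true },
  { term: "agente", dialect: "es-EC", ambiguous: true },
  { term: "cuero", dialect: "es-DO", ambiguous: true },
];

function localize(text: string, target: string): string {
  const preferred = variants.filter((v) => v.dialect === target)[0];
  if (!preferred) return text;
  const avoid = variants
    .filter((v) => v.dialect !== target && !v.ambiguous)
    .map((v) => v.term);
  return text
    .split(/(\s+|[.,;:!?¡¿"()]+)/)
    .map((tok) => (avoid.indexOf(tok.toLowerCase()) >= 0 ? preferred.term : tok))
    .join("");
}

// [input, expected output for es-CO]
const cases: Array<[string, string]> = [
  ["Haga clic en el botón.", "Haga clic en el botón."], // ambiguous: untouched
  ["Hable con un agente.", "Hable con un agente."], // ambiguous: untouched
  ["Una chaqueta de cuero.", "Una chaqueta de cuero."], // ambiguous: untouched
  ["Llegó el paco.", "Llegó el tomba."], // real slang: still substituted
  ["Llegó la yuta.", "Llegó la tomba."], // real slang: still substituted
];

const failures = cases.filter(([input, expected]) => localize(input, "es-CO") !== expected);
console.log(failures.length === 0 ? "all cases pass" : failures);
```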

🤖 Generated with Claude Code



…slang terms

The cop_informal dictionary concept listed "botón" (button), "agente"
(agent), and "cuero" (leather) as avoid-terms because they're regional
slang for "police officer" in some dialects. The lexical substitution
engine then swapped these common words to dialect-specific cop slang
(e.g. "el botón" → "la tomba" for es-CO) even when used in their
primary meaning.

Add `ambiguous` flag to Variant interface so dictionary entries can
mark variants that should be excluded from avoid-term generation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@simongonzalezdc simongonzalezdc merged commit 25a5a15 into main May 11, 2026
3 checks passed
@simongonzalezdc simongonzalezdc deleted the fix/ambiguous-lexical-substitution branch May 11, 2026 22:15