Merged
22 commits
9f630d1
feat(fn): ortho-insertion rescue + honest FN telemetry (v1.8.0 seed)
thettwe Jun 2, 2026
c93d32c
Merge pull request #49 from thettwe/release/v1.7.1
thettwe Jun 2, 2026
188e8ef
chore(infra): hygiene-03 add models/ to .gitignore
thettwe Jun 2, 2026
e5fed75
Merge pull request #47 from thettwe/ws/ortho-suppressor-rescue
thettwe Jun 2, 2026
d5a83b4
fix(bench): taxonomy-01 clean-set audit + flat-aa residue + cap misla…
thettwe Jun 2, 2026
ad4d843
Merge WS-0b: benchmark clean-set audit (v18b)
thettwe Jun 2, 2026
6fafa69
Merge WS-1: hygiene/baseline-lock
thettwe Jun 2, 2026
cbf27da
chore(deps): bump actions/checkout from 6.0.2 to 6.0.3
dependabot[bot] Jun 6, 2026
28dacc7
chore(deps): bump astral-sh/setup-uv from 8.1.0 to 8.2.0
dependabot[bot] Jun 6, 2026
2b32d2d
feat(strategy): unmask-02 aw-vowel un-mask detector + v18c bench fix
thettwe Jun 6, 2026
732ec93
Merge pull request #54 from thettwe/ws/v18-normalizer-unmask-aw-vowel
thettwe Jun 6, 2026
e42c41a
perf(pipeline): lat-02 memoize nasal variants + probe scores
thettwe Jun 6, 2026
f376cd9
Merge pull request #55 from thettwe/ws/v18-latency-reduction
thettwe Jun 6, 2026
c82876e
fix(strategy): serialize probe score-cache against batch-async races
thettwe Jun 10, 2026
b4f8560
refactor(strategy): drop dead ortho-typo wrapper + correct stale comm…
thettwe Jun 10, 2026
f7b480f
test(strategy): repair ortho-rescue harness flag + add rescue/aw-vowe…
thettwe Jun 10, 2026
fe9348f
fix(strategy): pin confidence on narrowed ortho-rescue error
thettwe Jun 10, 2026
58efaf1
fix(strategy): scope aw-vowel reorder-defer to overlapping spans
thettwe Jun 10, 2026
fe7a9eb
Merge pull request #56 from thettwe/ws/v18-preship-fixes
thettwe Jun 10, 2026
fc26d48
Merge pull request #52 from thettwe/dependabot/github_actions/actions…
thettwe Jun 10, 2026
ff4de7b
Merge pull request #53 from thettwe/dependabot/github_actions/astral-…
thettwe Jun 10, 2026
e50209a
chore: v1.8.0 release prep — version bump, changelog, readme
thettwe Jun 10, 2026
4 changes: 2 additions & 2 deletions .github/workflows/audit.yml
@@ -9,13 +9,13 @@ jobs:
   audit:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6

       - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
         with:
           python-version: "3.12"

-      - uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
+      - uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # v8.2.0

       - name: Install dependencies
         run: uv pip install --system pip-audit
8 changes: 4 additions & 4 deletions .github/workflows/ci.yml
@@ -28,13 +28,13 @@ jobs:
   lint:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6

       - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
         with:
           python-version: "3.12"

-      - uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
+      - uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # v8.2.0

       - name: Install dependencies
         run: uv pip install --system ruff
@@ -55,14 +55,14 @@ jobs:
       matrix:
         python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
     steps:
-      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6

       - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
         with:
           python-version: ${{ matrix.python-version }}
           allow-prereleases: true

-      - uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
+      - uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # v8.2.0

       - name: Install dependencies
         run: |
4 changes: 2 additions & 2 deletions .github/workflows/publish.yml
@@ -13,13 +13,13 @@ jobs:
       id-token: write
       contents: read
     steps:
-      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6

       - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
         with:
           python-version: "3.12"

-      - uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
+      - uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # v8.2.0

       - name: Install build tools
         run: uv pip install --system build
3 changes: 3 additions & 0 deletions .gitignore
@@ -37,6 +37,9 @@ benchmarks/_tmp/
 *.db
 data/

+# Model artifacts (local only — symlinked to external storage; never committed)
+models/

 # N-gram checkpoint (pipeline resume state)
 ngram_checkpoint/

19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,25 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.8.0] - 2026-06-10

+### Added

+- **Aw-vowel un-mask detector (opt-in).** A pre-normalization detector that surfaces a class of Myanmar spelling errors the normalizer previously masked — flat/tall *aa* swaps in the aw-vowel rime (ော ↔ ေါ), e.g. `ခော်` → `ခေါ်`. Each violation emits one gated correction with a deterministic canonical suggestion. Opt-in via the `detect_aw_vowel_unmask` config flag or the `MSC_DETECT_AW_VOWEL_UNMASK` environment variable; default off.
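
Review note: for readers unfamiliar with this error class, here is a minimal sketch of the kind of check the entry above describes. It is not the detector added in this PR. The function name `suggest_aw_vowel_fix`, the consonant set, and the regex are all assumptions, based on the general Myanmar spelling convention that the round consonants ခ ဂ င ဒ ပ ဝ take the tall aa (ါ, U+102B) rather than the flat aa (ာ, U+102C). Only the flat-to-tall direction is shown, and medials are ignored.

```python
import re
from typing import Optional

# Assumption: consonants that conventionally take tall aa (U+102B).
# The detector in this PR may use a different or richer rule.
TALL_AA_CONSONANTS = "\u1001\u1002\u1004\u1012\u1015\u101D"  # ခ ဂ င ဒ ပ ဝ

# One of those consonants + vowel sign E (U+1031) + flat aa (U+102C),
# i.e. the aw-vowel rime written with the flat form.
FLAT_AW = re.compile(f"([{TALL_AA_CONSONANTS}])\u1031\u102C")


def suggest_aw_vowel_fix(word: str) -> Optional[str]:
    """Return the tall-aa spelling if `word` contains a flat-aa violation, else None."""
    fixed = FLAT_AW.sub("\\1\u1031\u102B", word)
    return fixed if fixed != word else None
```

Because the suggestion is a direct character substitution, it is deterministic, which matches the "deterministic canonical suggestion" wording in the entry. A check like this has to run on the raw input: once a normalizer has rewritten the rime, the violation is no longer visible.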

+### Changed

+- **Hot-path latency reduced ~40% (default, behavior-identical).** Memoized SymSpell nasal-variant validation (instance-level cache keyed by term and level) and the syllable-span probe's per-sentence scoring (LRU). p95 latency drops from 658 ms to 401 ms and mean per-sentence time by ~43%, with byte-identical detections.
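
Review note: the two memoization patterns named in the entry above can be sketched as follows. This is an illustration under assumptions, not the code from this PR: `NasalVariantValidator`, `is_valid`, `score_sentence`, and the placeholder lookup and scoring logic are hypothetical names and bodies chosen for the example.

```python
from functools import lru_cache


class NasalVariantValidator:
    """Hypothetical validator with an instance-level cache keyed by (term, level)."""

    def __init__(self, lexicon):
        # lexicon maps a level name (e.g. "word") to a set of valid terms.
        self._lexicon = {level: frozenset(terms) for level, terms in lexicon.items()}
        self._cache = {}  # lives and dies with this instance
        self.calls = 0    # counts uncached validations, for demonstration only

    def _validate_uncached(self, term, level):
        self.calls += 1
        return term in self._lexicon.get(level, frozenset())

    def is_valid(self, term, level):
        key = (term, level)
        if key not in self._cache:
            self._cache[key] = self._validate_uncached(term, level)
        return self._cache[key]


@lru_cache(maxsize=1024)
def score_sentence(sentence: str) -> float:
    """Placeholder for an expensive per-sentence score; repeated inputs hit the LRU."""
    return (sum(map(ord, sentence)) % 1000) / 1000.0
```

Both caches return exactly what the uncached call would have returned, so detections stay identical as long as the underlying function is pure and the lexicon does not change during the cache's lifetime.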

+### Fixed

+- **Concurrent batch checking.** The shared probe score cache is now lock-guarded, so concurrent `check_batch_async` workers can no longer raise a `KeyError` under load.
+- Orthographic-insertion-rescue corrections now carry an explicit confidence so a recovered correction is not silently withheld, and the aw-vowel detector defers only the overlapping span (not the whole token) to the vowel-reorder detector. Plus internal comment, dead-code, and test-coverage cleanups.
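
Review note: a minimal sketch of a lock-guarded score cache of the kind the first bullet describes. The diff shown here does not include the cache's implementation, so the class name `LockedScoreCache`, the `OrderedDict`-based LRU, and the `get_or_compute` method are assumptions for illustration. Without the lock, one worker can evict a key between another worker's membership test and its `move_to_end` call, which is one way a `KeyError` can surface under load.

```python
import threading
from collections import OrderedDict


class LockedScoreCache:
    """Hypothetical bounded LRU cache whose reads and writes are serialized by a lock."""

    def __init__(self, maxsize=256):
        self._data = OrderedDict()
        self._lock = threading.Lock()
        self._maxsize = maxsize

    def get_or_compute(self, key, compute):
        # Lookup and recency update happen atomically under the lock.
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)
                return self._data[key]
        # Compute outside the lock so slow scoring does not block other workers.
        value = compute(key)
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            while len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict least recently used
        return value
```

Computing outside the lock means two workers may occasionally score the same key twice. That is harmless when scoring is deterministic, and it keeps the critical section short.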

+### Benchmark

+- With the aw-vowel detector enabled, spelling composite improves `0.6520` → `0.6870` (**+0.0350**) on the v1.8.0 benchmark: +98 true positives at zero added false positives, top-1 accuracy 92% on the un-masked corrections, clean false-positive sentences within cap (91/779), p95 382 ms. The benchmark's clean/error annotations were also corrected this release (67 previously-unannotated planted typos). Default behavior (detector off) is unchanged.

 ## [1.7.1] - 2026-06-02

 ### Added
5 changes: 3 additions & 2 deletions README.md
@@ -6,7 +6,7 @@
 [![Python Version](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Coverage](https://img.shields.io/badge/coverage-75%25-green)](tests/)
-[![Tests](https://img.shields.io/badge/tests-4%2C940_passed-brightgreen)](tests/)
+[![Tests](https://img.shields.io/badge/tests-5%2C182_passed-brightgreen)](tests/)

 ## Overview

@@ -34,6 +34,7 @@
 * **Compound & Morpheme Handling**: DP-based compound resolution, ternary compound splits in morpheme correction, productive reduplication validation.
 * **AI Semantic Checking (Optional)**: ONNX masked language model for context-aware validation.
 * **Syllable-Span Probe (opt-in, v1.7.1)**: A frozen-encoder neural probe that improves recall on broken-compound, over-segmentation, and consonant-substitution errors. Three strategies share one small model; default-off, enabled via `use_probe_*` config flags or `MSC_USE_PROBE_*` environment variables.
+* **Aw-Vowel Un-Mask Detector (opt-in, v1.8.0)**: Surfaces a class of Myanmar aw-vowel spelling errors — flat/tall *aa* swaps in the aw-vowel rime (ော ↔ ေါ, e.g. `ခော်` → `ခေါ်`) — that pre-normalization would otherwise silently repair before validation. Default-off, enabled via the `detect_aw_vowel_unmask` config flag or `MSC_DETECT_AW_VOWEL_UNMASK`.
 * **Named Entity Recognition**: Heuristic and Transformer-based NER to reduce false positives on names and places.

 ### Dictionary Building Pipeline
@@ -65,7 +66,7 @@

 Full documentation is available at **[docs.myspellchecker.com](https://docs.myspellchecker.com/)**.

-> **What's new in v1.7.1?** See the **[Release Notes](https://docs.myspellchecker.com/reference/release-notes)** for the opt-in **syllable-span probe** — a frozen-encoder neural enhancement (three default-off strategies sharing one small model) that improves recall on broken-compound, over-segmentation, and consonant-substitution errors (+0.0125 composite when enabled). Earlier v1.7.x work added mined-confusable detection, the cross-whitespace and compound-merge probes, the skip-rule confidence gate, and benchmark-hygiene reclassification.
+> **What's new in v1.8.0?** See the **[Release Notes](https://docs.myspellchecker.com/reference/release-notes)** for the opt-in **aw-vowel un-mask detector** — it surfaces Myanmar aw-vowel spelling errors (ော ↔ ေါ) the normalizer previously masked (+0.0350 spelling composite when enabled, at zero added false positives). This release also cuts hot-path latency ~40% (p95 658 → 401 ms, with identical results) and hardens concurrent batch checking. The v1.7.1 syllable-span probe remains available behind its opt-in flags.

 ### Getting Started
 * **[Introduction](https://docs.myspellchecker.com/introduction)**: Overview of the library and its architecture.