supervised fit#1
Merged
Merged
Conversation
… fit
The engine for Phase A.7 was already shipped (41 passing tests in
test_supervised.py + test_semi_supervised.py — fit(topo, X, states=...)
with closed-form MLE supervised path and constrained-EM semi-supervised
path on hmmlearn backend). The roadmap said "PLANNED" but audit revealed
"engine-shipped, user surface missing". This commit closes the gap:
- CLI: `hmm-fit run --labels states.csv` flag added. Single-column CSV
validation, length-match validation, dispatches to fit(states=...).
Float NaN or int -1 sentinel route to semi-supervised EM ; everything
labelled hits the closed-form supervised MLE in one pass.
- 4 new CLI tests in tests/test_cli.py covering dispatch (n_iter_actual=1
signature), single-column rejection, semi-supervised -1 routing, and
--help documentation.
- examples/topology_supervised_3state.yaml + data_supervised.csv (600 obs)
+ states_supervised.csv (fully labelled) + states_semi_supervised.csv
(first third labelled, rest -1) — canonical bundle for the README CLI
block. Regenerated by examples/generate_demo_data.py.
- README "Supervised & semi-supervised training" section rewritten: fixed
the legacy `state_labels=` drift (kwarg doesn't exist — would have crashed
any newcomer copying the snippet), added a 3-mode table, and added a CLI
block that maps 1-to-1 to the shipped flag.
- CHANGELOG Unreleased: documented Added + Fixed.
- Roadmap: A.7 status PLANNED -> SHIPPED, both in the Vue d'ensemble table
and the detailed § Phase A.7 section.
Test suite: 326 passed, 15 skipped (optional pymc / external dataset), full
hmm_core green.
Smoke LIVE on the generated bundle:
hmm-fit run examples/topology_supervised_3state.yaml \
examples/data_supervised.csv \
--labels examples/states_supervised.csv -o /tmp/sup
-> fit done: log_lik=-823.35 BIC=1736.25 converged=True iters=1
Semi-supervised (-1 sentinel on second 2/3 of labels):
-> fit done: log_lik=-823.35 BIC=1736.25 converged=True iters=3
…, frontend lint
Closes the gap audit revealed: studio tests were silently skipped in CI
(pip install -e ".[dev]" didn't pull sqlmodel/fastapi), E2E was workflow_dispatch
only, and there was no anti-drift guard for the README snippets that almost
shipped with the broken `state_labels=` typo before A.7 close.
### ci.yml — split into 4 dedicated jobs
* `test-engine` (matrix 3.11/3.12/3.13) — engine + ruff + black + pytest with
`--cov-fail-under=85` enforcing the roadmap's coverage target (currently 87%)
* `test-studio` — installs `.[web,dev]` so the 76-test FastAPI + SQLModel suite
in `tests/studio/` actually runs (was silently skipped before)
* `readme-snippets` — runs `tests/test_readme_snippets.py` to assert that the
canonical README snippets still execute against the live API
* `frontend` — `npm run lint` (`tsc --noEmit`) + `npm run build` (tsc + vite) so
the React studio doesn't break silently between e2e runs
Added `concurrency.cancel-in-progress: true` to save CI minutes on rapid pushes.
### e2e.yml — push/PR triggers activated
Previous `# push:` / `# pull_request:` lines were commented out. The 6
Playwright specs (golden-path, accessibility, academy, warehouse,
tour-recording, topology-editor) now gate `push` to main and `pull_request`
to main only — feature branches stay e2e-free.
### tests/test_readme_snippets.py (new — 4 tests)
* `test_readme_quickstart_snippet_runs` — the Jupyter quickstart Topology
builder + fit + predict block
* `test_readme_supervised_snippet_signature` — asserts `n_iter_actual == 1`
for the supervised README snippet (closed-form single-pass invariant)
* `test_readme_semi_supervised_snippet_runs` — the NaN-half-masked snippet
* `test_readme_supervised_kwarg_name_is_stable` — introspects
`inspect.signature(fit).parameters['states']` to fail loudly if the kwarg
is ever renamed (the exact class of drift that produced the legacy
`state_labels=` README bug)
### Lint hygiene (ruff + black)
Auto-fixed 12 ruff errors (unused imports, etc.), added 4 surgical `# noqa`
annotations where the lint rule is wrong for the context (PyMC context manager
binding, intentional local imports beside their consumers). Black-formatted
the 4 files I touched.
Pre-existing repo-wide tech debt remains: ~32 files still don't match black
on this version, and ~4 ruff errors I couldn't address without changing
behaviour. Follow-up commit `chore: black --apply src/ tests/` should clear
the rest in one pass.
### Local verification
* `pytest tests/test_supervised.py tests/test_semi_supervised.py tests/test_cli.py
tests/test_readme_snippets.py tests/test_selection.py tests/test_bayesian_backend.py`
-> 62 passed, 12 skipped (pymc optional)
* `pytest tests/studio/ -v` -> 76 passed
* `pytest tests/ --cov=src/hmm_core --ignore=tests/studio --ignore=tests/test_valentin_eth_regression.py`
-> 326 passed, 12 skipped, coverage 87%
* `ruff check src/ tests/` -> All checks passed!
* `black --check` on my 4 files -> 4 left unchanged
### Note on the 4 ruff `# noqa` annotations
`bayesian_backend.py:244` F841 — `with pm.Model() as model:` is the PyMC
contract for binding the model context; sampling reads it implicitly.
`fit/__init__.py` cleanup — removed unused `render_sequence_strip` import.
`selection.py:254` E402 — `from dataclasses import replace as _dc_replace`
kept local to `auto_grid` for colocation; module-level would be misleading.
`test_selection.py` x2 E402 + 1 F841 — tests deliberately keep imports
beside their consumers; the F841 `labels = {...}` was dead code, removed.
Repo-wide black format pass on the 33 files that were drifting from the
configured 100-char line-length. Pure formatting, zero behaviour change —
verified by running the full suite post-format:
pytest tests/ --ignore=tests/test_valentin_eth_regression.py
-> 406 passed, 12 skipped (pymc optional), 0 failed
Why this was needed
-------------------
The previous commit (ci.yml hardening, 5b20bf6) enforces `black --check` on
push/PR. Without this pass, CI would have been red on main from the moment
the new workflow lands. Doctrinal preference: keep the format gate strict
and pay the format-drift cost once, in a clearly-marked formatting-only
commit, instead of relaxing the gate.
pyproject.toml — new [tool.ruff.lint.per-file-ignores]
------------------------------------------------------
Two legitimate ruff false positives are silenced at config level (more
robust than inline `# noqa` annotations, which black can wrap and detach
from the flagged line):
* `src/hmm_core/backends/bayesian_backend.py` F841 — `with pm.Model() as
model:` is the canonical PyMC binding pattern; `model` is read from the
model stack by downstream `pm.*` even though it never appears in the
inner scope.
* `src/hmm_core/selection.py` E402 + `tests/test_selection.py` E402 — local
imports kept colocated with their single consumer (`auto_grid` and three
test fixtures). Hoisting them to module top would obscure the pairing.
Verification
------------
ruff check src/ tests/ -> All checks passed!
black --check src/ tests/ -> 81 files would be left unchanged
pytest tests/ --cov=src/hmm_core --ignore=tests/studio
--ignore=tests/test_valentin_eth_regression.py
--cov-fail-under=85
-> 326 passed, coverage 87% (gate 85% PASS)
No source files outside src/ and tests/ touched. Pre-commit hooks remain
on the same black + ruff pin (v0.5.0 / 24.4.2).
The previous commit `bbabfe6` was a repo-wide `black --apply` pass that
touched 34 files purely for formatting. Without this file, every future
`git blame` on those files would attribute reformatted lines to that
commit instead of the line's real author.
Both `git blame --ignore-revs-file .git-blame-ignore-revs` (CLI) and the
GitHub web UI honour this file automatically. Local users opt in to the
CLI behaviour permanently with:
git config blame.ignoreRevsFile .git-blame-ignore-revs
Format requirement (learned the hard way): one full 40-char SHA per line,
comments only on `#`-prefixed lines. Appending the commit message on the
SHA line causes `fatal: invalid object name`.
Verified locally: `git blame src/hmm_core/cli.py -L 41,46` now attributes
the `validate` command to its original author (8ff1c41, 2026-05-21)
instead of bbabfe6 (today's format pass).
Two CI failures surfaced on the first PR run of feat/a7-supervised-fit: ### Docs (mkdocs --strict) — 2 broken links from commit 2ce5ac6 The A.7 user-facing close added markdown links to `src/hmm_core/fit/__init__.py` and `src/hmm_core/cli.py` from `docs/roadmap.md`. `mkdocs --strict` only follows links into the documentation tree — anything outside `docs/` is a warning, and `--strict` turns warnings into errors. Replaced the link syntax with plain backtick paths. ### E2E `scripts/build_frontend.py` — `shell=True` is wrong on Linux `subprocess.run(["npm", "install"], shell=True)` works on Windows because `npm` resolves to `npm.cmd` and needs the shell to be found via PATH. On Linux / macOS the same call silently drops every argument past the first: the POSIX shell (`/bin/sh -c`) takes only `npm` as the command string and treats `install` as `$0`. Result: bare `npm` runs, prints help, exits 1. Gated the flag on platform: `_NEEDS_SHELL = sys.platform == "win32"`. The bug was pre-existing (`build_frontend.py` was authored on Windows); it surfaced now because commit 5b20bf6 enabled the e2e workflow on `pull_request`, which runs on `ubuntu-latest`. Both fixes are mechanical, no behaviour change in the docs or build script's intent.
`page.getByText(/Markov chains/i)` was matching both the card <h3> title
*and* the card description <p> ("Before HMMs are hidden, they're Markov
chains..."), triggering Playwright strict-mode rejection:
Error: strict mode violation: getByText(/Markov chains/i) resolved to 2 elements:
1) <h3 class="font-semibold text-base">…</h3>
2) <p class="text-sm leading-relaxed mb-3">Before HMMs are hidden...</p>
The pattern was added pre-emptively in commit a9cf296 (lesson descriptions)
without updating the test. Surfaces now because commit 5b20bf6 enabled the
e2e workflow on `pull_request` — it was `workflow_dispatch`-only before.
Fix: switch all seven lesson-title assertions to
`getByRole("heading", { name: ... })` — semantic role-based locator that
targets <h3> exclusively and ignores any later additions to the card
description that happen to repeat the title keyword.
Doctrine: test what the user perceives (a heading), not which DOM node
happens to contain the text first.
Two stability fixes after the first round of CI runs surfaced issues: ### pytest-timeout — turn CI hangs into clean failures The previous run's engine matrix (3.11/3.12/3.13) hung at 25m48s before concurrency-cancel killed it. The likely culprit is `test_batch_runs_multiple_jobs` using `ProcessPoolExecutor` on Linux, where fork() interacts badly with pytest-cov's subprocess coverage collection. Added `pytest-timeout>=2.3` to the `[dev]` extra and `--timeout=180` to `addopts` in `[tool.pytest.ini_options]`. The slowest legitimate test (Bayesian NUTS, ~30-60s) stays well under the ceiling; any actual hang now fails fast with a clear timeout signal instead of wasting 25 min of runner time. Individual tests can override with `@pytest.mark.timeout(N)` if they're legitimately slow. ### E2E reverted to `workflow_dispatch` only Enabling `push` + `pull_request` triggers (commit 5b20bf6) surfaced 4 pre-existing UI/test drifts unrelated to A.7: 1. `academy.spec.ts:53` — `locator('h1')` strict-mode (page now has 2 h1s) 2. `golden-path.spec.ts:39` — `getByText(/24 rows/i)` no longer matches 3. `warehouse.spec.ts:57` — committed fixture has 4 CSVs but test expects 3 4. `tour-recording.spec.ts:75` — `Launch fit` button stays disabled (form state) Rather than ship A.7 + CI hardening blocked on a separate UI cleanup, the E2E triggers go back to `workflow_dispatch` with a documented TODO in the workflow file. The 4 drifts are listed inline so the next person working on E2E has the exact line numbers + suggested fixes. The fix to `academy.spec.ts:14-20` from commit 9de52b6 (heading-role locators) is kept — it's correct and the same pattern will be needed for line 53 in the cleanup commit.
…e hang Root cause of the 25-min engine-matrix hangs on GitHub Actions (which the runner eventually killed / `cancelled`): `hmm_core.cli.batch` uses `ProcessPoolExecutor`, which defaults to `fork()` on Linux. The engine CI step runs under `pytest-cov`, whose tracer spawns background threads. Forking a threaded process copies locked mutexes into the child → the worker deadlocks → the pool's `shutdown()` waits forever → the job hangs until the runner's hard limit. This is invisible on Windows (where the default start method is already `spawn`, a clean re-import with no inherited threads), which is why the batch tests pass locally in ~17s but hung indefinitely on ubuntu-latest. `pytest-timeout` (added in dbae026) did NOT catch it: its SIGALRM fires in the main process but cannot reap the forked-and-stuck child, and the pool's context-manager exit blocks before the timeout handler can fail the test. Fix: `pytest_configure` in tests/conftest.py forces `multiprocessing.set_start_method("spawn", force=True)` for the whole test session. `ProcessPoolExecutor` with no explicit `mp_context` then inherits spawn on Linux too. `_run_one_job` is module-level (picklable), so spawn's re-import-based worker startup works unchanged. Kept `--timeout=180` from dbae026 as a defensive net for any *other* future hang — spawn fixes this specific deadlock at the source. Local verification: `pytest tests/test_cli.py -k batch --cov=src/hmm_core` -> 5 passed in 15.4s (spawn path exercised; no regression).
With the spawn fix (dfa10a9) the engine matrix now completes in ~31s instead of hanging, which surfaced the one real remaining failure: tests/test_cli.py::test_run_help_documents_labels AssertionError: assert '--labels' in result.stdout typer renders `--help` through Rich, which truncates the option-name column with `…` when the terminal is narrow. The test passed locally at COLUMNS=80 but failed on ubuntu-latest, where a newer Rich and an unset/narrow CI terminal width rendered the flag as `--label…`. Fix: pass `env={"COLUMNS": "200"}` to the CliRunner invoke so the option name always renders in full. Verified width-independent now — the test passes at COLUMNS=40/60/80/200 (previously only ≥80 was safe). Engine matrix this run: 1 failed, 329 passed, 15 skipped in 31.10s — the hang is gone; this was the last red.
The COLUMNS=200 fix (285374a) was necessary but not sufficient. Root cause confirmed by reproducing with FORCE_COLOR=1 locally: raw stdout has '--labels' literal : False after ANSI strip : True Rich styles the markdown code-span ``--labels`` in the docstring as cyan, and on a colour-enabled terminal it emits SGR escapes *mid-token*: \x1b[1;2;36m-\x1b[0m\x1b[1;2;36m-labels\x1b[0m So `"--labels" in result.stdout` fails even though the flag is visually there. ubuntu-latest with typer 0.24 / rich 13.x emits colour by default (unlike the local Windows run where CliRunner stripped it), which is why this only reproduced on CI. Fix: add `_strip_ansi()` (regex over SGR sequences) and assert against the de-styled text. Verified to pass under FORCE_COLOR=1 and COLUMNS=40 — the exact hostile conditions CI imposed. This should be the last red: the prior run was `1 failed, 329 passed, 15 skipped in 24s` with this single assertion as the only failure.
RoJLD
added a commit
that referenced
this pull request
Jun 3, 2026
Local main carried 60 commits (topology-editor overhaul Inc 1-3 + quick-wins, saved-models portability, Academy lesson-16 + persisted convergence_history); remote main carried 12 it lacked (A.7 user-facing close, CI hardening, black formatting, README/CHANGELOG, PRs #1/#2). Forked at 253b6d4. Conflicts resolved (3 files): - src/hmm_core/features.py: kept HEAD — the dcor refactor extracted clustering into _cluster_and_pick_medoids (used by both dcor and nmi paths); the remote's inlined nmi clustering is the pre-refactor equivalent. No behaviour lost. - .github/workflows/ci.yml: union — keep the remote's test-studio + readme-snippets jobs AND merge the frontend job to run npm ci + lint(tsc) + vitest + build. - docs/roadmap.md: kept our re-baselined overview (16 lessons incl. L16, A.6/A.10/ A.13 SHIPPED) which supersedes the remote's pre-baseline table. Gates on the merged tree: backend 44 passed (features incl. dcor + remote, fit incl. convergence, backends, /convergence endpoint); frontend tsc clean, 60 vitest, vite build OK. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.