Skip to content

supervised fit#1

Merged
RoJLD merged 10 commits into
mainfrom
feat/a7-supervised-fit
May 28, 2026
Merged

supervised fit#1
RoJLD merged 10 commits into
mainfrom
feat/a7-supervised-fit

Conversation

@RoJLD
Copy link
Copy Markdown
Owner

@RoJLD RoJLD commented May 27, 2026

No description provided.

RoJLD added 10 commits May 27, 2026 18:22
… fit

The engine for Phase A.7 was already shipped (41 passing tests in
test_supervised.py + test_semi_supervised.py — fit(topo, X, states=...)
with closed-form MLE supervised path and constrained-EM semi-supervised
path on hmmlearn backend). The roadmap said "PLANNED" but audit revealed
"engine-shipped, user surface missing". This commit closes the gap:

- CLI: `hmm-fit run --labels states.csv` flag added. Single-column CSV
  validation, length-match validation, dispatches to fit(states=...).
  Float NaN or int -1 sentinel route to semi-supervised EM ; everything
  labelled hits the closed-form supervised MLE in one pass.
- 4 new CLI tests in tests/test_cli.py covering dispatch (n_iter_actual=1
  signature), single-column rejection, semi-supervised -1 routing, and
  --help documentation.
- examples/topology_supervised_3state.yaml + data_supervised.csv (600 obs)
  + states_supervised.csv (fully labelled) + states_semi_supervised.csv
  (first third labelled, rest -1) — canonical bundle for the README CLI
  block. Regenerated by examples/generate_demo_data.py.
- README "Supervised & semi-supervised training" section rewritten: fixed
  the legacy `state_labels=` drift (kwarg doesn't exist — would have crashed
  any newcomer copying the snippet), added a 3-mode table, and added a CLI
  block that maps 1-to-1 to the shipped flag.
- CHANGELOG Unreleased: documented Added + Fixed.
- Roadmap: A.7 status PLANNED -> SHIPPED, both in the Vue d'ensemble table
  and the detailed § Phase A.7 section.

Test suite: 326 passed, 15 skipped (optional pymc / external dataset), full
hmm_core green.

Smoke LIVE on the generated bundle:
  hmm-fit run examples/topology_supervised_3state.yaml \
      examples/data_supervised.csv \
      --labels examples/states_supervised.csv -o /tmp/sup
  -> fit done: log_lik=-823.35 BIC=1736.25 converged=True iters=1

  Semi-supervised (-1 sentinel on second 2/3 of labels):
  -> fit done: log_lik=-823.35 BIC=1736.25 converged=True iters=3
…, frontend lint

Closes the gap audit revealed: studio tests were silently skipped in CI
(pip install -e ".[dev]" didn't pull sqlmodel/fastapi), E2E was workflow_dispatch
only, and there was no anti-drift guard for the README snippets that almost
shipped with the broken `state_labels=` typo before A.7 close.

### ci.yml — split into 4 dedicated jobs

* `test-engine` (matrix 3.11/3.12/3.13) — engine + ruff + black + pytest with
  `--cov-fail-under=85` enforcing the roadmap's coverage target (currently 87%)
* `test-studio` — installs `.[web,dev]` so the 76-test FastAPI + SQLModel suite
  in `tests/studio/` actually runs (was silently skipped before)
* `readme-snippets` — runs `tests/test_readme_snippets.py` to assert that the
  canonical README snippets still execute against the live API
* `frontend` — `npm run lint` (`tsc --noEmit`) + `npm run build` (tsc + vite) so
  the React studio doesn't break silently between e2e runs

Added `concurrency.cancel-in-progress: true` to save CI minutes on rapid pushes.

### e2e.yml — push/PR triggers activated

Previous `# push:` / `# pull_request:` lines were commented out. The 6
Playwright specs (golden-path, accessibility, academy, warehouse,
tour-recording, topology-editor) now gate `push` to main and `pull_request`
to main only — feature branches stay e2e-free.

### tests/test_readme_snippets.py (new — 4 tests)

* `test_readme_quickstart_snippet_runs` — the Jupyter quickstart Topology
  builder + fit + predict block
* `test_readme_supervised_snippet_signature` — asserts `n_iter_actual == 1`
  for the supervised README snippet (closed-form single-pass invariant)
* `test_readme_semi_supervised_snippet_runs` — the NaN-half-masked snippet
* `test_readme_supervised_kwarg_name_is_stable` — introspects
  `inspect.signature(fit).parameters['states']` to fail loudly if the kwarg
  is ever renamed (the exact class of drift that produced the legacy
  `state_labels=` README bug)

### Lint hygiene (ruff + black)

Auto-fixed 12 ruff errors (unused imports, etc.), added 4 surgical `# noqa`
annotations where the lint rule is wrong for the context (PyMC context manager
binding, intentional local imports beside their consumers). Black-formatted
the 4 files I touched.

Pre-existing repo-wide tech debt remains: ~32 files still don't match black
on this version, and ~4 ruff errors I couldn't address without changing
behaviour. Follow-up commit `chore: black --apply src/ tests/` should clear
the rest in one pass.

### Local verification

* `pytest tests/test_supervised.py tests/test_semi_supervised.py tests/test_cli.py
   tests/test_readme_snippets.py tests/test_selection.py tests/test_bayesian_backend.py`
  -> 62 passed, 12 skipped (pymc optional)
* `pytest tests/studio/ -v` -> 76 passed
* `pytest tests/ --cov=src/hmm_core --ignore=tests/studio --ignore=tests/test_valentin_eth_regression.py`
  -> 326 passed, 12 skipped, coverage 87%
* `ruff check src/ tests/` -> All checks passed!
* `black --check` on my 4 files -> 4 left unchanged

### Note on the 4 ruff `# noqa` annotations

`bayesian_backend.py:244` F841 — `with pm.Model() as model:` is the PyMC
contract for binding the model context; sampling reads it implicitly.
`fit/__init__.py` cleanup — removed unused `render_sequence_strip` import.
`selection.py:254` E402 — `from dataclasses import replace as _dc_replace`
kept local to `auto_grid` for colocation; module-level would be misleading.
`test_selection.py` x2 E402 + 1 F841 — tests deliberately keep imports
beside their consumers; the F841 `labels = {...}` was dead code, removed.
Repo-wide black format pass on the 33 files that were drifting from the
configured 100-char line-length. Pure formatting, zero behaviour change —
verified by running the full suite post-format:

    pytest tests/ --ignore=tests/test_valentin_eth_regression.py
    -> 406 passed, 12 skipped (pymc optional), 0 failed

Why this was needed
-------------------
The previous commit (ci.yml hardening, 5b20bf6) enforces `black --check` on
push/PR. Without this pass, CI would have been red on main from the moment
the new workflow lands. Doctrinal preference: keep the format gate strict
and pay the format-drift cost once, in a clearly-marked formatting-only
commit, instead of relaxing the gate.

pyproject.toml — new [tool.ruff.lint.per-file-ignores]
------------------------------------------------------
Two legitimate ruff false positives are silenced at config level (more
robust than inline `# noqa` annotations, which black can wrap and detach
from the flagged line):

* `src/hmm_core/backends/bayesian_backend.py` F841 — `with pm.Model() as
  model:` is the canonical PyMC binding pattern; `model` is read from the
  model stack by downstream `pm.*` even though it never appears in the
  inner scope.
* `src/hmm_core/selection.py` E402 + `tests/test_selection.py` E402 — local
  imports kept colocated with their single consumer (`auto_grid` and three
  test fixtures). Hoisting them to module top would obscure the pairing.

Verification
------------
    ruff check src/ tests/   -> All checks passed!
    black --check src/ tests/ -> 81 files would be left unchanged
    pytest tests/ --cov=src/hmm_core --ignore=tests/studio
        --ignore=tests/test_valentin_eth_regression.py
        --cov-fail-under=85
        -> 326 passed, coverage 87% (gate 85% PASS)

No source files outside src/ and tests/ touched. Pre-commit hooks remain
on the same black + ruff pin (v0.5.0 / 24.4.2).
The previous commit `bbabfe6` was a repo-wide `black --apply` pass that
touched 34 files purely for formatting. Without this file, every future
`git blame` on those files would attribute reformatted lines to that
commit instead of the line's real author.

Both `git blame --ignore-revs-file .git-blame-ignore-revs` (CLI) and the
GitHub web UI honour this file automatically. Local users opt in to the
CLI behaviour permanently with:

    git config blame.ignoreRevsFile .git-blame-ignore-revs

Format requirement (learned the hard way): one full 40-char SHA per line,
comments only on `#`-prefixed lines. Appending the commit message on the
SHA line causes `fatal: invalid object name`.

Verified locally: `git blame src/hmm_core/cli.py -L 41,46` now attributes
the `validate` command to its original author (8ff1c41, 2026-05-21)
instead of bbabfe6 (today's format pass).
Two CI failures surfaced on the first PR run of feat/a7-supervised-fit:

### Docs (mkdocs --strict) — 2 broken links from commit 2ce5ac6

The A.7 user-facing close added markdown links to `src/hmm_core/fit/__init__.py`
and `src/hmm_core/cli.py` from `docs/roadmap.md`. `mkdocs --strict` only
follows links into the documentation tree — anything outside `docs/` is a
warning, and `--strict` turns warnings into errors. Replaced the link
syntax with plain backtick paths.

### E2E `scripts/build_frontend.py` — `shell=True` is wrong on Linux

`subprocess.run(["npm", "install"], shell=True)` works on Windows because
`npm` resolves to `npm.cmd` and needs the shell to be found via PATH. On
Linux / macOS the same call silently drops every argument past the first:
the POSIX shell (`/bin/sh -c`) takes only `npm` as the command string and
treats `install` as `$0`. Result: bare `npm` runs, prints help, exits 1.

Gated the flag on platform: `_NEEDS_SHELL = sys.platform == "win32"`. The
bug was pre-existing (`build_frontend.py` was authored on Windows); it
surfaced now because commit 5b20bf6 enabled the e2e workflow on
`pull_request`, which runs on `ubuntu-latest`.

Both fixes are mechanical, no behaviour change in the docs or build
script's intent.
`page.getByText(/Markov chains/i)` was matching both the card <h3> title
*and* the card description <p> ("Before HMMs are hidden, they're Markov
chains..."), triggering Playwright strict-mode rejection:

    Error: strict mode violation: getByText(/Markov chains/i) resolved to 2 elements:
        1) <h3 class="font-semibold text-base">…</h3>
        2) <p class="text-sm leading-relaxed mb-3">Before HMMs are hidden...</p>

The pattern was added pre-emptively in commit a9cf296 (lesson descriptions)
without updating the test. Surfaces now because commit 5b20bf6 enabled the
e2e workflow on `pull_request` — it was `workflow_dispatch`-only before.

Fix: switch all seven lesson-title assertions to
`getByRole("heading", { name: ... })` — semantic role-based locator that
targets <h3> exclusively and ignores any later additions to the card
description that happen to repeat the title keyword.

Doctrine: test what the user perceives (a heading), not which DOM node
happens to contain the text first.
Two stability fixes after the first round of CI runs surfaced issues:

### pytest-timeout — turn CI hangs into clean failures

The previous run's engine matrix (3.11/3.12/3.13) hung at 25m48s before
concurrency-cancel killed it. The likely culprit is `test_batch_runs_multiple_jobs`
using `ProcessPoolExecutor` on Linux, where fork() interacts badly with
pytest-cov's subprocess coverage collection.

Added `pytest-timeout>=2.3` to the `[dev]` extra and
`--timeout=180` to `addopts` in `[tool.pytest.ini_options]`. The slowest
legitimate test (Bayesian NUTS, ~30-60s) stays well under the ceiling;
any actual hang now fails fast with a clear timeout signal instead of
wasting 25 min of runner time. Individual tests can override with
`@pytest.mark.timeout(N)` if they're legitimately slow.

### E2E reverted to `workflow_dispatch` only

Enabling `push` + `pull_request` triggers (commit 5b20bf6) surfaced 4
pre-existing UI/test drifts unrelated to A.7:

1. `academy.spec.ts:53` — `locator('h1')` strict-mode (page now has 2 h1s)
2. `golden-path.spec.ts:39` — `getByText(/24 rows/i)` no longer matches
3. `warehouse.spec.ts:57` — committed fixture has 4 CSVs but test expects 3
4. `tour-recording.spec.ts:75` — `Launch fit` button stays disabled (form state)

Rather than ship A.7 + CI hardening blocked on a separate UI cleanup,
the E2E triggers go back to `workflow_dispatch` with a documented TODO
in the workflow file. The 4 drifts are listed inline so the next person
working on E2E has the exact line numbers + suggested fixes.

The fix to `academy.spec.ts:14-20` from commit 9de52b6 (heading-role
locators) is kept — it's correct and the same pattern will be needed
for line 53 in the cleanup commit.
…e hang

Root cause of the 25-min engine-matrix hangs on GitHub Actions (which the
runner eventually killed / `cancelled`):

`hmm_core.cli.batch` uses `ProcessPoolExecutor`, which defaults to `fork()`
on Linux. The engine CI step runs under `pytest-cov`, whose tracer spawns
background threads. Forking a threaded process copies locked mutexes into
the child → the worker deadlocks → the pool's `shutdown()` waits forever →
the job hangs until the runner's hard limit.

This is invisible on Windows (where the default start method is already
`spawn`, a clean re-import with no inherited threads), which is why the
batch tests pass locally in ~17s but hung indefinitely on ubuntu-latest.

`pytest-timeout` (added in dbae026) did NOT catch it: its SIGALRM fires in
the main process but cannot reap the forked-and-stuck child, and the pool's
context-manager exit blocks before the timeout handler can fail the test.

Fix: `pytest_configure` in tests/conftest.py forces
`multiprocessing.set_start_method("spawn", force=True)` for the whole test
session. `ProcessPoolExecutor` with no explicit `mp_context` then inherits
spawn on Linux too. `_run_one_job` is module-level (picklable), so spawn's
re-import-based worker startup works unchanged.

Kept `--timeout=180` from dbae026 as a defensive net for any *other* future
hang — spawn fixes this specific deadlock at the source.

Local verification: `pytest tests/test_cli.py -k batch --cov=src/hmm_core`
-> 5 passed in 15.4s (spawn path exercised; no regression).
With the spawn fix (dfa10a9) the engine matrix now completes in ~31s
instead of hanging, which surfaced the one real remaining failure:

    tests/test_cli.py::test_run_help_documents_labels
    AssertionError: assert '--labels' in result.stdout

typer renders `--help` through Rich, which truncates the option-name
column with `…` when the terminal is narrow. The test passed locally at
COLUMNS=80 but failed on ubuntu-latest, where a newer Rich and an
unset/narrow CI terminal width rendered the flag as `--label…`.

Fix: pass `env={"COLUMNS": "200"}` to the CliRunner invoke so the option
name always renders in full. Verified width-independent now — the test
passes at COLUMNS=40/60/80/200 (previously only ≥80 was safe).

Engine matrix this run: 1 failed, 329 passed, 15 skipped in 31.10s — the
hang is gone; this was the last red.
The COLUMNS=200 fix (285374a) was necessary but not sufficient. Root cause
confirmed by reproducing with FORCE_COLOR=1 locally:

    raw stdout has '--labels' literal : False
    after ANSI strip                  : True

Rich styles the markdown code-span ``--labels`` in the docstring as cyan,
and on a colour-enabled terminal it emits SGR escapes *mid-token*:

    \x1b[1;2;36m-\x1b[0m\x1b[1;2;36m-labels\x1b[0m

So `"--labels" in result.stdout` fails even though the flag is visually
there. ubuntu-latest with typer 0.24 / rich 13.x emits colour by default
(unlike the local Windows run where CliRunner stripped it), which is why
this only reproduced on CI.

Fix: add `_strip_ansi()` (regex over SGR sequences) and assert against the
de-styled text. Verified to pass under FORCE_COLOR=1 and COLUMNS=40 — the
exact hostile conditions CI imposed.

This should be the last red: the prior run was `1 failed, 329 passed,
15 skipped in 24s` with this single assertion as the only failure.
@RoJLD RoJLD merged commit 302f816 into main May 28, 2026
8 checks passed
@RoJLD RoJLD deleted the feat/a7-supervised-fit branch May 28, 2026 15:05
RoJLD added a commit that referenced this pull request Jun 3, 2026
Local main carried 60 commits (topology-editor overhaul Inc 1-3 + quick-wins,
saved-models portability, Academy lesson-16 + persisted convergence_history);
remote main carried 12 it lacked (A.7 user-facing close, CI hardening, black
formatting, README/CHANGELOG, PRs #1/#2). Forked at 253b6d4.

Conflicts resolved (3 files):
- src/hmm_core/features.py: kept HEAD — the dcor refactor extracted clustering
  into _cluster_and_pick_medoids (used by both dcor and nmi paths); the remote's
  inlined nmi clustering is the pre-refactor equivalent. No behaviour lost.
- .github/workflows/ci.yml: union — keep the remote's test-studio + readme-snippets
  jobs AND merge the frontend job to run npm ci + lint(tsc) + vitest + build.
- docs/roadmap.md: kept our re-baselined overview (16 lessons incl. L16, A.6/A.10/
  A.13 SHIPPED) which supersedes the remote's pre-baseline table.

Gates on the merged tree: backend 44 passed (features incl. dcor + remote, fit
incl. convergence, backends, /convergence endpoint); frontend tsc clean, 60
vitest, vite build OK.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant