test(cli): add subprocess contract tests for train/encode/decode #33

Merged
dinesh-git17 merged 1 commit into main from test/cli-contract-tests
Apr 15, 2026
@dinesh-git17

Summary

Closes Task 4-2 by adding 12 subprocess-level contract tests that pin the bpetite CLI to its stdout/stderr boundary, its exact JSON summary shape, and every documented failure mode.

Why

Task 4-1 shipped the CLI behind a single deterministic contract (FR-33/FR-34 channel discipline, exact train summary keys, decode raw-text semantics, typed exits), but nothing enforced it at the subprocess boundary. In-process tests collapse the stdout/stderr split, so a regression like a stray print() on stdout or a missing non-zero exit on an error path would slip through. These tests drive the real installed bpetite entry point through subprocess.run, so every assertion witnesses the same surface a user or CI smoke test would see.
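The stdout/stderr split described above can be illustrated with a minimal sketch. It does not invoke bpetite: the child program here is a hypothetical stand-in launched with `python -c`, and the helper name `run_child` is an assumption made for illustration. The point is only that `subprocess.run` with captured output keeps the two channels and the exit code separately observable.

```python
import subprocess
import sys

# Hypothetical stand-in for a CLI (not bpetite): the result goes to stdout,
# diagnostics go to stderr, and "--fail" forces a non-zero exit code.
CHILD = (
    "import sys\n"
    "print('{\"ok\": true}')\n"
    "print('progress: done', file=sys.stderr)\n"
    "sys.exit(3 if '--fail' in sys.argv else 0)\n"
)


def run_child(*args: str) -> subprocess.CompletedProcess[str]:
    """Run the stand-in child and capture stdout and stderr separately."""
    return subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        capture_output=True,  # separate pipes for stdout and stderr
        text=True,            # decode both streams to str
        check=False,          # inspect returncode instead of raising
    )


ok = run_child()
assert ok.returncode == 0
assert ok.stdout.strip() == '{"ok": true}'
assert "progress" not in ok.stdout      # diagnostics never leak onto stdout
assert "progress: done" in ok.stderr

failed = run_child("--fail")
assert failed.returncode == 3           # the error path is visible to the caller
```

A stray diagnostic `print()` on stdout in the child would break the equality check on `ok.stdout`, which is the kind of regression an in-process test can miss.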

Changes

  • tests/test_cli.py — 12 new tests covering:
    • train happy path (JSON summary shape pinned to the exact required key set; corpus_bytes tied to the fixture's on-disk size; stderr asserted to contain none of the five key substrings nor the full json.dumps string, so a double-write regression is caught)
    • train progress lifecycle ("Training started: planned=" and "Training complete: merges=" — substrings that only the _train_with_progress._on_event console.print lines produce, not the panel titles, so deleting either branch fails the test)
    • train every-100-merges branch (new progress_corpus_path session fixture generates a deterministic ~15 KB synthetic corpus via fixed-seed random.Random(0xBADC0FFEE); with --vocab-size 480 the run plans 224 merges and empirically fires 2 merge events, asserted via "Training merges:" on stderr and absent from stdout)
    • train failure modes: nonexistent input, invalid UTF-8 input (reusing invalid_utf8.bin), save without --force to existing path, save with --force overwrites cleanly
    • encode happy path (stdout asserted equal to the compact JSON array json.dumps(ids, separators=(",", ":")), with no spaces; stderr empty under a non-interactive subprocess)
    • decode happy path (raw text bytes, stderr empty; roundtrip of "Hello, world!" through encode → decode)
    • decode failure modes: unknown token ID ([999999]), invalid UTF-8 byte sequence ([128], lone continuation byte), missing model file
  • tests/conftest.py — added tiny_corpus_path session fixture (CLI subprocess tests need the actual file path, not the decoded string already exposed by tiny_corpus)
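Three of the building blocks named in the list above can be sketched with the standard library alone. This is an illustrative sketch, not the code from tests/test_cli.py or tests/conftest.py: the helper names `synthetic_corpus` and `is_valid_utf8`, the word-length range, and the sample token IDs are assumptions. It also assumes, as the "lone continuation byte" note suggests, that token ID 128 corresponds to the raw byte 0x80.

```python
import json
import random


def synthetic_corpus(seed: int, n_words: int) -> str:
    """Build a repeatable pseudo-text corpus from a fixed seed (hypothetical helper)."""
    rng = random.Random(seed)  # deterministic fixture data, not a security surface
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    words = [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(2, 8)))
        for _ in range(n_words)
    ]
    return " ".join(words)


# Same seed, same corpus: the fixture is reproducible across runs and machines.
first = synthetic_corpus(0xBADC0FFEE, 500)
second = synthetic_corpus(0xBADC0FFEE, 500)
assert first == second

# Compact JSON: custom separators remove the spaces json.dumps adds by default.
ids = [72, 101, 108, 108, 111]  # sample IDs chosen for illustration
compact = json.dumps(ids, separators=(",", ":"))
assert compact == "[72,101,108,108,111]"
assert " " not in compact


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0x80 is a continuation byte; on its own it is not valid UTF-8.
assert not is_valid_utf8(bytes([128]))
assert is_valid_utf8("Hello, world!".encode("utf-8"))
```

Comparing stdout against the exact `json.dumps(..., separators=(",", ":"))` string is stricter than parsing and comparing lists, because it also pins the whitespace of the output.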

Validation

  • uv run pytest — 192 passed (12 new, no regressions)
  • uv run ruff check . — clean
  • uv run ruff format --check . — clean
  • uv run mypy --strict — clean

Manual checks:

  • Three Codex review rounds; rounds 1 and 2 surfaced real assertion-uniqueness and branch-reachability gaps (panel title vs lifecycle line, 4-merge run never firing the every-100 branch, JSON summary never checked against stderr), all fixed in this PR. Round 3 came back clean.
  • Subprocess invocation resolves the bpetite entry point via Path(sys.executable).parent / "bpetite" rather than uv run bpetite, so each test avoids nested uv resolution inside pytest and runs fast on both macOS and Linux CI targets.
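The entry-point resolution mentioned above can be sketched as follows. The helper name `console_script` is hypothetical; the expression `Path(sys.executable).parent / "bpetite"` is taken from the description. The sketch assumes a POSIX-style virtual environment, where console scripts are installed next to the interpreter, matching the macOS and Linux targets named in the PR.

```python
import sys
from pathlib import Path


def console_script(name: str) -> Path:
    """Locate a console script beside the running interpreter (hypothetical helper).

    Assumes a POSIX-style environment layout in which the interpreter and
    installed console scripts share one directory.
    """
    return Path(sys.executable).parent / name


script = console_script("bpetite")
assert script.name == "bpetite"
assert script.parent == Path(sys.executable).parent
```

A test would then pass `str(script)` as the first element of the argument list given to `subprocess.run`, so the interpreter running pytest and the CLI under test come from the same environment without a second resolution step.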

Risks / Follow-ups

  • The every-100-merges test adds ~0.6 s to the test suite (full pytest: 1.65 s → 2.38 s) because training 224 merges dominates subprocess startup. Acceptable for a single coverage-motivated case.
  • Ruff S311 (non-crypto RNG) is suppressed inline on the synthetic corpus generator with a scoped # noqa: S311; the RNG is deterministic fixture generation, not a security surface.
  • Task 4-3 (TinyShakespeare download helper) is the next Phase 4 task.

@github-actions

github-actions Bot commented Apr 15, 2026

bpetite workflows

| Workflow | Status | Comment if failure and where |
| --- | --- | --- |
| tests | success | ok |
| lint | success | ok |
| syntax | success | ok |
| format | success | ok |
| types | success | ok |
| build | success | ok |
| cli-smoke | success | ok |
| determinism | success | ok |
| policy-guard | success | ok |
| ci-meta | pending | waiting |

PR #33: tracked workflows are still running.

@dinesh-git17 dinesh-git17 merged commit 5ae0693 into main Apr 15, 2026
14 checks passed
@dinesh-git17 dinesh-git17 deleted the test/cli-contract-tests branch April 15, 2026 08:32