Reading order (refreshed 2026-05-16): new readers (operators or
collaborators) should read HANDOFF.md first (30-minute orientation),
then SYSTEM_MAP.md (structural diagram), then this file (mission +
architecture), then CLAUDE.md (agent-binding contracts).
The "Architecture" / "Experiment profiles" sections below describe the
2026-04 WILDE / SHIRAZ / GREEN cycle. The current substrate canvas
(43+ substrate trainers) is documented in SYSTEM_MAP.md §1 and
.omx/state/lane_registry.json (758+ lanes). This file is historical
architecture context, not the live frontier ledger. Read reports/latest.md
first, then HANDOFF.md §2 for the latest durable orientation snapshot.
Nomenclature: tac means Task-Aware Compression, the reusable library and
algorithmic engine. A codec is a concrete encoder/decoder or wire format inside
that broader compression stack. comma_lab owns lab operations, state
projection, custody, and reporting. See README.md and
docs/terminology_and_boundaries.md for the canonical public wording.
Mission: minimize the official challenge score on a pinned upstream snapshot using task-aware compression against the frozen scorers.
The WILDE / SHIRAZ / GREEN renderer notes below are retained for
reproducibility and paper history. They are not the live substrate canvas; for
current work, read reports/latest.md, SYSTEM_MAP.md, and the active
.omx/research/*_directive_* files.
- Train the asymmetric warp renderer (WILDE/SHIRAZ/GREEN profiles) to minimize the combined scoring formula.
- Compress the trained renderer + masks + poses into a submission archive that achieves the lowest possible score.
- Maintain contest compliance: no scorers loaded at inflate time, single forward pass, under 30 minutes on T4.
- Collect clean evidence for the writeup track from day one.
The renderer is a CLADE-conditioned U-Net (AsymmetricPairGenerator in src/tac/renderer.py):
- Frame2 rendered directly from segmentation mask via spatially-adaptive normalization.
- Frame1 derived by warping frame2 with learned optical flow + gated residual correction.
- Trained against frozen SegNet and PoseNet scorers with Fridrich inverse steganalysis losses.
- Quantized to int4+LZMA2 or FP4 for archive compression.
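The int4+LZMA2 export named in the last bullet can be illustrated with a minimal, standard-library sketch. It assumes symmetric per-tensor quantization and a flat list of weights; the function names and the 8-byte header layout are hypothetical and are not taken from src/tac. The `.xz` container produced by `lzma.FORMAT_XZ` uses the LZMA2 filter.

```python
import lzma
import struct

def quantize_int4(weights):
    """Symmetric per-tensor int4 quantization (assumed scheme); returns (scale, packed bytes)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Map each weight to an integer in [-7, 7], then offset by 8 so it fits in one nibble.
    codes = [max(-7, min(7, round(w / scale))) + 8 for w in weights]
    if len(codes) % 2:
        codes.append(8)  # pad with the code for zero so nibbles pair up
    packed = bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))
    return scale, packed

def dequantize_int4(scale, packed, count):
    """Invert quantize_int4, dropping any padding nibble."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)
        codes.append(byte & 0x0F)
    return [(c - 8) * scale for c in codes[:count]]

def pack_weights(weights):
    """Quantize, prepend a hypothetical header (count, float32 scale), compress with LZMA2."""
    scale, packed = quantize_int4(weights)
    header = struct.pack("<If", len(weights), scale)
    return lzma.compress(header + packed, format=lzma.FORMAT_XZ, preset=9)

def unpack_weights(blob):
    """Decompress and dequantize a blob produced by pack_weights."""
    raw = lzma.decompress(blob)
    count, scale = struct.unpack_from("<If", raw)
    return dequantize_int4(scale, raw[8:], count)
```

The reconstruction error per weight is bounded by half the quantization step, which is the quantity a quality-monitored fine-tuning stage would need to absorb.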
| Profile | Philosophy | Status |
|---|---|---|
| WILDE | Empirical 5-phase freeze/unfreeze | Training complete, proxy 0.407 |
| SHIRAZ | PCGrad + focal STE (principled) | A/B test against WILDE |
| GREEN | WILDE + radial zoom warp | Iteration 2, pending |
The agent may edit only:
- configs/**
- docs/**
- prompts/**
- src/comma_lab/**
- submissions/robust_current/**
- runtime-rs/**
- cuda/**
- mojo/**
- jax/**
- .omx/**
- .ralph/**
- .agents/**
- reports/**
- experiments/**
The agent may not edit without explicit human approval:
- the pinned upstream snapshot
- submissions/exact_current/inflate.py
- submissions/exact_current/inflate.sh
- start.sh
- LICENSE
- THIRD_PARTY_NOTICES.md
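The edit boundaries above could be enforced mechanically. The following is a sketch only: `may_edit` is a hypothetical helper, not an existing function in this repository, and the pinned upstream snapshot is left out of the protected list because its path is not stated here.

```python
import posixpath
from fnmatch import fnmatchcase

# Globs the agent may edit, copied from the allow list.
ALLOWED_GLOBS = [
    "configs/**", "docs/**", "prompts/**", "src/comma_lab/**",
    "submissions/robust_current/**", "runtime-rs/**", "cuda/**", "mojo/**",
    "jax/**", ".omx/**", ".ralph/**", ".agents/**", "reports/**", "experiments/**",
]

# Files that need explicit human approval. The pinned upstream snapshot is
# omitted because its path is not given in this file.
PROTECTED_PATHS = {
    "submissions/exact_current/inflate.py",
    "submissions/exact_current/inflate.sh",
    "start.sh",
    "LICENSE",
    "THIRD_PARTY_NOTICES.md",
}

def may_edit(path: str) -> bool:
    """Return True only if a repo-relative path is allowed and not protected."""
    # normpath collapses "./" and "a/../b" so traversal cannot dodge the lists.
    normalized = posixpath.normpath(path.replace("\\", "/"))
    if normalized in PROTECTED_PATHS:
        return False
    # fnmatch's "*" also matches "/", so "configs/**" covers nested paths.
    return any(fnmatchcase(normalized, glob) for glob in ALLOWED_GLOBS)
```

Checking the protected set first means a protected file stays blocked even if a later allow-list change happens to cover it.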
- Never claim an improvement without a measured score.
- Prefer the official evaluator over proxies.
- Use proxy evaluation only to rank cheap local follow-up candidates before promotion. Proxy, advisory, and local-substrate rows never carry rank/kill or promotion authority by themselves.
- Record config, command, artifact size, and score breakdown for each promoted run.
- Label every score with its exact evidence axis: [contest-CPU], [contest-CUDA], [macOS-CPU advisory], [macOS-MLX research-signal], diagnostic/proxy, or historical unlimited-compute context. Calibrated MLX rows may guide spend triage only with an attached calibration manifest and a full-sample contest-CPU or contest-CUDA auth-axis comparison payload that passes tac.auth_eval_schema.required_contest_auth_axis_payload_blockers. Never promote a proxy, advisory, diagnostic, partial-sample, or MLX research axis into a public leaderboard claim.
- Contest auth eval: archive.zip + inflate.sh evaluated by the pinned upstream scorer, with [contest-CPU] and [contest-CUDA] kept separate.
- Diagnostic/proxy: local, MPS, macOS CPU advisory, smoke, and component probes. These guide work but do not rank or kill submissions.
- Historical unlimited-compute: TTO or other compress-time-only studies. These are paper/methodology context unless converted into byte-closed archives and exact auth-eval artifacts.
Never conflate these axes.
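One way to keep the axes from being conflated is to attach the axis to every score as a typed value. The sketch below is illustrative: `Axis`, `LabeledScore`, and the two helper functions are hypothetical names, and the label format is an assumption rather than the lab's reporting format.

```python
from dataclasses import dataclass
from enum import Enum

class Axis(Enum):
    CONTEST_CPU = "[contest-CPU]"
    CONTEST_CUDA = "[contest-CUDA]"
    MACOS_CPU_ADVISORY = "[macOS-CPU advisory]"
    MACOS_MLX_RESEARCH = "[macOS-MLX research-signal]"
    DIAGNOSTIC_PROXY = "diagnostic/proxy"
    HISTORICAL_UNLIMITED = "historical unlimited-compute"

# Only the two contest auth-eval axes may back a public leaderboard claim.
CONTEST_AXES = frozenset({Axis.CONTEST_CPU, Axis.CONTEST_CUDA})

@dataclass(frozen=True)
class LabeledScore:
    value: float
    axis: Axis
    full_sample: bool = True  # partial-sample rows never back a public claim

    def label(self) -> str:
        """Render the score together with its evidence axis (assumed format)."""
        return f"{self.value:.3f} {self.axis.value}"

def can_back_public_claim(score: LabeledScore) -> bool:
    """True only for full-sample scores measured on a contest axis."""
    return score.axis in CONTEST_AXES and score.full_sample

def comparable(a: LabeledScore, b: LabeledScore) -> bool:
    """Scores are comparable only on the same axis; CPU and CUDA stay separate."""
    return a.axis is b.axis
```

Because the axis travels with the number, a comparison across axes has to be written explicitly instead of happening by accident in a table.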
The canonical pipeline (experiments/pipeline.py) runs:
- Mask extraction (SegNet on GT video)
- FP4/int4 export
- Adaptive pose TTO (convergence-driven)
- QAT fine-tuning (quality-monitored)
- Fridrich steganalytic refinement
- Weight compression
- Archive packaging
- Auth evaluation
Each step is idempotent. The pipeline iterates until convergence.
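The idea of idempotent steps iterated to convergence can be sketched as follows. This is a toy illustration, not the contents of experiments/pipeline.py: the state dictionary, the step functions, the tolerance, and the score floor of 0.5 are all assumptions chosen to make the loop easy to follow.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Step = Callable[[State], State]

def extract_masks(state: State) -> State:
    # Idempotent: skip the work if a previous cycle already produced masks.
    if state.get("masks_ready"):
        return state
    runs = state.get("mask_runs", 0.0) + 1.0
    return {**state, "masks_ready": 1.0, "mask_runs": runs}

def toy_refine(state: State) -> State:
    # Stand-in for the refinement steps: move the score halfway toward 0.5.
    return {**state, "score": 0.5 + (state["score"] - 0.5) * 0.5}

def run_until_converged(
    steps: List[Step], state: State, tol: float = 0.01, max_cycles: int = 20
) -> Tuple[State, int]:
    """Re-run every step each cycle until the score changes by less than tol."""
    previous = float("inf")
    for cycle in range(1, max_cycles + 1):
        for step in steps:
            state = step(state)
        if abs(previous - state["score"]) < tol:
            return state, cycle
        previous = state["score"]
    return state, max_cycles

final_state, cycles = run_until_converged([extract_masks, toy_refine], {"score": 1.0})
```

Re-running the whole step list each cycle is safe only because completed steps detect their own output and return early, which is what idempotence buys here.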
At each cycle:
- propose at most 3 experiments
- estimate expected payoff and cost
- run smoke checks
- run proxy evals
- promote only the best candidate(s) to full eval
- summarize what changed and what the evidence says
- update the next experiment queue
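The selection part of this cycle can be sketched in a few lines. The sketch assumes candidates are ranked by expected payoff per unit cost, which is one reasonable reading of "estimate expected payoff and cost" and not a stated rule; `Candidate`, `propose`, and `select_for_full_eval` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Candidate:
    name: str
    expected_payoff: float  # estimated score reduction (assumed unit)
    cost_hours: float       # estimated compute cost, assumed positive

def propose(queue: List[Candidate], limit: int = 3) -> List[Candidate]:
    """Pick at most `limit` experiments, best payoff-per-cost first."""
    ranked = sorted(
        queue,
        key=lambda c: c.expected_payoff / max(c.cost_hours, 1e-9),
        reverse=True,
    )
    return ranked[:limit]

def select_for_full_eval(
    candidates: List[Candidate],
    smoke_ok: Callable[[Candidate], bool],
    proxy_score: Callable[[Candidate], float],
    promote: int = 1,
) -> List[Candidate]:
    """Drop smoke failures, rank survivors by proxy score (lower is better)."""
    survivors = [c for c in candidates if smoke_ok(c)]
    survivors.sort(key=proxy_score)
    return survivors[:promote]
```

The proxy score only orders candidates for promotion; the promoted candidate still needs a full evaluation before any claim is made about it.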
Every promoted run should record:
- upstream snapshot hash
- submission track
- packaging mode
- archive size
- measured score (labeled by lane)
- segnet distortion
- posenet distortion
- rate
- runtime notes
- exact commands or config diff
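A record type makes it harder to promote a run with a field missing. The sketch below mirrors the list above field by field; the class name, the field names, and the JSON-lines serialization are assumptions, not an existing schema in this repository.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Dict, List

@dataclass(frozen=True)
class PromotedRun:
    upstream_snapshot_hash: str
    submission_track: str
    packaging_mode: str
    archive_size_bytes: int
    measured_score: float
    score_label: str  # the lane/axis label attached to the measured score
    segnet_distortion: float
    posenet_distortion: float
    rate: float
    runtime_notes: str
    commands_or_config_diff: str

    def to_json_line(self) -> str:
        """Serialize as one JSON line, with sorted keys for stable diffs."""
        return json.dumps(asdict(self), sort_keys=True)

def missing_fields(record: Dict[str, object]) -> List[str]:
    """List required fields that are absent or empty in a raw record."""
    required = [f.name for f in fields(PromotedRun)]
    return [name for name in required if record.get(name) in (None, "")]
```

Running `missing_fields` on a raw dictionary before constructing `PromotedRun` gives a readable list of gaps instead of a constructor error.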
Be direct. Prefer small edits over sprawling rewrites. Prefer reversible experiments. Prefer measured evidence over narratives.