program

Reading order (refreshed 2026-05-16): new readers (operators or collaborators) should read HANDOFF.md first (30-minute orientation), then SYSTEM_MAP.md (structural diagram), then this file (mission + architecture), then CLAUDE.md (agent-binding contracts).

The "Architecture" / "Experiment profiles" sections below describe the 2026-04 WILDE / SHIRAZ / GREEN cycle. The current substrate canvas (43+ substrate trainers) is documented in SYSTEM_MAP.md §1 and .omx/state/lane_registry.json (758+ lanes). This file is historical architecture context, not the live frontier ledger. Read reports/latest.md first, then HANDOFF.md §2 for the latest durable orientation snapshot.

Nomenclature: tac means Task-Aware Compression, the reusable library and algorithmic engine. A codec is a concrete encoder/decoder or wire format inside that broader compression stack. comma_lab owns lab operations, state projection, custody, and reporting. See README.md and docs/terminology_and_boundaries.md for the canonical public wording.

Mission: minimize the official challenge score on a pinned upstream snapshot using task-aware compression against the frozen scorers.

Historical 2026-04 Renderer Context

The WILDE / SHIRAZ / GREEN renderer notes below are retained for reproducibility and paper history. They are not the live substrate canvas; for current work, read reports/latest.md, SYSTEM_MAP.md, and the active .omx/research/*_directive_* files.

Historical primary objectives

Train the asymmetric warp renderer (WILDE/SHIRAZ/GREEN profiles) to minimize the combined scoring formula.
Compress the trained renderer + masks + poses into a submission archive that achieves the lowest possible score.
Maintain contest compliance: no scorers loaded at inflate time, single forward pass, under 30 minutes on T4.
Collect clean evidence for the writeup track from day one.

Historical architecture

The renderer is a CLADE-conditioned U-Net (AsymmetricPairGenerator in src/tac/renderer.py):

Frame2 rendered directly from segmentation mask via spatially-adaptive normalization.
Frame1 derived by warping frame2 with learned optical flow + gated residual correction.
Trained against frozen SegNet and PoseNet scorers with Fridrich inverse steganalysis losses.
Quantized to int4+LZMA2 or FP4 for archive compression.

Historical experiment profiles

Profile	Philosophy	Status
WILDE	Empirical 5-phase freeze/unfreeze	Training complete, proxy 0.407
SHIRAZ	PCGrad + focal STE (principled)	A/B test against WILDE
GREEN	WILDE + radial zoom warp	Iteration 2, pending

Mutation frontier

The agent may edit only:

configs/**
docs/**
prompts/**
src/comma_lab/**
submissions/robust_current/**
runtime-rs/**
cuda/**
mojo/**
jax/**
.omx/**
.ralph/**
.agents/**
reports/**
experiments/**

The agent may not edit without explicit human approval:

the pinned upstream snapshot
submissions/exact_current/inflate.py
submissions/exact_current/inflate.sh
start.sh
LICENSE
THIRD_PARTY_NOTICES.md

Evidence rules

Never claim an improvement without a measured score.
Prefer the official evaluator over proxies.
Use proxy evaluation only to rank cheap local follow-up candidates before promotion. Proxy/advisory/local-substrate rows are never rank/kill or promotion authority by themselves.
Record config, command, artifact size, and score breakdown for each promoted run.
Label every score with its exact evidence axis: [contest-CPU], [contest-CUDA], [macOS-CPU advisory], [macOS-MLX research-signal], diagnostic/proxy, or historical unlimited-compute context. Calibrated MLX rows may guide spend triage only with an attached calibration manifest and a full-sample contest-CPU or contest-CUDA auth-axis comparison payload that passes tac.auth_eval_schema.required_contest_auth_axis_payload_blockers. Never promote a proxy, advisory, diagnostic, partial-sample, or MLX research axis into a public leaderboard claim.

Evidence Axes And Historical Lanes

Contest auth eval: archive.zip + inflate.sh evaluated by the pinned upstream scorer, with [contest-CPU] and [contest-CUDA] kept separate.
Diagnostic/proxy: local, MPS, macOS CPU advisory, smoke, and component probes. These guide work but do not rank or kill submissions.
Historical unlimited-compute: TTO or other compress-time-only studies. These are paper/methodology context unless converted into byte-closed archives and exact auth-eval artifacts.

Never conflate these axes.

Pipeline

The canonical pipeline (experiments/pipeline.py) runs:

Mask extraction (SegNet on GT video)
FP4/int4 export
Adaptive pose TTO (convergence-driven)
QAT fine-tuning (quality-monitored)
Fridrich steganalytic refinement
Weight compression
Archive packaging
Auth evaluation

Each step is idempotent. The pipeline iterates until convergence.

Operating loop

At each cycle:

propose at most 3 experiments
estimate expected payoff and cost
run smoke checks
run proxy evals
promote only the best candidate(s) to full eval
summarize what changed and what the evidence says
update the next experiment queue

Reporting standards

Every promoted run should record:

upstream snapshot hash
submission track
packaging mode
archive size
measured score (labeled by lane)
segnet distortion
posenet distortion
rate
runtime notes
exact commands or config diff

Style

Be direct. Prefer small edits over sprawling rewrites. Prefer reversible experiments. Prefer measured evidence over narratives.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

program

Historical 2026-04 Renderer Context

Historical primary objectives

Historical architecture

Historical experiment profiles

Mutation frontier

Evidence rules

Evidence Axes And Historical Lanes

Pipeline

Operating loop

Reporting standards

Style

FilesExpand file tree

PROGRAM.md

Latest commit

History

PROGRAM.md

File metadata and controls

program

Historical 2026-04 Renderer Context

Historical primary objectives

Historical architecture

Historical experiment profiles

Mutation frontier

Evidence rules

Evidence Axes And Historical Lanes

Pipeline

Operating loop

Reporting standards

Style