Neural video compression optimized for downstream perception models.
tac is the tight production library extracted from comma-lab (the comma video compression contest research workspace at https://github.com/adpena/comma-lab — currently being sanitized for public release; see PR #107 for the contest submission). This repo contains the reusable codec primitives, score-band predictor with refusal modes, distortion proxy, parallel-dispatch toolchain, and hardened preflight infrastructure. The full research trajectory — experimental ledger, byte-level deconstruction of public PRs, lane registry, dispatch wrappers, methodology writeup, and 585 session-memory files — lives in comma-lab.
tac trains tiny CNN post-filters that correct decoded video frames by backpropagating through frozen perception networks. The filter learns corrections that minimize the scorer's distortion metric, not generic pixel quality.

    flowchart LR
        subgraph Local["Local CPU (microseconds per candidate)"]
            GEN["Candidate generator<br/>(apogee_intN, sidechannel sweep, etc.)"]
            RANK["MetaLagrangianSearch<br/>(predictor + distortion proxy)"]
            GATE{"Sanity gate<br/>(5 hardened checks)"}
        end
        subgraph Cloud["Paid GPU (Lightning T4 / Vast.ai 4090)"]
            DISP["parallel_dispatch_top_k<br/>(ThreadPoolExecutor)"]
            EVAL["upstream/evaluate.py<br/>(contest-CUDA T4)"]
            JSON["contest_auth_eval.json<br/>(per-dispatch)"]
        end
        subgraph Feedback["Closed-loop reseed"]
            HARV["harvest_and_reseed<br/>(filters [contest-CUDA] only)"]
            ANCHOR["anchors_*.json<br/>(empirical calibration)"]
        end
        GEN --> RANK
        RANK -->|"top-K"| GATE
        GATE -->|"PASS"| DISP
        GATE -->|"REFUSE"| GEN
        DISP --> EVAL
        EVAL --> JSON
        JSON --> HARV
        HARV --> ANCHOR
        ANCHOR -->|"strengthens"| RANK
        style GATE fill:#fff3cd,stroke:#856404
        style ANCHOR fill:#d4edda,stroke:#155724
        style EVAL fill:#cce5ff,stroke:#004085

The cycle is: rank cheaply, dispatch the top-K in parallel, harvest only results
tagged [contest-CUDA], and reseed the calibration. The sanity gate refuses
anything the predictor or proxy can't justify (extrapolation outside the
calibrated range, lossy-better-than-lossless math incoherence, a missing
distortion model). The single script that runs this loop end-to-end is
tools/feedback_loop_sweep.py in adpena/comma-lab.
Install with pip:

    pip install tac

Optional extras (quoted so that shells such as zsh do not expand the brackets):

    pip install "tac[mlx]"        # Apple Silicon acceleration
    pip install "tac[viz]"        # Plotly visualization
    pip install "tac[notebooks]"  # Marimo notebook support

Training a post-filter from Python:

    from tac import Trainer, TrainConfig, build_postfilter

    # Build a 3-layer residual CNN post-filter
    model = build_postfilter("standard", hidden=64)

    # Configure training with QAT, EMA, and best-checkpoint selection
    config = TrainConfig(hidden=64, epochs=1000, alpha=20, tag="my_run")

    # Train against frozen PoseNet + SegNet scorers.
    # comp_pairs, gt_pairs, posenet, segnet, and sal_weights are assumed
    # to have been loaded before this point.
    trainer = Trainer(model, config, device="mps")
    trainer.fit(comp_pairs, gt_pairs, posenet, segnet, sal_weights)

For the closed-loop search side of the library (meta-Lagrangian ranker,
score-band predictor with refusal modes, parallel-dispatch actuator, and
harvest-and-reseed feedback), see examples/quickstart.py.
It runs without a GPU and without comma-challenge data:

    python examples/quickstart.py

Meta-Lagrangian search engine (tac.optimizer.MetaLagrangianSearch):
ranks codec candidates with a Boyd-style multi-constraint Lagrangian,
combining a closed-form distortion proxy, a score-band predictor, and a
5-gate predispatch sanity ladder. Refused candidates sort to the bottom of
the dispatch queue regardless of nominal score; the engine is deterministic,
arch-agnostic, and uses the exact contest score formula.
Predictor with refusal modes (tac.predictor.score_band): predicts a
contest-CUDA score band from rel_err and archive bytes, but refuses when
calibration support is insufficient (insufficient_anchors,
extrapolation, lossy_better_than_lossless_incoherent, ...). Built after
the apogee_int4 8x miss (predicted [0.155, 0.180]; landed 1.4287
[contest-CUDA]) — refusal is the feature, not the bug.
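
To make the refusal idea concrete, here is a small sketch under simplifying assumptions: it predicts from `rel_err` only (the real predictor also uses archive bytes), interpolates linearly between anchors, and widens the estimate by a fixed relative margin. `predict_band`, `BandResult`, `min_anchors`, and `margin` are hypothetical names, not the `tac.predictor` API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BandResult:
    band: Optional[Tuple[float, float]]
    refusal: Optional[str] = None


def predict_band(rel_err, anchors, min_anchors=3, margin=0.1):
    """anchors: (rel_err, measured_score) pairs from real evaluations."""
    if len(anchors) < min_anchors:
        return BandResult(None, "insufficient_anchors")
    pts = sorted(anchors)
    if not pts[0][0] <= rel_err <= pts[-1][0]:
        # Outside the calibrated range: refuse instead of extrapolating.
        return BandResult(None, "extrapolation")
    # Linear interpolation between the two anchors that bracket rel_err.
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= rel_err <= x1:
            t = 0.0 if x1 == x0 else (rel_err - x0) / (x1 - x0)
            mid = y0 + t * (y1 - y0)
            return BandResult((mid * (1 - margin), mid * (1 + margin)))
    raise AssertionError("unreachable: rel_err is inside the anchor range")


# Illustrative anchors, not real measurements.
anchors = [(0.01, 0.20), (0.02, 0.30), (0.04, 0.50)]
inside = predict_band(0.03, anchors)
outside = predict_band(0.10, anchors)
too_few = predict_band(0.03, anchors[:2])
print(inside, outside.refusal, too_few.refusal)
```

Returning a refusal reason instead of a number forces the caller to handle the unsupported case explicitly, which is the behaviour the paragraph above describes.
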
Parallel-dispatch actuator (production wires
tools/parallel_dispatch_top_k.py from the parent comma-lab repo):
concurrent.futures.ThreadPoolExecutor over the existing dispatch wrappers
with per-dispatch and total-cost gating, harvested-JSONL output, and strict
refusal of candidates not marked ready_for_exact_eval_dispatch=true.
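
A rough sketch of cost-gated parallel dispatch follows. It is not the code in tools/parallel_dispatch_top_k.py: the function name, the `est_cost` field, and the skip reasons are assumptions made for illustration. Only `ready_for_exact_eval_dispatch` and the use of `concurrent.futures.ThreadPoolExecutor` come from the description above.

```python
from concurrent.futures import ThreadPoolExecutor


def dispatch_top_k(candidates, run_eval, max_cost_per_dispatch,
                   max_total_cost, max_workers=4):
    """Return (results, skipped); skipped holds (name, reason) pairs."""
    accepted, skipped, committed = [], [], 0.0
    for c in candidates:
        if c.get("ready_for_exact_eval_dispatch") is not True:
            skipped.append((c["name"], "not_ready"))            # strict refusal
        elif c["est_cost"] > max_cost_per_dispatch:
            skipped.append((c["name"], "over_per_dispatch_cap"))
        elif committed + c["est_cost"] > max_total_cost:
            skipped.append((c["name"], "over_total_cap"))
        else:
            committed += c["est_cost"]
            accepted.append(c)
    # Gating happens before any thread starts, so no money is committed
    # to a candidate that would break a cap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_eval, accepted))  # map keeps input order
    return results, skipped


def fake_eval(c):
    # Stand-in for a paid GPU evaluation.
    return {"name": c["name"], "score": 0.25}


queue = [
    {"name": "a", "est_cost": 0.5, "ready_for_exact_eval_dispatch": True},
    {"name": "b", "est_cost": 0.5},
    {"name": "c", "est_cost": 2.0, "ready_for_exact_eval_dispatch": True},
    {"name": "d", "est_cost": 0.8, "ready_for_exact_eval_dispatch": True},
]
results, skipped = dispatch_top_k(queue, fake_eval,
                                  max_cost_per_dispatch=1.0, max_total_cost=1.0)
print(results, skipped)
```
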
Closed-loop feedback (production wires
tools/feedback_loop_sweep.py): rank → fan-out N concurrent paid-GPU
dispatches → harvest [contest-CUDA] rows → cross-verify against
contest_auth_eval.json → append empirical anchors → re-rank → repeat,
gated by --max-cycles, --max-total-cost, --max-cost-per-cycle, and
--convergence-eps.
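
The loop and its four stopping conditions can be sketched as below. The parameter names mirror the CLI flags quoted above, but `feedback_loop`, the `rank` and `dispatch` callables, and the row layout are hypothetical. Cross-verification against contest_auth_eval.json is not modeled, and stopping when a cycle returns no verified rows is a design choice of this sketch.

```python
def feedback_loop(rank, dispatch, anchors, max_cycles, max_total_cost,
                  max_cost_per_cycle, convergence_eps):
    best, spent, cycles = float("inf"), 0.0, 0
    while cycles < max_cycles:
        budget = min(max_cost_per_cycle, max_total_cost - spent)
        if budget <= 0:
            break                                    # total budget exhausted
        rows, cost = dispatch(rank(anchors), budget)
        spent += cost
        cycles += 1
        # Harvest only rows that carry the contest-CUDA tag.
        verified = [r for r in rows if r.get("tag") == "contest-CUDA"]
        if not verified:
            break                                    # nothing trustworthy came back
        anchors.extend(verified)                     # reseed the calibration
        cycle_best = min(r["score"] for r in verified)
        improvement = best - cycle_best
        best = min(best, cycle_best)
        if improvement < convergence_eps:
            break                                    # converged
    return best, spent, cycles


# Toy stand-ins: each cycle costs 1.0 and returns the next canned score.
canned = iter([0.50, 0.30, 0.295, 0.20])


def toy_rank(anchors):
    return ["candidate"]          # a real ranker would use the anchors


def toy_dispatch(queue, budget):
    rows = [{"score": next(canned), "tag": "contest-CUDA"},
            {"score": 0.01, "tag": "proxy"}]         # ignored: wrong tag
    return rows, 1.0


best, spent, cycles = feedback_loop(toy_rank, toy_dispatch, [],
                                    max_cycles=10, max_total_cost=5.0,
                                    max_cost_per_cycle=1.0, convergence_eps=0.01)
print(best, spent, cycles)
```

In this toy run the third cycle improves the best score by only about 0.005, which is below `convergence_eps`, so the loop stops before the fourth canned score is ever requested.
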
Video compression codecs (H.264, AV1) optimize for human viewers, as measured by PSNR, SSIM, and subjective quality. But many downstream consumers are neural networks, not humans. A self-driving car's perception stack does not care about perceptual quality; it cares about whether PoseNet and SegNet produce correct outputs from the decoded frames.
tac bridges this gap:
- Compress video with a standard codec (H.265, AV1)
- Post-filter decoded frames through a tiny learned CNN
- Score using the actual downstream perception models
- Backpropagate through the frozen scorers to train the filter
The post-filter learns artifact corrections that specifically help the downstream models, even if those corrections are invisible to human eyes or make the frames look worse.
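
The core idea, gradients flowing through a frozen scorer into the filter's parameters, can be shown with a deliberately tiny scalar toy. It does not use tac or any deep-learning framework: the "filter" is a single learnable gain, the "scorer" is a fixed linear function standing in for PoseNet/SegNet, and the chain rule is written out by hand. All names and numbers are illustrative assumptions.

```python
SCORER_WEIGHT = 3.0            # frozen: never updated during training


def scorer(x):
    return SCORER_WEIGHT * x   # stand-in for a frozen perception network


def post_filter(x, w):
    return x + w * x           # residual correction with one parameter


def train(decoded, reference, steps=200, lr=0.01):
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, gt in zip(decoded, reference):
            # Error measured in scorer space, not pixel space.
            diff = scorer(post_filter(x, w)) - scorer(gt)
            # Chain rule: d(diff^2)/dw = 2*diff * d(scorer)/dy * d(filter)/dw
            grad += 2.0 * diff * SCORER_WEIGHT * x
        w -= lr * grad / len(decoded)   # only the filter parameter moves
    return w


decoded = [0.8, 1.6, 2.4]      # "codec output": uniformly 20% too dark
reference = [1.0, 2.0, 3.0]    # ground-truth values
w = train(decoded, reference)
print(round(w, 4))
```

The gain that exactly undoes the darkening is 0.25, since 0.8 * 1.25 = 1.0, and the loop converges to it. In the real setting the filter is a CNN, the scorers are neural networks, and an autograd framework computes the same chain of derivatives automatically.
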
tac provides two processing lanes:
- CPU lane: Standard codec + learned post-filter. The post-filter is a 3-layer residual CNN (~390 KB at int8) that runs in real time on CPU.
- GPU lane: Mask extraction + neural rendering. SegNet masks are compressed at extreme ratios, then a neural renderer reconstructs RGB frames from masks alone.
| Module | Purpose |
|---|---|
| tac.architectures | 8 post-filter architectures (Standard, Dilated, PixelShuffle, PSD, ...) |
| tac.training | Trainer with QAT, EMA, SWA, best-checkpoint selection |
| tac.losses | Scorer-aware losses (standard, feature matching, STE) |
| tac.quantization | Int8 quantization (per-channel, FakeQuant STE, LSQ) |
| tac.fp4_quantize | Extreme 4-bit quantization with codebook |
| tac.mask_codec | Mask extraction, AV1/VVC encoding, entropy coding |
| tac.renderer | Neural mask-to-RGB renderer (GPU lane) |
| tac.tto | Test-time optimization at inflation |
| tac.scorer | Scoring formula, sensitivity analysis |
| tac.evaluate | Proxy evaluation, checkpoint averaging |
| tac.profiles | Named training profiles (proven_baseline, smoke, ...) |

Command-line usage:

    # Train a post-filter
    tac lossy train --profile proven_baseline --precomputed data/precomputed

    # Evaluate a checkpoint
    tac lossy eval --checkpoint best_int8.pt --archive test.zip

    # Lossless compression tools
    tac lossless compress input.bin -o output.tac
    tac lossless decompress output.tac -o recovered.bin

License: MIT

This library powers our submission to the comma video compression challenge, PR #107 (apogee, 0.2293 contest-CUDA T4).
Key modules used in the submission:
- `tac.optimizer.MetaLagrangianSearch`: Boyd-style multi-constraint search integrating the predictor, the distortion proxy, and the 5-gate sanity ladder. Refused candidates rank at the bottom of the dispatch queue regardless of nominal score. The engine is deterministic, uses the exact contest score formula (`100*seg + sqrt(10*pose) + 25*archive_bytes/37545489`), and is arch-agnostic (accepts arbitrary calibration anchors).
- `tac.predictor.score_band`: score-band predictor with explicit refusal modes (`insufficient_anchors`, `extrapolation`, `lossy_better_than_lossless`, etc.). It refuses outside its calibration range rather than extrapolating.
- `tac.preflight`: strict-mode preflight checks (50+ structural invariants) that catch dispatch-time hazards before paid GPU spend.
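
The score formula quoted above is easy to evaluate by hand. The sketch below transcribes it directly; the function name is hypothetical and the input values are made up to keep the arithmetic readable, so they are not real measurements.

```python
import math

SIZE_NORMALIZER = 37545489  # constant taken verbatim from the formula above


def contest_score(seg, pose, archive_bytes):
    """100*seg + sqrt(10*pose) + 25*archive_bytes/37545489."""
    return 100 * seg + math.sqrt(10 * pose) + 25 * archive_bytes / SIZE_NORMALIZER


# Illustrative inputs: each of the three terms contributes roughly
# 0.10, 0.10, and 0.25 respectively.
score = contest_score(seg=0.001, pose=0.001, archive_bytes=375455)
print(round(score, 3))
```

Because the pose term sits under a square root, halving the pose distortion shrinks that term by only about 29%, whereas the segmentation and size terms scale linearly.
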
The full research workspace (training scripts, dispatch tooling, ledgers) lives in a separate private repository pending sanitization for OSS release.