CompText V7 is a deterministic industrial AI transport and audit system for structured technical telemetry. It is not a free-form summarizer and must not be optimized by deleting rare but operationally meaningful evidence. The following invariants are non-negotiable:
- Alarms must never disappear.
- Timestamps are immutable.
- Event ordering is immutable.
- Sparse anomalies always survive routing and replay.
- Reconstruction may never hallucinate events, causes, labels, timestamps, or operators.
- Severity labels may not be softened.
- Causal chains must remain inspectable.
- Deterministic replay is mandatory for release.
- Token accounting must be reproducible for `cl100k_base` and `o200k_base`, or explicitly marked as fallback-regex when `tiktoken` is unavailable. `MAX_ALLOWED_CRITICAL_LOSS = 0` and `MAX_ALLOWED_HIGH_LOSS = 0`.
Compression is allowed only when it preserves audit-critical semantics: event count, timestamp boundaries, severity inventory, code inventory, family signatures, sparse anchors, and deterministic source fingerprints. The KVTC frame is a transport envelope, not proof of semantic equivalence; forensic validation remains authoritative.
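A forensic validation pass over those audit-critical inventories might look like the sketch below. The record keys (`severity`, `code`, `ts`) and finding names are illustrative assumptions; the point is that pass/fail is decided by comparing inventories, never by the presence of a well-formed KVTC frame.

```python
from collections import Counter

def forensic_check(original: list, reconstructed: list) -> list:
    """Compare audit-critical semantics between source events and a
    reconstruction. Returns a list of findings; empty means pass."""
    findings = []
    if len(original) != len(reconstructed):
        findings.append("event-count-drift")
    if Counter(e["severity"] for e in original) != \
       Counter(e["severity"] for e in reconstructed):
        findings.append("severity-inventory-drift")
    if Counter(e["code"] for e in original) != \
       Counter(e["code"] for e in reconstructed):
        findings.append("code-inventory-drift")
    if original and reconstructed:
        bounds = lambda evs: (min(e["ts"] for e in evs),
                              max(e["ts"] for e in evs))
        if bounds(original) != bounds(reconstructed):
            findings.append("timestamp-boundary-drift")
    return findings
```

Because the check compares whole inventories rather than sampled events, a single dropped sparse alarm flips the result.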
Sparse packets carry disproportionate safety risk because a single alarm may be the entire signal. Tiny heterogeneous packets use a deterministic micro-frame when dictionary metadata would dominate. The in-memory audit layers remain available, and replay/forensic checks must prove anomaly survivability.
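The micro-frame decision can be illustrated with a simple overhead comparison. The 2x threshold and the `dict_overhead` constant below are illustrative assumptions, not V7 spec values; what matters is that the choice is a pure function of the payload, so it replays deterministically.

```python
def choose_frame(payload: bytes, dict_overhead: int = 64) -> str:
    """Select a micro-frame when dictionary metadata would dominate
    the payload. Threshold and overhead are illustrative only."""
    if len(payload) < 2 * dict_overhead:
        return "micro-frame"
    return "kvtc-frame"
```

A deterministic, size-only rule means the same sparse packet always takes the same path, so anomaly-survivability checks exercise one code path per fixture rather than a heuristic.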
The following operations are prohibited:
- Dropping alarms, anchors, event IDs, timestamps, or severity labels.
- Reordering events to improve compression.
- Normalizing Unicode in a way that changes token counts or evidence meaning without a drift finding.
- Reconstructing missing facts from priors or model guesses.
- Treating high compression ratio as sufficient evidence of semantic retention.
- Merging stale scaffolding PRs that delete current audit, benchmark, or validation controls.
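The severity-softening rule above can be mechanized as a rank comparison. The rank table is an illustrative assumption (the document does not define the label set); the invariant it encodes is that a reconstruction may never carry a lower rank than the source.

```python
# Illustrative severity scale; the real V7 label set is not defined here.
SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "HIGH": 2, "CRITICAL": 3}

def severity_softened(before: str, after: str) -> bool:
    """True when a label was downgraded, which the rules above forbid.
    Equal or escalated labels are not considered softening."""
    return SEVERITY_RANK[after] < SEVERITY_RANK[before]
```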
All golden corpus records are fixed-order JSON Lines with fixed UTC timestamps. Replay uses deterministic seeds and compares source hashes, compressed hashes, token hashes, and forensic pass/fail state across repeated passes.
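A replay comparison in that style can be sketched as a seeded fingerprint: run the pass twice and require byte-identical hashes. The `seed` parameter and the use of `random.Random` as a stand-in for seeded pipeline state are assumptions for illustration.

```python
import hashlib
import json
import random

def replay_fingerprint(records: list, seed: int = 0) -> str:
    """Hash one replay pass over fixed-order JSON Lines records.
    Repeated passes with the same seed must match exactly."""
    rng = random.Random(seed)  # deterministic seed, per the corpus rules
    h = hashlib.sha256()
    for rec in records:
        # sort_keys makes the serialization canonical and order-stable
        h.update(json.dumps(rec, sort_keys=True).encode())
        h.update(repr(rng.random()).encode())  # stand-in for pipeline state
    return h.hexdigest()
```

Release validation would call this twice with the same seed and reject on any mismatch, which also catches nondeterministic iteration order or serialization drift.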
The datasets are synthetic industrial fixtures. They are suitable for deterministic engineering validation and regression protection, not vendor certification or production incident conclusions.
A release is rejected if hashes drift, critical loss is greater than zero, high loss is greater than zero, anomalies disappear, timestamps mutate, replay output changes, or tokenizer accounting drifts without explicit documented acceptance.
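The rejection conditions above can be collapsed into a single gate. The report keys below are hypothetical names for illustration; the logic mirrors the text: every condition rejects unconditionally except tokenizer drift, which passes only with explicit documented acceptance.

```python
def release_gate(report: dict):
    """Return (approved, reasons) from a validation report.
    Key names are illustrative, not the V7 report schema."""
    reasons = []
    if report.get("hash_drift"):
        reasons.append("hash-drift")
    if report.get("critical_loss", 0) > 0:
        reasons.append("critical-loss")
    if report.get("high_loss", 0) > 0:
        reasons.append("high-loss")
    if report.get("anomalies_missing"):
        reasons.append("anomaly-loss")
    if report.get("timestamps_mutated"):
        reasons.append("timestamp-mutation")
    if report.get("replay_drift"):
        reasons.append("replay-drift")
    # Tokenizer drift is the only condition with a documented-acceptance escape.
    if report.get("tokenizer_drift") and not report.get("drift_acceptance_doc"):
        reasons.append("tokenizer-drift-unaccepted")
    return len(reasons) == 0, reasons
```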