Deterministic operational replay validation for long-horizon AI agents.
Comptextv7 tests whether compact, replay-safe operational state can preserve workflow continuity across compression, reconstruction, and CI-audited replay checks — without LLM judges, embeddings, vector databases, or external APIs.
Live showcase · Demo walkthrough · Benchmark explanation · Replay report
Long-running agents fail when replayed context becomes operationally untrustworthy:
- constraints disappear;
- blockers detach from tasks;
- tool sequences mutate;
- dependencies collapse;
- summaries sound fluent but lose actionable state.
Comptextv7 focuses on preserving the state needed to continue work, not preserving raw chat history. The project treats replay as an auditable operational-state problem: extract the fields that matter, compact them, reconstruct them, and verify them with deterministic checks.
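The extract, compact, reconstruct, verify loop can be sketched in a few lines. This is a minimal illustration, not Comptextv7's actual API; the field names and helper functions are assumptions chosen to mirror the failure modes listed above.

```python
# Minimal sketch of extract -> compact -> verify for operational state.
# Field names and helpers are illustrative, not Comptextv7's real interface.

REQUIRED_FIELDS = ("tasks", "constraints", "blockers", "tool_sequence")

def extract_state(raw_context: dict) -> dict:
    """Keep only the operational fields needed to continue work."""
    return {k: raw_context[k] for k in REQUIRED_FIELDS if k in raw_context}

def compact(state: dict) -> dict:
    """Drop empty fields; a real compactor would also dedupe and normalize."""
    return {k: v for k, v in state.items() if v}

def validate_replay(original: dict, reconstructed: dict) -> bool:
    """Deterministic check: every non-empty required field survives exactly."""
    expected = compact(extract_state(original))
    return all(reconstructed.get(k) == v for k, v in expected.items())

raw = {"tasks": ["ship v7"], "constraints": ["no external APIs"],
       "blockers": [], "tool_sequence": ["lint", "test"], "chit_chat": "..."}
state = compact(extract_state(raw))
assert validate_replay(raw, state)  # lossless under this trivial compactor
```

The point of the sketch is the shape of the check: validation compares concrete field values, so it needs no LLM judge and always returns the same answer for the same inputs.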
| Evidence | Current result |
|---|---|
| Paper replay fixtures | 3 dense technical papers |
| Agent trace fixtures | 3 multi-step workflows |
| Paper avg compression | 1.347063 |
| Agent avg compression | 1.773954 |
| Paper replay consistency | 0.791667 |
| Agent replay consistency | 1.000000 |
| Agent operational drift | 0.000000 |
| Evaluation mode | deterministic, no LLM judging |
| Artifact format | committed JSON + CI upload |
Sources: artifacts/paper_replay_results.json and artifacts/agent_trace_replay_results.json.
- Paper replay is lossy under dense technical prose. The current paper fixtures include entities, limitations, sections, and metrics that are harder to preserve after compaction.
- Agent trace replay is currently near-lossless because traces are structured. The checked-in traces expose explicit tasks, blockers, dependencies, tool order, and recovery actions.
- 1.000000 replay consistency does not mean solved memory. It means exact preservation under the current structured trace fixtures and the current deterministic validator.
- Operational drift is field loss, not subjective quality. A non-zero drift rate would mean replay lost required operational fields.
- Next target is iterative replay degradation. The next milestone is to repeatedly compact and replay state until drift curves and collapse points are visible.
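Because drift is defined as field loss, it reduces to a simple ratio. The sketch below shows one plausible formulation; the field names are illustrative, not the project's actual schema.

```python
# Sketch of operational drift as a field-loss rate.
# 0.0 means every required field survived replay; field names are made up.

def drift_rate(required: set, replayed: dict) -> float:
    """Fraction of required operational fields missing or empty after replay."""
    lost = [f for f in required if not replayed.get(f)]
    return len(lost) / len(required)

required = {"active_tasks", "blockers", "dependencies", "tool_order"}
replayed = {"active_tasks": ["t1"], "blockers": ["b1"],
            "dependencies": ["t1->t2"], "tool_order": ["fetch", "parse"]}
print(drift_rate(required, replayed))  # 0.0: nothing lost
```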
- Not chat-history storage.
- Not vector memory.
- Not model-judged summarization.
- Not autonomous agent orchestration.
- Deterministic operational-state replay validation.
```mermaid
flowchart LR
  A[Raw Context / Agent Trace] --> B[Operational State Extraction]
  B --> C[Compact Replay State]
  C --> D[Replay Reconstruction]
  D --> E[Deterministic Validation]
  E --> F[CI Artifact]
```
Comptextv7 turns noisy context into compact operational state, then validates whether replay reconstructs the fields needed to continue work.
- Validates: whether dense technical paper summaries preserve entities, metrics, limitations, and section structure after deterministic replay compression.
- Artifact: artifacts/paper_replay_results.json
- Method: docs/benchmarks/paper_replay.md
- Current avg compression: 1.347063
- Current replay consistency: 0.791667
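A ratio like 1.347063 can be read as original size over compacted size, averaged across fixtures. The sketch below assumes a character-based size measure and invented fixture sizes; whether the real benchmark counts characters or tokens is not stated here.

```python
# How an average compression ratio could be computed: original size divided
# by compacted size, averaged over fixtures. All sizes below are invented.

def compression_ratio(original_chars: int, compact_chars: int) -> float:
    return original_chars / compact_chars

fixtures = [(5400, 4000), (6100, 4550), (4980, 3700)]  # (original, compact)
ratios = [compression_ratio(o, c) for o, c in fixtures]
avg = sum(ratios) / len(ratios)
print(round(avg, 6))
```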
- Validates: whether multi-step agent workflows preserve active tasks, constraints, dependencies, tool sequences, unresolved blockers, deployment requirements, and recovery actions.
- Artifact: artifacts/agent_trace_replay_results.json
- Method: docs/benchmarks/agent_trace_replay.md
- Current avg compression: 1.773954
- Current replay consistency: 1.000000
- Operational drift: 0.000000
- Interpretation: the current setup is near-lossless because the fixtures are structured; this is a useful baseline, not a universal memory claim.
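A replay-consistency score over structured trace fields can be sketched as an exact, order-sensitive field comparison. The field names mirror the list above but are assumptions, not the benchmark's real schema.

```python
# Sketch of a deterministic replay-consistency score over structured trace
# fields. Field names are illustrative; comparison is exact and order-sensitive.

TRACE_FIELDS = ("active_tasks", "constraints", "dependencies",
                "tool_sequence", "blockers", "recovery_actions")

def replay_consistency(original: dict, replayed: dict) -> float:
    """Fraction of trace fields preserved exactly after replay."""
    kept = sum(1 for f in TRACE_FIELDS if replayed.get(f) == original.get(f))
    return kept / len(TRACE_FIELDS)

trace = {f: [f + "_1"] for f in TRACE_FIELDS}
assert replay_consistency(trace, dict(trace)) == 1.0   # exact preservation
mutated = dict(trace, tool_sequence=["reordered"])
assert replay_consistency(trace, mutated) < 1.0        # mutation detected
```

An order-sensitive comparison is what makes tool-sequence mutation (one of the failure modes listed earlier) detectable at all.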
This suite is a separate long-horizon stress surface under reports/replay_continuity/.
It remains useful context, but the README's focused narrative is the deterministic operational replay benchmark family above.
| System | Iteration 25 | Iteration 50 | Iteration 100 | Iteration 250 |
|---|---|---|---|---|
| Naive | 0.039 | 0.039 | 0.043 | 0.039 |
| Baseline | 0.294 | 0.294 | 0.294 | 0.294 |
| Adaptive | 0.679 | 0.476 | 0.302 | 0.302 |
| Comptextv7 | 1.000 | 0.995 | 0.824 | 0.572 |
The committed 250-iteration report records Comptextv7 mean final continuity at 0.571783, rounded to 0.572 here.
Detail fidelity still degrades: hidden truth survival is 0.570173, and evaluator agreement divergence is 0.421743.
| System | Approx collapse point |
|---|---|
| Naive | ~1 iteration |
| Baseline | ~10 iterations |
| Adaptive | ~45 iterations |
| Comptextv7 | censored at ~250 iterations in this suite |
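The approximate collapse points above can be read off a continuity curve as the first iteration below some threshold; "censored" means the threshold was never crossed within the run. The threshold value and curve shape below are illustrative, not the suite's actual parameters.

```python
# Sketch of collapse-point estimation from a continuity curve: the first
# iteration where continuity drops below a threshold. The 0.1 threshold and
# the example curves are assumptions, not the suite's real configuration.

def collapse_point(curve, threshold=0.1):
    """curve: list of (iteration, continuity) pairs, ascending by iteration."""
    for iteration, continuity in curve:
        if continuity < threshold:
            return iteration
    return None  # censored: no collapse observed within this run

naive = [(1, 0.05), (25, 0.039)]
comptext = [(25, 1.0), (50, 0.995), (100, 0.824), (250, 0.572)]
print(collapse_point(naive))     # 1
print(collapse_point(comptext))  # None -> censored at run length
```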
- replay_degradation_curves.svg
- continuity_half_life_chart.svg
- semantic_drift_graph.svg
- replay_collapse_curves.svg
- evaluator_agreement_divergence.svg
- hidden_constraint_survival_curves.svg
- no LLM judging;
- no embeddings;
- no external APIs;
- deterministic JSON artifacts;
- CI reproducible;
- audit-friendly.
- Fixtures are curated and checked in.
- Structured agent traces currently replay near-losslessly.
- This is not solved AI memory.
- This is not production telemetry.
- This is not an autonomous agent framework.
- Evaluator divergence remains material in the long-horizon stress suite.
- A stronger iterative degradation benchmark is the next technical milestone.
Next: iterative replay degradation. Repeatedly compact and replay operational state to expose drift curves, collapse points, and field-level failure modes under pressure.
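The shape of that milestone can be sketched as a loop that compacts the same state repeatedly and records how much survives per iteration. The lossy_compact helper below is a toy stand-in for a real compactor, included only to make the loop runnable.

```python
# Sketch of an iterative degradation loop: repeatedly compact the same state
# and record survival per iteration. lossy_compact is a toy stand-in.

def lossy_compact(state: dict, keep_ratio: float = 0.9) -> dict:
    """Toy compactor: keeps the leading keep_ratio of each list field."""
    return {k: v[: max(1, int(len(v) * keep_ratio))] for k, v in state.items()}

def degradation_curve(state: dict, iterations: int) -> list:
    baseline = sum(len(v) for v in state.values())
    curve = []
    for _ in range(iterations):
        state = lossy_compact(state)
        survived = sum(len(v) for v in state.values())
        curve.append(survived / baseline)
    return curve

state = {"tasks": list(range(10)), "constraints": list(range(10))}
curve = degradation_curve(state, 5)
assert curve[0] >= curve[-1]  # continuity is non-increasing here
```

Plotting such a curve per field, rather than in aggregate, is what would expose field-level failure modes alongside the collapse point.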
| Surface | Link |
|---|---|
| Live showcase | comptextv7.vercel.app |
| Demo walkthrough | docs/DEMO_WALKTHROUGH.md |
| Showcase readiness | docs/SHOWCASE_READINESS.md |
| Benchmark explanation | docs/BENCHMARK_EXPLANATION.md |
| Replay report | reports/replay_continuity/validation_report.md |
| API surface | docs/API_SURFACE.md |
```text
Comptextv7/
├── artifacts/                  # committed deterministic replay benchmark JSON
├── benchmarks/                 # deterministic compression, replay, and audit runners
├── contracts/                  # machine-readable validation and handoff contracts
├── dashboard/                  # backend plus React operations console
├── docs/                       # benchmark, showcase, and reviewer documentation
├── reports/replay_continuity/  # adversarial continuity metrics and SVG charts
├── scripts/                    # validation, reporting, and artifact tooling
├── src/                        # KVTC engine, audit, and semantic validation modules
├── tests/                      # Python regression and replay validation tests
└── README.md
```
Do not commit:
- proprietary customer data;
- secrets, API keys, tokens, cookies, or credentials;
- raw production logs;
- unsanitized replay fixtures;
- private deployment credentials or environment dumps.
Comptextv7 is a deterministic, synthetic-only research prototype for operational replay persistence and reviewable diagnostic infrastructure.
Comptextv7 favors artifact-backed review over trust in results produced on any single local machine.
| Workflow | Role |
|---|---|
| ci.yml | Runs deterministic replay, tests, telemetry, and validation gates. |
| agent-checks.yml | Runs repository/report/contract checks plus dashboard validation. |
| validation_runner.yml | Publishes compact cloud validation result artifacts. |
Install the test dependency set:

```bash
python -m pip install -e '.[test]'
```

Regenerate deterministic replay artifacts:

```bash
python tests/utils/paper_replay_runner.py
python tests/utils/agent_trace_replay_runner.py
python benchmarks/run_replay_continuity.py --iterations 250 --output-dir reports/replay_continuity
```

Run focused checks:

```bash
pytest tests/test_paper_replay_bench.py tests/test_agent_trace_replay.py tests/test_replay_continuity.py
```

Run the broader local gate:

```bash
python -m pytest
python scripts/validate.py replay
python scripts/validate.py token
python scripts/validate.py forensic
python scripts/validate_contracts.py
python scripts/validate_api_exports.py
```