
Comptextv7

Deterministic operational replay validation for long-horizon AI agents.

Comptextv7 tests whether compact, replay-safe operational state can preserve workflow continuity across compression, reconstruction, and CI-audited replay checks — without LLM judges, embeddings, vector databases, or external APIs.


Live showcase · Demo walkthrough · Benchmark explanation · Replay report


Why this exists

Long-running agents fail when replayed context becomes operationally untrustworthy:

  • constraints disappear;
  • blockers detach from tasks;
  • tool sequences mutate;
  • dependencies collapse;
  • summaries sound fluent but lose actionable state.

Comptextv7 focuses on preserving the state needed to continue work, not preserving raw chat history. The project treats replay as an auditable operational-state problem: extract the fields that matter, compact them, reconstruct them, and verify them with deterministic checks.
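To make that loop concrete, here is a minimal sketch of the extract, compact, reconstruct, and verify round trip. Every name in it (REQUIRED_FIELDS, extract_state, compact, reconstruct, verify) is illustrative, not Comptextv7's actual API, and a real compactor would be lossier than this JSON round trip.

import json

# Illustrative field set; the real extractor defines its own schema.
REQUIRED_FIELDS = ("tasks", "constraints", "blockers", "tool_sequence", "dependencies")

def extract_state(raw_context: dict) -> dict:
    """Keep only the operational fields needed to continue work."""
    return {field: raw_context.get(field, []) for field in REQUIRED_FIELDS}

def compact(state: dict) -> str:
    """Serialize deterministically: sorted keys, fixed separators."""
    return json.dumps(state, sort_keys=True, separators=(",", ":"))

def reconstruct(blob: str) -> dict:
    return json.loads(blob)

def verify(original: dict, replayed: dict) -> bool:
    """Deterministic check: every required field must survive the round trip."""
    return all(original[f] == replayed.get(f) for f in REQUIRED_FIELDS)

raw = {"tasks": ["ship v7"], "constraints": ["no external APIs"],
       "blockers": [], "tool_sequence": ["pytest"], "dependencies": [],
       "chat_noise": "irrelevant history"}
state = extract_state(raw)
assert verify(state, reconstruct(compact(state)))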

Proof at a glance

Evidence                   Current result
Paper replay fixtures      3 dense technical papers
Agent trace fixtures       3 multi-step workflows
Paper avg compression      1.347063
Agent avg compression      1.773954
Paper replay consistency   0.791667
Agent replay consistency   1.000000
Agent operational drift    0.000000
Evaluation mode            deterministic, no LLM judging
Artifact format            committed JSON + CI upload

Sources: artifacts/paper_replay_results.json and artifacts/agent_trace_replay_results.json.
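Both artifacts can be inspected directly. A quick look, assuming only that the files are JSON; the key names inside are whatever the runners emit, so this prints the documents rather than guessing a schema:

import json

for path in ("artifacts/paper_replay_results.json",
             "artifacts/agent_trace_replay_results.json"):
    with open(path) as fh:
        print(path, json.dumps(json.load(fh), indent=2))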

How to read these values

  • Paper replay is lossy under dense technical prose. The current paper fixtures include entities, limitations, sections, and metrics that are harder to preserve after compaction.
  • Agent trace replay is currently near-lossless because traces are structured. The checked-in traces expose explicit tasks, blockers, dependencies, tool order, and recovery actions.
  • 1.000000 replay consistency does not mean solved memory. It means exact preservation under the current structured trace fixtures and current deterministic validator.
  • Operational drift is field loss, not subjective quality. A non-zero drift rate would mean replay lost required operational fields; a minimal sketch of this metric follows the list.
  • Next target is iterative replay degradation: repeatedly compact and replay state until drift curves and collapse points become visible.
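As a concrete reading of the drift metric, here is a minimal field-loss sketch; the function name, field names, and exact definition are illustrative assumptions, not the project's validator.

def drift_rate(original: dict, replayed: dict, required: tuple) -> float:
    """Fraction of required operational fields that failed to survive replay."""
    lost = sum(1 for f in required if replayed.get(f) != original.get(f))
    return lost / len(required)

required = ("tasks", "blockers", "dependencies", "tool_sequence", "recovery_actions")
state = {f: ["x"] for f in required}
assert drift_rate(state, dict(state), required) == 0.0                # lossless replay
assert drift_rate(state, {**state, "blockers": []}, required) == 0.2  # one field lost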

What makes this different

  • Not chat-history storage.
  • Not vector memory.
  • Not model-judged summarization.
  • Not autonomous agent orchestration.
  • Deterministic operational-state replay validation.

Architecture

flowchart LR
    A[Raw Context / Agent Trace] --> B[Operational State Extraction]
    B --> C[Compact Replay State]
    C --> D[Replay Reconstruction]
    D --> E[Deterministic Validation]
    E --> F[CI Artifact]

Comptextv7 turns noisy context into compact operational state, then validates whether replay reconstructs the fields needed to continue work.

Benchmark family

Paper Replay Benchmark

  • Validates: whether dense technical papers preserve entities, limitations, sections, and metrics after compaction and replay.
  • Artifact: artifacts/paper_replay_results.json.
  • Current avg compression: 1.347063.
  • Current replay consistency: 0.791667.
  • Interpretation: paper replay is currently lossy; dense technical prose is a harder surface than structured traces.

Agent Trace Replay Benchmark

  • Validates: whether multi-step agent workflows preserve active tasks, constraints, dependencies, tool sequences, unresolved blockers, deployment requirements, and recovery actions (an illustrative fixture shape follows this list).
  • Artifact: artifacts/agent_trace_replay_results.json.
  • Method: docs/benchmarks/agent_trace_replay.md.
  • Current avg compression: 1.773954.
  • Current replay consistency: 1.000000.
  • Operational drift: 0.000000.
  • Interpretation: current setup is near-lossless because the fixtures are structured; this is a useful baseline, not a universal memory claim.
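For orientation, here is what a structured trace fixture of that shape might look like. The key names and layout are assumptions modeled on the field list above, not the checked-in fixture schema.

trace_fixture = {
    "tasks": [{"id": "T1", "status": "active", "goal": "deploy the service"}],
    "constraints": ["no external APIs", "deterministic outputs only"],
    "dependencies": {"T2": ["T1"]},
    "tool_sequence": ["lint", "test", "build", "deploy"],
    "blockers": [{"task": "T2", "reason": "missing staging credentials"}],
    "deployment_requirements": ["staging sign-off"],
    "recovery_actions": ["retry build with pinned dependencies"],
}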

Complementary adversarial replay stress suite

This suite is a separate long-horizon stress surface under reports/replay_continuity/. It remains useful context, but the README's primary narrative is the deterministic operational replay benchmark family above.

System       Iteration 25   Iteration 50   Iteration 100   Iteration 250
Naive        0.039          0.039          0.043           0.039
Baseline     0.294          0.294          0.294           0.294
Adaptive     0.679          0.476          0.302           0.302
Comptextv7   1.000          0.995          0.824           0.572

The committed 250-iteration report records Comptextv7 mean final continuity at 0.571783, rounded to 0.572 here. Detail fidelity still degrades: hidden truth survival is 0.570173, and evaluator agreement divergence is 0.421743.

System       Approx collapse point
Naive        ~1 iteration
Baseline     ~10 iterations
Adaptive     ~45 iterations
Comptextv7   censored at ~250 iterations in this suite

Visual artifacts

SVG continuity charts for this suite are committed under reports/replay_continuity/.

Integrity model

  • no LLM judging;
  • no embeddings;
  • no external APIs;
  • deterministic JSON artifacts;
  • CI reproducible;
  • audit-friendly.

Limitations

  • Fixtures are curated and checked in.
  • Structured agent traces currently replay near-losslessly.
  • This is not solved AI memory.
  • This is not production telemetry.
  • This is not an autonomous agent framework.
  • Evaluator divergence remains material in the long-horizon stress suite.
  • A stronger iterative degradation benchmark is the next technical milestone.

Next technical milestone

Next: iterative replay degradation. Repeatedly compact and replay operational state to expose drift curves, collapse points, and field-level failure modes under pressure.
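A minimal harness for that milestone could look like the sketch below. round_trip is a deliberately lossless stand-in, so this curve stays flat at zero; plugging a real, lossy compactor into its place is what would expose drift curves and collapse points.

import json

def round_trip(state: dict) -> dict:
    # Lossless stand-in for compact + reconstruct; swap in a real, lossy
    # compactor to observe non-zero drift.
    return json.loads(json.dumps(state, sort_keys=True))

def degradation_curve(state: dict, required: tuple, iterations: int) -> list:
    """Drift per iteration: the fraction of required fields no longer intact."""
    curve, current = [], state
    for _ in range(iterations):
        current = round_trip(current)
        lost = sum(1 for f in required if current.get(f) != state.get(f))
        curve.append(lost / len(required))
    return curve

# The collapse point is the first iteration where drift crosses a chosen threshold.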

Review surfaces

Surface                Link
Live showcase          comptextv7.vercel.app
Demo walkthrough       docs/DEMO_WALKTHROUGH.md
Showcase readiness     docs/SHOWCASE_READINESS.md
Benchmark explanation  docs/BENCHMARK_EXPLANATION.md
Replay report          reports/replay_continuity/validation_report.md
API surface            docs/API_SURFACE.md

Repository map

Comptextv7/
├── artifacts/                  # committed deterministic replay benchmark JSON
├── benchmarks/                 # deterministic compression, replay, and audit runners
├── contracts/                  # machine-readable validation and handoff contracts
├── dashboard/                  # backend plus React operations console
├── docs/                       # benchmark, showcase, and reviewer documentation
├── reports/replay_continuity/  # adversarial continuity metrics and SVG charts
├── scripts/                    # validation, reporting, and artifact tooling
├── src/                        # KVTC engine, audit, and semantic validation modules
├── tests/                      # Python regression and replay validation tests
└── README.md

Safety boundaries

Do not commit:

  • proprietary customer data;
  • secrets, API keys, tokens, cookies, or credentials;
  • raw production logs;
  • unsanitized replay fixtures;
  • private deployment credentials or environment dumps.

Comptextv7 is a deterministic, synthetic-only research prototype for operational replay persistence and reviewable diagnostic infrastructure.

Cloud-first validation

Comptextv7 is biased toward artifact-backed review rather than local machine trust.

Workflow               Role
ci.yml                 Runs deterministic replay, tests, telemetry, and validation gates.
agent-checks.yml       Runs repository, report, and contract checks plus dashboard validation.
validation_runner.yml  Publishes compact cloud validation result artifacts.

Reproducibility

Install the test dependency set:

python -m pip install -e '.[test]'

Regenerate deterministic replay artifacts:

python tests/utils/paper_replay_runner.py
python tests/utils/agent_trace_replay_runner.py
python benchmarks/run_replay_continuity.py --iterations 250 --output-dir reports/replay_continuity

Run focused checks:

pytest tests/test_paper_replay_bench.py tests/test_agent_trace_replay.py tests/test_replay_continuity.py

Run the broader local gate:

python -m pytest
python scripts/validate.py replay
python scripts/validate.py token
python scripts/validate.py forensic
python scripts/validate_contracts.py
python scripts/validate_api_exports.py
