# ACT: Product-grade data-efficiency operationalization protocol

Date: 2026-06-17
Scope: GeoSync final product-operational readiness lane
Status: `PROTOCOL_READY / EXECUTION_PENDING / EVIDENCE_REQUIRED`

## 0. Purpose

Convert GeoSync from a verification-first research repository into a product-grade operational data-efficiency system without diluting scientific falsifiability, CI integrity, or provenance. This is not an alpha claim, not a trading-bot claim, and not a physics-law proof. It is an engineering protocol for turning validated market-structure hypotheses into reproducible, measurable, bounded operational data pipelines.

## 1. Current factual constraints

1. The fast-test oracle restoration PR (#1153) is still open. Product-readiness work must treat all prior vacuous-green critical lanes as weakened evidence until replayed under the restored oracle.
2. The revalidation-ledger gate PR (#1167) is open and designed to prevent silent promotion from `PENDING` to `REPLAYED` without `oracle_terminal_sha` and in-repository replay evidence.
3. The X10R reconstruction capsule PR (#1165) is a draft execution contract, not an implemented pipeline.
4. Productization must not promote documentation-only artifacts into runtime guarantees.
5. Market-output claims must remain falsifiable, bounded, and evidence-classed.

## 2. Operational model

```text
ORACLE_RESTORATION
  -> REVALIDATION_LEDGER_REPLAY
  -> EXECUTABLE_X10R_PIPELINE
  -> DATA_EFFICIENCY_METRICS
  -> PRODUCT_API_CONTRACT
  -> OBSERVABILITY_AND_DRIFT_GATES
  -> RELEASE_CANDIDATE_EVIDENCE_BUNDLE
```

Any skipped state is a protocol violation.
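The ordering rule can be checked mechanically. The following is a minimal sketch, assuming the history of finished states is available as a plain list; `first_violation` and the `completed` argument are hypothetical names, not part of GeoSync.

```python
# Ordered protocol states, copied from the operational model above.
STATES = [
    "ORACLE_RESTORATION",
    "REVALIDATION_LEDGER_REPLAY",
    "EXECUTABLE_X10R_PIPELINE",
    "DATA_EFFICIENCY_METRICS",
    "PRODUCT_API_CONTRACT",
    "OBSERVABILITY_AND_DRIFT_GATES",
    "RELEASE_CANDIDATE_EVIDENCE_BUNDLE",
]


def first_violation(completed):
    """Return the first skipped or out-of-order state, or None if the history is valid.

    `completed` is a hypothetical list of state names in the order they finished.
    """
    for expected, actual in zip(STATES, completed):
        if expected != actual:
            return expected
    if len(completed) > len(STATES):
        return "UNKNOWN_EXTRA_STATE"
    return None
```

For example, a history that jumps from `ORACLE_RESTORATION` straight to `EXECUTABLE_X10R_PIPELINE` reports `REVALIDATION_LEDGER_REPLAY` as the violated state.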

## 3. Industrial engineering objective

Deliver a product-level GeoSync lane that can answer, reproducibly and auditably:

- What data was ingested?
- What transformations were applied?
- Which physical / geometric / synchronization operators ran?
- What evidence supports or falsifies the result?
- What is the computational cost per useful signal?
- What is the decision boundary for refusing output?
- What artifact proves the run happened under the correct oracle?

## 4. Data-efficiency target definition

Data efficiency is not model decoration. It is the amount of validated signal output obtained per unit of consumed data and compute.

Required metrics:

```text
data_coverage_ratio
missingness_rate
feature_reuse_ratio
artifact_reproducibility_hash_match
compute_seconds_per_instrument_day
memory_mb_per_1k_nodes
signal_density_per_window
falsifier_pass_rate
oracle_replay_coverage
claim_promotion_error_rate
```

Minimum gate:

```text
artifact_reproducibility_hash_match == 1.0
falsifier_pass_rate == 1.0
claim_promotion_error_rate == 0.0
oracle_replay_coverage >= declared_bound
```
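A minimal sketch of how the minimum gate could be evaluated, assuming the metrics arrive as a flat dictionary keyed by the metric names above. The function name `minimum_gate_failures` is hypothetical, and treating missing or non-finite values as failures is an assumption that follows the fail-closed rules elsewhere in this protocol.

```python
import math

# Exact-match conditions from the minimum gate above.
EXACT_GATES = {
    "artifact_reproducibility_hash_match": 1.0,
    "falsifier_pass_rate": 1.0,
    "claim_promotion_error_rate": 0.0,
}


def _usable(value):
    """True only for finite real numbers (booleans are rejected on purpose)."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )


def minimum_gate_failures(metrics, declared_bound):
    """Return the names of failed gate conditions; an empty list means the gate passes."""
    failures = [
        name
        for name, target in EXACT_GATES.items()
        if not _usable(metrics.get(name)) or metrics[name] != target
    ]
    coverage = metrics.get("oracle_replay_coverage")
    # Fail closed: a missing coverage value or a missing bound counts as a failure.
    if not _usable(declared_bound) or not _usable(coverage) or coverage < declared_bound:
        failures.append("oracle_replay_coverage")
    return failures
```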

## 5. Required implementation lanes

### Lane A — Oracle finalization

- Merge or terminally resolve #1153.
- Preserve terminal CI evidence.
- Record the final oracle SHA.
- No data-product claims are allowed before this state.

### Lane B — Revalidation ledger replay

- Merge #1167 or equivalent gate.
- Flip seeded lanes from `PENDING` only with real replay evidence.
- Each replay entry must include `oracle_terminal_sha`, artifact path, status, residual, and bound.
- Any missing replay evidence keeps the lane non-promotable.
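The replay-entry rule can be sketched as a promotability check. Only `oracle_terminal_sha` is a field name taken from this protocol; the keys `artifact_path`, `status`, `residual`, and `bound`, the function name, and the pass condition `|residual| <= bound` are assumptions for illustration.

```python
from pathlib import Path

REQUIRED_FIELDS = ("oracle_terminal_sha", "artifact_path", "status", "residual", "bound")


def is_promotable(entry, repo_root):
    """Return True only if a ledger entry carries complete in-repository replay evidence."""
    # Fail closed on any missing or empty field.
    if any(entry.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False
    if entry["status"] != "REPLAYED":
        return False
    # The artifact must be a real file located inside the repository.
    root = Path(repo_root).resolve()
    artifact = (root / entry["artifact_path"]).resolve()
    if root not in artifact.parents or not artifact.is_file():
        return False
    # Assumption: a replay passes when the residual stays within its declared bound.
    return abs(entry["residual"]) <= entry["bound"]
```

An entry that is still `PENDING`, lacks a SHA, or points outside the repository stays non-promotable under this sketch.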

### Lane C — X10R executable pipeline

Implement a single-command pipeline:

```bash
python -m geosync.reconstruction.x10r --config configs/x10r.yaml --out artifacts/x10r_reconstruction
```

Required outputs:

```text
manifest.json
input_fingerprint.json
curvature_residuals.json
kuramoto_trajectory.parquet
energy_contracts.json
falsifier_results.json
metrics/data_efficiency.json
reports/product_readiness.md
ro-crate-metadata.json
```

### Lane D — Data contract hardening

- Define schema for market input windows.
- Add synthetic + fixture-backed real-data adapters.
- Fail closed on missing timestamps, non-monotonic series, invalid adjacency, NaN/Inf, unit ambiguity, or empty graph.
- Add golden fixtures and hash-pinned expected outputs.
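A minimal fail-closed validator for a single input window might look like the sketch below. The window layout (parallel `timestamps` and `values` lists, integer node indices, an explicit `unit` string) and the names `validate_window` and `DataContractError` are assumptions; the real schema is what this lane is meant to define.

```python
import math


class DataContractError(ValueError):
    """Raised instead of emitting output when an input window breaks the contract."""


def validate_window(timestamps, values, node_count, edges, unit):
    """Raise DataContractError on the first violated rule; return None if the window is valid."""
    if not timestamps or any(t is None for t in timestamps):
        raise DataContractError("missing timestamps")
    if len(timestamps) != len(values):
        raise DataContractError("timestamp/value length mismatch")
    # Strictly increasing timestamps; duplicates are rejected as well.
    if any(later <= earlier for earlier, later in zip(timestamps, timestamps[1:])):
        raise DataContractError("non-monotonic series")
    if any(not isinstance(v, (int, float)) or not math.isfinite(v) for v in values):
        raise DataContractError("NaN/Inf or non-numeric value")
    if not unit:
        raise DataContractError("unit ambiguity")
    if node_count <= 0 or not edges:
        raise DataContractError("empty graph")
    for u, v in edges:
        # Self-loops and out-of-range node indices count as invalid adjacency here.
        if u == v or not (0 <= u < node_count and 0 <= v < node_count):
            raise DataContractError("invalid adjacency")
```

Raising an exception rather than returning a flag keeps a careless caller from silently continuing with invalid data.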

### Lane E — Product API boundary

Expose read-only deterministic endpoints or CLI commands:

```text
geosync ingest
geosync reconstruct
geosync verify
geosync report
geosync export-capsule
```

No endpoint may imply trade execution or profitability.

### Lane F — Observability / drift / refusal gates

Add operational monitors:

```text
input_distribution_drift
graph_topology_drift
curvature_residual_drift
sync_regime_drift
energy_contract_violation
falsifier_regression
oracle_mismatch
```

Every monitor must have a refusal state.
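As an illustration of a monitor with an explicit refusal state, here is a sketch of `input_distribution_drift` using a simple mean-shift score. The drift statistic, the `max_z` threshold parameter, and the `"OK"` / `"REFUSE"` labels are assumptions chosen for brevity; a production monitor would likely use a richer distributional test.

```python
import math
import statistics


def input_distribution_drift(reference, live, max_z):
    """Return ("OK" or "REFUSE", score), where score is the live-mean shift in reference std units."""
    samples = list(reference) + list(live)
    # Refuse when there is too little data or any non-finite value.
    if len(reference) < 2 or not live or any(not math.isfinite(x) for x in samples):
        return "REFUSE", math.nan
    sigma = statistics.stdev(reference)
    if sigma == 0.0:
        # A constant reference window gives no scale to measure drift against.
        return "REFUSE", math.nan
    score = abs(statistics.fmean(live) - statistics.fmean(reference)) / sigma
    return ("OK" if score <= max_z else "REFUSE"), score
```

The monitor never returns a warning-only state: anything it cannot evaluate is a refusal.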

### Lane G — Release candidate evidence bundle

Create:

```text
artifacts/release_candidate/VERDICT.md
artifacts/release_candidate/MANIFEST.json
artifacts/release_candidate/SHA256SUMS
artifacts/release_candidate/REPRODUCE.md
artifacts/release_candidate/METRICS.json
```

Verdict values:

```text
PRODUCT_READY
RESEARCH_ONLY
BLOCKED_BY_EVIDENCE
BLOCKED_BY_ORACLE
BLOCKED_BY_DATA_CONTRACT
```

## 6. Acceptance criteria

The task is complete only when:

1. #1153 or equivalent oracle fix is terminal and evidence-preserved.
2. Revalidation ledger has no unjustified promotion.
3. X10R pipeline is executable by one command.
4. Artifacts are hash-stable across two independent runs.
5. Falsifiers fire on tampered curvature, wrong Kuramoto scaling, invalid energy sign, and missing oracle evidence.
6. Data-efficiency metrics are emitted and bounded.
7. Product API refuses invalid or under-evidenced output.
8. CI runs unit, integration, physics, governance, and artifact-reproducibility gates.
9. `VERDICT.md` states a bounded status without marketing inflation.
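Criterion 4 can be checked by hashing two run directories and comparing them file by file. This is a sketch under the assumption that each independent run writes to its own directory; `digest_tree` and `unstable_artifacts` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's POSIX-style relative path to its SHA-256 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def unstable_artifacts(run_a, run_b):
    """Return the sorted names of files that differ between, or are missing from, the two runs."""
    a, b = digest_tree(run_a), digest_tree(run_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result means the two runs are byte-identical. Any non-empty result corresponds to the "artifact hashes are unstable" stop rule.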

## 7. Stop rules

Stop immediately if:

- tests run under a vacuous oracle;
- documentation claims exceed executable evidence;
- market-profitability language appears without registered statistical proof;
- replay evidence is missing;
- artifact hashes are unstable;
- invalid data silently produces output;
- any gate downgrades fail-closed behavior to warning-only.

## 8. Final operational verdict target

```text
TARGET_STATE=PRODUCT_GRADE_RESEARCH_OPERATIONS
TRADING_CLAIM=FORBIDDEN
PHYSICS_LAW_CLAIM=FORBIDDEN
DATA_EFFICIENCY=MEASURED_AND_BOUNDED
FALSIFIABILITY=EXECUTABLE
PROVENANCE=HASHED_AND_REPLAYABLE
RELEASE=BLOCKED_UNTIL_EVIDENCE_BUNDLE_GREEN
```

This issue is the execution protocol. Implementation must proceed as PRs bounded by the lanes above, not as one unreviewable mega-commit.

---

## 9. X11R post-critique reconstruction expansion

Status: `POST_CRITIQUE_PROTOCOL_ADDED / EXECUTION_PENDING / EPISTEMIC_EVIDENCE_REQUIRED`

This section extends X10R into X11R: rigorous epistemic reconstruction and falsification stress-test of the curvature-synchronization free-energy manifold under non-stationary market embedding with topological, causal, and uncertainty integrity enforcement.

The objective is not to intensify terminology. The objective is to convert the criticism of the prior ontology into executable research-engineering work: convergence analysis, embedding validation, causal robustness, uncertainty quantification, adversarial perturbation, and peer-review-grade artifact discipline.

### 9.1 Critical ontology constraints

X11R must explicitly preserve these boundaries:

- Kuramoto synchronization is a valid nonlinear-dynamics transfer, but Ricci-coupled Kuramoto requires stability, convergence, and perturbation analysis before any strong dynamical claim.
- Forman/Ollivier-Ricci curvature are valid network-geometric stress operators, but market graphs are not automatically Riemannian manifolds.
- Forman curvature can be unbounded below; any onset threshold such as `-2.0` is policy unless derived, stress-tested, and evidence-classed.
- Gauss-Bonnet is a topological combinatorial integrity gate, not proof of market law.
- Fractal and neuro-symbolic layers are admissible only when falsifiers, uncertainty, and negative evidence are preserved.
- Market systems are adaptive, non-stationary, heteroskedastic, and agent-mediated; therefore all physics transfers must remain heuristic until falsified or bounded by evidence.
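To make the unboundedness point concrete, the sketch below uses the simplest combinatorial form of Forman-Ricci curvature for an unweighted graph with no higher-dimensional faces, where an edge `(u, v)` has curvature `4 - deg(u) - deg(v)`. The weighted and face-augmented variants differ; the helper names are hypothetical.

```python
def forman_curvature(edges):
    """Edge-only Forman-Ricci curvature of an unweighted graph: F(u, v) = 4 - deg(u) - deg(v)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return {(u, v): 4 - degree[u] - degree[v] for u, v in edges}


def star(n_leaves):
    """Star graph: hub node 0 connected to leaves 1..n_leaves."""
    return [(0, leaf) for leaf in range(1, n_leaves + 1)]


# Every edge of a star with n leaves has curvature 3 - n, so the minimum
# falls without bound as the hub degree grows.
for n in (5, 10, 1000):
    print(n, min(forman_curvature(star(n)).values()))
```

Because the value scales with node degree, a fixed cutoff such as `-2.0` is crossed by any sufficiently connected hub regardless of market state, which is why the threshold is classed as policy.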

## 10. X11R required implementation lanes

### Lane H — Graph construction and embedding validation

- Add order-book/trades-ready graph construction contracts for limit order book (LOB), trades, returns, correlation, minimum spanning tree (MST), planar maximally filtered graph (PMFG), and weighted graph families.
- Add embedding distortion metrics: stress, distance preservation, curvature distortion, topology preservation, and sampling-regime metadata.
- Add multifractal frequency assignment path for natural frequencies `omega_i`, including MF-DFA or equivalent bounded estimator.
- Emit `embedding_validation.json` with declared graph family, distortion metrics, admissible interpretation class, and refusal reason if the graph is not geometrically interpretable.
- Fail closed on invalid embedding, undeclared graph family, non-stationary window without regime metadata, or distance metric ambiguity.

### Lane I — Curvature layer with Ricci-flow analysis

- Preserve exact Knill-style Gauss-Bonnet rational residual checks for valid graph families.
- Compute dual Forman/Ollivier curvature where dependencies and graph scale allow.
- Add discrete Ricci-flow simulation lane with explicit convergence criteria, stopping rules, and topological-change logging.
- Track Euler-characteristic preservation or explicitly record topological transitions.
- Add witness families: tree, cycle, complete graph, random graph, scale-free, small-world, empirical graph, and tampered graph.
- Emit `ricci_flow_convergence.json`, `topology_transition_log.json`, and `curvature_policy_bounds.json`.
- Fail closed if curvature residual is non-zero, flow convergence is claimed without criterion, or threshold policy is promoted to theorem.
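The exact-residual check can be sketched with rational arithmetic. In Knill's formulation each clique of dimension `k` (that is, `k + 1` mutually adjacent nodes) contributes `(-1)^k / (k + 1)` to the curvature of each of its vertices, and the curvatures sum to the Euler characteristic of the clique complex. The brute-force clique enumeration below is an assumption suited only to small witness graphs, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def cliques(nodes, edges):
    """Yield every non-empty clique as a tuple (brute force, small graphs only)."""
    adjacent = {frozenset(edge) for edge in edges}
    for size in range(1, len(nodes) + 1):
        found = False
        for subset in combinations(nodes, size):
            if all(frozenset(pair) in adjacent for pair in combinations(subset, 2)):
                found = True
                yield subset
        if not found:
            break  # no clique of this size implies no larger clique


def knill_curvature(nodes, edges):
    """Exact rational vertex curvature: sum of (-1)^dim / (dim + 1) over cliques at each node."""
    curvature = {v: Fraction(0) for v in nodes}
    for clique in cliques(nodes, edges):
        dim = len(clique) - 1
        for v in clique:
            curvature[v] += Fraction((-1) ** dim, dim + 1)
    return curvature


def euler_characteristic(nodes, edges):
    """Alternating count of cliques by dimension."""
    return sum((-1) ** (len(clique) - 1) for clique in cliques(nodes, edges))


def gauss_bonnet_residual(curvature, chi):
    """Exactly zero for untampered curvature values; any other value should fail closed."""
    return sum(curvature.values()) - chi
```

Using `Fraction` avoids floating-point tolerance: a tampered curvature value yields a residual that is exactly non-zero rather than merely "small".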

### Lane J — Coupled Kuramoto on dynamic curvature manifold

- Implement curvature-modulated coupling options: `K_local(kappa)`, `omega_i(kappa)`, and bounded hybrid mode.
- Add Lyapunov/stability proxy tests: linearization spectrum, perturbation decay, phase-slip count, order-parameter convergence, and stochastic-noise sensitivity.
- Add stochastic Kuramoto extension for controlled noise injection, with deterministic seed and reproducible path.
- Track `R(t)`, coherence spectra, phase slips, synchronization transitions, and regime-shift alignment.
- Emit `kuramoto_curvature_coupling.json`, `stability_proxy.json`, `phase_slip_report.json`, and `sync_regime_transitions.parquet`.
- Fail closed on negative coupling, unstable integration, unbounded local K, NaN phase, unseeded stochasticity, or undocumented scaling ownership.
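A minimal seeded integrator for tracking `R(t)` is sketched below, assuming an explicit coupling matrix `K[i][j]` (with no `1/N` normalization) and an Euler-Maruyama step for the noise term. The function names, default step size, and matrix layout are assumptions, not the GeoSync implementation; curvature enters only through whatever values the caller puts in `coupling`.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Kuramoto order parameter R = |mean(exp(i * theta))|, between 0 and 1."""
    return abs(sum(cmath.exp(1j * theta) for theta in phases)) / len(phases)


def simulate(omega, coupling, theta0, dt=0.01, steps=2000, noise=0.0, seed=0):
    """Integrate d(theta_i) = [omega_i + sum_j K_ij sin(theta_j - theta_i)] dt + noise dW.

    Returns the list of R(t) values, one per step plus the initial state.
    """
    n = len(omega)
    # Fail closed on negative or non-finite coupling before integrating.
    if any(k < 0 or not math.isfinite(k) for row in coupling for k in row):
        raise ValueError("negative or unbounded coupling")
    rng = random.Random(seed)  # deterministic seed for a reproducible noise path
    theta = list(theta0)
    trajectory = [order_parameter(theta)]
    for _ in range(steps):
        drift = [
            omega[i] + sum(coupling[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
            for i in range(n)
        ]
        theta = [
            theta[i] + drift[i] * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for i in range(n)
        ]
        if any(not math.isfinite(t) for t in theta):
            raise ValueError("NaN phase")
        trajectory.append(order_parameter(theta))
    return trajectory
```

With identical natural frequencies, positive all-to-all coupling, and initial phases inside a half circle, `R(t)` should approach 1. Repeating a noisy run with the same seed should reproduce the trajectory exactly.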

### Lane K — Free-energy and thermodynamic closure

- Preserve canonical `thermo_free_energy = U - T*S` separately from operational cost/entropy penalties.
- Add boundedness checks via epsilon-cap, delta-growth, Monte-Carlo tail stress, and finite-domain declarations.
- Add transfer-entropy or lagged cross-correlation falsifier for neuro-modulation claims.
- Add explicit units/dimensionless declaration for every energy-like quantity.
- Emit `free_energy_closure.json`, `tail_stress_report.json`, `neuromodulation_falsifier.json`, and `unit_contracts.json`.
- Fail closed on sign ambiguity, unit ambiguity, entropy penalty reversal, unbounded energy trajectory, or positive/flat correlation where inverse invariant is required.

### Lane L — Causal robustness and counterfactual falsification

- Add causal robustness tests: Granger-style predictability tests, synthetic controls, placebo interventions, and counterfactual graph perturbations.
- Add Pearl-style do-intervention terminology only where the implemented test genuinely supports intervention semantics; otherwise mark as observational only.
- Add curvature intervention tests: modify curvature while preserving topology and verify whether synchronization/energy outputs respond as claimed.
- Add regime split validation: train/calibrate on one regime, falsify or bound on another.
- Emit `causal_robustness.json`, `placebo_interventions.json`, `counterfactual_graph_tests.json`, and `regime_split_validation.json`.
- Fail closed on causal language without intervention evidence, correlation-only claims promoted to causal claims, or placebo passing as signal.
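One way to implement a placebo check is a seeded permutation test on a lagged cross-correlation: if shuffling the driver series does not destroy the statistic, the apparent signal is indistinguishable from placebo. This is a sketch under that assumption. It is observational only and does not by itself support intervention semantics. The function names and the choice of statistic are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("constant series")
    return cov / math.sqrt(var_a * var_b)


def lagged_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag], for lag >= 0."""
    return pearson(x[: len(x) - lag], y[lag:])


def placebo_p_value(x, y, lag, n_shuffles=500, seed=0):
    """Share of shuffled-x placebos whose |correlation| reaches the observed value."""
    rng = random.Random(seed)  # seeded so the test is reproducible
    observed = abs(lagged_corr(x, y, lag))
    hits = 0
    for _ in range(n_shuffles):
        shuffled = list(x)
        rng.shuffle(shuffled)
        if abs(lagged_corr(shuffled, y, lag)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_shuffles + 1)
```

A small p-value means the placebos were rejected. A large one means the claim should be demoted or refused rather than promoted.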

### Lane M — Bayesian uncertainty and epistemic ledger

- Add quantitative uncertainty layer for invariants: bootstrap, Bayesian posterior, MCMC, variational approximation, or explicitly bounded deterministic uncertainty where appropriate.
- Add evidence class per claim: `THEOREM`, `EXACT_INVARIANT`, `EMPIRICAL_OBSERVATION`, `HEURISTIC_POLICY`, `FAILED_FALSIFIER`, `NOT_EVIDENCED`.
- Extend contradiction ledger with prior claim, new evidence, falsifier result, posterior/evidence score, and demotion/promotion state.
- Emit `epistemic_ledger.json`, `uncertainty_quantification.json`, `claim_state_transitions.json`, and `negative_evidence.json`.
- Fail closed on claim promotion without evidence, missing uncertainty denominator, deleted negative evidence, or posterior reported without sampling/config provenance.
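Two of the ledger rules can be sketched as small pure functions. The evidence classes are taken from the list above. Which classes require evidence references, the shape of `evidence_refs`, and the definition of the preservation rate as a retained fraction are assumptions for illustration.

```python
EVIDENCE_CLASSES = {
    "THEOREM",
    "EXACT_INVARIANT",
    "EMPIRICAL_OBSERVATION",
    "HEURISTIC_POLICY",
    "FAILED_FALSIFIER",
    "NOT_EVIDENCED",
}

# Assumption: these classes are promoted states and need at least one evidence reference.
NEEDS_EVIDENCE = {"THEOREM", "EXACT_INVARIANT", "EMPIRICAL_OBSERVATION"}


def transition_allowed(to_class, evidence_refs):
    """Reject unknown classes and promotions that carry no evidence references."""
    if to_class not in EVIDENCE_CLASSES:
        return False
    if to_class in NEEDS_EVIDENCE and not evidence_refs:
        return False
    return True


def negative_evidence_preservation_rate(previous_ids, current_ids):
    """Fraction of previously recorded negative-evidence entries still present in the ledger."""
    previous = set(previous_ids)
    if not previous:
        return 1.0  # nothing was recorded, so nothing can have been deleted
    return len(previous & set(current_ids)) / len(previous)
```

Demotions to `FAILED_FALSIFIER` or `NOT_EVIDENCED` are always allowed in this sketch, so the ledger can lose confidence freely but cannot gain it without references.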

### Lane N — Adversarial and non-stationary stress tests

- Add adversarial graph perturbations: edge rewiring, weight shock, node dropout, timestamp shuffle, market-regime injection, heavy-tail noise, and liquidity-gap simulation.
- Add non-stationarity injection fixtures with declared shock windows and expected refusal or bounded degradation behavior.
- Add agent-based synthetic falsifier lane for intentional-agent market dynamics where graph-physics assumptions should weaken.
- Emit `adversarial_graph_stress.json`, `nonstationary_injection_report.json`, and `agent_based_falsifier.json`.
- Fail closed if adversarial perturbations preserve unsupported claims, if timestamp shuffle still produces strong causal claims, or if regime injection does not change uncertainty.

### Lane O — X11R capsule generator

- Extend the single-command interface:

```bash
python -m geosync.reconstruction.x11r \
  --config configs/x11r.yaml \
  --out artifacts/x11r_epistemic_reconstruction
```

- Required X11R outputs:

```text
manifest.json
input_fingerprint.json
embedding_validation.json
curvature_residuals.json
ricci_flow_convergence.json
kuramoto_curvature_coupling.json
stability_proxy.json
free_energy_closure.json
causal_robustness.json
uncertainty_quantification.json
epistemic_ledger.json
adversarial_graph_stress.json
falsifier_results.json
metrics/data_efficiency.json
reports/x11r_epistemic_verdict.md
ro-crate-metadata.json
SHA256SUMS
```

- Fail closed if any required artifact is missing, unhashable, non-reproducible, or not referenced in the manifest.
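The capsule completeness rule can be sketched as follows. The manifest layout `{"artifacts": {relative_path: sha256_hex}}` is an assumption, as is excluding `manifest.json` and `SHA256SUMS` from the self-referenced hash list; `capsule_failures` is a hypothetical name.

```python
import hashlib
import json
from pathlib import Path

# Required X11R outputs that the manifest must reference and hash.
REQUIRED = [
    "input_fingerprint.json", "embedding_validation.json", "curvature_residuals.json",
    "ricci_flow_convergence.json", "kuramoto_curvature_coupling.json",
    "stability_proxy.json", "free_energy_closure.json", "causal_robustness.json",
    "uncertainty_quantification.json", "epistemic_ledger.json",
    "adversarial_graph_stress.json", "falsifier_results.json",
    "metrics/data_efficiency.json", "reports/x11r_epistemic_verdict.md",
    "ro-crate-metadata.json",
]


def capsule_failures(capsule_dir):
    """Return a list of human-readable failures; an empty list means the capsule is complete."""
    root = Path(capsule_dir)
    manifest_path = root / "manifest.json"
    if not manifest_path.is_file():
        return ["manifest.json: missing"]
    try:
        listed = json.loads(manifest_path.read_text())["artifacts"]
    except (ValueError, KeyError, TypeError):
        return ["manifest.json: unreadable"]
    if not isinstance(listed, dict):
        return ["manifest.json: unreadable"]
    failures = []
    if not (root / "SHA256SUMS").is_file():
        failures.append("SHA256SUMS: missing")
    for name in REQUIRED:
        path = root / name
        if not path.is_file():
            failures.append(f"{name}: missing")
        elif name not in listed:
            failures.append(f"{name}: not referenced in manifest")
        elif hashlib.sha256(path.read_bytes()).hexdigest() != listed[name]:
            failures.append(f"{name}: hash mismatch")
    return failures
```

Reproducibility across runs is a separate check; this sketch only covers presence, manifest reference, and hash agreement within one capsule.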

## 11. X11R data-efficiency additions

Add the following metrics to `metrics/data_efficiency.json` or `metrics/x11r_data_efficiency.json`:

```text
embedding_distortion_score
curvature_flow_convergence_rate
stability_proxy_pass_rate
causal_placebo_rejection_rate
counterfactual_sensitivity_ratio
uncertainty_coverage_ratio
negative_evidence_preservation_rate
adversarial_degradation_bound
nonstationary_refusal_accuracy
claim_evidence_alignment_score
```

Minimum gates:

```text
negative_evidence_preservation_rate == 1.0
claim_evidence_alignment_score == 1.0
nonstationary_refusal_accuracy >= declared_bound
causal_placebo_rejection_rate >= declared_bound
uncertainty_coverage_ratio >= declared_bound
```

No X11R verdict may use `PRODUCT_READY` unless all required X10R gates and X11R epistemic gates are green.

## 12. X11R acceptance criteria

The X11R expansion is complete only when:

1. X10R oracle, ledger, replay, reproducibility, and data-efficiency gates are already satisfied or explicitly bounded.
2. Graph embedding validity is quantified and interpretation class is declared.
3. Ricci-flow convergence is tested or explicitly bounded as non-convergent / unsupported.
4. Curvature-modulated Kuramoto coupling has stability proxy evidence.
5. Free-energy contracts preserve unit/sign separation and boundedness.
6. Causal claims are separated from observational claims and protected by placebo/counterfactual tests.
7. Uncertainty is quantified for empirical invariants or explicitly declared as unsupported.
8. Negative evidence is preserved in the ledger.
9. Adversarial and non-stationary stress tests execute and alter verdicts when assumptions break.
10. The X11R capsule is reproducible, hashed, and linked through RO-Crate metadata.

## 13. X11R stop rules

Stop immediately if:

- a market graph is called a Riemannian manifold without embedding evidence;
- a policy threshold is presented as a theorem;
- Kuramoto-Ricci coupling is claimed stable without stability proxy;
- causal language appears without causal test evidence;
- uncertainty is reported without method/config/sample provenance;
- negative evidence is removed or hidden;
- adversarial perturbation does not affect confidence yet claims remain strong;
- non-stationary injection produces unchanged product verdict;
- X11R attempts to bypass X10R oracle and replay gates.

## 14. X11R final verdict target

```text
TARGET_STATE=PRODUCT_GRADE_EPISTEMIC_RESEARCH_OPERATIONS
X10R_DEPENDENCY=REQUIRED
GRAPH_EMBEDDING=QUANTIFIED
CURVATURE_FLOW=TESTED_OR_BOUNDED
KURAMOTO_STABILITY=PROXIED_AND_HASHED
FREE_ENERGY=UNIT_SAFE_AND_BOUNDED
CAUSALITY=SEPARATED_FROM_CORRELATION
UNCERTAINTY=QUANTIFIED_OR_REFUSED
NEGATIVE_EVIDENCE=PRESERVED
ADVERSARIAL_ROBUSTNESS=MEASURED
RELEASE=BLOCKED_UNTIL_X10R_AND_X11R_EVIDENCE_GREEN
```

X11R is not a license to make larger claims. It is a stricter machine for demoting unsupported claims, preserving uncertainty, and forcing every high-status physical metaphor to either become executable evidence or shut up politely.

ACT: Product-grade data-efficiency operationalization protocol #1168
