###################################################################################################################################### # # # ____ _ _ ____ _ __ __ # # | _ \ _ __ _____ ____ _| |__ | | ___ | _ \(_)___ ___ _____ _____ _ __ _ _ | \/ | ___ _ __ ___ ___ _ __ _ _ # # | |_) | '__/ _ \ \ / / _` | '_ \| |/ _ \ | | | | / __|/ __/ _ \ \ / / _ \ '__| | | | | |\/| |/ _ \ '_ ` _ \ / _ \| '__| | | | # # | __/| | | (_) \ V / (_| | |_) | | __/ | |_| | \__ \ (_| (_) \ V / __/ | | |_| | | | | | __/ | | | | | (_) | | | |_| | # # |_| |_| \___/ \_/ \__,_|_.__/|_|\___| |____/|_|___/\___\___/ \_/ \___|_| \__, | |_| |_|\___|_| |_| |_|\___/|_| \__, | # # |___/ |___/ # # # ######################################################################################################################################
Provable Discovery Memory is a schema-first, verification-oriented scientific discovery platform. It turns backend-generated candidates into auditable artifacts with deterministic validation, benchmark evidence, formal proof obligations, and release manifests with checksums.
Inspired by IBM AI-Descartes, this repository extends that research direction into a contract-driven, reproducible engineering platform.
It is designed to separate discovery generation from discovery validation:
- backends generate candidate laws,
- canonical schemas enforce artifact contracts,
- formal emission encodes proof obligations,
- kernels and benchmarks evaluate quality and regressions,
- publish/export flows make outputs reproducible and inspectable.
The long-term vision is to make scientific discovery results portable, verifiable, and durable across teams, toolchains, and time.
In many research stacks, even strong results can be difficult to trust or reuse because:
- the data contract is implicit,
- transformations are hidden inside scripts,
- proof claims are disconnected from formal artifacts,
- benchmark claims are difficult to reproduce,
- release artifacts cannot be independently audited.
Most discovery pipelines fail at handoff time: data contracts remain implicit, transformations stay hidden, and quality claims are difficult to verify independently.
Provable Discovery Memory addresses this by treating discovery as a governed artifact lifecycle:
- canonical schemas define what each artifact means,
- adapters normalize heterogeneous backend outputs into shared contracts,
- formal emission links statements to machine-checkable obligations,
- deterministic kernels and benchmarks provide repeatable quality signals,
- publish manifests with checksums provide offline verifiability.
The goal is not just better model outputs.
The goal is reliable scientific memory: results that can be inspected, re-run, compared, and inherited.
If this architecture matures and achieves broad adoption, it can become a common interoperability layer for theory-guided discovery systems:
- Backend interoperability: compare multiple discovery engines on the same canonical problems and metrics.
- Reproducible science operations: move from ad-hoc experiments to repeatable, auditable release flows.
- Formal traceability: tie confidence claims to explicit proof status rather than narrative summaries.
- Cross-team portability: hand off artifacts across organizations without losing semantics.
- Long-horizon benchmarking: track progress and regressions over time with stable contracts and thresholds.
This creates a path from promising one-off runs to cumulative scientific infrastructure.
- Adapter interoperability: supports AI-Descartes import/export and additional backends (
local_json,http_discovery). - Canonical contract enforcement: validates corpus and artifacts against JSON schemas.
- Formal traceability: emits Lean obligations and tracks proof status.
- Benchmark assurance: evaluates deterministic quality metrics against baseline thresholds.
- Release governance: produces signed-off bundles/manifests with integrity checks.
- Portal export: generates consumable views for downstream UI/reporting.
Backend output
-> adapter normalization
-> canonical artifacts (schema-validated)
-> formal emission (proof obligations + Lean)
-> benchmark report (threshold-gated)
-> publish bundle + release manifest (checksums)
-> portal export
This project spans schemas, adapters, formal methods, benchmarking, and release operations. No single team can sustainably cover the full surface area of domain knowledge, backend diversity, and operational hardening.
Community participation is essential for:
- More backend adapters: broaden real-world applicability beyond current integrations.
- Richer formalization: extend supported statement grammars and proof workflows safely.
- Stronger kernels and benchmarks: improve scientific signal quality and failure detection.
- Domain-specific corpora: contribute canonical problem families with high-quality metadata.
- Operational resilience: harden CI/release workflows and observability for production reliability.
- Documentation quality: keep guidance accurate as architecture and capabilities evolve.
- Keep changes schema-first and contract-preserving.
- Add deterministic tests for new behavior and edge cases.
- Treat proof and benchmark claims as evidence-backed, not aspirational.
- Prefer explicit failure over silent fallback.
- Document assumptions, migration impacts, and operational implications.
- Python 3.11+
uv- Lean toolchain (
elan,lake,lean) - Node 20+ and
pnpm
uv sync --extra dev --locked
pnpm --dir portal install --frozen-lockfileuv run pdm doctor
uv run pdm validate-corpus
uv run ruff check .
uv run mypy pipeline/src adapters kernels
uv run pytest -q
cd formal && lake builduv run pdm import-ai-descartes --src tests/fixtures/ai_descartes/export_golden/kepler_solar --problem-id kepler_solar
uv run pdm export-backend --problem-id kepler_solar --backend ai_descartesuv run pdm emit-lean --problem-id kepler_solar
uv run pdm benchmark --problem-ids kepler_solar,langmuir_single_siteuv run pdm publish --problem-id kepler_solar
uv run pdm portal-export --out portal/public/portal_export.json
pnpm --dir portal buildFor the complete operator runbook (including strict import mode, release checks, and troubleshooting), see:
schemas/- canonical artifact contractscorpus/- canonical problems and suitesadapters/- backend integrations (ai_descartes,local_json,http_discovery)pipeline/- orchestration and CLIformal/- Lean core, generated obligations, and formal toolingkernels/- deterministic quality metricsbenchmarks/- benchmark thresholds and benchmark documentationportal/- UI and exported data consumptionscripts/- certification, governance, and operational checksartifacts/- generated run/release outputs (not source-of-truth code)
Primary quality and release gates include:
- schema validation (
pdm validate-corpus, schema checks), - type/lint/test checks (
mypy,ruff,pytest), - Lean build and generated obligation checks,
- benchmark threshold enforcement (
benchmarks/baseline_thresholds.json), - release governance and checksum verification.
Relevant workflows:
.github/workflows/ci.yml.github/workflows/lean.yml.github/workflows/benchmark.yml.github/workflows/portal.yml
- Main docs index:
docs/README.md - Architecture:
docs/architecture.md - Engineering guardrails:
docs/ENGINEERING_GUARDRAILS.md - Quality gates:
docs/QUALITY_GATES.md - AI-Descartes layouts:
docs/adapters/ai-descartes-layout.md - Formal grammar/CI:
Contributions are welcome across adapters, kernels, formalization, tests, and documentation.
Please read:
Expectations:
- keep changes schema-first and contract-preserving,
- include deterministic tests,
- document behavioral or contract changes,
- prefer explicit failures over silent fallback behavior.
Please report vulnerabilities privately.
All contributors and maintainers are expected to follow:
This project is licensed under Apache-2.0.
- Package:
provable-discovery-memory - Current version:
0.1.0(frompyproject.toml)