Productization Events 121-124: PTSP + Signed Surface backbone + practice visibility layer + 'a way to think' framing #76
Merged
…fier CLI
Event 121 pivots positioning from "cognitive tool" to Compliance Evidence Layer. The reframe survives the Event 119-120 saturation finding: model-output depth lift is structurally hard to demonstrate at frontier model strength; operator Calibration-Lift on irreversible decisions (MIRROR-aligned CFR reduction) is the new load-bearing value claim. EU AI Act Article 12 high-risk obligations apply 2026-08-02, which leaves an 84-day regulatory tailwind window.
What lands here (load-bearing slice; strictly additive, parallel-track):
- core/ptsp/ Provenance-Tagged Step Pipeline countering the self-conditioning effect (arXiv 2509.09677). Typed Fact/Inference/Unknown/Assumption ledgers, Promotion Gate (Invariants I1-I5), JCS canonicalization, typed-tag context injection.
- core/signing/ Cryptographic signing primitives. Zero-hard-dependency Ed25519 compat layer (PyNaCl when available, structurally-tagged HMAC-SHA256 fallback for tests), Signed Reasoning Surface envelope schema with sign + verify, RFC 3161 TSA token shape, Sigstore Rekor inclusion-proof shape. Live TSA/Rekor integration deferred behind operator choice.
- src/episteme/verify/ Standalone auditor CLI with deterministic exit codes (0/10/11/12/13/14/20/21/30/64). Single/batch/chain modes. No runtime dependency on episteme: auditors can ship just this module + Python stdlib.
- docs/PRODUCTIZATION_PLAN.md Phase 3-5 master plan.
- docs/COMPLIANCE_CROSSWALK.md Field-by-field regulatory mapping: EU AI Act (Art. 12/13/14/19/72), NIST AI RMF + GenAI Profile (NIST AI 600-1), Financial-services framework set (SR 11-7, OCC, EBA, MAS, OSFI, FINRA, SEC 17a-4(f)).
Discipline:
- Soak-protected kernel surfaces untouched. reasoning_surface_guard.py and reasoning-surface@1 schema continue to govern the live kernel; the new signed-surface@1.0 schema is parallel and opt-in.
- Loss-averse asymmetry posture: local commit only, no push, no PR, no publish, no Probe 1 outreach, no commercial entity action.
- Marketing copy that cites the 70.3% CFR-reduction number must hedge it as a MIRROR-aligned design target with OSF pre-registration link until the productive-run dataset exists.
Tests: pytest -q → 953 passed, 54 subtests passed, zero regressions.
New: 34 tests across tests/test_ptsp_promotion.py (15), tests/test_signing_canonical_surface.py (9), tests/test_verify_cli.py (10).
Deferred (each with its own future Event + surface):
- Live Sigstore Rekor integration
- CCO dashboard MVP front-end (operator-gated on Probe 1 outcome)
- Phase 2 productive run dataset
- Probe 1 EU AI Governance outreach delivery
- Promotion of signed-surface@1.0 to default-path
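To make the typed-ledger idea in core/ptsp/ concrete, here is a minimal sketch. It is not the repository's code: the names Tag, Entry, Ledger, and PromotionError are hypothetical, and the single rule shown (nothing becomes a Fact without at least one evidence reference) is an assumption standing in for Invariants I1-I5, which this message does not spell out.

```python
from dataclasses import dataclass, field, replace
from enum import Enum
from typing import List, Tuple


class Tag(Enum):
    FACT = "fact"
    INFERENCE = "inference"
    UNKNOWN = "unknown"
    ASSUMPTION = "assumption"


@dataclass(frozen=True)
class Entry:
    claim: str
    tag: Tag
    evidence: Tuple[str, ...] = ()  # hypothetical: references backing the claim


class PromotionError(ValueError):
    """Raised when an entry fails the illustrative promotion rule."""


@dataclass
class Ledger:
    entries: List[Entry] = field(default_factory=list)

    def add(self, claim, tag, evidence=()):
        entry = Entry(claim, tag, tuple(evidence))
        self.entries.append(entry)
        return entry

    def promote(self, index):
        """Retag the entry at `index` as FACT, or raise PromotionError."""
        entry = self.entries[index]
        if entry.tag is Tag.FACT:
            return entry
        # Assumed gate rule: promotion requires at least one evidence reference.
        if not entry.evidence:
            raise PromotionError(f"no evidence for: {entry.claim!r}")
        promoted = replace(entry, tag=Tag.FACT)
        self.entries[index] = promoted
        return promoted

    def render_context(self):
        """Typed-tag context injection: one tagged line per entry."""
        return "\n".join(f"[{e.tag.value.upper()}] {e.claim}" for e in self.entries)
```

The point of the gate is that a later step only ever sees a claim labeled FACT if something outside the model's own earlier output backs it, which is one plausible way to counter self-conditioning.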
Mid-Event 122 operator flagged two corrections that reshaped the build:
1. LangSmith/Langfuse adapter was the wrong category (observability ≠
substrate). Pivoted Task #15 to Hermes signed-surface bridge — peer
substrate to Claude Code per pyproject keywords.
2. Broader doc reconsideration. Audit returned 5 category-error classes
in PRODUCTIZATION_PLAN.md; all 5 corrected in this Event:
- audience misallocation (CCO weighted higher than operator)
- positioning treated as validated (Compliance Evidence Layer was
a hypothesis, not a measurement)
- substrate-neutrality erosion (Skills Marketplace = Claude-only;
LangSmith = observability)
- commercialization assumption (Day-90 forced commercial-or-die)
- hedging dishonesty (70.3% with OSF-link that doesn't exist)
Empirical anchor verified mid-Event: MIRROR benchmark (arXiv 2604.19809)
finding — "providing models with their own calibration scores produces
no significant improvement; only architectural constraint is effective."
External constraint reduces LLM CFR from 0.60 to 0.14 across 5 frontier
models. This is the load-bearing rationale for episteme's structural
mechanism over procedural prompting.
Framing rewrite (docs):
- docs/PRODUCTIZATION_PLAN.md full rewrite: rationale chain (Korean
essay → MIRROR → architectural constraint → each artifact's role);
three positioning hypotheses (Compliance / Operator Audit Trail /
Pre-Action Reasoning Commitment) under structured probes, none
declared validated; four-outcome Day-90 matrix (added operator-first
OSS sustain); layer diagram replacing differentiation matrix; honest
substrate-coverage table (Claude full / Hermes partial / Codex name-
only / Cursor name-only / opencode name-only)
- docs/COMPLIANCE_CROSSWALK.md reframed preamble as downstream
structural mapping, not primary positioning
Practical realization layer (additive build):
- src/episteme/surface/ operator UX — author / sign / show / list /
status / verify (already committed in chkpt 92252fb)
- src/episteme/evidence/ auditor viewer + Regulator Evidence Packet
exporter — posture / register / show / alerts / packet build
(already committed in chkpt 92252fb)
- src/episteme/hooks/signed_surface_validator.py opt-in PreToolUse
hook running additively alongside reasoning_surface_guard.py
(already committed in chkpt 92252fb)
- src/episteme/adapters/hermes.py extended with signed-surface bridge
— SIGNED_SURFACE_PROTOCOL.md + schema reference + governance
addendum to ~/.hermes/ for substrate parity
- src/episteme/cli.py narrow Edit at entry point — top-level
surface/evidence/verify subcommands pre-dispatched before main
argparse so submodule flags pass through cleanly
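The pre-dispatch pattern in the last item can be sketched as follows. This is an illustration under assumptions, not the contents of src/episteme/cli.py: the handler functions, the PRE_DISPATCH table, the calls list, and the --version flag are all hypothetical.

```python
import argparse
import sys

calls = []  # hypothetical: records what each stand-in handler received


def _make_handler(name):
    """Build a stand-in for a submodule's own main(argv)."""
    def handler(argv):
        calls.append((name, list(argv)))
        return 0
    return handler


# Subcommands handed off before the main parser ever sees argv.
PRE_DISPATCH = {
    "surface": _make_handler("surface"),
    "evidence": _make_handler("evidence"),
    "verify": _make_handler("verify"),
}


def main(argv=None):
    argv = list(sys.argv[1:] if argv is None else argv)
    # Pre-dispatch: pass the remaining arguments through untouched, so the
    # top-level parser cannot reject flags that only the submodule knows.
    if argv and argv[0] in PRE_DISPATCH:
        return PRE_DISPATCH[argv[0]](argv[1:])
    parser = argparse.ArgumentParser(prog="episteme")
    parser.add_argument("--version", action="store_true")
    args = parser.parse_args(argv)
    if args.version:
        print("episteme (sketch)")
    return 0
```

With this shape, main(["surface", "sign", "--key", "k.pem"]) reaches the surface handler with ["sign", "--key", "k.pem"] intact, whereas routing the same argv through the top-level parser would fail on the unknown --key flag.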
Auxiliary docs:
- docs/HOW_TO_AUTHOR_SIGNED_SURFACE.md developer-facing walkthrough
- docs/HOW_TO_VERIFY_EVIDENCE_PACKET.md auditor-facing walkthrough
- docs/LIVE_REKOR_DECISION.md operator decision matrix: Sigstore public vs self-hosted vs hybrid vs none
- docs/OSF_PRE_REGISTRATION_DRAFT.md Phase 2 trial pre-reg draft ready for OSF submission
- docs/MARKETING_COPY_DRAFT.md three positioning candidate copies (A/B/C); none landed in README
Tests: pytest -q → 986 passed, 54 subtests, zero regressions.
New (33 tests, 4 files):
- tests/test_surface_cli.py 11
- tests/test_evidence_cli.py 7
- tests/test_signed_surface_validator_hook.py 9
- tests/test_e2e_evidence_pipeline.py 6 (incl. Hermes-bridge artifact)
Dropped in Event 122 (per framing audit):
- Skills Marketplace bundle task — Claude-only distribution; premature
- LangSmith / Langfuse adapters — observability not substrate;
deferred until Probe 1 signal arrives
Discipline:
- Kernel zero-dep posture preserved (PyNaCl optional;
test-mode HMAC structurally tagged)
- Soak-protected core/hooks/reasoning_surface_guard.py untouched
- Local commit only; no push, no PR, no publish, no Probe outreach
delivery, no OSF submission, no commercial entity action
- No marketing copy cites the 70.3% number bare; all citations point
at arXiv 2604.19809 directly
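One way to read "test-mode HMAC structurally tagged" is that the fallback signature names its own algorithm inside the envelope, so a verifier can never mistake it for a production Ed25519 signature. The sketch below assumes that reading. The TEST_ALG string, the envelope layout, and the function names are hypothetical, and the canonicalization is only an approximation of JCS (RFC 8785) that holds for bodies made of strings, integers, booleans, lists, and objects.

```python
import hashlib
import hmac
import json

TEST_ALG = "HMAC-SHA256-TEST"  # hypothetical tag marking a non-production signature


def canonicalize(body):
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sign_test_mode(body, key):
    """Wrap `body` in an envelope carrying a tagged HMAC-SHA256 signature."""
    digest = hmac.new(key, canonicalize(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": {"alg": TEST_ALG, "value": digest}}


def verify_test_mode(envelope, key):
    """Return True only for an untampered envelope with the test-mode tag."""
    sig = envelope.get("signature", {})
    if sig.get("alg") != TEST_ALG:
        return False  # anything else must go through a different verifier
    expected = hmac.new(key, canonicalize(envelope["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(sig.get("value", "")))
```

Because the signature is computed over the canonical bytes rather than the dict as written, two bodies that differ only in key order verify identically, while any change to a value breaks verification.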
The deeper framing correction. Event 122 fixed 5 category-error classes but
still described episteme as a forcing function / artifact / compliance evidence
layer. The operator's deeper correction:
"nah bro... it should be fucking the way to think like nigga... put this somewhere."
"Damn it, fix all of it. Think through what we need to do and how this can best be framed, then implement it."
The product is a way to think (생각의 틀, Korean for "frame of thought"): the five-stage cognitive practice
(Frame → Decompose → Execute → Verify → Handoff) authored in
core/memory/global/cognitive_profile.md + workflow_policy.md. The signed
Reasoning Surface, the typed PTSP ledgers, the pre-tool-use gate, the
standalone verifier, the Regulator Evidence Packet are scaffolding for the
practice and residue from it. Without the practice they are theater. With the
practice they are how the practice survives at frontier model strength.
Landed (additive only, no code changes):
- docs/THE_WAY_TO_THINK.md NEW primary identity doc (~2400 words).
Operationalized index of the practice. Every claim traces to
operator-authored cognitive_profile.md / workflow_policy.md, the
foundational mental models (Kahneman / Dalio / Boyd / Munger), or
external research (MIRROR arXiv 2604.19809; long-horizon arXiv 2509.09677).
Names which artifact implements which cognitive move.
- README.md hero rewrite + section header rewrite. Lead with 생각의 틀 /
"a way to think." Deep content (ABCD blueprints, protocol synthesis,
install) preserved unchanged.
- README.ko.md parity translation of the new hero + section header.
생각의 틀 kept as operator-coined load-bearing phrase.
- docs/PRODUCTIZATION_PLAN.md § 0 rewrite. New § 0.1 ("the thing itself")
points at THE_WAY_TO_THINK.md as primary identity. Compliance / packet /
signed surface demoted to consequences of the practice. § 0.5 expanded
with "not a prompt template" + "not an AI safety system" rows.
- docs/COMPLIANCE_CROSSWALK.md preamble. Explicitly framed as residue of
the practice. Structural fit with AI Act / NIST / FS-framework
obligations is consequence-of-being-right, not goal.
- docs/MARKETING_COPY_DRAFT.md preamble + three positioning headlines.
Three positionings reframed as three audience-facing surfaces of ONE
practice, not three separate identities. Each headline leads with the
practice.
- docs/PLAN.md / docs/PROGRESS.md / docs/NEXT_STEPS.md synced with
Event 123 entries.
Deferred to follow-up Events (operator-gated):
- README.es.md + README.zh.md i18n parity (mechanical translation; batch
under operator review after EN + KO confirmed)
- kernel/ DESIGN_V1_0_*.md and adjacent docs cross-reference to
THE_WAY_TO_THINK.md
- web/ + epistemekernel.com landing-page rewrite (irreversible production
deploy; operator-gated)
Tests: pytest -q → 986 passed, 54 subtests, zero regressions. No code
changes; the 986-test suite is correct as enforcement geometry for the
practice and continues to pass without intervention.
Discipline:
- Code untouched. Kernel-protected surfaces untouched.
- Local commit only; no push.
- No AI co-author trailer.
- Every claim in THE_WAY_TO_THINK.md traces to a named source.
Operator authorized overnight autonomous continuation with three asks:
(1) verify code reflects the Event 123 "way to think" framing, (2) design
UX "high-grade unique and useful (visually, making it easier to understand
and use)," (3) realize the product as a real thing that can be created.
Subsequently lifted the loss-averse irreversible gate conditionally: "it
can be irreversible if that is the right direction for us."
What landed (additive only; soak-protected kernel untouched):
- core/practice/cognitive_moves.py source-of-truth registry of 5 stages
× N cognitive moves with name + description + System-1 failure
counter + schema-field mapping + doc anchor. Referenced by hook
errors + practice CLI + quality observer.
- core/practice/quality.py observe_surface() / observe_surfaces() —
gap observations against cognitive-move discipline. Severity:
critical / warn / advisory / info. NOT a single-score grade
(anti-gaming discipline; would induce optimizing for the score
rather than the practice).
- src/episteme/_ui.py zero-dep ANSI primitives — boxes, colored
headers, health indicators (● green/yellow/red, ASCII fallback
[+]/[~]/[!]), sparklines (Unicode-block ASCII fallback), progress,
kv-table. TTY + NO_COLOR + EPISTEME_NO_RICH detection. Stdlib only;
kernel zero-dep posture preserved.
- src/episteme/practice/ episteme practice walk | retro | demo
subcommand group. walk = narrated 5-stage walkthrough with each
cognitive move + System-1 counter. demo = worked-example surface
body (narrated or JSON-only; JSON output validates against the
surface builder so it's surface-sign-able). retro = practice
retrospective with gap observations over time window.
- src/episteme/cli.py episteme practice registered at top-level
(pre-argparse-dispatch pattern matching surface/evidence/verify).
- src/episteme/hooks/signed_surface_validator.py hook error JSON
now includes a cognitive_move block (move_id + name + stage +
counters + doc_anchor). Exit codes unchanged; the model + operator
can now read failures as named cognitive-move violations rather
than schema-field violations.
- src/episteme/surface/_cli.py (interactive path only) each prompt
rendered with cognitive-move name + System-1 failure-counter
preamble. Brief practice-quality preview after authoring shows
which gaps episteme practice retro will surface. Non-interactive
flags structurally untouched.
- src/episteme/evidence/_viewer.py upgraded posture panel: boxed
sections with health indicators (signed % / chain breaks /
test-mode-sig count colored green/yellow/red by threshold). JSON
output unchanged for scripting.
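A registry of the kind described for core/practice/cognitive_moves.py might look roughly like this. The field list follows the message (name, description, System-1 failure counter, schema-field mapping, doc anchor); everything else is an assumption, including the two example moves, their ids, the schema field names, and the anchor strings.

```python
from dataclasses import dataclass

STAGES = ("Frame", "Decompose", "Execute", "Verify", "Handoff")


@dataclass(frozen=True)
class CognitiveMove:
    move_id: str
    name: str
    stage: str
    description: str
    system1_counter: str  # the fast-thinking failure this move counters
    schema_field: str     # surface field where the move leaves its residue
    doc_anchor: str


# Hypothetical entries for illustration only.
REGISTRY = (
    CognitiveMove(
        move_id="frame.core_question",
        name="Name the core question",
        stage="Frame",
        description="State the one question this decision must answer.",
        system1_counter="question substitution",
        schema_field="core_question",
        doc_anchor="docs/THE_WAY_TO_THINK.md#frame",
    ),
    CognitiveMove(
        move_id="frame.unknowns",
        name="List what is not known",
        stage="Frame",
        description="Separate what is known from what is merely assumed.",
        system1_counter="what-you-see-is-all-there-is",
        schema_field="unknowns",
        doc_anchor="docs/THE_WAY_TO_THINK.md#frame",
    ),
)


def moves_for_stage(stage):
    return [m for m in REGISTRY if m.stage == stage]


def move_for_field(schema_field):
    """Look up the move behind a schema field, or None if unmapped."""
    return next((m for m in REGISTRY if m.schema_field == schema_field), None)


def check_registry():
    """Return a list of consistency problems; an empty list means consistent."""
    problems = []
    seen = set()
    for m in REGISTRY:
        if m.stage not in STAGES:
            problems.append(f"{m.move_id}: unknown stage {m.stage!r}")
        if m.move_id in seen:
            problems.append(f"{m.move_id}: duplicate id")
        if not m.system1_counter:
            problems.append(f"{m.move_id}: missing System-1 counter")
        seen.add(m.move_id)
    return problems
```

Keeping the mapping in one place is what lets hook errors, the practice CLI, and the quality observer name the same move for the same schema field.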
Tests: pytest -q → 1050 passed, 54 subtests, zero regressions.
(Was 986 baseline + 64 new across 5 files.)
- tests/test_practice_cognitive_moves.py 14 tests: registry
consistency, helper fns, doc anchors, every move has a named
System-1 counter
- tests/test_practice_quality.py 11 tests: observation severity
levels, retrospective aggregation, scenario-specific gap codes
- tests/test_ui_rendering.py 24 tests: env-var detection, color
forcing, health indicators (regular + inverse), boxes, headers,
sparklines, progress, kv-table, Renderer dataclass
- tests/test_practice_cli.py 11 tests: walk names all 5 stages +
foundational models + source docs; demo narrated/JSON modes;
demo JSON is validate_surface_body-valid; retro empty + populated;
top-level CLI dispatch
- tests/test_hook_cognitive_move_messages.py 4 tests: hook errors
carry correct cognitive_move metadata; hook still returns exit 2
(Claude Code block contract)
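The hook contract tested above (a blocking exit code of 2 plus an error payload that carries a cognitive_move block) can be sketched as below. The cognitive_move keys follow the message (move_id, name, stage, counters, doc_anchor); the outer payload keys, the function names, the choice of stderr, and the example move are assumptions for illustration.

```python
import json
import sys

BLOCK_EXIT_CODE = 2  # exit status the message describes as the Claude Code block contract


def build_hook_error(field_name, message, move):
    """Attach cognitive-move metadata to a schema-field failure."""
    return {
        "error": message,
        "field": field_name,
        "cognitive_move": {
            "move_id": move["move_id"],
            "name": move["name"],
            "stage": move["stage"],
            "counters": move["counters"],
            "doc_anchor": move["doc_anchor"],
        },
    }


def emit_block(payload, stream=None):
    """Write the payload as one JSON line and return the blocking exit code."""
    stream = sys.stderr if stream is None else stream
    stream.write(json.dumps(payload, ensure_ascii=False) + "\n")
    return BLOCK_EXIT_CODE


# Hypothetical move record; a real hook would presumably look this up in the registry.
EXAMPLE_MOVE = {
    "move_id": "frame.unknowns",
    "name": "List what is not known",
    "stage": "Frame",
    "counters": "what-you-see-is-all-there-is",
    "doc_anchor": "docs/THE_WAY_TO_THINK.md#frame",
}
```

A hook entry point would end with something like sys.exit(emit_block(build_hook_error("unknowns", "field is empty", EXAMPLE_MOVE))), so the reader of the failure sees a named move rather than only a field name.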
Discipline:
- Kernel zero-dep posture preserved
- Soak-protected core/hooks/reasoning_surface_guard.py + kernel/
docs untouched
- Practice quality scoring exposed ONLY via practice retro
(anti-gaming discipline; not numeric score)
- No AI co-author trailer
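The "observations, not a score" discipline can be illustrated with a small sketch. The function names echo observe_surface() / observe_surfaces() from the message, but the signatures, the surface field names, the gap codes, and the checks themselves are assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List

SEVERITIES = ("critical", "warn", "advisory", "info")


@dataclass(frozen=True)
class Observation:
    code: str
    severity: str
    message: str


def observe_surface(body: dict) -> List[Observation]:
    """Return gap observations for one surface body (hypothetical checks)."""
    found = []
    if not body.get("unknowns"):
        found.append(Observation("missing_unknowns", "critical",
                                 "No unknowns listed; the Frame stage looks skipped."))
    if len(str(body.get("disconfirmation", ""))) < 15:
        found.append(Observation("thin_disconfirmation", "warn",
                                 "Disconfirmation condition is absent or too short to test."))
    return found


def observe_surfaces(bodies: Iterable[dict]) -> Dict[str, int]:
    """Aggregate by gap code; deliberately no single numeric grade."""
    counts = Counter()
    for body in bodies:
        counts.update(obs.code for obs in observe_surface(body))
    return dict(counts)
```

Returning named gaps per code keeps the retrospective pointed at what to practice next; collapsing them into one number would give the operator something to optimize instead.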
Two docs contained operationally sensitive material that shouldn't live
in the public repo:
- docs/MARKETING_COPY_DRAFT.md positioning hypotheses currently
under Probe testing + literal cold-outreach script + probe
deployment table mapping copy → channel → audience. Public exposure
would let target audiences see the framing experiment being run.
- docs/PRODUCTIZATION_PLAN.md mixed: § 0 rationale + Phase 3-4
technical scope (public-appropriate) BUT Phase 5 GTM probes
(Probe 1 cold-outreach contact targeting + literal email template
+ Day-90 commercial-spin-off matrix + anti-self-deception
protocol — sensitive). Moved whole-file private; operator can pull
a public-implementation-summary back later if desired.
Both moved to ~/episteme-private/docs/ matching the existing
fully-private pattern of cp-*.md / POSTURE.md / NARRATIVE.md /
DECISION_STORY.md / etc. (No symlink stubs — those are reserved for
PLAN.md / PROGRESS.md / NEXT_STEPS.md which need public placeholders
for the hook chain to find authoritative docs.)
Cross-references updated:
- docs/THE_WAY_TO_THINK.md (2 refs) — replaced direct link with
honest "GTM strategy held in operator's private notes" phrasing
- docs/HOW_TO_VERIFY_EVIDENCE_PACKET.md (2 refs) — replaced with
inline descriptions that don't depend on the private docs
- docs/OSF_PRE_REGISTRATION_DRAFT.md (1 ref) — submission checklist
item updated to point at private notes
No code changes; pytest suite unaffected.
Summary
This PR consolidates four Events (121-124) of the productization cycle that opened after Events 119-120 closed the per-task A/B depth-measurement path. All work is additive only; the soak-protected core/hooks/reasoning_surface_guard.py hot path and reasoning-surface@1 schema are untouched.
Primary identity reframe (Event 123): episteme is a way to think (생각의 틀), operationalized at the file system level. The five-stage cognitive practice (Frame → Decompose → Execute → Verify → Handoff) authored in core/memory/global/cognitive_profile.md + workflow_policy.md is the product. The signed Reasoning Surface, typed PTSP ledgers, pre-tool-use gate, standalone verifier, and Regulator Evidence Packet are scaffolding for the practice and residue from it. See docs/THE_WAY_TO_THINK.md.
Empirical anchor: the MIRROR benchmark (arXiv 2604.19809) finds that "providing models with their own calibration scores produces no significant improvement; only architectural constraint is effective." External constraint reduces LLM Confident Failure Rate from 0.60 to 0.14 across 5 frontier models. episteme is that external constraint at the operator decision layer.
What lands by Event
Event 121: Phase 3 backbone (6b90618)
- core/ptsp/ Provenance-Tagged Step Pipeline (typed Fact/Inference/Unknown/Assumption ledgers + Promotion Gate, Invariants I1-I5); counters arXiv 2509.09677 self-conditioning
- core/signing/ Ed25519 signing + JCS canonicalization + RFC 3161 TSA shape + Sigstore Rekor inclusion-proof shape + canonical-surface envelope; zero-dep with HMAC-SHA256 fallback structurally distinguishable from production Ed25519
- src/episteme/verify/ standalone auditor verifier CLI, deterministic exit codes 0/10/11/12/13/14/20/21/30/64
- docs/PRODUCTIZATION_PLAN.md + docs/COMPLIANCE_CROSSWALK.md
Event 122: Practical realization + framing audit (b47dbfc)
- src/episteme/surface/ operator UX CLI (episteme surface author / sign / show / list / status / verify)
- src/episteme/evidence/ terminal-first viewer + Regulator Evidence Packet ZIP exporter (episteme evidence posture / register / show / alerts / packet build)
- src/episteme/hooks/signed_surface_validator.py opt-in PreToolUse hook running additively alongside reasoning_surface_guard.py
- src/episteme/adapters/hermes.py Hermes substrate bridge for signed-surface@1.0
- pyproject [signing] PyNaCl optional dependency
- Top-level CLI dispatch (episteme surface | evidence | verify)
Event 123: The way to think framing (1477fb4)
- docs/THE_WAY_TO_THINK.md primary identity doc (~2400 words). Every claim traces to operator-authored cognitive_profile.md / workflow_policy.md, the foundational mental models (Kahneman / Dalio / Boyd / Munger), or external research (MIRROR + long-horizon papers)
Event 124: Practice visibility layer (772a5ce)
- core/practice/cognitive_moves.py source-of-truth registry: 5 stages × N cognitive moves, each with named System-1 failure counter + schema-field mapping + doc anchor
- core/practice/quality.py observe_surface() / observe_surfaces() returning gap observations (not a single-score grade; anti-gaming discipline)
- src/episteme/_ui.py zero-dep ANSI primitives (boxes, health indicators, sparklines, kv-table) with TTY + NO_COLOR + EPISTEME_NO_RICH detection
- src/episteme/practice/ episteme practice walk | retro | demo subcommand group
- Hook errors now carry cognitive_move metadata (id + name + stage + counters + doc_anchor)
- episteme surface author --interactive prompts now render with cognitive-move-name preamble + System-1 counter; brief practice-quality preview after authoring
- episteme evidence posture panel upgraded with health indicators (green / yellow / red on signed % / chain breaks / test-mode signatures)
Tests
pytest -q → 1050 passed, 54 subtests, zero regressions across all four Events.
Discipline preserved
- dependencies = [] zero-dep posture preserved (PyNaCl is [signing] extra; _ui.py is stdlib-only)
- Soak-protected core/hooks/reasoning_surface_guard.py + kernel/ docs untouched
- Practice quality exposed only via episteme practice retro (anti-gaming discipline)
What this PR does NOT include
- OSF pre-registration submission (docs/OSF_PRE_REGISTRATION_DRAFT.md; gated on Phase 2 recruitment)
- Promotion of signed-surface@1.0 to default kernel path (gated on <100ms hot-path timing)
Try it (60 seconds)
Test plan
- pytest -q with zero regressions on the existing 986-test baseline (now 1050)
- episteme practice walk names all 5 stages + cites operator-authored source docs
- episteme practice demo --format json produces a body that validates against validate_surface_body() (i.e., it's surface sign-able)
- EPISTEME_NO_RICH=1 falls back to plain ASCII; NO_COLOR respected per https://no-color.org
- episteme verify round-trip across signed surfaces, mutations detected
- Hermes bridge: ~/.hermes/SIGNED_SURFACE_PROTOCOL.md + schema reference
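For reviewers checking the NO_COLOR / EPISTEME_NO_RICH item, the detection order could look something like this. It is a sketch, not the contents of src/episteme/_ui.py: the function names are hypothetical, and treating any non-empty EPISTEME_NO_RICH value as "disable" is an assumption. The NO_COLOR handling follows the convention at https://no-color.org (present and non-empty disables color), and the ASCII fallbacks are the [+]/[~]/[!] markers named in the Event 124 message.

```python
import os
import sys

_ANSI = {"green": "\033[32m", "yellow": "\033[33m", "red": "\033[31m"}
_ASCII = {"green": "[+]", "yellow": "[~]", "red": "[!]"}
_RESET = "\033[0m"


def use_color(stream=None, env=None):
    """Decide whether rich output is appropriate for this stream and environment."""
    env = os.environ if env is None else env
    stream = sys.stdout if stream is None else stream
    if env.get("NO_COLOR"):          # no-color.org: any non-empty value disables color
        return False
    if env.get("EPISTEME_NO_RICH"):  # assumed: any non-empty value forces plain ASCII
        return False
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())  # pipes and files get plain output


def health_indicator(level, color):
    """Render a health marker as a colored dot, or its ASCII fallback."""
    if color:
        return f"{_ANSI[level]}\u25cf{_RESET}"
    return _ASCII[level]
```

Checking the environment before the TTY test means an operator's opt-out always wins, and piping output to a file degrades to ASCII without any flag.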