From b0b9fb037f1365488ecee86419c086c0113e885c Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Thu, 30 Apr 2026 09:48:14 +0200
Subject: [PATCH 001/124] Add /narrative skill (ported from lightcone-ui#10)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Authors or revises the `narrative:` prose inside an ASTRA analysis
(`astra.yaml` and its sub-analyses) plus decision `rationale:` fields.
Five fixed keys at each scale (`summary`, `findings`, `methods`,
`inputs`, `outputs`) with three working modes:

- paper-reproduction (production-ready)
- existing-analysis retrofit (under development)
- interactive in-flight authoring (under development)

Originally landed on lightcone-ui#10; relocated to lightcone-cli per
Liam's request — skills live alongside `lc-new`, `lc-build`, `lc-verify`,
`lc-migrate`, `lc-feedback` in `claude/lightcone/skills/`.

Self-contained: all references are intra-skill (`references/*.md`); no
cross-skill or guides imports needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Cail Daley <cailmdaley@gmail.com>
---
 claude/lightcone/skills/narrative/SKILL.md    | 399 ++++++++++++++++++
 .../narrative/references/existing-analysis.md | 172 ++++++++
 .../narrative/references/interactive.md       | 184 ++++++++
 .../references/paper-reproduction.md          | 221 ++++++++++
 4 files changed, 976 insertions(+)
 create mode 100644 claude/lightcone/skills/narrative/SKILL.md
 create mode 100644 claude/lightcone/skills/narrative/references/existing-analysis.md
 create mode 100644 claude/lightcone/skills/narrative/references/interactive.md
 create mode 100644 claude/lightcone/skills/narrative/references/paper-reproduction.md

diff --git a/claude/lightcone/skills/narrative/SKILL.md b/claude/lightcone/skills/narrative/SKILL.md
new file mode 100644
index 00000000..e08d3f1b
--- /dev/null
+++ b/claude/lightcone/skills/narrative/SKILL.md
@@ -0,0 +1,399 @@
+---
+name: narrative
+description: >
+  Author or revise the `narrative:` prose inside an ASTRA analysis
+  (`astra.yaml` and its sub-analyses) plus decision `rationale:` fields.
+  Five fixed keys at each scale (`summary`, `findings`, `methods`,
+  `inputs`, `outputs`). Three working modes — paper reproduction
+  (ready), existing-analysis retrofit (under development), and
+  interactive in-flight authoring (under development). Use when the
+  `narrative:` block is empty or stub, when a decision needs a
+  `rationale:`, when a sub-analysis needs its own narrative, or when
+  revising existing prose. Triggers on "narrative", "draft the
+  narrative", "narrate this analysis", "narrate this sub-analysis",
+  "rationale for this decision", "write the summary", or any request
+  for reader-facing prose keyed off an astra.yaml.
+---
+
+# narrative
+
+## What this skill writes
+
+One field per node: `narrative:` on an analysis or sub-analysis, or `rationale:` on a decision.
+Per-element prose (what each `Input`, `Output`, `Decision`, `Option`, or `Insight` is and why it matters) lives on those elements' own `description` / `rationale` / `notes` fields.
+`narrative` is the analysis-level story that weaves the pieces together.
+
+## What a narrative is
+
+Science, from a single decision to a review paper, is a practice of
+engaging with previous work and telling the story of what was tried
+and what it means. Any honest account does three things.
+
+**Grounding.** Where the work sits — state of the field, open
+questions, prior work it responds to, upstream decisions that shape
+its choices. Tells the reader why before the work shows its own
+value. May foreshadow findings.
+
+**Movement of learning.** Not the tidied retrospective ("we did X,
+obtained Y") but traces of the process: what was tried, what failed,
+what forced a step back. The best papers convey this; most compress
+it away under length pressure. ASTRA's telescoping makes it cheap:
+one sentence at the top about global-vs-per-object PSF leakage, and,
+one level down, the two pages on how the team got there for the
+reader who wants them. Papers lack this affordance; ASTRA has it,
+and authors should spend it.
+
+**Implications.** What the results mean and where they point.
+Results are facts; what they do to the field is the argument.
+Forward-look matters even when unformed — that is where science
+passes the baton.
+
+A narrative that does all three at the appropriate scale is honest.
+One that presents only results and methods elides the meaning-making.
+
+The three phases repeat at every scale. A top-level analysis
+narrates them across five keys (`summary`, `methods`, `findings`,
+`inputs`, `outputs`); a sub-analysis does the same; a decision
+narrates in one paragraph of `rationale:`. The telescope gives the
+reader a short view at their current depth and the option to drill
+in — without exploding the parent.
+
+## Length as forcing function
+
+1–3 paragraphs per key, at any level.
+
+Length is the mechanism that keeps analyses modular, not a style
+preference. If the references don't fit in three paragraphs, the
+analysis is too big — split it. The narrative is a compressor; if
+it won't compress, split the thing being compressed.
+
+## What this prose is for
+
+ASTRA preserves the decision structure that papers compress into
+linear argument; the narrative keeps that structure legible. Three
+consequences:
+
+- **Not wiki, not paper.** A wiki page summarizes ("BAO is the
+  baryon acoustic oscillation feature"); a paper compresses ("we
+  chose the Gaussian prior"). An ASTRA narrative **points into
+  reasoning** — it names the load-bearing decision, anchors to the
+  structured node that records it, and lets the reader follow. The
+  prose does not re-explain the field or re-list the spec.
+- **Read and queried.** The narrative is consumed by human readers
+  *and* by agent retrievers. Anchor coverage and clarity are
+  substrate, not style — an uncited decision is invisible to both
+  readings.
+- **Asymmetric load.** The three phases don't map onto ASTRA's
+  structure evenly. Movement-of-learning has strong structural
+  support — `decisions`, `options`, `prior_insights`, the
+  sub-analysis DAG — and `methods` condenses what structure already
+  carries. Grounding has partial support at the decision site;
+  implications have none. On those two phases, the narrative is the
+  reader's only access — carry just enough, and err toward brevity
+  and certainty.
+
+## Pick a mode first
+
+**Paper reproduction is production-ready. Retrofit and interactive
+are under active development — their references are working drafts.**
+
+Three modes. Read the matching reference file in full before drafting.
+
+| Mode | Reference | Status | When |
+|---|---|---|---|
+| **Paper reproduction** | [`references/paper-reproduction.md`](references/paper-reproduction.md) | **Ready.** | A published paper exists and the analysis mirrors it. Primarily in-house Lightcone work (DESI BAO and similar) plus end users bringing a paper to reproduce. Covers paper sourcing (arXiv LaTeX preferred), paper→ASTRA mapping, voice seams, fidelity rules. |
+| **Existing-analysis retrofit** | [`references/existing-analysis.md`](references/existing-analysis.md) | Under development. | Code, results, or an in-flight project being imported into ASTRA with no source paper. Archaeological work: triage, reconstruction of intent, gaps where the record is silent. |
+| **Interactive (in-flight research)** | [`references/interactive.md`](references/interactive.md) | Under development. | New research being done now; the narrative drafted alongside the work. Provisional voice, ask-first discipline. |
+
+If unsure which applies, confirm with the user via `AskUserQuestion`.
+
+The rest of this file is the **mode-independent substrate** every
+reference relies on.
+
+---
+
+## Narrate what you declare
+
+The five keys are schema-optional, but `astra validate` applies a
+**conditional requirement** — a section must hold non-empty prose
+when the corresponding structured data exists on the Analysis node.
+
+| Key | Required when |
+|---|---|
+| `findings` | `Analysis.findings` has entries |
+| `methods` | `Analysis.decisions` or `Analysis.analyses` has entries |
+| `inputs` | `Analysis.inputs` has entries |
+| `outputs` | `Analysis.outputs` has entries |
+| `summary` | always optional (no structured counterpart) |
+
+Three consequences worth internalizing:
+
+- **A stub analysis with only `summary` is valid.** Use that for
+  stage-zero scoping.
+- **Don't write a `findings` key before findings are declared.** If the
+  spec's `findings:` list is empty, the narrative's `findings` key
+  should not appear — adding prose about findings that don't exist is
+  fiction.
+- **`summary` is the one key without a structural peer.** It's the
+  "question, scope, orientation" key — the only place prose stands
+  alone, not framing something structural.
+
+---
+
+## The spec renders alongside the narrative
+
+ASTRA's structural content — decisions, findings, inputs, outputs,
+sub-analyses, options — surfaces alongside the narrative. Structural
+peers will be presented; **prose does not duplicate them.** An
+abstract does not list every methods subsection; a methods section
+does not re-state every appendix equation. Prose assumes its
+structural peers exist and focuses on argument.
+
+Applied to the five keys:
+
+- `summary` **orients** — question, scope, headline shape.
+- `methods` **walks the pipeline**, citing each decision and
+  sub-analysis by anchor where they appear. Movement-of-learning
+  lives here.
+- `findings` **synthesizes** — each finding cited by anchor as part of
+  the argument, not an enumeration.
+- `inputs` **names provenance**.
+- `outputs` **names what was promoted and why**, citing each by anchor.
+- Decision `rationale:` **names why the default won**.
+
+---
+
+## Anchor coverage
+
+`astra validate` checks:
+
+- **Broken references** → error. Anchor doesn't resolve to a real id.
+- **Uncited declared elements** → warning. Every declared finding,
+  decision, output, and sub-analysis must be cited somewhere in the
+  narrative tree.
+
+If a declared element is genuinely not worth a prose mention, consider
+whether it should be declared at all.
+
+---
+
+## User presence
+
+Multi-turn back-and-forth → user present; use `AskUserQuestion` to
+clarify mode, scale, and reproduction-vs-extension before drafting.
+Single-shot or pipeline invocation → autonomous; make the reasonable
+default inference and note it inline on the narrative. Ambiguous →
+err on present and ask.
+
+---
+
+## Phase → key mapping
+
+The three phases (see top) map onto the five keys unevenly:
+
+| Key | Dominant phase |
+|---|---|
+| `summary` | all three, telescoped |
+| `findings` | implications |
+| `inputs` | grounding |
+| `methods` | movement of learning |
+| `outputs` | structural; phase-thin |
+
+There is no `discussion` key. Implications distribute into `summary`
+and `findings`.
+
+---
+
+## Anchor syntax
+
+Markdown link syntax, `#`-target, **tree-path-first**.
+
+| Target | Anchor |
+|---|---|
+| Input | `#inputs.<id>` |
+| Output | `#outputs.<id>` |
+| Decision | `#decisions.<id>` |
+| Option within a decision | `#decisions.<id>.options.<opt>` |
+| Finding | `#findings.<id>` |
+| Prior insight | `#prior_insights.<id>` |
+| Sub-analysis (whole node) | `#analyses.<sub>` |
+| Element inside sub-analysis | `#<sub>.<category>.<id>` (e.g. `#reconstruction.decisions.algorithm`) |
+| Parent scope (from a sub-analysis) | `#../decisions.<id>` |
+
+Note the sub-analysis form: **sub-analysis first, then category**.
+`#reconstruction.decisions.algorithm`, not `#decisions.reconstruction.algorithm`.
+References are interpreted **relative to the hosting analysis**; use
+`../` to escape to parent scope (matches decision `from_ref` syntax).
+
+Rules:
+
+- Anchor text is authored prose, **not** the raw id.
+- Inline refs do the work of a citation; don't footnote or parenthesize.
+- One ref per idea. Stacking three on a sentence means the sentence
+  carries too much.
+- Findings cannot currently appear in `decisions.options.insights`
+  (see [astra-spec#16](https://github.com/LightconeResearch/astra-spec/issues/16)).
+  When a finding motivates a decision, cite it from the decision's
+  `rationale:` prose.
+
+---
+
+## Reserved entity names
+
+These names cannot be used as entity IDs (they collide with the
+anchor grammar): `inputs`, `outputs`, `decisions`, `findings`,
+`prior_insights`, `analyses`, `options`, `content`, `narrative`.
+
+If you find an entity using one (legacy spec), flag it; the authoring
+tooling and validator will reject it.
+
+---
+
+## Linking relationships — structural vs narrative
+
+| Relationship | Structural | Narrative |
+|---|---|---|
+| Prior insight → decision option | `decisions.<id>.options.<opt>.insights: [ids]` | inline in `methods` when the decision is discussed |
+| Finding → output | `findings.<id>.evidence` → `outputs.<id>` | inline in `findings` |
+| Finding → decision | *no structural link yet* (astra-spec#16) | inline in decision's `rationale:` |
+| Decision → decision | `decisions.<id>.from: <ref>` or `from: ../decisions.<id>` | inline in the inheriting decision's `rationale:` |
+
+If a relationship is structural, don't duplicate it in prose — cite
+it by anchor.
+
+---
+
+## Self-contained example
+
+A minimal (not necessarily valid) sketch showing how the blocks fit
+together. The point is the *shape*.
+
+```yaml
+id: example_analysis
+version: "0.1.0"
+name: "Example analysis"
+
+narrative:
+  summary: |
+    We measure <quantity> in <sample>.  The feature is
+    [detected at high significance](#findings.headline_detection) and
+    [exceeds prior precision by 1.2×](#findings.precision_improvement),
+    with [an anomalous feature at <location>](#findings.anomaly)
+    motivating follow-up.
+
+  inputs: |
+    Primary data are [the <dataset>](#inputs.primary_data); validation
+    uses [<mocks>](#inputs.validation_mocks).
+
+  methods: |
+    The pipeline runs in two stages.
+    [Preparation](#analyses.preparation) ingests the raw catalog and
+    produces [cleaned two-point statistics
+    ](#preparation.outputs.clean_stats).  [Fitting
+    ](#analyses.fitting) consumes those statistics and fits model
+    parameters.  Both stages inherit the parent's
+    [fiducial cosmology](#decisions.fiducial_cosmology) so the same
+    distance-redshift relation is used end-to-end.
+
+  findings: |
+    Three findings constitute the result: a
+    [headline detection](#findings.headline_detection), a
+    [precision comparison with prior work
+    ](#findings.precision_improvement), and
+    [an anomalous feature](#findings.anomaly).  The anomaly is the
+    most-discussed qualitative feature.
+
+  outputs: |
+    Two artifacts are promoted to the top level:
+    [the final measurement table](#outputs.final_table) and
+    [the headline figure](#outputs.headline_figure), both produced by
+    [fitting](#analyses.fitting).
+
+decisions:
+  fiducial_cosmology:
+    label: "Fiducial cosmology"
+    rationale: |
+      Planck 2018-ΛCDM is the community reference; distance-redshift
+      conversion is downstream of this choice, and fixing it lets
+      results be compared directly to prior measurements.  Inherited
+      by [fitting](#analyses.fitting) so the end-to-end chain uses one
+      distance scale.
+    default: planck2018
+    options:
+      planck2018:
+        label: "Planck 2018-ΛCDM"
+      wmap9:
+        label: "WMAP9"
+        excluded_reason: "Superseded; no longer the community reference."
+```
+
+What to notice:
+
+- Anchor text is prose, not an id.
+- `methods` uses the sub-analysis-first form
+  (`#preparation.outputs.clean_stats`) for cross-scope refs.
+- `findings` synthesizes how three findings relate; each cited by
+  anchor, not recited.
+- `outputs` is thin: one sentence.
+- Decision rationale cites a sub-analysis by anchor when the choice
+  propagates, and says why the default won without enumerating options.
+
+For a canonical reproduction narrative in context, see
+`Reproductions/DESI/desi-dr1-bao/astra.yaml` in
+`LightconeResearch/Reproductions`.
+
+---
+
+## Craft
+
+- **Economy.** Every sentence introduces a new idea or sharpens an
+  existing one. Release real verbs: `conducted cross-correlation` →
+  `cross-correlated`.
+- **Epistemic honesty.** Hedges carry information about certainty.
+  "This suggests" reflects real uncertainty; "may perhaps indicate" is
+  decorative.
+- **Show, don't label.** Describe the tension; don't announce it. Cut
+  signposting: "the key insight is," "importantly," "it is worth
+  noting."
+- **Specificity.** Names, numbers, references over generic claims.
+- **Arrive through content.** No "in this analysis we will describe…";
+  the content is the opening.
+
+---
+
+## Anti-patterns (mode-independent)
+
+- **Narrative-per-element.** Writing `narrative:` on findings, inputs,
+  outputs, or insights. The five-key analysis narrative is the only
+  home; per-element prose is `description` / `rationale` / `notes`.
+- **Results-only narrative.** Methods without movement-of-learning
+  elides the meaning-making. At minimum, name one pivot or abandoned
+  option per scale.
+- **Decision-list paragraph.** "We made the following decisions: A,
+  B, C." Cite each decision where it shapes the pipeline, not as
+  recitation. Too many to weave coherently → the spec wants more
+  sub-analyses.
+- **Wiki-style what-is framing.** "BAO is the baryon acoustic
+  oscillation feature." A wiki summarizes; an ASTRA narrative points
+  into reasoning. Replace with "we chose the Gaussian BAO damping
+  prior over flat because flat admitted spurious minima" — with the
+  anchor. Applies to every key.
+- **`summary` as primer.** Teaching what the field is. Readers arrive
+  with context.
+
+---
+
+## Lint
+
+1. `astra validate <path>` — catches broken anchors, schema
+   violations, uncited declared elements.
+2. Paragraph count per key — flag anything over three.
+3. No key without its structured counterpart (`summary` excepted): if
+   `findings:` is empty, `narrative.findings` is absent.
+
+---
+
+## Now read the mode reference
+
+Before drafting, open the reference file that matches the user's
+situation.
diff --git a/claude/lightcone/skills/narrative/references/existing-analysis.md b/claude/lightcone/skills/narrative/references/existing-analysis.md
new file mode 100644
index 00000000..765f525c
--- /dev/null
+++ b/claude/lightcone/skills/narrative/references/existing-analysis.md
@@ -0,0 +1,172 @@
+# Existing-analysis retrofit mode
+
+> **Status: under development.** This mode is scaffolded but not yet
+> production-ready. The workflow below is a working draft — treat it
+> as a starting point, not a locked spec. For the production-ready
+> path, use paper reproduction mode if applicable. Report friction
+> back so this reference can firm up.
+
+A project has been running — with code, results, a working directory,
+possibly a partial spec — and is being imported into ASTRA. There is
+no published paper; the narrative is being built from artifacts, not
+reconstructed from prose.
+
+Read the main SKILL.md first. This file adds what's specific to
+retrofit.
+
+Retrofit is distinct from paper reproduction (there is no source
+narrative to reconstruct) and from interactive authoring (the work is
+already done, or at least substantially done, rather than in flight).
+The core move is **archaeology**: classifying what's live, harvesting
+intent from whatever artifacts carry it, marking gaps where the record
+is silent.
+
+## Workflow
+
+### 1 · Triage
+
+Before writing a single sentence, classify the project's contents.
+
+Go through `astra.yaml` and each sub-analysis and mark:
+
+- **live** — current, active, still used downstream
+- **superseded** — kept in the spec for record, but no longer what's
+  actually run
+- **abandoned** — tried and dropped; may or may not belong in the
+  narrative as movement-of-learning
+- **unclear** — decision or finding with no documentation; the
+  original rationale is not recoverable from the spec alone
+
+Produce this as a short summary and surface it via `AskUserQuestion`.
+Confirm with the user:
+
+- What stays, what is explicitly deprecated, what is abandoned.
+- Whether abandoned options should appear as movement-of-learning.
+  Sometimes yes: "we initially tried X, which gave Y; switched to Z"
+  is honest. Sometimes no: trivial or confidential choices don't
+  belong.
+- Which `unclear` items the user can reconstruct, vs. which are
+  genuinely lost.
+
+The narrative only speaks for live content unless the user explicitly
+wants a history section.
+
+### 2 · Harvest
+
+The project's substrate substitutes for a paper's narrative. Mine
+these, in roughly decreasing order of value:
+
+- **`README.md`, `CLAUDE.md`, `NOTES.md`, `TODO.md`** at project root.
+  Often contain the clearest statement of intent.
+- **`.felt/`** or a fibers directory. The author's active thinking,
+  decisions with rationale, meeting notes, open questions.
+- **Notebook markdown cells.** Often the narrative the author wrote
+  for themselves.
+- **Code comments** at function-level decision points. "We drop
+  rows where X < 0.1 because …" is a rationale waiting to be lifted.
+- **Commit messages** at milestone commits. `git log --grep` for
+  keywords like "decided," "switched," "abandoned," "fix" can surface
+  turning points.
+- **Meeting notes, old proposals, grant text.** Grant paragraphs are
+  often where motivation lives in its cleanest form.
+- **Open issues and closed PRs.** Rejected options often have a PR
+  describing what was tried.
+
+Make a list of candidate motivation, methodology, and findings text
+before starting to draft. Where possible, anchor each harvested piece
+to its source so rationales can be traced.
+
+### 3 · Fill the gaps
+
+For each `unclear` decision, try in order:
+
+1. **Ask the user.** `AskUserQuestion` with the decision and its
+   options, asking for a one-sentence rationale.
+2. **If the user doesn't know**, write a fair description of what was
+   chosen and mark it as reconstructed. Example:
+   ```yaml
+   rationale: >-
+     _(Reconstructed 2026-04: original rationale not recorded.  Current
+     reading is that option X was chosen because Y, based on the
+     downstream code's assumptions about Z.)_
+     ...
+   ```
+3. **If the rationale is actually lost**, name that. A narrative that
+   admits "the reasoning for this cut was not recorded and cannot be
+   reconstructed" is honest; one that fabricates a plausible-sounding
+   justification is not.
+
+Do the same for findings without evidence, inputs without provenance,
+and outputs without a clear source sub-analysis.
+
+### 4 · Draft order
+
+Same as reproduction: inputs → methods → findings → outputs →
+summary. Retrofit is stable enough for compression-last to
+work. Unlike interactive authoring, you're narrating after the fact.
+
+### 5 · Voice
+
+- **Past tense for what happened**; present tense only for the living
+  structure ("the pipeline runs three stages").
+- **Don't impose a narrative of inevitability.** If the project tried
+  Option A for six months, abandoned it, and switched to B, say so.
+  The iteration is the substance of movement-of-learning — retrofit is
+  where that content has to come from the archaeology, not from a
+  researcher narrating live.
+- **Mark reconstructions.** `_(Reconstructed)_` or a brief prose note
+  when the authoring draws on harvested material whose original author
+  is absent.
+
+### 6 · Critique
+
+In addition to SKILL.md's three-phase and craft audits:
+
+**Triage audit.**
+
+- Does the narrative speak only for live content, unless a deliberate
+  history section is included?
+- Are deprecated / abandoned elements explicitly named as such, or do
+  they appear as if current?
+
+**Harvest audit.**
+
+- Does every load-bearing claim in the narrative trace to a project
+  artifact (commit, notebook cell, fiber, code comment, meeting note)
+  — or to the user's confirmation?
+- Are gaps named rather than fabricated?
+
+## Anti-patterns (retrofit-specific)
+
+- **Fabricated rationales.** Writing a plausible-sounding justification
+  for a decision whose actual rationale was "someone chose this and
+  nobody remembers." Mark the reconstruction, or say the reasoning is
+  lost.
+- **Smoothing over abandoned work.** If the project pivoted mid-way,
+  retrofit is exactly the place where that iteration belongs. Don't
+  write a narrative of smooth progress that contradicts the git log.
+- **Narrating around gaps.** A sub-analysis with no findings doesn't
+  need filler prose explaining what it didn't find; the narrative
+  should say the finding work is not yet done (or was never done).
+- **Missing the archaeology step.** Jumping straight to drafting
+  without triage and harvest produces a narrative in the author's
+  voice about work they didn't do. The result sounds invented because
+  it is.
+- **Treating CLAUDE.md like a paper.** Harvest from it; don't import
+  its style. `CLAUDE.md` is agent-facing; the narrative is
+  reader-facing.
+
+## When retrofit becomes reproduction
+
+If, during retrofit, it becomes clear that the project is actually
+reproducing an unacknowledged paper (code based on a published
+analysis, derived from another group's method), switch to paper
+reproduction mode for the parts that map. Hybrid is fine: reproduce
+what's published; retrofit what's novel or local.
+
+## When retrofit becomes interactive
+
+If the retrofit surfaces that core decisions are still open and the
+user wants to revisit them now, the narrative isn't yet stable. Flag
+to the user and switch to interactive mode for those sections —
+provisional voice, revisit after decisions land.
diff --git a/claude/lightcone/skills/narrative/references/interactive.md b/claude/lightcone/skills/narrative/references/interactive.md
new file mode 100644
index 00000000..e0861db2
--- /dev/null
+++ b/claude/lightcone/skills/narrative/references/interactive.md
@@ -0,0 +1,184 @@
+# Interactive mode — in-flight new research
+
+> **Status: under development.** This mode is scaffolded but not yet
+> production-ready. The workflow below is a working draft — treat it
+> as a starting point, not a locked spec. For the production-ready
+> path, use paper reproduction mode if applicable. Report friction
+> back so this reference can firm up.
+
+Research is being done now. A narrative is being drafted alongside the
+work, not reconstructed from a paper or archaeological sources. The
+narrative is expected to change as results land.
+
+Read the main SKILL.md first. This file adds what's specific to
+interactive.
+
+Interactive differs from reproduction (no source paper to reconstruct
+from — the narrative is the researcher's own) and from retrofit (the
+work is still happening, not finished — you are authoring live, with
+the researcher in the loop).
+
+The core discipline is **provisional voice**: the narrative makes its
+own incompleteness visible, so a reader can tell at a glance what's
+settled and what's pending.
+
+## Workflow
+
+### 1 · Orient
+
+1. `astra.yaml` and each sub-analysis — whole files. Note where
+   `findings` are stub-level, where decisions are unresolved, where
+   outputs don't exist yet.
+2. Any project `CLAUDE.md` / working notes.
+3. Active fibers at `.felt/` (if present). Fibers are the best
+   substrate in interactive mode — they carry the researcher's live
+   thinking, recent pivots, open questions. Read the relevant
+   top-level fiber and anything it wikilinks.
+4. Existing narrative, if any. Revision preserves what lands.
+
+### 2 · Ask first, draft second
+
+Interactive mode is not archaeology. The researcher is available.
+Don't guess at motivation or the headline finding — ask. Use
+`AskUserQuestion` to batch:
+
+- **Research question.** What are we trying to learn? One sentence.
+- **Current headline finding.** What, if anything, has been
+  established so far? One sentence.
+- **Movement so far.** What has already happened in the work that
+  belongs in movement-of-learning? (Pivots, abandoned options, things
+  that surprised the researcher.)
+- **Implications the researcher would claim today.** What does the
+  result — as far as it's gone — *mean*? A gesture is fine; a
+  premature strong claim is not.
+
+The researcher's framing is the substrate. Don't draft around a guess
+at it.
+
+### 3 · Draft order (inverted from reproduction)
+
+In interactive mode, `summary` is drafted *first* (as a stub, to fix
+intent) and revised last. This is the opposite of the reproduction
+order.
+
+1. **`summary` — stub.** One paragraph, provisional. States
+   the question and the current best-guess outcome. Explicitly marked
+   provisional (see below). Useful because it forces a clear statement
+   of intent the rest of the narrative can align with.
+2. **`methods`** — the substance. The process is live; methods is
+   where the live thinking goes. Name decisions in flight. Name
+   pivots. Use first-person plural, with dates where iteration
+   matters. Use `[<date>: <what changed>]` inline if it's load-bearing.
+3. **`findings`** — what's been established so far, with anchors to
+   `findings.<id>` that actually exist. Phrase claims to make
+   dependency visible: "pending validation in
+   [reconstruction](#analyses.reconstruction)."
+4. **`inputs`** — what the work rests on.
+5. **`outputs`** — thin; what's been promoted to the top level, if
+   any.
+6. **Return to `summary`** and revise it against the rest of
+   the draft. Re-mark provisional.
+
+For a decision in flight, `rationale:` can explicitly call out
+open-ness: "We are currently running with option X, pending validation
+of Y. See [[fiber or sub-analysis]]."
+
+### 4 · Provisional voice
+
+Make incompleteness visible in two ways:
+
+**Phrasing.** Not "we constrain X to 3%"; rather "our current best
+constraint on X is 3%, pending validation of the covariance in
+[reconstruction](#analyses.reconstruction)." Not "we detect Y"; rather
+"we detect Y at the 4σ level in the current fit, with the fit being
+revisited after the prior rescope lands."
+
+**Explicit markers.** At the top of `summary` (and optionally
+on any key that's unusually volatile), an italic note:
+
+```yaml
+summary: >
+  _(Provisional — revisit after bao_fitting.  Last updated 2026-04-23.)_
+  We are measuring the BAO scale in the DESI DR1 LRG tracer as a
+  warm-up before folding in ELGs and QSOs.  Current best result is
+  [an 8σ detection of the acoustic peak at z = 0.7
+  ](#findings.lrg_bao_detection), with the aggregate precision
+  constraint pending completion of the covariance validation in
+  [reconstruction](#analyses.reconstruction).
+```
+
+The `_(Provisional ...)_` prefix is a convention, not a spec field. It
+reads as expected-to-change without breaking the narrative shape.
+
+### 5 · Revision cadence
+
+Interactive narratives accrete. File fibers for:
+
+- The ceiling date for next revision.
+- Open questions that will force rewrites when they close.
+- Decisions in flight and what a different resolution would change in
+  the narrative.
+
+When a major result lands (headline finding solidified, pivotal
+decision settled), do a full revision pass. Re-draft `summary` in
+reproduction style (past tense, declarative) for the now-settled
+content, and keep provisional markers on whatever is still
+open.
+
+### 6 · Voice
+
+- **First person plural** ("we are measuring," "we found"), present
+  tense for live work, past tense for completed steps.
+- **Hedge when uncertain; claim when confident.** Interactive mode has
+  a sharper hedging signal than reproduction — the author's current
+  confidence *is* what the reader needs to know. Don't over-hedge
+  defensively and don't under-hedge performatively.
+- **Name sub-analyses that don't exist yet.** If the plan is to run
+  `reconstruction` next and the current narrative anticipates its
+  output, say so: "Once [reconstruction](#analyses.reconstruction) is
+  run, we expect X; if the expectation fails, Y follows." This is
+  legitimate movement-of-learning: it captures what a result is being
+  interpreted *against*.
+
+### 7 · Critique (adds to SKILL.md base)
+
+**Provisional audit.**
+
+- Is every claim phrased consistently with the actual confidence level?
+- Are provisional markers present where the content is volatile?
+- Will a reader one week from now know which pieces need revisiting
+  vs. which are settled?
+
+**Freshness audit.**
+
+- Are all "last updated" and "revisit after" markers still current,
+  or have some gone stale?
+- Has any referenced sub-analysis or finding changed since the
+  narrative was written, leaving it reflecting the old state?
+
+## Anti-patterns (interactive-specific)
+
+- **False completeness.** Writing in reproduction voice ("we measure,"
+  "we constrain") when the measurement is in flight. Use "we are
+  measuring" / "our current constraint is X, pending Y."
+- **Over-committing to implications.** Promising what results will
+  mean before they land. A gesture is honest; a claim before evidence
+  is not.
+- **Skipping movement-of-learning because "it's still moving."** The
+  live process *is* the movement. Capture it while it's cheap; it's
+  the hardest content to reconstruct later.
+- **Solo drafting.** Interactive is the one mode where authoring
+  without asking produces fiction. The researcher is available; ask.
+- **Provisional everywhere.** If every sentence is hedged, the
+  narrative reads as afraid of itself. Hedge the genuinely uncertain
+  claims; state the settled ones plainly.
+- **Stale markers.** A "revisit after X" comment left in place after
+  X has landed is worse than no marker at all. Revise on each touch.
+
+## When interactive stabilizes
+
+When the work is done (paper draft ready, results published, project
+wrapping up), the narrative should be rewritten in reproduction voice.
+Interactive was scaffolding; the final narrative reads as a stable
+artifact. That rewrite is its own pass — switch modes and treat the
+project's own prior drafts as a source, like a paper.
diff --git a/claude/lightcone/skills/narrative/references/paper-reproduction.md b/claude/lightcone/skills/narrative/references/paper-reproduction.md
new file mode 100644
index 00000000..a9fd42d2
--- /dev/null
+++ b/claude/lightcone/skills/narrative/references/paper-reproduction.md
@@ -0,0 +1,221 @@
+# Paper reproduction mode
+
+A published paper exists. Reconstruct its narrative into ASTRA's
+five-key shape — against an `astra.yaml` that's already built, or
+alongside one being built concurrently — preserving the paper's
+confidence level and sequence.
+
+## Where the paper lives
+
+Prefer arXiv LaTeX source. It's the most natural form to work with:
+sections are delimited, captions are inline, citations resolve to a
+`.bib`, equations are parseable.
+
+### 1 · arXiv LaTeX source (default)
+
+If the paper is on arXiv, fetch the source:
+
+```sh
+arxiv_id=<id>        # e.g. 2404.03000
+mkdir -p paper
+cd paper
+curl -L "https://arxiv.org/e-print/${arxiv_id}" -o "${arxiv_id}.tar.gz"
+tar -xzf "${arxiv_id}.tar.gz"
+```
+
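+The archive usually unpacks to the paper's working tree: a main
+`.tex` file, section includes, figures, a `.bib`. Single-file
+submissions may arrive as a gzipped `.tex` rather than a tarball; if
+`tar` fails, `gunzip -c` the download into a `.tex` file instead.
+Identify the main file with `grep -l '\\documentclass' *.tex`. Read
+sections in order; resolve citation keys against the bundled `.bib`.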
+
+### 2 · Existing parsed paper in the project
+
+Some reproductions ship the paper already parsed. Check for:
+
+- `desi_dr1_paper/` or `paper/` at the project root.
+- A single `.md` file (Docling output or manual conversion), a
+  `.pdf`, or an unpacked arXiv tarball.
+
+If a markdown parse exists, use it as the primary source; fall back
+to the PDF or the arXiv source to resolve ambiguities.
+
+### 3 · User-provided
+
+Ask the user where the paper is if neither route above turns it up.
+
+If no paper is accessible, this is not a reproduction task — fall
+back to `references/existing-analysis.md` (currently under
+development).
+
+## Paper-to-ASTRA mapping
+
+Write this down before drafting a sentence.
+
+| Paper element | ASTRA home |
+|---|---|
+| Abstract | `summary` |
+| Introduction (motivation, related work) | `summary` + `findings` intro |
+| Methods section N | corresponding sub-analysis's `narrative.methods` |
+| Results | structural `findings.<id>` claims; narrative intro in `findings` |
+| Discussion | `findings` narrative + `summary` implications |
+| Conclusions | reinforces `summary` |
+| Figures / tables | `outputs.<id>` — referenced in `findings` via anchors |
+| "We chose X because Y" sentences | decision `rationale:` |
+
+Not every paper maps cleanly section-to-sub-analysis. When it
+doesn't, the sub-analysis DAG in `astra.yaml` is authoritative.
+Narrate according to the DAG, harvesting the paper's prose for
+content. If the spec has deliberately reorganized relative to the
+paper, say so briefly in `methods`.
+
+## Workflow
+
+### 1 · Orient
+
+The spec may be stable, in flux, or both — narrative drafting often
+runs concurrently with spec refinement. Read what's there; expect to
+revisit as the spec moves.
+
+1. `astra.yaml` at the project root. Whole file. Note `inputs`,
+   `outputs`, `decisions`, `findings`, `analyses`, existing
+   `narrative:`. Notice which of the five keys are present vs. empty.
+2. Each sub-analysis `astra.yaml`. Skim decisions (inherited vs.
+   local), findings, outputs, existing narrative. A sub-analysis may
+   use `description:` (legacy) instead of the five-key `narrative:`
+   block — promoting it may be part of the job.
+3. The paper — abstract, intro open/close, methods section headers,
+   discussion, conclusions. Read full sections when drafting the
+   corresponding ASTRA piece.
+4. Any project `CLAUDE.md` or working notes.
+
+Infer authoring state (from-scratch, extending, revising) from what
+is already on disk. If the user is present, confirm via
+`AskUserQuestion`:
+
+- Scale: top-level, a specific sub-analysis, or a decision's
+  `rationale:`?
+- Pure reproduction, or with reproducer extensions (e.g., the
+  reproduction's covariance differs from the posted table)?
+
+If the spec is iterating, draft narrative concurrently — rationale
+when a decision is added, five-key narrative when a sub-analysis
+splits, findings synthesis updated when a finding is added. Narrative
+and spec quality rise together when they share context.
+
+### 2 · Draft order
+
+Not `summary` first. `summary` compresses the rest; draft it last.
+
+1. **`inputs`** — shortest. Name the data and its provenance. One
+   short paragraph. Let the inputs structure carry the dataset
+   detail.
+2. **`methods`** — walk the pipeline in DAG order. Cite each
+   sub-analysis and decision by anchor as part of the argument, not
+   as an enumeration. If there are too many to weave coherently, the
+   analysis wants more sub-analyses. Inheritance that propagates
+   across sub-analyses gets called out because it's load-bearing
+   end-to-end. Movement-of-learning lives here — a pivot the paper
+   narrates ("we initially tried X, but…") is cheap because of
+   telescoping.
+3. **`findings`** — **only if findings are declared structurally.**
+   If `findings:` is empty, skip this key (per
+   narrate-what-you-declare). If findings exist, synthesize how they
+   fit together, citing each by anchor rather than enumerating them.
+4. **`outputs`** — thin. Which artifacts were promoted and why;
+   point to the sub-analysis that produced them.
+5. **`summary`** — last. Two paragraphs. Open with the question and
+   the headline finding; thread motivation, method, and implications.
+   No primer material.
+
+For sub-analyses, same order, same length target (1–3 paragraphs per
+key). For a decision's `rationale:`, one paragraph: what was decided,
+the insight(s) that motivated it (by anchor), what the load-bearing
+alternative was and why it lost. The alternatives themselves are in
+the options structure.
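+
+As a rough illustration of that one-paragraph shape, a `rationale:`
+might read as below. The anchor id (`prior_insight`) is a hypothetical
+placeholder, and X, Y, and Z stand in for whatever the paper actually
+says.
+
+```yaml
+# Hypothetical sketch: the anchor id is a placeholder, not a real spec id.
+rationale: >
+  We adopt reconstruction convention X because Y, as motivated by
+  [the prior insight](#findings.prior_insight).  The load-bearing
+  alternative was convention Z, which lost because it does not
+  address Y.
+```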
+
+**Conditional keys on sub-analyses.** Only include keys whose
+structural counterpart is non-empty. A reconstruction sub-analysis
+with no findings gets `summary`, `methods`, `inputs`, `outputs` — no
+`findings`.
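+
+A hedged sketch of what that looks like for a sub-analysis with no
+declared findings. The prose is placeholder text; only the set of keys
+is the point.
+
+```yaml
+# Sketch only: placeholder prose. `findings` is omitted on purpose
+# because this sub-analysis declares no findings.
+narrative:
+  summary: >
+    One or two sentences on what this sub-analysis does and why.
+  methods: >
+    How the step runs, citing its decisions by anchor.
+  inputs: >
+    What it consumes and where that comes from.
+  outputs: >
+    What it produces and which artifacts were promoted.
+```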
+
+### 3 · Reproduction-specific moves
+
+- **Fidelity to source confidence.** Don't sharpen or soften. If the
+  paper says "we detect," don't write "we strongly detect." If it
+  hedges, preserve the hedge.
+- **Harvest, don't invent.** The paper's prose is the first source.
+  Paraphrase — don't lift verbatim — but preserve meaning and
+  confidence register.
+- **Voice seams.** If reproducer-specific content enters ("during
+  reproduction we found the published covariance differs from the
+  posted table"), mark the transition. A sentence mixing paper
+  claims and reproduction claims without a seam confuses both.
+- **Paper sequence is usually load-bearing.** DAG order should match
+  the paper's section order unless the spec deliberately
+  reorganized.
+- **No primer material.** `summary` is not a field-introduction.
+  Don't teach what BAO or weak lensing is. Readers arrive with
+  context.
+- **Rationales come from the paper.** "We chose reconstruction
+  convention X because Y" becomes the backbone of a decision's
+  `rationale`. Keep Y; cite the supporting prior insight by anchor
+  if one exists.
+- **Published = done.** Reproduction narrative is declarative,
+  present-tense matching the paper's voice ("The analysis is
+  organised as…", "The pipeline runs in…"). Not "we are measuring."
+- **Scope-limited reproductions.** Real-world reproductions often
+  cover a subset of the paper (e.g., DESI BAO reproducing only
+  LRG1+LRG2). Name the scope in `summary` so a reader knows what's
+  in and out.
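+
+The scope and voice-seam moves above can be sketched together in a
+`summary`. This is an illustration only: the wording is invented, the
+anchor id `covariance` is hypothetical, and the scenario reuses the
+examples from the bullets above.
+
+```yaml
+# Illustrative only: placeholder wording and a hypothetical anchor id.
+narrative:
+  summary: >
+    This reproduction covers the LRG1 and LRG2 tracers only; the
+    paper's other tracers are out of scope.  [Paper-voice content,
+    paraphrased at the paper's own confidence level.]
+
+    _Reproduction note:_ during reproduction we found that the
+    published covariance differs from the posted table (see
+    [covariance](#analyses.covariance)).
+```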
+
+## Critique pass
+
+Run these reproduction-specific checks alongside the three-phase and
+craft audits from SKILL.md.
+
+**Fidelity audit.**
+
+- No sharpened or softened claims relative to the paper.
+- Voice seams marked where reproducer content enters.
+- Rationales traceable to the paper's justifications or to a prior
+  insight in the spec.
+- No invented citations. Every anchor resolves to a real spec id.
+- Scope (what's reproduced, what isn't) stated in `summary` if
+  narrower than the paper.
+
+**Sequence audit.**
+
+- `methods` walks sub-analyses in DAG order; DAG order matches the
+  paper's narrative sequence (or the deviation is named in prose).
+- `summary` opens with the question, not a field primer.
+
+**Structural-peer-redundancy audit.**
+
+- Every declared decision, finding, output, and sub-analysis is cited
+  somewhere in the narrative (`astra validate` flags omissions).
+  Citations are woven into the argument, not recited as a list.
+- `findings` narrative synthesizes relationships between findings;
+  `inputs` narrative names provenance. Neither catalogs fields.
+
+**Anchor coverage audit.**
+
+- `astra validate` warns on any declared finding / decision / output
+  / sub-analysis not cited in the narrative. Review the warnings;
+  either cite the element or ask whether it should be declared at all.
+
+## Anti-patterns (reproduction-specific)
+
+- **Lifting verbatim.** Copy-pasting abstract sentences into
+  `summary`. Paraphrase — otherwise the narrative reads as a citation
+  of itself.
+- **Adding implications the paper didn't make.** Fidelity cuts both
+  ways.
+- **Eliding the reproducer's voice entirely.** If the reproduction
+  caught something the paper missed, name it with the seam.
+- **Treating paper sections as sub-analyses.** A paper's Section 3.2
+  isn't automatically a sub-analysis; the DAG is the authority.
+- **Listing instead of weaving.** Narrate each decision where it
+  shapes the pipeline. Too many to weave coherently → the spec wants
+  more sub-analyses.
+- **Drafting `findings` on a sub-analysis that has no declared
+  findings.** Skip the key.

From 4f9a7246b75f3dd07c6831369d5806defcc160dd Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 4 May 2026 03:08:50 +0200
Subject: [PATCH 002/124] skills: add ralph-loops, managing-bibliography,
 constitution; update narrative for bundle + #108
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bundles three more skills into the lightcone-cli paper-reproduction toolkit
alongside the existing /narrative skill, and updates /narrative for
bundle-aware orchestration plus the downstream-consumer discipline that
closes #108.

- ralph-loops: direct copy from cailmdaley/skills (felt-agnostic). Carries
  the loop iteration discipline + scripts/ralph runner + assets/spec.md
  template. /paper2astra will launch this against per-paper constitutions.
- managing-bibliography: direct copy of personal version (~/.claude/skills/
  managing-bibliography). Becomes the canonical paper-acquisition path
  inside the bundle (arxiv-LaTeX-first; PDF + Docling fallback for
  non-arxiv).
- constitution: merged version. Public skills/skills/constitution provided
  the procedural backbone (study → draft → refine → launch); personal felt
  references (constitute.md, crafting.md) provided the depth (two-diamonds
  rhythm, six stances, funnel ledger, qualitative self-check). Felt-specific
  commands have been softened to felt-optional framing so the skill stands
  alone in lightcone-cli.
- narrative: (a) acknowledges paper-reproduction bundle context and SPECIFY-
  phase invocation; (b) lands the lightcone-cli#108 fix — new "Data flow"
  section in the mode-independent substrate, requiring narrative.outputs to
  name downstream consumers (<analysis>.<output> form) and root narratives
  to include a top-down end-to-end data-flow paragraph when sub-analyses
  exist.

All copied skills carry attribution to their canonical home in a Provenance
section. Closes lightcone-cli#108 as a side-effect of the bundle PR.

Refs: lightcone/paper2astra-as-skill/skill-bundle constitution.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/constitution/SKILL.md | 126 ++++++++++++
 .../constitution/references/constitute.md     | 136 ++++++++++++
 .../constitution/references/crafting.md       | 193 ++++++++++++++++++
 .../skills/managing-bibliography/SKILL.md     | 162 +++++++++++++++
 claude/lightcone/skills/narrative/SKILL.md    |  42 +++-
 claude/lightcone/skills/ralph-loops/SKILL.md  |  70 +++++++
 .../skills/ralph-loops/assets/spec.md         |  29 +++
 .../skills/ralph-loops/scripts/ralph          | 124 +++++++++++
 8 files changed, 881 insertions(+), 1 deletion(-)
 create mode 100644 claude/lightcone/skills/constitution/SKILL.md
 create mode 100644 claude/lightcone/skills/constitution/references/constitute.md
 create mode 100644 claude/lightcone/skills/constitution/references/crafting.md
 create mode 100644 claude/lightcone/skills/managing-bibliography/SKILL.md
 create mode 100644 claude/lightcone/skills/ralph-loops/SKILL.md
 create mode 100644 claude/lightcone/skills/ralph-loops/assets/spec.md
 create mode 100755 claude/lightcone/skills/ralph-loops/scripts/ralph

diff --git a/claude/lightcone/skills/constitution/SKILL.md b/claude/lightcone/skills/constitution/SKILL.md
new file mode 100644
index 00000000..58384960
--- /dev/null
+++ b/claude/lightcone/skills/constitution/SKILL.md
@@ -0,0 +1,126 @@
+---
+name: constitution
+description: >
+  Draft a constitution — a markdown spec describing a desired state for
+  autonomous iteration. Study the problem space, shape the spec
+  interactively (two-diamonds rhythm; six stances on demand), then hand
+  it to a runner — a ralph loop, a shuttle dispatch, or any other
+  iteration-runner. Use for any work where adaptation matters more than
+  a fixed plan: science, refactoring, exploration, creative work.
+  Triggers: "constitution", "constitute", "ralph spec", "set up a ralph",
+  "create a ralph", "write a spec".
+---
+
+# Constitution
+
+A constitution is a design document with trust built in. Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It's designed to outlast any single agent or iteration and remain valid as the world changes around it. A good constitution never says "50 files remain" because that's a snapshot that goes stale; it says "check `grep -r 'old_pattern'`" because that's a principle that stays true until the work is done.
+
+Constitutions don't prescribe steps. They describe what the system looks like when it's right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what's highest value. In a ralph loop, each iteration does this with fresh context.
+
+This matters most in science and exploratory work, where each decision is informed by the result just before it. A plan assumes you know the path; a constitution trusts the agent to find it — with taste, judgment, and fresh eyes each time.
+
+**Separation of context: if you craft, you never do the work yourself.**
+
+## Workflow
+
+1. **Study** — Read relevant files, understand existing patterns. This informs the *spec*, not implementation. The goal is pointers that iterations will follow.
+
+2. **Draft** — Create a markdown spec file. The bundled template lives in the sibling `ralph-loops` skill:
+   ```bash
+   cp ../ralph-loops/assets/spec.md my-spec.md
+   ```
+   (or copy directly from `claude/lightcone/skills/ralph-loops/assets/spec.md` if you're outside a skill).
+   Fill in what you can — don't wait until it's perfect.
+
+3. **Refine** — Show the draft, get feedback, revise. Use AskUserQuestion for structured choices. The two-diamonds rhythm and six stances in [`references/crafting.md`](references/crafting.md) help most when the user is deciding something non-trivial. Apply the qualitative ambiguity self-check before launching.
+
+4. **Launch** — When approved, hand the spec to whichever runner is appropriate. Common options:
+
+   - **`/ralph-loops`** — bundled, manual loop runner. Tmux session re-spawns iterations against the spec until status flips off open/active.
+     ```bash
+     ../ralph-loops/scripts/ralph my-spec.md [--backend claude|codex] [-- extra-flags...]
+     ```
+     Add `-- --chrome` for visual/frontend work. Session: `ralph-<spec-name>`. Attach: `tmux attach -t ralph-<spec-name>`.
+   - **External dispatchers** (e.g. shuttle, when felt is installed) — watch a fiber tree for dispatch-eligible blocks and spawn single-shot workers. Their configuration is owned outside this skill.
+
+   The constitution stays editable while iteration runs; successive iterations re-read it each cycle, so refinements between iterations are normal.
+
+## What goes in a constitution
+
+A constitution needs enough structure that an iteration landing cold can orient itself, and enough freedom that it can adapt. Common sections — use what fits, skip what doesn't, add what's missing:
+
+```markdown
+## Desired State
+What the system looks like when it's done. Invariants, quality bar,
+done-conditions. Fence the scope — what to aim for AND what to leave alone.
+
+## Context
+File paths, existing patterns, architectural constraints. Things iterations
+need to *find* but not *achieve*.
+
+## Skills
+Which skills to activate before working (e.g., /snakemake, /narrative).
+
+## Evidence
+How to check progress — commands, test suites, grep patterns. Pointers to
+the ground truth that iterations measure themselves against.
+
+## Open Questions
+Uncertainties the user should weigh in on. Iterations add to this; the user
+resolves between loops.
+```
+
+For deeper reference on each section's voice and the discipline that keeps a constitution from drifting into a plan, see [`references/constitute.md`](references/constitute.md).
+
+## Principles
+
+**Constitution, not plan.** Say what the system looks like when it's right. Never describe the current state — anything that becomes false or irrelevant as work progresses doesn't belong. If a section would be outdated after one iteration, it's a snapshot — replace it with a pointer.
+
+**Pointers, not snapshots.** "Check `grep -r 'old_pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done.
+
+**Prefer existing systems.** Before designing anything new: can what's there handle this?
+
+**Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies.
+
+**Scope is a gift.** A clear fence — "only rename, don't refactor" — saves iterations from well-intentioned drift. Explicit scope frees the agent to work confidently within it.
+
+## Constitutions that shape artifacts
+
+Some constitutions don't build code — they shape artifacts like documentation, dashboards, or research narratives. These have different rhythms:
+
+- **The desired state is comprehension, not correctness.** "A reviewer can follow the narrative cold" is harder to test than "all tests pass" — but it's the right bar. Evidence for progress: fewer redundant plots, clearer prose, more natural flow.
+- **The artifact continues to grow.** Unlike a refactoring (which finishes), a research narrative keeps acquiring nodes. The constitution shapes how growth presents itself, not when growth stops.
+
+## Anti-patterns
+
+**Checklists.** "1. Add X, 2. Add Y" — iterations race through without judgment.
+
+**Vague done.** "Make it better" — when does iteration stop?
+
+**Over-specification.** Prescribing *how* instead of *what*. Trust the agent's taste.
+
+**Snapshot language.** "Currently 50 files" — will be wrong after one iteration.
+
+**Decision logs in the body.** "Resolved choices" / "Process notes" sections turn the constitution into a process journal. When a question gets answered, fold the answer into the narrative where it's contextually relevant — into Invariants, Desired State, Context — and let the runner's history surface (`felt history`, commits, etc.) carry the chronology.
+
+---
+
+## References
+
+- [`references/constitute.md`](references/constitute.md) — depth on
+  drafting voice, sections, and the felt-flavored crafting workflow.
+  Felt-optional: read past the felt-specific commands if felt isn't installed.
+- [`references/crafting.md`](references/crafting.md) — two-diamonds
+  rhythm, six stances, the funnel ledger, and the qualitative ambiguity
+  self-check. Use this when the conversation has careful-thinking
+  character — not every constitution drafting needs it, but the ones that
+  do are the ones that benefit most.
+
+## Provenance
+
+Merged from two sources:
+
+- [`cailmdaley/skills/skills/constitution/`](https://github.com/cailmdaley/skills/tree/main/skills/constitution) (public, procedural, felt-agnostic) — provided the SKILL body backbone.
+- `~/.claude/skills/felt/references/{constitute,crafting}.md` (personal felt skill) — provided the depth references; felt-specific commands have been softened to felt-optional framing so this skill stands alone in lightcone-cli.
+
+Copied here for the paper-reproduction bundle so `/paper2astra` can invoke `/constitution` to draft the per-paper reproduction constitution during its interview phase. The merged shape may flow back upstream; re-sync as needed.
diff --git a/claude/lightcone/skills/constitution/references/constitute.md b/claude/lightcone/skills/constitution/references/constitute.md
new file mode 100644
index 00000000..198a3a18
--- /dev/null
+++ b/claude/lightcone/skills/constitution/references/constitute.md
@@ -0,0 +1,136 @@
+# Constitute — depth reference
+
+Drafting a constitution. The SKILL body covers the procedural backbone (Study → Draft → Refine → Launch). This reference goes deeper on voice, sections, and the discipline that keeps a constitution from sliding into a plan.
+
+The constitution itself is just a markdown file with YAML frontmatter that a runner reads on each iteration. Common runners: the bundled `ralph-loops` (tmux loop, `scripts/ralph`), or external dispatchers like felt-shuttle (when felt is installed). The runner is interchangeable; the constitution is what matters.
+
+---
+
+## What a constitution is
+
+A constitution is a design document with trust built in. Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It is designed to outlast any single iteration and remain valid as the world changes around it.
+
+**A good constitution never says "50 files remain"**, because that is a snapshot that goes stale. It says "check `grep -r 'old_pattern'`", because that is a principle that stays true until the work is done.
+
+Constitutions do not prescribe steps. They describe what the system looks like when it is right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what is highest value. Each iteration of the work does this with fresh context.
+
+**Constitution, not plan.** Plans assume you know the path; constitutions trust the agent to find it — with taste, judgment, and fresh eyes each time. This matters most in science and exploratory work, where each decision is informed by the result just before it.
+
+**Separation of context: if you craft, you never do the work yourself.** The constitution is designed by one role; iterations are run by another.
+
+---
+
+## When to constitute
+
+- Work where adaptation matters more than a fixed plan: scientific investigation, exploratory refactoring, creative writing
+- The desired state is clear (or can be made clear) but the path is not
+- Iterations need to re-read with fresh context and make judgment calls
+- A checklist would either be wrong after one step or race through without judgment
+
+Do not constitute for: clearly-scoped atomic tasks, work that could be a snakemake rule, anything where a plan actually is the right shape.
+
+---
+
+## Workflow (deeper)
+
+### 1. Study
+
+Read relevant files, understand existing patterns. This informs the **constitution**, not implementation — the goal is pointers that iterations will follow, not a head start on the work.
+
+### 2. Draft
+
+Create the spec file from the bundled template:
+
+```bash
+cp ../ralph-loops/assets/spec.md my-spec.md
+```
+
+(Or, if felt is installed and you are working in a felt-tracked project, you can create the constitution as a fiber and the runner will treat it as the spec — `felt add <slug> "Constitution title" -s open -t constitution` then edit the body. This is felt-only; the bundled template above works without felt.)
+
+Use the crafting process from [`crafting.md`](crafting.md):
+
+- **Wonder → Ontology:** what IS the desired state? Name it precisely.
+- **Design → Delivery:** what sections does this constitution need? Which are pointers vs snapshots?
+
+Stances that help most during constitution drafting:
+
+- **Ontologist** for naming the desired state ("what IS 'done' here?")
+- **Simplifier** for fencing scope ("what are we explicitly leaving alone?")
+- **Contrarian** for pressure-testing whether the whole framing is right
+- **Architect** when the constitution is about refactoring structure
+
+### 3. Refine
+
+Show the draft, get feedback, revise. Use AskUserQuestion for structured choices. Apply the qualitative ambiguity self-check from `crafting.md` — goal, constraints, success — before launching.
+
+Repeat until it feels solid. It does not have to be complete; open questions belong in the Open Questions section.
+
+### 4. Launch
+
+When approved, hand to a runner. Bundled option: `../ralph-loops/scripts/ralph my-spec.md`. The runner re-reads the spec each iteration, so refinements between iterations are normal.
+
+---
+
+## Constitutional sections
+
+A constitution needs enough structure that an iteration landing cold can orient itself, and enough freedom that it can adapt. Common sections — use what fits, skip what does not, add what is missing:
+
+```markdown
+## Desired State
+What the system looks like when it is done. Invariants, quality bar,
+done-conditions. Fence the scope — what to aim for AND what to leave alone.
+
+## Context
+File paths, existing patterns, architectural constraints. Things iterations
+need to *find* but not *achieve*.
+
+## Skills
+Which skills to activate before working (e.g., /snakemake, /narrative).
+
+## Evidence
+How to check progress — commands, test suites, grep patterns. Pointers to
+ground truth that iterations measure themselves against.
+
+## Open Questions
+Uncertainties the user should weigh in on. Iterations add to this; the user
+resolves between loops.
+```
+
+---
+
+## Principles (deeper)
+
+**Pointers, not snapshots.** "Check `grep -r 'old_pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done.
+
+**Prefer existing systems.** Before designing anything new: can what is there handle this?
+
+**Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies.
+
+**Scope is a gift.** A clear fence — "only rename, don't refactor" — saves iterations from well-intentioned drift. Explicit scope frees the agent to work confidently within it.
+
+---
+
+## Constitutions that shape artifacts
+
+Some constitutions do not build code — they shape artifacts like documentation or research narratives. These have different rhythms:
+
+- **The desired state is comprehension, not correctness.** "A reviewer can follow the narrative cold" is harder to test than "all tests pass" — but it is the right bar. Evidence for progress: fewer redundant plots, clearer prose, more natural flow.
+- **The artifact continues to grow.** Unlike a refactoring (which finishes), a research narrative keeps acquiring nodes. The constitution shapes how growth presents itself, not when growth stops.
+
+---
+
+## Anti-patterns
+
+- **Checklists.** "1. Add X, 2. Add Y" — iterations race through without judgment.
+- **Vague done.** "Make it better" — when does iteration stop? What would a reader see?
+- **Over-specification.** Prescribing *how* instead of *what*. Trust the agent's taste.
+- **Snapshot language.** "Currently 50 files" — will be wrong after one iteration.
+- **Immutable seed.** Not our shape. The constitution is meant to be edited between iterations; do not treat it as frozen.
+- **Numerical convergence.** "Iteration stops when similarity ≥ 0.95" — wrong shape for science. Stop when the Evidence section says the desired state has been reached.
+- **Decision logs in the body.** "Resolved choices" / "Decisions made" / "Process notes" sections turn the constitution into a process journal. When a question gets answered (in conversation, via `AskUserQuestion`, in a review), fold the answer into the narrative where it is contextually relevant — into Invariants, Desired State, Context — and let the runner's chronological surface (commits, `felt history` if felt is in use) carry the chronology. The constitution describes *what is*, not *how we got here*; an "Open Questions" section that has been fully resolved should be deleted, not left as a victory log.
+
+---
+
+## When crafting lands here
+
+The crafting rhythm in [`crafting.md`](crafting.md) applies to all careful interactive thinking; this reference kicks in when the target artifact is specifically a constitution. The diamonds do most of the work — the funnel mechanic used for open-ended exploration is not the primary move here, because there is already one specific artifact being produced. See the Workflow section above for which stances help most at each drafting phase.
diff --git a/claude/lightcone/skills/constitution/references/crafting.md b/claude/lightcone/skills/constitution/references/crafting.md
new file mode 100644
index 00000000..15f65b32
--- /dev/null
+++ b/claude/lightcone/skills/constitution/references/crafting.md
@@ -0,0 +1,193 @@
+# Crafting
+
+How to help the user think through something that hasn't crystallized, and turn the result into structured commitments — frontmatter on a fiber if felt is in use, otherwise inline structure in the constitution itself (decisions with excluded options, evidence pointers, scoped findings).
+
+Use it when the user is deciding something non-trivial, scoping a sub-analysis, drafting a living spec, or talking through an open question — any time careful interactive thinking is happening and the output can land in structured form.
+
+The rhythm is two diamonds: first understand what the thing IS, then decide what to DO about it. Each diamond diverges to explore and converges to commit. The ontological question — *what IS this, really?* — is the convergence point of the first diamond, and it is the most practical question you can ask.
+
+```
+    ◇ Wonder              ◇ Design
+   ╱  (diverge)          ╱  (diverge)
+  ╱    surface          ╱    alternatives
+ ╱     questions       ╱     trade-offs
+●─────────────────────●─────────────────────●
+ ╲                     ╲
+  ╲    crystallize      ╲    commit
+   ╲   the name          ╲   with reasons
+    ◇  (converge)         ◇  (converge)
+    Ontology              Delivery
+```
+
+Diamond 1 diverges into questions and converges on a name (*"this IS a decision about covariance estimation"*). Diamond 2 diverges into alternatives and converges on a commit (a default with `excluded_reason` for each rejection). The second diamond inherits the ontological commit from the first.
+
+---
+
+## The two diamonds
+
+### Diamond 1: Wonder → Ontology
+
+**Wonder (diverge).** What are we actually trying to figure out? Surface questions, assumptions, ambiguities. Do not propose answers yet. If the user is already pitching solutions, back them up to the question.
+
+**Ontology (converge).** What IS this, really? Crystallize into a claim, decision, or question specific enough to act on. The convergence is complete when you can **name** the thing precisely — "this is a decision about covariance estimation" or "this is a question about whether leakage matters below ℓ=100." A good name is often the entire output of Diamond 1.
+
+**Output of Diamond 1:** a stub with a real name and at least one structural placeholder — a decision label, an insight claim, or input/output IDs. Not a full block — just the hook that identifies what kind of thing this is.
+
+### Diamond 2: Design → Delivery
+
+**Design (diverge).** What are the real alternatives? For each, what would make it right or wrong? Trade-offs, excluded options, edge cases. This is where the Contrarian and Simplifier stances are most useful.
+
+**Delivery (converge).** Commit to a default, write the `excluded_reason` for each rejected option, identify inputs and outputs, stage the evidence. The structure is now formalizable.
+
+**Output of Diamond 2:** structured fields populated — `decisions` with options and default, `inputs`/`outputs` with IDs and types, `insights` with claim and evidence. (If felt is in use, these go on the fiber; otherwise they live in the spec itself or in `astra.yaml`.)
+
+The two diamonds are sequential but the boundary is soft. If you find yourself naming alternatives before the thing is clear, back up to the ontology convergence point. If you converge too early on "this is a decision" when it is actually a question, the Design phase will feel forced — that is the cue to re-enter Wonder.
+
+---
+
+## Stances
+
+Six lightweight lenses for when the conversation needs pressure. **Default is no stance** — straight conversation. Invoke a stance when pressure would help, announce it in one sentence, drop it when it has done its work. Do not stack or pipeline them.
+
+### Socratic — *"What are you assuming?"*
+
+Question-only. Never proposes answers. Surfaces the assumptions under the user's framing.
+
+- What are you assuming is true that might not be?
+- What would make option A right vs option B? What is the actual fork?
+- If you had to write the `excluded_reason` for the option you are about to reject, what would it say?
+
+**Use in Wonder and early Design.** When the user is about to commit to a path and you want the reasons made explicit.
+
+### Ontologist — *"What IS this, really?"*
+
+Pushes on definition before mechanism. Four questions:
+
+1. **Essence** — what is the true nature, stripping away accidental properties?
+2. **Root cause or symptom** — is this the fundamental issue or a surface effect?
+3. **Prerequisites** — what must exist first for this even to make sense?
+4. **Hidden assumptions** — what implicit beliefs is the framing resting on?
+
+**Use at the Ontology convergence point.** When a word is doing heavy lifting and may mean different things in different sentences.
+
+### Contrarian — *"What if the opposite were true?"*
+
+Challenges premises, not details.
+
+- What if the choice does not actually matter for your signal?
+- What if the constraint you are designing around is not real?
+- What if the simplest version is already good enough?
+
+**Use in Design.** When the conversation is burning effort on a distinction that may not matter, or a third option (do nothing, use the default) is being ignored.
+
+### Simplifier — *"Is this complexity earning its keep?"*
+
+YAGNI, concrete first, data over code.
+
+- What can we remove without losing the core value?
+- What is the simplest version that would work?
+- Can a data structure replace this logic?
+
+**Use in Design and early Delivery.** When the design is drifting toward over-engineering or a feature list is growing without anchoring reasons.
+
+### Researcher — *"What do we actually know?"*
+
+Evidence before interpretation. Especially useful for scientific work where a claim needs to be defensible.
+
+- What does the actual source say, not what we remember?
+- What would count as evidence here? What would falsify the claim?
+- What is the most specific claim we can make with the data in hand?
+
+**Use in Delivery.** When an insight needs a defensible claim, or when the user is about to write an outcome that is stronger than the evidence supports.
+
+### Architect — *"If we started over, would we build it this way?"*
+
+Structural root cause. The question behind the question when friction keeps recurring.
+
+- Is the same problem showing up in different forms?
+- Which abstraction does not match reality?
+- What assumption was wrong from the start?
+
+**Use when a debate keeps returning.** The user is circling a decision they have already made three times and cannot stick to — the real question is probably structural, not tactical.
+
+---
+
+## The funnel
+
+When the conversation is exploratory — no single topic, things are accumulating — keep a private running ledger of what is falling out, classified by destination:
+
+| Item kind | What it looks like | Destination |
+|-----------|--------------------|-------------|
+| **Decision** | A choice between real alternatives | `decisions` block in spec / `astra.yaml` / fiber |
+| **Finding** | A claim with at least the start of evidence | `insights` block / fiber |
+| **Sub-analysis** | "Compute X from Y" with identifiable inputs/outputs | New `astra.yaml` sub-analysis or new fiber with `inputs`/`outputs` stubs |
+| **Question** | An open thread worth tracking, not yet answered | "Open Questions" section of the constitution / annotated fiber |
+| **Root-fiber change** | A pattern or gotcha that belongs in CLAUDE.md | Edit CLAUDE.md / root fiber |
+
+The ledger is your own working memory. **Do not surface it mid-conversation** unless the user asks or a flush cue fires.
+
+**Flush cues:**
+
+- User says "OK we should write this down" or similar
+- Three or more items have accumulated and the topic is about to shift
+- A natural pause after a decision or finding lands
+
+On flush, present the ledger grouped by destination, then file with the user's assent. If the user declines an item, discard it without argument.
+
+---
+
+## Qualitative ambiguity self-check
+
+Before committing to a path — filing a decision, launching an iteration loop, sealing an outcome — check three things qualitatively. **No scoring, no thresholds.** If any feels fuzzy, resolve it with `AskUserQuestion`.
+
+1. **Goal.** Is what the user wants specific enough that two competent people would build the same thing from it? If not, what would pin it down?
+2. **Constraints.** Are the limits named? What cannot change, what must be preserved, what would break everything? Missing constraints tend to show up as "oh wait, we also need…" after the commit.
+3. **Success.** How will we know it is done or right? What is the evidence condition? Qualitative is fine ("a reviewer can follow the narrative cold"), but it has to be checkable.
+
+When one is fuzzy, use `AskUserQuestion` with concrete options rather than open prose questions. Iterate until the answer is "yeah, that's it." **Stop when the fuzziness resolves, not when a score crosses a threshold.** Scores on qualitative priors add false precision; the honest signal is whether the user knows what they want.
+
+This is a mirror, not a gate. If the user wants to file anyway with one dimension still fuzzy, file it — the fuzziness itself can live in an Open Questions section, and future iterations can refine it.
+
+---
+
+## When to bring in /confer
+
+`/confer` routes a prompt through Codex for adversarial review. Good fits inside a crafting session:
+
+- A design choice where two plausible paths both look right and the user is stuck
+- Validating that an insight claim actually follows from its evidence
+- Pressure-testing a constitution's desired state before launching iteration
+
+Bad fits: routine decisions, cases where the user has already committed, stylistic disputes, or answers that need only three more seconds of thought. `/confer` is not a substitute for the user's taste — it is a second opinion when the first opinion is honestly unsure.
+
+---
+
+## Mapping outputs to structure
+
+What comes out of the diamonds maps onto wherever you keep structured commitments:
+
+| Diamond output | Destination |
+|----------------|-------------|
+| Wonder questions left open | "Open Questions" section in the constitution; or a fiber with `status: open` (felt) |
+| Ontology convergence — "this IS a decision about X" | A `decisions.<key>.label` entry — in `astra.yaml`, in the constitution body, or on a fiber |
+| Design alternatives with trade-offs | `decisions.<key>.options`; rejected options get `excluded_reason` |
+| Delivery — the commit | `decisions.<key>.default` |
+| Finding at end of Delivery | `insights.<key>` with `claim` + `evidence` (or finding in `astra.yaml`) |
+| Sub-analysis scope | New sub-analysis in `astra.yaml`, or a new fiber with `inputs`/`outputs` |
+| Process-level lesson that generalizes | Edit to root CLAUDE.md / root fiber |
+
+If felt is installed, the [`felt:felt`](https://github.com/cailmdaley/felt) skill carries the tier ladder (Annotated → Formalized → Tempered) and the common frontmatter shapes. Without felt, the same shapes apply directly inline in `astra.yaml` or the constitution itself.
+
+---
+
+## Anti-patterns
+
+- **Ambiguity gates.** Do not withhold help until the user clarifies N dimensions. The self-check is a mirror, not a door.
+- **Numerical scoring.** Do not introduce 0–1 clarity scores with thresholds. The underlying signal is qualitative and the number adds false precision.
+- **Stance pipelines.** Do not run Socratic → Ontologist → Contrarian in sequence. Pick one when it helps; drop it when it has.
+- **Mandatory interview.** No prepared question list. Stances are responsive to the actual conversation.
+- **Surfacing the ledger too early.** A single item is not a flush. Wait for accumulation or a pause.
+- **Immutable outputs.** Nothing filed here is locked. Everything is editable; reversals are normal.
+- **Nine-minds overload.** Six stances is already generous. Add more only when a specific gap shows up, never preemptively.
+- **Interrogation without a ceiling.** Three questions is usually enough. If the user is getting irritated, stop asking and file what you have.
+- **Converging before the name is clear.** If Diamond 2 feels forced, Diamond 1 has not finished. Back up.
diff --git a/claude/lightcone/skills/managing-bibliography/SKILL.md b/claude/lightcone/skills/managing-bibliography/SKILL.md
new file mode 100644
index 00000000..c9143a46
--- /dev/null
+++ b/claude/lightcone/skills/managing-bibliography/SKILL.md
@@ -0,0 +1,162 @@
+---
+name: managing-bibliography
+description: >
+  Read arXiv paper source and add BibTeX entries via ADS API. Use for
+  research that requires reading full paper text and managing citations.
+  Also the canonical paper-acquisition path inside the lightcone-cli
+  paper-reproduction bundle: `/paper2astra` calls this during the ACQUIRE
+  phase to fetch arXiv LaTeX source, with PDF + Docling as the non-arXiv
+  fallback. Triggers on: "read paper", "cite", "add to bibliography",
+  "bibtex", "ADS", "arXiv", "find paper", "add citation", or any request
+  to read scientific papers or manage references.
+---
+
+Read scientific papers and manage citations. Two capabilities:
+
+1. **Read papers** — Download arXiv LaTeX source to read full text, verify claims, understand methodology
+2. **Cite papers** — Fetch BibTeX from NASA ADS and add to bibliography
+
+**Activation**: Use this skill when you need to:
+- Read a paper's full text (not just abstract)
+- Verify a claim before citing it
+- Add citations to your bibliography
+- Research how other papers phrase similar findings
+- Acquire a paper for the `/paper2astra` reproduction pipeline (ACQUIRE phase)
+
+**Usage pattern**:
+- "Read the KiDS-Legacy paper to see how they report B-mode PTEs"
+- "Add [paper description] to the bibliography"
+- "Find and cite [author name] [year] [topic]"
+
+---
+
+## Reading Papers
+
+Download arXiv LaTeX source to read full paper text:
+
+```bash
+# Download source (replace ID as needed)
+curl -L -o /tmp/2503.19441.tar.gz "https://arxiv.org/src/2503.19441"
+
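+# Extract (assumes a tarball; single-file submissions may arrive as a bare gzipped .tex, so fall back to gunzip)
+mkdir -p /tmp/2503.19441 && cd /tmp/2503.19441 && tar -xzf /tmp/2503.19441.tar.gz
+
+# Find the main tex file (the one that declares the document class)
+grep -l '\\documentclass' *.tex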
+```
+
+This gives you:
+- Full paper text (not just abstract)
+- Equations and methodology details
+- How authors phrased specific claims
+- Their bibliography (.bib or .bbl files)
+
+Use when you need to:
+- Verify a claim before citing
+- See exact phrasing in another paper
+- Understand methodology not in abstract
+- Cross-reference their citations
+
+---
+
+## ADS API Setup
+
+The ADS API requires an API token. Before using citation features:
+
+1. **Check for token**: The skill reads `$ADS_API_TOKEN` from the environment
+2. **If missing**: Tell the user to create one at https://ui.adsabs.harvard.edu/user/settings/token and set it:
+   ```bash
+   # Add to ~/.zshrc or ~/.bashrc
+   export ADS_API_TOKEN="your-token-here"
+   ```
+3. **Do not proceed** with ADS API calls until the token is available — check with `[ -n "$ADS_API_TOKEN" ] && echo set`, which confirms the variable without printing the secret
+
+---
+
+## Citing Papers
+
+When adding a paper to the bibliography:
+
+1. **Web search** for the paper using description + "arxiv"
+   - Look for arXiv ID in format `YYMM.NNNNN` (`YYMM.NNNN` for 2007–2014 papers; older IDs look like `astro-ph/YYMMNNN`)
+   - If multiple results, show options and ask user to select
+
+2. **Query ADS API** to get bibcode using arXiv ID
+   ```bash
+   curl -H "Authorization: Bearer $ADS_API_TOKEN" \
+     'https://api.adsabs.harvard.edu/v1/search/query?q=arXiv:YYMM.NNNNN&fl=bibcode'
+   ```
+
+3. **Fetch BibTeX entry** with abstract from ADS
+   ```bash
+   curl -H "Authorization: Bearer $ADS_API_TOKEN" \
+     'https://api.adsabs.harvard.edu/v1/export/bibtexabs/{bibcode}'
+   ```
+
+4. **Parse BibTeX** to extract author names and year:
+   - Parse `author = {...}` field for last names
+   - Parse `year = YYYY` field for publication year
+   - Generate citation key based on author count:
+     - 1 author: `firstauthor{YY}` (e.g., `asgari17`)
+     - 2 authors: `firstauthor.secondauthor{YY}` (e.g., `schneider.kilbinger12`)
+     - 3+ authors: `firstauthor.etal{YY}` (e.g., `wright.etal25`)
+   - Use only last names, lowercase, final 2 digits of year
+
+5. **Replace citation key** in BibTeX entry
+   - Update the entry key on the first line (before the opening brace)
+   - Keep all other fields unchanged
+
+6. **Append to bibliography** file
+   - Add the modified entry to the project's `.bib` file
+   - Check for duplicate keys first and warn if found
+
+7. **Report success**
+   - Show the user the complete entry that was added
+   - Confirm file location
+
+## Citation Key Generation
+
+**Examples from BibTeX parsing**:
+- `author = {{Wright}, Angus H. and {Stölzner}, Benjamin and ...}` + `year = 2025` → `wright.etal25`
+- `author = {{Schneider}, Peter and {Kilbinger}, Martin}` + `year = 2012` → `schneider.kilbinger12`
+- `author = {{Asgari}, Marika}` + `year = 2017` → `asgari17`
+
+## Error Handling
+
+- **No arXiv ID found**: Ask user to provide it manually or search for the paper directly
+- **Multiple search results**: Show options and ask user to select the correct paper
+- **ADS API fails**: Show error and suggest manual bibcode lookup or entry
+- **Duplicate citation key**: Warn user, show existing entry, offer to replace or rename
+- **Missing bibliography file**: Report error and ask for correct file path
+
+## Key Configuration Points
+
+- **ADS API Token**: Read from `$ADS_API_TOKEN` environment variable
+- **ADS Search endpoint**: `https://api.adsabs.harvard.edu/v1/search/query`
+- **ADS Export endpoint**: `https://api.adsabs.harvard.edu/v1/export/bibtexabs/{bibcode}`
+- **Export format**: Use `bibtexabs` endpoint to include abstracts
+
+## Bibliography File Paths
+
+Adapt to your project structure:
+- `docs/unions_bmodes/unions_bmodes.bib` (example UNIONS project)
+- `references/bibliography.bib` (common alternative)
+- User should specify their bibliography file path
+
+## Notes
+
+- Always use the `bibtexabs` endpoint to include abstract in the entry
+- Parse author list carefully: format is `author = {{LastName}, FirstName and {LastName}, FirstName ...}`
+- Year is straightforward: `year = YYYY`
+- Before appending, verify file exists and has proper BibTeX format
+- Preserve existing entries when appending new ones
+
+---
+
+## Provenance
+
+Originally maintained at `~/.claude/skills/managing-bibliography/SKILL.md`
+(Cail's personal version). Copied here so the lightcone-cli
+paper-reproduction bundle has the full toolkit available via `lc init` —
+without depending on a separate plugin install. The personal copy may be
+ahead; re-sync as needed.
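A minimal sketch of the citation-key steps in `managing-bibliography/SKILL.md` (parse last names, build the key, replace the entry key), assuming ADS-style author fields in which each last name is brace-wrapped. The function names `extract_last_names`, `make_citekey`, and `replace_citekey` are hypothetical and not part of the skill; collaboration authors and non-ASCII last names are not given any special handling here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the citation-key rules; not part of the skill.

# Print each brace-wrapped last name from an ADS-style author field, one per line.
extract_last_names() {
    printf '%s\n' "$1" | grep -o '{[^{}]*}' | tr -d '{}'
}

# Build a key from a four-digit year and one or more last names.
make_citekey() {
    if [[ $# -lt 2 ]]; then
        echo "usage: make_citekey YEAR LASTNAME [LASTNAME...]" >&2
        return 1
    fi
    local year="$1"; shift
    local yy="${year: -2}"   # final two digits of the year
    local first second
    first="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case $# in
        1) printf '%s%s\n' "$first" "$yy" ;;
        2) second="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
           printf '%s.%s%s\n' "$first" "$second" "$yy" ;;
        *) printf '%s.etal%s\n' "$first" "$yy" ;;
    esac
}

# Swap the entry key on the "@TYPE{key," line of a BibTeX entry read from stdin.
replace_citekey() {
    sed -E "s/^(@[A-Za-z]+\{)[^,]*,/\1$1,/"
}

# Usage: derive the key for a two-author entry.
author='author = {{Schneider}, Peter and {Kilbinger}, Martin}'
names=()
while IFS= read -r name; do
    names+=("$name")
done < <(extract_last_names "$author")
make_citekey 2012 "${names[@]}"
```

Piping a fetched entry through `replace_citekey "$(make_citekey ...)"` before appending it covers step 5; the duplicate-key check in step 6 would still need a separate `grep` against the `.bib` file.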
diff --git a/claude/lightcone/skills/narrative/SKILL.md b/claude/lightcone/skills/narrative/SKILL.md
index e08d3f1b..32524830 100644
--- a/claude/lightcone/skills/narrative/SKILL.md
+++ b/claude/lightcone/skills/narrative/SKILL.md
@@ -23,6 +23,14 @@ One field: `narrative:` on an analysis or sub-analysis, or `rationale:` on a dec
 Per-element prose (what each `Input`, `Output`, `Decision`, `Option`, or `Insight` is and why it matters) lives on those elements' own `description` / `rationale` / `notes` fields.
 `narrative` is the analysis-level story that weaves the pieces together.
 
+This skill is also part of the lightcone-cli paper-reproduction bundle: the
+`/paper2astra` orchestrator invokes it during the SPECIFY phase to author the
+narrative for the spec it has just crafted. Sibling skills in the bundle —
+`constitution`, `ralph-loops`, `managing-bibliography`,
+`check-sentence-by-sentence`, `figure-comparison` — solve adjacent pieces of
+the reproduction story; this skill stands alone and does not need to know
+about them.
+
 ## What a narrative is
 
 Science, from a single decision to a review paper, is a practice of
@@ -158,11 +166,43 @@ Applied to the five keys:
 - `findings` **synthesizes** — each finding cited by anchor as part of
   the argument, not an enumeration.
 - `inputs` **names provenance**.
-- `outputs` **names what was promoted and why**, citing each by anchor.
+- `outputs` **names what was promoted and why**, citing each by anchor —
+  **and names its downstream consumers** when they exist (see "Data flow" below).
 - Decision `rationale:` **names why the default won**.
 
 ---
 
+## Data flow — name where each output goes
+
+Recipe `inputs:` wires the DAG; the narrative makes the wiring legible. The
+schema already encodes who consumes what — readers should not have to grep
+49 `inputs:` lists to learn what an intermediate output is *for*.
+
+Two rules — both load-bearing for projects with sub-analyses:
+
+1. **`narrative.outputs` names downstream consumers.** When authoring
+   `outputs` prose on a sub-analysis or the root, name where each output
+   gets consumed using the `<analysis>.<output>` form that recipe `inputs:`
+   already uses. *"`xi_post_recon_lrg1` feeds
+   [`bao_fit_post_iso_ap_lrg1`](#analyses.bao_fit.outputs.bao_fit_post_iso_ap_lrg1)
+   and [`bao_detection_chi2_lrg1`](#findings.bao_detection_chi2_lrg1)."*
+   Anchor where you can; bare `<analysis>.<output>` text is acceptable when
+   no anchor is reachable from the current scope.
+
+2. **Root narrative includes a top-down data-flow paragraph.** When the
+   project has sub-analyses, the root analysis's `methods` (or `summary`)
+   must include one paragraph that traces the pipeline end-to-end:
+   *"raw catalogs → [reconstruction.post_recon_catalog_*](#analyses.reconstruction)
+   → [clustering.xi_*_recon_*](#analyses.clustering) → root [bao_fit_*](#outputs.bao_fit_post_iso_ap_lrg1)."*
+   This is the one place a reader can land cold and get the shape of the
+   pipeline without reading every recipe declaration.
+
+Closes [lightcone-cli#108](https://github.com/LightconeResearch/lightcone-cli/issues/108).
+The validator does not (yet) enforce this; treat both rules as authorial
+discipline. The information is already in the spec — surface it.
+
+---
+
 ## Anchor coverage
 
 `astra validate` checks:
diff --git a/claude/lightcone/skills/ralph-loops/SKILL.md b/claude/lightcone/skills/ralph-loops/SKILL.md
new file mode 100644
index 00000000..649a813c
--- /dev/null
+++ b/claude/lightcone/skills/ralph-loops/SKILL.md
@@ -0,0 +1,70 @@
+---
+name: ralph-loops
+description: >
+  Autonomous loop iteration toward a desired state. You are inside a ralph
+  loop — your spec is in the system prompt. Survey, contribute, update state
+  discoverably, exit. Activated automatically inside ralph loops.
+  Triggers: "ralph-loops", "ralph", "ralph loop", "iterate", "autonomous loop".
+---
+
+# Ralph Loops
+
+You are inside a loop. Your spec is in the system prompt above. Each iteration: survey freely, work substantially, update state discoverably, exit.
+
+## Loop
+
+1. **Survey** — Fresh eyes. Explore agents, git log, tests. You decide what to check.
+2. **Contribute** — Work on 1–3 substantial pieces. Do NOT try to clear the whole queue in one iteration.
+3. **Update** — Before exiting: commit your work, update CLAUDE.md if warranted.
+4. **Exit** — `kill $PPID`
+
+**CRITICAL: Exit before compaction.** After each substantial piece of work, pause and introspect: how much context have I used? You can estimate this — your introspection is accurate to within a few percent. If you feel past 50%, wrap up and exit. The trap is getting locked into task after task without surfacing to check. Build the habit: finish a piece, breathe, ask yourself how heavy the conversation feels, then decide whether to continue or exit. Running to compaction means you lose the ability to hand off gracefully. The loop continues — you don't have to finish everything.
+
+## Rules
+
+**State, not checklist.** The spec describes what "done" looks like. Survey reality, decide what's highest value, work on that.
+
+**Discoverable updates.** Commits, test results, documentation — not notes or progress files. The next iteration finds what changed by inspecting the system.
+
+**Pointers, not snapshots.** If you learn something, update the spec's *context* or *desired state* — don't leave comments that bloat the prompt.
+
+**You have authority.** Trust the spec, don't ask permission. Make substantial contributions. Don't avoid ambitious solutions just because they span multiple iterations.
+
+**File uncertain decisions** so the user can answer after the loop. Use `AskUserQuestion` to batch up to 4 high-leverage questions before exiting — choices where user input redirects substantial work.
+
+### Long-Running Jobs
+
+Some iterations require waiting on computation (builds, cluster jobs, CI). When jobs are running:
+
+1. **Check state** — tail logs, check output
+2. **Sleep** — interval proportional to expected runtime (30s for minute-scale, 5m for hour-scale)
+3. **Check again** — look for errors or completion
+4. **Repeat** until jobs finish or fail
+
+Stay and shepherd computation through. Don't exit and hope the next iteration picks it up.
+
+## Exit
+
+If you **made substantial contributions**, `kill $PPID`. Do NOT close the spec — the loop continues.
+
+If you **cannot find any remaining work**, update the spec's YAML frontmatter to `status: closed` with a summary of what was accomplished.
+
+---
+
+Pattern adapted from [Ralph Wiggum](https://ghuntley.com/ralph/).
+
+---
+
+## Provenance
+
+Originally from [`cailmdaley/skills`](https://github.com/cailmdaley/skills/tree/main/skills/ralph-loops).
+Copied into the lightcone-cli paper-reproduction bundle so it can compose
+with `paper2astra`, `constitution`, and the rest of the bundle without a
+separate plugin install. The canonical version may be ahead; re-sync as
+needed.
+
+In the bundle, `/paper2astra` invokes `/constitution` to draft a per-paper
+reproduction constitution and then launches a ralph loop against it via
+`scripts/ralph`. Successive iterations of the loop survey the workdir and
+git history, execute the next phase, and exit cleanly — see the bundle
+README at `../README.md`.
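The check/sleep/check cycle under "Long-Running Jobs" in `ralph-loops/SKILL.md` can be sketched as a small polling helper. This is an illustration under assumptions: `wait_for_job` is a hypothetical name, and the `DONE` sentinel and the error patterns are placeholders for whatever the real job writes to its log.

```bash
#!/usr/bin/env bash
# Hypothetical polling helper; the sentinel and error patterns are assumptions.
# Usage: wait_for_job LOGFILE INTERVAL_SECONDS [MAX_CHECKS]
# Returns 0 on completion, 1 on a detected error, 2 if the checks run out.
wait_for_job() {
    local log="$1" interval="$2" max_checks="${3:-120}"
    local i
    for ((i = 0; i < max_checks; i++)); do
        # Look for failures first so a crashed job is not mistaken for a slow one.
        if grep -qiE 'error|traceback' "$log" 2>/dev/null; then
            echo "job failed; see $log" >&2
            return 1
        fi
        # A line consisting solely of DONE marks completion (assumed convention).
        if grep -q '^DONE$' "$log" 2>/dev/null; then
            return 0
        fi
        sleep "$interval"
    done
    return 2
}
```

With the intervals suggested in the skill, `wait_for_job build.log 30` would suit a minute-scale job and `wait_for_job run.log 300` an hour-scale one.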
diff --git a/claude/lightcone/skills/ralph-loops/assets/spec.md b/claude/lightcone/skills/ralph-loops/assets/spec.md
new file mode 100644
index 00000000..0da84d2a
--- /dev/null
+++ b/claude/lightcone/skills/ralph-loops/assets/spec.md
@@ -0,0 +1,29 @@
+---
+status: open
+---
+
+This is your spec for an autonomous iteration loop, a meditative iteration toward a desired state.
+
+## Desired State
+
+[Describe what you're building and why. Someone unfamiliar with the project should understand the goal from this section alone.
+
+Be detailed about "done": the architecture, behavior, constraints, quality bar. You'll check reality against this and work to close the gap.
+
+Use pointers, not snapshots. Say "check `grep -r 'pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid.]
+
+## Context
+
+[Point to relevant files and existing patterns. When you see real implementations, you build coherently on them rather than introducing alien patterns.]
+
+## Skills
+
+[Skills to activate before working. Use `/skill-name`.]
+
+## Evidence
+
+[How to check progress — commands, test suites, grep patterns. Pointers to the ground truth that iterations measure themselves against.]
+
+## Open Questions
+
+[Uncertainties the user should weigh in on. Iterations add to this; the user resolves between loops.]
diff --git a/claude/lightcone/skills/ralph-loops/scripts/ralph b/claude/lightcone/skills/ralph-loops/scripts/ralph
new file mode 100755
index 00000000..a7269366
--- /dev/null
+++ b/claude/lightcone/skills/ralph-loops/scripts/ralph
@@ -0,0 +1,124 @@
+#!/bin/bash
+# Run a ralph loop on a spec file
+# Loops while spec status is open/active, appending spec content to system prompt
+# Usage: ralph <spec.md> [--backend claude|codex] [-- extra-flags...]
+#
+# Supports both Claude Code and Codex backends.
+# Default: claude. Set RALPH_BACKEND=codex or pass --backend codex.
+
+set -e
+
+SPEC_FILE="${1:?Usage: ralph <spec.md> [--backend claude|codex] [-- extra-flags...]}"
+shift
+
+BACKEND="${RALPH_BACKEND:-claude}"
+if [[ "$1" == "--backend" ]]; then
+    BACKEND="$2"
+    shift 2
+fi
+
+EXTRA_FLAGS=""
+if [[ "$1" == "--" ]]; then
+    shift
+    EXTRA_FLAGS="$*"
+fi
+
+# Resolve to absolute path
+SPEC_FILE="$(cd "$(dirname "$SPEC_FILE")" && pwd)/$(basename "$SPEC_FILE")"
+
+if [[ ! -f "$SPEC_FILE" ]]; then
+    echo "Spec file not found: $SPEC_FILE"
+    exit 1
+fi
+
+SESSION="ralph-$(basename "$SPEC_FILE" .md)"
+WORK_DIR="$(pwd)"
+
+# Check if already running
+if tmux has-session -t "$SESSION" 2>/dev/null; then
+    echo "Ralph already running: $SESSION"
+    echo "  Attach: tmux attach -t $SESSION"
+    exit 0
+fi
+
+# Write loop script to temp file (avoids heredoc quoting hell); trailing X's keep BSD/macOS mktemp happy
+LOOP_SCRIPT=$(mktemp /tmp/ralph-loop-XXXXXX)
+cat > "$LOOP_SCRIPT" << 'LOOP'
+#!/bin/bash
+SPEC_FILE="$1"
+WORK_DIR="$2"
+BACKEND="$3"
+EXTRA_FLAGS="$4"
+
+iteration=0
+
+# Check YAML frontmatter for status field
+check_status() {
+    head -50 "$SPEC_FILE" | sed -n '/^---$/,/^---$/p' | grep -qiE 'status:.*(open|active)'
+}
+
+while check_status; do
+    cd "$WORK_DIR"
+    iteration=$((iteration + 1))
+    echo ""
+    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+    echo "Ralph iteration $iteration — $(date '+%H:%M:%S')"
+    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+
+    SPEC_CONTENT=$(cat "$SPEC_FILE")
+
+    SYSPROMPT_FILE=$(mktemp /tmp/ralph-sys-XXXXXX)
+    PROMPT_FILE=$(mktemp /tmp/ralph-prompt-XXXXXX)
+
+    cat > "$SYSPROMPT_FILE" << SYSEOF
+Ralph iteration $iteration. Spec: $SPEC_FILE
+
+$SPEC_CONTENT
+SYSEOF
+
+    cat > "$PROMPT_FILE" << 'PROMPTEOF'
+You are inside a Ralph loop — a meditative iteration toward a desired state. Activate the ralph-loops skill and follow its instructions for iterating on the spec above.
+PROMPTEOF
+
+    PROMPT=$(cat "$PROMPT_FILE")
+
+    if [[ "$BACKEND" == "codex" ]]; then
+        codex --dangerously-bypass-approvals-and-sandbox \
+            --config "developer_instructions=$(cat "$SYSPROMPT_FILE")" \
+            $EXTRA_FLAGS \
+            "$PROMPT"
+    else
+        claude --dangerously-skip-permissions \
+            $EXTRA_FLAGS \
+            --append-system-prompt "$(cat "$SYSPROMPT_FILE")" \
+            <<< "$PROMPT"
+    fi
+
+    rm -f "$SYSPROMPT_FILE" "$PROMPT_FILE"
+
+    echo "--- Iteration complete ---"
+    sleep 2
+done
+
+echo ""
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "Ralph complete — $iteration iterations"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo ""
+echo "Session kept open for inspection. Type exit to close."
+exec bash -l
+LOOP
+
+chmod +x "$LOOP_SCRIPT"
+
+echo "Starting ralph on $SPEC_FILE"
+echo "  Backend: $BACKEND"
+echo "  Work dir: $WORK_DIR"
+[[ -n "$EXTRA_FLAGS" ]] && echo "  Flags:    $EXTRA_FLAGS"
+
+# Launch tmux with a login shell running the loop script
+tmux new-session -d -s "$SESSION" -c "$WORK_DIR" \
+    bash -l "$LOOP_SCRIPT" "$SPEC_FILE" "$WORK_DIR" "$BACKEND" "$EXTRA_FLAGS"
+
+echo "  Session:  $SESSION"
+echo "  Attach:   tmux attach -t $SESSION"
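`check_status` in `scripts/ralph` applies a `sed` range over the first 50 lines, and a `/^---$/,/^---$/` range re-opens at every later `---` line, so horizontal rules in the spec body can bring body text into the match. A stricter variant is sketched here, assuming the frontmatter starts on line 1 and the status value is written unquoted; `spec_is_open` is a hypothetical name, not something the script defines.

```bash
#!/usr/bin/env bash
# Hypothetical stricter status check; reads only the first frontmatter block.
# Succeeds (exit 0) when that block holds "status: open" or "status: active".
spec_is_open() {
    awk '
        NR == 1 { if ($0 != "---") exit 1; next }   # frontmatter must open on line 1
        $0 == "---" { exit 1 }                      # closing fence: no open status found
        tolower($0) ~ /^status:[ \t]*(open|active)[ \t]*$/ { ok = 1; exit 0 }
        END { exit (ok ? 0 : 1) }
    ' "$1"
}
```

Swapping it in would mean replacing the body of `check_status` with `spec_is_open "$SPEC_FILE"`; the loop condition itself would not change.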

From 7d53081df62d7b9af485b4c222b8fe9c19c06aab Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 4 May 2026 03:19:23 +0200
Subject: [PATCH 003/124] skills: add /paper2astra orchestrator + bundle README

Builds the paper2astra skill on the lightcone-cli plugin: an interview-first
orchestrator that crafts a per-paper reproduction constitution and hands it
to a ralph loop. Composes the rest of the bundle (managing-bibliography,
narrative, constitution, ralph-loops; +check-sentence-by-sentence and
figure-comparison when Nolan pushes them).

paper2astra/SKILL.md frames the workflow:
- Interview is the only interactive phase (once per project). Drafts the
  per-paper constitution via /constitution.
- After interview, /ralph-loops/scripts/ralph drives the multi-session
  reproduction. Each iteration surveys the workdir to determine the
  current phase from file existence + git history; no Pydantic state
  machine, no resume mechanic.
- 12 phases (interview + acquire + parse + summarize + extract_targets +
  literature + specify + review + implement + run + compare +
  summarize_run) each have a self-contained reference under
  paper2astra/references/. The phase prose ports 1:1 from the existing
  Paper2ASTRA Python prompts at LightconeResearch/Paper2ASTRA, with the
  surfacing-seam discipline pulled in from the 2026-04-30 design plan.
- Per-phase mode (interactive vs sub-agent) is a constitution choice the
  interview surfaces. SPECIFY and COMPARE are always interactive
  (mandatory user-ratification seams); SUMMARIZE and LITERATURE are
  always sub-agent (parallel grunt-work); the rest default to sub-agent
  but the user can flip them.
- Material conflicts at SPECIFY (paper-vs-code disagreement that
  plausibly changes a numeric result) surface to the user via
  AskUserQuestion. Default on user silence is paper. Resolves
  Paper2ASTRA#8.
- ACQUIRE rewrites for arxiv-LaTeX-first via /managing-bibliography;
  PDF + Docling stays as the non-arxiv fallback. Resolves Paper2ASTRA#7
  as a side-effect of the migration.

skills/README.md indexes the bundle: lifecycle skills (lc-*) plus the
seven paper-reproduction skills, with origin attribution for each.
Documents the pending bundle additions (Nolan's two skills not yet
pushed) so the next worker knows what to add when they land.

Refs: lightcone/paper2astra-as-skill/skill-bundle constitution.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/README.md             |  42 +++++
 claude/lightcone/skills/paper2astra/SKILL.md  | 168 ++++++++++++++++++
 .../skills/paper2astra/references/acquire.md  |  85 +++++++++
 .../skills/paper2astra/references/compare.md  |  97 ++++++++++
 .../paper2astra/references/extract_targets.md |  61 +++++++
 .../paper2astra/references/implement.md       |  57 ++++++
 .../paper2astra/references/interview.md       | 160 +++++++++++++++++
 .../paper2astra/references/literature.md      | 163 +++++++++++++++++
 .../skills/paper2astra/references/parse.md    |  79 ++++++++
 .../skills/paper2astra/references/review.md   |  79 ++++++++
 .../skills/paper2astra/references/run.md      |  56 ++++++
 .../skills/paper2astra/references/specify.md  | 105 +++++++++++
 .../paper2astra/references/summarize.md       | 120 +++++++++++++
 .../paper2astra/references/summarize_run.md   |  58 ++++++
 14 files changed, 1330 insertions(+)
 create mode 100644 claude/lightcone/skills/README.md
 create mode 100644 claude/lightcone/skills/paper2astra/SKILL.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/acquire.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/compare.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/extract_targets.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/implement.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/interview.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/literature.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/parse.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/review.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/run.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/specify.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/summarize.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/summarize_run.md

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
new file mode 100644
index 00000000..439337cd
--- /dev/null
+++ b/claude/lightcone/skills/README.md
@@ -0,0 +1,42 @@
+# lightcone-cli skills
+
+Each subdirectory is one Claude Code skill: `SKILL.md` plus optional `references/`, `assets/`, and `scripts/`. `lc init` copies these into a project's `.claude/skills/` so they are discoverable to Claude Code sessions.
+
+## Project lifecycle skills
+
+| Skill | Role |
+|---|---|
+| `lc-new` | Scaffold a new ASTRA-shaped project from scratch. |
+| `lc-build` | Build container images and dependencies for a project. |
+| `lc-verify` | Run validation across an ASTRA project. |
+| `lc-migrate` | Migrate legacy projects to current conventions. |
+| `lc-feedback` | Report bugs and feature requests upstream. |
+
+## Paper-reproduction bundle
+
+A self-contained toolkit for reproducing published papers in ASTRA. The bundle is co-located so a single `lc init` brings the full toolkit into a project — no plugin marketplace, no separate installs.
+
+| Skill | Role | Origin |
+|---|---|---|
+| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and launches a ralph loop against it. | New for the bundle. |
+| [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by paper2astra during SPECIFY. | Cail's ([lightcone-cli#86](https://github.com/LightconeResearch/lightcone-cli/pull/86), ported from lightcone-ui#10). |
+| [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. | Merged from [`cailmdaley/skills/skills/constitution`](https://github.com/cailmdaley/skills/tree/main/skills/constitution) (procedural backbone) + Cail's personal felt references (taste — two diamonds, six stances, funnel ledger, qualitative self-check), with felt-optional framing. |
+| [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Launched by paper2astra after the interview. | Direct copy from [`cailmdaley/skills/skills/ralph-loops`](https://github.com/cailmdaley/skills/tree/main/skills/ralph-loops). |
+| [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. Primary acquisition path for paper2astra's ACQUIRE phase. | Direct copy of Cail's personal `~/.claude/skills/managing-bibliography` (newer than the public version). |
+| `check-sentence-by-sentence` | Paper-vs-code TeX audit via sub-agents; locates `file:line` or `NOT FOUND`. Invoked by paper2astra during COMPARE. | Nolan Koblischke's, on his Reproductions-branch. **Not yet pushed publicly** — see "Pending bundle additions" below. |
+| `figure-comparison` | HTML side-by-side: original figures/tables/numerics vs replicated. Invoked by paper2astra during COMPARE. | Same — Nolan's, pending. |
+
+The full reproduction story spans these seven skills. paper2astra's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about paper2astra.
+
+### Why bundle (not depend on plugin install)
+
+- **Testability.** We want to verify paper2astra invokes constitution + ralph-loops + the others correctly. That only works if all are in the same checkout.
+- **Single install path.** `lc init` is the install path for lightcone-cli skills. Adding a separate "also install Cail's public skills via plugin marketplace" step is friction we don't need.
+- **Copy-with-credit costs nothing.** The copied skills retain attribution to their original authors in the SKILL body; if those skills update upstream, we re-sync.
+- **Future consolidation is open.** Per Francois's "next week we improve" framing, the long-run shape might be `astra` ships skills in `astra`, `lc` ships skills in `lightcone-cli`, plus a centralized external-skills list. Today: bundle it all.
+
+### Pending bundle additions
+
+- **`check-sentence-by-sentence`** and **`figure-comparison`** — Nolan Koblischke's two skills. Per the bundle constitution ([`lightcone/.felt/lightcone/paper2astra-as-skill/skill-bundle`](https://github.com/LightconeResearch/lightcone/blob/main/.felt/lightcone/paper2astra-as-skill/skill-bundle.md)), these are part of the bundle, but at first cut they were not yet pushed to any public branch (only living on Nolan's local working tree on his Reproductions checkout). When Nolan pushes them, copy with attribution into this directory; paper2astra's SKILL.md and COMPARE reference already name them as expected siblings, so the integration is wire-compatible the moment they land.
+
+  Until then, COMPARE falls back to direct image-diff judgment without `/figure-comparison`'s structured per-panel rendering, and SPECIFY's evidence-quote re-verification (when COMPARE flags `partial`) falls back to manual Grep against `work/reference/document.md` without `/check-sentence-by-sentence`'s sub-agent audit. Both fallbacks are workable but lossier than the intended path.
diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/paper2astra/SKILL.md
new file mode 100644
index 00000000..5ea9fc58
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/SKILL.md
@@ -0,0 +1,168 @@
+---
+name: paper2astra
+description: >
+  Reproduce a published scientific paper in ASTRA. Interview the user
+  about the paper and the intended scope, draft a per-paper reproduction
+  constitution, then launch a ralph loop that drives the multi-session
+  reproduction work. Composes sibling skills for each phase: managing-
+  bibliography for ACQUIRE, narrative for SPECIFY, check-sentence-by-
+  sentence + figure-comparison for COMPARE. Use when the user wants to
+  reproduce a paper, has a DOI or arXiv ID and wants to start a
+  reproduction project, or asks to "reproduce <paper>", "set up
+  reproduction", "paper2astra", "/paper2astra <doi>", or hands you a
+  published paper as a starting point for ASTRA work.
+---
+
+# paper2astra
+
+Reproduce a published paper in ASTRA. The skill is **interview-first**: a short interactive crafting phase up front that produces a per-paper reproduction constitution. After the interview, paper2astra hands the constitution to a ralph loop that drives multi-session reproduction. Successive iterations of the loop survey the workdir, execute one or two phases, exit cleanly, and re-spawn with fresh context until the constitution is realized.
+
+This is a Claude-Code-native skill. There is no Python orchestrator, no state machine, no resume mechanic — the workdir on disk + git history are the substrate.
+
+## When to use this skill
+
+- The user has a paper (DOI, arXiv ID, or PDF) and wants to reproduce its analysis
+- The user invokes `/paper2astra` (with or without an argument)
+- The user is starting a fresh reproduction project under `Reproductions/<collab>/<short-name>/`
+- An existing paper-reproduction workdir needs the next phase driven forward (in which case skip the interview; see "Resuming an in-flight reproduction" below)
+
+## The bundle
+
+paper2astra composes the rest of the lightcone-cli paper-reproduction bundle. All siblings live in the same `claude/lightcone/skills/` directory and are available without separate installs:
+
+| Sibling skill | Where it's invoked |
+|---|---|
+| [`/managing-bibliography`](../managing-bibliography/SKILL.md) | ACQUIRE — arXiv LaTeX source download (primary) and BibTeX caching |
+| [`/constitution`](../constitution/SKILL.md) | INTERVIEW — drafting the per-paper reproduction constitution |
+| [`/ralph-loops`](../ralph-loops/SKILL.md) | After interview — launches the loop that drives all subsequent phases |
+| [`/narrative`](../narrative/SKILL.md) | SPECIFY — authoring the `narrative:` and `rationale:` prose in `astra.yaml` |
+| [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) | COMPARE — paper-vs-code TeX audit (Nolan's skill) |
+| [`/figure-comparison`](../figure-comparison/SKILL.md) | COMPARE — HTML side-by-side reference vs reproduced figures (Nolan's skill) |
+
+paper2astra does not re-implement what these skills already do — it tells the agent at each phase to invoke them.
+
+## Workflow
+
+### Interview (interactive — once per project)
+
+The interview is the only phase paper2astra runs interactively. Read [`references/interview.md`](references/interview.md) in full before starting.
+
+The interview has four jobs:
+
+1. **Identify the paper** — DOI / arXiv ID / title; whether code is available; whether the user has prior experience with this paper.
+2. **Scope the reproduction** — full reproduction vs targeted (e.g. only the BAO fit), which figures/tables/numbers are the targets.
+3. **Choose interactive vs sub-agent per phase** — see "Per-phase mode" below. The defaults are reasonable; the user gets to flip any of them.
+4. **Draft the per-paper constitution** — invoke `/constitution`. The constitution lives at the project root (or wherever the user prefers). It captures the paper, the scope, the per-phase mode choices, and the evidence checks.
+
+After the constitution is approved, the interview ends. Launch the ralph loop:
+
+```bash
+../ralph-loops/scripts/ralph paper2astra-constitution.md
+```
+
+Tell the user: *"Constitution drafted. Launching ralph loop in tmux session `ralph-paper2astra-constitution`. Each iteration will run one or two phases and exit; the next iteration picks up where it left off. Attach with `tmux attach -t ralph-paper2astra-constitution`."*
+
+### Phases (driven by ralph iterations after the interview)
+
+Inside each ralph iteration, the agent reads the per-paper constitution, surveys the workdir to determine which phase is current (file existence + git log), and runs that phase's reference. Each phase reference is self-contained — read the matching one in full before working:
+
+| Phase | Reference | Outputs |
+|---|---|---|
+| ACQUIRE | [`references/acquire.md`](references/acquire.md) | `work/reference/{document.md, paper.pdf, code/, code-status.yaml}` |
+| PARSE | [`references/parse.md`](references/parse.md) | `work/reference/{figures/, tables/, metadata.json}` |
+| SUMMARIZE | [`references/summarize.md`](references/summarize.md) | `work/notes/{methodology.md, cited_papers.yaml, code-analysis.md}` |
+| EXTRACT_TARGETS | [`references/extract_targets.md`](references/extract_targets.md) | `targets/targets.md` + reference files |
+| LITERATURE | [`references/literature.md`](references/literature.md) | `work/notes/literature.yaml` + per-paper YAMLs |
+| SPECIFY | [`references/specify.md`](references/specify.md) | `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md` |
+| REVIEW | [`references/review.md`](references/review.md) | (in-place edits to spec + notes) |
+| IMPLEMENT | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
+| RUN | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
+| COMPARE | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
+| SUMMARIZE_RUN | [`references/summarize_run.md`](references/summarize_run.md) | Final write-up; constitution outcome update |
+
+The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or attempts are exhausted. The constitution carries the attempt budget; the ralph iterations consult it.
+
+### Per-phase mode (interactive vs sub-agent)
+
+A reproduction's most consequential decisions show up at known seams. The interview decides — for this paper — which phases run interactively (in the main loop session, the user can be reached via `AskUserQuestion`) and which delegate to a sub-agent (Task tool with fresh context, no user reach).
+
+Defaults the constitution starts with:
+
+| Phase | Default | Why |
+|---|---|---|
+| ACQUIRE | sub-agent | Mostly mechanical; surfacing happens only on download failures. |
+| PARSE | sub-agent | Deterministic Docling / arXiv extraction. |
+| SUMMARIZE | sub-agent | Parallel paper + code reading benefits from fresh context per task. |
+| EXTRACT_TARGETS | user choice | The selection of replication targets is sometimes obvious, sometimes wants user input. |
+| LITERATURE | sub-agent | One sub-agent per cited paper — pure parallel grunt-work. |
+| SPECIFY | **interactive** | Material paper-vs-code conflicts surface here; the user must ratify. |
+| REVIEW | user choice | Pre-implement sanity check; can be either. |
+| IMPLEMENT | user choice | Mostly mechanical, but algorithm choices may want ratification. |
+| RUN | user choice | Mechanical, but failures need diagnosis. |
+| COMPARE | **interactive** | Verdict (was the reproduction close enough?) is the second mandatory user-ratification seam. |
+| SUMMARIZE_RUN | sub-agent | Final report; no decisions remain. |
+
+The constitution records the choice; ralph iterations honor it. Sub-agent phases are spawned via the `Task` tool from inside the main loop session — that gives them fresh context but no user-reach. Interactive phases run inline in the loop session and may pause with `AskUserQuestion` at material seams.
+
+### Material conflicts (the SPECIFY seam)
+
+Inside SPECIFY, when paper and code disagree on something material, do not silently pick one. Use `AskUserQuestion` to surface the conflict:
+
+- **Material** = a different choice would plausibly change a numeric result the paper reports.
+- **Stylistic / cosmetic / pure-tooling differences** are not material — record them in `implementation-notes.md` and move on.
+- **Default on user silence is paper.** If the user does not respond, take the paper's stated method as canonical and record the override (with reason) in a finding or insight.
+
+Both choices land in `astra.yaml` as decision options. Whichever the user picks becomes the option selected by `universes/baseline.yaml`; the alternative is preserved as a sibling option for future universe runs. See `references/specify.md` for the full SPECIFY discipline.
+
+### Resuming an in-flight reproduction
+
+If the workdir already exists (`work/reference/document.md` is present, `astra.yaml` exists, etc.):
+
+1. **Skip the interview** unless the user explicitly wants to revise scope.
+2. Read the per-paper constitution if it exists; if it does not, draft a minimal one from the current workdir state.
+3. Launch (or re-attach to) the ralph loop. Each iteration's first move is to survey the workdir and determine the current phase.
+
+Workdir signals (file existence implies the phase has been done):
+
+| Signal | Phase done |
+|---|---|
+| `work/reference/document.md` | ACQUIRE + PARSE |
+| `work/notes/methodology.md` | SUMMARIZE (paper) |
+| `work/notes/code-analysis.md` | SUMMARIZE (code) |
+| `targets/targets.md` | EXTRACT_TARGETS |
+| `work/notes/literature.yaml` | LITERATURE |
+| `astra.yaml` valid (`astra validate astra.yaml`) | SPECIFY |
+| `implementation-notes.md` | SPECIFY |
+| recipes present in `astra.yaml` | IMPLEMENT |
+| `results/<universe>/<output>/` | RUN |
+| `comparison-report.yaml` | COMPARE |
+
+`git log --oneline` complements this — phase commits are the chronological view.
+
+## Skills (activate before working)
+
+- [`/constitution`](../constitution/SKILL.md) — for the interview's drafting phase
+- [`/ralph-loops`](../ralph-loops/SKILL.md) — for the loop that drives phases
+- [`/managing-bibliography`](../managing-bibliography/SKILL.md) — for ACQUIRE
+- [`/narrative`](../narrative/SKILL.md) — for SPECIFY
+- `/check-sentence-by-sentence`, `/figure-comparison` — for COMPARE (Nolan's skills; see Provenance)
+
+## Discipline
+
+- **paper2astra is the workflow story; phase references are the depth.** SKILL.md tells you when to read which reference; the references carry the prompt prose ported from the legacy Paper2ASTRA Python package.
+- **Use the up-to-date CLI surfaces, not skill-specific wrappers.** When `astra validate` already does the job, call it directly. Specifically: `astra validate <file>`, `astra validate --verify-evidence`, `astra paper add`. Use whatever the current `astra --help` surfaces.
+- **No synthetic data.** Unless the paper itself uses synthetic data as its input, every input dataset must be real (downloaded, queried, or fetched from a real archive). The implement phase reference repeats this; treat it as load-bearing.
+- **Workdir conventions stay.** The phase references preserve Paper2ASTRA's workdir layout (`work/reference/`, `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`) so workdirs from the legacy Paper2ASTRA package are interoperable with workdirs driven by this skill.
+
+## Anti-patterns
+
+- **Asking the user mid-sub-agent.** Sub-agent phases cannot reach the user. If the constitution puts SPECIFY in sub-agent mode and a material conflict surfaces, the sub-agent must record the conflict in a `decisions:` block (with both options preserved) and let the next interactive phase ratify it. Never make the sub-agent pick silently.
+- **Re-implementing what astra already does.** If `astra validate` returns clean, do not write a separate validator. If `astra paper add` caches the PDF, do not write a separate cache.
+- **Treating Paper2ASTRA workdir as legacy.** It is not legacy — it is the substrate. The phase references inherit its conventions intentionally.
+- **Bundling everything into one ralph iteration.** Each iteration runs one or two phases, then exits. The constitution is realized across many iterations.
+
+## Provenance
+
+`paper2astra` is a fresh skill, but the phase prose ports 1:1 from the prompts in [`LightconeResearch/Paper2ASTRA/src/paper2astra/prompts/`](https://github.com/LightconeResearch/Paper2ASTRA/tree/main/src/paper2astra/prompts) (commit b3b54b5 and onward on `feat/skill-form-redesign`). The Paper2ASTRA Python package retires once this skill is in regular use; the repo persists as a reference for the original prompts and pipeline structure.
+
+The two compare-phase sibling skills (`check-sentence-by-sentence` and `figure-comparison`) originate from Nolan Koblischke's work on the [Reproductions](https://github.com/LightconeResearch/Reproductions) repo. They are credited in their own SKILL.md bodies; tag him post-publish so he can PR the canonical versions wherever they should ultimately live.
diff --git a/claude/lightcone/skills/paper2astra/references/acquire.md b/claude/lightcone/skills/paper2astra/references/acquire.md
new file mode 100644
index 00000000..a0d8aa2d
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/acquire.md
@@ -0,0 +1,85 @@
+# ACQUIRE — fetch the paper and code
+
+Acquire the paper's full text and (when available) its reference code repository. The bundle's primary acquisition path is **arXiv LaTeX source via `/managing-bibliography`**; PDF + Docling is the fallback for non-arXiv papers.
+
+The constitution's per-phase mode controls whether this runs interactively or as a sub-agent. Default is sub-agent.
+
+## Inputs
+
+- The paper's DOI or arXiv ID (from the constitution)
+- An optional code repo URL (from the interview, if the user knew it)
+
+## Outputs
+
+- `work/reference/document.md` — paper as markdown (LaTeX-rendered when arXiv source available; Docling-extracted for PDF fallback)
+- `work/reference/paper.pdf` — paper PDF (still needed for evidence verification via `astra validate --verify-evidence`)
+- `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (PARSE may move some of this to `work/reference/`)
+- `work/reference/code/` — clone of the code repo (or absent if not found)
+- `work/reference/code-status.yaml` — record of where the code came from
+
+## Step 1: Acquire the paper text
+
+### Path A — arXiv ID is available (preferred)
+
+Invoke `/managing-bibliography`. Use it to download the arXiv LaTeX source tarball:
+
+```bash
+curl -L -o /tmp/<arxiv-id>.tar.gz "https://arxiv.org/src/<arxiv-id>"
+mkdir -p work/reference/source && cd work/reference/source && tar -xzf /tmp/<arxiv-id>.tar.gz
+ls *.tex
+```
+
+The LaTeX source gives clean equations, captions, tables, and bibliography — none of the math collapse, ligature artifacts, or caption flattening that plagues PDF extraction. Use the main `.tex` file as the primary text source. Render it to markdown if a downstream phase needs that form (`pandoc`, or just preserve TeX where it is).
+
+Also cache the paper for ASTRA's evidence-verification surface:
+
+```bash
+astra paper add 10.48550/arXiv.<arxiv-id>
+cp "$(astra paper path 10.48550/arXiv.<arxiv-id>)" work/reference/paper.pdf
+```
+
+`astra paper add` for arXiv DOIs fetches the PDF directly. The PDF stays as a backup for `astra validate --verify-evidence`, even though the LaTeX source is the primary text.
+
+### Path B — non-arXiv paper (PDF + Docling fallback)
+
+```bash
+astra paper add <DOI>
+cp "$(astra paper path <DOI>)" work/reference/paper.pdf
+file work/reference/paper.pdf
+```
+
+The `file` output must say "PDF document". If it says "HTML document" or anything else, the download was blocked (CAPTCHA, paywall). Search the web for an open-access copy (NASA ADS, arXiv, Unpaywall, Semantic Scholar, the journal's open-access link), download with `curl -L -o work/reference/paper.pdf <url>`, re-validate, then `astra paper add <DOI> --pdf work/reference/paper.pdf` to register the resolved file.
+
+If a valid PDF cannot be obtained, write a clear error to `work/reference/acquire-error.txt` and stop.
+
+Skip Step 1 if `work/reference/paper.pdf` already exists and is a valid PDF.
+
+## Step 2: Search for the code repository
+
+1. Search the paper text for repository URLs — abstract, intro, conclusion, footnotes, "Code Availability" or "Data Availability" sections.
+2. If none found, web search: paper title + "github", Papers With Code, or the first author's GitHub profile.
+3. Clone if found:
+   ```bash
+   git clone --depth 1 <url> work/reference/code
+   ```
+4. Write `work/reference/code-status.yaml`:
+   ```yaml
+   found: true        # or false
+   url: "https://..."  # null if not found
+   cloned: true       # false if found but clone failed
+   notes: "..."
+   ```
+
+Spend no more than a few searches before recording failure and moving on. **Do NOT modify cloned code.**
+
+Skip Step 2 if `work/reference/code/` already exists.
+
+## Survey signals (entry into ACQUIRE)
+
+Run `ls work/reference/` first. If `paper.pdf` and `document.md` (or `source/` for arXiv) are present, ACQUIRE is done. If only `paper.pdf` is present, PARSE handles the rest. If nothing is there, run ACQUIRE.
+
+## Notes
+
+- **arXiv DOI form is `10.48550/arXiv.<id>`.** `astra paper add` accepts that form directly.
+- **Journal DOIs that 403 on Unpaywall** can be aliased to a locally-downloaded arXiv preprint via `astra paper add <JOURNAL_DOI> --pdf <path-to-arxiv-pdf>`.
+- This phase's job is acquisition, not understanding. Do not start summarizing the paper here — that's SUMMARIZE.
diff --git a/claude/lightcone/skills/paper2astra/references/compare.md b/claude/lightcone/skills/paper2astra/references/compare.md
new file mode 100644
index 00000000..ee00a4f3
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/compare.md
@@ -0,0 +1,97 @@
+# COMPARE — judge whether the reproduction matches
+
+Compare reproduced results against the paper's replication targets. Produce a structured verdict the IMPLEMENT-retry loop consumes. COMPARE is the **second mandatory user-ratification seam** — the verdict (was it close enough?) is a judgment the user owns, not the agent.
+
+The constitution's per-phase mode is **always interactive** for this phase. Pause for verdict ratification.
+
+## Inputs
+
+- `targets/targets.md` — target ledger with priorities, expected values, comparison guidance
+- `astra.yaml` — output definitions (each target maps to an output)
+- `targets/` — reference figures / tables for comparison
+- `results/<universe>/<output_id>/` — reproduced results
+
+## Outputs
+
+- `comparison-report.yaml` — structured verdict
+- `comparison-report.md` — human-readable summary
+
+## Sibling skills to invoke
+
+- **`/figure-comparison`** — HTML side-by-side reference vs reproduced figures, with structured judgment per panel. Invoke per figure target. (Nolan's skill; see `../figure-comparison/SKILL.md`.)
+- **`/check-sentence-by-sentence`** — paper-vs-code TeX audit. Use when SPECIFY's evidence quotes need re-verification against the source paper, particularly when COMPARE flags a result as `partial` and the cause may be a misinterpretation of paper text. (Nolan's skill; see `../check-sentence-by-sentence/SKILL.md`.)
+
+## Result path convention
+
+For an output with `id: X`, the reproduced result lives at `results/<universe_id>/X.<ext>`:
+
+- metrics: `.json` containing `{"value": ...}`
+- figures: `.png`
+- tables: `.csv`
+
+## Task
+
+1. **Read `targets/targets.md`.** Every replication target with its priority, expected values, comparison guidance, and the path to its reference file in `targets/`.
+2. **Read `astra.yaml`.** Outputs correspond to targets. Match each target to its output.
+3. **For every target**, find its reproduced result in `results/<universe_id>/` and compare against the reference file in `targets/`. Missing results are `match: false`.
+4. **Write `comparison-report.yaml` and `comparison-report.md`.**
+
+## Comparison guidance
+
+**Metrics.** Judge whether the reproduced value is scientifically equivalent to the expected value from `targets/targets.md`. Take the numerical tolerance from the target's stated precision; exact equality is not required.
+
+**Figures.** Read the reference figure from `targets/` and compare to the reproduced image. Focus on shape / trend, axis ranges, key features (peaks, inflections, curve ordering), and magnitudes. **Do NOT require pixel-perfect matches** — stochastic methods produce variation. Judge whether the same scientific conclusion follows from both figures. **Use `/figure-comparison`** for HTML side-by-side rendering and structured per-panel judgment.
+
+**Tables.** Compare key values noted in `targets/targets.md` first, then remaining values. Reference tables are in `targets/`.
+
+## Output: `comparison-report.yaml`
+
+```yaml
+verdict: pass|partial|fail
+attempt: <attempt_number>
+outputs:
+  <output_id>:
+    type: metric|figure|table
+    priority: high|medium|low
+    paper_value: "<from targets/targets.md>"
+    reproduced_value: "<from results>"
+    reference_file: "<path in targets/>"
+    reproduced_file: "<results/...>"
+    match: true|false
+    notes: "<what matches, what differs>"
+failure_diagnosis: null|"<root cause>"
+fix_suggestions:
+  - "<specific actionable suggestion with script and line number>"
+```
+
+## Verdict rules
+
+- **`pass`**: ALL high-priority targets match, and medium-priority targets show no major issues.
+- **`partial`**: only some high-priority targets match, or all high-priority targets match but medium-priority targets have issues.
+- **`fail`**: most high-priority targets don't match, or there is a fundamental methodological issue.
+
+If verdict is not `pass`, **`fix_suggestions` MUST reference specific scripts and line numbers**. "The result is wrong" is not actionable; "scripts/bao_fit.py:42 uses `damping_prior=flat`, paper specifies Gaussian; change to gaussian per Howlett+2017 §4.2" is.
+
+Also write `comparison-report.md` with a human-readable summary. For figure / table comparisons, describe what you see in both and explain your match judgment.
+
+## Verdict ratification (the user seam)
+
+After writing the report, surface the verdict to the user via `AskUserQuestion`:
+
+- **If `pass`**: confirm with the user before exiting the COMPARE → IMPLEMENT loop. *"All high-priority targets match. Mark reproduction complete?"* The user accepts → SUMMARIZE_RUN runs; the user rejects → name what's still off and re-enter the loop.
+- **If `partial`**: show the user the failing targets and the diagnosis. *"Partial match. <N> outputs failing: <list>. Continue retrying or accept partial?"* If the attempt budget (from the constitution) is reached, this surfacing is mandatory.
+- **If `fail`**: same shape, but the loop's continuation should be questioned more sharply. A fundamental methodological issue may need a constitution amendment, not another implement retry.
+
+The verdict is the agent's judgment; the **decision to keep iterating** is the user's. Default on user silence: continue the loop until the attempt budget is exhausted, at which point surfacing to the user is mandatory.
+
+## Survey signals (entry into COMPARE)
+
+- All outputs in `lc status --universe baseline` are `ok` ⇒ ready to compare
+- `comparison-report.yaml` exists with current `attempt` ⇒ COMPARE done for this attempt
+- `comparison-report.yaml` verdict is `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN
+
+## Notes
+
+- **One COMPARE per IMPLEMENT.** Each IMPLEMENT retry produces a fresh COMPARE; the report's `attempt` field increments. Do not overwrite prior reports — keep them at `comparison-report-attempt-<N>.yaml` if useful, or commit each between iterations so git carries the history.
+- **The verdict is the agent's; the keep-iterating decision is the user's.** Treat them as separate.
+- **`/figure-comparison` is the trustworthy figure-judgment surface.** Direct image diffing without it tends to either over-fail (any pixel-level variation triggers a no-match) or over-pass (it sees that there are *some* shared features and rubber-stamps). The skill's structured prompt is the discipline.
diff --git a/claude/lightcone/skills/paper2astra/references/extract_targets.md b/claude/lightcone/skills/paper2astra/references/extract_targets.md
new file mode 100644
index 00000000..6af2c22b
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/extract_targets.md
@@ -0,0 +1,61 @@
+# EXTRACT_TARGETS — pick the replication targets
+
+Take the results inventory from SUMMARIZE and select the concrete figures, tables, and metrics the reproduction will iterate against. Build a self-contained `targets/` directory the COMPARE phase will measure against.
+
+The constitution's per-phase mode is **user choice** for this phase — defaults to sub-agent. The selection of replication targets is sometimes obvious (paper has 3 primary figures) and sometimes wants user input (which sub-analyses are in scope).
+
+## Inputs
+
+- `work/notes/methodology.md` — has the results inventory split into primary / secondary
+- `work/reference/metadata.json` — index of figures and tables with captions
+- `work/reference/figures/`, `work/reference/tables/` — the actual extracted artifacts
+
+## Outputs
+
+- `targets/targets.md` — the target ledger
+- `targets/<file>` — copies of selected reference files (figures, tables) so `targets/` is self-contained
+
+## Step 1: Read the results inventory
+
+Read `work/notes/methodology.md`. The results inventory section already separates primary from secondary results and notes which decisions feed into each. **Use this as your starting point** — do not re-analyze the paper from scratch.
+
+## Step 2: Select replication targets
+
+For each result in the inventory, find the corresponding figure, table, or in-text metric in `work/reference/`. Apply the constitution's scope:
+
+- **Primary results should almost always be included.** The constitution's Desired State names them.
+- **Secondary results** should be included only if they are useful checkpoints along the pipeline (i.e., if getting them right helps verify intermediate steps).
+- **Targeted reproduction** (per the constitution): include only the targets in scope. Mark out-of-scope primary results in `targets.md` with a reason.
+
+## Step 3: Populate `targets/`
+
+The `targets/` directory is the self-contained reference set the COMPARE phase consumes.
+
+1. **Copy relevant reference files** from `work/reference/figures/` and `work/reference/tables/` into `targets/`. Only copy the files corresponding to selected targets — not everything.
+
+2. **Write `targets/targets.md`.** For each target, a brief entry:
+
+   - What it is and where its reference file lives in `targets/`
+   - Expected values / trends and how to judge if a reproduction matches
+   - Which decisions from the decision map feed into this result
+   - Whether reference code covers this computation (from `code-analysis.md` if present)
+   - Priority: `primary` or `secondary`
+
+   Keep entries brief — a few lines per target, not paragraphs.
+
+## Rules
+
+- All paths in `targets/targets.md` are relative to `targets/`.
+- For figures: describe scientific content, not just "a plot" — name the panels, the axis ranges, the qualitative shape.
+- For tables: note which specific values matter most.
+- For metrics: quote the exact value from the paper text (with the section / equation / sentence reference).
+
+## Survey signals (entry into EXTRACT_TARGETS)
+
+- `work/notes/methodology.md` exists ⇒ ready to extract targets
+- `targets/targets.md` exists and reference files have been copied ⇒ EXTRACT_TARGETS done
+
+## Notes
+
+- **Targets are coverage obligations, not the spec.** SPECIFY maps each target to its appropriate ASTRA home — outputs for artifacts, findings for claims, inputs / decisions / universe defaults for constants. EXTRACT_TARGETS' job is the ledger; SPECIFY's job is the structural placement.
+- **Out-of-scope targets stay in `targets.md`** with an explicit reason, not silently dropped. The constitution's scope is the source of truth for what's in.
diff --git a/claude/lightcone/skills/paper2astra/references/implement.md b/claude/lightcone/skills/paper2astra/references/implement.md
new file mode 100644
index 00000000..153c6c9d
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/implement.md
@@ -0,0 +1,57 @@
+# IMPLEMENT — write scripts and recipes
+
+Read `astra.yaml` (the spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end.
+
+The constitution's per-phase mode is **user choice** for this phase — defaults to sub-agent. Most implementation is mechanical (translate spec → script), but algorithm choices on tricky steps may want ratification.
+
+## Inputs
+
+- `astra.yaml` — the structural spec
+- `implementation-notes.md` — tricky algorithms, numerical gotchas, data-format quirks
+- `work/notes/methodology.md` — for context when the spec compresses
+- `work/reference/code/` (if present) — reference code; **read for ambiguity resolution, do not copy verbatim**
+
+## Outputs
+
+- `scripts/<output>.py` (or `.sh`, or whatever fits) — one script per output (or shared scripts for tightly-coupled outputs)
+- `requirements.txt` — Python dependencies
+- Recipes in `astra.yaml` — each output gets a `recipe:` block with `command:` and `inputs:`
+
+## Task
+
+Read `astra.yaml` and `implementation-notes.md`. Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml`.
+
+If `work/reference/code/` exists, **use it as a reference to resolve ambiguities** — but write clean scripts following ASTRA conventions, not verbatim copies of the reference code.
+
+## Data: REAL DATA ONLY
+
+**NEVER generate synthetic, mock, or fake data.** Every input dataset must be downloaded or queried from its real source (archive URL, database query, API, etc.). The methodology notes and `astra.yaml` inputs describe where each dataset comes from — write scripts that fetch the actual data.
+
+The only exception is if the paper itself uses synthetic / simulated data as its input (e.g., N-body simulations, Monte Carlo samples). In that case, reproduce the paper's data generation procedure exactly as described — but this is reproducing the paper's methodology, not substituting real data with fakes.
+
+If a dataset is behind a paywall, requires registration, or is "available upon request," write the download script with a clear error message explaining what the user needs to do manually. **Do NOT substitute synthetic data as a workaround.**
+
+## Rules
+
+1. **One script per output** (or a shared script for tightly-coupled outputs).
+2. **Parameterize by decisions.** Each decision is a CLI argument; scripts also receive `--universe <universe_id>`. See lightcone-cli's `CLAUDE.md` for the full convention.
+3. **Add recipes** to each output in `astra.yaml` with `command:` and `inputs:` (dependencies). Recipe inputs use the same `<analysis>.<output>` form the narrative skill's data-flow rules require.
+4. **Create `requirements.txt`** with needed packages. Do not install them — the RUN phase manages environments.
+5. **Do not execute scripts.** The RUN phase handles execution via `lc run` (formerly `prism run`).
+6. **Validate** with `astra validate astra.yaml` after adding recipes.
+
+## Retry attempts
+
+If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, the IMPLEMENT iteration is a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. The constitution carries the attempt budget (default 5); the iteration's first move is to check whether `attempt` in the report has reached the budget. If it has, surface to the user via `AskUserQuestion` ("verdict still failing after N attempts — continue, change scope, or accept partial?") rather than burning more cycles.
+
+## Survey signals (entry into IMPLEMENT)
+
+- `astra.yaml` validates and `implementation-notes.md` exists ⇒ ready to implement
+- `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ first-pass IMPLEMENT done
+- `comparison-report.yaml` returns `pass` ⇒ IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN
+
+## Notes
+
+- **`lc run` is the canonical execution surface.** Scripts assume they will be invoked via the lightcone-cli runner. Do not hard-code working directories or assume environment activation.
+- **Determinism where possible.** Set random seeds, fix library versions, prefer reproducible installations. The IMPLEMENT goal is not just "produces output once" but "reproducibly produces output across runs."
+- **Tight coupling earns shared scripts.** When two outputs come from the same expensive computation (e.g. an MCMC produces both a parameter chain and a summary statistic), one script with multiple output paths is cleaner than two scripts that each re-do the work.
diff --git a/claude/lightcone/skills/paper2astra/references/interview.md b/claude/lightcone/skills/paper2astra/references/interview.md
new file mode 100644
index 00000000..8a7ca8f0
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/interview.md
@@ -0,0 +1,160 @@
+# Interview — drafting the per-paper reproduction constitution
+
+The interview is the only phase paper2astra runs interactively. It happens once per project, up front, before any ralph loop is launched. Its job is to crystallize what the user actually wants — which paper, what scope, which seams want their attention, which they want delegated — and bake that into a constitution the ralph loop can drive.
+
+Use the [`/constitution`](../../constitution/SKILL.md) skill to draft. The interview's job is to *gather* the inputs the constitution needs; the constitution skill carries the discipline of writing it.
+
+---
+
+## What the interview produces
+
+A single markdown file at the project root — by convention `paper2astra-constitution.md` (or whatever name the user prefers). Its YAML frontmatter has `status: open`. Its body has the standard constitution sections: Desired State, Context, Skills, Evidence, Open Questions — populated for *this specific paper*.
+
+After the interview, paper2astra hands this file to ralph:
+
+```bash
+../ralph-loops/scripts/ralph paper2astra-constitution.md
+```
+
+The constitution is the durable artifact; the interview's work product *is* the constitution. There is no separate "interview state" file.
+
+---
+
+## The four jobs
+
+### 1. Identify the paper
+
+Use `AskUserQuestion` if the user did not supply enough on `/paper2astra` invocation:
+
+- **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).
+- **Code repo URL** if the user knows it. (If not, ACQUIRE will search.)
+- **User's prior familiarity.** Has the user reproduced this paper before? Read the paper recently? Worked with the original authors? This affects how much of the SUMMARIZE / EXTRACT_TARGETS work needs human ratification.
+- **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; SUMMARIZE will read it.
+
+### 2. Scope the reproduction
+
+A paper has many figures, tables, and numbers. The user usually does not want all of them.
+
+Ask:
+
+- **Full reproduction or targeted?** Full = every primary result the paper reports. Targeted = "I only care about figures 3, 4, 7 and the headline number in Table 2." Targeted is cheaper and produces a tighter astra.yaml.
+- **Specific decisions of interest.** A paper makes many choices. The user may care most about a few — e.g. "I want the BAO fit to use a different damping prior than the paper." These become first-class decisions in the spec, with the alternative preserved as a sibling option.
+- **Sub-analysis structure.** Does the paper have genuinely independent stages (e.g. reconstruction → clustering → BAO fit)? If so, the spec wants sub-analyses; SPECIFY will mirror the structure. If the paper is monolithic, one analysis suffices.
+
+These answers live in the constitution's **Desired State** section.
+
+### 3. Choose interactive vs sub-agent per phase
+
+Read the "Per-phase mode" table in `../SKILL.md`. The defaults are reasonable. Walk the user through it briefly:
+
+- **Phases that are always interactive (defaults you should not flip):** SPECIFY, COMPARE. These are the ratification seams; the user has to be reachable.
+- **Phases that are always sub-agent (defaults you should not flip):** SUMMARIZE, LITERATURE. These benefit from parallel fresh-context runs.
+- **Phases the user chooses:** ACQUIRE, PARSE, EXTRACT_TARGETS, REVIEW, IMPLEMENT, RUN. These default to sub-agent (mostly mechanical) but may want user attention if the paper is unfamiliar or the user has strong opinions about implementation.
+
+If the user has no opinion, take the defaults. The choice goes into the constitution's **Context** section as a per-phase mode table.
+
+### 4. Draft the constitution
+
+Invoke `/constitution`. Pass in:
+
+- The paper identity (DOI, arXiv ID, code URL)
+- The scope (full vs targeted, sub-analysis structure if known)
+- The per-phase mode table
+- Any prior context the user has shared
+
+The constitution skill carries the discipline of section voice (pointers, not snapshots; constitution, not plan; constraints with reasons). The constitution it produces will look approximately like:
+
+```markdown
+---
+status: open
+---
+
+# Reproduce <paper title> (<arXiv ID>)
+
+## Desired State
+
+A complete `astra.yaml` for <paper> at this workdir, with recipes that
+produce reproduced versions of <list of targets>, validated by
+`astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml`
+verdict `pass` against the targets in `targets/targets.md`.
+
+Non-goals: <e.g., reproducing Figure 12's MCMC stack: out of scope
+because the compute cost exceeds the available resources>.
+
+## Context
+
+- Paper DOI: <doi>
+- arXiv ID: <id>; LaTeX source is the primary acquisition path
+- Code repo: <url> (or "to be searched in ACQUIRE")
+- Workdir layout: standard Paper2ASTRA conventions —
+  `work/reference/`, `work/notes/`, `targets/`, `astra.yaml`,
+  `universes/`, `results/`
+- Per-phase mode:
+  | Phase | Mode |
+  |---|---|
+  | ACQUIRE | sub-agent |
+  | PARSE | sub-agent |
+  | SUMMARIZE | sub-agent |
+  | EXTRACT_TARGETS | <per user> |
+  | LITERATURE | sub-agent |
+  | SPECIFY | interactive |
+  | REVIEW | <per user> |
+  | IMPLEMENT | <per user> |
+  | RUN | <per user> |
+  | COMPARE | interactive |
+  | SUMMARIZE_RUN | sub-agent |
+
+## Skills
+
+- `/paper2astra` — this skill (the orchestrator)
+- `/managing-bibliography` — ACQUIRE
+- `/narrative` — SPECIFY
+- `/check-sentence-by-sentence`, `/figure-comparison` — COMPARE
+
+## Evidence
+
+- `ls work/reference/document.md` — ACQUIRE + PARSE done
+- `ls work/notes/methodology.md` — SUMMARIZE done
+- `ls targets/targets.md` — EXTRACT_TARGETS done
+- `ls astra.yaml && astra validate astra.yaml` — SPECIFY done and valid
+- `astra validate astra.yaml --verify-evidence` — evidence quotes match source PDFs
+- `ls comparison-report.yaml && yq '.verdict' comparison-report.yaml` — most-recent COMPARE verdict
+- `git log --oneline` — chronological view of phase commits
+
+The COMPARE → IMPLEMENT loop iterates until verdict is `pass` or
+attempt budget (default 5) is exhausted.
+
+## Open Questions
+
+(empty — populated as the loop runs and surfaces material conflicts
+the user must ratify)
+```
+
+Show the draft, take corrections, refine. When the user is happy:
+
+- Save the constitution at the project root
+- Tell the user how to launch the loop: `../ralph-loops/scripts/ralph paper2astra-constitution.md`
+- Optionally launch it for them if they say yes
+
+The interview ends here. Subsequent work happens inside ralph iterations.
+
+---
+
+## Discipline
+
+- **The interview is short.** Do not turn it into a full paper-summarization session. The user does not need to teach you the paper — they need to tell you what they want reproduced. Three to five `AskUserQuestion` rounds, total. If the user is grinding through detail, gently steer back to scope.
+- **The constitution is the work product.** Do not file separate "interview notes" or "scope document" files. Everything goes into the constitution.
+- **The defaults are the path.** When the user says "I don't know, you choose," take the defaults from the per-phase mode table. The defaults reflect what the loops have learned about which seams matter.
+- **One paper at a time.** A single constitution covers one paper. If the user wants two, run the interview twice — two constitutions, two ralph loops, two project workdirs.
+
+---
+
+## When the interview gets stuck
+
+Most failure modes resolve into "the user has not yet decided what 'reproduce' means for them." If the conversation is circling, ask one of these directly:
+
+- *"If we ran this and it produced figure 3 plus the headline number in Table 2, would you be done?"* — pins targeted vs full.
+- *"Is there a specific decision in the paper you want to vary, or are we trying to match the paper exactly?"* — pins whether universes need to span alternatives.
+- *"Do you want to look at every paper-vs-code conflict, or just the ones I think are material?"* — pins SPECIFY mode.
+
+When all three answer cleanly, the constitution writes itself.
diff --git a/claude/lightcone/skills/paper2astra/references/literature.md b/claude/lightcone/skills/paper2astra/references/literature.md
new file mode 100644
index 00000000..07c77814
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/literature.md
@@ -0,0 +1,163 @@
+# LITERATURE — extract prior insights from cited papers
+
+For each cited paper that informed a methodological decision, extract evidence-quote-backed insights and link them to the relevant decisions and options. Synthesize across papers into `work/notes/literature.yaml`, which SPECIFY consumes when authoring `astra.yaml`'s `prior_insights` block.
+
+The constitution's per-phase mode is **always sub-agent** for this phase. Spawn one Task-tool sub-agent per cited paper for parallel extraction; spawn a final sub-agent for synthesis. This is pure parallel grunt-work.
+
+## Inputs
+
+- `work/notes/cited_papers.yaml` — the list of papers to mine, from SUMMARIZE
+- `work/notes/methodology.md` — has the decision map; each per-paper sub-agent gets it as context
+- `work/reference/document.md` — the target paper (for reference)
+
+## Outputs
+
+- `work/notes/literature/<doi-slug>.yaml` — one file per cited paper (per-paper extraction)
+- `work/notes/literature.yaml` — synthesized merged view (final output)
+
+## Per-paper extraction sub-agent — system prompt
+
+> You are an ASTRA insight extraction agent with self-validation capability. Your task is to extract scientific insights from a single cited paper that bear on specific methodological decisions already identified in the target paper.
+>
+> ### Instructions
+>
+> 1. Read the PDF at the path provided below using the Read tool.
+> 2. Review the decision map provided below — these are the specific decisions you are looking for evidence about.
+> 3. Scan the cited paper for findings that support, contradict, or compare the options listed in those decisions. Focus on:
+>    - Empirical comparisons between approaches listed as decision options
+>    - Performance benchmarks or validation results relevant to the choices
+>    - Recommendations or caveats about specific methods/parameters
+> 4. For each relevant finding, extract:
+>    - A clear claim (1–2 sentences stating what we learned)
+>    - An exact quote from the paper (verbatim, 1–3 sentences)
+>    - The page number where the quote appears
+>    - Prefix and suffix context — REAL surrounding text from the page (~20–100 chars each), used to disambiguate the quote among similar passages. This follows the W3C TextQuoteSelector convention: prefix and suffix are literal substrings of the source page, NOT editorial parentheticals. Wording like "(Section 3.1 of Foo+19)" or "(see Figure 4)" will fail verification because the validator concatenates `prefix + quote + suffix` and matches against actual page text.
+> 5. Cache the paper so spec-level verification can find it (see below).
+> 6. Write the extracted insights as YAML to the specified output file.
+>
+> ### Caching the source PDF
+>
+> Before extraction completes, register each paper with the validator's PDF cache so downstream evidence verification can find it:
+>
+> ```bash
+> astra paper add "<DOI>"
+> ```
+>
+> For arXiv DOIs (`10.48550/arXiv.<id>`) this fetches directly. Journal DOIs that 403 on Unpaywall can be aliased to a locally-downloaded arXiv preprint:
+>
+> ```bash
+> astra paper add "<JOURNAL_DOI>" --pdf <path-to-arxiv-pdf>
+> ```
+>
+> ### Quote fidelity rules
+>
+> Quotes are NOT verified during this per-paper extraction phase — verification is spec-level (`astra validate astra.yaml --verify-evidence`) and runs once SPECIFY has authored `astra.yaml` referencing each paper. Your job here is to extract quotes that will pass that verification cleanly. The checks are:
+>
+> - Each `exact` quote must be present on the cited page, fuzzy-matched at RapidFuzz `partial_ratio` ≥ 70. Copy verbatim from the PDF; do not paraphrase, normalize whitespace, or strip mathematical typesetting.
+> - The validator concatenates `prefix + quote + suffix` and matches that against the page text at a context score ≥ 80. Choose prefix/suffix as REAL surrounding page text (W3C TextQuoteSelector convention), not editorial commentary. Wording like "(Section 3.1 of Foo+19)" or "(see Figure 4)" silently lowers the context score below threshold even when the quote itself is in the PDF.
+> - Avoid YAML `|` block-literal style for `exact`, `prefix`, and `suffix` values: the newlines it preserves can break the context-score concatenation. Single-line strings or `>` folded-block style are safer.
+> - Math-formula quotes (with superscripts, subscripts, inline footnote markers) are likely to fail because the PDF text extractor collapses these. Quote the surrounding English narrative instead, or skip that piece of evidence if a sibling quote already establishes the finding.
+>
+> The verification cache is keyed by `(doi, version, sha256(quote_text))` plus `pdf_sha256`, so any edit to a quote in the eventual YAML automatically invalidates that entry — there is no need to delete the cache between runs.
+>
+> ### Quote granularity and finding attribution
+>
+> - **Quotes carry the claim on their own.** A four-word fragment ("two widely used fitting codes", "the actual quantity being fit") satisfies fuzzy-match but fails the reader: lift the quote out of context and the claim it supports must still stand. The validator is happy with any string that fuzzy-matches; a downstream agent or human reader following the evidence pointer needs to learn what the paper actually said. Default to full sentences with TeX-anchored prefix/suffix; split a long passage into two evidence rows rather than truncate a quote into a fragment that depends on context. Fragments creep in at exactly the spots where inline math forces shrinking, which is also where claims hide.
+> - **Cross-section methodology gets separate insights.** When a paper's relevant methodology is split across multiple sections (a methods chapter defining a tool, a results chapter setting a threshold, an application chapter running it), file one insight per piece, each citing the section where that piece is *defined*. Do not collapse all the borrowed pieces into the application section's number: when that happens, the application section gets all the credit and the methodology section disappears, which is a real fidelity-sweep failure mode.
+>
+> ### Output format
+>
+> Write ONLY this YAML structure to the output file. No other text.
+>
+> ```yaml
+> insights:
+>   <insight_id>:
+>     id: <insight_id>
+>     claim: "<What we learned from this finding>"
+>     created_at: "<ISO 8601 timestamp>"
+>     evidence:
+>       - id: ev1
+>         doi: "<DOI>"
+>         quote:
+>           type: TextQuoteSelector
+>           exact: "<exact quote from paper, verbatim>"
+>           prefix: "<~20-100 chars of REAL surrounding text BEFORE the quote>"
+>           suffix: "<~20-100 chars of REAL surrounding text AFTER the quote>"
+>         location:
+>           type: FragmentSelector
+>           page: <page number>
+>     scope: "<when this applies -- optional>"
+>
+> decision_links:
+>   <decision_id>:
+>     <option_id>:
+>       - <insight_id>
+> ```
+>
+> ### Rules
+>
+> - Use `lowercase_with_underscores` for insight IDs.
+> - Quotes must be EXACT — copy verbatim from the PDF, no paraphrasing or whitespace normalization.
+> - Prefix and suffix must be real surrounding page text, not editorial parentheticals.
+> - One claim per insight — do not combine multiple findings.
+> - Only extract insights relevant to the target decisions listed below.
+> - If no relevant insights found, write `insights: {}` and `decision_links: {}`.
+> - `prefix` and `suffix` are REQUIRED for every TextQuoteSelector.
+
+## Synthesis sub-agent — system prompt
+
+> You are a literature synthesis agent. Read all per-paper extraction YAML files in `work/notes/literature/` and merge them into a single `work/notes/literature.yaml` that consolidates insights from all cited papers.
+>
+> ### Task
+>
+> 1. Read all per-paper YAML files in `work/notes/literature/`.
+> 2. Merge insights, de-duplicating where multiple papers support the same claim.
+> 3. Merge decision links across all papers.
+> 4. Write the consolidated output to `work/notes/literature.yaml`.
+>
+> ### Output format
+>
+> ```yaml
+> prior_insights:
+>   <insight_id>:
+>     id: <insight_id>
+>     claim: "<What the literature says>"
+>     evidence:
+>       - id: ev1
+>         doi: "<DOI of source paper>"
+>         quote:
+>           type: TextQuoteSelector
+>           exact: "<Exact quote from paper>"
+>           prefix: "<~20-100 chars before>"
+>           suffix: "<~20-100 chars after>"
+>         location:
+>           type: FragmentSelector
+>           page: <page number>
+>     scope: "<When this applies -- optional>"
+>
+> decision_links:
+>   <decision_id>:
+>     <option_id>: [insight_id1, insight_id2]
+> ```
+>
+> ### Rules
+>
+> - Preserve all evidence exactly as extracted (do not rewrite quotes, prefixes, or suffixes).
+> - When two papers support the same claim, merge their evidence lists under a single insight entry.
+> - When papers support different but related claims, keep them as separate insights.
+> - `decision_links` should map decision IDs to option IDs to lists of insight IDs. Merge across all papers so each decision collects all relevant insights.
+> - Use consistent insight IDs (`lowercase_with_underscores`).
+> - Drop any insight that carries no evidence quotes. Quote verification itself runs later, at spec level.
+> - If no papers produced insights, write `prior_insights: {}` and `decision_links: {}`.
+
+## Survey signals (entry into LITERATURE)
+
+- `work/notes/cited_papers.yaml` exists ⇒ ready to extract
+- `work/notes/literature/` directory has one YAML per paper in `cited_papers.yaml` ⇒ extraction done
+- `work/notes/literature.yaml` exists ⇒ synthesis done; LITERATURE complete
+
+## Notes
+
+- **Run per-paper extractions in parallel.** One sub-agent per entry in `cited_papers.yaml`. They are fully independent.
+- **Synthesis is a single sub-agent.** It reads everything in `work/notes/literature/` and writes one merged `literature.yaml`.
+- **Resume is automatic.** If `work/notes/literature/<doi-slug>.yaml` already exists, skip the per-paper extraction for that paper. The synthesis re-runs whenever new per-paper files appear.
diff --git a/claude/lightcone/skills/paper2astra/references/parse.md b/claude/lightcone/skills/paper2astra/references/parse.md
new file mode 100644
index 00000000..b14c2d78
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/parse.md
@@ -0,0 +1,79 @@
+# PARSE — structure the paper
+
+Turn the acquired paper into structured artifacts the rest of the pipeline can consume: markdown text, individual figures, individual tables, and a metadata index. This is mostly a deterministic pre-processing step.
+
+The constitution's per-phase mode controls interactive vs sub-agent. Default is sub-agent.
+
+## Inputs
+
+- `work/reference/source/` — arXiv LaTeX source tree (Path A from ACQUIRE), or
+- `work/reference/paper.pdf` — PDF (Path B fallback)
+
+## Outputs
+
+- `work/reference/document.md` — paper as markdown
+- `work/reference/figures/` — extracted figures (PNG / PDF / vector)
+- `work/reference/tables/` — extracted tables (CSV when machine-readable, MD otherwise)
+- `work/reference/metadata.json` — index of figures and tables with captions and page numbers
+
+## Path A — arXiv LaTeX source (when `work/reference/source/` exists)
+
+The LaTeX source is already structured — sections are `\section{}`, equations are TeX, figures cite their files by name, tables are `tabular` environments. Convert to markdown while preserving equation TeX:
+
+```bash
+# Find the main file (usually has \documentclass at the top)
+grep -l '\\documentclass' work/reference/source/*.tex
+
+# Convert with pandoc, preserving math and structure
+pandoc -f latex -t markdown -o work/reference/document.md work/reference/source/<main>.tex
+```
+
+Adjust pandoc invocation if the main file uses `\input{}` heavily — pandoc resolves them when run from the right cwd. Verify the output by reading the first ~200 lines and checking the section structure looks sensible.
+
+Extract figure files from the source tree into `work/reference/figures/`:
+
+```bash
+mkdir -p work/reference/figures
+# Copy referenced figure files; common extensions are .pdf .png .eps .jpg
+find work/reference/source -type f \( -name "*.pdf" -o -name "*.png" -o -name "*.eps" -o -name "*.jpg" \) \
+    -not -path "*/aux/*" -exec cp {} work/reference/figures/ \;
+```
+
+For tables, the LaTeX `tabular` blocks remain as TeX inside the rendered markdown. If a downstream phase needs them as CSV, extract them on demand.
+
+Build `work/reference/metadata.json` — index of figures and tables. The structure:
+
+```json
+{
+  "figures": [
+    {"id": "fig1", "caption": "...", "file": "figures/fig1.pdf", "label": "fig:bao"}
+  ],
+  "tables": [
+    {"id": "tab1", "caption": "...", "file": "tables/tab1.csv", "label": "tab:results"}
+  ]
+}
+```
+
+The `label` field is the LaTeX `\label{}` so SPECIFY's anchor work and EXTRACT_TARGETS' selection can both reference the same artifact.
+
+## Path B — PDF fallback (when `work/reference/source/` does not exist)
+
+Use Docling — the lightcone-cli stack ships its CLI:
+
+```bash
+# Run Docling against the PDF; outputs into work/reference/
+docling --output work/reference work/reference/paper.pdf
+```
+
+Docling produces `document.md`, `figures/`, `tables/`, and `metadata.json` with the same shape Path A produces.
+
+If Docling fails, the PDF may be corrupt — re-run ACQUIRE's download step before giving up.
+
+## Survey signals (entry into PARSE)
+
+If `work/reference/document.md` exists and `work/reference/metadata.json` exists, PARSE is done — proceed to SUMMARIZE.
+
+## Notes
+
+- **Path A is preferred whenever arXiv source was acquired.** PDF + Docling is the fallback for non-arXiv papers, not the default. The bundle's design philosophy is that math, ligatures, and caption fidelity are easier from LaTeX source than from re-extracted PDF text.
+- **Equation numbers and section numbers must match the rendered paper.** Whether you use Path A or Path B, downstream phases (SPECIFY's evidence quotes, COMPARE's references) cite "eq. N" or "§N" by the printed number. Verify by spot-checking against the PDF.
diff --git a/claude/lightcone/skills/paper2astra/references/review.md b/claude/lightcone/skills/paper2astra/references/review.md
new file mode 100644
index 00000000..13363378
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/review.md
@@ -0,0 +1,79 @@
+# REVIEW — pre-implementation sanity check
+
+Verify that the ASTRA specification is complete, consistent, and ready for the IMPLEMENT phase. REVIEW edits the spec in place when fixes are obvious; it surfaces gaps to the user (or as Open Questions) when judgment is required.
+
+The constitution's per-phase mode is **user choice** for this phase — defaults to sub-agent. REVIEW is mostly mechanical (cross-reference, validation), so sub-agent suits it; but a paper that hits the SPECIFY conflict-surfacing path heavily may want REVIEW interactive too.
+
+## Inputs
+
+- `astra.yaml` — the spec from SPECIFY
+- `universes/baseline.yaml`
+- `implementation-notes.md`
+- `work/notes/methodology.md`
+- `targets/targets.md`
+- `work/reference/document.md` (Grep into; do not re-read whole)
+- `work/notes/literature.yaml` (if present) — for evidence verification
+
+## Outputs
+
+- In-place edits to `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md` as needed
+- No new files unless a missing data-acquisition path needs to be flagged with content
+
+## Checks
+
+1. **Target coverage.** Every replication target from `targets/targets.md` must appear as an output (or finding, or input/decision/universe default) in `astra.yaml`. Any missing target either gets added or earns an explicit out-of-scope reason in `targets.md`.
+
+2. **Output definitions.** Each output has a clear `type` and sufficient description.
+
+3. **Methodology detail.** Cross-check `work/notes/methodology.md` against the spec for gaps: missing hyperparameters, underspecified algorithms, vague data-processing steps. Re-read targeted sections of the paper to fill them in. Use Grep on `work/reference/document.md` rather than re-reading the whole thing.
+
+4. **Decisions.** Decisions should cover what actually affects reproducibility. Remove cosmetic choices; add anything material that is missing. Ensure `universes/baseline.yaml` stays consistent.
+
+5. **Data obtainability.** Every data source needs a concrete path (URL, package name, or generation code). Flag anything vague or "available upon request."
+
+6. **Data acquisition.** Every input in `astra.yaml` must have a concrete acquisition path — a download URL, database query, API call, or package name. Verify that `methodology.md` documents how to obtain each dataset. Flag any dataset that is vague so IMPLEMENT knows what to handle.
+
+7. **Implementation notes.** Check `implementation-notes.md` for completeness — does it flag the tricky parts? Add anything IMPLEMENT should know.
+
+8. **Evidence verification.** If `work/notes/literature.yaml` exists, run:
+   ```bash
+   astra validate astra.yaml --verify-evidence
+   ```
+   This verifies that all prior-insight quotes match the source PDFs. Flag any misquotes or unsupported claims; these typically arise when a quote was paraphrased or when prefix/suffix carry editorial commentary instead of real surrounding text.
+
+## Fixes
+
+Edit files directly. After any change to `astra.yaml`, run:
+
+```bash
+astra validate astra.yaml
+```
+
+## CRITICAL: No synthetic data
+
+Unless the paper itself uses synthetic / simulated data as input, the pipeline must use **real data only**. Check that:
+
+- Every `astra.yaml` input has a real acquisition source (URL, query, etc.)
+- `implementation-notes.md` does NOT suggest generating mock / synthetic data
+- The methodology notes describe real data sources with concrete download paths
+
+If any input lacks a concrete acquisition path, add one by searching the paper for URLs, DOIs, or archive references. If the data truly cannot be obtained programmatically, document this clearly in `implementation-notes.md` so IMPLEMENT writes a script that fails with a helpful message rather than silently substituting fake data.
+
+## Rules
+
+- Use Grep to search `work/reference/document.md` for specific claims to verify — do not read the entire markdown at once. Work primarily from notes and the spec.
+- **Minimize churn** — don't restructure or rename unnecessarily.
+- If everything looks good, say so briefly; don't invent problems.
+- Do **NOT** add implementation recipes — that is IMPLEMENT's job.
+
+## Survey signals (entry into REVIEW)
+
+- `astra.yaml` exists and validates ⇒ ready to review
+- `astra validate astra.yaml --verify-evidence` returns clean (when literature.yaml exists) ⇒ evidence side done
+- All `targets/targets.md` entries map to spec homes (output / finding / input / decision / universe default) ⇒ coverage side done
+- Evidence side and coverage side both done ⇒ REVIEW complete; proceed to IMPLEMENT
+
+## Notes
+
+- **REVIEW does not write code.** Its outputs are edits to the spec and additions to `implementation-notes.md`, not new scripts.
+- **A clean REVIEW reduces IMPLEMENT thrash.** It is worth running even when the spec looks fine after SPECIFY — the cross-check catches "looks fine in isolation, breaks under full coverage" gaps.
diff --git a/claude/lightcone/skills/paper2astra/references/run.md b/claude/lightcone/skills/paper2astra/references/run.md
new file mode 100644
index 00000000..7f3240ef
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/run.md
@@ -0,0 +1,56 @@
+# RUN — execute the recipes
+
+Materialize every output in `astra.yaml` for the requested universe. RUN is mostly mechanical — `lc run --universe <id>` does the heavy lifting. The phase exists as a discrete step so failures get diagnosed and re-run before COMPARE.
+
+The constitution's per-phase mode is **user choice** — defaults to sub-agent. Failures may want diagnosis support; the user chooses based on how much trust they have in IMPLEMENT's first pass.
+
+## Inputs
+
+- `astra.yaml` with recipes (from IMPLEMENT)
+- `universes/<universe_id>.yaml` — defaults to `baseline`
+
+## Outputs
+
+- `results/<universe_id>/<output_id>/` for every output declared in `astra.yaml`
+
+## Task
+
+Execute all recipes:
+
+```bash
+lc run --universe baseline
+```
+
+(Use whatever the constitution's `universe` field says; `baseline` is the default.)
+
+Check status:
+
+```bash
+lc status --universe baseline
+```
+
+Status states are `ok` (materialized), `pending` (has recipe, not run), `no_recipe` (declared, no recipe — bug). Every output declared in `astra.yaml` must reach `ok`.
+
+If outputs fail:
+
+1. **Read the script's error.** `results/<universe_id>/<output_id>/.log` (or wherever the runner emits stderr) usually has the message.
+2. **Diagnose.** Common failures: missing data dependency (a referenced URL changed; the data archive moved), missing Python package (`requirements.txt` was incomplete), spec / script mismatch (the recipe's `inputs:` does not match what the script reads).
+3. **Fix.** Edit the script or `requirements.txt` or the spec, whichever applies.
+4. **Re-run.** `lc run --universe baseline` resumes from where things failed; it does not re-execute already-materialized outputs.
+5. **Repeat** until all outputs are `ok`.
+
+## Rules
+
+- **Always use `lc run`** — do not run scripts directly. The runner manages dependencies, environments, and artifact paths; bypassing it produces inconsistent results.
+- **Re-runs are idempotent.** `lc run` skips outputs that are already materialized. To force re-execution, look up the runner's flag for it in `lc run --help`.
+- **Failures stay failures until fixed.** Do not "move on" past a failed output by editing it out of `astra.yaml`. Either fix the script or surface the failure as a constitution Open Question and stop.
+
+## Survey signals (entry into RUN)
+
+- `astra.yaml` has recipes and validates ⇒ ready to run
+- `lc status --universe baseline` returns all `ok` ⇒ RUN done; proceed to COMPARE
+
+## Notes
+
+- The runner backend (Docker / local / SLURM) comes from the project's target configuration — `~/.lightcone/config.yaml` and `.lightcone/lightcone.yaml`. RUN does not need to choose; the runner picks based on config.
+- For long-running computations, the script's stdout / stderr stream into the result directory's log file. The phase agent should `tail` the log file to monitor progress, not poll `lc status` repeatedly.
diff --git a/claude/lightcone/skills/paper2astra/references/specify.md b/claude/lightcone/skills/paper2astra/references/specify.md
new file mode 100644
index 00000000..655cd575
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/specify.md
@@ -0,0 +1,105 @@
+# SPECIFY — author the ASTRA spec
+
+Read the paper and accumulated notes; produce the structured ASTRA spec, the baseline universe, and the implementation notes. SPECIFY is the **first mandatory user-ratification seam** — material paper-vs-code conflicts surface here and require user input.
+
+The constitution's per-phase mode is **always interactive** for this phase. The user must be reachable.
+
+## Inputs
+
+- `work/notes/methodology.md` — decision map, results inventory, data sources
+- `work/notes/code-analysis.md` (if present) — code structure, parameter values
+- `work/notes/literature.yaml` (if present) — prior insights with evidence quotes and decision links
+- `work/reference/document.md` — paper text (Grep into; do not re-read whole)
+- `work/reference/figures/`, `work/reference/tables/` — extracted artifacts
+- `work/reference/metadata.json` — figure / table index
+- `targets/targets.md` — selected replication targets
+- `work/notes/notes.md` — user-supplied context (read by every phase if present)
+
+## Outputs
+
+1. **`astra.yaml`** — the full ASTRA specification
+2. **`universes/baseline.yaml`** — exactly the paper's choices (where paper and code disagree, see "Material conflicts" below)
+3. **`implementation-notes.md`** — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
+
+## Substrate skills to invoke
+
+- **`/narrative`** — narrative authoring (any of the five `narrative.{summary,inputs,methods,findings,outputs}` keys, plus decision `rationale:` fields) is owned by the narrative skill. Invoke it when authoring the prose. The narrative skill teaches reserved entity names, the tree-path anchor grammar, the conditional-narrative requirement (which keys are required when), the five-key authoring order, paper-reproduction fidelity discipline, and the new downstream-consumer discipline (lightcone-cli#108). Do not duplicate that content.
+
+Your responsibility in this phase is the **structure**: build a spec whose entities are narrative-ready (human-readable labels, no ID collisions with reserved names, sub-analysis IDs as noun phrases) so `/narrative` can author cleanly downstream.
+
+## Decisions
+
+The notes identify many candidate decisions. Include every choice where a different defensible option could plausibly shift a numerical result — algorithmic methods, thresholds, statistical approaches, data selection criteria, calibration choices.
+
+Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for the full definition of what counts. **Only exclude pure tooling choices** (language, library, file format) and fixed constraints. Use `when`, `incompatible_with`, and `requires` constraints for non-independent decisions. A typical analysis has 8–20 decisions; if you have fewer than 5, revisit `methodology.md` and reconsider what you excluded.
+
+## Prior insights from literature
+
+If `work/notes/literature.yaml` exists, incorporate its `prior_insights` into `astra.yaml`. Use the `decision_links` mapping to attach each insight to the relevant decision options, so the multiverse captures evidence-backed alternative choices from the literature.
+
+## Target coverage
+
+Targets are coverage obligations, not necessarily outputs. Map each target to the right ASTRA home:
+
+- **Figures, tables, equations-as-artifacts, generated data products** → `outputs`
+- **Paper-level claims and quantitative results** → `findings` with source-anchored evidence
+- **Constants and configuration values** → `inputs`, `decisions`, `universes/baseline.yaml`
+
+Out-of-scope targets stay in `targets/targets.md` with an explicit reason and should not be forced into the spec. Keep the target ledger's "spec home" pointers specific enough that a later reviewer can tell which claim was discharged where.
+
+---
+
+## Material conflicts — the user-ratification seam
+
+When `methodology.md` or `code-analysis.md` mentions a paper-vs-code disagreement, **classify it before writing**:
+
+- **Material**: a different choice would plausibly change a numeric result the paper reports.
+- **Stylistic / cosmetic / pure-tooling**: not material — record in `implementation-notes.md` and move on.
+
+For **material** conflicts, the SPECIFY phase pauses and surfaces the conflict to the user via `AskUserQuestion`. Present:
+
+- The paper's stated method (with quote / section reference)
+- The code's actual method (with file / line reference)
+- The plausible impact ("changes the BAO peak amplitude by ~5%")
+- Three options: paper, code, *something else* (custom, with the user's choice spelled out)
+
+**Default on user silence is paper.** If the AskUserQuestion times out or the user declines to choose, the universe selects the paper's method. The override (paper-vs-code conflict, what was selected, why) is preserved in `astra.yaml` as:
+
+- A `decisions:` entry with both options preserved
+- The `universes/baseline.yaml` selecting whichever option the user chose (the paper's option on user silence)
+- A finding (or an insight if the conflict matters for replication discipline broadly) that records the conflict with quote / line evidence
+
+This makes the override surface in any later review of the spec — *"the paper says X, the code does Y, the user chose Z, here's why."* The fidelity-of-prose side of this (voice seams, hedge preservation, evidence-quote verification) is the `/narrative` skill's job.
+
+---
+
+## Sub-analysis structure
+
+Split into sub-analyses **only if the paper has genuinely independent analysis stages**. Examples:
+
+- A reconstruction stage that produces a catalog consumed by a clustering stage which produces inputs to a BAO fit — three sub-analyses.
+- A monolithic analysis that runs end-to-end with no clean intermediate handoff — one analysis.
+
+Sub-analysis IDs should be **noun phrases** (not verb phrases): `reconstruction`, `clustering`, `bao_fit`. Avoid reserved names (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
+
+When sub-analyses exist, the root narrative MUST include a top-down end-to-end data-flow paragraph (per the narrative skill's data-flow rules — closes lightcone-cli#108).
+
+## Other rules
+
+- **Do NOT add executable implementation code or invented run commands.** Do add concise provenance / recipe descriptions where ASTRA fields support them, especially for paper-derived calculations, figure generation, imported constants, and values that IMPLEMENT will need to regenerate.
+- **Equation and section numbers must match the rendered paper / PDF**, not a naïve count of TeX blocks or markdown headings. When citing "eq. N" or "§N", find the equation or heading by content in the rendered paper and use the printed number.
+- **When adding finding evidence**, verify the quoted text against the paper source by Grep or PDF search. `astra validate --verify-evidence` currently verifies `prior_insights` evidence; artifact-anchored `findings` evidence still needs a manual quote check.
+- **Validate** with `astra validate astra.yaml` and fix until it passes.
+- **Work primarily from `work/notes/`** — SUMMARIZE has already distilled the paper. Use `work/reference/document.md` only to look up specific details (Grep for terms, or read targeted sections with offset/limit). Do not read the entire markdown at once.
+
+## Survey signals (entry into SPECIFY)
+
+- `work/notes/methodology.md` exists; `targets/targets.md` exists ⇒ ready to specify
+- `astra.yaml` exists; `astra validate astra.yaml` returns clean ⇒ structural SPECIFY done
+- `implementation-notes.md` exists ⇒ practical-guidance side done
+- Structural side and practical-guidance side both done ⇒ SPECIFY complete; proceed to REVIEW
+
+## Notes
+
+- **Material conflicts that the user explicitly defers** become `Open Questions` in the constitution. The next iteration sees them and either re-surfaces them or notes their continued deferral.
+- **The narrative skill is the prose author, not the structure author.** SPECIFY's job is structural correctness; `/narrative` invocation comes after the structural skeleton exists.
diff --git a/claude/lightcone/skills/paper2astra/references/summarize.md b/claude/lightcone/skills/paper2astra/references/summarize.md
new file mode 100644
index 00000000..6df42b52
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/summarize.md
@@ -0,0 +1,120 @@
+# SUMMARIZE — extract methodology, decisions, and results inventory
+
+Read the parsed paper and (in parallel, when present) the reference code, and extract everything the SPECIFY phase will need to author `astra.yaml`. The substance lives in `work/notes/methodology.md`, `work/notes/cited_papers.yaml`, and (when code exists) `work/notes/code-analysis.md`.
+
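+The constitution's per-phase mode is **always sub-agent** for this phase. Spawn one Task-tool sub-agent for the paper analysis and (in parallel) a separate sub-agent for the code analysis if `work/reference/code/` exists. Each sub-agent gets fresh context and writes only its own outputs: the paper sub-agent writes `work/notes/methodology.md` and `work/notes/cited_papers.yaml`; the code sub-agent writes `work/notes/code-analysis.md`.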
+
+## Inputs
+
+- `work/reference/document.md` — paper as markdown (from PARSE)
+- `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json`
+- `work/reference/code/` — code repo, if cloned
+
+## Outputs
+
+- `work/notes/methodology.md` — decision map + results inventory + data sources
+- `work/notes/cited_papers.yaml` — papers worth following up on for prior insights
+- `work/notes/code-analysis.md` — code structure (only when `work/reference/code/` exists)
+
+---
+
+## Paper sub-agent — system prompt
+
+> You are a research paper analysis agent. Your job is to read a parsed paper and extract everything needed to reproduce the analysis.
+>
+> ### Approach
+>
+> Read `work/reference/document.md` **section by section** — do not try to read the entire file at once. Start by scanning the headers to understand the structure, then work through each section in order.
+>
+> **Write as you go.** After reading each section, immediately update `work/notes/methodology.md` and `work/notes/cited_papers.yaml` with what you learned. Do not wait until the end — build the outputs incrementally. This ensures partial progress is saved and forces you to consolidate your understanding at each step.
+>
+> Skip acknowledgments and author affiliations. Do read the references section — you will need it to resolve citations to DOIs.
+>
+> ### What to extract
+>
+> As you read each section, look for:
+>
+> - **Data sources** — every external dataset, catalog, survey, or archive the paper uses as input. For each one, record the exact name/version, where to obtain it (URL, database query, package name), and any selection criteria or quality cuts applied. This is critical — the implement phase must download real data, not generate synthetic substitutes.
+> - **Decisions** — every choice that shaped the analysis (methods, parameters, data cuts, calibrations, etc.) and *what informed each one* (a cited paper, a physical argument, an empirical finding, internal results from the paper).
+> - **Results** — numeric values, figures, tables; which are the paper's core claims vs. supporting/diagnostic outputs.
+> - **Key references** — cited papers that actually influenced methodology (not general background).
+>
+> ### Output format — `work/notes/methodology.md`
+>
+> #### Decision map (most important)
+>
+> A complete list of every decision that shaped the analysis, grouped by pipeline stage. For each decision:
+>
+> - **What** was chosen (the specific value, method, or approach)
+> - **Why** — what informed the choice: cite the specific paper, physical argument, or empirical finding. Use the citation as it appears in the text (e.g., "Freedman et al. 2020"). This is critical — decisions without traced justifications are much harder to reproduce.
+> - **Alternatives** — what else could have been chosen, if mentioned
+>
+> #### Results inventory
+>
+> List the paper's outputs, separated into:
+>
+> - **Primary results** — the core claims; what you'd check to evaluate whether the work was reproduced. Flag which are most important.
+> - **Secondary results** — supporting/diagnostic outputs.
+>
+> For each result, note which decisions feed into it and the expected values.
+>
+> #### Data sources (critical)
+>
+> For **every** external dataset the paper uses, document:
+>
+> - **Name and version** (e.g., "OGLE-III SMC LPV catalog, Soszynski+2011")
+> - **How to obtain it** — exact URL, database query (with SQL if applicable), API endpoint, or package name. Be as specific as possible.
+> - **Selection criteria** — any spatial, magnitude, quality, or flag cuts applied to the raw data.
+> - **Format** — what columns/fields are used downstream.
+>
+> This section is essential. The implement phase will use it to write data download scripts. If acquisition details are vague in the paper, flag this explicitly so the review phase can investigate further.
+>
+> #### Additional context (brief)
+>
+> - Software and dependencies — languages, libraries, versions mentioned.
+>
+> ### Output format — `work/notes/cited_papers.yaml`
+>
+> ```yaml
+> papers:
+>   - doi: "10.xxxx/yyyy"
+>     citation: "Smith et al. (2020)"
+>     relevance: "One-line description of why this paper matters for replication"
+> ```
+>
+> **Include** papers that: informed a methodological decision, provided a method or algorithm the paper builds on, contain calibration data or corrections the paper applies.
+>
+> **Exclude** papers cited only for general background or final-result comparisons.
+>
+> Only include papers whose DOI you can find in the references. Aim for 5–15 papers; quality over quantity.
+>
+> ### Style
+>
+> Be concise but precise. Use bullet points. Include exact numeric values and parameter choices. Do not pad with background or motivation — only include what is needed to reproduce the analysis.
+
+## Code sub-agent — system prompt (only when `work/reference/code/` exists)
+
+> You are a code exploration agent. Explore the repository at `work/reference/code/` and write up a detailed understanding of the codebase to `work/notes/code-analysis.md`.
+>
+> ### What to produce
+>
+> 1. **Architecture** — how the codebase is structured, what the main modules / scripts are, and how they relate to each other.
+> 2. **Execution flow** — where things are run from, in what order, and where to look for different stages of the analysis.
+> 3. **Key variables and parameters** — the main variables defined in the code, configuration values, and any decisions baked into the implementation.
+> 4. **Outputs** — what the code produces, where results are written, what format they take.
+>
+> Be thorough — explore the file tree, read the main scripts, and trace the execution path. Focus on implementation decisions and parameter values that the paper might not mention.
+>
+> Do NOT modify any code in the repository.
+
+## Survey signals (entry into SUMMARIZE)
+
+- `work/reference/document.md` exists ⇒ ready to summarize the paper
+- `work/notes/methodology.md` exists ⇒ paper sub-agent already ran
+- `work/reference/code/` exists ∧ `work/notes/code-analysis.md` does not ⇒ code sub-agent should run
+- Both `methodology.md` and (if code exists) `code-analysis.md` exist ⇒ SUMMARIZE done, proceed to EXTRACT_TARGETS
+
+## Notes
+
+- **Run the two sub-agents in parallel** when both apply. The paper agent and the code agent are fully independent and write disjoint files.
+- The methodology notes are the substrate everything downstream consumes. SPECIFY reads them, REVIEW cross-checks them, IMPLEMENT writes scripts based on them. Their quality determines the rest.
diff --git a/claude/lightcone/skills/paper2astra/references/summarize_run.md b/claude/lightcone/skills/paper2astra/references/summarize_run.md
new file mode 100644
index 00000000..f72ba4ee
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/summarize_run.md
@@ -0,0 +1,58 @@
+# SUMMARIZE_RUN — final report and constitution outcome
+
+The reproduction has converged (verdict `pass` or user-accepted `partial`). Write the final summary, update the constitution's outcome, and prepare the workdir for handoff.
+
+The constitution's per-phase mode is **always sub-agent**. There are no decisions left; this is reportage.
+
+## Inputs
+
+- `astra.yaml` — final spec
+- `comparison-report.yaml`, `comparison-report.md` — final verdict
+- `targets/targets.md` — what was being matched against
+- `work/notes/methodology.md` — for context
+- The constitution at the project root — its `outcome:` field needs rewriting
+
+## Outputs
+
+- `REPRODUCTION-SUMMARY.md` (or whatever name fits the project) — final report; concise.
+- Updated `outcome:` on the constitution.
+- A final commit on the reproduction branch with a clear message.
+
+## What the final report covers
+
+A single markdown file at the project root, ~1–2 pages. Sections:
+
+1. **What was reproduced** — the paper, the scope, the targets.
+2. **Verdict** — pass / partial. If partial, what failed and why we accepted it.
+3. **Material decisions** — the paper-vs-code conflicts the SPECIFY phase surfaced, what the user chose, and why.
+4. **Outputs** — pointers to the figures / tables / metrics produced. One bullet per primary target, with the path to the reproduced result.
+5. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut that's stricter than stated, etc.). This is where the reproduction's value to the broader literature gets recorded.
+6. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the constitution path, the relevant `astra.yaml`).
+
+Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes; the summary is the door into them.
+
+## Constitution outcome
+
+Rewrite the constitution's `outcome:` field to reflect the realized state. A good outcome teaches:
+
+> Reproduced <paper> against the targets in `targets/targets.md` with verdict `pass` (attempt 4). All 7 primary targets match within stated tolerance; 2 of 5 secondary targets show <5% offset attributable to <reason>. Material conflicts surfaced and resolved: <list>. Spec at `astra.yaml` (validates with `--verify-evidence`); reproduction summary at `REPRODUCTION-SUMMARY.md`.
+
+The constitution's `status:` flips to `closed` only when the user accepts. This sub-agent does not flip status — it prepares the outcome and surfaces to the user (via the iteration's exit message) that the constitution is ready for closure.
+
+## Final commit
+
+Stage the report, the updated constitution, the final `astra.yaml`, the comparison report, and any housekeeping changes. Commit with a message that names the verdict:
+
+```
+reproduction: <paper-short-name> verdict <verdict>, summary at REPRODUCTION-SUMMARY.md
+```
+
+## Survey signals (entry into SUMMARIZE_RUN)
+
+- `comparison-report.yaml` verdict is `pass` (or user has accepted `partial`) ⇒ ready
+- `REPRODUCTION-SUMMARY.md` exists; constitution outcome is rewritten ⇒ done
+
+## Notes
+
+- **This phase does not flip the constitution's status to closed.** The user does that, after reviewing the summary. The phase's job is to produce the summary cleanly; the human keeps the close authority.
+- **Keep the report short.** Long reports get skimmed; short reports get read. Two pages is generous.

From 8bd67ffca521d9567a2cb030a1ee5d653918df87 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 4 May 2026 03:20:00 +0200
Subject: [PATCH 004/124] CLAUDE.md: surface paper-reproduction skill bundle in
 repo overview

Repository-structure section's skills/ entry only listed lc-* skills;
paper-reproduction bundle additions (paper2astra, narrative, constitution,
ralph-loops, managing-bibliography, plus the pending check-sentence-by-
sentence and figure-comparison) need to be discoverable from a CLAUDE.md
walk-up. Points to claude/lightcone/skills/README.md for the full bundle
map.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 CLAUDE.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index b0631b4a..63f91d0e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -43,7 +43,10 @@ src/lightcone/              # namespace — NO __init__.py
     ├── harness.py, sandbox.py, graders.py, build.py, report.py, models.py
 
 claude/lightcone/           # Claude plugin source — force-included into the wheel
-├── skills/                 # lc-new, lc-build, lc-verify, lc-migrate, lc-feedback
+├── skills/                 # lc-new, lc-build, lc-verify, lc-migrate, lc-feedback;
+│                            # paper-reproduction bundle: paper2astra, narrative,
+│                            # constitution, ralph-loops, managing-bibliography
+│                            # (see skills/README.md for the full bundle map)
 ├── agents/                 # lc-extractor
 ├── guides/                 # astra-reference, lightcone-cli-reference, ui-brand
 ├── templates/              # Project CLAUDE.md template

From 272599be78e344ace435f8549bc773cd0e1f5250 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 4 May 2026 03:53:55 +0200
Subject: [PATCH 005/124] skills/ralph-loops: tighten description triggers

skill-creator audit (per the paper-reproduction bundle constitution)
flagged that the bundle copy of ralph-loops over-fired on bare "ralph"
and "iterate" outside of an active loop, and overlapped with
/constitution on "spec" / "set up a ralph" triggers.

Narrow the trigger list to in-loop auto-activation + explicit
launch-time triggers ("launch ralph", "run ralph", "ralph loop on
<spec>"); route spec-drafting intents to /constitution.

The bundle copy now diverges from cailmdaley/skills upstream by one
description block. Re-sync as needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/ralph-loops/SKILL.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/claude/lightcone/skills/ralph-loops/SKILL.md b/claude/lightcone/skills/ralph-loops/SKILL.md
index 649a813c..7910e49b 100644
--- a/claude/lightcone/skills/ralph-loops/SKILL.md
+++ b/claude/lightcone/skills/ralph-loops/SKILL.md
@@ -3,8 +3,10 @@ name: ralph-loops
 description: >
   Autonomous loop iteration toward a desired state. You are inside a ralph
   loop — your spec is in the system prompt. Survey, contribute, update state
-  discoverably, exit. Activated automatically inside ralph loops.
-  Triggers: "ralph-loops", "ralph", "ralph loop", "iterate", "autonomous loop".
+  discoverably, exit. Activated automatically inside ralph loops, or when
+  launching one against an existing spec via scripts/ralph; for drafting
+  the spec itself, use /constitution.
+  Triggers: "ralph-loops", "launch ralph", "run ralph", "ralph loop on <spec>".
 ---
 
 # Ralph Loops

From 9f19380a71fd0b62086d8e03731a673a863fd708 Mon Sep 17 00:00:00 2001
From: Nolan Koblischke <nolan.koblischke@mail.utoronto.ca>
Date: Mon, 4 May 2026 13:57:13 -0400
Subject: [PATCH 006/124] Add paper2astra follow-up skills:
 check-sentence-by-sentence and figure-comparison

---
 claude/lightcone/skills/README.md             |  10 +-
 .../check-sentence-by-sentence/SKILL.md       | 372 +++++++++++
 .../skills/figure-comparison/SKILL.md         | 579 ++++++++++++++++++
 claude/lightcone/skills/paper2astra/SKILL.md  |  24 +-
 .../skills/paper2astra/references/compare.md  |  10 +-
 .../paper2astra/references/interview.md       |   1 -
 .../paper2astra/references/summarize_run.md   |   4 +
 7 files changed, 973 insertions(+), 27 deletions(-)
 create mode 100644 claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
 create mode 100644 claude/lightcone/skills/figure-comparison/SKILL.md

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 439337cd..b0ed3d68 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -23,8 +23,8 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 | [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. | Merged from [`cailmdaley/skills/skills/constitution`](https://github.com/cailmdaley/skills/tree/main/skills/constitution) (procedural backbone) + Cail's personal felt references (taste — two diamonds, six stances, funnel ledger, qualitative self-check), with felt-optional framing. |
 | [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Launched by paper2astra after the interview. | Direct copy from [`cailmdaley/skills/skills/ralph-loops`](https://github.com/cailmdaley/skills/tree/main/skills/ralph-loops). |
 | [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. Primary acquisition path for paper2astra's ACQUIRE phase. | Direct copy of Cail's personal `~/.claude/skills/managing-bibliography` (newer than the public version). |
-| `check-sentence-by-sentence` | Paper-vs-code TeX audit via sub-agents; locates `file:line` or `NOT FOUND`. Invoked by paper2astra during COMPARE. | Nolan Koblischke's, on his Reproductions-branch. **Not yet pushed publicly** — see "Pending bundle additions" below. |
-| `figure-comparison` | HTML side-by-side: original figures/tables/numerics vs replicated. Invoked by paper2astra during COMPARE. | Same — Nolan's, pending. |
+| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Complementary paper-vs-code source audit via sub-agents; locates `file:line` or `NOT FOUND`. | Copy of Nolan's. |
+| [`figure-comparison`](figure-comparison/SKILL.md) | Generates an HTML side-by-side report: original figures/tables/numerics vs replicated. Useful for manual review. | Copy of Nolan's. |
 
 The full reproduction story spans these seven skills. paper2astra's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about paper2astra.
 
@@ -34,9 +34,3 @@ The full reproduction story spans these seven skills. paper2astra's `SKILL.md` n
 - **Single install path.** `lc init` is the install path for lightcone-cli skills. Adding a separate "also install Cail's public skills via plugin marketplace" step is friction we don't need.
 - **Copy-with-credit costs nothing.** The copied skills retain attribution to their original authors in the SKILL body; if those skills update upstream, we re-sync.
 - **Future consolidation is open.** Per Francois's "next week we improve" framing, the long-run shape might be `astra` ships skills in `astra`, `lc` ships skills in `lightcone-cli`, plus a centralized external-skills list. Today: bundle it all.
-
-### Pending bundle additions
-
-- **`check-sentence-by-sentence`** and **`figure-comparison`** — Nolan Koblischke's two skills. Per the bundle constitution ([`lightcone/.felt/lightcone/paper2astra-as-skill/skill-bundle`](https://github.com/LightconeResearch/lightcone/blob/main/.felt/lightcone/paper2astra-as-skill/skill-bundle.md)), these are part of the bundle, but at first cut they were not yet pushed to any public branch (only living on Nolan's local working tree on his Reproductions checkout). When Nolan pushes them, copy with attribution into this directory; paper2astra's SKILL.md and COMPARE reference already name them as expected siblings, so the integration is wire-compatible the moment they land.
-
-  Until then, COMPARE falls back to direct image-diff judgment without `/figure-comparison`'s structured per-panel rendering, and SPECIFY's evidence-quote re-verification (when COMPARE flags `partial`) falls back to manual Grep against `work/reference/document.md` without `/check-sentence-by-sentence`'s sub-agent audit. Both fallbacks are workable but lossier than the intended path.
diff --git a/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md b/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
new file mode 100644
index 00000000..e7a2ca2b
--- /dev/null
+++ b/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
@@ -0,0 +1,372 @@
+---
+name: check-sentence-by-sentence
+description: >
+  Sentence-by-sentence audit of a paper against an ASTRA project's code. For
+  every claim about implementation or results in the methodology, results,
+  discussion, and appendices, locate the corresponding code (file:line) or
+  mark NOT FOUND. Use when the user says "check reproduction", "verify the
+  paper line by line", or "sentence-by-sentence audit". Run from the project
+  folder containing astra.yaml. In paper2astra projects, read paper sources
+  from work/reference/: prefer arXiv TeX under work/reference/source/, fall
+  back to Docling/Pandoc markdown at work/reference/document.md.
+allowed-tools: Read, Glob, Grep, Bash(ls:*), Bash(wc:*), Bash(grep:*), Bash(find:*), AskUserQuestion, Agent
+argument-hint: "[path to paper source, e.g. work/reference/source/main.tex or work/reference/document.md]"
+---
+
+# /check-sentence-by-sentence
+
+Audit a paper against the code in this ASTRA project, sentence by sentence.
+Every sentence that asserts an implementation detail or a numerical/empirical
+result is located in the code (`file:line`) or marked NOT FOUND. The agent
+does NOT run any code -- this is a static reading audit.
+
+In paper2astra projects, the paper substrate comes from `work/reference/`.
+Path A is arXiv source at `work/reference/source/`; Path B is the parsed
+markdown fallback at `work/reference/document.md`, produced by Docling or
+Pandoc.
+
+## Setup
+
+1. **Confirm project root.** Read `astra.yaml` in the current working
+   directory. If it is missing, ask the user:
+
+   > "I do not see an `astra.yaml` in the current directory. Please point me
+   > to the ASTRA project folder, or `cd` there and re-invoke."
+
+   Stop until resolved.
+
+2. **Confirm paper source.** The user may have passed a path as an
+   argument. Resolve it in this order:
+
+   1. If the argument is a `.tex` file, use it in `tex` mode.
+   2. If the argument is `work/reference/` or another directory, first look
+      for TeX source under `<dir>/source/`, then for `<dir>/document.md`.
+   3. If no argument was supplied, prefer the paper2astra layout:
+      - `work/reference/source/<main>.tex` if TeX source exists. Identify the
+        main file with `grep -l '\\documentclass' work/reference/source/*.tex`;
+        if exactly one file matches, use it. If multiple files match, ask the
+        user which one is the main paper file. After identifying the main
+        file, expand its local `\input{...}` and `\include{...}` files before
+        section enumeration; many arXiv papers keep most prose outside the
+        main TeX wrapper.
+      - `work/reference/document.md` if there is no TeX source. This is the
+        Docling/Pandoc fallback and should be audited in `markdown` mode.
+   4. Only after those paper2astra paths fail, look for an obvious legacy
+      `.tex` source in cwd: a top-level `*.tex`, or one inside `paper/`,
+      `tex/`, or a similarly named subdirectory. If exactly one obvious
+      candidate is found, use it in `tex` mode.
+
+   If no usable source is found, ask:
+
+   > "Which paper source should I audit? Please give me a `.tex` path or
+   > `work/reference/document.md`."
+
+   If only `work/reference/paper.pdf` exists, ask the user to run the PARSE
+   phase first so `work/reference/document.md` exists. Do not audit PDFs
+   directly.
+
+## Section enumeration
+
+This is **your job in the main agent** -- do it carefully so each subagent
+gets a precise line range. Do NOT read full section content; only enough to
+identify boundaries.
+
+1. Enumerate sections according to source mode:
+   - In `tex` mode, first build the ordered audit source list. Start with the
+     main TeX file, scan it for local `\input{...}` and `\include{...}` paths,
+     normalize missing `.tex` suffixes, and include those files when they
+     exist under the same source tree. Recurse one level deeper when an
+     included file itself includes local TeX files. Ignore package/style
+     imports (`\usepackage`, `.sty`, `.cls`) and remote/generated files. If
+     the main file is mostly a wrapper, the leaf included files will carry
+     most audit units.
+   - For every file in the TeX audit source list, use `grep -n` for
+     `^\\section`, `^\\subsection`, and `^\\appendix`. Record each match's
+     file path, line number, and label.
+   - In `markdown` mode, use `grep -n` for markdown headings
+     (`^#`, `^##`, `^###`, etc.) in `work/reference/document.md`. Treat
+     heading depth the way TeX treats section/subsection. If Docling emitted
+     unnumbered headings, use their text labels.
+2. Get each source file's total line count with `wc -l`.
+3. Compute each audit unit's line range: **start = the unit's own heading
+   line; end = the line before the next heading of the same or shallower
+   depth (`\section` / `\subsection` in TeX) in the same source file, or
+   that source file's last line if no such heading follows.** For a section
+   that contains subsections, each subsection's range runs from its own line
+   to the line before the next subsection or section start, and the
+   section's pre-subsection prose (if any) becomes its own audit unit covering
+   (section line + 1) to (first subsection − 1) if that span is non-trivial.
+4. Mark sections appearing after `\appendix` (TeX) or after an `Appendix` /
+   `Appendices` heading (markdown) as appendices regardless of label.
+
+Identify the audit-relevant sections:
+
+- Methodology (often `Methods`, `Analysis`, `Data`, `Sample selection`)
+- Results
+- Discussion (often `Discussion and Conclusions`)
+- Appendices (every section after `\appendix` or an `Appendix` heading)
+
+Skip Abstract, Introduction, Acknowledgements, References, author lists.
+
+For each retained section, check whether it has subsections. **Spin up one
+subagent per leaf (sub)section** -- a section with subsections becomes one
+subagent per subsection (plus optionally one for any pre-subsection prose
+span); a section without subsections becomes one subagent for the whole
+section. Spawn them all in a single message so they run in parallel.
+
+## Subagent prompt
+
+Use `Agent(subagent_type="general-purpose", ...)`. Pass each subagent:
+
+- The absolute path to the paper source file for this section
+- The paper source mode: `tex` or `markdown`
+- The exact section/subsection label and the line range in the source file
+  it covers (so it knows where to read)
+- The absolute path to the project root (which contains `astra.yaml`)
+- The instructions below, verbatim
+
+```
+You are auditing one (sub)section of a paper against an ASTRA project's
+code. Your job is mechanical and exhaustive.
+
+INPUTS
+- Paper source file: <path>
+- Source mode: <tex|markdown>
+- Section: <name>, lines <start>-<end>
+- Project root: <path>
+
+PROCEDURE
+1. Read the assigned section of the paper. Split it into sentences using
+   common sense, not naive period-splitting. In `tex` mode, use TeX-aware
+   splitting; in `markdown` mode, preserve Docling/Pandoc math blocks,
+   captions, and headings as source text. Treat `e.g.`, `i.e.`, `et al.`,
+   `Fig.`, `Eq.`, `Sec.`, `Dr.`, decimals (`0.5`), inline math `$...$`,
+   and citation commands (`\citep{...}`, `\citet{...}`) as part of the
+   surrounding sentence, not boundaries. Display equations belong to
+   whichever sentence introduces them.
+2. For each sentence, decide using common sense: does it make a concrete
+   claim about an IMPLEMENTATION DETAIL (a method, parameter, threshold,
+   formula, data cut, model choice, sample definition, algorithmic step)
+   or a RESULTS DETAIL (a numerical value, plot, fitted parameter,
+   statistical outcome)? If neither -- pure motivation, citation prose,
+   or generic framing -- skip it.
+3. Before searching, **read `astra.yaml` once** -- it is a pre-built
+   paper↔code map maintained by the project. Harvest specifically:
+     - `narrative.methods` — links paper methodology concepts to decision
+       IDs (e.g. paper prose "the chosen <method>" → `#decisions.<id>`)
+     - `narrative.findings` — links paper claims/values to result anchors
+     - `prior_insights` (if present) — extracted paper quotes already tied
+       to decisions
+     - per-decision `evidence` quotes and `description` fields
+   Treat these as your translation table: paper prose → decision/output
+   IDs → script files. Do not re-derive what the spec already encodes.
+
+   For everything not covered by the spec, use common sense to translate
+   concepts. In general:
+     - A quality cut stated as a ratio or threshold may appear in code
+       under an inverted form or a different variable name -- map by
+       meaning, not by symbol.
+     - A named model or distribution will usually appear as a function
+       whose name describes its shape or role, not as the paper's prose
+       phrasing.
+     - A cited constant from a referenced paper will usually appear as a
+       module-level constant or as an option value in a decision.
+   Grep for the underlying concept, not just the paper's wording.
+4. For every claim-bearing sentence, search the project code (`scripts/`,
+   source files, `universes/`, `astra.yaml`, `results/`) for where the
+   claim is implemented or computed. Use Grep, Glob, and Read.
+5. Record one of:
+   - (quote, path/file.py:LINE, optional <10-word note)
+     when the sentence's claim is implemented or computed at that location
+   - (quote, NOT FOUND, optional <10-word note)
+     when no implementation or matching computation is present
+
+CONSTRAINTS
+- Do NOT run any code. No Bash beyond ls/grep/find/wc for searching.
+- Do NOT read the paper outside the assigned line range.
+- Quote the sentence verbatim, trimmed to a single sentence. If the
+  sentence is long, you may include just the claim-bearing clause but
+  preserve enough text to identify it.
+- file:line should point to the most specific line that implements or
+  states the claim (the function call, parameter assignment, or computed
+  value -- not just the file).
+- Notes must be under 10 words. Use them for nuance like "approximate
+  match", "different constant", "implemented but commented out",
+  "value computed at runtime, not statically comparable", "produced as
+  figure but printed value not stored".
+- For numerical results that the paper states as a final number, point
+  at the line that computes the value and use a note like "value
+  computed at runtime" -- you cannot verify numerical agreement without
+  executing code, and that is fine.
+
+OUTPUT
+Return a JSON-ish list, one entry per claim-bearing sentence, in paper order:
+
+[
+  {"quote": "...", "location": "scripts/foo.py:142", "note": "..."},
+  {"quote": "...", "location": "NOT FOUND", "note": "..."},
+  ...
+]
+
+Return nothing else.
+```
+
+## Aggregation
+
+When all subagents return, you receive raw entries for every claim-bearing
+sentence each subagent kept. **Do not just concatenate and print them.**
+Two filtering passes and a final render pass happen here, in this order:
+
+### Pass 1 — drop non-computational sentences
+
+Subagents are deliberately generous about what they keep, so the raw list
+contains a long tail of sentences that quote the paper but do not actually
+correspond to anything you would expect to find in code. **Drop any entry
+whose sentence is:**
+
+- **Framing / motivation** — sentences whose job is to set up the next
+  step, e.g. "the first step is...", "to investigate this...", "we want
+  to look at...", "for this reason..."
+- **Citation prose / literature comparison** — sentences that compare to
+  or quote prior literature, e.g. "agrees with values typical of previous
+  measurements...", "much like Author+YYYY they show...", "in particular,
+  Author found <value>..."
+- **Theoretical framing or derivations** — sentences asserting a property
+  expected from theory rather than implemented in code, and restatements
+  of textbook identities used only to introduce the next equation
+- **Rhetorical / interpretive claims** — qualitative readings of a
+  figure or trend, e.g. "the trend clearly has an oscillatory
+  behaviour", "the trend seems to be independent of <variable>", "this
+  supports that..."
+- **Conclusions / justifications / qualitative observations** —
+  "thus we conclude that...", "we choose not to include this
+  because...", "by and large the trends are similar"
+- **Future work / speculation** — "this could be improved by...", "the
+  discrepancy could be explained by..."
+- **Forward/backward references with no claim** — "we discuss this in
+  Sec X below", "as described in Sec Y above"
+- **NOT FOUND entries that fall in any of the above categories** — most
+  framing/motivation sentences will land as NOT FOUND because there is
+  nothing to find. Drop them silently; they are noise, not gaps.
+
+Keep an entry only if it asserts something a reader would expect to be
+implemented or computed: a parameter value, a cut, a formula, an
+algorithmic step, a fitted/measured value, a figure that the project
+should produce, a sample size after a specific cut.
+
+When in doubt about a NOT FOUND, ask: "if this sentence is not in the
+code, is that a real gap?" If no, drop it.
+
+### Pass 2 — deduplicate / merge near-duplicates
+
+Subagents do not see each other, and the same claim is often restated
+across sentences within a (sub)section -- e.g. a prose statement of a
+cut followed by a sentence asserting "this is the only cut we make", or
+two sub-equations of one larger formula that map to the same line.
+Collapse these:
+
+- If two adjacent sentences make the same claim and resolve to the same
+  `file:line`, keep one entry whose quote is the more specific or
+  formula-bearing of the two, and append the other in a short
+  parenthetical only if it adds information.
+- If a paper-text claim and an explicit equation/quoted code map to the
+  same line, prefer the equation/quoted-code form.
+- Do not merge across (sub)sections.
+- Do not merge if the two sentences resolve to different `file:line`
+  locations -- they may look similar but are doing different things.
+
+### Pass 3 — render
+
+After filtering and deduplication, present the result to the user as
+markdown, organized by section -> subsection -> sentence, in paper order:
+
+```
+# Sentence-by-sentence reproduction audit
+
+Paper: <path>
+Project: <path>
+
+## <Section>
+
+### <Subsection>            (omit if no subsections)
+
+- "<sentence quote>"
+  → ✅ `scripts/foo.py:142` -- <note if any>
+
+- "<sentence quote>"
+  → ❌ NOT FOUND -- <note if any>
+- ...
+```
+
+Use `` → ✅ `file:line` `` for found entries and `→ ❌ NOT FOUND` for
+missing ones. Notes are optional; only include the trailing `-- <note>`
+when the subagent supplied one.
+
+End with a one-line summary:
+
+> N sentences audited across M sections. K implemented, J not found.
+
+### Follow-up suggestion (conditional)
+
+After the summary, scan the NOT FOUND entries and **cluster them**. A
+cluster is a group of NOT FOUND sentences that all relate to the same
+missing piece of work (a missing analysis, a missing diagnostic, an
+unimplemented model variant) -- usually a few consecutive sentences in
+one (sub)section, or sentences that all reference the same concept across
+sections.
+
+**Only emit the follow-up block if there is at least one major
+unimplemented cluster** -- a cluster of genuine missing computation
+substantial enough to be worth offering to add (rule of thumb: ≥3
+sentences of related missing-computation claims, or a single
+heavyweight missing artifact like an entire missing analysis or
+figure). If every NOT FOUND is isolated framing, motivation, or
+qualitative interpretation -- or if the only clusters are tiny -- stop
+after the one-line summary. Do not pad with a follow-up just to have
+one.
+
+When the threshold is met, write a short follow-up block in this shape:
+
+> Major unimplemented clusters: (1) `<short description of cluster 1>`
+> (`<§section>`, ~`<N>` sentences), and (2) `<short description of
+> cluster 2>` (`<§section>`, ~`<N>` sentences). The rest of the NOT
+> FOUND entries are pure framing/motivation/qualitative interpretation,
+> not computational claims. Worth considering as a follow-up if you
+> want full coverage — want me to add `<concrete artifact 1>` and
+> `<concrete artifact 2>`?
+
+Rules for this block:
+- Only call out clusters that look like genuine missing computation, not
+  rhetoric.
+- Keep it to 1–3 clusters. Do not enumerate every NOT FOUND entry.
+- The closing offer must name **concrete artifacts** the user could add
+  (a new output ID, a new script filename, a new decision option, a new
+  figure) -- not vague promises like "fill in the gaps".
+- Cite the section reference in the project's own notation (`§2.1`,
+  `Appendix B`, etc.) and an approximate sentence count.
+- One short paragraph; do not pad.
+
+## Restrictions
+
+- You MUST NOT run project code, recipes, or `lc run`. This is static.
+- You MUST NOT read the paper source wholesale into the main context;
+  delegate to subagents.
+- You MUST NOT modify any project file. Read-only.
+- You MUST NOT fabricate `file:line` locations -- if a subagent's location
+  looks suspicious, ask it to re-verify rather than guessing.
+- You MUST spawn one subagent per leaf (sub)section, in parallel.
+
+## Anti-patterns
+
+- **Auditing intro/abstract** -- skip narrative-only sections; audit only
+  methodology, results, discussion, and appendices.
+- **Bundling sentences** -- one entry per sentence. Do not collapse distinct
+  claims into one row even if they share a citation or location.
+- **Vague locations** -- a bare filename (`scripts/foo.py`) is not
+  enough; a line number is required for found entries.
+- **Long notes** -- the 10-word cap is a hard limit; reserve notes for
+  signal, not commentary.
+- **Running code to verify** -- this skill is a reading audit. If a claim
+  cannot be verified by reading code alone, mark it found at the
+  computing line and note "value computed at runtime" rather than
+  executing anything.
diff --git a/claude/lightcone/skills/figure-comparison/SKILL.md b/claude/lightcone/skills/figure-comparison/SKILL.md
new file mode 100644
index 00000000..dfed8dd4
--- /dev/null
+++ b/claude/lightcone/skills/figure-comparison/SKILL.md
@@ -0,0 +1,579 @@
+---
+name: figure-comparison
+description: >
+  Build a self-contained HTML report comparing the figures, tables, and
+  numerical results in paper2astra's `work/reference/` paper substrate
+  against artifacts produced under `results/<universe>/`. When
+  `comparison-report.yaml` or `targets/targets.md` exists, use that scoped
+  target set first; otherwise fall back to paper-driven inventory from arXiv
+  TeX or Docling/Pandoc artifacts under `work/reference/`. Images are
+  base64-embedded; missing matches are flagged. Use when the user says
+  "compare results", "side-by-side comparison", "build comparison HTML", or
+  "did we reproduce the paper". Run from the project folder containing
+  astra.yaml.
+allowed-tools: Read, Write, Glob, Grep, Bash(ls:*), Bash(wc:*), Bash(grep:*), Bash(find:*), Bash(file:*), Bash(python3:*), Bash(python:*), Bash(base64:*), AskUserQuestion, Agent
+argument-hint: "[path to paper reference dir, e.g. work/reference/]"
+---
+
+# /figure-comparison
+
+Generate a single self-contained HTML report (`.lightcone/comparison.html`)
+that places paper reference artifacts from `work/reference/` on the left
+and the project's reproduced artifacts from `results/<universe>/` on the
+right, with red flags wherever a counterpart is missing. Images are embedded
+as base64 so the HTML is portable. The helper script and intermediate
+manifest also live under `.lightcone/` so they don't pollute the baseline
+results.
+
+## Setup
+
+1. **Confirm project root.** Read `astra.yaml` in the cwd. If missing, ask:
+
+   > "I do not see an `astra.yaml` here. Please `cd` to the ASTRA project
+   > and re-invoke."
+
+   Stop until resolved.
+
+2. **Confirm results exist.** Default universe is `baseline`, unless
+   `comparison-report.yaml` names reproduced files under another universe or
+   the user supplied a universe explicitly. Check `ls results/<universe>/`.
+   If the directory is missing or empty, ask:
+
+   > "I cannot find populated results under `results/<universe>/`. Build the
+   > universe first (`lc run --universe <universe>` or equivalent), then
+   > re-invoke."
+
+   Stop. Do NOT attempt to run the pipeline yourself -- this skill is
+   read-only over the build artifacts.
+
+3. **Locate the paper reference substrate.** The user may have passed a
+   path. Resolve it in this order:
+
+   1. If the argument is a directory containing `metadata.json`,
+      `document.md`, `figures/`, or `tables/`, use that directory as the
+      paper reference root.
+   2. If the argument is an arXiv source directory containing `.tex` files,
+      use it as `source_root`, and use its parent `work/reference/` as the
+      paper reference root when that parent exists.
+   3. If no argument was supplied, prefer paper2astra's layout:
+      - `work/reference/source/` when arXiv TeX source exists. Use the TeX
+        files there for labels/captions and the parsed artifacts under
+        `work/reference/{figures,tables,metadata.json}` for renderable
+        reference files.
+      - `work/reference/document.md` plus
+        `work/reference/{figures,tables,metadata.json}` when no TeX source
+        exists. This is the PDF + Docling fallback from paper2astra.
+   4. Only after paper2astra paths fail, look for a legacy unzipped arXiv
+      dir in cwd: a directory containing both a `*.tex` file and figure
+      files (`*.pdf`, `*.png`, `*.eps`). Common names: `paper_source/`,
+      `arxiv_source/`, `*_Original_Paper/`.
+
+   If no usable reference substrate is found, ask:
+
+   > "Where is the paper reference directory? In a paper2astra project this
+   > should usually be `work/reference/`, containing `document.md`,
+   > `metadata.json`, and extracted `figures/` / `tables/`."
+
+   If only `work/reference/paper.pdf` exists, ask the user to run the PARSE
+   phase first so Docling or the TeX parser populates `work/reference/`.
+   Do not compare directly against a whole PDF.
+
+## Phase 1 -- Understand the paper's main results
+
+Read, in this order:
+
+1. **Scoped comparison artifacts, if present.**
+   - If `comparison-report.yaml` exists, treat it as the highest-priority
+     scope because it records what paper2astra actually compared. Use its
+     `outputs:` entries, including `type`, `priority`, `paper_value`,
+     `reproduced_value`, `reference_file`, `reproduced_file`, `match`, and
+     `notes` when present.
+   - Else if `targets/targets.md` exists, treat it as the scope ledger. Use
+     only the targets it names, including out-of-scope notes, priorities,
+     reference paths, expected values/trends, and output/spec-home pointers.
+   - If neither file exists, use the default paper-driven flow below and
+     build a best-effort report from `astra.yaml` plus `work/reference/`.
+
+2. **`astra.yaml`** -- specifically `narrative.summary`, `narrative.outputs`,
+   `narrative.findings`, `outputs:`, and `findings:` if present. Use it to
+   map scoped targets to output IDs and to harvest declared findings. Do not
+   assume ASTRA outputs have a dedicated filename-hint field; result paths
+   come from the output ID and the result resolver in Phase 2.
+
+3. **The paper reference substrate**, in this order:
+   - Read `work/reference/metadata.json` when present. It is the primary
+     index for paper figures and tables; its paths are relative to
+     `work/reference/` and usually point into `figures/` or `tables/`.
+   - If `work/reference/source/` exists, grep its TeX files for
+     `\includegraphics`, `\label{fig:...}`, `\caption{...}`, and
+     `\begin{table}` to recover labels/captions that metadata may have
+     missed.
+   - If only `work/reference/document.md` exists, use the markdown plus
+     `metadata.json` as the source of captions, table text, and in-text
+     numerical claims. This is the Docling/Pandoc fallback; preserve its
+     line numbers and do not pretend it is TeX.
+   - Grep the abstract, results, and discussion sections of the TeX or
+     markdown source for in-text numerical claims that look like primary
+     results -- typically a quantity with value + uncertainty (e.g.
+     `$X = a \pm b$ unit`). Prefer values that `astra.yaml`'s `findings:`
+     already names; do not try to extract every number in the paper.
+
+   Do NOT read the paper wholesale. For long papers (>500 lines), read
+   only the abstract, results, and discussion sections.
+
+If the paper is large or has many sections and neither `comparison-report.yaml`
+nor `targets/targets.md` exists, **delegate the figure / table / value
+enumeration to a single subagent** with
+`subagent_type="general-purpose"` -- pass it the paper path, the output
+schema below, and ask it to return only the inventory. One subagent is
+enough; do not fan out. Multiple subagents would have to re-read the
+same file.
+
+## Phase 2 -- Build the comparison manifest
+
+Produce a manifest in memory (you'll write it as JSON in Phase 3) with
+three sections: `figures`, `tables`, `values`. Each entry pairs a
+paper-side artifact with a project-side artifact.
+
+Build entries in this priority order:
+
+1. **From `comparison-report.yaml` if present.** One manifest entry per
+   `outputs.<output_id>` item. Use `type` to route it to `figures`,
+   `tables`, or `values`. Use `reference_file` as the paper-side path and
+   `reproduced_file` as the project-side path when present. Preserve the
+   report's `paper_value`, `reproduced_value`, `match`, and `notes` in the
+   manifest so the HTML reflects the completed COMPARE verdict.
+2. **Else from `targets/targets.md` if present.** One manifest entry per
+   in-scope target. Use each target's reference path under `targets/`, its
+   expected values/trends, and its output/spec-home pointer. If the ledger
+   marks a target out of scope, omit it from the HTML unless the user asked
+   for out-of-scope targets too.
+3. **Else use the default paper-driven inventory.** Enumerate figures,
+   tables, and values from `astra.yaml` plus `work/reference/`, and fall back
+   to filename-stem similarity only when no scoped ledger exists.
+
+For project-side result paths, resolve every output ID with this order:
+- Use an explicit `reproduced_file` from `comparison-report.yaml` or an
+  explicit reproduced path/glob from `targets/targets.md`, if present and
+  the file exists.
+- Search for flat files at `results/<universe>/<output_id>.<ext>` with the
+  first suitable type-specific extension: images (`.png`, `.jpg`, `.jpeg`,
+  `.pdf`, `.eps`), tables (`.csv`, `.parquet`, `.md`, `.txt`), values
+  (`.json`, `.yaml`, `.yml`, `.txt`, `.md`).
+- If still unmatched and no scoped ledger exists, fall back to filename-stem
+  similarity within `results/<universe>/`.
+- If no match is found, use `project_path: null` and render a red
+  `NOT PRODUCED` panel. Do not include unrelated result files; the report is
+  target-driven when target/report files exist, and paper-driven otherwise.
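+
+As a rough illustration, the resolution order above could be applied during
+manifest construction along the lines of the sketch below.
+`resolve_project_path` and the `EXTENSIONS` table are hypothetical names, not
+part of any existing lightcone API; explicit globs and the stem-similarity
+fallback are left out of the sketch.
+
+```python
+from pathlib import Path
+
+# Extension priority per manifest section, copied from the list above.
+EXTENSIONS = {
+    "figures": (".png", ".jpg", ".jpeg", ".pdf", ".eps"),
+    "tables": (".csv", ".parquet", ".md", ".txt"),
+    "values": (".json", ".yaml", ".yml", ".txt", ".md"),
+}
+
+
+def resolve_project_path(root, universe, output_id, kind, explicit=None):
+    """Return a project-relative path string, or None if nothing matches."""
+    root = Path(root)
+    # 1. An explicit path from the report or ledger wins if the file exists.
+    if explicit and (root / explicit).is_file():
+        return explicit
+    # 2. Otherwise take the first flat file with a suitable extension.
+    for ext in EXTENSIONS[kind]:
+        candidate = Path("results") / universe / f"{output_id}{ext}"
+        if (root / candidate).is_file():
+            return candidate.as_posix()
+    # 3. No match: the caller records project_path: null.
+    return None
+```
+
+A `None` return corresponds to `project_path: null` in the manifest, which
+the helper later renders as the red `NOT PRODUCED` panel.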
+
+For tables: use `work/reference/metadata.json` and `work/reference/tables/`
+when present. If TeX source exists, capture the raw LaTeX of the `tabular`
+block and any `\caption{...}`. If only `work/reference/document.md` exists,
+capture the Docling/Pandoc markdown table or the extracted table artifact
+under `work/reference/tables/`. The project side is whatever artifact
+carries the same content -- typically a CSV / parquet / markdown file at
+`results/<universe>/<output_id>.<ext>`. If `astra.yaml` declares no matching
+output, use `project_path: null`. **If the paper contains no tables at all,
+leave the manifest's `tables` list empty; the helper must omit the entire
+Tables section from the HTML in that case (no header, no "no tables"
+placeholder).**
+
+For values: each entry is `{name, paper_value, paper_uncertainty?,
+project_value?, project_value_source?, paper_quote}`. Pull
+`paper_value` from the in-text claim or `astra.yaml`'s
+`findings.*.paper_value`. Pull `project_value` from
+`astra.yaml`'s `findings.*.replicated_value` if present, otherwise from
+a scoped `comparison-report.yaml` entry or a flat result summary file at
+`results/<universe>/<output_id>.<ext>` that you can read statically.
+**Never compute or re-derive values yourself.** If no project value can
+be located statically, leave it null and flag in the HTML.
+
+When `comparison-report.yaml` or `targets/targets.md` exists, the values list
+is scoped to that file. Otherwise, be exhaustive about values, not selective.
+A common failure mode is the values section ending up with only 1--3 entries,
+which makes the report feel thin. Aim for **every** numerical claim that the
+paper asserts and the project tracks. Concretely, harvest from:
+- Every entry under `findings:` in `astra.yaml` -- one manifest entry
+  per finding, even when several findings share a parent quantity.
+- The paper's abstract: every `<value> ± <unc> <unit>` it reports.
+- The paper's results and discussion sections: every fitted parameter,
+  every feature location ("dip near x = X₁", "peak at x = X₂"), every
+  reported sample size after a specific cut, every bin width or step
+  used as a result-defining choice, every reported accuracy / score /
+  metric.
+- Any explicit reproduction targets in `astra.yaml`'s `narrative.findings`.
+
+It is fine to repeat one quantity in multiple manifest entries when the
+paper reports it under different conditions (preliminary vs. final,
+per-subset, per-bin median, per-method variant). Each condition is its
+own row. Feature locations are values too: encode "feature located at
+domain coordinate X" as
+`{name: "<short feature name>", paper_value: "<X>", paper_unit:
+"<unit>"}`. **Target ≥6 value entries on a typical paper.** If you end
+up with fewer than 4, you are filtering too aggressively -- re-read
+`astra.yaml`'s `findings:` and the paper's results section.
+
+## Phase 3 -- Generate the HTML
+
+Use a small Python helper rather than embedding base64 inline through
+your tool calls -- multi-MB image base64 strings would balloon your
+context.
+
+Use the existing `.lightcone/` directory in the project root. Do not create
+directories in this skill. All three files this skill writes -- manifest,
+helper, and final HTML -- live there.
+
+1. **Write the manifest** as JSON to
+   `.lightcone/comparison_manifest.json`. Schema:
+
+   ```json
+   {
+     "project_name": "...",
+     "paper_path": "work/reference/document.md",
+     "scope_source": "comparison-report.yaml",
+     "universe": "baseline",
+     "results_path": "results/baseline",
+     "figures": [
+       {
+         "paper_label": "fig:main_result",
+         "paper_caption": "...",
+         "paper_path": "targets/main_result.pdf",
+         "project_output_id": "primary_metric_plot",
+         "project_path": "results/baseline/primary_metric_plot.png"
+       }
+     ],
+     "tables": [
+       {
+         "paper_label": "tab:summary",
+         "paper_caption": "...",
+         "paper_latex": "\\begin{tabular}{...}\\end{tabular}",
+         "project_output_id": "...",
+         "project_path": "results/baseline/summary_table.csv"
+       }
+     ],
+     "values": [
+       {
+         "name": "primary_metric",
+         "paper_value": "12.5",
+         "paper_uncertainty": "0.4",
+         "paper_unit": "<unit>",
+         "paper_quote": "we find $\\mathrm{metric} = 12.5 \\pm 0.4$ <unit>",
+         "project_value": "12.47",
+         "project_uncertainty": "0.41",
+         "project_value_source": "results/baseline/metric.json"
+       }
+     ]
+   }
+   ```
+
+   `figures`, `tables`, and `values` may each be `[]`. Empty lists mean
+   the helper skips that section entirely. There is no
+   `unmatched_baseline` field -- baseline files the paper does not
+   reference are not in scope for this report.
+
+   Use `null` for any missing field. Paths are relative to the project
+   root.
+
+2. **Write the helper script** to `.lightcone/build_comparison.py`.
+   The helper must:
+   - Read the manifest JSON.
+   - For each figure entry: emit one `<section class="row">` per figure,
+     with the structure described in **"Required HTML structure"**
+     below -- a single `<div class="row-head">` containing a
+     `<div class="row-title">` and one row-level status badge, followed
+     by a `<div class="row-grid">` of two `<figure class="cell">`s
+     (paper, project). One badge per row, in flow inside `.row-head`.
+     **Never emit per-cell absolutely-positioned badges.**
+     Read `paper_path` and `project_path` as bytes, base64-encode, and
+     embed each image inside its cell. **PDFs must be converted to PNG
+     before base64-encoding -- never embed PDFs as PDF data URIs.** Use
+     `<img src="data:image/png;base64,...">` uniformly for every
+     figure cell. Conversion order to try, falling back if a tool is
+     unavailable:
+       1. `pdf2image` (Python) -- `convert_from_path(path, dpi=150)[0]`
+       2. `pypdfium2` -- render page 1 at 150 DPI to a PIL image
+       3. shell out to `pdftoppm -png -r 150 -f 1 -l 1 <pdf> <stem>`
+          and read the resulting PNG
+       4. shell out to `magick -density 150 <pdf>[0] <png>` (ImageMagick)
+     If none are available, the helper renders a small ⚠️ panel that
+     says `PDF preview unavailable -- install pdf2image or pdftoppm`
+     and links to the `.pdf` file path. Do not fall back to embedding
+     the PDF binary. PNG / JPG inputs skip conversion and are
+     base64-encoded directly. For any non-image type, embed as a
+     UTF-8 text block. Missing path → render a red panel saying
+     `❌ NOT PRODUCED` with the expected output ID. Captions live as
+     `<figcaption>` inside each cell, never as a row-spanning element.
+   - For each table entry: paper side renders the captured LaTeX inside
+     `<pre>` plus the caption; project side renders the project file
+     (CSV/parquet → first ~20 rows as an HTML table; markdown → render
+     as `<pre>`; missing → red ❌ panel). Same row structure as figures.
+   - For each value entry: emit one `<section class="row value-row">`
+     per value -- **same card layout as figures, not a `<table>`.**
+     The row has a `.row-head` (value name + single status badge),
+     a `.row-grid` of two `.cell`s (paper | project), and a trailing
+     `.value-note` with the σ delta. The paper cell shows the value
+     (with uncertainty and unit) and the `paper_quote` as a
+     `<blockquote>`. The project cell shows the value and the
+     `project_value_source` as a small `<code>` line. Compute a simple
+     status -- ✅ if both values exist and the project value lies within
+     ±1 paper-uncertainty of the paper value; ⚠️ if both exist but
+     disagree by more than that; ❌ if either is missing. If
+     `paper_uncertainty` is null, fall back to a 5%-tolerance
+     comparison: ✅ if `|project − paper| ≤ max(0.05·|paper|, 0.05)`,
+     ⚠️ otherwise. Do NOT do anything more sophisticated; the helper only
+     compares values already recorded in the manifest. **Do not
+     render values as a single HTML `<table>`** -- the report's whole
+     point is side-by-side cards.
+   - Emit a single self-contained HTML file with inline CSS in the
+     **Vellum** aesthetic (see below): the `<body>` carries the
+     parchment background and grain, and **all content lives inside a
+     single `<div class="page">` that is the lighter `--surface` cream
+     card with soft drop shadows.** This is non-negotiable -- the cream
+     page card on top of the parchment body is the headline visual. Two
+     content columns (paper | project) per row, the project name in the
+     `<h1>`, and a top-of-page summary line counting found / missing
+     for each non-empty section. **Skip any section whose manifest list
+     is empty** -- omit its header and content entirely; do not emit a
+     "no tables found" placeholder.
+   - Write the HTML to `.lightcone/comparison.html` and print the
+     absolute path on stdout.
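+
+A minimal sketch of the figure-cell embedding described in the list above,
+covering only raster inputs and the missing-file case. The function name
+`render_cell` and the `panel` class names are assumptions made for
+illustration; PDF conversion and the text-block path are not shown. The sketch
+also labels JPEG bytes as `image/jpeg`, which is a choice of this example
+rather than something the rules above spell out.
+
+```python
+import base64
+import html
+from pathlib import Path
+
+# MIME types for inputs that skip conversion (assumed mapping).
+MIME = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}
+
+
+def render_cell(label, rel_path, caption, root=".", expected_id=None):
+    """Return one <figure class="cell"> with an embedded image or a panel."""
+    head = f'<div class="cell-label">{html.escape(label)}</div>'
+    foot = f"<figcaption>{html.escape(caption or '')}</figcaption>"
+    path = Path(root) / rel_path if rel_path else None
+    if path is None or not path.is_file():
+        # Missing artifact: name the expected output ID when it is known.
+        wanted = f" ({html.escape(expected_id)})" if expected_id else ""
+        body = f'<div class="panel panel-miss">❌ NOT PRODUCED{wanted}</div>'
+    elif path.suffix.lower() not in MIME:
+        # PDFs and non-image types need the conversion or text paths above.
+        body = '<div class="panel panel-warn">⚠️ preview unavailable</div>'
+    else:
+        data = base64.b64encode(path.read_bytes()).decode("ascii")
+        mime = MIME[path.suffix.lower()]
+        body = f'<img src="data:{mime};base64,{data}" alt="{html.escape(label)}">'
+    return f'<figure class="cell">{head}{body}{foot}</figure>'
+```
+
+Because the base64 string is built inside the helper process, it never
+passes through a tool-call argument.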
+
+### Required HTML structure (figures and values)
+
+The helper MUST produce this exact shape for every figure / value row.
+Per-cell absolute badges, value-as-table, and missing `.row-head` are
+all forbidden -- they break the layout (overlapping the cell heading,
+losing the row-level status, breaking the visual rhythm with figures).
+
+```html
+<section class="row"><!-- or "row value-row" for values -->
+  <div class="row-head">
+    <div class="row-title">
+      <code>fig:main_result</code> &mdash; <span class="row-id">primary_metric_plot</span>
+    </div>
+    <span class="badge badge-ok">✅ matched</span>
+  </div>
+  <div class="row-grid">
+    <figure class="cell">
+      <div class="cell-label">PAPER</div>
+      <img src="data:image/png;base64,...">
+      <figcaption>Caption from paper.</figcaption>
+    </figure>
+    <figure class="cell">
+      <div class="cell-label">PROJECT &middot; <code>results/baseline/...</code></div>
+      <img src="data:image/png;base64,...">
+      <figcaption>output_id</figcaption>
+    </figure>
+  </div>
+  <!-- value rows only: -->
+  <div class="value-note">Δ = 0.03 &lt;unit&gt; (0.07σ)</div>
+</section>
+```
+
+Status states for the row badge: `badge-ok` (matched), `badge-warn`
+(partial / off-target / no σ), `badge-miss` (missing on either side).
+Exactly one badge per row.
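+
+The value-row status rule (±1 paper uncertainty, else the 5% fallback) can be
+sketched as follows. `value_status` is a hypothetical helper name; treating
+a non-numeric or absent value as missing is an assumption of this sketch,
+not something the rule above specifies.
+
+```python
+def value_status(paper, project, paper_unc=None):
+    """Return (badge_class, note) for one value row."""
+    try:
+        p, q = float(paper), float(project)
+    except (TypeError, ValueError):
+        # Either side is null or not a plain number: flag, don't fill.
+        return "badge-miss", "missing"
+    try:
+        unc = float(paper_unc)
+    except (TypeError, ValueError):
+        unc = None
+    delta = q - p
+    if unc is not None and unc > 0:
+        # Within one paper uncertainty counts as matched.
+        ok = abs(delta) <= unc
+        note = f"Δ = {delta:+.3g} ({abs(delta) / unc:.2f}σ)"
+    else:
+        # No usable uncertainty: 5% tolerance with an absolute floor of 0.05.
+        ok = abs(delta) <= max(0.05 * abs(p), 0.05)
+        note = f"Δ = {delta:+.3g} (no σ)"
+    return ("badge-ok" if ok else "badge-warn"), note
+```
+
+The returned class goes on the single row badge and the note fills the
+trailing `.value-note`; nothing beyond this coarse check is computed.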
+
+3. **Run the helper:** `python3 .lightcone/build_comparison.py`
+   from the project root. If `python3` is missing, try `python`. If
+   the helper imports anything beyond the standard library (e.g.
+   `pyarrow` to read parquet, or `pandas` to render tables), have it
+   gracefully fall back to "preview not available -- file exists at
+   `<path>`" rather than failing. The helper must work with stdlib
+   alone for the figure path; the parquet / pandas previews are
+   nice-to-haves.
+
+4. After the helper runs, **read back** the first ~50 lines of the HTML and
+   check its size on disk to verify it was produced and isn't trivially
+   small (>10 KB sanity check). Then report the path to the user with a
+   one-line summary:
+
+   > Comparison HTML at `.lightcone/comparison.html` -- N figures
+   > (K matched, J missing), N tables (...), N values (...).
+
+## Vellum aesthetic
+
+The helper must style the page in the **Vellum** aesthetic: a
+weathered-parchment look that reads like a printed scientific paper,
+not a web app. The helper bakes all of this into inline `<style>` --
+no external assets, no CDN fetches, no JS.
+
+**Palette (CSS custom properties on `:root`):**
+
+```css
+--paper:        #F2EDE5;  /* aged-paper page background */
+--surface:      #FAFAF7;  /* lighter "protected" prose surface */
+--ink:          #2E2A26;  /* warm near-black body text */
+--ink-muted:    #6B635A;  /* brown-gray secondary text */
+--gold:         #9A7B35;  /* antique gold -- links, accents, the author's hand */
+--teal:         #4F7A6F;  /* faded ink: healthy / resolved (✅) */
+--amber:        #B0823A;  /* faded ink: attention / partial (⚠️) */
+--mauve:        #8A5C6B;  /* faded ink: error / missing (❌) */
+--rule:         #D9CFC0;  /* hairlines and table borders */
+--shadow:       rgba(46, 42, 38, 0.10);  /* soft ink-toned drop shadow */
+```
+
+Saturated colors are forbidden. Use only this palette plus tints/shades
+of these tokens. Status icons (✅ ⚠️ ❌) are kept but their containers
+adopt the corresponding faded ink (`--teal`, `--amber`, `--mauve`) for
+borders and small badges -- never as full background fills.
+
+**Typography:**
+
+- Body prose: `EB Garamond`, fall back through `Garamond, "Times New
+  Roman", Georgia, serif`. No system-ui, no sans-serif anywhere.
+- Annotations, code, captions, file paths, numerical values:
+  `JetBrains Mono`, fall back through `"IBM Plex Mono", "SFMono-Regular",
+  Menlo, Consolas, monospace`.
+- Body line-height ~1.55, comfortable measure (~70ch on prose blocks).
+- Headings serif, semibold not bold; `<h1>` slightly tracked-out (small
+  positive `letter-spacing`) for a hand-set feel. Section headings
+  may use a small caps treatment (`font-variant: small-caps`).
+- Do not load webfonts. The HTML must stay self-contained and offline-safe;
+  rely on the fallback chains above.
+
+**Texture and the page card:**
+
+- The `<body>` background is `--paper` plus a barely-there fractal-noise
+  grain. Generate the grain with an inline SVG `<feTurbulence>` filter
+  baked into a `data:image/svg+xml;base64,...` URL used as
+  `background-image`. Keep opacity low (~0.04--0.06) so the grain reads
+  as paper fiber, not as visible noise.
+- **Body padding around the page.** The `<body>` itself has padding
+  (e.g. `padding: 4rem 2rem;`) so the parchment + grain breathes around
+  the page card -- never edge-to-edge.
+- **The page card is mandatory.** All content lives inside a single
+  `<div class="page">` styled as:
+
+  ```css
+  .page {
+    max-width: 64rem;
+    margin: 0 auto;
+    background: var(--surface);
+    box-shadow: 0 1px 2px var(--shadow), 0 8px 24px var(--shadow);
+    padding: 4rem 4rem 5rem;
+  }
+  ```
+
+  The cream `--surface` card on top of the parchment `--paper` body is
+  the single most important visual signature of the report. If you find
+  yourself with `.page { background: transparent }` or no
+  `box-shadow`, you have failed.
+- Cells inside the page card sit on the same `--surface` with their own
+  softer shadow (`0 1px 2px var(--shadow)`), creating two stacked
+  layers of depth: parchment → page card → cell card.
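+
+One way the helper might build the grain described above is sketched below.
+`grain_data_url` and its default parameters are illustrative assumptions;
+the `feColorMatrix` step is an addition of this sketch that desaturates the
+noise so it stays within the faded palette.
+
+```python
+import base64
+
+
+def grain_data_url(base_frequency=0.8, opacity=0.05, size=200):
+    """Return a data: URL holding a small tileable fractal-noise SVG."""
+    svg = (
+        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
+        '<filter id="g">'
+        f'<feTurbulence type="fractalNoise" baseFrequency="{base_frequency}" '
+        'numOctaves="2" stitchTiles="stitch"/>'
+        # Desaturate the noise so no colored speckle shows through.
+        '<feColorMatrix type="saturate" values="0"/>'
+        '</filter>'
+        f'<rect width="100%" height="100%" filter="url(#g)" opacity="{opacity}"/>'
+        '</svg>'
+    )
+    encoded = base64.b64encode(svg.encode("utf-8")).decode("ascii")
+    return f"data:image/svg+xml;base64,{encoded}"
+```
+
+The result would be interpolated into the inline stylesheet, for example
+`body { background-color: var(--paper); background-image: url("<data URL>"); }`,
+so the report needs no external asset.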
+
+**Surfaces and overlays:**
+
+- Comparison rows are two-column on desktop (paper | project), single
+  column on narrow viewports. Each cell is `--surface` with the soft
+  ink shadow.
+- Hover/active states are expressed as **candlelight-lift** (a warm
+  cream highlight, e.g. `background: #FFF8E8;`) or **ink-sink** (a warm
+  black inset, e.g. `background: #2E2A26; color: var(--paper);`) --
+  never flat blue/gray fills.
+- Hairlines between sections use `--rule`, never solid black.
+
+**Chrome and links:**
+
+- Links: `--gold`, no underline by default; underline appears as a
+  1px `--gold` border-bottom on hover. The underline is the "author's
+  hand" -- thin, deliberate.
+- Buttons / interactive chrome: minimal. This is a report, not an app.
+  Avoid icons beyond ✅ ⚠️ ❌ and small unicode dingbats.
+
+**Whitespace and rhythm:**
+
+- Generous outer margins; the page should feel narrow and read like a
+  paginated paper. Max content width around 64rem.
+- Section transitions get vertical room -- ~3rem between major
+  sections, ~1.5rem between rows.
+- Captions sit below figures in `--ink-muted` mono, italic if EB
+  Garamond italics are loaded.
+
+**Status badges (figure / table / value rows):**
+
+- **One badge per row, in the `.row-head` flex container alongside the
+  row title.** Never per-cell, never absolutely positioned. The badge
+  uses `display: inline-flex` (or default inline) and lives in flow.
+- Render the status (✅ matched / ⚠️ partial / ❌ missing) as a small
+  monospace badge in the row head, using the corresponding semantic color
+  as a 1px border + the same color tinted at 12% as the background. The
+  icon plus a 1--3 word label ("matched", "missing", "off by 2.1σ"). Never
+  a saturated banner.
+- Status reflects the **row as a whole**, not each cell individually:
+  ✅ when both paper and project artifacts are present (and, for
+  values, the project number is within tolerance); ⚠️ when both are
+  present but the value disagrees beyond tolerance, or a paper figure
+  has no project counterpart that you'd still like to flag as partial;
+  ❌ when either side is missing.
+
+**The overall feel:** scholarly, low-contrast, hand-made, generous
+whitespace, chrome recedes, the page itself carries the eye. If a
+choice feels modern (sharp shadows, saturated badges, system-ui type,
+solid-fill buttons), it is wrong.
+
+## Restrictions
+
+- You MUST NOT run the pipeline, recipes, `lc run`, or any code that
+  computes new results. The results directory is read-only input here.
+- You MUST NOT modify project source code, `astra.yaml`, or anything in
+  `scripts/` or `results/`. The only files this skill writes are
+  `.lightcone/comparison_manifest.json`,
+  `.lightcone/build_comparison.py`, and
+  `.lightcone/comparison.html`. Assume `.lightcone/` already exists; never
+  write into `results/`.
+- You MUST NOT fabricate values. If a paper number is not stated in the
+  paper source, `targets/targets.md`, `comparison-report.yaml`, or
+  `astra.yaml`, leave it null. If a project number is not recorded in a
+  result file or comparison report, leave it null. Flag, don't fill.
+- You MUST embed every image as base64 -- the HTML must be portable to
+  another machine without breaking image references.
+- You MUST NOT write the HTML by hand with inlined base64 strings; use
+  the helper script. (Multi-MB base64 in tool-call arguments is what
+  this rule prevents.)
+
+## Anti-patterns
+
+- **Running the pipeline to fill in a missing value** -- the whole point
+  is to surface what is missing; never paper over a gap.
+- **Embedding PDFs as PDFs** -- PDFs must be rasterized to PNG before
+  base64-encoding. Browsers can technically render PDF data URIs, but
+  they break consistent layout, scale poorly, and force a viewer
+  chrome we cannot style. Convert to PNG via `pdf2image` /
+  `pypdfium2` / `pdftoppm` / ImageMagick (in that fallback order); if
+  none are available, render a ⚠️ placeholder rather than embedding
+  the PDF.
+- **Statistical comparison beyond ±1σ** -- this skill is a static visual
+  comparison plus a coarse value check. Do not compute KS tests, Δχ²,
+  or anything else. The user can eyeball the figures.
+- **Reading the paper wholesale** -- limit reads to abstract, results,
+  discussion; or delegate the inventory pass to one subagent.
+- **Bundling matching into the helper script** -- the helper's job is
+  rendering, not deciding which paper figure pairs with which baseline
+  file. Do all matching in Phase 2 (manifest construction) so a human
+  can audit the pairings by reading the JSON.
+- **Silent overwrites** -- if `.lightcone/comparison.html` already
+  exists, mention it in the summary line ("overwrote previous report").
+- **Modern web-app styling** -- saturated brand colors, system-ui type,
+  flat-fill buttons, sharp drop shadows, dark-mode toggles, animated
+  transitions. The Vellum aesthetic is non-negotiable; if you find
+  yourself reaching for `#0d6efd` or `font-family: system-ui`, stop.
+- **Missing page card.** The single `<div class="page">` with
+  `background: var(--surface)` + soft drop shadow is the headline
+  visual. A page that lets the parchment grain reach edge-to-edge with
+  no cream card on top is broken. Always check the rendered HTML has
+  `.page { background: var(--surface); box-shadow: ... }`.
+- **Per-cell absolutely-positioned badges.** Status badges live inside
+  one `.row-head` per row, in flow next to the row title -- never
+  `position: absolute; top: 0.7rem; right: 0.8rem;` inside each cell.
+  The absolute positioning overlaps the cell heading and emits a
+  "rendered" badge per existing file regardless of the row's overall
+  comparison state, which destroys the at-a-glance status signal.
+- **Values rendered as a `<table>`.** Values must use the same card
+  layout as figures (`.row` → `.row-head` + `.row-grid` of two
+  `.cell`s). Collapsing the values section to an HTML table looks like
+  a spreadsheet and breaks visual rhythm with the figures section.
+- **Thin values list.** Aim for ≥6 value entries on a typical paper.
+  If the manifest ends up with 1--3 values, the report feels empty;
+  re-harvest from `astra.yaml`'s `findings:` and the paper's results
+  section before generating.
diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/paper2astra/SKILL.md
index 5ea9fc58..e7afbeb6 100644
--- a/claude/lightcone/skills/paper2astra/SKILL.md
+++ b/claude/lightcone/skills/paper2astra/SKILL.md
@@ -5,12 +5,13 @@ description: >
   about the paper and the intended scope, draft a per-paper reproduction
   constitution, then launch a ralph loop that drives the multi-session
   reproduction work. Composes sibling skills for each phase: managing-
-  bibliography for ACQUIRE, narrative for SPECIFY, check-sentence-by-
-  sentence + figure-comparison for COMPARE. Use when the user wants to
-  reproduce a paper, has a DOI or arXiv ID and wants to start a
-  reproduction project, or asks to "reproduce <paper>", "set up
-  reproduction", "paper2astra", "/paper2astra <doi>", or hands you a
-  published paper as a starting point for ASTRA work.
+  bibliography for ACQUIRE and narrative for SPECIFY. COMPARE follows the
+  original Paper2ASTRA target-ledger structure directly rather than requiring
+  sibling comparison skills. Use when the user wants to reproduce a paper,
+  has a DOI or arXiv ID and wants to start a reproduction project, or asks
+  to "reproduce <paper>", "set up reproduction", "paper2astra",
+  "/paper2astra <doi>", or hands you a published paper as a starting point
+  for ASTRA work.
 ---
 
 # paper2astra
@@ -36,11 +37,15 @@ paper2astra composes the rest of the lightcone-cli paper-reproduction bundle. Al
 | [`/constitution`](../constitution/SKILL.md) | INTERVIEW — drafting the per-paper reproduction constitution |
 | [`/ralph-loops`](../ralph-loops/SKILL.md) | After interview — launches the loop that drives all subsequent phases |
 | [`/narrative`](../narrative/SKILL.md) | SPECIFY — authoring the `narrative:` and `rationale:` prose in `astra.yaml` |
-| [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) | COMPARE — paper-vs-code TeX audit (Nolan's skill) |
-| [`/figure-comparison`](../figure-comparison/SKILL.md) | COMPARE — HTML side-by-side reference vs reproduced figures (Nolan's skill) |
 
 paper2astra does not re-implement what these skills already do — it tells the agent at each phase to invoke them.
 
+After paper2astra completes, recommend adjacent follow-up skills when useful:
+[`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) audits
+paper claims against code locations, and
+[`/figure-comparison`](../figure-comparison/SKILL.md) builds a portable
+side-by-side HTML report for paper artifacts versus reproduced results.
+
 ## Workflow
 
 ### Interview (interactive — once per project)
@@ -145,7 +150,6 @@ Workdir signals (file existence implies the phase has been done):
 - [`/ralph-loops`](../ralph-loops/SKILL.md) — for the loop that drives phases
 - [`/managing-bibliography`](../managing-bibliography/SKILL.md) — for ACQUIRE
 - [`/narrative`](../narrative/SKILL.md) — for SPECIFY
-- `/check-sentence-by-sentence`, `/figure-comparison` — for COMPARE (Nolan's skills; see Provenance)
 
 ## Discipline
 
@@ -165,4 +169,4 @@ Workdir signals (file existence implies the phase has been done):
 
 `paper2astra` is a fresh skill, but the phase prose ports 1:1 from the prompts in [`LightconeResearch/Paper2ASTRA/src/paper2astra/prompts/`](https://github.com/LightconeResearch/Paper2ASTRA/tree/main/src/paper2astra/prompts) (commit b3b54b5 and onward on `feat/skill-form-redesign`). The Paper2ASTRA Python package retires once this skill is in regular use; the repo persists as a reference for the original prompts and pipeline structure.
 
-The two compare-phase sibling skills (`check-sentence-by-sentence` and `figure-comparison`) originate from Nolan Koblischke's work on the [Reproductions](https://github.com/LightconeResearch/Reproductions) repo. They are credited in their own SKILL.md bodies; tag him post-publish so he can PR the canonical versions wherever they should ultimately live.
+The complementary skills (`check-sentence-by-sentence` and `figure-comparison`) originate from Nolan Koblischke.
diff --git a/claude/lightcone/skills/paper2astra/references/compare.md b/claude/lightcone/skills/paper2astra/references/compare.md
index ee00a4f3..1cb1a375 100644
--- a/claude/lightcone/skills/paper2astra/references/compare.md
+++ b/claude/lightcone/skills/paper2astra/references/compare.md
@@ -16,11 +16,6 @@ The constitution's per-phase mode is **always interactive** for this phase. Paus
 - `comparison-report.yaml` — structured verdict
 - `comparison-report.md` — human-readable summary
 
-## Sibling skills to invoke
-
-- **`/figure-comparison`** — HTML side-by-side reference vs reproduced figures, with structured judgment per panel. Invoke per figure target. (Nolan's skill; see `../figure-comparison/SKILL.md`.)
-- **`/check-sentence-by-sentence`** — paper-vs-code TeX audit. Use when SPECIFY's evidence quotes need re-verification against the source paper, particularly when COMPARE flags a result as `partial` and the cause may be a misinterpretation of paper text. (Nolan's skill; see `../check-sentence-by-sentence/SKILL.md`.)
-
 ## Result path convention
 
 For an output with `id: X`, the reproduced result lives at `results/<universe_id>/X.<ext>`:
@@ -40,7 +35,7 @@ For an output with `id: X`, the reproduced result lives at `results/<universe_id
 
 **Metrics.** Judge whether the reproduced value is scientifically equivalent to the expected value from `targets/targets.md`. Numerical tolerance comes from the target's stated precision; bare match is not the bar.
 
-**Figures.** Read the reference figure from `targets/` and compare to the reproduced image. Focus on shape / trend, axis ranges, key features (peaks, inflections, curve ordering), and magnitudes. **Do NOT require pixel-perfect matches** — stochastic methods produce variation. Judge whether the same scientific conclusion follows from both figures. **Use `/figure-comparison`** for HTML side-by-side rendering and structured per-panel judgment.
+**Figures.** Read the reference figure from `targets/` and compare to the reproduced image. Focus on shape / trend, axis ranges, key features (peaks, inflections, curve ordering), and magnitudes. **Do NOT require pixel-perfect matches** — stochastic methods produce variation. Judge whether the same scientific conclusion follows from both figures.
 
 **Tables.** Compare key values noted in `targets/targets.md` first, then remaining values. Reference tables are in `targets/`.
 
@@ -93,5 +88,4 @@ The verdict is the agent's judgment; the **decision to keep iterating** is the u
 ## Notes
 
 - **One COMPARE per IMPLEMENT.** Each IMPLEMENT retry produces a fresh COMPARE; the report's `attempt` field increments. Do not overwrite prior reports — keep them at `comparison-report-attempt-<N>.yaml` if useful, or commit each between iterations so git carries the history.
-- **The verdict is the agent's; the keep-iterating decision is the user's.** Treat them as separate.
-- **`/figure-comparison` is the trustworthy figure-judgment surface.** Direct image diffing without it tends to either over-fail (any pixel-level variation triggers a no-match) or over-pass (it sees that there are *some* shared features and rubber-stamps). The skill's structured prompt is the discipline.
+- **The verdict is the agent's; the keep-iterating decision is the user's.** Treat them as separate.
\ No newline at end of file
diff --git a/claude/lightcone/skills/paper2astra/references/interview.md b/claude/lightcone/skills/paper2astra/references/interview.md
index 8a7ca8f0..f45ea6ed 100644
--- a/claude/lightcone/skills/paper2astra/references/interview.md
+++ b/claude/lightcone/skills/paper2astra/references/interview.md
@@ -109,7 +109,6 @@ because compute too large for available targets>.
 - `/paper2astra` — this skill (the orchestrator)
 - `/managing-bibliography` — ACQUIRE
 - `/narrative` — SPECIFY
-- `/check-sentence-by-sentence`, `/figure-comparison` — COMPARE
 
 ## Evidence
 
diff --git a/claude/lightcone/skills/paper2astra/references/summarize_run.md b/claude/lightcone/skills/paper2astra/references/summarize_run.md
index f72ba4ee..efa22fe8 100644
--- a/claude/lightcone/skills/paper2astra/references/summarize_run.md
+++ b/claude/lightcone/skills/paper2astra/references/summarize_run.md
@@ -28,6 +28,10 @@ A single markdown file at the project root, ~1–2 pages. Sections:
 4. **Outputs** — pointers to the figures / tables / metrics produced. One bullet per primary target, with the path to the reproduced result.
 5. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut that's stricter than stated, etc.). This is where the reproduction's value to the broader literature gets recorded.
 6. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the constitution path, the relevant `astra.yaml`).
+7. **Optional follow-ups** — recommend adjacent audit/reporting skills when
+   useful: `/check-sentence-by-sentence` for auditing paper claims against
+   code locations, and `/figure-comparison` for a portable side-by-side HTML
+   report comparing paper artifacts with reproduced results.
 
 Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes; the summary is the door into them.
 

From 380b87dbf142813ef034c6b2324b5aa981c6f48c Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 6 May 2026 00:11:31 +0200
Subject: [PATCH 007/124] =?UTF-8?q?skills:=20tend=20the=20bundle=20?=
 =?UTF-8?q?=E2=80=94=20drop=20provenance,=20sharpen=20constitution=20skill?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Skills should describe what they do, not where the bundle picked them up
from. Drops the Provenance section across the bundled skills (constitution,
paper2astra, ralph-loops, managing-bibliography) and the Origin column in
the bundle README. Each SKILL.md now stands on its own.

constitution skill specifically:
- Renames references/constitute.md → constitution.md (drops verb-skill-title
  convention)
- Adds Reshape, don't accrete (principle) and Amendment scaffolding
  (anti-pattern) to both the SKILL body and the depth reference: when the
  desired state evolves, reshape the body, don't append "Round 2" sections
- Tightens description and trigger phrases (constitution-first, not
  ralph-first); keeps "constitute" / "ralph spec" as keywords for recall
- Workflow Step 4 names /ralph-loops and /shuttle as concrete runners; the
  runner is interchangeable, the constitution is what matters
- Generalizes "snakemake rule" example in the depth reference to a
  checklist/plan framing
- Drops the GitHub link to felt in crafting.md (cross-skill bookkeeping)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/README.md             | 31 ++++------
 claude/lightcone/skills/constitution/SKILL.md | 58 +++++++++----------
 .../{constitute.md => constitution.md}        | 11 ++--
 .../constitution/references/crafting.md       |  2 +-
 .../skills/managing-bibliography/SKILL.md     | 10 ----
 claude/lightcone/skills/paper2astra/SKILL.md  |  8 +--
 claude/lightcone/skills/ralph-loops/SKILL.md  | 16 -----
 7 files changed, 48 insertions(+), 88 deletions(-)
 rename claude/lightcone/skills/constitution/references/{constitute.md => constitution.md} (84%)

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 439337cd..9d835d16 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -16,27 +16,20 @@ Each subdirectory is one Claude Code skill: `SKILL.md` plus optional `references
 
 A self-contained toolkit for reproducing published papers in ASTRA. The bundle is co-located so a single `lc init` brings the full toolkit into a project — no plugin marketplace, no separate installs.
 
-| Skill | Role | Origin |
-|---|---|---|
-| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and launches a ralph loop against it. | New for the bundle. |
-| [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by paper2astra during SPECIFY. | Cail's ([lightcone-cli#86](https://github.com/LightconeResearch/lightcone-cli/pull/86), ported from lightcone-ui#10). |
-| [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. | Merged from [`cailmdaley/skills/skills/constitution`](https://github.com/cailmdaley/skills/tree/main/skills/constitution) (procedural backbone) + Cail's personal felt references (taste — two diamonds, six stances, funnel ledger, qualitative self-check), with felt-optional framing. |
-| [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Launched by paper2astra after the interview. | Direct copy from [`cailmdaley/skills/skills/ralph-loops`](https://github.com/cailmdaley/skills/tree/main/skills/ralph-loops). |
-| [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. Primary acquisition path for paper2astra's ACQUIRE phase. | Direct copy of Cail's personal `~/.claude/skills/managing-bibliography` (newer than the public version). |
-| `check-sentence-by-sentence` | Paper-vs-code TeX audit via sub-agents; locates `file:line` or `NOT FOUND`. Invoked by paper2astra during COMPARE. | Nolan Koblischke's, on his Reproductions-branch. **Not yet pushed publicly** — see "Pending bundle additions" below. |
-| `figure-comparison` | HTML side-by-side: original figures/tables/numerics vs replicated. Invoked by paper2astra during COMPARE. | Same — Nolan's, pending. |
+| Skill | Role |
+|---|---|
+| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and launches a ralph loop against it. |
+| [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by paper2astra during SPECIFY. |
+| [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. |
+| [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Launched by paper2astra after the interview. |
+| [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. Primary acquisition path for paper2astra's ACQUIRE phase. |
+| `check-sentence-by-sentence` | Paper-vs-code TeX audit via sub-agents; locates `file:line` or `NOT FOUND`. Invoked by paper2astra after SUMMARIZE_RUN as an opt-in audit. *(pending bundle integration)* |
+| `figure-comparison` | HTML side-by-side: original figures/tables/numerics vs replicated. Auto-invoked by paper2astra at SUMMARIZE_RUN. *(pending bundle integration)* |
 
 The full reproduction story spans these seven skills. paper2astra's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about paper2astra.
 
 ### Why bundle (not depend on plugin install)
 
-- **Testability.** We want to verify paper2astra invokes constitution + ralph-loops + the others correctly. That only works if all are in the same checkout.
-- **Single install path.** `lc init` is the install path for lightcone-cli skills. Adding a separate "also install Cail's public skills via plugin marketplace" step is friction we don't need.
-- **Copy-with-credit costs nothing.** The copied skills retain attribution to their original authors in the SKILL body; if those skills update upstream, we re-sync.
-- **Future consolidation is open.** Per Francois's "next week we improve" framing, the long-run shape might be `astra` ships skills in `astra`, `lc` ships skills in `lightcone-cli`, plus a centralized external-skills list. Today: bundle it all.
-
-### Pending bundle additions
-
-- **`check-sentence-by-sentence`** and **`figure-comparison`** — Nolan Koblischke's two skills. Per the bundle constitution ([`lightcone/.felt/lightcone/paper2astra-as-skill/skill-bundle`](https://github.com/LightconeResearch/lightcone/blob/main/.felt/lightcone/paper2astra-as-skill/skill-bundle.md)), these are part of the bundle, but at first cut they were not yet pushed to any public branch (only living on Nolan's local working tree on his Reproductions checkout). When Nolan pushes them, copy with attribution into this directory; paper2astra's SKILL.md and COMPARE reference already name them as expected siblings, so the integration is wire-compatible the moment they land.
-
-  Until then, COMPARE falls back to direct image-diff judgment without `/figure-comparison`'s structured per-panel rendering, and SPECIFY's evidence-quote re-verification (when COMPARE flags `partial`) falls back to manual Grep against `work/reference/document.md` without `/check-sentence-by-sentence`'s sub-agent audit. Both fallbacks are workable but lossier than the intended path.
+- **Testability.** We want to verify paper2astra invokes constitution + ralph-loops + the others correctly. That only works when all are in the same checkout.
+- **Single install path.** `lc init` brings the full toolkit. Adding a separate plugin-marketplace step is friction we don't need.
+- **Future consolidation is open.** The long-run shape may be `astra` ships skills in `astra`, `lc` ships skills in `lightcone-cli`, plus a centralized external-skills list. Today: bundle it all. See [[lightcone/skills-location-policy]].
diff --git a/claude/lightcone/skills/constitution/SKILL.md b/claude/lightcone/skills/constitution/SKILL.md
index 58384960..7fa24df3 100644
--- a/claude/lightcone/skills/constitution/SKILL.md
+++ b/claude/lightcone/skills/constitution/SKILL.md
@@ -1,21 +1,24 @@
 ---
 name: constitution
 description: >
-  Draft a constitution — a markdown spec describing a desired state for
-  autonomous iteration. Study the problem space, shape the spec
-  interactively (two-diamonds rhythm; six stances on demand), then hand
-  it to a runner — a ralph loop, a shuttle dispatch, or any other
+  Draft a constitution — a markdown document describing a desired state
+  for autonomous iteration. Study the problem space, shape the
+  constitution interactively (two-diamonds rhythm; six stances on
+  demand), then hand it to a runner — `/ralph-loops` for a tmux loop,
+  felt's `/shuttle` for fiber-tracked dispatch, or any other
   iteration-runner. Use for any work where adaptation matters more than
-  a fixed plan: science, refactoring, exploration, creative work.
-  Triggers: "constitution", "constitute", "ralph spec", "set up a ralph",
-  "create a ralph", "write a spec".
+  a fixed plan: science, refactoring, exploration, creative work,
+  research narratives.
+  Triggers: "constitution", "constitute", "draft a constitution",
+  "ralph spec", "set up a ralph", "shuttle this", "write a spec for
+  autonomous iteration".
 ---
 
 # Constitution
 
 A constitution is a design document with trust built in. Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It's designed to outlast any single agent or iteration and remain valid as the world changes around it. A good constitution never says "50 files remain" because that's a snapshot that goes stale; it says "check `grep -r 'old_pattern'`" because that's a principle that stays true until the work is done.
 
-Constitutions don't prescribe steps. They describe what the system looks like when it's right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what's highest value. In a ralph loop, each iteration does this with fresh context.
+Constitutions don't prescribe steps. They describe what the system looks like when it's right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what's highest value. Each iteration of the work does this with fresh context.
 
 This matters most in science and exploratory work, where each decision is informed by the result just before it. A plan assumes you know the path; a constitution trusts the agent to find it — with taste, judgment, and fresh eyes each time.
 
@@ -23,25 +26,25 @@ This matters most in science and exploratory work, where each decision is inform
 
 ## Workflow
 
-1. **Study** — Read relevant files, understand existing patterns. This informs the *spec*, not implementation. The goal is pointers that iterations will follow.
+1. **Study** — Read relevant files, understand existing patterns. This informs the *constitution*, not implementation. The goal is pointers that iterations will follow.
 
-2. **Draft** — Create a markdown spec file. The bundled template lives in the sibling `ralph-loops` skill:
+2. **Draft** — Create a markdown file for the constitution. The bundled template lives in the sibling `ralph-loops` skill:
    ```bash
-   cp ../ralph-loops/assets/spec.md my-spec.md
+   cp ../ralph-loops/assets/spec.md my-constitution.md
    ```
-   (or copy directly from `claude/lightcone/skills/ralph-loops/assets/spec.md` if you're outside a skill).
-   Fill in what you can — don't wait until it's perfect.
+   If felt is installed and you're working in a felt-tracked project, you can author the constitution as a fiber instead — `felt add <slug> "Constitution title" -s open -t constitution` — and runners that read fibers (felt-shuttle) will pick it up. Fill in what you can; don't wait until it's perfect.
 
 3. **Refine** — Show the draft, get feedback, revise. Use AskUserQuestion for structured choices. The two-diamonds rhythm and six stances in [`references/crafting.md`](references/crafting.md) help most when the user is deciding something non-trivial. Apply the qualitative ambiguity self-check before launching.
 
-4. **Launch** — When approved, hand the spec to whichever runner is appropriate. Common options:
+4. **Launch** — When approved, hand the constitution to whichever runner is appropriate. Common options:
 
-   - **`/ralph-loops`** — bundled, manual loop runner. Tmux session re-spawns iterations against the spec until status flips off open/active.
+   - **`/ralph-loops`** — bundled tmux loop runner. Re-spawns iterations against the constitution until the runner sees its done-conditions met.
      ```bash
-     ../ralph-loops/scripts/ralph my-spec.md [--backend claude|codex] [-- extra-flags...]
+     ../ralph-loops/scripts/ralph my-constitution.md [--backend claude|codex] [-- extra-flags...]
      ```
      Add `-- --chrome` for visual/frontend work. Session: `ralph-<spec-name>`. Attach: `tmux attach -t ralph-<spec-name>`.
-   - **External dispatchers** (e.g. shuttle, when felt is installed) — watch a fiber tree for dispatch-eligible blocks and spawn single-shot workers. Their configuration is owned outside this skill.
+   - **`/shuttle`** (felt-aware) — fiber-tracked dispatch. Reads the `shuttle:` block from the fiber's frontmatter and spawns single-shot workers across sessions; the kanban surfaces what's in flight.
+   - **Other dispatchers** — anything that reads a markdown spec or fiber and spawns iterations. Their configuration is owned outside this skill.
 
    The constitution stays editable while iteration runs; successive iterations re-read it each cycle, so refinements between iterations are normal.
 
@@ -59,7 +62,7 @@ File paths, existing patterns, architectural constraints. Things iterations
 need to *find* but not *achieve*.
 
 ## Skills
-Which skills to activate before working (e.g., /snakemake, /narrative).
+Which skills to activate before working.
 
 ## Evidence
 How to check progress — commands, test suites, grep patterns. Pointers to
@@ -70,7 +73,7 @@ Uncertainties the user should weigh in on. Iterations add to this; the user
 resolves between loops.
 ```
 
-For deeper reference on each section's voice and the discipline that keeps a constitution from drifting into a plan, see [`references/constitute.md`](references/constitute.md).
+For deeper reference on each section's voice and the discipline that keeps a constitution from drifting into a plan, see [`references/constitution.md`](references/constitution.md).
 
 ## Principles
 
@@ -78,6 +81,8 @@ For deeper reference on each section's voice and the discipline that keeps a con
 
 **Pointers, not snapshots.** "Check `grep -r 'old_pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done.
 
+**Reshape, don't accrete.** When the desired state evolves — testing surfaces a gap, a meeting changes the priority, a sibling decision lands — rewrite the affected sections so the body still reads as today's desired state. Don't tack on a "Round 2" section; don't add an "Amendments" appendix; don't keep the old framing alongside the new one as a sediment. A green-field constitution will change a lot as it matures. The chronology lives in the runner's history surface; the body lives in *now*.
+
 **Prefer existing systems.** Before designing anything new: can what's there handle this?
 
 **Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies.
@@ -103,24 +108,15 @@ Some constitutions don't build code — they shape artifacts like documentation,
 
 **Decision logs in the body.** "Resolved choices" / "Process notes" sections turn the constitution into a process journal. When a question gets answered, fold the answer into the narrative where it's contextually relevant — into Invariants, Desired State, Context — and let the runner's history surface (`felt history`, commits, etc.) carry the chronology.
 
+**Amendment scaffolding.** "Round 2", "v2 deltas", "Updates 2026-05-04 →". The same failure as a decision log, played out across edits: the body becomes a sediment of layered framings instead of the current desired state. When the desired state shifts, *reshape* the affected sections — rewrite headings, update prose, drop what no longer applies — so the document still reads as one coherent description of now.
+
 ---
 
 ## References
 
-- [`references/constitute.md`](references/constitute.md) — depth on
-  drafting voice, sections, and the felt-flavored crafting workflow.
-  Felt-optional: read past the felt-specific commands if felt isn't installed.
+- [`references/constitution.md`](references/constitution.md) — depth on drafting voice, sections, and the crafting workflow. Felt-aware where felt is installed; the procedural steps work without felt too.
 - [`references/crafting.md`](references/crafting.md) — two-diamonds
   rhythm, six stances, the funnel ledger, and the qualitative ambiguity
   self-check. Use this when the conversation has careful-thinking
   character — not every constitution drafting needs it, but the ones that
   do are the ones that benefit most.
-
-## Provenance
-
-Merged from two sources:
-
-- [`cailmdaley/skills/skills/constitution/`](https://github.com/cailmdaley/skills/tree/main/skills/constitution) (public, procedural, felt-agnostic) — provided the SKILL body backbone.
-- `~/.claude/skills/felt/references/{constitute,crafting}.md` (personal felt skill) — provided the depth references; felt-specific commands have been softened to felt-optional framing so this skill stands alone in lightcone-cli.
-
-Copied here for the paper-reproduction bundle so `/paper2astra` can invoke `/constitution` to draft the per-paper reproduction constitution during its interview phase. The merged shape may flow back upstream; re-sync as needed.
diff --git a/claude/lightcone/skills/constitution/references/constitute.md b/claude/lightcone/skills/constitution/references/constitution.md
similarity index 84%
rename from claude/lightcone/skills/constitution/references/constitute.md
rename to claude/lightcone/skills/constitution/references/constitution.md
index 198a3a18..09e63568 100644
--- a/claude/lightcone/skills/constitution/references/constitute.md
+++ b/claude/lightcone/skills/constitution/references/constitution.md
@@ -1,4 +1,4 @@
-# Constitute — depth reference
+# Constitution — depth reference
 
 Drafting a constitution. The SKILL body covers the procedural backbone (Study → Draft → Refine → Launch). This reference goes deeper on voice, sections, and the discipline that keeps a constitution from sliding into a plan.
 
@@ -20,14 +20,14 @@ Constitutions do not prescribe steps. They describe what the system looks like w
 
 ---
 
-## When to constitute
+## When to write a constitution
 
 - Work where adaptation matters more than a fixed plan: scientific investigation, exploratory refactoring, creative writing
 - The desired state is clear (or can be made clear) but the path is not
 - Iterations need to re-read with fresh context and make judgment calls
 - A checklist would either be wrong after one step or race through without judgment
 
-Do not constitute for: clearly-scoped atomic tasks, work that could be a snakemake rule, anything where a plan actually is the right shape.
+Don't write a constitution for: clearly-scoped atomic tasks, anything where a checklist or a plan is genuinely the right shape.
 
 ---
 
@@ -85,7 +85,7 @@ File paths, existing patterns, architectural constraints. Things iterations
 need to *find* but not *achieve*.
 
 ## Skills
-Which skills to activate before working (e.g., /snakemake, /narrative).
+Which skills to activate before working.
 
 ## Evidence
 How to check progress — commands, test suites, grep patterns. Pointers to
@@ -102,6 +102,8 @@ resolves between loops.
 
 **Pointers, not snapshots.** `check "grep -r 'old_pattern'"` not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done.
 
+**Reshape, don't accrete.** When the desired state evolves — testing surfaces a gap, a meeting changes the priority, a sibling decision lands — rewrite the affected sections so the body still reads as today's desired state. Don't tack on a "Round 2" section; don't add an "Amendments" appendix; don't keep the old framing alongside the new one as a sediment. A green-field constitution will change a lot as it matures, and a mature one will keep changing as reality does. The chronology lives in the runner's history surface (commits, `felt history` if felt is in use); the body lives in *now*.
+
 **Prefer existing systems.** Before designing anything new: can what is there handle this?
 
 **Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies.
@@ -128,6 +130,7 @@ Some constitutions do not build code — they shape artifacts like documentation
 - **Immutable seed.** Not our shape. The constitution is meant to be edited between iterations; do not treat it as frozen.
 - **Numerical convergence.** "Iteration stops when similarity ≥ 0.95" — wrong shape for science. Stop when the Evidence section says the desired state has been reached.
 - **Decision logs in the body.** "Resolved choices" / "Decisions made" / "Process notes" sections turn the constitution into a process journal. When a question gets answered (in conversation, via `AskUserQuestion`, in a review), fold the answer into the narrative where it is contextually relevant — into Invariants, Desired State, Context — and let the runner's chronological surface (commits, `felt history` if felt is in use) carry the chronology. The constitution describes *what is*, not *how we got here*; an "Open Questions" section that has been fully resolved should be deleted, not left as a victory log.
+- **Amendment scaffolding.** "Round 2", "v2 deltas", "Updates 2026-05-04 →", "Second round amendments". The same failure as a decision log, played out across edits: the body becomes a sediment of layered framings instead of the current desired state. When the desired state shifts, *reshape* the affected sections — rewrite headings, update prose, drop what no longer applies — so the document still reads as one coherent description of now. The story of how it got here is what `felt history append` (or commit messages, when felt isn't in use) and the outcome blurb are for.
 
 ---
 
diff --git a/claude/lightcone/skills/constitution/references/crafting.md b/claude/lightcone/skills/constitution/references/crafting.md
index 15f65b32..21595414 100644
--- a/claude/lightcone/skills/constitution/references/crafting.md
+++ b/claude/lightcone/skills/constitution/references/crafting.md
@@ -176,7 +176,7 @@ What comes out of the diamonds maps onto wherever you keep structured commitment
 | Sub-analysis scope | New sub-analysis in `astra.yaml`, or a new fiber with `inputs`/`outputs` |
 | Process-level lesson that generalizes | Edit to root CLAUDE.md / root fiber |
 
-If felt is installed, the [`felt:felt`](https://github.com/cailmdaley/felt) skill carries the tier ladder (Annotated → Formalized → Tempered) and the common frontmatter shapes. Without felt, the same shapes apply directly inline in `astra.yaml` or the constitution itself.
+The same shapes apply directly inline in `astra.yaml` or the constitution itself; no separate substrate is required.
 
 ---
 
diff --git a/claude/lightcone/skills/managing-bibliography/SKILL.md b/claude/lightcone/skills/managing-bibliography/SKILL.md
index c9143a46..1924693a 100644
--- a/claude/lightcone/skills/managing-bibliography/SKILL.md
+++ b/claude/lightcone/skills/managing-bibliography/SKILL.md
@@ -150,13 +150,3 @@ Adapt to your project structure:
 - Year is straightforward: `year = YYYY`
 - Before appending, verify file exists and has proper BibTeX format
 - Preserve existing entries when appending new ones
-
----
-
-## Provenance
-
-Originally maintained at `~/.claude/skills/managing-bibliography/SKILL.md`
-(Cail's personal version). Copied here so the lightcone-cli
-paper-reproduction bundle has the full toolkit available via `lc init` —
-without depending on a separate plugin install. The personal copy may be
-ahead; re-sync as needed.
diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/paper2astra/SKILL.md
index 5ea9fc58..e24a6f12 100644
--- a/claude/lightcone/skills/paper2astra/SKILL.md
+++ b/claude/lightcone/skills/paper2astra/SKILL.md
@@ -145,7 +145,7 @@ Workdir signals (file existence implies the phase has been done):
 - [`/ralph-loops`](../ralph-loops/SKILL.md) — for the loop that drives phases
 - [`/managing-bibliography`](../managing-bibliography/SKILL.md) — for ACQUIRE
 - [`/narrative`](../narrative/SKILL.md) — for SPECIFY
-- `/check-sentence-by-sentence`, `/figure-comparison` — for COMPARE (Nolan's skills; see Provenance)
+- `/check-sentence-by-sentence`, `/figure-comparison` — for COMPARE
 
 ## Discipline
 
@@ -160,9 +160,3 @@ Workdir signals (file existence implies the phase has been done):
 - **Re-implementing what astra already does.** If `astra validate` returns clean, do not write a separate validator. If `astra paper add` caches the PDF, do not write a separate cache.
 - **Treating Paper2ASTRA workdir as legacy.** It is not legacy — it is the substrate. The phase references inherit its conventions intentionally.
 - **Bundling everything into one ralph iteration.** Each iteration runs one or two phases, then exits. The constitution is realized across many iterations.
-
-## Provenance
-
-`paper2astra` is a fresh skill, but the phase prose ports 1:1 from the prompts in [`LightconeResearch/Paper2ASTRA/src/paper2astra/prompts/`](https://github.com/LightconeResearch/Paper2ASTRA/tree/main/src/paper2astra/prompts) (commit b3b54b5 and onward on `feat/skill-form-redesign`). The Paper2ASTRA Python package retires once this skill is in regular use; the repo persists as a reference for the original prompts and pipeline structure.
-
-The two compare-phase sibling skills (`check-sentence-by-sentence` and `figure-comparison`) originate from Nolan Koblischke's work on the [Reproductions](https://github.com/LightconeResearch/Reproductions) repo. They are credited in their own SKILL.md bodies; tag him post-publish so he can PR the canonical versions wherever they should ultimately live.
diff --git a/claude/lightcone/skills/ralph-loops/SKILL.md b/claude/lightcone/skills/ralph-loops/SKILL.md
index 7910e49b..2b011fc5 100644
--- a/claude/lightcone/skills/ralph-loops/SKILL.md
+++ b/claude/lightcone/skills/ralph-loops/SKILL.md
@@ -54,19 +54,3 @@ If you **cannot find any remaining work**, update the spec's YAML frontmatter to
 ---
 
 Pattern adapted from [Ralph Wiggum](https://ghuntley.com/ralph/).
-
----
-
-## Provenance
-
-Originally from [`cailmdaley/skills`](https://github.com/cailmdaley/skills/tree/main/skills/ralph-loops).
-Copied into the lightcone-cli paper-reproduction bundle so it can compose
-with `paper2astra`, `constitution`, and the rest of the bundle without a
-separate plugin install. The canonical version may be ahead; re-sync as
-needed.
-
-In the bundle, `/paper2astra` invokes `/constitution` to draft a per-paper
-reproduction constitution and then launches a ralph loop against it via
-`scripts/ralph`. Successive iterations of the loop survey the workdir and
-git history, execute the next phase, and exit cleanly — see the bundle
-README at `../README.md`.

From 4a003d9c8ddb4e31ac0d705755b2e27f6ffc8ffb Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 6 May 2026 00:20:26 +0200
Subject: [PATCH 008/124] skills/paper2astra: align SKILL.md with restructured
 constitution
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The constitution body was moved to an abstract desired-state level
(skill-bundle.md, 2026-05-06); this brings the SKILL.md and bundle
README in line with the new shape:

- Interview produces both a per-paper constitution AND a per-paper
  CLAUDE.md (the durable project memory; constitution is the runner's
  spec).
- Three runtime modes — interactive / bash-loop / tmux-orchestrated —
  picked during the interview from environment + preference. Tmux is
  preferred when available, never required; the interview probes with
  `command -v tmux`.
- Frugality vs rigor (weak vs strong) termination criterion,
  independent of mode.
- Code-as-canonical fidelity discipline: when work/reference/code/
  exists, the agent reads relevant code on every implementing
  iteration; code wins for numerics + method.
- <paper-slug>/open-questions.md as the running report for non-
  interactive seams the loop can't ratify; user resolves at session
  boundaries instead of blocking on AskUserQuestion mid-sub-agent.
- /figure-comparison auto-runs as a sub-agent at the end of
  SUMMARIZE_RUN; /check-sentence-by-sentence stays opt-in
  (token-expensive).

Bundle README's per-paper-skill table updated to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/README.md            |  8 +-
 claude/lightcone/skills/paper2astra/SKILL.md | 94 +++++++++++++++-----
 2 files changed, 75 insertions(+), 27 deletions(-)

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 9d835d16..a0587e33 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -18,13 +18,13 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 
 | Skill | Role |
 |---|---|
-| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and launches a ralph loop against it. |
+| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and per-paper `CLAUDE.md`, then launches one of three runtime modes (interactive, bash-loop, tmux-orchestrated) against the constitution. |
 | [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by paper2astra during SPECIFY. |
 | [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. |
-| [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Launched by paper2astra after the interview. |
+| [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Used by paper2astra's bash-loop and tmux-orchestrated runtime modes. |
 | [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. Primary acquisition path for paper2astra's ACQUIRE phase. |
-| `check-sentence-by-sentence` | Paper-vs-code TeX audit via sub-agents; locates `file:line` or `NOT FOUND`. Invoked by paper2astra after SUMMARIZE_RUN as an opt-in audit. *(pending bundle integration)* |
-| `figure-comparison` | HTML side-by-side: original figures/tables/numerics vs replicated. Auto-invoked by paper2astra at SUMMARIZE_RUN. *(pending bundle integration)* |
+| `figure-comparison` | HTML side-by-side: original figures/tables/numerics vs replicated. **Auto-invoked** by paper2astra as a sub-agent at the end of SUMMARIZE_RUN. *(pending bundle integration)* |
+| `check-sentence-by-sentence` | Paper-vs-code TeX audit via sub-agents; locates `file:line` or `NOT FOUND`. **Opt-in** suggestion to the user after SUMMARIZE_RUN — token-expensive, never auto-invoked. *(pending bundle integration)* |
 
 The full reproduction story spans these seven skills. paper2astra's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about paper2astra.
 
diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/paper2astra/SKILL.md
index e24a6f12..3437acd8 100644
--- a/claude/lightcone/skills/paper2astra/SKILL.md
+++ b/claude/lightcone/skills/paper2astra/SKILL.md
@@ -15,10 +15,12 @@ description: >
 
 # paper2astra
 
-Reproduce a published paper in ASTRA. The skill is **interview-first**: a short interactive crafting phase up front that produces a per-paper reproduction constitution. After the interview, paper2astra hands the constitution to a ralph loop that drives multi-session reproduction. Successive iterations of the loop survey the workdir, execute one or two phases, exit cleanly, and re-spawn with fresh context until the constitution is realized.
+Reproduce a published paper in ASTRA. The skill is **interview-first**: a short interactive crafting phase up front that produces both a **per-paper reproduction constitution** and a **per-paper `CLAUDE.md`**. After the interview, paper2astra hands the constitution to a multi-session loop that drives the reproduction. Successive iterations survey the workdir, execute one or two phases, exit cleanly, and re-spawn with fresh context until the constitution is realized.
 
 This is a Claude-Code-native skill. There is no Python orchestrator, no state machine, no resume mechanic — the workdir on disk + git history are the substrate.
 
+A reproduction does not fit in one context window. The loop is, in its simplest form, a way to split one goal across many context windows so each iteration starts uncluttered. That's the substrate, not an aesthetic.
+
 ## When to use this skill
 
 - The user has a paper (DOI, arXiv ID, or PDF) and wants to reproduce its analysis
@@ -34,12 +36,12 @@ paper2astra composes the rest of the lightcone-cli paper-reproduction bundle. Al
 |---|---|
 | [`/managing-bibliography`](../managing-bibliography/SKILL.md) | ACQUIRE — arXiv LaTeX source download (primary) and BibTeX caching |
 | [`/constitution`](../constitution/SKILL.md) | INTERVIEW — drafting the per-paper reproduction constitution |
-| [`/ralph-loops`](../ralph-loops/SKILL.md) | After interview — launches the loop that drives all subsequent phases |
+| [`/ralph-loops`](../ralph-loops/SKILL.md) | After interview — launches the loop that drives all subsequent phases (when the chosen runtime mode is one of the loop modes) |
 | [`/narrative`](../narrative/SKILL.md) | SPECIFY — authoring the `narrative:` and `rationale:` prose in `astra.yaml` |
-| [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) | COMPARE — paper-vs-code TeX audit (Nolan's skill) |
-| [`/figure-comparison`](../figure-comparison/SKILL.md) | COMPARE — HTML side-by-side reference vs reproduced figures (Nolan's skill) |
+| [`/figure-comparison`](../figure-comparison/SKILL.md) | SUMMARIZE_RUN — auto-invoked as a sub-agent at the end so the HTML side-by-side is ready for the user when the loop completes (Nolan's skill) |
+| [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) | After SUMMARIZE_RUN — opt-in suggestion to the user; token-expensive, so never auto-invoked (Nolan's skill) |
 
-paper2astra does not re-implement what these skills already do — it tells the agent at each phase to invoke them.
+paper2astra does not re-implement what these skills already do — it tells the agent at each phase to invoke them. The siblings stand alone; they don't know about paper2astra.
 
 ## Workflow
 
@@ -47,20 +49,41 @@ paper2astra does not re-implement what these skills already do — it tells the
 
 The interview is the only phase paper2astra runs interactively. Read [`references/interview.md`](references/interview.md) in full before starting.
 
-The interview has four jobs:
+The interview has six jobs:
 
 1. **Identify the paper** — DOI / arXiv ID / title; whether code is available; whether the user has prior experience with this paper.
 2. **Scope the reproduction** — full reproduction vs targeted (e.g. only the BAO fit), which figures/tables/numbers are the targets.
-3. **Choose interactive vs sub-agent per phase** — see "Per-phase mode" below. The defaults are reasonable; the user gets to flip any of them.
-4. **Draft the per-paper constitution** — invoke `/constitution`. The constitution lives at the project root (or wherever the user prefers). It captures the paper, the scope, the per-phase mode choices, and the evidence checks.
+3. **Pick a runtime mode** — interactive / bash-loop / tmux-orchestrated. See "Runtime modes" below.
+4. **Pick a termination criterion** — frugality (weak) vs rigor (strong). See "Frugality vs rigor" below.
+5. **Choose interactive vs sub-agent per phase** — see "Per-phase mode" below. The defaults are reasonable; the user gets to flip any of them.
+6. **Draft the per-paper constitution and CLAUDE.md** — invoke `/constitution` to draft the constitution. Author the per-paper `CLAUDE.md` from the same conversation: paper identity, user intent, what's known about the original codebase, runtime-mode choice, frugality-vs-rigor choice, the canonical-resolution rule (see "Code-as-canonical" below), conventions and warnings. The CLAUDE.md is the durable project memory every iteration's Claude session walks up to; the constitution is the runner's spec.
+
+Both files live inside the reproduction's directory. After they are approved the interview ends, and paper2astra launches whichever runtime the user chose.
+
+### Runtime modes
+
+The interview asks the user to pick *how* the loop runs. Three modes, picked from environment + preference:
+
+| Mode | What runs | Right when |
+|---|---|---|
+| **(1) Interactive** | No autonomous loop. The user prompts through phases by hand from the same Claude session, one or two phases at a time. | The user wants tight control, the paper is small, or the token budget is constrained. No new substrate beyond Claude itself. |
+| **(2) Bash-loop** | A plain shell loop the user pastes into a terminal (`while …; do claude --dangerously-skip-permissions … ; done`-shaped). No tmux dependency. | Tmux isn't available locally and the connection is stable. Fragile across SSH disconnects unless wrapped in `nohup` — and `nohup` blocks interaction, so for unstable connections this isn't really a fix; mode (3) is. |
+| **(3) Tmux-orchestrated** | A loop inside a tmux session paper2astra drives directly via `../ralph-loops/scripts/ralph`. Survives SSH disconnects; the skill sends keystrokes to the tmux pane, monitors, intervenes. | The smoothest path whenever tmux is available. Becomes the de-facto default once `lc launch claude` ships its registry-shipped python-slim agent container with tmux pre-installed. |
+
+The interview probes for tmux availability with `command -v tmux` and only offers mode (3) when present. Mode (3) is preferred when it's available; it isn't required.
+
+### Frugality vs rigor
+
+Independent of mode, the interview asks the user to pick the loop's termination criterion:
 
-After the constitution is approved, the interview ends. Launch the ralph loop:
+- **Weak (frugal):** "run until the checklist of tasks has been completed." Cheaper. Susceptible to one-shot oversights.
+- **Strong (rigorous):** "run until you can't find any further contributions, fixes, or improvements that align with the goal." Almost always catches mistakes the one-shot left behind, but burns more tokens.
 
-```bash
-../ralph-loops/scripts/ralph paper2astra-constitution.md
-```
+Strong is the default for fidelity-critical reproductions; weak is the default when the user explicitly wants to cap token spend. The choice goes into the per-paper CLAUDE.md and is honored by every iteration.
 
-Tell the user: *"Constitution drafted. Launching ralph loop in tmux session `ralph-paper2astra-constitution`. Each iteration will run one or two phases and exit; the next iteration picks up where it left off. Attach with `tmux attach -t ralph-paper2astra-constitution`."*
+### Where this is going
+
+Codex's `/goal` ([Simon Willison, 2026-04-30](https://simonwillison.net/2026/Apr/30/codex-goals/)) is the closest existing primitive — same shape, with a configurable token budget, made smooth by Codex's invisible compaction. When Anthropic ships an equivalent, modes (2) and (3) collapse into it. Until then, ralph is the substrate.
 
 ### Phases (driven by ralph iterations after the interview)
 
@@ -90,8 +113,8 @@ Defaults the constitution starts with:
 
 | Phase | Default | Why |
 |---|---|---|
-| ACQUIRE | sub-agent | Mostly mechanical; surfacing happens only on download failures. |
-| PARSE | sub-agent | Deterministic Docling / arXiv extraction. |
+| ACQUIRE | user choice | Mostly mechanical; surfacing happens only on download failures. |
+| PARSE | user choice | Deterministic Docling / arXiv extraction. |
 | SUMMARIZE | sub-agent | Parallel paper + code reading benefits from fresh context per task. |
 | EXTRACT_TARGETS | user choice | The selection of replication targets is sometimes obvious, sometimes wants user input. |
 | LITERATURE | sub-agent | One sub-agent per cited paper — pure parallel grunt-work. |
@@ -100,17 +123,34 @@ Defaults the constitution starts with:
 | IMPLEMENT | user choice | Mostly mechanical, but algorithm choices may want ratification. |
 | RUN | user choice | Mechanical, but failures need diagnosis. |
 | COMPARE | **interactive** | Verdict (was the reproduction close enough?) is the second mandatory user-ratification seam. |
-| SUMMARIZE_RUN | sub-agent | Final report; no decisions remain. |
+| SUMMARIZE_RUN | sub-agent | Final report; no decisions remain. `/figure-comparison` runs as a sub-agent inside this phase. |
+
+The constitution records the choice; iterations honor it. Sub-agent phases are spawned via the `Task` tool from inside the main loop session — that gives them fresh context but no user-reach. Interactive phases run inline in the loop session and may pause with `AskUserQuestion` at material seams.
+
+### Code-as-canonical
+
+When the original codebase is available at `work/reference/code/`, **the agent reads relevant code on every iteration when implementing**. Where paper and code disagree, the **code is canonical** for numerics, plotting, and method; the agent continues with the code's behavior and either asks the user to ratify the disagreement (interactive phases) or logs it (sub-agent / loop phases) so the user can resolve it at the next interactive seam.
+
+This is the load-bearing fidelity discipline. Without it, iterations drift to "looks right" rather than "matches" — the failure mode the first-paper test surfaced (plot styles off, numerical results off). The per-paper CLAUDE.md restates the rule so every iteration's Claude session walks up to it.
+
+### Interactive seams vs the loop's running report
+
+Interview phases use `AskUserQuestion` directly — the user is at the wheel. Once the loop launches, the human is no longer present per-iteration, so the agent's discipline shifts:
+
+- **Interactive phase** (per the per-phase mode table): ratify decisions with the user via `AskUserQuestion`.
+- **Loop / sub-agent phase**: when a question would normally surface to the user (paper-vs-code conflicts, figures whose intent isn't obvious, ambiguities the constitution doesn't resolve), **append it to `<paper-slug>/open-questions.md`** and continue with the best-judgment default. The user reads the report at session boundaries (between iterations, or when checking on the loop) and answers in place.
 
-The constitution records the choice; ralph iterations honor it. Sub-agent phases are spawned via the `Task` tool from inside the main loop session — that gives them fresh context but no user-reach. Interactive phases run inline in the loop session and may pause with `AskUserQuestion` at material seams.
+This matches what the user actually does: they stay in the conversation while the seams are still soft, walk away while the loop grinds, and come back to a list of "things you'd want to know" rather than a paused agent waiting on input. The CLAUDE.md records that the loop should always write open seams to `open-questions.md` rather than blocking.
 
 ### Material conflicts (the SPECIFY seam)
 
-Inside SPECIFY, when paper and code disagree on something material, do not silently pick one. Use `AskUserQuestion` to surface the conflict:
+Inside SPECIFY, when paper and code disagree on something material:
 
 - **Material** = a different choice would plausibly change a numeric result the paper reports.
 - **Stylistic / cosmetic / pure-tooling differences** are not material — record them in `implementation-notes.md` and move on.
-- **Default on user silence is paper.** If the user does not respond, take the paper's stated method as canonical and record the override (with reason) in a finding or insight.
+- **Code is canonical** for numerics and method per "Code-as-canonical" above.
+- **Interactive SPECIFY**: surface the conflict with `AskUserQuestion`. The user picks which option `universes/baseline.yaml` selects.
+- **Sub-agent SPECIFY** (rare; default is interactive): take code as canonical, record the conflict in `open-questions.md`, and preserve both options in `astra.yaml` so the user can flip baseline at the next interactive seam.
 
 Both choices land in `astra.yaml` as decision options. Whichever the user picks becomes the option selected by `universes/baseline.yaml`; the alternative is preserved as a sibling option for future universe runs. See `references/specify.md` for the full SPECIFY discipline.
 
@@ -142,21 +182,29 @@ Workdir signals (file existence implies the phase has been done):
 ## Skills (activate before working)
 
 - [`/constitution`](../constitution/SKILL.md) — for the interview's drafting phase
-- [`/ralph-loops`](../ralph-loops/SKILL.md) — for the loop that drives phases
+- [`/ralph-loops`](../ralph-loops/SKILL.md) — for the bash-loop and tmux-orchestrated runtime modes
 - [`/managing-bibliography`](../managing-bibliography/SKILL.md) — for ACQUIRE
 - [`/narrative`](../narrative/SKILL.md) — for SPECIFY
-- `/check-sentence-by-sentence`, `/figure-comparison` — for COMPARE
+- `/figure-comparison` — auto-invoked at end of SUMMARIZE_RUN (sub-agent)
+- `/check-sentence-by-sentence` — opt-in suggestion after SUMMARIZE_RUN
 
 ## Discipline
 
 - **paper2astra is the workflow story; phase references are the depth.** SKILL.md tells you when to read which reference; the references carry the prompt prose ported from the legacy Paper2ASTRA Python package.
+- **Workdir is the state.** No state machine, no resume mechanic — file existence + `git log` + `astra validate` answer "what phase am I on" deterministically. Each iteration's first move is *survey*.
+- **Deterministic checks live in scripts.** When the answer is yes/no, call the script — `astra validate`, `git log`, `yq`, `ls`. Don't ask the agent to introspect what a deterministic check would tell you.
 - **Use the up-to-date CLI surfaces, not skill-specific wrappers.** When `astra validate` already does the job, call it directly. Specifically: `astra validate <file>`, `astra validate --verify-evidence`, `astra paper add`. Use whatever the current `astra --help` surfaces.
+- **arXiv-LaTeX-first acquisition.** When the paper is on arXiv, the source tarball is the substrate; equations, ligatures, captions, and tables come through clean. PDF + Docling is the fallback for non-arXiv papers where there's no better source.
+- **The original code goes into `work/reference/code/`** during ACQUIRE when available, and stays there as the canonical reference for every subsequent iteration (see "Code-as-canonical" above).
+- **`/figure-comparison` auto-runs at SUMMARIZE_RUN; `/check-sentence-by-sentence` stays opt-in** — the latter is token-expensive (large fan-out of sub-agents).
 - **No synthetic data.** Unless the paper itself uses synthetic data as its input, every input dataset must be real (downloaded, queried, or fetched from a real archive). The implement phase reference repeats this; treat it as load-bearing.
+- **Tmux preferred-when-available, never required.** Modes (1) and (2) work without it.
+- **The siblings don't know about paper2astra.** Each SKILL stands on its own.
 - **Workdir conventions stay.** The phase references preserve Paper2ASTRA's workdir layout (`work/reference/`, `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`) so workdirs from the legacy Paper2ASTRA package are interoperable with workdirs driven by this skill.
 
 ## Anti-patterns
 
-- **Asking the user mid-sub-agent.** Sub-agent phases cannot reach the user. If the constitution puts SPECIFY in sub-agent mode and a material conflict surfaces, the sub-agent must record the conflict in a `decisions:` block (with both options preserved) and let the next interactive phase ratify it. Never make the sub-agent pick silently.
+- **Asking the user mid-sub-agent.** Sub-agent phases cannot reach the user. If a material conflict surfaces in a sub-agent phase, take the code's behavior (or paper's, if no code) as canonical, record the conflict in `open-questions.md` and as a `decisions:` block with both options preserved in `astra.yaml`, and let the next interactive phase ratify. Never make the sub-agent pick silently and discard the alternative.
 - **Re-implementing what astra already does.** If `astra validate` returns clean, do not write a separate validator. If `astra paper add` caches the PDF, do not write a separate cache.
 - **Treating Paper2ASTRA workdir as legacy.** It is not legacy — it is the substrate. The phase references inherit its conventions intentionally.
-- **Bundling everything into one ralph iteration.** Each iteration runs one or two phases, then exits. The constitution is realized across many iterations.
+- **Bundling everything into one iteration.** Each iteration runs one or two phases, then exits. The constitution is realized across many iterations.

From 0992daad1ccc701a0ae349b4291aa61b88f75c78 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 6 May 2026 00:24:43 +0200
Subject: [PATCH 009/124] skills/paper2astra: align phase references with
 restructured constitution
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Carries the abstract desired-state shape down into the phase-level
references — interview, specify, implement, compare, summarize_run —
so the artifacts on disk match the constitution's body.

- interview.md: produces both <paper-slug>/CLAUDE.md and
  <paper-slug>/<constitution>.md from the same conversation; covers the
  six interview jobs (paper / scope / runtime / termination / per-phase
  mode / draft); CLAUDE.md template added; constitution example block
  carries runtime + termination + code-as-canonical.

- specify.md: code-as-canonical default — when work/reference/code/
  exists and the user is silent, code wins (was paper). Sub-agent
  SPECIFY appends the conflict to open-questions.md instead of
  blocking.

- implement.md: read relevant code on every implementing iteration.
  Disagreements log to open-questions.md (sub-agent) or surface via
  AskUserQuestion (interactive). Names the failure mode the first-
  paper test surfaced.

- compare.md: COMPARE no longer invokes /figure-comparison or
  /check-sentence-by-sentence — those moved to SUMMARIZE_RUN
  (auto-rendered HTML) and post-SUMMARIZE_RUN (opt-in audit)
  respectively. Note explains why the rich artifact belongs with the
  finished reproduction, not with each verdict iteration.

- summarize_run.md: explicitly orchestrates the /figure-comparison
  sub-agent (auto) and surfaces /check-sentence-by-sentence as an
  opt-in suggestion. Outputs section now includes the HTML.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../skills/paper2astra/references/compare.md  |  11 +-
 .../paper2astra/references/implement.md       |   9 +-
 .../paper2astra/references/interview.md       | 161 ++++++++++++++----
 .../skills/paper2astra/references/specify.md  |  21 ++-
 .../paper2astra/references/summarize_run.md   |  15 +-
 5 files changed, 167 insertions(+), 50 deletions(-)

diff --git a/claude/lightcone/skills/paper2astra/references/compare.md b/claude/lightcone/skills/paper2astra/references/compare.md
index ee00a4f3..efda1ceb 100644
--- a/claude/lightcone/skills/paper2astra/references/compare.md
+++ b/claude/lightcone/skills/paper2astra/references/compare.md
@@ -16,10 +16,11 @@ The constitution's per-phase mode is **always interactive** for this phase. Paus
 - `comparison-report.yaml` — structured verdict
 - `comparison-report.md` — human-readable summary
 
-## Sibling skills to invoke
+## Sibling skills
 
-- **`/figure-comparison`** — HTML side-by-side reference vs reproduced figures, with structured judgment per panel. Invoke per figure target. (Nolan's skill; see `../figure-comparison/SKILL.md`.)
-- **`/check-sentence-by-sentence`** — paper-vs-code TeX audit. Use when SPECIFY's evidence quotes need re-verification against the source paper, particularly when COMPARE flags a result as `partial` and the cause may be a misinterpretation of paper text. (Nolan's skill; see `../check-sentence-by-sentence/SKILL.md`.)
+COMPARE itself does **not** invoke `/figure-comparison` or `/check-sentence-by-sentence`. Those run at SUMMARIZE_RUN (auto-invoked sub-agent) and post-SUMMARIZE_RUN (opt-in suggestion to the user) respectively. COMPARE produces the structured verdict; the rich side-by-side artifacts are downstream.
+
+If you find yourself wanting a fast visual sanity check of a single figure mid-COMPARE — to decide whether to retry IMPLEMENT — read the reference image and the reproduced image directly. The full HTML side-by-side waits for SUMMARIZE_RUN.
 
 ## Result path convention
 
@@ -40,7 +41,7 @@ For an output with `id: X`, the reproduced result lives at `results/<universe_id
 
 **Metrics.** Judge whether the reproduced value is scientifically equivalent to the expected value from `targets/targets.md`. Numerical tolerance comes from the target's stated precision; bare match is not the bar.
 
-**Figures.** Read the reference figure from `targets/` and compare to the reproduced image. Focus on shape / trend, axis ranges, key features (peaks, inflections, curve ordering), and magnitudes. **Do NOT require pixel-perfect matches** — stochastic methods produce variation. Judge whether the same scientific conclusion follows from both figures. **Use `/figure-comparison`** for HTML side-by-side rendering and structured per-panel judgment.
+**Figures.** Read the reference figure from `targets/` and compare to the reproduced image. Focus on shape / trend, axis ranges, key features (peaks, inflections, curve ordering), and magnitudes. **Do NOT require pixel-perfect matches** — stochastic methods produce variation. Judge whether the same scientific conclusion follows from both figures. (HTML side-by-side rendering of every figure happens at SUMMARIZE_RUN; here you're judging match status, not authoring the comparison artifact.)
 
 **Tables.** Compare key values noted in `targets/targets.md` first, then remaining values. Reference tables are in `targets/`.
 
@@ -94,4 +95,4 @@ The verdict is the agent's judgment; the **decision to keep iterating** is the u
 
 - **One COMPARE per IMPLEMENT.** Each IMPLEMENT retry produces a fresh COMPARE; the report's `attempt` field increments. Do not overwrite prior reports — keep them at `comparison-report-attempt-<N>.yaml` if useful, or commit each between iterations so git carries the history.
 - **The verdict is the agent's; the keep-iterating decision is the user's.** Treat them as separate.
-- **`/figure-comparison` is the trustworthy figure-judgment surface.** Direct image diffing without it tends to either over-fail (any pixel-level variation triggers a no-match) or over-pass (it sees that there are *some* shared features and rubber-stamps). The skill's structured prompt is the discipline.
+- **`/figure-comparison` belongs at SUMMARIZE_RUN.** It's the trustworthy figure-judgment surface — direct image diffing without it tends to either over-fail (any pixel-level variation triggers a no-match) or over-pass (it sees that there are *some* shared features and rubber-stamps). But its rich HTML output is for the user reviewing the *finished* reproduction, not for COMPARE-loop verdicts. Producing it once per COMPARE iteration is wasteful and clutters the workdir; producing it once at SUMMARIZE_RUN gives the user one canonical artifact to inspect.
diff --git a/claude/lightcone/skills/paper2astra/references/implement.md b/claude/lightcone/skills/paper2astra/references/implement.md
index 153c6c9d..a2abe684 100644
--- a/claude/lightcone/skills/paper2astra/references/implement.md
+++ b/claude/lightcone/skills/paper2astra/references/implement.md
@@ -9,7 +9,7 @@ The constitution's per-phase mode is **user choice** for this phase — defaults
 - `astra.yaml` — the structural spec
 - `implementation-notes.md` — tricky algorithms, numerical gotchas, data-format quirks
 - `work/notes/methodology.md` — for context when the spec compresses
-- `work/reference/code/` (if present) — reference code; **read for ambiguity resolution, do not copy verbatim**
+- `work/reference/code/` (if present) — **canonical reference. Read on every iteration when implementing.** Where paper and code disagree, code wins for numerics, plotting, and method.
 
 ## Outputs
 
@@ -21,7 +21,12 @@ The constitution's per-phase mode is **user choice** for this phase — defaults
 
 Read `astra.yaml` and `implementation-notes.md`. Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml`.
 
-If `work/reference/code/` exists, **use it as a reference to resolve ambiguities** — but write clean scripts following ASTRA conventions, not verbatim copies of the reference code.
+If `work/reference/code/` exists, **read the relevant code on every iteration** — not just to resolve ambiguities but as the canonical source of truth for numerics + method. Write clean scripts following ASTRA conventions (not verbatim copies), but treat the code's behavior as authoritative when it disagrees with the paper. When you encounter a paper-vs-code disagreement that the running spec hasn't resolved:
+
+- **Interactive IMPLEMENT** (rare; usually sub-agent): surface via `AskUserQuestion`.
+- **Sub-agent IMPLEMENT** (default): continue with the code's behavior, append the disagreement to `<paper-slug>/open-questions.md`, and note it in `implementation-notes.md` so the next interactive seam can ratify or override.
+
+Without this discipline, iterations drift to "looks right" rather than "matches" — the failure mode the first-paper test surfaced.
 
 ## Data: REAL DATA ONLY
 
diff --git a/claude/lightcone/skills/paper2astra/references/interview.md b/claude/lightcone/skills/paper2astra/references/interview.md
index 8a7ca8f0..14165c4a 100644
--- a/claude/lightcone/skills/paper2astra/references/interview.md
+++ b/claude/lightcone/skills/paper2astra/references/interview.md
@@ -1,33 +1,38 @@
-# Interview — drafting the per-paper reproduction constitution
+# Interview — drafting the per-paper reproduction constitution and CLAUDE.md
 
-The interview is the only phase paper2astra runs interactively. It happens once per project, up front, before any ralph loop is launched. Its job is to crystallize what the user actually wants — which paper, what scope, which seams want their attention, which they want delegated — and bake that into a constitution the ralph loop can drive.
+The interview is the only phase paper2astra runs interactively. It happens once per project, up front, before any loop is launched. Its job is to crystallize what the user actually wants — which paper, what scope, which runtime, which seams want their attention, which they want delegated — and bake that into the artifacts every iteration walks up to.
 
-Use the [`/constitution`](../../constitution/SKILL.md) skill to draft. The interview's job is to *gather* the inputs the constitution needs; the constitution skill carries the discipline of writing it.
+Use the [`/constitution`](../../constitution/SKILL.md) skill to draft the constitution. The interview's job is to *gather* the inputs both the constitution and the per-paper `CLAUDE.md` need; the constitution skill carries the discipline of writing the constitution.
 
 ---
 
 ## What the interview produces
 
-A single markdown file at the project root — by convention `paper2astra-constitution.md` (or whatever name the user prefers). Its YAML frontmatter has `status: open`. Its body has the standard constitution sections: Desired State, Context, Skills, Evidence, Open Questions — populated for *this specific paper*.
+The interview produces a **directory for the reproduction** containing two markdown files:
 
-After the interview, paper2astra hands this file to ralph:
+- **`<paper-slug>/CLAUDE.md`** — the per-reproduction project memory. Captures everything that's useful across phases: the paper's identity (DOI / arXiv ID / authors / one-line subject), the user's stated intent and constraints, what's known about the original codebase, runtime-mode choice, frugality-vs-rigor choice, the canonical-resolution rule (code-as-canonical when `work/reference/code/` exists), any user-supplied conventions or warnings. Every Claude session in this directory finds it on walk-up; iterations don't re-derive context.
+- **`<paper-slug>/<constitution>.md`** — the per-paper constitution. Pointers (not snapshots) for the runner: desired state, evidence checks, scope fence, per-phase mode table. The runner re-reads it each iteration.
 
-```bash
-../ralph-loops/scripts/ralph paper2astra-constitution.md
-```
+Both are written at the end of the interview from the same conversation; the CLAUDE.md is the durable context, the constitution is the runner's spec. After they are approved, paper2astra launches whichever runtime the user chose:
+
+| Runtime | Launch |
+|---|---|
+| **(1) Interactive** | No launch. The user prompts through phases by hand from this Claude session. |
+| **(2) Bash-loop** | Show the user the loop snippet to paste into a terminal — `while …; do claude --dangerously-skip-permissions … ; done`-shaped. |
+| **(3) Tmux-orchestrated** | `../ralph-loops/scripts/ralph <constitution>.md` — paper2astra drives the tmux session directly. |
 
-The constitution is the durable artifact; the interview's work product *is* the constitution. There is no separate "interview state" file.
+There is no separate "interview state" file. Everything lives in the two artifacts and the workdir.
 
 ---
 
-## The four jobs
+## The six jobs
 
 ### 1. Identify the paper
 
 Use `AskUserQuestion` if the user did not supply enough on `/paper2astra` invocation:
 
 - **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).
-- **Code repo URL** if the user knows it. (If not, ACQUIRE will search.)
+- **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) **If code is available, every implementing iteration will read from `work/reference/code/`** and treat code as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md).
 - **User's prior familiarity.** Has the user reproduced this paper before? Read the paper recently? Worked with the original authors? This affects how much of the SUMMARIZE / EXTRACT_TARGETS work needs human ratification.
 - **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; SUMMARIZE will read it.
 
@@ -43,17 +48,42 @@ Ask:
 
 These answers live in the constitution's **Desired State** section.
 
-### 3. Choose interactive vs sub-agent per phase
+### 3. Pick a runtime mode
+
+Probe for tmux first:
+
+```bash
+command -v tmux
+```
+
+Offer the modes the environment supports:
+
+- **(1) Interactive** — no autonomous loop; the user prompts through phases by hand from this Claude session. Right when control is tight, the paper is small, or the token budget is constrained.
+- **(2) Bash-loop** — a plain shell loop the user pastes into a terminal. No tmux dependency. Right when tmux isn't available *and* the connection is stable. Fragile across SSH disconnects unless wrapped in `nohup`, and `nohup` blocks interaction — so for unstable connections, mode (3) is the answer, not this.
+- **(3) Tmux-orchestrated** — paper2astra drives a tmux session directly via `../ralph-loops/scripts/ralph`. Survives SSH disconnects; the skill sends keystrokes to the pane, monitors, intervenes. Preferred when tmux is available.
+
+If tmux isn't installed, only (1) and (2) appear in the question. The chosen mode goes into the per-paper CLAUDE.md.
+
+### 4. Pick a termination criterion (frugality vs rigor)
+
+Ask:
+
+- **Weak (frugal):** "run until the checklist of tasks has been completed." Cheaper. Susceptible to one-shot oversights.
+- **Strong (rigorous):** "run until you can't find any further contributions, fixes, or improvements that align with the goal." Almost always catches mistakes the one-shot left behind, but burns more tokens.
+
+Default to strong for fidelity-critical reproductions; weak when the user wants to cap token spend. The choice goes into the per-paper CLAUDE.md.
+
+### 5. Choose interactive vs sub-agent per phase
 
 Read the "Per-phase mode" table in `../SKILL.md`. The defaults are reasonable. Walk the user through it briefly:
 
 - **Phases that are always interactive (defaults you should not flip):** SPECIFY, COMPARE. These are the ratification seams; the user has to be reachable.
-- **Phases that are always sub-agent (defaults you should not flip):** SUMMARIZE, LITERATURE. These benefit from parallel fresh-context runs.
-- **Phases the user chooses:** ACQUIRE, PARSE, EXTRACT_TARGETS, REVIEW, IMPLEMENT, RUN. These default to sub-agent (mostly mechanical) but may want user attention if the paper is unfamiliar or the user has strong opinions about implementation.
+- **Phases that are always sub-agent (defaults you should not flip):** SUMMARIZE, LITERATURE, SUMMARIZE_RUN. These benefit from parallel fresh-context runs and have no decisions left.
+- **Phases the user chooses:** ACQUIRE, PARSE, EXTRACT_TARGETS, REVIEW, IMPLEMENT, RUN. These may want user attention if the paper is unfamiliar or the user has strong opinions about implementation.
 
-If the user has no opinion, take the defaults. The choice goes into the constitution's **Context** section as a per-phase mode table.
+If the user has no opinion, take the defaults. The choice goes into the constitution's **Context** section as a per-phase mode table. Phases marked sub-agent that hit a question they'd normally surface to the user **append the question to `<paper-slug>/open-questions.md`** rather than blocking; the user reads the running report at session boundaries.
 
-### 4. Draft the constitution
+### 6. Draft the constitution and CLAUDE.md
 
 Invoke `/constitution`. Pass in:
 
@@ -86,14 +116,16 @@ because compute too large for available targets>.
 - Paper DOI: <doi>
 - arXiv ID: <id>; LaTeX source acquisition path is the primary
 - Code repo: <url> (or "to be searched in ACQUIRE")
+- Runtime mode: <(1) interactive | (2) bash-loop | (3) tmux-orchestrated>
+- Termination: <weak | strong>
 - Workdir layout: standard Paper2ASTRA conventions —
   `work/reference/`, `work/notes/`, `targets/`, `astra.yaml`,
   `universes/`, `results/`
-- Per-phase mode:
+- Per-phase mode (the canonical version lives in CLAUDE.md):
   | Phase | Mode |
   |---|---|
-  | ACQUIRE | sub-agent |
-  | PARSE | sub-agent |
+  | ACQUIRE | <per user> |
+  | PARSE | <per user> |
   | SUMMARIZE | sub-agent |
   | EXTRACT_TARGETS | <per user> |
   | LITERATURE | sub-agent |
@@ -109,16 +141,28 @@ because compute too large for available targets>.
 - `/paper2astra` — this skill (the orchestrator)
 - `/managing-bibliography` — ACQUIRE
 - `/narrative` — SPECIFY
-- `/check-sentence-by-sentence`, `/figure-comparison` — COMPARE
+- `/figure-comparison` — auto-invoked at end of SUMMARIZE_RUN
+- `/check-sentence-by-sentence` — opt-in suggestion after SUMMARIZE_RUN
+
+## Code-as-canonical
+
+When `work/reference/code/` exists, the agent reads relevant
+code on every implementing iteration. Where paper and code
+disagree, **code is canonical** for numerics, plotting, and
+method. Disagreements are logged in
+`<paper-slug>/open-questions.md` (sub-agent / loop phases) or
+ratified with the user via AskUserQuestion (interactive phases).
 
 ## Evidence
 
 - `ls work/reference/document.md` — ACQUIRE + PARSE done
+- `ls work/reference/code/` — original code present (canonical reference)
 - `ls work/notes/methodology.md` — SUMMARIZE done
 - `ls targets/targets.md` — EXTRACT_TARGETS done
 - `ls astra.yaml && astra validate astra.yaml` — SPECIFY done and valid
 - `astra validate astra.yaml --verify-evidence` — evidence quotes match source PDFs
 - `ls comparison-report.yaml && yq '.verdict' comparison-report.yaml` — most-recent COMPARE verdict
+- `ls figure-comparison.html` — auto-rendered side-by-side at SUMMARIZE_RUN
 - `git log --oneline` — chronological view of phase commits
 
 The COMPARE → IMPLEMENT loop iterates until verdict is `pass` or
@@ -126,26 +170,77 @@ attempt budget (default 5) is exhausted.
 
 ## Open Questions
 
-(empty — populated as the loop runs and surfaces material conflicts
-the user must ratify)
+(empty — populated as the loop runs; questions accrete in
+`<paper-slug>/open-questions.md`, the running report the user
+reads at session boundaries.)
+```
+
+Then author the per-paper `<paper-slug>/CLAUDE.md` from the same conversation. Approximate shape:
+
+```markdown
+# <paper-slug> reproduction
+
+Reproduce <paper title> (<arXiv ID>). DOI: <doi>.
+
+## Identity
+
+- Authors: <list>
+- One-line subject: <e.g. "BAO scale measurement from DESI DR1">
+- Code repo: <url> (cloned to `work/reference/code/` during ACQUIRE)
+
+## User intent and constraints
+
+<paste the scope summary the user gave during the interview>
+
+## Runtime mode: <1 / 2 / 3>
+
+<one paragraph on what that means for this project>
+
+## Termination criterion: <weak / strong>
+
+<one paragraph on what that means for this project>
+
+## Canonical-resolution rule
+
+When `work/reference/code/` exists, code is canonical for numerics + method.
+Every implementing iteration reads relevant code; disagreements between paper
+and code go into `open-questions.md` (loop / sub-agent phases) or surface via
+AskUserQuestion (interactive phases). The user resolves at the next interactive
+seam.
+
+## Per-phase mode
+
+(reproduce the per-phase mode table from the constitution)
+
+## Conventions and warnings
+
+- Workdir layout follows Paper2ASTRA conventions: `work/reference/`,
+  `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`.
+- `arxiv-LaTeX-first` acquisition; PDF + Docling fallback only when
+  the paper isn't on arXiv.
+- `astra validate --verify-evidence` is the fidelity gate.
+- Open questions accumulate in `open-questions.md`; the user reads
+  it between iterations.
+- <any user-supplied warnings>
 ```
 
-Show the draft, take corrections, refine. When the user is happy:
+Show both drafts, take corrections, refine. When the user is happy:
 
-- Save the constitution at the project root
-- Tell the user how to launch the loop: `../ralph-loops/scripts/ralph paper2astra-constitution.md`
-- Optionally launch it for them if they say yes
+- Save both files inside the reproduction's directory.
+- For mode (3), optionally launch ralph: `../ralph-loops/scripts/ralph <constitution>.md`.
+- For mode (2), show the user the bash-loop snippet to paste.
+- For mode (1), tell the user the interview is done and they can prompt through phases from this session.
 
-The interview ends here. Subsequent work happens inside ralph iterations.
+The interview ends here. Subsequent work happens inside iterations (modes 2 and 3) or in the same session (mode 1).
 
 ---
 
 ## Discipline
 
-- **The interview is short.** Do not turn it into a full paper-summarization session. The user does not need to teach you the paper — they need to tell you what they want reproduced. Three to five `AskUserQuestion` rounds, total. If the user is grinding through detail, gently steer back to scope.
-- **The constitution is the work product.** Do not file separate "interview notes" or "scope document" files. Everything goes into the constitution.
-- **The defaults are the path.** When the user says "I don't know, you choose," take the defaults from the per-phase mode table. The defaults reflect what the loops have learned about which seams matter.
-- **One paper at a time.** A single constitution covers one paper. If the user wants two, run the interview twice — two constitutions, two ralph loops, two project workdirs.
+- **The interview is short.** Do not turn it into a full paper-summarization session. The user does not need to teach you the paper — they need to tell you what they want reproduced. Three to six `AskUserQuestion` rounds, total. If the user is grinding through detail, gently steer back to scope.
+- **The constitution and CLAUDE.md are the work products.** Do not file separate "interview notes" or "scope document" files. Everything goes into one of those two artifacts. CLAUDE.md is durable project memory; constitution is the runner's spec.
+- **The defaults are the path.** When the user says "I don't know, you choose," take the defaults: runtime mode (3) when tmux is available, otherwise (2) on a stable connection or (1) on an unstable one; strong termination for fidelity-critical work; and the per-phase mode table from `../SKILL.md`. The defaults reflect what the loops have learned about which seams matter.
+- **One paper at a time.** A single constitution covers one paper. If the user wants two, run the interview twice — two reproduction directories, two CLAUDE.mds, two constitutions.
 
 ---
 
@@ -156,5 +251,7 @@ Most failure modes resolve into "the user has not yet decided what 'reproduce' m
 - *"If we ran this and it produced figure 3 plus the headline number in Table 2, would you be done?"* — pins targeted vs full.
 - *"Is there a specific decision in the paper you want to vary, or are we trying to match the paper exactly?"* — pins whether universes need to span alternatives.
 - *"Do you want to look at every paper-vs-code conflict, or just the ones I think are material?"* — pins SPECIFY mode.
+- *"Do you want a quick run that stops at the checklist, or a thorough one that keeps looking for fixes?"* — pins frugality vs rigor.
+- *"Are you running this somewhere with a stable connection, or do you want it to survive disconnects?"* — pins runtime mode (when tmux is available).
 
-When all three answer cleanly, the constitution writes itself.
+When these answer cleanly, the constitution and CLAUDE.md write themselves.
diff --git a/claude/lightcone/skills/paper2astra/references/specify.md b/claude/lightcone/skills/paper2astra/references/specify.md
index 655cd575..0eafea0a 100644
--- a/claude/lightcone/skills/paper2astra/references/specify.md
+++ b/claude/lightcone/skills/paper2astra/references/specify.md
@@ -56,17 +56,22 @@ When `methodology.md` or `code-analysis.md` mentions a paper-vs-code disagreemen
 - **Material**: a different choice would plausibly change a numeric result the paper reports.
 - **Stylistic / cosmetic / pure-tooling**: not material — record in `implementation-notes.md` and move on.
 
-For **material** conflicts, the SPECIFY phase pauses and surfaces the conflict to the user via `AskUserQuestion`. Present:
+For **material** conflicts, behaviour depends on whether SPECIFY is running interactively:
 
-- The paper's stated method (with quote / section reference)
-- The code's actual method (with file / line reference)
-- The plausible impact ("changes the BAO peak amplitude by ~5%")
-- Three options: paper, code, *something else* (custom, with the user's choice spelled out)
+- **Interactive SPECIFY** (default): pause and surface via `AskUserQuestion`. Present:
+  - The paper's stated method (with quote / section reference)
+  - The code's actual method (with file / line reference)
+  - The plausible impact ("changes the BAO peak amplitude by ~5%")
+  - Three options: paper, code, *something else* (custom, with the user's choice spelled out)
 
-**Default on user silence is paper.** If the AskUserQuestion times out or the user declines to choose, the universe selects the paper's method. The override (paper-vs-code conflict, what was selected, why) is preserved in `astra.yaml` as:
+- **Sub-agent SPECIFY** (rare; the constitution lists this only when the user explicitly chose it): take **code as canonical** per the canonical-resolution rule, append the conflict to `<paper-slug>/open-questions.md` so the user sees it at the next session boundary, and let `universes/baseline.yaml` select the code's method. The user can flip the baseline at the next interactive seam.
+
+**Default on user silence in interactive SPECIFY is code when `work/reference/code/` exists, otherwise paper.** This is the canonical-resolution rule: where paper and code disagree, code wins for numerics + method. (Older versions of this skill defaulted to paper; the new default reflects what the first-paper test surfaced — the code is what produced the published numbers.)
+
+Either way, the override is preserved in `astra.yaml` as:
 
 - A `decisions:` entry with both options preserved
-- The `universes/baseline.yaml` selecting whichever option the user chose
+- The `universes/baseline.yaml` selecting whichever option won (chosen by the user, or canonical default on silence)
 - A finding (or an insight if the conflict matters for replication discipline broadly) that records the conflict with quote / line evidence
 
 This makes the override surface in any later review of the spec — *"the paper says X, the code does Y, the user chose Z, here's why."* The fidelity-of-prose side of this (voice seams, hedge preservation, evidence-quote verification) is the `/narrative` skill's job.
@@ -101,5 +106,5 @@ When sub-analyses exist, the root narrative MUST include a top-down end-to-end d
 
 ## Notes
 
-- **Material conflicts that the user explicitly defers** become `Open Questions` in the constitution. The next iteration sees them and either re-surfaces them or notes their continued deferral.
+- **Material conflicts that the user explicitly defers** are appended to `<paper-slug>/open-questions.md` (the running report read at session boundaries). The next iteration sees them and either re-surfaces them or notes their continued deferral.
 - **The narrative skill is the prose author, not the structure author.** SPECIFY's job is structural correctness; `/narrative` invocation comes after the structural skeleton exists.
diff --git a/claude/lightcone/skills/paper2astra/references/summarize_run.md b/claude/lightcone/skills/paper2astra/references/summarize_run.md
index f72ba4ee..45a61d68 100644
--- a/claude/lightcone/skills/paper2astra/references/summarize_run.md
+++ b/claude/lightcone/skills/paper2astra/references/summarize_run.md
@@ -1,6 +1,6 @@
-# SUMMARIZE_RUN — final report and constitution outcome
+# SUMMARIZE_RUN — final report, figure-comparison HTML, and constitution outcome
 
-The reproduction has converged (verdict `pass` or user-accepted `partial`). Write the final summary, update the constitution's outcome, and prepare the workdir for handoff.
+The reproduction has converged (verdict `pass` or user-accepted `partial`). Write the final summary, auto-render the figure-comparison HTML so the user sees side-by-sides on arrival, update the constitution's outcome, and prepare the workdir for handoff.
 
 The constitution's per-phase mode is **always sub-agent**. There are no decisions left; this is reportage.
 
@@ -15,9 +15,18 @@ The constitution's per-phase mode is **always sub-agent**. There are no decision
 ## Outputs
 
 - `REPRODUCTION-SUMMARY.md` (or whatever name fits the project) — final report; concise.
+- `figure-comparison.html` (or whatever name `/figure-comparison` produces) — auto-rendered side-by-side: original vs reproduced figures, tables, numerics. Spawned as a sub-agent so SUMMARIZE_RUN itself stays small.
 - Updated `outcome:` on the constitution.
 - A final commit on the reproduction branch with a clear message.
 
+## Sub-agent invocations
+
+This phase involves two sub-agent skills: the first is always auto-invoked via the `Task` tool in a fresh context; the second is opt-in and only suggested to the user:
+
+1. **`/figure-comparison`** — produces the HTML side-by-side. Always run; the user expects it on arrival. Read its SKILL.md (Nolan's skill) for what to pass it; at minimum, the path to the reproduction workdir so it can find originals (`work/reference/figures/`, `work/reference/tables/`) and reproduced outputs (`results/<universe>/`).
+
+2. **`/check-sentence-by-sentence`** is **opt-in** — never auto-invoked here. After the report is written, the iteration's exit message surfaces it as a suggestion to the user: *"Want a paper-vs-code TeX audit? `/check-sentence-by-sentence` will fan out a sub-agent per claim and locate `file:line` or `NOT FOUND`. Token-expensive (~N sub-agents)."* The user decides whether to spend the budget.
+
 ## What the final report covers
 
 A single markdown file at the project root, ~1–2 pages. Sections:
@@ -50,7 +59,7 @@ reproduction: <paper-short-name> verdict <verdict>, summary at REPRODUCTION-SUMM
 ## Survey signals (entry into SUMMARIZE_RUN)
 
 - `comparison-report.yaml` verdict is `pass` (or user has accepted `partial`) ⇒ ready
-- `REPRODUCTION-SUMMARY.md` exists; constitution outcome is rewritten ⇒ done
+- `REPRODUCTION-SUMMARY.md` exists; `figure-comparison.html` exists; constitution outcome is rewritten ⇒ done
 
 ## Notes
 

From 609d74d272c19e27c1731b4074bbea4007e351cb Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 6 May 2026 00:25:15 +0200
Subject: [PATCH 010/124] skills/paper2astra/acquire: name why
 work/reference/code/ matters

Add a sentence to Step 2 (code search) connecting it to the canonical-
resolution rule: when present, every implementing iteration treats
that directory as canonical for numerics + method. Without it,
iterations have only the paper to anchor to. Also clarifies the
"don't modify cloned code" rule as "it's the reference, not the
workdir."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/paper2astra/references/acquire.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/paper2astra/references/acquire.md b/claude/lightcone/skills/paper2astra/references/acquire.md
index a0d8aa2d..93903655 100644
--- a/claude/lightcone/skills/paper2astra/references/acquire.md
+++ b/claude/lightcone/skills/paper2astra/references/acquire.md
@@ -56,6 +56,8 @@ Skip Step 1 if `work/reference/paper.pdf` already exists and is a valid PDF.
 
 ## Step 2: Search for the code repository
 
+This step matters more than its size suggests. When `work/reference/code/` exists, every implementing iteration treats it as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md). Without it, iterations have only the paper to anchor to and drift toward "looks right" rather than "matches."
+
 1. Search the paper text for repository URLs — abstract, intro, conclusion, footnotes, "Code Availability" or "Data Availability" sections.
 2. If none found, web search: paper title + "github", Papers With Code, or the first author's GitHub profile.
 3. Clone if found:
@@ -70,7 +72,7 @@ Skip Step 1 if `work/reference/paper.pdf` already exists and is a valid PDF.
    notes: "..."
    ```
 
-Spend no more than a few searches before recording failure and moving on. **Do NOT modify cloned code.**
+Spend no more than a few searches before recording failure and moving on. **Do NOT modify cloned code** — it's the reference, not the workdir.
 
 Skip Step 2 if `work/reference/code/` already exists.
 

From 79e9f1d8b3b9e7a9beb3b6236ac49fac9ce0f61f Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 6 May 2026 01:36:43 +0200
Subject: [PATCH 011/124] skills/paper2astra: introduce FINAL_REVIEW as the
 post-loop interactive phase
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Promote the two follow-up skills from "user-invokable post-completion
follow-ups recommended by SUMMARIZE_RUN" to invoked-by-the-orchestrator
in an explicit FINAL_REVIEW phase. The constraint that motivated the
previous framing — both have AskUserQuestion in allowed-tools, so a
silent sub-agent can't service them — is preserved by making
FINAL_REVIEW interactive (runs in the main loop session, not via Task
tool).

Files updated:
- SKILL.md: phase table now includes FINAL_REVIEW; per-phase mode table
  adds FINAL_REVIEW = interactive; "Two surfaces for user attention"
  section names FINAL_REVIEW as the post-loop seam; mid-document prose
  (the bundle paragraph, Skills list, Discipline bullet) all align to
  the new framing.
- references/final_review.md: NEW. The phase reference — invokes
  /figure-comparison (mandatory), offers /check-sentence-by-sentence
  (opt-in), walks the user through open-questions.md with
  AskUserQuestion, finalizes the constitution outcome.
- references/summarize_run.md: recast as auto-generated reportage that
  drafts the outcome and hands off to FINAL_REVIEW. Drops the "Optional
  follow-ups" section; adds an "Open questions for FINAL_REVIEW" pointer
  in the report sections.
- references/compare.md: pass-verdict chain now reads
  COMPARE -> SUMMARIZE_RUN -> FINAL_REVIEW.
- references/interview.md: per-paper constitution template restructured
  into separate Scope / Runtime mode / Termination criterion / Per-phase
  mode sections (cleaner than the prior monolith); per-phase table
  includes FINAL_REVIEW; CLAUDE.md template tightened to "info and rules"
  with FINAL_REVIEW noted as where the user resolves open-questions.md.
- skills/README.md: figure-comparison and check-sentence-by-sentence
  rows now describe them as "Invoked from paper2astra's FINAL_REVIEW
  phase ... also user-invokable directly."

Constitution body in the lightcone monorepo
(.felt/lightcone/paper2astra-as-skill/skill-bundle/skill-bundle.md) is
the source of truth for the desired state and was edited first; this
commit aligns the realized SKILL.md and phase references.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/README.md             |   4 +-
 claude/lightcone/skills/paper2astra/SKILL.md  |  40 ++---
 .../skills/paper2astra/references/compare.md  |   4 +-
 .../paper2astra/references/final_review.md    |  67 ++++++++
 .../paper2astra/references/interview.md       | 150 +++++++-----------
 .../paper2astra/references/summarize_run.md   |  34 ++--
 6 files changed, 165 insertions(+), 134 deletions(-)
 create mode 100644 claude/lightcone/skills/paper2astra/references/final_review.md

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 977c6c60..de2d2aef 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -23,8 +23,8 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 | [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. |
 | [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Used by paper2astra's bash-loop and tmux-orchestrated runtime modes. |
 | [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. Primary acquisition path for paper2astra's ACQUIRE phase. |
-| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). User-invokable post-completion follow-up; the SUMMARIZE_RUN summary recommends it. |
-| [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. User-invokable post-completion follow-up; the SUMMARIZE_RUN summary recommends it. |
+| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from paper2astra's FINAL_REVIEW phase (opt-in); also user-invokable directly. |
+| [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from paper2astra's FINAL_REVIEW phase (mandatory); also user-invokable directly. |
 
 The full reproduction story spans these seven skills. paper2astra's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about paper2astra.
 
diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/paper2astra/SKILL.md
index f93af1f6..b4e290f3 100644
--- a/claude/lightcone/skills/paper2astra/SKILL.md
+++ b/claude/lightcone/skills/paper2astra/SKILL.md
@@ -42,7 +42,7 @@ paper2astra composes the rest of the lightcone-cli paper-reproduction bundle. Al
 
 paper2astra does not re-implement what these skills already do — it tells the agent at each phase to invoke them. The siblings stand alone; they don't know about paper2astra.
 
-After paper2astra completes, the SUMMARIZE_RUN summary recommends two adjacent follow-up skills for the user to invoke directly: [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) audits paper claims against code locations, and [`/figure-comparison`](../figure-comparison/SKILL.md) builds a portable side-by-side HTML report for paper artifacts versus reproduced results. Both are user-invokable rather than orchestrator-spawned — they use `AskUserQuestion` for missing-data prompts and don't run cleanly as silent sub-agents.
+Two further siblings are invoked from the **FINAL_REVIEW** phase, after the loop terminates and SUMMARIZE_RUN has written the report: [`/figure-comparison`](../figure-comparison/SKILL.md) builds a portable side-by-side HTML report (paper artifacts vs reproduced), and [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) (optional) audits paper claims against code locations. Both have `AskUserQuestion` in their `allowed-tools`, so FINAL_REVIEW runs interactively in the main loop session — spawning them under the `Task` tool would fire prompts into nothing.
 
 ## Workflow
 
@@ -57,7 +57,12 @@ The interview has six jobs:
 3. **Pick a runtime mode** — interactive / bash-loop / tmux-orchestrated. See "Runtime modes" below.
 4. **Pick a termination criterion** — frugality (weak) vs rigor (strong). See "Frugality vs rigor" below.
 5. **Choose interactive vs sub-agent per phase** — see "Per-phase mode" below. The defaults are reasonable; the user gets to flip any of them.
-6. **Draft the per-paper constitution and CLAUDE.md** — invoke `/constitution` to draft the constitution. Author the per-paper `CLAUDE.md` from the same conversation: paper identity, user intent, what's known about the original codebase, runtime-mode choice, frugality-vs-rigor choice, the canonical-resolution rule (see "Code-as-canonical" below), conventions and warnings. The CLAUDE.md is the durable project memory every iteration's Claude session walks up to; the constitution is the runner's spec.
+6. **Draft the per-paper constitution and CLAUDE.md** — invoke `/constitution` to draft the constitution. Author the per-paper `CLAUDE.md` from the same conversation. The two files have separate jobs and don't overlap:
+
+   - **`CLAUDE.md`** is *info and rules* — paper identity (DOI / arXiv ID / title / authors), where the original code lives (`work/reference/code/`), the code-as-canonical rule, the never-block-on-`AskUserQuestion`-mid-sub-agent rule, any paper-specific conventions or warnings, pointers to the constitution and `open-questions.md`. Auto-loaded by Claude Code on every walk-up to this directory. **Evolves over time** — iterations that learn new conventions or surface paper-specific gotchas can add lines so future sessions don't re-derive the same context.
+   - **The constitution** is *desired state* — what "done" looks like, evidence checks, scope fence, the runtime mode the user chose, the termination criterion (weak/strong), per-phase routing (interactive vs sub-agent), and the open-questions section iterations resolve. Read by the runner each iteration as the explicit task.
+
+   CLAUDE.md tells you *what kind of place this is*; the constitution tells you *what we're doing here and when we're done*.
 
 Both files live inside the reproduction's directory. After they are approved the interview ends, and paper2astra launches whichever runtime the user chose.
 
@@ -80,11 +85,7 @@ Independent of mode, the interview asks the user to pick the loop's termination
 - **Weak (frugal):** "run until the checklist of tasks has been completed." Cheaper. Susceptible to one-shot oversights.
 - **Strong (rigorous):** "run until you can't find any further contributions, fixes, or improvements that align with the goal." Almost always catches mistakes the one-shot left behind, but burns more tokens.
 
-Strong is the default for fidelity-critical reproductions; weak is the default when the user explicitly wants to cap token spend. The choice goes into the per-paper CLAUDE.md and is honored by every iteration.
-
-### Where this is going
-
-Codex's `/goal` ([Simon Willison, 2026-04-30](https://simonwillison.net/2026/Apr/30/codex-goals/)) is the closest existing primitive — same shape, with a configurable token budget, made smooth by Codex's invisible compaction. When Anthropic ships an equivalent, modes (2) and (3) collapse into it. Until then, ralph is the substrate.
+Strong is the default for fidelity-critical reproductions; weak is the default when the user explicitly wants to cap token spend. The choice goes into the per-paper constitution (alongside the runtime-mode choice) and is honored by every iteration.
 
 ### Phases (driven by ralph iterations after the interview)
 
@@ -102,9 +103,10 @@ Inside each ralph iteration, the agent reads the per-paper constitution, surveys
 | IMPLEMENT | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
 | RUN | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
 | COMPARE | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
-| SUMMARIZE_RUN | [`references/summarize_run.md`](references/summarize_run.md) | Final write-up; constitution outcome update |
+| SUMMARIZE_RUN | [`references/summarize_run.md`](references/summarize_run.md) | Final write-up to disk |
+| FINAL_REVIEW | [`references/final_review.md`](references/final_review.md) | `/figure-comparison` HTML + (opt) sentence audit; resolved `open-questions.md`; constitution outcome update |
 
-The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or attempts are exhausted. The constitution carries the attempt budget; the ralph iterations consult it.
+The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or attempts are exhausted. The constitution carries the attempt budget; the ralph iterations consult it. After SUMMARIZE_RUN writes the final summary, control returns to the user and FINAL_REVIEW runs interactively — not from inside the loop.
 
 ### Per-phase mode (interactive vs sub-agent)
 
@@ -124,7 +126,8 @@ Defaults the constitution starts with:
 | IMPLEMENT | user choice | Mostly mechanical, but algorithm choices may want ratification. |
 | RUN | user choice | Mechanical, but failures need diagnosis. |
 | COMPARE | **interactive** | Verdict (was the reproduction close enough?) is the second mandatory user-ratification seam. |
-| SUMMARIZE_RUN | sub-agent | Final report; no decisions remain. The summary recommends `/figure-comparison` and `/check-sentence-by-sentence` as user-invokable follow-ups. |
+| SUMMARIZE_RUN | sub-agent | Final write-up to disk; no decisions remain. |
+| FINAL_REVIEW | **interactive** | Post-loop interactive return — runs `/figure-comparison` and (optionally) `/check-sentence-by-sentence`, then walks the user through `open-questions.md` with `AskUserQuestion` to ratify accumulated seams. |
 
 The constitution records the choice; iterations honor it. Sub-agent phases are spawned via the `Task` tool from inside the main loop session — that gives them fresh context but no user-reach. Interactive phases run inline in the loop session and may pause with `AskUserQuestion` at material seams.
 
@@ -134,14 +137,15 @@ When the original codebase is available at `work/reference/code/`, **the agent r
 
 This is the load-bearing fidelity discipline. Without it, iterations drift to "looks right" rather than "matches" — the failure mode the first-paper test surfaced (plot styles off, numerical results off). The per-paper CLAUDE.md restates the rule so every iteration's Claude session walks up to it.
 
-### Interactive seams vs the loop's running report
+### Two surfaces for user attention: open-questions and FINAL_REVIEW
+
+The reproduction has two periods of human reach: the interview at the start, and FINAL_REVIEW at the end. In between, the loop runs without a human in the conversation. The discipline has two surfaces to match:
 
-Interview phases use `AskUserQuestion` directly — the user is at the wheel. Once the loop launches, the human is no longer present per-iteration, so the agent's discipline shifts:
+- **`<paper-slug>/open-questions.md` — the during-loop accumulator.** When a sub-agent or loop iteration would normally surface a question to the user (paper-vs-code conflicts, figures whose intent isn't obvious, ambiguities the constitution doesn't resolve), it appends the question to `open-questions.md` and continues with the best-judgment default. Never block on `AskUserQuestion` from inside a sub-agent — the prompt fires into nothing.
 
-- **Interactive phase** (per the per-phase mode table): ratify decisions with the user via `AskUserQuestion`.
-- **Loop / sub-agent phase**: when a question would normally surface to the user (paper-vs-code conflicts, figures whose intent isn't obvious, ambiguities the constitution doesn't resolve), **append it to `<paper-slug>/open-questions.md`** and continue with the best-judgment default. The user reads the report at session boundaries (between iterations, or when checking on the loop) and answers in-place.
+- **FINAL_REVIEW — the post-loop interactive return.** When the COMPARE→IMPLEMENT loop terminates (verdict=pass or budget exhausted) and SUMMARIZE_RUN has written the final summary, control returns to the user. FINAL_REVIEW invokes `/figure-comparison` and (optionally) `/check-sentence-by-sentence` interactively — these skills can use `AskUserQuestion` because the human is back. Then it walks the user through `open-questions.md` with `AskUserQuestion`, lands resolutions, updates `astra.yaml` or `implementation-notes.md` accordingly, and closes out the constitution outcome.
 
-This matches what the user actually does: stays in the conversation while the seams are still soft, walks away while the loop grinds, comes back to a list of "things you'd want to know" rather than a paused agent waiting on input. The CLAUDE.md captures that the loop should always pour seams into `open-questions.md` rather than blocking.
+This matches what the user actually does: they stay in the conversation while the seams are still soft, walk away while the loop grinds, and come back to a rich review surface plus a list of "things you'd want to know."
 
 ### Material conflicts (the SPECIFY seam)
 
@@ -186,8 +190,8 @@ Workdir signals (file existence implies the phase has been done):
 - [`/ralph-loops`](../ralph-loops/SKILL.md) — for the bash-loop and tmux-orchestrated runtime modes
 - [`/managing-bibliography`](../managing-bibliography/SKILL.md) — for ACQUIRE
 - [`/narrative`](../narrative/SKILL.md) — for SPECIFY
-
-(`/figure-comparison` and `/check-sentence-by-sentence` are recommended in the SUMMARIZE_RUN summary as user-invokable post-completion follow-ups; they are not co-active with the paper2astra workflow.)
+- [`/figure-comparison`](../figure-comparison/SKILL.md) — for FINAL_REVIEW (mandatory)
+- [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) — for FINAL_REVIEW (opt-in)
 
 ## Discipline
 
@@ -197,7 +201,7 @@ Workdir signals (file existence implies the phase has been done):
 - **Use the up-to-date CLI surfaces, not skill-specific wrappers.** When `astra validate` already does the job, call it directly. Specifically: `astra validate <file>`, `astra validate --verify-evidence`, `astra paper add`. Use whatever the current `astra --help` surfaces.
 - **arxiv-LaTeX-first acquisition.** When the paper is on arxiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arxiv where there's no better source.
 - **The original code goes into `work/reference/code/`** during ACQUIRE when available, and stays there as the canonical reference for every subsequent iteration (see "Code-as-canonical" above).
-- **`/figure-comparison` and `/check-sentence-by-sentence` are user-invokable follow-ups, not auto-invoked from inside paper2astra.** Both use `AskUserQuestion` for missing-data prompts and don't run cleanly as silent sub-agents. The SUMMARIZE_RUN summary names them so the user can choose to invoke either after the run completes.
+- **`/figure-comparison` and `/check-sentence-by-sentence` run inside FINAL_REVIEW, not inside the loop.** Both have `AskUserQuestion` in their `allowed-tools`; FINAL_REVIEW is the post-loop interactive phase that runs them in the main session so the prompts land. Don't try to spawn either under the `Task` tool from inside the loop.
 - **No synthetic data.** Unless the paper itself uses synthetic data as its input, every input dataset must be real (downloaded, queried, or fetched from a real archive). The implement phase reference repeats this; treat it as load-bearing.
 - **Tmux preferred-when-available, never required.** Modes (1) and (2) work without it.
 - **The siblings don't know about paper2astra.** Each SKILL stands on its own.
diff --git a/claude/lightcone/skills/paper2astra/references/compare.md b/claude/lightcone/skills/paper2astra/references/compare.md
index 0e6d9fdd..7ea35088 100644
--- a/claude/lightcone/skills/paper2astra/references/compare.md
+++ b/claude/lightcone/skills/paper2astra/references/compare.md
@@ -73,7 +73,7 @@ Also write `comparison-report.md` with a human-readable summary. For figure / ta
 
 After writing the report, surface the verdict to the user via `AskUserQuestion`:
 
-- **If `pass`**: confirm with the user before exiting the COMPARE → IMPLEMENT loop. *"All high-priority targets match. Mark reproduction complete?"* The user accepts → SUMMARIZE_RUN runs; the user rejects → name what's still off and re-enter the loop.
+- **If `pass`**: confirm with the user before exiting the COMPARE → IMPLEMENT loop. *"All high-priority targets match. Mark reproduction complete?"* The user accepts → SUMMARIZE_RUN runs (sub-agent, writes the summary), then FINAL_REVIEW runs (interactive, walks `/figure-comparison` and the open-questions ledger); the user rejects → name what's still off and re-enter the loop.
 - **If `partial`**: show the user the failing targets and the diagnosis. *"Partial match. <N> outputs failing: <list>. Continue retrying or accept partial?"* If the attempt budget (from the constitution) is reached, this surfacing is mandatory.
 - **If `fail`**: same shape, but the loop's continuation should be questioned more sharply. A fundamental methodological issue may need a constitution amendment, not another implement retry.
 
@@ -83,7 +83,7 @@ The verdict is the agent's judgment; the **decision to keep iterating** is the u
 
 - All outputs in `lc status --universe baseline` are `ok` ⇒ ready to compare
 - `comparison-report.yaml` exists with current `attempt` ⇒ COMPARE done for this attempt
-- `comparison-report.yaml` verdict is `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN
+- `comparison-report.yaml` verdict is `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN, then FINAL_REVIEW
 
 ## Notes
 
diff --git a/claude/lightcone/skills/paper2astra/references/final_review.md b/claude/lightcone/skills/paper2astra/references/final_review.md
new file mode 100644
index 00000000..9ff7e513
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/final_review.md
@@ -0,0 +1,67 @@
+# FINAL_REVIEW — interactive post-loop ratification
+
+The COMPARE → IMPLEMENT loop has terminated (verdict `pass` or attempt budget exhausted). SUMMARIZE_RUN has written `REPRODUCTION-SUMMARY.md`. Control now returns to the user; FINAL_REVIEW runs **inside the main loop session, not as a sub-agent**, so `AskUserQuestion` actually reaches a human. This is the post-loop ratification seam — validation surfaces (`/figure-comparison`, optional `/check-sentence-by-sentence`) plus the accumulated `open-questions.md` get walked through and resolved before the constitution is closed.
+
+The constitution's per-phase mode is **always interactive** for this phase. The user must be reachable.
+
+## Inputs
+
+- `<paper-slug>/REPRODUCTION-SUMMARY.md` — final report from SUMMARIZE_RUN
+- `comparison-report.{yaml,md}` — final verdict
+- `<paper-slug>/open-questions.md` — accumulated questions from sub-agent / loop phases
+- `<paper-slug>/<constitution>.md` — its `outcome:` field needs a final rewrite
+- `astra.yaml` — may need targeted edits as questions resolve
+- `implementation-notes.md` — may absorb resolutions that don't belong in `astra.yaml`
+
+## Outputs
+
+- `.lightcone/comparison.html` — self-contained side-by-side report (from `/figure-comparison`)
+- `<paper-slug>/sentence-audit.md` (or wherever `/check-sentence-by-sentence` lands its report) — *optional*
+- `<paper-slug>/open-questions.md` with every entry marked resolved (or explicitly deferred with a reason)
+- `astra.yaml` and/or `implementation-notes.md` updated where resolutions changed a decision or added a gotcha
+- Updated `outcome:` on the constitution
+- A final commit naming the FINAL_REVIEW pass
+
+## Task
+
+1. **Open the report.** Read `REPRODUCTION-SUMMARY.md`. Skim `comparison-report.md`. The agent's job in this phase is to surface the right things to the user — not to re-derive what SUMMARIZE_RUN already concluded.
+
+2. **Invoke `/figure-comparison`.** This is the rich validation surface — base64-embedded HTML side-by-sides for every paper artifact versus its reproduced counterpart. The skill prompts the user for any missing inputs (universe choice, paper-reference path) via its own `AskUserQuestion`. Land the HTML at `.lightcone/comparison.html` and surface the path to the user.
+
+3. **Offer `/check-sentence-by-sentence`.** Ask the user via `AskUserQuestion`:
+
+   > *"Run sentence-by-sentence audit of the paper against the code? (Slow but catches claims that drifted between paper and reproduction.)"*
+
+   On yes: invoke `/check-sentence-by-sentence`. The skill prompts for paper-source path (arXiv TeX preferred, Docling markdown fallback). It produces a per-sentence `file:line` or `NOT FOUND` audit. Surface the audit path and any `NOT FOUND` clusters that suggest missing implementation.
+
+4. **Walk `open-questions.md` with the user.** For every unresolved entry, surface via `AskUserQuestion`:
+
+   > *"Open question: <question text>. The loop's best-judgment default was <default>. Accept, override, or defer?"*
+
+   - **Accept**: mark resolved with the default; record the resolution in the entry.
+   - **Override**: take the user's choice; update `astra.yaml` (decision options, baseline universe) or `implementation-notes.md` accordingly. Re-run `astra validate` if the spec changed.
+   - **Defer**: leave the entry but mark it `deferred: <reason>` so it's clearly not forgotten.
+
+   Some questions surface a real gap (a target wasn't reproduced, a method differs in a way that matters). When the gap is material, ask whether to re-enter the loop for another COMPARE → IMPLEMENT pass. The user owns that call.
+
+5. **Rewrite the constitution `outcome:`.** The SUMMARIZE_RUN sub-agent prepared a draft outcome; refine it with what FINAL_REVIEW surfaced — accepted partials, deferred questions, the `/figure-comparison` HTML path, the audit path if run. The outcome should teach: someone reading it should understand what the reproduction landed and where the rough edges are without opening the body.
+
+6. **Final commit.** Stage `.lightcone/comparison.html`, the audit (if run), the resolved `open-questions.md`, any `astra.yaml` / `implementation-notes.md` edits, and the constitution outcome:
+
+   ```
+   final_review: <paper-short-name> — N questions resolved, comparison.html rendered[, sentence audit completed]
+   ```
+
+7. **Surface closure to the user.** The constitution is now in shape for `status: closed`. Do not flip it from this phase — surface that it's ready, the user closes.
+
+## Survey signals (entry into FINAL_REVIEW)
+
+- `comparison-report.yaml` verdict is `pass` (or user-accepted `partial`) **and** `REPRODUCTION-SUMMARY.md` exists ⇒ ready to enter FINAL_REVIEW
+- `.lightcone/comparison.html` exists, `open-questions.md` entries are all resolved or explicitly deferred, constitution `outcome:` reflects the post-review state ⇒ FINAL_REVIEW done
+
+## Notes
+
+- **This phase is not a sub-agent.** `/figure-comparison` and `/check-sentence-by-sentence` both have `AskUserQuestion` in their `allowed-tools`; spawning them under the `Task` tool would fire prompts into nothing. FINAL_REVIEW runs in the main loop session so the prompts land. The constitution lists FINAL_REVIEW as `interactive` for the same reason.
+- **Don't relitigate SUMMARIZE_RUN.** The final report is the user's reading surface for "what landed." FINAL_REVIEW's job is the rich validation pass and the open-question ratification — not regenerating prose the sub-agent already produced cleanly.
+- **`/figure-comparison` is mandatory; `/check-sentence-by-sentence` is opt-in.** The HTML side-by-side is cheap and high-signal; the sentence audit is slower and pays off most when the user has fidelity concerns. The default answer to the opt-in question is no.
+- **The user holds closure.** This phase prepares the outcome and surfaces "ready"; flipping `status: closed` is the user's call after they're satisfied with what FINAL_REVIEW surfaced.
diff --git a/claude/lightcone/skills/paper2astra/references/interview.md b/claude/lightcone/skills/paper2astra/references/interview.md
index db662673..9493ffa5 100644
--- a/claude/lightcone/skills/paper2astra/references/interview.md
+++ b/claude/lightcone/skills/paper2astra/references/interview.md
@@ -8,12 +8,12 @@ Use the [`/constitution`](../../constitution/SKILL.md) skill to draft the consti
 
 ## What the interview produces
 
-The interview produces a **directory for the reproduction** containing two markdown files:
+The interview produces a **directory for the reproduction** containing two markdown files. They have separate jobs and don't overlap:
 
-- **`<paper-slug>/CLAUDE.md`** — the per-reproduction project memory. Captures everything that's useful across phases: the paper's identity (DOI / arxiv id / authors / one-line subject), the user's stated intent and constraints, what's known about the original codebase, runtime-mode choice, frugality-vs-rigor choice, the canonical-resolution rule (code-as-canonical when `work/reference/code/` exists), any user-supplied conventions or warnings. Every Claude session in this directory finds it on walk-up; iterations don't re-derive context.
-- **`<paper-slug>/<constitution>.md`** — the per-paper constitution. Pointers (not snapshots) for the runner: desired state, evidence checks, scope fence, per-phase mode table. The runner re-reads it each iteration.
+- **`<paper-slug>/CLAUDE.md`** — *info and rules.* Paper identity (DOI / arxiv id / authors / one-line subject), where the original code lives (`work/reference/code/`), the canonical-resolution rule (code-as-canonical when `work/reference/code/` exists), the never-block-on-`AskUserQuestion`-mid-sub-agent rule, any paper-specific conventions or warnings, pointers to the constitution and `open-questions.md`. Auto-loaded by Claude Code on every walk-up to this directory. **Evolves over time** — iterations that learn new conventions or surface paper-specific gotchas can add lines so future sessions don't re-derive the same context.
+- **`<paper-slug>/<constitution>.md`** — *desired state.* Pointers (not snapshots) for the runner: what "done" looks like, evidence checks, scope fence, the runtime mode the user chose, the termination criterion (weak/strong), the per-phase mode table, and the open-questions section iterations resolve. Read by the runner each iteration as the explicit task.
 
-Both are written at the end of the interview from the same conversation; the CLAUDE.md is the durable context, the constitution is the runner's spec. After they are approved, paper2astra launches whichever runtime the user chose:
+Both are written at the end of the interview from the same conversation. CLAUDE.md tells you *what kind of place this is*; the constitution tells you *what we're doing here and when we're done*. After they are approved, paper2astra launches whichever runtime the user chose:
 
 | Runtime | Launch |
 |---|---|
@@ -62,7 +62,7 @@ Offer the modes the environment supports:
 - **(2) Bash-loop** — a plain shell loop the user pastes into a terminal. No tmux dependency. Right when tmux isn't available *and* the connection is stable. Fragile across SSH disconnects unless wrapped in `nohup`, and `nohup` blocks interaction — so for unstable connections, mode (3) is the answer, not this.
 - **(3) Tmux-orchestrated** — paper2astra drives a tmux session directly via `../ralph-loops/scripts/ralph`. Survives SSH disconnects; the skill sends keystrokes to the pane, monitors, intervenes. Preferred when tmux is available.
 
-If tmux isn't installed, only (1) and (2) appear in the question. The chosen mode goes into the per-paper CLAUDE.md.
+If tmux isn't installed, only (1) and (2) appear in the question. The chosen mode goes into the per-paper constitution.
 
 ### 4. Pick a termination criterion (frugality vs rigor)
 
@@ -71,7 +71,7 @@ Ask:
 - **Weak (frugal):** "run until the checklist of tasks has been completed." Cheaper. Susceptible to one-shot oversights.
 - **Strong (rigorous):** "run until you can't find any further contributions, fixes, or improvements that align with the goal." Almost always catches mistakes the one-shot left behind, but burns more tokens.
 
-Default to strong for fidelity-critical reproductions; weak when the user wants to cap token spend. The choice goes into the per-paper CLAUDE.md.
+Default to strong for fidelity-critical reproductions; weak when the user wants to cap token spend. The choice goes into the per-paper constitution.
 
 ### 5. Choose interactive vs sub-agent per phase
 
@@ -103,55 +103,41 @@ status: open
 
 ## Desired State
 
-A complete `astra.yaml` for <paper> at this workdir, with recipes that
-produce reproduced versions of <list of targets>, validated by
-`astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml`
-verdict `pass` against the targets in `targets/targets.md`.
-
-Non-goals: <e.g., reproducing Figure 12's MCMC stack — out of scope
-because compute too large for available targets>.
-
-## Context
-
-- Paper DOI: <doi>
-- arXiv ID: <id>; LaTeX source acquisition path is the primary
-- Code repo: <url> (or "to be searched in ACQUIRE")
-- Runtime mode: <(1) interactive | (2) bash-loop | (3) tmux-orchestrated>
-- Termination: <weak | strong>
-- Workdir layout: standard Paper2ASTRA conventions —
-  `work/reference/`, `work/notes/`, `targets/`, `astra.yaml`,
-  `universes/`, `results/`
-- Per-phase mode (the canonical version lives in CLAUDE.md):
-  | Phase | Mode |
-  |---|---|
-  | ACQUIRE | <per user> |
-  | PARSE | <per user> |
-  | SUMMARIZE | sub-agent |
-  | EXTRACT_TARGETS | <per user> |
-  | LITERATURE | sub-agent |
-  | SPECIFY | interactive |
-  | REVIEW | <per user> |
-  | IMPLEMENT | <per user> |
-  | RUN | <per user> |
-  | COMPARE | interactive |
-  | SUMMARIZE_RUN | sub-agent |
-
-## Skills
-
-- `/paper2astra` — this skill (the orchestrator)
-- `/managing-bibliography` — ACQUIRE
-- `/narrative` — SPECIFY
-
-(`/figure-comparison` and `/check-sentence-by-sentence` are user-invokable post-completion follow-ups recommended by SUMMARIZE_RUN; they're not part of the per-phase workflow.)
-
-## Code-as-canonical
-
-When `work/reference/code/` exists, the agent reads relevant
-code on every implementing iteration. Where paper and code
-disagree, **code is canonical** for numerics, plotting, and
-method. Disagreements are logged in
-`<paper-slug>/open-questions.md` (sub-agent / loop phases) or
-ratified with the user via AskUserQuestion (interactive phases).
+A complete `astra.yaml` for <paper> at this workdir, with recipes that produce reproduced versions of <list of targets>, validated by `astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml` verdict `pass` against the targets in `targets/targets.md`.
+
+Non-goals: <e.g., reproducing Figure 12's MCMC stack — out of scope because compute too large for available targets>.
+
+## Scope
+
+In: <list — the targeted figures / tables / numbers, the methodological span being reproduced>.
+Out: <list — explicit exclusions, fenced from drift>.
+
+## Runtime mode
+
+<(1) interactive | (2) bash-loop | (3) tmux-orchestrated>
+
+## Termination criterion
+
+<weak | strong>
+
+The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or the attempt budget (default 5) is exhausted; the chosen termination criterion shapes how aggressively iterations self-check.
+
+## Per-phase mode
+
+| Phase | Mode |
+|---|---|
+| ACQUIRE | <per user> |
+| PARSE | <per user> |
+| SUMMARIZE | sub-agent |
+| EXTRACT_TARGETS | <per user> |
+| LITERATURE | sub-agent |
+| SPECIFY | interactive |
+| REVIEW | <per user> |
+| IMPLEMENT | <per user> |
+| RUN | <per user> |
+| COMPARE | interactive |
+| SUMMARIZE_RUN | sub-agent |
+| FINAL_REVIEW | interactive |
 
 ## Evidence
 
@@ -162,66 +148,42 @@ ratified with the user via AskUserQuestion (interactive phases).
 - `ls astra.yaml && astra validate astra.yaml` — SPECIFY done and valid
 - `astra validate astra.yaml --verify-evidence` — evidence quotes match source PDFs
 - `ls comparison-report.yaml && yq '.verdict' comparison-report.yaml` — most-recent COMPARE verdict
-- `ls figure-comparison.html` — auto-rendered side-by-side at SUMMARIZE_RUN
 - `git log --oneline` — chronological view of phase commits
 
-The COMPARE → IMPLEMENT loop iterates until verdict is `pass` or
-attempt budget (default 5) is exhausted.
-
 ## Open Questions
 
-(empty — populated as the loop runs; questions accrete in
-`<paper-slug>/open-questions.md`, the running report the user
-reads at session boundaries.)
+(empty — populated as the loop runs; questions accrete in `<paper-slug>/open-questions.md`, the running report the user reads at session boundaries and ratifies in FINAL_REVIEW.)
 ```
 
-Then author the per-paper `<paper-slug>/CLAUDE.md` from the same conversation. Approximate shape:
+Then author the per-paper `<paper-slug>/CLAUDE.md` from the same conversation. The CLAUDE.md is *info and rules*, not desired state — paper identity, where things live, disciplines that always apply. Approximate shape:
 
 ```markdown
-# <paper-slug> reproduction
+# <paper-slug>
 
-Reproduce <paper title> (<arXiv ID>). DOI: <doi>.
+Reproduction of <paper title> (<arXiv ID>). DOI: <doi>.
 
-## Identity
+## Paper
 
 - Authors: <list>
 - One-line subject: <e.g. "BAO scale measurement from DESI DR1">
 - Code repo: <url> (cloned to `work/reference/code/` during ACQUIRE)
 
-## User intent and constraints
-
-<paste the scope summary the user gave during the interview>
-
-## Runtime mode: <1 / 2 / 3>
+## Where things live
 
-<one paragraph on what that means for this project>
+- Workdir layout follows Paper2ASTRA conventions: `work/reference/`, `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`.
+- The constitution (desired state, runtime mode, scope, evidence, per-phase mode) lives at `<constitution>.md` in this directory.
+- The during-loop questions log lives at `open-questions.md`. The user reviews it in FINAL_REVIEW.
 
-## Termination criterion: <weak / strong>
-
-<one paragraph on what that means for this project>
-
-## Canonical-resolution rule
-
-When `work/reference/code/` exists, code is canonical for numerics + method.
-Every implementing iteration reads relevant code; disagreements between paper
-and code go into `open-questions.md` (loop / sub-agent phases) or surface via
-AskUserQuestion (interactive phases). The user resolves at the next interactive
-seam.
-
-## Per-phase mode
+## Rules
 
-(reproduce the per-phase mode table from the constitution)
+- **Code-as-canonical when `work/reference/code/` exists.** Every implementing iteration reads relevant code. Where paper and code disagree, code is canonical for numerics, plotting, and method.
+- **Never block on `AskUserQuestion` mid-sub-agent.** When a sub-agent or loop phase would surface a question to the user, append it to `open-questions.md` and continue with the best-judgment default. The user resolves in FINAL_REVIEW.
+- **arxiv-LaTeX-first acquisition.** PDF + Docling is a fallback for non-arxiv only.
+- **`astra validate --verify-evidence`** is the fidelity gate; evidence quotes must match source PDFs.
 
 ## Conventions and warnings
 
-- Workdir layout follows Paper2ASTRA conventions: `work/reference/`,
-  `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`.
-- `arxiv-LaTeX-first` acquisition; PDF + Docling fallback only when
-  the paper isn't on arxiv.
-- `astra validate --verify-evidence` is the fidelity gate.
-- Open questions accumulate in `open-questions.md`; the user reads
-  it between iterations.
-- <any user-supplied warnings>
+- <any paper-specific notes the user surfaced during the interview>
 ```
 
 Show both drafts, take corrections, refine. When the user is happy:
diff --git a/claude/lightcone/skills/paper2astra/references/summarize_run.md b/claude/lightcone/skills/paper2astra/references/summarize_run.md
index 1a22d6a2..927902c1 100644
--- a/claude/lightcone/skills/paper2astra/references/summarize_run.md
+++ b/claude/lightcone/skills/paper2astra/references/summarize_run.md
@@ -1,8 +1,8 @@
-# SUMMARIZE_RUN — final report and constitution outcome
+# SUMMARIZE_RUN — final report and outcome draft
 
-The reproduction has converged (verdict `pass` or user-accepted `partial`). Write the final summary, update the constitution's outcome, and prepare the workdir for handoff. The summary recommends two adjacent follow-up skills the user can invoke directly: `/figure-comparison` (HTML side-by-side of original vs reproduced figures, tables, numerics) and `/check-sentence-by-sentence` (paper-vs-code claim audit). Both are user-invokable rather than orchestrator-spawned because they use `AskUserQuestion` for missing-data prompts.
+The reproduction has converged (verdict `pass` or user-accepted `partial`). Write the final summary to disk, draft the constitution's outcome, and prepare the workdir for the post-loop interactive return. SUMMARIZE_RUN runs as a silent sub-agent — it produces the report cleanly and exits; the next phase, FINAL_REVIEW, picks up interactively to drive `/figure-comparison`, optionally `/check-sentence-by-sentence`, walk the user through `open-questions.md`, and finalize the outcome before closure.
 
-The constitution's per-phase mode is **always sub-agent**. There are no decisions left; this is reportage.
+The constitution's per-phase mode is **always sub-agent**. There are no decisions left for this phase; this is reportage that hands off to FINAL_REVIEW.
 
 ## Inputs
 
@@ -15,8 +15,8 @@ The constitution's per-phase mode is **always sub-agent**. There are no decision
 ## Outputs
 
 - `REPRODUCTION-SUMMARY.md` (or whatever name fits the project) — final report; concise.
-- Updated `outcome:` on the constitution.
-- A final commit on the reproduction branch with a clear message.
+- Draft `outcome:` on the constitution. FINAL_REVIEW refines it after the user has walked the validation surfaces.
+- A commit on the reproduction branch with a clear message.
 
 ## What the final report covers
 
@@ -28,35 +28,33 @@ A single markdown file at the project root, ~1–2 pages. Sections:
 4. **Outputs** — pointers to the figures / tables / metrics produced. One bullet per primary target, with the path to the reproduced result.
 5. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut that's stricter than stated, etc.). This is where the reproduction's value to the broader literature gets recorded.
 6. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the constitution path, the relevant `astra.yaml`).
-7. **Optional follow-ups** — recommend adjacent audit/reporting skills when
-   useful: `/check-sentence-by-sentence` for auditing paper claims against
-   code locations, and `/figure-comparison` for a portable side-by-side HTML
-   report comparing paper artifacts with reproduced results.
+7. **Open questions for FINAL_REVIEW** — short pointer to `<paper-slug>/open-questions.md`, with a count of unresolved entries. FINAL_REVIEW will walk these with the user; this section just flags that they're waiting.
 
 Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes; the summary is the door into them.
 
-## Constitution outcome
+## Constitution outcome (draft)
 
-Rewrite the constitution's `outcome:` field to reflect the realized state. A good outcome teaches:
+Draft the constitution's `outcome:` field to reflect the realized state. A good outcome teaches:
 
-> Reproduced <paper> against the targets in `targets/targets.md` with verdict `pass` (attempt 4). All 7 primary targets match within stated tolerance; 2 of 5 secondary targets show <5% offset attributable to <reason>. Material conflicts surfaced and resolved: <list>. Spec at `astra.yaml` (validates with `--verify-evidence`); reproduction summary at `REPRODUCTION-SUMMARY.md`.
+> Reproduced <paper> against the targets in `targets/targets.md` with verdict `pass` (attempt 4). All 7 primary targets match within stated tolerance; 2 of 5 secondary targets show <5% offset attributable to <reason>. Material conflicts surfaced and resolved: <list>. Spec at `astra.yaml` (validates with `--verify-evidence`); reproduction summary at `REPRODUCTION-SUMMARY.md`. **FINAL_REVIEW pending: <N> open questions, `/figure-comparison` not yet rendered.**
 
-The constitution's `status:` flips to `closed` only when the user accepts. This sub-agent does not flip status — it prepares the outcome and surfaces to the user (via the iteration's exit message) that the constitution is ready for closure.
+This is a draft. **FINAL_REVIEW refines it** after the user has walked the validation surfaces and ratified the open questions. The constitution's `status:` flips to `closed` only when the user accepts FINAL_REVIEW's surfacing. This sub-agent does not flip status, and does not finalize the outcome — it prepares the report and the outcome draft, then exits so FINAL_REVIEW can take over interactively.
 
-## Final commit
+## Commit
 
-Stage the report, the updated constitution, the final `astra.yaml`, the comparison report, and any housekeeping changes. Commit with a message that names the verdict:
+Stage the report, the constitution outcome draft, the final `astra.yaml`, the comparison report, and any housekeeping changes. Commit with a message that names the verdict and signals the handoff:
 
 ```
-reproduction: <paper-short-name> verdict <verdict>, summary at REPRODUCTION-SUMMARY.md
+summarize_run: <paper-short-name> verdict <verdict>, summary at REPRODUCTION-SUMMARY.md, final_review pending
 ```
 
 ## Survey signals (entry into SUMMARIZE_RUN)
 
 - `comparison-report.yaml` verdict is `pass` (or user has accepted `partial`) ⇒ ready
-- `REPRODUCTION-SUMMARY.md` exists; `figure-comparison.html` exists; constitution outcome is rewritten ⇒ done
+- `REPRODUCTION-SUMMARY.md` exists and the constitution outcome draft is in place ⇒ SUMMARIZE_RUN done; FINAL_REVIEW takes over interactively
 
 ## Notes
 
-- **This phase does not flip the constitution's status to closed.** The user does that, after reviewing the summary. The phase's job is to produce the summary cleanly; the human keeps the close authority.
+- **This phase does not flip the constitution's status to closed.** The user does that, after FINAL_REVIEW. SUMMARIZE_RUN's job is to produce the summary cleanly and hand off.
+- **Do not invoke `/figure-comparison` or `/check-sentence-by-sentence` from here.** Both have `AskUserQuestion` in their `allowed-tools`; spawning them under the `Task` tool fires prompts into nothing. They run in FINAL_REVIEW, where the user is reachable.
 - **Keep the report short.** Long reports get skimmed; short reports get read. Two pages is generous.

From 15c6dff48f91df316b1efecb18e6dca44c93b2c2 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 6 May 2026 02:06:07 +0200
Subject: [PATCH 012/124] =?UTF-8?q?skills/paper2astra:=20phase=20redesign?=
 =?UTF-8?q?=20=E2=80=94=20collapse=2011=20phases=20to=209,=20sharpen=20CLA?=
 =?UTF-8?q?UDE.md/constitution=20split?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Round-2 refinement of the bundle, plus the start of phase-redesign
realization driven by the new constitution at
[[lightcone/paper2astra-as-skill/phase-redesign]].

Phase shape:
- INTERVIEW and SUMMARIZE_RUN are the only mandatory-interactive seams
- PARSE folds into ACQUIRE (arxiv-LaTeX needs no parsing; PDF→Docling
  runs inline)
- SUMMARIZE renamed to STUDY — and parallelizes by paper-section (each
  sub-agent reads section + matching code together) rather than by source.
  Writes ASTRA-shaped prior_insights: entries directly, no intermediate
  format.
- EXTRACT_TARGETS folds: scoping in INTERVIEW, formalization in SPECIFY
- LITERATURE stays as core (verifiability is paper2astra's reason)
- REVIEW is rigor-dialed: frugal=skip-or-1, rigor=N-rounds-fresh-sub-agents;
  reviewers check astra.yaml against paper+code without seeing what was
  implemented or fixed last round (no implementation-side bias)
- IMPLEMENT also takes sub-agents + rigor-tied review iterations
- FINAL_REVIEW collapses back into SUMMARIZE_RUN — same job, old name.
  Close-out: /figure-comparison, optionally /check-sentence-by-sentence,
  walk user through open-questions.md.

Sharpened CLAUDE.md vs constitution split (separate jobs, no overlap):
- CLAUDE.md = info + rules (paper identity, code location, code-as-canonical
  rule, never-block discipline, conventions; auto-loaded on walk-up;
  evolves over time as iterations learn)
- Constitution = desired state (runtime mode, termination, scope, evidence,
  per-phase routing, open questions; runner reads as explicit task)

Files:
- SKILL.md: 9-phase per-phase mode table; sharpened bookends; CLAUDE.md/
  constitution split rewritten in workflow Step 6; runtime modes table
  retains the bash-loop SSH-fragility note; sub-agent invocation rules
  for /figure-comparison and /check-sentence-by-sentence (interactive,
  not silent — they use AskUserQuestion).
- references/acquire.md: PARSE folds in (arxiv-LaTeX vs Docling-fallback
  handled together); names why work/reference/code/ matters as the
  canonical reference for every implementing iteration.
- references/study.md (renamed from summarize.md): section-parallel
  sub-agents writing prior_insights: entries directly.
- references/interview.md: 6-job interview producing both files; the
  CLAUDE.md and constitution templates are now non-overlapping.
- references/specify.md: target-formalization folds in (EXTRACT_TARGETS gone).
- Deleted: parse.md, extract_targets.md, final_review.md, summarize.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/paper2astra/SKILL.md  | 107 ++++----
 .../skills/paper2astra/references/acquire.md  |  77 +++++-
 .../paper2astra/references/extract_targets.md |  61 -----
 .../paper2astra/references/final_review.md    |  67 -----
 .../paper2astra/references/interview.md       |  62 ++---
 .../skills/paper2astra/references/parse.md    |  79 ------
 .../skills/paper2astra/references/specify.md  |  49 ++--
 .../skills/paper2astra/references/study.md    | 229 ++++++++++++++++++
 .../paper2astra/references/summarize.md       | 120 ---------
 9 files changed, 415 insertions(+), 436 deletions(-)
 delete mode 100644 claude/lightcone/skills/paper2astra/references/extract_targets.md
 delete mode 100644 claude/lightcone/skills/paper2astra/references/final_review.md
 delete mode 100644 claude/lightcone/skills/paper2astra/references/parse.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/study.md
 delete mode 100644 claude/lightcone/skills/paper2astra/references/summarize.md
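
Reviewer note, not part of the diff: the "Workdir signals" table in the SKILL.md hunk below can be read as a first-missing-signal survey. The following is a minimal shell sketch under that reading, not the skill's actual implementation. The function name `next_phase` and the check order are assumptions made for illustration. REVIEW and IMPLEMENT have no pure file-existence signal (IMPLEMENT needs recipes inside `astra.yaml`, and SPECIFY additionally expects `astra validate astra.yaml` to pass), so the sketch skips those checks and a result of `RUN` really means "somewhere in REVIEW, IMPLEMENT, or RUN".

```shell
# Hypothetical helper: print the first phase whose file signal is missing.
# Paths mirror the "Workdir signals" table; nothing here calls `astra`.
next_phase() {
  local root="${1:-.}" f

  # ACQUIRE: arXiv tarball (Path A) or Docling markdown (Path B).
  if [ ! -d "$root/work/reference/source" ] && [ ! -f "$root/work/reference/document.md" ]; then
    echo ACQUIRE; return
  fi

  # STUDY and LITERATURE: consolidated notes on disk.
  [ -f "$root/work/notes/methodology.md" ] || { echo STUDY; return; }
  [ -f "$root/work/notes/literature.yaml" ] || { echo LITERATURE; return; }

  # SPECIFY: spec, targets ledger, and implementation notes all present.
  for f in astra.yaml targets/targets.md implementation-notes.md; do
    [ -f "$root/$f" ] || { echo SPECIFY; return; }
  done

  # RUN: at least one results/<universe>/<output>/ directory exists.
  if [ -z "$(find "$root/results" -mindepth 2 -maxdepth 2 -type d 2>/dev/null | head -n 1)" ]; then
    echo RUN; return
  fi

  [ -f "$root/comparison-report.yaml" ] || { echo COMPARE; return; }

  # SUMMARIZE_RUN: summary and comparison HTML (open-questions.md not inspected).
  if [ ! -f "$root/REPRODUCTION-SUMMARY.md" ] || [ ! -f "$root/.lightcone/comparison.html" ]; then
    echo SUMMARIZE_RUN; return
  fi

  echo DONE
}
```

Calling `next_phase path/to/paper-workdir` prints one phase name. As the patch itself notes, `git log --oneline` complements this file-existence view.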

diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/paper2astra/SKILL.md
index b4e290f3..890b4439 100644
--- a/claude/lightcone/skills/paper2astra/SKILL.md
+++ b/claude/lightcone/skills/paper2astra/SKILL.md
@@ -42,21 +42,21 @@ paper2astra composes the rest of the lightcone-cli paper-reproduction bundle. Al
 
 paper2astra does not re-implement what these skills already do — it tells the agent at each phase to invoke them. The siblings stand alone; they don't know about paper2astra.
 
-Two further siblings are invoked from the **FINAL_REVIEW** phase, after the loop terminates and SUMMARIZE_RUN has written the report: [`/figure-comparison`](../figure-comparison/SKILL.md) builds a portable side-by-side HTML report (paper artifacts vs reproduced), and [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) (optional) audits paper claims against code locations. Both have `AskUserQuestion` in their `allowed-tools`, so FINAL_REVIEW runs interactively in the main loop session — spawning them under the `Task` tool would fire prompts into nothing.
+Two further siblings are invoked from **SUMMARIZE_RUN**, the always-interactive close-out phase that runs after the COMPARE → IMPLEMENT loop terminates: [`/figure-comparison`](../figure-comparison/SKILL.md) builds a portable side-by-side HTML report (paper artifacts vs reproduced), and [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) (optional) audits paper claims against code locations. Both have `AskUserQuestion` in their `allowed-tools`, so SUMMARIZE_RUN runs interactively in the main loop session — spawning them under the `Task` tool would fire prompts into nothing.
 
 ## Workflow
 
 ### Interview (interactive — once per project)
 
-The interview is the only phase paper2astra runs interactively. Read [`references/interview.md`](references/interview.md) in full before starting.
+The interview is the first of two always-interactive bookends — INTERVIEW at the start, SUMMARIZE_RUN at the end. Every phase between them is configurable per the user's per-phase mode choice. Read [`references/interview.md`](references/interview.md) in full before starting.
 
 The interview has six jobs:
 
 1. **Identify the paper** — DOI / arXiv ID / title; whether code is available; whether the user has prior experience with this paper.
-2. **Scope the reproduction** — full reproduction vs targeted (e.g. only the BAO fit), which figures/tables/numbers are the targets.
+2. **Scope the reproduction** — full reproduction vs targeted (e.g. only the BAO fit), which figures/tables/numbers are the targets. The user's named targets get formalized into `astra.yaml`'s `outputs:`, `findings:`, `inputs:`, and `decisions:` structure during SPECIFY — there is no separate target-extraction phase.
 3. **Pick a runtime mode** — interactive / bash-loop / tmux-orchestrated. See "Runtime modes" below.
-4. **Pick a termination criterion** — frugality (weak) vs rigor (strong). See "Frugality vs rigor" below.
-5. **Choose interactive vs sub-agent per phase** — see "Per-phase mode" below. The defaults are reasonable; the user gets to flip any of them.
+4. **Pick a termination criterion** — frugality (weak) vs rigor (strong). The dial threads through REVIEW and IMPLEMENT, scaling iteration depth. See "Frugality vs rigor" below.
+5. **Choose interactive vs sub-agent per phase** — see "Per-phase mode" below. Only INTERVIEW and SUMMARIZE_RUN are mandatory-interactive; every other phase is the user's call.
 6. **Draft the per-paper constitution and CLAUDE.md** — invoke `/constitution` to draft the constitution. Author the per-paper `CLAUDE.md` from the same conversation. The two files have separate jobs and don't overlap:
 
    - **`CLAUDE.md`** is *info and rules* — paper identity (DOI / arXiv ID / title / authors), where the original code lives (`work/reference/code/`), the code-as-canonical rule, the never-block-on-`AskUserQuestion`-mid-sub-agent rule, any paper-specific conventions or warnings, pointers to the constitution and `open-questions.md`. Auto-loaded by Claude Code on every walk-up to this directory. **Evolves over time** — iterations that learn new conventions or surface paper-specific gotchas can add lines so future sessions don't re-derive the same context.
@@ -91,59 +91,65 @@ Strong is the default for fidelity-critical reproductions; weak is the default w
 
 Inside each ralph iteration, the agent reads the per-paper constitution, surveys the workdir to determine which phase is current (file existence + git log), and runs that phase's reference. Each phase reference is self-contained — read the matching one in full before working:
 
-| Phase | Reference | Outputs |
-|---|---|---|
-| ACQUIRE | [`references/acquire.md`](references/acquire.md) | `work/reference/{document.md, paper.pdf, code/, code-status.yaml}` |
-| PARSE | [`references/parse.md`](references/parse.md) | `work/reference/{figures/, tables/, metadata.json}` |
-| SUMMARIZE | [`references/summarize.md`](references/summarize.md) | `work/notes/{methodology.md, cited_papers.yaml, code-analysis.md}` |
-| EXTRACT_TARGETS | [`references/extract_targets.md`](references/extract_targets.md) | `targets/targets.md` + reference files |
-| LITERATURE | [`references/literature.md`](references/literature.md) | `work/notes/literature.yaml` + per-paper YAMLs |
-| SPECIFY | [`references/specify.md`](references/specify.md) | `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md` |
-| REVIEW | [`references/review.md`](references/review.md) | (in-place edits to spec + notes) |
-| IMPLEMENT | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
-| RUN | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
-| COMPARE | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
-| SUMMARIZE_RUN | [`references/summarize_run.md`](references/summarize_run.md) | Final write-up to disk |
-| FINAL_REVIEW | [`references/final_review.md`](references/final_review.md) | `/figure-comparison` HTML + (opt) sentence audit; resolved `open-questions.md`; constitution outcome update |
-
-The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or attempts are exhausted. The constitution carries the attempt budget; the ralph iterations consult it. After SUMMARIZE_RUN writes the final summary, control returns to the user and FINAL_REVIEW runs interactively — not from inside the loop.
+| # | Phase | Reference | Outputs |
+|---|---|---|---|
+| 1 | ACQUIRE | [`references/acquire.md`](references/acquire.md) | `work/reference/{source/ \| document.md, paper.pdf, figures/, tables/, metadata.json, code/, code-status.yaml}` |
+| 2 | STUDY | [`references/study.md`](references/study.md) | `work/notes/study/<NN>-<section>.md` (one per paper section, paper-vs-code agreement-check) + `work/notes/methodology.md` + `work/notes/cited_papers.yaml` |
+| 3 | LITERATURE | [`references/literature.md`](references/literature.md) | `work/notes/literature.yaml` + per-paper YAMLs |
+| 4 | SPECIFY | [`references/specify.md`](references/specify.md) | `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md`, `targets/targets.md` |
+| 5 | REVIEW | [`references/review.md`](references/review.md) | (in-place edits to spec + notes; rigor-dialed iterations) |
+| 6 | IMPLEMENT | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml`; rigor-dialed paper-vs-implementation review iterations |
+| 7 | RUN | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
+| 8 | COMPARE | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
+| 9 | SUMMARIZE_RUN | [`references/summarize_run.md`](references/summarize_run.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, (optional) sentence audit, resolved `open-questions.md`, finalized constitution outcome |
+
+The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or attempts are exhausted. The constitution carries the attempt budget; the ralph iterations consult it. When the loop ends (pass, user-accepted partial, or exhausted budget), control returns to the user and SUMMARIZE_RUN runs interactively in the main session: drafting the report, invoking `/figure-comparison`, optionally `/check-sentence-by-sentence`, walking accumulated questions, and finalizing the constitution outcome.
+
+ACQUIRE folds in what was previously a separate PARSE phase: arXiv-LaTeX papers come pre-structured in their tarball (no Docling needed), and PDF-fallback papers run Docling inside ACQUIRE itself to produce `document.md` + extracted figures/tables. SPECIFY folds in target formalization (previously a separate EXTRACT_TARGETS phase): the targets the user named in INTERVIEW become explicit `outputs:`, `findings:`, `inputs:`, and `decisions:` in `astra.yaml`, plus a small `targets/targets.md` ledger derived from them for COMPARE to check against.
 
 ### Per-phase mode (interactive vs sub-agent)
 
-A reproduction's most consequential decisions show up at known seams. The interview decides — for this paper — which phases run interactively (in the main loop session, the user can be reached via `AskUserQuestion`) and which delegate to a sub-agent (Task tool with fresh context, no user reach).
+A reproduction's most consequential decisions show up at known seams. Only the bookends are mandatory-interactive — INTERVIEW at the start, SUMMARIZE_RUN at the end. Every phase between them is configurable: the interview decides which run interactively (in the main loop session, the user reachable via `AskUserQuestion`) and which delegate to a sub-agent (Task tool with fresh context, no user reach).
 
 Defaults the constitution starts with:
 
-| Phase | Default | Why |
-|---|---|---|
-| ACQUIRE | user choice | Mostly mechanical; surfacing happens only on download failures. |
-| PARSE | user choice | Deterministic Docling / arXiv extraction. |
-| SUMMARIZE | sub-agent | Parallel paper + code reading benefits from fresh context per task. |
-| EXTRACT_TARGETS | user choice | The selection of replication targets is sometimes obvious, sometimes wants user input. |
-| LITERATURE | sub-agent | One sub-agent per cited paper — pure parallel grunt-work. |
-| SPECIFY | **interactive** | Material paper-vs-code conflicts surface here; the user must ratify. |
-| REVIEW | user choice | Pre-implement sanity check; can be either. |
-| IMPLEMENT | user choice | Mostly mechanical, but algorithm choices may want ratification. |
-| RUN | user choice | Mechanical, but failures need diagnosis. |
-| COMPARE | **interactive** | Verdict (was the reproduction close enough?) is the second mandatory user-ratification seam. |
-| SUMMARIZE_RUN | sub-agent | Final write-up to disk; no decisions remain. |
-| FINAL_REVIEW | **interactive** | Post-loop interactive return — runs `/figure-comparison` and (optionally) `/check-sentence-by-sentence`, then walks the user through `open-questions.md` with `AskUserQuestion` to ratify accumulated seams. |
+| # | Phase | Default | Why |
+|---|---|---|---|
+| 0 | INTERVIEW | **interactive — *always*** | The first bookend. Scope, runtime, rigor, per-phase mode all decided here. |
+| 1 | ACQUIRE | user choice | Mostly mechanical (LaTeX-tarball download / Docling fallback / code clone); surfacing happens only on download failures. |
+| 2 | STUDY | sub-agent (parallel by paper-section) | One sub-agent per paper section, each reading the section *together with* its matching code. The value is the section-level paper-vs-code agreement check; parallel fresh context fits naturally. |
+| 3 | LITERATURE | sub-agent | One sub-agent per cited paper — pure parallel grunt-work. Core, not opt-in: verifiability against citations is what `prior_insights` evidence depends on. |
+| 4 | SPECIFY | user choice (default interactive) | Material paper-vs-code conflicts and target-formalization happen here; the user usually wants to ratify. |
+| 5 | REVIEW | sub-agent (rigor-dialed) | Fresh-context sub-agent reads `astra.yaml` against paper + code and asks "is this consistent?" — frugal: skip or one pass; rigor: N rounds, each with a fresh reviewer + SPECIFY incorporating fixes. |
+| 6 | IMPLEMENT | sub-agent (rigor-dialed review iterations) | Writes recipes + scripts (parallelized by output where feasible). Frugal: minimal review pass after. Rigor: N rounds of "is the implementation consistent with the paper?" sub-agent review + fix iterations. |
+| 7 | RUN | user choice | Mechanical, but failures need diagnosis. |
+| 8 | COMPARE | user choice | Verdict (was the reproduction close enough?) is the user's call when interactive; sub-agent COMPARE writes the verdict and lets SUMMARIZE_RUN ratify. |
+| 9 | SUMMARIZE_RUN | **interactive — *always*** | The closing bookend. Drafts the report, runs `/figure-comparison` (mandatory) and `/check-sentence-by-sentence` (opt-in), walks `open-questions.md` with `AskUserQuestion`, finalizes the constitution outcome. |
 
 The constitution records the choice; iterations honor it. Sub-agent phases are spawned via the `Task` tool from inside the main loop session — that gives them fresh context but no user-reach. Interactive phases run inline in the loop session and may pause with `AskUserQuestion` at material seams.
 
+### Rigor vs frugality threads through REVIEW and IMPLEMENT
+
+The frugality/rigor dial picked in INTERVIEW is not just a termination criterion for the COMPARE → IMPLEMENT loop. It also tunes how aggressively REVIEW and IMPLEMENT self-check:
+
+- **Frugal**: REVIEW runs once or is skipped; IMPLEMENT does no extra review iterations after writing.
+- **Rigor**: REVIEW iterates — fresh-context sub-agent reads `astra.yaml` against paper + code; SPECIFY incorporates fixes; a *fresh* sub-agent re-reviews; repeat until two consecutive rounds find no fixes (or a configured cap is hit). IMPLEMENT does the same shape after writing recipes — sub-agent reads the implementation against the paper + code, fixes are incorporated, fresh sub-agent re-reviews.
+
+The discipline is **never bias the reviewing sub-agent**: each round runs from fresh context with the prompt "check the spec/implementation is consistent with the paper and the code" — not "here's what was just fixed; check it." Otherwise the reviewer pattern-matches on prior fixes rather than thinking from first principles.
+
 ### Code-as-canonical
 
 When the original codebase is available at `work/reference/code/`, **the agent reads relevant code on every iteration when implementing**. Where paper and code disagree, the **code is canonical** for numerics, plotting, and method; the agent continues with the code's behavior and either ratifies (interactive phases) or logs (sub-agent / loop phases) the disagreement so the user resolves at the next interactive seam.
 
 This is the load-bearing fidelity discipline. Without it, iterations drift to "looks right" rather than "matches" — the failure mode the first-paper test surfaced (plot styles off, numerical results off). The per-paper CLAUDE.md restates the rule so every iteration's Claude session walks up to it.
 
-### Two surfaces for user attention: open-questions and FINAL_REVIEW
+### Two surfaces for user attention: open-questions and SUMMARIZE_RUN
 
-The reproduction has two periods of human reach: the interview at the start, and FINAL_REVIEW at the end. In between, the loop runs without a human in the conversation. The discipline has two surfaces to match:
+The reproduction has two periods of human reach — the bookends. INTERVIEW at the start, SUMMARIZE_RUN at the end. In between, the loop runs without a human in the conversation. The discipline has two surfaces to match:
 
 - **`<paper-slug>/open-questions.md` — the during-loop accumulator.** When a sub-agent or loop iteration would normally surface a question to the user (paper-vs-code conflicts, figures whose intent isn't obvious, ambiguities the constitution doesn't resolve), it appends the question to `open-questions.md` and continues with the best-judgment default. Never block on `AskUserQuestion` from inside a sub-agent — the prompt fires into nothing.
 
-- **FINAL_REVIEW — the post-loop interactive return.** When the COMPARE→IMPLEMENT loop terminates (verdict=pass or budget exhausted) and SUMMARIZE_RUN has written the final summary, control returns to the user. FINAL_REVIEW invokes `/figure-comparison` and (optionally) `/check-sentence-by-sentence` interactively — these skills can use `AskUserQuestion` because the human is back. Then it walks the user through `open-questions.md` with `AskUserQuestion`, lands resolutions, updates `astra.yaml` or `implementation-notes.md` accordingly, and closes out the constitution outcome.
+- **SUMMARIZE_RUN — the post-loop interactive close-out.** When the COMPARE→IMPLEMENT loop terminates (verdict=pass or budget exhausted), control returns to the user. SUMMARIZE_RUN invokes `/figure-comparison` and (optionally) `/check-sentence-by-sentence` interactively — these skills can use `AskUserQuestion` because the human is back. Then it walks the user through `open-questions.md` with `AskUserQuestion`, lands resolutions, updates `astra.yaml` or `implementation-notes.md` accordingly, drafts `REPRODUCTION-SUMMARY.md`, and finalizes the constitution outcome.
 
 Stays in the conversation while the seams are still soft, walks away while the loop grinds, comes back to a rich review surface plus a list of "things you'd want to know."
 
@@ -161,7 +167,7 @@ Both choices land in `astra.yaml` as decision options. Whichever the user picks
 
 ### Resuming an in-flight reproduction
 
-If the workdir already exists (`work/reference/document.md` is present, `astra.yaml` exists, etc.):
+If the workdir already exists (`work/reference/source/` or `work/reference/document.md` is present, `astra.yaml` exists, etc.):
 
 1. **Skip the interview** unless the user explicitly wants to revise scope.
 2. Read the per-paper constitution if it exists; if it does not, draft a minimal one from the current workdir state.
@@ -171,16 +177,16 @@ Workdir signals (file existence implies the phase has been done):
 
 | Signal | Phase done |
 |---|---|
-| `work/reference/document.md` | ACQUIRE + PARSE |
-| `work/notes/methodology.md` | SUMMARIZE (paper) |
-| `work/notes/code-analysis.md` | SUMMARIZE (code) |
-| `targets/targets.md` | EXTRACT_TARGETS |
+| `work/reference/source/` (arXiv tarball) **or** `work/reference/document.md` (Docling fallback) | ACQUIRE |
+| `work/reference/code/` | ACQUIRE (code clone) |
+| `work/notes/study/<NN>-<section>.md` files | STUDY (per-section paper-vs-code agreement-check) |
+| `work/notes/methodology.md` | STUDY (consolidated decision map + results inventory) |
 | `work/notes/literature.yaml` | LITERATURE |
-| `astra.yaml` valid (`astra validate astra.yaml`) | SPECIFY |
-| `implementation-notes.md` | SPECIFY |
+| `astra.yaml` valid (`astra validate astra.yaml`) + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
 | recipes present in `astra.yaml` | IMPLEMENT |
 | `results/<universe>/<output>/` | RUN |
 | `comparison-report.yaml` | COMPARE |
+| `REPRODUCTION-SUMMARY.md` + `.lightcone/comparison.html` + resolved `open-questions.md` | SUMMARIZE_RUN |
 
 `git log --oneline` complements this — phase commits are the chronological view.
 
@@ -190,8 +196,8 @@ Workdir signals (file existence implies the phase has been done):
 - [`/ralph-loops`](../ralph-loops/SKILL.md) — for the bash-loop and tmux-orchestrated runtime modes
 - [`/managing-bibliography`](../managing-bibliography/SKILL.md) — for ACQUIRE
 - [`/narrative`](../narrative/SKILL.md) — for SPECIFY
-- [`/figure-comparison`](../figure-comparison/SKILL.md) — for FINAL_REVIEW (mandatory)
-- [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) — for FINAL_REVIEW (opt-in)
+- [`/figure-comparison`](../figure-comparison/SKILL.md) — for SUMMARIZE_RUN (mandatory)
+- [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) — for SUMMARIZE_RUN (opt-in)
 
 ## Discipline
 
@@ -201,7 +207,10 @@ Workdir signals (file existence implies the phase has been done):
 - **Use the up-to-date CLI surfaces, not skill-specific wrappers.** When `astra validate` already does the job, call it directly. Specifically: `astra validate <file>`, `astra validate --verify-evidence`, `astra paper add`. Use whatever the current `astra --help` surfaces.
 - **arxiv-LaTeX-first acquisition.** When the paper is on arxiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arxiv where there's no better source.
 - **The original code goes into `work/reference/code/`** during ACQUIRE when available, and stays there as the canonical reference for every subsequent iteration (see "Code-as-canonical" above).
-- **`/figure-comparison` and `/check-sentence-by-sentence` run inside FINAL_REVIEW, not inside the loop.** Both have `AskUserQuestion` in their `allowed-tools`; FINAL_REVIEW is the post-loop interactive phase that runs them in the main session so the prompts land. Don't try to spawn either under the `Task` tool from inside the loop.
+- **`/figure-comparison` and `/check-sentence-by-sentence` run inside SUMMARIZE_RUN, not inside the loop.** Both have `AskUserQuestion` in their `allowed-tools`; SUMMARIZE_RUN is the always-interactive close-out bookend that runs them in the main session so the prompts land. Don't try to spawn either under the `Task` tool from inside the loop.
+- **Only the bookends are mandatory-interactive.** INTERVIEW (start) and SUMMARIZE_RUN (close). Every other phase is configurable per the interview's per-phase mode choice — no "always interactive" flag on anything in between. The dial that does the heavy lifting on quality is rigor/frugality, threaded through REVIEW and IMPLEMENT.
+- **Don't bias review sub-agents.** REVIEW and IMPLEMENT's review iterations spawn fresh sub-agents whose prompt is "check `astra.yaml` (or the implementation) is consistent with the paper and the code" — never "here's what was just implemented or fixed last round." Each round runs from a fresh reviewing context. Otherwise the reviewer pattern-matches on prior fixes rather than thinking from first principles.
+- **STUDY parallelizes by paper-section, not by source.** A single sub-agent cannot hold the whole paper and the whole code in context at once, so it cannot compare them. A sub-agent that reads paper-section A *plus* the matching code (located via the bibliography or the code's own structure) is the right unit. Sub-agents fan out across the paper's sections; each one carries enough context to surface paper-vs-code disagreements at its own level.
 - **No synthetic data.** Unless the paper itself uses synthetic data as its input, every input dataset must be real (downloaded, queried, or fetched from a real archive). The implement phase reference repeats this; treat it as load-bearing.
 - **Tmux preferred-when-available, never required.** Modes (1) and (2) work without it.
 - **The siblings don't know about paper2astra.** Each SKILL stands on its own.
diff --git a/claude/lightcone/skills/paper2astra/references/acquire.md b/claude/lightcone/skills/paper2astra/references/acquire.md
index 93903655..e294ac2c 100644
--- a/claude/lightcone/skills/paper2astra/references/acquire.md
+++ b/claude/lightcone/skills/paper2astra/references/acquire.md
@@ -1,8 +1,8 @@
-# ACQUIRE — fetch the paper and code
+# ACQUIRE — fetch the paper, structure it, clone the code
 
-Acquire the paper's full text and (when available) its reference code repository. The bundle's primary acquisition path is **arXiv LaTeX source via `/managing-bibliography`**; PDF + Docling is the fallback for non-arXiv papers.
+Acquire the paper's full text, structure it for downstream consumption, and (when available) clone the reference code repository. The bundle's primary acquisition path is **arXiv LaTeX source via `/managing-bibliography`**; PDF + Docling is the fallback for non-arXiv papers. ACQUIRE folds in what was previously a separate PARSE phase — for arXiv-LaTeX, the structure is already in the tarball (no extra work); for the PDF fallback, ACQUIRE runs Docling itself.
 
-The constitution's per-phase mode controls whether this runs interactively or as a sub-agent. Default is sub-agent.
+The constitution's per-phase mode controls whether this runs interactively or as a sub-agent. Default is sub-agent — surfacing happens only on download failures.
 
 ## Inputs
 
@@ -11,13 +11,27 @@ The constitution's per-phase mode controls whether this runs interactively or as
 
 ## Outputs
 
-- `work/reference/document.md` — paper as markdown (LaTeX-rendered when arXiv source available; Docling-extracted for PDF fallback)
-- `work/reference/paper.pdf` — paper PDF (still needed for evidence verification via `astra validate --verify-evidence`)
-- `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (PARSE may move some of this to `work/reference/`)
+Two shapes depending on the acquisition path:
+
+**Path A — arXiv LaTeX source:**
+
+- `work/reference/source/` — extracted arXiv tarball (the canonical text source: `.tex`, `.bbl`, figure files, etc.)
+- `work/reference/paper.pdf` — paper PDF (kept as a backup for `astra validate --verify-evidence`)
+
+**Path B — PDF + Docling fallback:**
+
+- `work/reference/document.md` — paper as markdown (Docling-extracted)
+- `work/reference/figures/` — extracted figures
+- `work/reference/tables/` — extracted tables
+- `work/reference/metadata.json` — figure / table index with captions and page numbers
+- `work/reference/paper.pdf` — paper PDF
+
+**Both paths:**
+
 - `work/reference/code/` — clone of the code repo (or absent if not found)
 - `work/reference/code-status.yaml` — record of where the code came from
 
-## Step 1: Acquire the paper text
+## Step 1: Acquire and structure the paper text
 
 ### Path A — arXiv ID is available (preferred)
 
@@ -29,9 +43,15 @@ mkdir -p work/reference/source && cd work/reference/source && tar -xzf /tmp/<arx
 ls *.tex
 ```
 
-The LaTeX source gives clean equations, captions, tables, and bibliography — none of the math collapse, ligature artifacts, or caption flattening that plagues PDF extraction. Use the main `.tex` file as the primary text source. Render it to markdown if a downstream phase needs that form (`pandoc`, or just preserve TeX where it is).
+The LaTeX source gives clean equations, captions, tables, and bibliography — none of the math collapse, ligature artifacts, or caption flattening that plagues PDF extraction. **No conversion to markdown is needed.** Downstream phases (STUDY's section sub-agents, SPECIFY's evidence quotes) read `.tex` directly — Claude reads LaTeX fine, and rendering it to markdown only loses information. The tarball stays as `work/reference/source/`.
+
+If you want to identify the main `.tex` file for downstream tools:
+
+```bash
+grep -l '\\documentclass' work/reference/source/*.tex
+```
 
-Also cache the paper for ASTRA's evidence-verification surface:
+Cache the paper for ASTRA's evidence-verification surface:
 
 ```bash
 astra paper add 10.48550/arXiv.<arxiv-id>
@@ -40,6 +60,8 @@ cp "$(astra paper path 10.48550/arXiv.<arxiv-id>)" work/reference/paper.pdf
 
 `astra paper add` for arXiv DOIs fetches the PDF directly. The PDF stays as a backup for `astra validate --verify-evidence`, even though the LaTeX source is the primary text.
 
+There is no PARSE step on Path A. Equation numbers, section numbers, and figure references are all preserved in the source. STUDY's sub-agents resolve `\ref{}` against `\label{}` directly in the source tree.
+
 ### Path B — non-arXiv paper (PDF + Docling fallback)
 
 ```bash
@@ -52,13 +74,36 @@ The `file` output must say "PDF document". If it says "HTML document" or anythin
 
 If a valid PDF cannot be obtained, write a clear error to `work/reference/acquire-error.txt` and stop.
 
-Skip Step 1 if `work/reference/paper.pdf` already exists and is a valid PDF.
+Then run Docling to structure the PDF — without this, downstream phases have nothing to read but the raw PDF:
+
+```bash
+docling --output work/reference work/reference/paper.pdf
+```
+
+Docling writes `document.md`, `figures/`, `tables/`, and `metadata.json` directly into `work/reference/`. The `metadata.json` index has the shape:
+
+```json
+{
+  "figures": [
+    {"id": "fig1", "caption": "...", "file": "figures/fig1.pdf", "label": "fig:bao"}
+  ],
+  "tables": [
+    {"id": "tab1", "caption": "...", "file": "tables/tab1.csv", "label": "tab:results"}
+  ]
+}
+```
+
+The `label` field is the source label (where Docling can extract it) so SPECIFY's anchor work can reference the same artifact.
+
+If Docling fails, the PDF may be corrupt — re-download before giving up.
+
+Skip Step 1 if the path's outputs already exist (`work/reference/source/` for Path A, `work/reference/document.md` for Path B).
 
 ## Step 2: Search for the code repository
 
 This step matters more than its size suggests. When `work/reference/code/` exists, every implementing iteration treats it as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md). Without it, iterations have only the paper to anchor to and drift toward "looks right" rather than "matches."
 
-1. Search the paper text for repository URLs — abstract, intro, conclusion, footnotes, "Code Availability" or "Data Availability" sections.
+1. Search the paper text for repository URLs — abstract, intro, conclusion, footnotes, "Code Availability" or "Data Availability" sections. (Path A: grep across `work/reference/source/*.tex`. Path B: grep `work/reference/document.md`.)
 2. If none found, web search: paper title + "github", Papers With Code, or the first author's GitHub profile.
 3. Clone if found:
    ```bash
@@ -78,10 +123,16 @@ Skip Step 2 if `work/reference/code/` already exists.
 
 ## Survey signals (entry into ACQUIRE)
 
-Run `ls work/reference/` first. If `paper.pdf` and `document.md` (or `source/` for arXiv) are present, ACQUIRE is done. If only `paper.pdf` is present, PARSE handles the rest. If nothing is there, run ACQUIRE.
+Run `ls work/reference/` first.
+
+- If `paper.pdf` is present and either `source/` (Path A) or `document.md` (Path B) is also present, ACQUIRE is done — proceed to STUDY.
+- If `paper.pdf` is present but neither structure exists, run the structuring step for the appropriate path.
+- If nothing is there, run the full ACQUIRE.
 
 ## Notes
 
 - **arXiv DOI form is `10.48550/arXiv.<id>`.** `astra paper add` accepts that form directly.
 - **Journal DOIs that 403 on Unpaywall** can be aliased to a locally-downloaded arXiv preprint via `astra paper add <JOURNAL_DOI> --pdf <path-to-arxiv-pdf>`.
-- This phase's job is acquisition, not understanding. Do not start summarizing the paper here — that's SUMMARIZE.
+- **Path A is preferred whenever arXiv source is acquirable.** Math, ligatures, and caption fidelity all come through clean from the LaTeX source; PDF + Docling is the fallback for non-arXiv papers, where no better source exists. The acquisition layer's ASTRA-side counterpart (`astra paper add` preferring LaTeX over PDF for the verification cache, and applying the same logic to bibliography references) is filed as a separate ASTRA issue; paper2astra inherits the improvement once it lands.
+- **Equation numbers and section numbers must match the rendered paper.** On Path A, the printed numbers are those produced by compiling the source tarball (check `work/reference/paper.pdf` if uncertain). On Path B, Docling preserves printed numbers in its markdown output. When citing "eq. N" or "§N" in any downstream phase, find the equation or heading by content, not by a naïve count of TeX blocks or markdown headings.
+- This phase's job is acquisition + structuring, not understanding. Do not start summarizing or comparing the paper here — that's STUDY.
diff --git a/claude/lightcone/skills/paper2astra/references/extract_targets.md b/claude/lightcone/skills/paper2astra/references/extract_targets.md
deleted file mode 100644
index 6af2c22b..00000000
--- a/claude/lightcone/skills/paper2astra/references/extract_targets.md
+++ /dev/null
@@ -1,61 +0,0 @@
-# EXTRACT_TARGETS — pick the replication targets
-
-Take the results inventory from SUMMARIZE and select the concrete figures, tables, and metrics the reproduction will iterate against. Build a self-contained `targets/` directory the COMPARE phase will measure against.
-
-The constitution's per-phase mode is **user choice** for this phase — defaults to sub-agent. The selection of replication targets is sometimes obvious (paper has 3 primary figures) and sometimes wants user input (which sub-analyses are in scope).
-
-## Inputs
-
-- `work/notes/methodology.md` — has the results inventory split into primary / secondary
-- `work/reference/metadata.json` — index of figures and tables with captions
-- `work/reference/figures/`, `work/reference/tables/` — the actual extracted artifacts
-
-## Outputs
-
-- `targets/targets.md` — the target ledger
-- `targets/<file>` — copies of selected reference files (figures, tables) so `targets/` is self-contained
-
-## Step 1: Read the results inventory
-
-Read `work/notes/methodology.md`. The results inventory section already separates primary from secondary results and notes which decisions feed into each. **Use this as your starting point** — do not re-analyze the paper from scratch.
-
-## Step 2: Select replication targets
-
-For each result in the inventory, find the corresponding figure, table, or in-text metric in `work/reference/`. Apply the constitution's scope:
-
-- **Primary results should almost always be included.** The constitution's Desired State names them.
-- **Secondary results** should be included only if they are useful checkpoints along the pipeline (i.e., if getting them right helps verify intermediate steps).
-- **Targeted reproduction** (per the constitution): include only the targets in scope. Mark out-of-scope primary results in `targets.md` with a reason.
-
-## Step 3: Populate `targets/`
-
-The `targets/` directory is the self-contained reference set the COMPARE phase consumes.
-
-1. **Copy relevant reference files** from `work/reference/figures/` and `work/reference/tables/` into `targets/`. Only copy the files corresponding to selected targets — not everything.
-
-2. **Write `targets/targets.md`.** For each target, a brief entry:
-
-   - What it is and where its reference file lives in `targets/`
-   - Expected values / trends and how to judge if a reproduction matches
-   - Which decisions from the decision map feed into this result
-   - Whether reference code covers this computation (from `code-analysis.md` if present)
-   - Priority: `primary` or `secondary`
-
-   Keep entries brief — a few lines per target, not paragraphs.
-
-## Rules
-
-- All paths in `targets/targets.md` are relative to `targets/`.
-- For figures: describe scientific content, not just "a plot" — name the panels, the axis ranges, the qualitative shape.
-- For tables: note which specific values matter most.
-- For metrics: quote the exact value from the paper text (with the section / equation / sentence reference).
-
-## Survey signals (entry into EXTRACT_TARGETS)
-
-- `work/notes/methodology.md` exists ⇒ ready to extract targets
-- `targets/targets.md` exists and reference files have been copied ⇒ EXTRACT_TARGETS done
-
-## Notes
-
-- **Targets are coverage obligations, not the spec.** SPECIFY maps each target to its appropriate ASTRA home — outputs for artifacts, findings for claims, inputs / decisions / universe defaults for constants. EXTRACT_TARGETS' job is the ledger; SPECIFY's job is the structural placement.
-- **Out-of-scope targets stay in `targets.md`** with an explicit reason, not silently dropped. The constitution's scope is the source of truth for what's in.
diff --git a/claude/lightcone/skills/paper2astra/references/final_review.md b/claude/lightcone/skills/paper2astra/references/final_review.md
deleted file mode 100644
index 9ff7e513..00000000
--- a/claude/lightcone/skills/paper2astra/references/final_review.md
+++ /dev/null
@@ -1,67 +0,0 @@
-# FINAL_REVIEW — interactive post-loop ratification
-
-The COMPARE → IMPLEMENT loop has terminated (verdict `pass` or attempt budget exhausted). SUMMARIZE_RUN has written `REPRODUCTION-SUMMARY.md`. Control now returns to the user; FINAL_REVIEW runs **inside the main loop session, not as a sub-agent**, so `AskUserQuestion` actually reaches a human. This is the post-loop ratification seam — validation surfaces (`/figure-comparison`, optional `/check-sentence-by-sentence`) plus the accumulated `open-questions.md` get walked through and resolved before the constitution is closed.
-
-The constitution's per-phase mode is **always interactive** for this phase. The user must be reachable.
-
-## Inputs
-
-- `<paper-slug>/REPRODUCTION-SUMMARY.md` — final report from SUMMARIZE_RUN
-- `comparison-report.{yaml,md}` — final verdict
-- `<paper-slug>/open-questions.md` — accumulated questions from sub-agent / loop phases
-- `<paper-slug>/<constitution>.md` — its `outcome:` field needs a final rewrite
-- `astra.yaml` — may need targeted edits as questions resolve
-- `implementation-notes.md` — may absorb resolutions that don't belong in `astra.yaml`
-
-## Outputs
-
-- `.lightcone/comparison.html` — self-contained side-by-side report (from `/figure-comparison`)
-- `<paper-slug>/sentence-audit.md` (or wherever `/check-sentence-by-sentence` lands its report) — *optional*
-- `<paper-slug>/open-questions.md` with every entry marked resolved (or explicitly deferred with a reason)
-- `astra.yaml` and/or `implementation-notes.md` updated where resolutions changed a decision or added a gotcha
-- Updated `outcome:` on the constitution
-- A final commit naming the FINAL_REVIEW pass
-
-## Task
-
-1. **Open the report.** Read `REPRODUCTION-SUMMARY.md`. Skim `comparison-report.md`. The agent's job in this phase is to surface the right things to the user — not to re-derive what SUMMARIZE_RUN already concluded.
-
-2. **Invoke `/figure-comparison`.** This is the rich validation surface — base64-embedded HTML side-by-sides for every paper artifact versus its reproduced counterpart. The skill prompts the user for any missing inputs (universe choice, paper-reference path) via its own `AskUserQuestion`. Land the HTML at `.lightcone/comparison.html` and surface the path to the user.
-
-3. **Offer `/check-sentence-by-sentence`.** Ask the user via `AskUserQuestion`:
-
-   > *"Run sentence-by-sentence audit of the paper against the code? (Slow but catches claims that drifted between paper and reproduction.)"*
-
-   On yes: invoke `/check-sentence-by-sentence`. The skill prompts for paper-source path (arXiv TeX preferred, Docling markdown fallback). It produces a per-sentence `file:line` or `NOT FOUND` audit. Surface the audit path and any `NOT FOUND` clusters that suggest missing implementation.
-
-4. **Walk `open-questions.md` with the user.** For every unresolved entry, surface via `AskUserQuestion`:
-
-   > *"Open question: <question text>. The loop's best-judgment default was <default>. Accept, override, or defer?"*
-
-   - **Accept**: mark resolved with the default; record the resolution in the entry.
-   - **Override**: take the user's choice; update `astra.yaml` (decision options, baseline universe) or `implementation-notes.md` accordingly. Re-run `astra validate` if the spec changed.
-   - **Defer**: leave the entry but mark it `deferred: <reason>` so it's clearly not forgotten.
-
-   Some questions surface a real gap (a target wasn't reproduced, a method differs in a way that matters). When the gap is material, ask whether to re-enter the loop for another COMPARE → IMPLEMENT pass. The user owns that call.
-
-5. **Rewrite the constitution `outcome:`.** The SUMMARIZE_RUN sub-agent prepared a draft outcome; refine it with what FINAL_REVIEW surfaced — accepted partials, deferred questions, the `/figure-comparison` HTML path, the audit path if run. The outcome should teach: someone reading it should understand what the reproduction landed and where the rough edges are without opening the body.
-
-6. **Final commit.** Stage `.lightcone/comparison.html`, the audit (if run), the resolved `open-questions.md`, any `astra.yaml` / `implementation-notes.md` edits, and the constitution outcome:
-
-   ```
-   final_review: <paper-short-name> — N questions resolved, comparison.html rendered[, sentence audit completed]
-   ```
-
-7. **Surface closure to the user.** The constitution is now in shape for `status: closed`. Do not flip it from this phase — surface that it's ready, the user closes.
-
-## Survey signals (entry into FINAL_REVIEW)
-
-- `comparison-report.yaml` verdict is `pass` (or user-accepted `partial`) **and** `REPRODUCTION-SUMMARY.md` exists ⇒ ready to enter FINAL_REVIEW
-- `.lightcone/comparison.html` exists, `open-questions.md` entries are all resolved or explicitly deferred, constitution `outcome:` reflects the post-review state ⇒ FINAL_REVIEW done
-
-## Notes
-
-- **This phase is not a sub-agent.** `/figure-comparison` and `/check-sentence-by-sentence` both have `AskUserQuestion` in their `allowed-tools`; spawning them under the `Task` tool would fire prompts into nothing. FINAL_REVIEW runs in the main loop session so the prompts land. The constitution lists FINAL_REVIEW as `interactive` for the same reason.
-- **Don't relitigate SUMMARIZE_RUN.** The final report is the user's reading surface for "what landed." FINAL_REVIEW's job is the rich validation pass and the open-question ratification — not regenerating prose the sub-agent already produced cleanly.
-- **`/figure-comparison` is mandatory; `/check-sentence-by-sentence` is opt-in.** The HTML side-by-side is cheap and high-signal; the sentence audit is slower and pays off most when the user has fidelity concerns. Default opt-in question: no.
-- **The user holds closure.** This phase prepares the outcome and surfaces "ready"; flipping `status: closed` is the user's call after they're satisfied with what FINAL_REVIEW surfaced.
diff --git a/claude/lightcone/skills/paper2astra/references/interview.md b/claude/lightcone/skills/paper2astra/references/interview.md
index 9493ffa5..4984685a 100644
--- a/claude/lightcone/skills/paper2astra/references/interview.md
+++ b/claude/lightcone/skills/paper2astra/references/interview.md
@@ -33,8 +33,8 @@ Use `AskUserQuestion` if the user did not supply enough on `/paper2astra` invoca
 
 - **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).
 - **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) **If code is available, every implementing iteration will read from `work/reference/code/`** and treat code as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md).
-- **User's prior familiarity.** Has the user reproduced this paper before? Read the paper recently? Worked with the original authors? This affects how much of the SUMMARIZE / EXTRACT_TARGETS work needs human ratification.
-- **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; SUMMARIZE will read it.
+- **User's prior familiarity.** Has the user reproduced this paper before? Read the paper recently? Worked with the original authors? This affects how much of the STUDY / SPECIFY work needs human ratification.
+- **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; STUDY will read it.
 
 ### 2. Scope the reproduction
 
@@ -46,7 +46,7 @@ Ask:
 - **Specific decisions of interest.** A paper makes many choices. The user may care most about a few — e.g. "I want the BAO fit to use a different damping prior than the paper." These become first-class decisions in the spec, with the alternative preserved as a sibling option.
 - **Sub-analysis structure.** Does the paper have genuinely independent stages (e.g. reconstruction → clustering → BAO fit)? If so, the spec wants sub-analyses; SPECIFY will mirror the structure. If the paper is monolithic, one analysis suffices.
 
-These answers live in the constitution's **Desired State** section.
+These answers live in the constitution's **Desired State** section. There is no separate target-extraction phase — the targets the user names here become explicit `outputs:`, `findings:`, `inputs:`, and `decisions:` in `astra.yaml` during SPECIFY.
 
 ### 3. Pick a runtime mode
 
@@ -68,20 +68,21 @@ If tmux isn't installed, only (1) and (2) appear in the question. The chosen mod
 
 Ask:
 
-- **Weak (frugal):** "run until the checklist of tasks has been completed." Cheaper. Susceptible to one-shot oversights.
-- **Strong (rigorous):** "run until you can't find any further contributions, fixes, or improvements that align with the goal." Almost always catches mistakes the one-shot left behind, but burns more tokens.
+- **Weak (frugal):** "run until the checklist of tasks has been completed." Cheaper. Susceptible to one-shot oversights. REVIEW skips or runs once; IMPLEMENT does no extra review iterations.
+- **Strong (rigorous):** "run until you can't find any further contributions, fixes, or improvements that align with the goal." Almost always catches mistakes the one-shot left behind, but burns more tokens. REVIEW runs N rounds — each round a fresh sub-agent reads `astra.yaml` against paper + code, fixes are incorporated, then a *fresh* sub-agent re-reviews; iterate until two consecutive rounds find no fixes. IMPLEMENT does the same shape after writing recipes.
 
-Default to strong for fidelity-critical reproductions; weak when the user wants to cap token spend. The choice goes into the per-paper constitution.
+Default to strong for fidelity-critical reproductions; weak when the user wants to cap token spend. The choice goes into the per-paper constitution and is read by both REVIEW and IMPLEMENT.
 
 ### 5. Choose interactive vs sub-agent per phase
 
 Read the "Per-phase mode" table in `../SKILL.md`. The defaults are reasonable. Walk the user through it briefly:
 
-- **Phases that are always interactive (defaults you should not flip):** SPECIFY, COMPARE. These are the ratification seams; the user has to be reachable.
-- **Phases that are always sub-agent (defaults you should not flip):** SUMMARIZE, LITERATURE, SUMMARIZE_RUN. These benefit from parallel fresh-context runs and have no decisions left.
-- **Phases the user chooses:** ACQUIRE, PARSE, EXTRACT_TARGETS, REVIEW, IMPLEMENT, RUN. These may want user attention if the paper is unfamiliar or the user has strong opinions about implementation.
+- **The two bookends are always interactive:** INTERVIEW (now) and SUMMARIZE_RUN (the close-out). These are the only phases in which the user must be reachable; the mode of every other phase is the user's call.
+- **Phases whose defaults are sub-agent (parallel fresh context fits the work):** STUDY (parallelized by paper-section + matching code), LITERATURE (one sub-agent per cited paper), REVIEW (rigor-dialed; fresh-context reviewers per round), IMPLEMENT (recipe-writing parallelized by output where feasible, with rigor-dialed review iterations after).
+- **Phases whose default is interactive:** SPECIFY (material paper-vs-code conflicts and target-formalization want ratification).
+- **Phases the user genuinely chooses:** ACQUIRE, RUN, COMPARE. These can run either way without losing the surface that matters most.
 
-If the user has no opinion, take the defaults. The choice goes into the constitution's **Context** section as a per-phase mode table. Phases marked sub-agent that hit a question they'd normally surface to the user **append the question to `<paper-slug>/open-questions.md`** rather than blocking; the user reads the running report at session boundaries.
+If the user has no opinion, take the defaults. The choice goes into the constitution's **Context** section as a per-phase mode table. Phases marked sub-agent that hit a question they'd normally surface to the user **append the question to `<paper-slug>/open-questions.md`** rather than blocking; the user resolves them in SUMMARIZE_RUN.
 
 ### 6. Draft the constitution and CLAUDE.md
 
@@ -124,35 +125,34 @@ The COMPARE → IMPLEMENT loop iterates until verdict is `pass` or the attempt b
 
 ## Per-phase mode
 
-| Phase | Mode |
-|---|---|
-| ACQUIRE | <per user> |
-| PARSE | <per user> |
-| SUMMARIZE | sub-agent |
-| EXTRACT_TARGETS | <per user> |
-| LITERATURE | sub-agent |
-| SPECIFY | interactive |
-| REVIEW | <per user> |
-| IMPLEMENT | <per user> |
-| RUN | <per user> |
-| COMPARE | interactive |
-| SUMMARIZE_RUN | sub-agent |
-| FINAL_REVIEW | interactive |
+| # | Phase | Mode |
+|---|---|---|
+| 0 | INTERVIEW | interactive (always) |
+| 1 | ACQUIRE | <per user> |
+| 2 | STUDY | sub-agent (parallel by paper-section) |
+| 3 | LITERATURE | sub-agent |
+| 4 | SPECIFY | interactive |
+| 5 | REVIEW | sub-agent (rigor-dialed) |
+| 6 | IMPLEMENT | sub-agent (rigor-dialed review iterations) |
+| 7 | RUN | <per user> |
+| 8 | COMPARE | <per user> |
+| 9 | SUMMARIZE_RUN | interactive (always) |
 
 ## Evidence
 
-- `ls work/reference/document.md` — ACQUIRE + PARSE done
+- `ls work/reference/source/ || ls work/reference/document.md` — ACQUIRE done (arxiv-LaTeX tarball or Docling fallback)
 - `ls work/reference/code/` — original code present (canonical reference)
-- `ls work/notes/methodology.md` — SUMMARIZE done
-- `ls targets/targets.md` — EXTRACT_TARGETS done
-- `ls astra.yaml && astra validate astra.yaml` — SPECIFY done and valid
+- `ls work/notes/study/*.md && ls work/notes/methodology.md` — STUDY done (per-section paper-vs-code agreement-check + consolidated methodology)
+- `ls work/notes/literature.yaml` — LITERATURE done
+- `ls astra.yaml && astra validate astra.yaml && ls targets/targets.md && ls implementation-notes.md` — SPECIFY done (target-formalization included)
 - `astra validate astra.yaml --verify-evidence` — evidence quotes match source PDFs
 - `ls comparison-report.yaml && yq '.verdict' comparison-report.yaml` — most-recent COMPARE verdict
+- `ls REPRODUCTION-SUMMARY.md && ls .lightcone/comparison.html` — SUMMARIZE_RUN done
 - `git log --oneline` — chronological view of phase commits
 
 ## Open Questions
 
-(empty — populated as the loop runs; questions accrete in `<paper-slug>/open-questions.md`, the running report the user reads at session boundaries and ratifies in FINAL_REVIEW.)
+(empty — populated as the loop runs; questions accrete in `<paper-slug>/open-questions.md`, the running report the user resolves in SUMMARIZE_RUN before the constitution closes.)
 ```
 
 Then author the per-paper `<paper-slug>/CLAUDE.md` from the same conversation. The CLAUDE.md is *info and rules*, not desired state — paper identity, where things live, disciplines that always apply. Approximate shape:
@@ -172,12 +172,12 @@ Reproduction of <paper title> (<arXiv ID>). DOI: <doi>.
 
 - Workdir layout follows Paper2ASTRA conventions: `work/reference/`, `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`.
 - The constitution (desired state, runtime mode, scope, evidence, per-phase mode) lives at `<constitution>.md` in this directory.
-- The during-loop questions log lives at `open-questions.md`. The user reviews it in FINAL_REVIEW.
+- The during-loop questions log lives at `open-questions.md`. The user reviews it in SUMMARIZE_RUN.
 
 ## Rules
 
 - **Code-as-canonical when `work/reference/code/` exists.** Every implementing iteration reads relevant code. Where paper and code disagree, code is canonical for numerics, plotting, and method.
-- **Never block on `AskUserQuestion` mid-sub-agent.** When a sub-agent or loop phase would surface a question to the user, append it to `open-questions.md` and continue with the best-judgment default. The user resolves in FINAL_REVIEW.
+- **Never block on `AskUserQuestion` mid-sub-agent.** When a sub-agent or loop phase would surface a question to the user, append it to `open-questions.md` and continue with the best-judgment default. The user resolves in SUMMARIZE_RUN.
 - **arxiv-LaTeX-first acquisition.** PDF + Docling is a fallback for non-arxiv only.
 - **`astra validate --verify-evidence`** is the fidelity gate; evidence quotes must match source PDFs.
 
diff --git a/claude/lightcone/skills/paper2astra/references/parse.md b/claude/lightcone/skills/paper2astra/references/parse.md
deleted file mode 100644
index b14c2d78..00000000
--- a/claude/lightcone/skills/paper2astra/references/parse.md
+++ /dev/null
@@ -1,79 +0,0 @@
-# PARSE — structure the paper
-
-Turn the acquired paper into structured artifacts the rest of the pipeline can consume: markdown text, individual figures, individual tables, and a metadata index. This is mostly a deterministic pre-processing step.
-
-The constitution's per-phase mode controls interactive vs sub-agent. Default is sub-agent.
-
-## Inputs
-
-- `work/reference/source/` — arXiv LaTeX source tree (Path A from ACQUIRE), or
-- `work/reference/paper.pdf` — PDF (Path B fallback)
-
-## Outputs
-
-- `work/reference/document.md` — paper as markdown
-- `work/reference/figures/` — extracted figures (PNG / PDF / vector)
-- `work/reference/tables/` — extracted tables (CSV when machine-readable, MD otherwise)
-- `work/reference/metadata.json` — index of figures and tables with captions and page numbers
-
-## Path A — arXiv LaTeX source (when `work/reference/source/` exists)
-
-The LaTeX source is already structured — sections are `\section{}`, equations are TeX, figures cite their files by name, tables are `tabular` environments. Convert to markdown while preserving equation TeX:
-
-```bash
-# Find the main file (usually has \documentclass at the top)
-grep -l '\\documentclass' work/reference/source/*.tex
-
-# Convert with pandoc, preserving math and structure
-pandoc -f latex -t markdown -o work/reference/document.md work/reference/source/<main>.tex
-```
-
-Adjust pandoc invocation if the main file uses `\input{}` heavily — pandoc resolves them when run from the right cwd. Verify the output by reading the first ~200 lines and checking the section structure looks sensible.
-
-Extract figure files from the source tree into `work/reference/figures/`:
-
-```bash
-mkdir -p work/reference/figures
-# Copy referenced figure files; common extensions are .pdf .png .eps .jpg
-find work/reference/source -type f \( -name "*.pdf" -o -name "*.png" -o -name "*.eps" -o -name "*.jpg" \) \
-    -not -path "*/aux/*" -exec cp {} work/reference/figures/ \;
-```
-
-For tables, the LaTeX `tabular` blocks remain as TeX inside the rendered markdown. If a downstream phase needs them as CSV, extract them on demand.
-
-Build `work/reference/metadata.json` — index of figures and tables. The structure:
-
-```json
-{
-  "figures": [
-    {"id": "fig1", "caption": "...", "file": "figures/fig1.pdf", "label": "fig:bao"}
-  ],
-  "tables": [
-    {"id": "tab1", "caption": "...", "file": "tables/tab1.csv", "label": "tab:results"}
-  ]
-}
-```
-
-The `label` field is the LaTeX `\label{}` so SPECIFY's anchor work and EXTRACT_TARGETS' selection can both reference the same artifact.
-
-## Path B — PDF fallback (when `work/reference/source/` does not exist)
-
-Use Docling — the lightcone-cli stack ships its CLI:
-
-```bash
-# Run Docling against the PDF; outputs into work/reference/
-docling --output work/reference work/reference/paper.pdf
-```
-
-Docling produces `document.md`, `figures/`, `tables/`, and `metadata.json` with the same shape Path A produces.
-
-If Docling fails, the PDF may be corrupt — re-run ACQUIRE's download step before giving up.
-
-## Survey signals (entry into PARSE)
-
-If `work/reference/document.md` exists and `work/reference/metadata.json` exists, PARSE is done — proceed to SUMMARIZE.
-
-## Notes
-
-- **Path A is preferred whenever arXiv source was acquired.** PDF + Docling is the fallback for non-arXiv papers, not the default. The bundle's design philosophy is that math, ligatures, and caption fidelity are easier from LaTeX source than from re-extracted PDF text.
-- **Equation numbers and section numbers must match the rendered paper.** Whether you use Path A or Path B, downstream phases (SPECIFY's evidence quotes, COMPARE's references) cite "eq. N" or "§N" by the printed number. Verify by spot-checking against the PDF.
diff --git a/claude/lightcone/skills/paper2astra/references/specify.md b/claude/lightcone/skills/paper2astra/references/specify.md
index 0eafea0a..0819646f 100644
--- a/claude/lightcone/skills/paper2astra/references/specify.md
+++ b/claude/lightcone/skills/paper2astra/references/specify.md
@@ -1,25 +1,26 @@
-# SPECIFY — author the ASTRA spec
+# SPECIFY — author the ASTRA spec (and formalize the targets)
 
-Read the paper and accumulated notes; produce the structured ASTRA spec, the baseline universe, and the implementation notes. SPECIFY is the **first mandatory user-ratification seam** — material paper-vs-code conflicts surface here and require user input.
+Read the paper and accumulated notes; produce the structured ASTRA spec, the baseline universe, the implementation notes, and the small target ledger COMPARE consumes. SPECIFY is the **first user-ratification seam** — material paper-vs-code conflicts surface here, target-formalization happens here, and the default mode is interactive so the user can ratify.
 
-The constitution's per-phase mode is **always interactive** for this phase. The user must be reachable.
+The constitution's per-phase mode defaults to **interactive** for this phase, but the user can flip it. When SPECIFY runs as a sub-agent, it falls back to the canonical-resolution rule (code wins where paper and code disagree) and surfaces unresolved conflicts to `<paper-slug>/open-questions.md`.
 
 ## Inputs
 
-- `work/notes/methodology.md` — decision map, results inventory, data sources
-- `work/notes/code-analysis.md` (if present) — code structure, parameter values
-- `work/notes/literature.yaml` (if present) — prior insights with evidence quotes and decision links
-- `work/reference/document.md` — paper text (Grep into; do not re-read whole)
-- `work/reference/figures/`, `work/reference/tables/` — extracted artifacts
-- `work/reference/metadata.json` — figure / table index
-- `targets/targets.md` — selected replication targets
+- `work/notes/methodology.md` — consolidated decision map, results inventory, data sources (from STUDY)
+- `work/notes/study/<NN>-<slug>.md` — per-section paper-vs-code agreement-check files (the source of truth for evidence quotes and code locations; methodology.md points back to these)
+- `work/notes/literature.yaml` (if present) — prior insights with evidence quotes and decision links (from LITERATURE)
+- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
+- `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (Path B only; Path A keeps figures inside the source tarball)
+- `work/reference/code/` (if present) — original code, canonical reference for numerics + method
+- The per-paper constitution — names the user's intended replication targets (figures, tables, numbers) in its **Desired State**; SPECIFY formalizes them
 - `work/notes/notes.md` — user-supplied context (read by every phase if present)
 
 ## Outputs
 
-1. **`astra.yaml`** — the full ASTRA specification
+1. **`astra.yaml`** — the full ASTRA specification, with every replication target placed into its appropriate ASTRA home (see "Target formalization" below)
 2. **`universes/baseline.yaml`** — exactly the paper's choices (where paper and code disagree, see "Material conflicts" below)
 3. **`implementation-notes.md`** — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
+4. **`targets/targets.md`** — small target ledger COMPARE consumes; for each target a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure/table/metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
 
 ## Substrate skills to invoke
 
@@ -37,14 +38,28 @@ Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for t
 
 If `work/notes/literature.yaml` exists, incorporate its `prior_insights` into `astra.yaml`. Use the `decision_links` mapping to attach each insight to the relevant decision options, so the multiverse captures evidence-backed alternative choices from the literature.
 
-## Target coverage
+## Target formalization
 
-Targets are coverage obligations, not necessarily outputs. Map each target to the right ASTRA home:
+There is no separate target-extraction phase — the targets the user named in INTERVIEW (recorded in the constitution's **Desired State**) get formalized into `astra.yaml`'s structure here. The work has two layers:
+
+**Layer 1 — place each target into its ASTRA home.** Targets are coverage obligations, not necessarily outputs. Map each target to the right ASTRA home:
 
 - **Figures, tables, equations-as-artifacts, generated data products** → `outputs`
 - **Paper-level claims and quantitative results** → `findings` with source-anchored evidence
 - **Constants and configuration values** → `inputs`, `decisions`, `universes/baseline.yaml`
 
+The methodology.md "Results inventory" already split primary vs secondary; use that split to set priorities. For each result in the inventory, find the corresponding figure / table / in-text metric (Path A: `\label{}` in the source; Path B: `metadata.json` index) and place it. Read the per-section files in `work/notes/study/` for the verbatim claim quotes — those become the `findings` evidence.
+
+**Layer 2 — write `targets/targets.md` as a small ledger for COMPARE.** It is only an index into the spec, not a second copy of it; the depth lives in `astra.yaml`. For each target, a brief entry:
+
+- What it is (one line); the reference file's path (relative to `targets/` when the file is copied into `targets/`, or pointing at `work/reference/figures/...` when not)
+- Type: `metric` | `figure` | `table`
+- Priority: `primary` | `secondary` (from the methodology.md split)
+- Expected value / trend (paper-side); how to judge a match (numerical tolerance for metrics; shape / axis ranges / key features for figures; specific values for tables)
+- Spec home: which `outputs:` entry in `astra.yaml` this target maps to, so COMPARE can find the reproduced result at `results/<universe>/<output_id>/`
+
+Copy reference figure / table files from `work/reference/` into `targets/` so COMPARE has a self-contained reference set. For Path A, files are in `work/reference/source/` (extract by `\includegraphics{}` filename); for Path B, in `work/reference/figures/` / `work/reference/tables/`.
+
 Out-of-scope targets stay in `targets/targets.md` with an explicit reason and should not be forced into the spec. Keep the target ledger's "spec home" pointers specific enough that a later reviewer can tell which claim was discharged where.
 
 ---
@@ -99,12 +114,14 @@ When sub-analyses exist, the root narrative MUST include a top-down end-to-end d
 
 ## Survey signals (entry into SPECIFY)
 
-- `work/notes/methodology.md` exists; `targets/targets.md` exists ⇒ ready to specify
+- `work/notes/methodology.md` exists ⇒ ready to specify
 - `astra.yaml` exists; `astra validate astra.yaml` returns clean ⇒ structural SPECIFY done
+- `targets/targets.md` exists with each entry mapped to a spec home ⇒ target-formalization done
 - `implementation-notes.md` exists ⇒ practical-guidance side done
-- Both ⇒ SPECIFY complete; proceed to REVIEW
+- All four ⇒ SPECIFY complete; proceed to REVIEW
 
 ## Notes
 
-- **Material conflicts that the user explicitly defers** are appended to `<paper-slug>/open-questions.md` (the running report read at session boundaries). The next iteration sees them and either re-surfaces them or notes their continued deferral.
+- **Material conflicts that the user explicitly defers** are appended to `<paper-slug>/open-questions.md` (the running report read at session boundaries). The next iteration sees them and either re-surfaces them or notes their continued deferral; the user resolves at SUMMARIZE_RUN.
 - **The narrative skill is the prose author, not the structure author.** SPECIFY's job is structural correctness; `/narrative` invocation comes after the structural skeleton exists.
+- **The target ledger is a derivation, not a separate phase's output.** Treat `targets/targets.md` as a small index produced alongside `astra.yaml`, not a heavyweight artifact. The depth lives in `astra.yaml`'s `outputs:` / `findings:` / `decisions:` and in the per-section study files.
diff --git a/claude/lightcone/skills/paper2astra/references/study.md b/claude/lightcone/skills/paper2astra/references/study.md
new file mode 100644
index 00000000..c1c30ccc
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/study.md
@@ -0,0 +1,229 @@
+# STUDY — section-parallel paper-vs-code agreement check
+
+Read the parsed paper and the reference code together — by section, with sub-agents fanning out across the paper's structure — and produce a cross-referenced agreement check that the rest of the pipeline consumes. STUDY is paper2astra's load-bearing read phase: its value isn't "summarize the paper" but **measure the level of agreement between paper and code at the section level**.
+
+This phase replaces the old SUMMARIZE. The old shape parallelized one sub-agent on the paper and another on the code; that loses the cross-reference, since "the whole paper" and "the whole code" are too much context for one agent to compare meaningfully. The new shape parallelizes by **paper section + matching code**, so each sub-agent carries enough context to surface disagreements at its own level.
+
+The constitution's per-phase mode is **always sub-agent (parallel by paper-section)** for this phase. Spawn one Task-tool sub-agent per paper section. After they finish, spawn a single synthesis sub-agent.
+
+## Inputs
+
+- `work/reference/source/` (Path A — arXiv LaTeX) **or** `work/reference/document.md` + `work/reference/figures/` + `work/reference/tables/` + `work/reference/metadata.json` (Path B — Docling)
+- `work/reference/code/` — the reference code repo (when present)
+- `work/notes/notes.md` — user-supplied prior notes, if any (read by every phase if present)
+
+## Outputs
+
+- `work/notes/study/<NN>-<section-slug>.md` — one file per paper section, with the cross-referenced agreement check
+- `work/notes/methodology.md` — consolidated decision map, results inventory, data sources (derived from the per-section files; what SPECIFY consumes)
+- `work/notes/cited_papers.yaml` — citations worth following up on for prior insights (what LITERATURE consumes)
+
+## Step 1: Identify paper sections and their matching code
+
+Before fanning out, the orchestrating call (the loop iteration that enters STUDY, before spawning sub-agents) does a quick survey:
+
+1. **List the paper's sections.** Path A: `grep -E '^\\section\{' work/reference/source/*.tex`. Path B: read the `##` headings in `work/reference/document.md`. Skip front-matter (abstract, acknowledgments, author list) and back-matter (references, supplementary). Keep methods, results, and any analysis-bearing intro/discussion.
+2. **Locate matching code per section.** Two routes:
+   - **Code's own structure**: most analysis pipelines mirror the paper's flow (a `reconstruction/` module → reconstruction section, a `bao_fit.py` → BAO-fit section). Walk the code repo's top-level layout and infer the mapping.
+   - **Paper-side bibliography**: when the paper cites a specific module, function, or commit (e.g. "the fitting code at `https://github.com/.../bao_fit.py:42`"), record that.
+3. **Build a section→code map** as a small YAML file at `work/notes/study/section-map.yaml`:
+   ```yaml
+   sections:
+     - id: 01-data
+       paper_section: "Data and Sample Selection"
+       paper_anchor: "section:data"        # \label{} or markdown anchor
+       code_paths:
+         - work/reference/code/data/
+         - work/reference/code/scripts/load_catalog.py
+     - id: 02-methods
+       paper_section: "BAO Fitting Methodology"
+       paper_anchor: "section:methods"
+       code_paths:
+         - work/reference/code/bao_fit/
+       notes: "Paper §3 cites the fitting code in footnote 7."
+   ```
+
+   When a section has no obviously matching code (e.g. "Discussion"), record `code_paths: []` and let the section sub-agent flag claims that imply implementation but have no code anchor — those are signal.
+
+This step matters because it sets the unit of work. A bad map (one sub-agent gets all the code, another gets none) loses the parallelism's value.
+
+## Step 2: Fan out — one sub-agent per section
+
+Spawn one Task-tool sub-agent per entry in `section-map.yaml`. Each sub-agent gets:
+
+- The paper-section reference (the `.tex` file path + `\section{}` anchor for Path A; the `document.md` file + heading anchor for Path B)
+- The list of code paths from the section map
+- The decision-map context structure (so claims and code locations align with what SPECIFY will need)
+
+### Per-section sub-agent — system prompt
+
+> You are a paper-vs-code agreement-check agent for one section of a research paper. Your job is to read the paper section *together with* its matching code and produce a cross-referenced agreement assessment.
+>
+> ### Inputs
+>
+> - Paper section: `<path-to-tex-or-md>` anchored at `<section-anchor>`. Read this section in full — it is bounded; do not stray into other sections.
+> - Code paths: `<list of paths>`. Read each in full; for directories, read the entry-points and follow imports as needed. **Do NOT modify any code.**
+>
+> ### What to extract
+>
+> For each material claim or choice in the paper section, locate its implementation in the code (or note its absence) and record an agreement assessment.
+>
+> A "claim" is anything where a different choice would plausibly change a numerical result the paper reports — methods, parameters, data cuts, calibrations, statistical approaches, hyperparameters, software versions.
+>
+> A "code location" is a `file:line` reference (or `file:line-line` range) to where the code implements (or fails to implement) that claim.
+>
+> An "agreement level" is one of:
+>
+> - `matches`: paper says X, code does X. Cite the line(s); brief one-line note.
+> - `minor-deviation`: paper and code differ in a way that does not change the numerical result (e.g. variable named differently, equivalent algorithm, refactored computation). Cite both, name the equivalence.
+> - `material-disagreement`: paper and code differ in a way that plausibly changes a numerical result. Cite both verbatim. **Surface this prominently** — these are SPECIFY's seams.
+> - `paper-only`: paper claims something the code does not implement. May indicate a methodological description not yet realized in the available code.
+> - `code-only`: code does something the paper does not describe. Often a critical detail the paper compressed; flag it.
+>
+> ### Output format — `work/notes/study/<id>-<slug>.md`
+>
+> ```markdown
+> # Study: <Section title>
+>
+> Paper anchor: `<section-anchor>` in `<paper-source-path>`.
+> Code paths: <list>.
+>
+> ## Agreement table
+>
+> | Claim | Paper | Code | Agreement | Notes |
+> |---|---|---|---|---|
+> | <one-line claim> | §X.Y "<short quote>" | `path:line` | matches \| minor-deviation \| material-disagreement \| paper-only \| code-only | <one-line gloss> |
+>
+> ## Material disagreements
+>
+> For every `material-disagreement` row, expand here:
+>
+> ### <Claim>
+>
+> - **Paper says** (quote): "..." (page N, eq. M)
+> - **Code does** (quote): `path:line-line`:
+>   ```python
+>   <code excerpt>
+>   ```
+> - **Why it matters** (one-line plausible-impact): <e.g. "changes the BAO peak amplitude by ~5%">
+> - **Default per canonical-resolution rule**: <code | paper> (applied if SPECIFY runs as a sub-agent).
+>
+> ## Decisions surfaced
+>
+> Bullet list of choices in this section that should become first-class decisions in `astra.yaml`. Group by "what" + "why" + "alternatives" (mirroring the methodology.md decision-map shape).
+>
+> ## Cited papers worth following up
+>
+> List citations from this section that informed a decision (not general background). DOI when resolvable + one-line on why.
+>
+> ## Data sources (this section)
+>
+> Any external dataset, catalog, or archive this section's analysis consumes. For each: name + version, exact acquisition path (URL / query / package name), selection criteria.
+>
+> ## Open questions
+>
+> Anything ambiguous, missing, or contradictory that this section couldn't resolve from paper + code alone. The sub-agent records these here only; the orchestrator later copies them into `<paper-slug>/open-questions.md`.
+> ```
+>
+> ### Style
+>
+> Be concise but precise. Use bullets and tables. Quote the paper verbatim and cite `path:line` for the code. Do NOT pad with background.
+>
+> ### Rules
+>
+> - **Stay in your section.** Cross-references to other sections are notes, not extractions. If a section's claim depends on a definition from another section, note the dependency and continue.
+> - **Quote, don't paraphrase**, when surfacing a paper-vs-code disagreement. SPECIFY needs the verbatim claim to author evidence-quote-backed findings.
+> - **Code-as-canonical when both exist.** Where paper and code disagree, the code wins for numerics + method (the canonical-resolution rule). Record both, mark the agreement level as `material-disagreement`, surface the disagreement.
+> - **Never block on `AskUserQuestion`.** You're a sub-agent; the user is not in this conversation. Append to the section's `## Open questions` block instead.
+
+## Step 3: Synthesize — single sub-agent merges into methodology.md and cited_papers.yaml
+
+Spawn one synthesis sub-agent that reads all `work/notes/study/<id>-<slug>.md` files and writes:
+
+- `work/notes/methodology.md` — consolidated decision map (every "Decisions surfaced" entry merged across sections), results inventory (split into primary / secondary), data sources (every "Data sources" entry merged).
+- `work/notes/cited_papers.yaml` — every "Cited papers worth following up" entry merged and de-duplicated.
+
+### Synthesis sub-agent — system prompt
+
+> You are a research-paper synthesis agent. Read every per-section file in `work/notes/study/` and merge them into a single `work/notes/methodology.md` and `work/notes/cited_papers.yaml`.
+>
+> ### Task
+>
+> 1. Read every `work/notes/study/<id>-<slug>.md` file (skip `section-map.yaml`).
+> 2. Build `work/notes/methodology.md` with three sections:
+>    - **Decision map**: every "Decisions surfaced" entry across all sections, grouped by pipeline stage. For each decision: what was chosen, why (cite the section + paper-citation), alternatives mentioned, and any *material-disagreement* with the code (cite the section's `Material disagreements` block).
+>    - **Results inventory**: every primary and secondary result the paper reports, grouped primary/secondary, with which decisions feed into each.
+>    - **Data sources**: every external dataset across sections, with name + version, acquisition path, selection criteria, format. **This section is critical** — IMPLEMENT will use it to write data download scripts. If acquisition is vague, flag it.
+> 3. Build `work/notes/cited_papers.yaml` from the de-duplicated cited-papers entries:
+>
+>    ```yaml
+>    papers:
+>      - doi: "10.xxxx/yyyy"
+>        citation: "Smith et al. (2020)"
+>        relevance: "One-line description of why this paper matters for replication"
+>    ```
+>
+> ### Style
+>
+> Cross-reference back to the per-section files (`see work/notes/study/03-bao-fit.md`) for the verbatim quotes and code locations. methodology.md is the consolidated view; the per-section files are the source of truth for evidence.
+>
+> ### Output skeleton — `work/notes/methodology.md`
+>
+> ```markdown
+> # Methodology — consolidated study
+>
+> ## Decision map
+>
+> ### <Pipeline stage>
+>
+> - **<Decision name>**
+>   - **What**: <chosen value/method>
+>   - **Why**: <citation, e.g. "Smith+2020">. Section: `work/notes/study/<NN>-<slug>.md`
+>   - **Alternatives**: <list>
+>   - **Code agreement**: matches | minor-deviation | material-disagreement (see `work/notes/study/<NN>-<slug>.md#material-disagreements`)
+>
+> ## Results inventory
+>
+> ### Primary
+> - <result> — feeds from <decisions>; expected: <values>; section: `work/notes/study/<NN>-<slug>.md`
+>
+> ### Secondary
+> - <result> — feeds from <decisions>; expected: <values>; section: `work/notes/study/<NN>-<slug>.md`
+>
+> ## Data sources
+>
+> - **<Dataset name + version>**
+>   - Obtain: <URL / query / package>
+>   - Selection: <cuts>
+>   - Format: <columns/fields>
+>   - Used in: <list of sections>
+> ```
+>
+> ### Rules
+>
+> - Preserve paper citations exactly as they appear in the source per-section files.
+> - Do NOT introduce decisions that aren't in any per-section file's "Decisions surfaced" block.
+> - When two sections name the same decision, merge — do not duplicate.
+
+## Step 4: Append open questions to the running report
+
+After the per-section sub-agents finish, and before the Step 3 synthesis runs (this step is numbered last but executes between Steps 2 and 3), the orchestrator scans each `work/notes/study/<id>-<slug>.md` for `## Open questions` entries and appends them to `<paper-slug>/open-questions.md` with the section as origin. The user resolves these in SUMMARIZE_RUN.
+
+## Survey signals (entry into STUDY)
+
+- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) exists ⇒ ready to study
+- `work/notes/study/section-map.yaml` exists ⇒ section identification done
+- Every section in `section-map.yaml` has a corresponding `work/notes/study/<id>-<slug>.md` ⇒ per-section pass done
+- `work/notes/methodology.md` and `work/notes/cited_papers.yaml` exist ⇒ STUDY done; proceed to LITERATURE
+
+## Notes
+
+- **Run the section sub-agents in parallel.** They're fully independent (each reads its own paper section + code paths). The synthesis sub-agent runs once, after all per-section files exist.
+- **The agreement check is the value.** A study that reads only the paper or only the code is a regression to the old SUMMARIZE — do not allow a section sub-agent to skip the code (or vice versa) unless that section genuinely has no matching code (and that absence itself is information, recorded as `paper-only` rows).
+- **methodology.md is the door, not the source of truth.** SPECIFY drills back into the per-section files via the `see work/notes/study/...` pointers when authoring evidence-quote-backed findings. Do not bloat methodology.md with verbatim quotes; keep it as the consolidated view and let the per-section files carry the evidence.
+- **Section granularity earns separate insights.** When a section's analysis builds on a method defined in another section, record the agreement check for that method in the *defining* section's file and note the dependency in the file of the section that uses it. Do not collapse the borrowed pieces into the using section's rows.
+- **Resume is automatic.** If a per-section file already exists, the orchestrator skips its sub-agent. The synthesis re-runs whenever the set of per-section files changes.
+
+## Output format — open question
+
+The constitution flags whether `astra.yaml`'s `prior_insights` shape can absorb STUDY's per-section output directly. The current answer is **no**: `prior_insights` is for *cited* papers' findings supporting the *target* paper's decisions; STUDY's output is the *target paper's own claims* checked against *its own code*. The natural ASTRA homes for STUDY's output are downstream, in SPECIFY: paper-claim quotes become `findings` evidence in `astra.yaml`; code locations become decision-option metadata or implementation-notes. The per-section files stay as the source of truth; methodology.md is the consolidated derivation. Revisit if the spec gains a structure for "paper-vs-code agreement-check evidence" as a first-class entity.
diff --git a/claude/lightcone/skills/paper2astra/references/summarize.md b/claude/lightcone/skills/paper2astra/references/summarize.md
deleted file mode 100644
index 6df42b52..00000000
--- a/claude/lightcone/skills/paper2astra/references/summarize.md
+++ /dev/null
@@ -1,120 +0,0 @@
-# SUMMARIZE — extract methodology, decisions, and results inventory
-
-Read the parsed paper and (in parallel, when present) the reference code, and extract everything the SPECIFY phase will need to author `astra.yaml`. The substance lives in `work/notes/methodology.md`, `work/notes/cited_papers.yaml`, and (when code exists) `work/notes/code-analysis.md`.
-
-The constitution's per-phase mode is **always sub-agent** for this phase. Spawn one Task-tool sub-agent for the paper analysis and (in parallel) a separate sub-agent for the code analysis if `work/reference/code/` exists. Each sub-agent gets fresh context and writes one file.
-
-## Inputs
-
-- `work/reference/document.md` — paper as markdown (from PARSE)
-- `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json`
-- `work/reference/code/` — code repo, if cloned
-
-## Outputs
-
-- `work/notes/methodology.md` — decision map + results inventory + data sources
-- `work/notes/cited_papers.yaml` — papers worth following up on for prior insights
-- `work/notes/code-analysis.md` — code structure (only when `work/reference/code/` exists)
-
----
-
-## Paper sub-agent — system prompt
-
-> You are a research paper analysis agent. Your job is to read a parsed paper and extract everything needed to reproduce the analysis.
->
-> ### Approach
->
-> Read `work/reference/document.md` **section by section** — do not try to read the entire file at once. Start by scanning the headers to understand the structure, then work through each section in order.
->
-> **Write as you go.** After reading each section, immediately update `work/notes/methodology.md` and `work/notes/cited_papers.yaml` with what you learned. Do not wait until the end — build the outputs incrementally. This ensures partial progress is saved and forces you to consolidate your understanding at each step.
->
-> Skip acknowledgments and author affiliations. Do read the references section — you will need it to resolve citations to DOIs.
->
-> ### What to extract
->
-> As you read each section, look for:
->
-> - **Data sources** — every external dataset, catalog, survey, or archive the paper uses as input. For each one, record the exact name/version, where to obtain it (URL, database query, package name), and any selection criteria or quality cuts applied. This is critical — the implement phase must download real data, not generate synthetic substitutes.
-> - **Decisions** — every choice that shaped the analysis (methods, parameters, data cuts, calibrations, etc.) and *what informed each one* (a cited paper, a physical argument, an empirical finding, internal results from the paper).
-> - **Results** — numeric values, figures, tables; which are the paper's core claims vs. supporting/diagnostic outputs.
-> - **Key references** — cited papers that actually influenced methodology (not general background).
->
-> ### Output format — `work/notes/methodology.md`
->
-> #### Decision map (most important)
->
-> A complete list of every decision that shaped the analysis, grouped by pipeline stage. For each decision:
->
-> - **What** was chosen (the specific value, method, or approach)
-> - **Why** — what informed the choice: cite the specific paper, physical argument, or empirical finding. Use the citation as it appears in the text (e.g., "Freedman et al. 2020"). This is critical — decisions without traced justifications are much harder to reproduce.
-> - **Alternatives** — what else could have been chosen, if mentioned
->
-> #### Results inventory
->
-> List the paper's outputs, separated into:
->
-> - **Primary results** — the core claims; what you'd check to evaluate whether the work was reproduced. Flag which are most important.
-> - **Secondary results** — supporting/diagnostic outputs.
->
-> For each result, note which decisions feed into it and the expected values.
->
-> #### Data sources (critical)
->
-> For **every** external dataset the paper uses, document:
->
-> - **Name and version** (e.g., "OGLE-III SMC LPV catalog, Soszynski+2011")
-> - **How to obtain it** — exact URL, database query (with SQL if applicable), API endpoint, or package name. Be as specific as possible.
-> - **Selection criteria** — any spatial, magnitude, quality, or flag cuts applied to the raw data.
-> - **Format** — what columns/fields are used downstream.
->
-> This section is essential. The implement phase will use it to write data download scripts. If acquisition details are vague in the paper, flag this explicitly so the review phase can investigate further.
->
-> #### Additional context (brief)
->
-> - Software and dependencies — languages, libraries, versions mentioned.
->
-> ### Output format — `work/notes/cited_papers.yaml`
->
-> ```yaml
-> papers:
->   - doi: "10.xxxx/yyyy"
->     citation: "Smith et al. (2020)"
->     relevance: "One-line description of why this paper matters for replication"
-> ```
->
-> **Include** papers that: informed a methodological decision, provided a method or algorithm the paper builds on, contain calibration data or corrections the paper applies.
->
-> **Exclude** papers cited only for general background or final-result comparisons.
->
-> Only include papers whose DOI you can find in the references. Aim for 5–15 papers; quality over quantity.
->
-> ### Style
->
-> Be concise but precise. Use bullet points. Include exact numeric values and parameter choices. Do not pad with background or motivation — only include what is needed to reproduce the analysis.
-
-## Code sub-agent — system prompt (only when `work/reference/code/` exists)
-
-> You are a code exploration agent. Explore the repository at `work/reference/code/` and write up a detailed understanding of the codebase to `work/notes/code-analysis.md`.
->
-> ### What to produce
->
-> 1. **Architecture** — how the codebase is structured, what the main modules / scripts are, and how they relate to each other.
-> 2. **Execution flow** — where things are run from, in what order, and where to look for different stages of the analysis.
-> 3. **Key variables and parameters** — the main variables defined in the code, configuration values, and any decisions baked into the implementation.
-> 4. **Outputs** — what the code produces, where results are written, what format they take.
->
-> Be thorough — explore the file tree, read the main scripts, and trace the execution path. Focus on implementation decisions and parameter values that the paper might not mention.
->
-> Do NOT modify any code in the repository.
-
-## Survey signals (entry into SUMMARIZE)
-
-- `work/reference/document.md` exists ⇒ ready to summarize the paper
-- `work/notes/methodology.md` exists ⇒ paper sub-agent already ran
-- `work/reference/code/` exists ∧ `work/notes/code-analysis.md` does not ⇒ code sub-agent should run
-- Both `methodology.md` and (if code exists) `code-analysis.md` exist ⇒ SUMMARIZE done, proceed to EXTRACT_TARGETS
-
-## Notes
-
-- **Run the two sub-agents in parallel** when both apply. The paper agent and the code agent are fully independent; each writes one file.
-- The methodology notes are the substrate everything downstream consumes. SPECIFY reads them, REVIEW cross-checks them, IMPLEMENT writes scripts based on them. Their quality determines the rest.

From 12ce714185038447fdd3383388516cf6fd54bce8 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 6 May 2026 02:22:03 +0200
Subject: [PATCH 013/124] skills/paper2astra: rigor-dial REVIEW + IMPLEMENT,
 fold FINAL_REVIEW back into SUMMARIZE_RUN
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Continues the phase redesign at
[[lightcone/paper2astra-as-skill/phase-redesign]] (started in 15c6dff).
The reference files now match the constitution's Desired State.

REVIEW (review.md) is rewritten from a fixed pre-implement sanity check
to a rigor-dialed fresh-context audit:
- Frugal: skip or one pass.
- Rigor: N rounds of fresh-context sub-agents until two consecutive
  rounds find no fixes (or a 5-round system cap).
- Each round's reviewer is fresh — no view of prior rounds' findings or
  fixes, only paper + code + spec. Pattern-matching on prior fixes
  defeats the discipline.
- Reviewer outputs findings only; SPECIFY (or the orchestrator inline
  for trivial fixes) edits the spec between rounds.
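
A minimal sketch of the rigor-mode stopping rule above, for
illustration only. `run_review_rounds` and `review_once` are
hypothetical names, not part of the skill; the real loop is driven by
the orchestrator spawning fresh sub-agents, and the fix pass between
rounds is elided here.

```python
from typing import Callable, List


def run_review_rounds(
    review_once: Callable[[int], List[str]],
    quiet_rounds_needed: int = 2,
    max_rounds: int = 5,
) -> int:
    """Return the number of review rounds run before stopping.

    `review_once(round_no)` stands in for one fresh-context reviewer
    and returns that round's findings (an empty list means no fixes).
    """
    quiet_streak = 0
    for round_no in range(1, max_rounds + 1):
        findings = review_once(round_no)
        if findings:
            # Any finding resets the streak; fixes would land here,
            # before the next fresh reviewer is spawned.
            quiet_streak = 0
        else:
            quiet_streak += 1
            if quiet_streak >= quiet_rounds_needed:
                return round_no
    # Cap reached without two consecutive quiet rounds.
    return max_rounds
```

With findings scripted as [["a"], [], []] this stops after round 3: one
round with a finding, then two consecutive quiet rounds.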

IMPLEMENT (implement.md) gets the same shape after the first-pass
write — paper-vs-implementation review by fresh sub-agents, frugal vs
rigor dialing iteration count. Fix passes are a separate sub-agent
between rounds. Independent of the COMPARE → IMPLEMENT retry loop
(which is post-RUN).

SUMMARIZE_RUN (summarize_run.md) is now the always-interactive
close-out — folds back what was briefly FINAL_REVIEW. Renders
/figure-comparison (mandatory) and /check-sentence-by-sentence
(opt-in) in the main session, walks open-questions.md with
AskUserQuestion, lands resolutions, drafts REPRODUCTION-SUMMARY.md,
finalizes the constitution outcome.

COMPARE (compare.md): per-phase mode is user choice (defaults to
interactive), not always-interactive. Pass-verdict chain reads
COMPARE → SUMMARIZE_RUN (close-out); FINAL_REVIEW references gone.

literature.md: cited_papers.yaml from STUDY, not SUMMARIZE; paper text
reference covers Path A (source/) and Path B (document.md).

specify.md: SUMMARIZE → STUDY in the work-from-notes guidance.

skills/README.md: paper2astra row mentions the 10-phase shape and the
INTERVIEW/SUMMARIZE_RUN bookends; figure-comparison and
check-sentence-by-sentence rows reference SUMMARIZE_RUN, not
FINAL_REVIEW.

Co-Authored-By: Claude Sonnet 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/README.md             |   6 +-
 .../skills/paper2astra/references/compare.md  |  16 +-
 .../paper2astra/references/implement.md       | 122 ++++++++++--
 .../paper2astra/references/literature.md      |   4 +-
 .../skills/paper2astra/references/review.md   | 174 ++++++++++++------
 .../skills/paper2astra/references/specify.md  |   2 +-
 .../paper2astra/references/summarize_run.md   |  98 +++++++---
 7 files changed, 317 insertions(+), 105 deletions(-)

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index de2d2aef..79e58851 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -18,13 +18,13 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 
 | Skill | Role |
 |---|---|
-| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and per-paper `CLAUDE.md`, then launches one of three runtime modes (interactive, bash-loop, tmux-orchestrated) against the constitution. |
+| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and per-paper `CLAUDE.md`, then launches one of three runtime modes (interactive, bash-loop, tmux-orchestrated) against the constitution. The constitution carries 10 phases, bookended by two always-interactive seams (INTERVIEW at start, SUMMARIZE_RUN at close-out); every other phase is configurable per the user's per-phase mode choice, with REVIEW and IMPLEMENT additionally tuned by a frugality / rigor dial. |
 | [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by paper2astra during SPECIFY. |
 | [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. |
 | [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Used by paper2astra's bash-loop and tmux-orchestrated runtime modes. |
 | [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. Primary acquisition path for paper2astra's ACQUIRE phase. |
-| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from paper2astra's FINAL_REVIEW phase (opt-in); also user-invokable directly. |
-| [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from paper2astra's FINAL_REVIEW phase (mandatory); also user-invokable directly. |
+| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from paper2astra's SUMMARIZE_RUN close-out (opt-in); also user-invokable directly. |
+| [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from paper2astra's SUMMARIZE_RUN close-out (mandatory); also user-invokable directly. |
 
 The full reproduction story spans these seven skills. paper2astra's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about paper2astra.
 
diff --git a/claude/lightcone/skills/paper2astra/references/compare.md b/claude/lightcone/skills/paper2astra/references/compare.md
index 7ea35088..6d2f16b5 100644
--- a/claude/lightcone/skills/paper2astra/references/compare.md
+++ b/claude/lightcone/skills/paper2astra/references/compare.md
@@ -1,8 +1,8 @@
 # COMPARE — judge whether the reproduction matches
 
-Compare reproduced results against the paper's replication targets. Produce a structured verdict the IMPLEMENT-retry loop consumes. COMPARE is the **second mandatory user-ratification seam** — the verdict (was it close enough?) is a judgment the user owns, not the agent.
+Compare reproduced results against the paper's replication targets. Produce a structured verdict the IMPLEMENT-retry loop consumes.
 
-The constitution's per-phase mode is **always interactive** for this phase. Pause for verdict ratification.
+The constitution's per-phase mode is **user choice** for this phase — defaults to interactive for verdict ratification (was the reproduction close enough?), but a user who set the loop up to drive itself to terminal verdict can flip it to sub-agent. When sub-agent, COMPARE writes the report and the loop continues per the report's verdict; SUMMARIZE_RUN ratifies the final verdict at close-out.
 
 ## Inputs
 
@@ -69,21 +69,23 @@ If verdict is not `pass`, **`fix_suggestions` MUST reference specific scripts an
 
 Also write `comparison-report.md` with a human-readable summary. For figure / table comparisons, describe what you see in both and explain your match judgment.
 
-## Verdict ratification (the user seam)
+## Verdict ratification (interactive COMPARE)
 
-After writing the report, surface the verdict to the user via `AskUserQuestion`:
+When COMPARE runs interactively, surface the verdict to the user via `AskUserQuestion` after writing the report:
 
-- **If `pass`**: confirm with the user before exiting the COMPARE → IMPLEMENT loop. *"All high-priority targets match. Mark reproduction complete?"* The user accepts → SUMMARIZE_RUN runs (sub-agent, writes the summary), then FINAL_REVIEW runs (interactive, walks `/figure-comparison` and the open-questions ledger); the user rejects → name what's still off and re-enter the loop.
+- **If `pass`**: confirm before exiting the COMPARE → IMPLEMENT loop. *"All high-priority targets match. Proceed to close-out?"* The user accepts → SUMMARIZE_RUN runs interactively (renders `/figure-comparison`, walks the open-questions ledger, lands resolutions, finalizes the constitution outcome); the user rejects → name what's still off and re-enter the loop.
 - **If `partial`**: show the user the failing targets and the diagnosis. *"Partial match. <N> outputs failing: <list>. Continue retrying or accept partial?"* If the attempt budget (from the constitution) is reached, this surfacing is mandatory.
 - **If `fail`**: same shape, but the loop's continuation should be questioned more sharply. A fundamental methodological issue may need a constitution amendment, not another implement retry.
 
-The verdict is the agent's judgment; the **decision to keep iterating** is the user's. Default on user silence: continue the loop until the attempt budget is exhausted, then mandatory user surfacing.
+When COMPARE runs as a sub-agent, no `AskUserQuestion` — the report is the output. The loop reads the verdict and either retries (if budget remains and verdict is partial/fail) or proceeds to SUMMARIZE_RUN, where the user ratifies the final verdict during close-out.
+
+The verdict is the agent's judgment; the **decision to keep iterating** is the user's, surfaced either at this seam (interactive COMPARE) or at SUMMARIZE_RUN's close-out (sub-agent COMPARE). Default on user silence: continue the loop until the attempt budget is exhausted, then mandatory user surfacing.
 
 ## Survey signals (entry into COMPARE)
 
 - All outputs in `lc status --universe baseline` are `ok` ⇒ ready to compare
 - `comparison-report.yaml` exists with current `attempt` ⇒ COMPARE done for this attempt
-- `comparison-report.yaml` verdict is `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN, then FINAL_REVIEW
+- `comparison-report.yaml` verdict is `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN (interactive close-out)
 
 ## Notes
 
diff --git a/claude/lightcone/skills/paper2astra/references/implement.md b/claude/lightcone/skills/paper2astra/references/implement.md
index a2abe684..31b4d7a4 100644
--- a/claude/lightcone/skills/paper2astra/references/implement.md
+++ b/claude/lightcone/skills/paper2astra/references/implement.md
@@ -1,14 +1,15 @@
-# IMPLEMENT — write scripts and recipes
+# IMPLEMENT — write scripts and recipes; rigor-dialed self-review
 
-Read `astra.yaml` (the spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end.
+Read `astra.yaml` (the spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, fresh-context sub-agents review the implementation against the paper and the code; SPECIFY-style fixes feed back into IMPLEMENT for the next iteration. The depth of self-review is set by the constitution's frugality / rigor dial.
 
-The constitution's per-phase mode is **user choice** for this phase — defaults to sub-agent. Most implementation is mechanical (translate spec → script), but algorithm choices on tricky steps may want ratification.
+The constitution's per-phase mode defaults this to **sub-agent**. Most implementation is mechanical (translate spec → script), but algorithm choices on tricky steps may want ratification. Where parallelization is feasible (multiple independent outputs from different scripts), spawn one sub-agent per output and merge.
 
 ## Inputs
 
 - `astra.yaml` — the structural spec
 - `implementation-notes.md` — tricky algorithms, numerical gotchas, data-format quirks
 - `work/notes/methodology.md` — for context when the spec compresses
+- `work/notes/study/` — per-section paper-vs-code agreement-check files (Grep into for the verbatim claim and the canonical code location for any output you're implementing)
 - `work/reference/code/` (if present) — **canonical reference. Read on every iteration when implementing.** Where paper and code disagree, code wins for numerics, plotting, and method.
 
 ## Outputs
@@ -16,10 +17,11 @@ The constitution's per-phase mode is **user choice** for this phase — defaults
 - `scripts/<output>.py` (or `.sh`, or whatever fits) — one script per output (or shared scripts for tightly-coupled outputs)
 - `requirements.txt` — Python dependencies
 - Recipes in `astra.yaml` — each output gets a `recipe:` block with `command:` and `inputs:`
+- `work/notes/implement-review/round-<N>.md` — each review round's findings (one file per round; a frugal run writes only `round-1.md`)
 
-## Task
+## Step 1: write recipes + scripts
 
-Read `astra.yaml` and `implementation-notes.md`. Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml`.
+Read `astra.yaml` and `implementation-notes.md`. For each output, write a script in `scripts/` that produces it, and add a `recipe:` block to the output's entry in `astra.yaml` with `command:` and `inputs:`.
 
 If `work/reference/code/` exists, **read the relevant code on every iteration** — not just to resolve ambiguities but as the canonical source of truth for numerics + method. Write clean scripts following ASTRA conventions (not verbatim copies), but treat the code's behavior as authoritative when it disagrees with the paper. When you encounter a paper-vs-code disagreement that the running spec hasn't resolved:
 
@@ -28,35 +30,127 @@ If `work/reference/code/` exists, **read the relevant code on every iteration**
 
 Without this discipline, iterations drift to "looks right" rather than "matches" — the failure mode the first-paper test surfaced.
 
-## Data: REAL DATA ONLY
+### Parallelize where feasible
 
-**NEVER generate synthetic, mock, or fake data.** Every input dataset must be downloaded or queried from its real source (archive URL, database query, API, etc.). The methodology notes and `astra.yaml` inputs describe where each dataset comes from — write scripts that fetch the actual data.
+When outputs are produced by independent scripts (no shared expensive computation), spawn one Task-tool sub-agent per output. Each sub-agent gets:
 
-The only exception is if the paper itself uses synthetic / simulated data as its input (e.g., N-body simulations, Monte Carlo samples). In that case, reproduce the paper's data generation procedure exactly as described — but this is reproducing the paper's methodology, not substituting real data with fakes.
+- The output's spec entry from `astra.yaml`
+- The relevant section of `implementation-notes.md`
+- The matching `work/notes/study/<NN>-<slug>.md` row(s) for the verbatim paper claim and the canonical code location
+- The relevant code path(s) under `work/reference/code/`
 
-If a dataset is behind a paywall, requires registration, or is "available upon request," write the download script with a clear error message explaining what the user needs to do manually. **Do NOT substitute synthetic data as a workaround.**
+The orchestrator merges scripts and recipes after the per-output sub-agents finish. Tightly-coupled outputs (e.g. an MCMC producing both a chain and a summary statistic) stay in one sub-agent and one script.
 
-## Rules
+### Rules for the first pass
 
 1. **One script per output** (or a shared script for tightly-coupled outputs).
 2. **Parameterize by decisions.** Each decision is a CLI argument; scripts also receive `--universe <universe_id>`. See lightcone-cli's `CLAUDE.md` for the full convention.
 3. **Add recipes** to each output in `astra.yaml` with `command:` and `inputs:` (dependencies). Recipe inputs use the same `<analysis>.<output>` form the narrative skill's data-flow rules require.
 4. **Create `requirements.txt`** with needed packages. Do not install them — the RUN phase manages environments.
-5. **Do not execute scripts** — the RUN phase handles execution via `prism run` (now `lc run`).
+5. **Do not execute scripts** — the RUN phase handles execution via `lc run`.
 6. **Validate** with `astra validate astra.yaml` after adding recipes.
 
-## Retry attempts
+## Step 2: rigor-dialed self-review
+
+After the first-pass implementation lands, the constitution's frugality / rigor dial decides what happens next:
+
+- **Frugal:** one minimal review pass — a single fresh sub-agent reads `scripts/`, `astra.yaml`'s recipes, and the paper, and reports any obvious paper-vs-implementation inconsistencies. Fixes are applied once; no further iteration. If no fixes are needed, IMPLEMENT proceeds to RUN.
+- **Rigor:** N rounds of fresh-context sub-agent review + fix. Each round runs a fresh reviewer that does not see the prior round's findings or fixes. Stop when **two consecutive rounds find no fixes** (strong termination criterion), or after 5 rounds (system cap), whichever comes first.
+
+The discipline is the same as REVIEW's: each round's reviewer is fresh, prompted to check "is the implementation consistent with the paper and the code?", and outputs findings only — not edits. Fixes are applied between rounds by a separate IMPLEMENT-fix sub-agent (or the orchestrator inline for trivial mechanical fixes).
+
+### Per-round fresh sub-agent — system prompt
+
+> You are a paper-vs-implementation review agent. Read the implementation (`scripts/`, `astra.yaml` recipes), the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
+>
+> ### Inputs
+>
+> - `scripts/` — first-pass implementation
+> - `astra.yaml` — the spec (recipes are part of the implementation; structural fields are SPECIFY's)
+> - `implementation-notes.md`
+> - `work/notes/methodology.md` — Grep into; do not re-read whole
+> - `work/notes/study/` — per-section paper-vs-code agreement-check files (the verbatim claims and code locations you're checking against)
+> - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep)
+> - `work/reference/code/` (when present) — canonical reference for numerics + method
+>
+> ### What to check
+>
+> 1. **Recipe coverage.** Every output in `astra.yaml` has a recipe; every recipe runs a script that exists in `scripts/`.
+> 2. **Method fidelity.** For each output, the script implements the method described in the matching `work/notes/study/<NN>-<slug>.md` row. Where paper and code disagreed (material-disagreement rows), the script follows the code's method (canonical-resolution rule), unless the spec explicitly recorded a different override in `decisions:` and `universes/baseline.yaml`.
+> 3. **Numerical correctness.** Constants, hyperparameters, threshold values match the paper (or the code, where the canonical-resolution rule applied). Flag mismatches with `path:line` of the script and the paper §/eq.
+> 4. **Data acquisition.** Scripts that fetch data use the real acquisition path from `astra.yaml`'s inputs — no synthetic / mock substitutes.
+> 5. **Determinism.** Scripts set random seeds where the paper's method is stochastic. Library versions in `requirements.txt` are pinned where reproducibility requires it.
+> 6. **Recipe wiring.** Recipe `inputs:` references match the data-flow the scripts actually consume; no orphan dependencies, no missing dependencies.
+>
+> ### What NOT to do
+>
+> - **Do not edit any file.** Your output is a findings file; an IMPLEMENT-fix pass responds to the findings.
+> - **Do not re-read the entire paper.** Grep into `work/notes/study/` and `work/reference/source/` (or `document.md`) for the specific claims you want to verify.
+> - **Do not invent problems.** If the implementation matches paper + code, say so briefly.
+> - **Do not assume a prior reviewer has been here.** You are fresh. First-principles read only.
+>
+> ### Output format — `work/notes/implement-review/round-<N>.md`
+>
+> ```markdown
+> # Implement-review round <N>
+>
+> Reviewer ran fresh against scripts/, recipes in astra.yaml, paper, and code.
+>
+> ## Findings
+>
+> ### <category — e.g. "Method fidelity" / "Numerical correctness" / "Recipe wiring">
+>
+> - **<one-line finding>**
+>   - **What's wrong**: <quote or `script:line` of the implementation problem>
+>   - **Where to fix**: <`scripts/<file>.py:line` or `astra.yaml#path/to/recipe`>
+>   - **Suggested fix**: <one-line concrete change>
+>   - **Source**: <paper §X.Y "quote" + `work/notes/study/<id>` row, or code `path:line`>
+>
+> ## Verdict
+>
+> - **fixes_needed**: <count>
+> - **clean** | **needs-fixes**
+> ```
+
+### Step 3: IMPLEMENT-fix pass between rounds
+
+After each round's findings file lands, an IMPLEMENT-fix sub-agent (or the orchestrator inline for trivial fixes) edits `scripts/`, `astra.yaml` recipes, `requirements.txt`, and `implementation-notes.md` per the suggested fixes. After any change to `astra.yaml`, run `astra validate astra.yaml`.
+
+### Step 4: termination check
+
+- `weak` (frugal): one pass. Done after fixes (or immediately, if `fixes_needed` was 0).
+- `strong` (rigor):
+  - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
+  - If N hits the system cap of 5 without two consecutive clean rounds, surface to the user via `AskUserQuestion`: "implement-review reached the round cap with <K> fixes still landing; continue, accept the current implementation, or revise the constitution?" Default on user silence: accept the current implementation, log the unfinished tail in `<paper-slug>/open-questions.md`, and proceed.
+
+The IMPLEMENT-review iterations are independent of the COMPARE → IMPLEMENT retry loop — review iterations run before RUN, on the spec/implementation alignment side; COMPARE retries run after RUN, on the result-matching side.
+
+## Data: REAL DATA ONLY
+
+**NEVER generate synthetic, mock, or fake data.** Every input dataset must be downloaded or queried from its real source (archive URL, database query, API, etc.). The methodology notes and `astra.yaml` inputs describe where each dataset comes from — write scripts that fetch the actual data.
+
+The only exception is if the paper itself uses synthetic / simulated data as its input (e.g., N-body simulations, Monte Carlo samples). In that case, reproduce the paper's data generation procedure exactly as described — but this is reproducing the paper's methodology, not substituting real data with fakes.
+
+If a dataset is behind a paywall, requires registration, or is "available upon request," write the download script with a clear error message explaining what the user needs to do manually. **Do NOT substitute synthetic data as a workaround.**
+
+## Retry attempts (post-COMPARE)
 
 If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, the IMPLEMENT iteration is a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. The constitution carries the attempt budget (default 5); the iteration's first move is to check whether `attempt` in the report has reached the budget. If it has, surface to the user via `AskUserQuestion` ("verdict still failing after N attempts — continue, change scope, or accept partial?") rather than burning more cycles.
 
+A retry attempt re-runs the IMPLEMENT-review iterations on the changed scripts before proceeding to RUN.
+
 ## Survey signals (entry into IMPLEMENT)
 
-- `astra.yaml` validates and `implementation-notes.md` exists ⇒ ready to implement
+- `astra.yaml` validates and `implementation-notes.md` exists ⇒ ready to implement first pass
 - `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ first-pass IMPLEMENT done
-- `comparison-report.yaml` returns `pass` ⇒ IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN
+- For frugal: `work/notes/implement-review/round-1.md` exists with verdict `clean`, or with verdict `needs-fixes` and its fixes already applied ⇒ IMPLEMENT done; proceed to RUN
+- For rigor: two consecutive `work/notes/implement-review/round-<N>.md` files both have verdict `clean` ⇒ IMPLEMENT done; proceed to RUN
+- `comparison-report.yaml` returns `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN
 
 ## Notes
 
 - **`lc run` is the canonical execution surface.** Scripts assume they will be invoked via the lightcone-cli runner. Do not hard-code working directories or assume environment activation.
 - **Determinism where possible.** Set random seeds, fix library versions, prefer reproducible installations. The IMPLEMENT goal is not just "produces output once" but "reproducibly produces output across runs."
 - **Tight coupling earns shared scripts.** When two outputs come from the same expensive computation (e.g. an MCMC produces both a parameter chain and a summary statistic), one script with multiple output paths is cleaner than two scripts that each re-do the work.
+- **The fresh-context discipline is the same as REVIEW's.** A reviewer that sees the prior round's findings stops finding the next class of inconsistency. Each round must spawn a brand-new sub-agent.
+- **Minimize churn in fixes.** Targeted edits, not restructures. Big restructures defeat the round-over-round comparison the orchestrator uses to decide termination.
diff --git a/claude/lightcone/skills/paper2astra/references/literature.md b/claude/lightcone/skills/paper2astra/references/literature.md
index 07c77814..5a0668cd 100644
--- a/claude/lightcone/skills/paper2astra/references/literature.md
+++ b/claude/lightcone/skills/paper2astra/references/literature.md
@@ -6,9 +6,9 @@ The constitution's per-phase mode is **always sub-agent** for this phase. Spawn
 
 ## Inputs
 
-- `work/notes/cited_papers.yaml` — the list of papers to mine, from SUMMARIZE
+- `work/notes/cited_papers.yaml` — the list of papers to mine, from STUDY
 - `work/notes/methodology.md` — has the decision map; each per-paper sub-agent gets it as context
-- `work/reference/document.md` — the target paper (for reference)
+- `work/reference/source/` (Path A — arXiv LaTeX) or `work/reference/document.md` (Path B — Docling) — the target paper (for reference)
 
 ## Outputs
 
diff --git a/claude/lightcone/skills/paper2astra/references/review.md b/claude/lightcone/skills/paper2astra/references/review.md
index 13363378..50a16226 100644
--- a/claude/lightcone/skills/paper2astra/references/review.md
+++ b/claude/lightcone/skills/paper2astra/references/review.md
@@ -1,79 +1,149 @@
-# REVIEW — pre-implementation sanity check
+# REVIEW — rigor-dialed fresh-context spec audit
 
-Verify that the ASTRA specification is complete, consistent, and ready for the IMPLEMENT phase. REVIEW edits the spec in place when fixes are obvious; it surfaces gaps to the user (or as Open Questions) when judgment is required.
+A fresh-context sub-agent reads `astra.yaml` against the paper and the code and asks "is this consistent?" The reviewer never sees what was just implemented or fixed last round — its only job is first-principles cross-reference. SPECIFY incorporates fixes; a *fresh* reviewer re-runs; iterate until two consecutive rounds find nothing or a configured cap is hit.
 
-The constitution's per-phase mode is **user choice** for this phase — defaults to sub-agent. REVIEW is mostly mechanical (cross-reference, validation), so sub-agent suits it; but a paper that hits the SPECIFY conflict-surfacing path heavily may want REVIEW interactive too.
+REVIEW's depth is set by the constitution's **frugality / rigor** dial (see "Rigor vs frugality" in `../SKILL.md`):
 
-## Inputs
-
-- `astra.yaml` — the spec from SPECIFY
-- `universes/baseline.yaml`
-- `implementation-notes.md`
-- `work/notes/methodology.md`
-- `targets/targets.md`
-- `work/reference/document.md` (Grep into; do not re-read whole)
-- `work/notes/literature.yaml` (if present) — for evidence verification
-
-## Outputs
-
-- In-place edits to `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md` as needed
-- No new files unless a missing data-acquisition path needs to be flagged with content
-
-## Checks
+- **Frugal:** skip REVIEW entirely, or run a single fresh sub-agent pass and incorporate its fixes once.
+- **Rigor:** N rounds — each round runs a fresh reviewer; SPECIFY incorporates fixes; the next round runs *another* fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes (the strong termination criterion the loop already uses), or a system cap of 5 rounds, whichever is sooner.
 
-1. **Target coverage.** Every replication target from `targets/targets.md` must appear as an output (or finding, or input/decision/universe default) in `astra.yaml`. Any missing target either gets added or earns an explicit out-of-scope reason in `targets.md`.
+The constitution's per-phase mode defaults this to **sub-agent**; interactive REVIEW is rare (a paper that hits the SPECIFY conflict-surfacing path heavily may want a human in the loop).
 
-2. **Output definitions.** Each output has a clear `type` and sufficient description.
+## Why fresh-context sub-agents
 
-3. **Methodology detail.** Cross-check `work/notes/methodology.md` against the spec for gaps: missing hyperparameters, underspecified algorithms, vague data-processing steps. Re-read targeted sections of the paper to fill them in. Use Grep on `work/reference/document.md` rather than re-reading the whole thing.
+A reviewer that has just helped fix `astra.yaml` will pattern-match on its own fixes rather than re-reading the paper. Catching the *next* class of inconsistency requires a fresh context that doesn't carry the prior round's framing. The sub-agent's prompt must therefore say "check `astra.yaml` is consistent with the paper and the code" — never "here's what was just fixed; check it." The reviewer's only inputs are the paper, the code, and the spec.
 
-4. **Decisions.** Decisions should cover what actually affects reproducibility. Remove cosmetic choices; add anything material that is missing. Ensure `universes/baseline.yaml` stays consistent.
+This also bounds the work: each round is one fresh sub-agent over a bounded artifact. Rigor doesn't mean "longer review" — it means "more independent reviewers."
 
-5. **Data obtainability.** Every data source needs a concrete path (URL, package name, or generation code). Flag anything vague or "available upon request."
-
-6. **Data acquisition.** Every input in `astra.yaml` must have a concrete acquisition path — a download URL, database query, API call, or package name. Verify that `methodology.md` documents how to obtain each dataset. Flag any dataset that is vague so IMPLEMENT knows what to handle.
-
-7. **Implementation notes.** Check `implementation-notes.md` for completeness — does it flag the tricky parts? Add anything IMPLEMENT should know.
+## Inputs
 
-8. **Evidence verification.** If `work/notes/literature.yaml` exists, run:
-   ```bash
-   astra validate astra.yaml --verify-evidence
-   ```
-   This verifies that all prior-insight quotes match the source PDFs. Flag any misquotes or unsupported claims; these typically arise when a quote was paraphrased or when prefix/suffix carry editorial commentary instead of real surrounding text.
+- `astra.yaml` — the spec from SPECIFY (the artifact under review)
+- `universes/baseline.yaml` — the universe selection
+- `implementation-notes.md` — practical guidance for IMPLEMENT
+- `targets/targets.md` — coverage obligations
+- `work/notes/methodology.md` — consolidated decision map / results inventory / data sources (Grep into for cross-reference; do not re-read whole)
+- `work/notes/study/` — per-section paper-vs-code agreement-check files (Grep into for verbatim claims and code locations)
+- `work/notes/literature.yaml` (if present) — for evidence verification
+- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
+- `work/reference/code/` (if present) — original code, canonical reference for numerics + method
 
-## Fixes
+## Outputs
 
-Edit files directly. After any change to `astra.yaml`, run:
+- In-place edits to `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md` driven by reviewer findings — written by SPECIFY in response, **not** by the reviewer itself
+- `work/notes/review/round-<N>.md` — each round's reviewer findings (one file per round; the orchestrator passes round-N's findings to SPECIFY for fixing, then spawns round-(N+1) as a fresh sub-agent that does not see round-N's findings)
+
+## Step 1: orchestrator decides round count from the constitution's rigor dial
+
+Read the constitution's termination-criterion field:
+
+- `weak` (frugal) → at most one round; if no fixes found, REVIEW is done. If skipping is preferred (the user said "skip review"), the orchestrator records "REVIEW skipped per constitution" in the workdir and proceeds to IMPLEMENT.
+- `strong` (rigor) → iterate. Stop when **two consecutive rounds find no fixes**, or after 5 rounds (system cap), whichever comes first.
+
+## Step 2: per-round fresh sub-agent — system prompt
+
+Spawn one Task-tool sub-agent per round. Each round's sub-agent gets only the inputs above — never the prior round's findings, never a description of what was just fixed.
+
+> You are an ASTRA-spec reviewer. Read `astra.yaml`, the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
+>
+> ### Inputs
+>
+> - `astra.yaml` — the spec under review
+> - `universes/baseline.yaml`
+> - `implementation-notes.md`
+> - `targets/targets.md`
+> - `work/notes/methodology.md` — consolidated paper-derived decision map (Grep into; do not re-read whole)
+> - `work/notes/study/` — per-section paper-vs-code agreement-check files (Grep into for verbatim claims and code locations)
+> - `work/notes/literature.yaml` (if present)
+> - `work/reference/source/` (arXiv LaTeX; preferred) or `work/reference/document.md` (Docling fallback) — paper text (Grep into; do not re-read whole)
+> - `work/reference/code/` (when present) — canonical reference for numerics + method
+>
+> ### What to check
+>
+> 1. **Target coverage.** Every entry in `targets/targets.md` must appear in `astra.yaml` as an output, finding, input, decision, or universe default. Any missing target either earns a spec home or an explicit out-of-scope reason in `targets.md`.
+> 2. **Output definitions.** Each output has a clear `type` and sufficient description.
+> 3. **Methodology coverage.** Cross-check `work/notes/methodology.md` against the spec for gaps: missing hyperparameters, underspecified algorithms, vague data-processing steps. Grep targeted sections of the paper to confirm.
+> 4. **Decisions.** Decisions cover what affects reproducibility. Cosmetic / pure-tooling choices should not be decisions; anything material that is missing should be added. `universes/baseline.yaml` must be consistent with the paper's reported choices (or with the code's, when paper-vs-code resolution applied per the canonical-resolution rule).
+> 5. **Data acquisition.** Every input has a concrete acquisition path — a download URL, database query, API call, or package name. Vague references ("available upon request", no source named) are flagged.
+> 6. **Implementation-notes completeness.** Does `implementation-notes.md` flag the tricky parts the IMPLEMENT phase will hit? Cross-check against `work/notes/study/<NN>-<slug>.md` material-disagreement entries — every paper-vs-code material disagreement that landed in the spec should also appear in implementation-notes for IMPLEMENT.
+> 7. **Evidence verification.** If `work/notes/literature.yaml` exists, run `astra validate astra.yaml --verify-evidence`. Flag any misquotes or unsupported claims; these typically arise when a quote was paraphrased or when prefix/suffix carry editorial commentary instead of real surrounding text.
+> 8. **Code-as-canonical applied.** Where paper and code disagree on a material choice (per `work/notes/study/`'s material-disagreement rows), check that `universes/baseline.yaml` selects the code's choice, OR that an interactive seam recorded a different user choice. Flag any material disagreement where the spec silently picked the paper without recording an explicit override.
+> 9. **No synthetic data.** Unless the paper itself uses synthetic data, every input has a real acquisition source — no mock / synthetic substitutes anywhere in the spec, recipes, or implementation-notes.
+>
+> ### What NOT to do
+>
+> - **Do not edit `astra.yaml`** or any other file. Your output is a findings file; SPECIFY responds to the findings. Editing here defeats the multi-round-fresh-context discipline.
+> - **Do not re-read the entire paper.** Use Grep to look up specific claims you want to verify. Work primarily from `work/notes/methodology.md` and `work/notes/study/`.
+> - **Do not invent problems.** If the spec is consistent with paper + code, say so briefly.
+> - **Do not assume a prior reviewer has been here.** You are fresh. Treat this as a first-principles read.
+>
+> ### Output format — `work/notes/review/round-<N>.md`
+>
+> ```markdown
+> # Review round <N>
+>
+> Reviewer ran fresh against `astra.yaml`, paper, and code.
+>
+> ## Findings
+>
+> ### <category — e.g. "Target coverage" / "Decisions" / "Data acquisition" / "Evidence">
+>
+> - **<one-line finding>**
+>   - **What's wrong**: <quote or location of the spec problem>
+>   - **Where to fix**: <`astra.yaml#path/to/key` or `implementation-notes.md`>
+>   - **Suggested fix**: <one-line concrete change>
+>   - **Source**: <paper §X.Y "quote" + `work/notes/study/<id>` row, or code `path:line`>
+>
+> ## No-fix sections
+>
+> Brief one-liners for sections that look clean (so the orchestrator knows you actually checked).
+>
+> ## Verdict
+>
+> - **fixes_needed**: <count>
+> - **verdict**: **clean** | **needs-fixes**
+> ```
+>
+> Be concise. The orchestrator reads this file to decide whether to spawn another round and what SPECIFY needs to fix.
+
+## Step 3: SPECIFY incorporates findings
+
+After the round's findings file lands, SPECIFY (or the orchestrator playing SPECIFY for trivial mechanical fixes) edits `astra.yaml`, `universes/baseline.yaml`, and `implementation-notes.md` per the suggested fixes. After any change to `astra.yaml`, run:
 
 ```bash
 astra validate astra.yaml
 ```
 
-## CRITICAL: No synthetic data
+If `work/notes/literature.yaml` is present, also verify the evidence:
 
-Unless the paper itself uses synthetic / simulated data as input, the pipeline must use **real data only**. Check that:
+```bash
+astra validate astra.yaml --verify-evidence
+```
 
-- Every `astra.yaml` input has a real acquisition source (URL, query, etc.)
-- `implementation-notes.md` does NOT suggest generating mock / synthetic data
-- The methodology notes describe real data sources with concrete download paths
+The orchestrator records what was fixed in a small commit per round so `git log` shows the chain.
 
-If any input lacks a concrete acquisition path, add one by searching the paper for URLs, DOIs, or archive references. If the data truly cannot be obtained programmatically, document this clearly in `implementation-notes.md` so IMPLEMENT writes a script that fails with a helpful message rather than silently substituting fake data.
+## Step 4: termination check
 
-## Rules
+After SPECIFY incorporates the round's fixes, the orchestrator decides whether to spawn another round:
 
-- Use Grep to search `work/reference/document.md` for specific claims to verify — do not read the entire markdown at once. Work primarily from notes and the spec.
-- **Minimize churn** — don't restructure or rename unnecessarily.
-- If everything looks good, say so briefly; don't invent problems.
-- Do **NOT** add implementation recipes — that is IMPLEMENT's job.
+- `weak` (frugal): one pass is enough. Done.
+- `strong` (rigor):
+  - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done (two consecutive clean rounds = strong termination criterion).
+  - If round N is the first round (N=1), spawn round 2 unconditionally so we can compare.
+  - If round N produced fixes, spawn round (N+1) as a fresh sub-agent that does not see round N's findings or the fixes.
+  - If N hits the system cap of 5 rounds without two consecutive clean rounds, surface to the user via `AskUserQuestion`: "REVIEW reached the round cap with <count> fixes still landing; continue, accept the current spec, or revise the constitution?" Default on user silence: accept the current spec, log the unfinished tail in `<paper-slug>/open-questions.md`, and proceed.
 
 ## Survey signals (entry into REVIEW)
 
-- `astra.yaml` exists and validates ⇒ ready to review
-- `astra validate astra.yaml --verify-evidence` returns clean (when literature.yaml exists) ⇒ evidence side done
-- All `targets/targets.md` entries map to spec homes (output / finding / input / decision / universe default) ⇒ coverage side done
-- Both ⇒ REVIEW complete; proceed to IMPLEMENT
+- `astra.yaml` exists and `astra validate astra.yaml` returns clean ⇒ ready to review
+- `work/notes/review/round-1.md` exists ⇒ first round done
+- For frugal: `round-1.md` exists with verdict `clean`, or its fixes have been incorporated ⇒ REVIEW done
+- For rigor: the two most recent rounds, `round-<N-1>.md` and `round-<N>.md`, both have verdict `clean` ⇒ REVIEW done; proceed to IMPLEMENT
+- `astra validate astra.yaml --verify-evidence` returns clean (when `work/notes/literature.yaml` exists) ⇒ evidence side validated
 
 ## Notes
 
-- **REVIEW does not write code.** Its outputs are edits to the spec and additions to `implementation-notes.md`, not new scripts.
-- **A clean REVIEW reduces IMPLEMENT thrash.** It is worth running even when the spec looks fine after SPECIFY — the cross-check catches "looks fine in isolation, breaks under full coverage" gaps.
+- **REVIEW does not write code.** Its outputs are findings; SPECIFY's edits to the spec / notes implement them.
+- **The fresh-context discipline is load-bearing.** A reviewer that sees the prior round's findings or fixes pattern-matches on them and stops finding the next class of inconsistency. Each round must spawn a brand-new sub-agent with only paper + code + spec as inputs.
+- **Minimize churn in fixes.** SPECIFY's edits should target the specific finding, not restructure surrounding spec. Big restructures defeat the round-over-round comparison the orchestrator uses to decide termination.
+- **A clean REVIEW reduces IMPLEMENT thrash.** It is worth running even when SPECIFY's output looked fine — fresh-context cross-checks catch "looks fine in isolation, breaks under full coverage" gaps.
+- **For frugal runs, REVIEW can be skipped when SPECIFY ran interactively** and the user already ratified material conflicts. The constitution records the skip; iterations honor it.
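
Annotation on the Step 4 termination check above: the rule can be read as a small decision function. What follows is a minimal sketch, not part of the patch. The dial values `weak` / `strong` and the cap of 5 come from the text; the function name `next_action`, the return labels, and the list-of-counts input are hypothetical, introduced only for illustration.

```python
from typing import List

ROUND_CAP = 5  # system cap named in Step 4


def next_action(dial: str, fixes_per_round: List[int]) -> str:
    """Decide what the orchestrator does after the latest round's fixes land.

    fixes_per_round[i] holds the `fixes_needed` count reported by round i+1.
    Returns one of the hypothetical labels "spawn", "done", or "ask_user".
    """
    if not fixes_per_round:
        return "spawn"  # no round has run yet

    if dial == "weak":
        return "done"  # frugal: one pass is enough
    if dial != "strong":
        raise ValueError(f"unknown dial: {dial!r}")

    rounds_run = len(fixes_per_round)
    # Two consecutive clean rounds is the strong termination criterion.
    if rounds_run >= 2 and fixes_per_round[-1] == 0 and fixes_per_round[-2] == 0:
        return "done"
    # Cap reached without two clean rounds in a row: hand the call to the user.
    if rounds_run >= ROUND_CAP:
        return "ask_user"
    # Covers both "round 1 always gets a round 2" and "fixes landed, go again".
    return "spawn"
```

Note that a clean round 1 still yields "spawn" under `strong`, since there is no earlier round to compare against.
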
diff --git a/claude/lightcone/skills/paper2astra/references/specify.md b/claude/lightcone/skills/paper2astra/references/specify.md
index 0819646f..2960eab8 100644
--- a/claude/lightcone/skills/paper2astra/references/specify.md
+++ b/claude/lightcone/skills/paper2astra/references/specify.md
@@ -110,7 +110,7 @@ When sub-analyses exist, the root narrative MUST include a top-down end-to-end d
 - **Equation and section numbers must match the rendered paper / PDF**, not a naïve count of TeX blocks or markdown headings. When citing "eq. N" or "§N", find the equation or heading by content in the rendered paper and use the printed number.
 - **When adding finding evidence**, verify the quoted text against the paper source by Grep or PDF search. `astra validate --verify-evidence` currently verifies `prior_insights` evidence; artifact-anchored `findings` evidence still needs a manual quote check.
 - **Validate** with `astra validate astra.yaml` and fix until it passes.
-- **Work primarily from `work/notes/`** — SUMMARIZE has already distilled the paper. Use `work/reference/document.md` only to look up specific details (Grep for terms, or read targeted sections with offset/limit). Do not read the entire markdown at once.
+- **Work primarily from `work/notes/`** — STUDY has already distilled the paper section by section. Use `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) only to look up specific details (Grep for terms, or read targeted sections with offset/limit). Do not re-read the whole paper.
 
 ## Survey signals (entry into SPECIFY)
 
diff --git a/claude/lightcone/skills/paper2astra/references/summarize_run.md b/claude/lightcone/skills/paper2astra/references/summarize_run.md
index 927902c1..4332a8e1 100644
--- a/claude/lightcone/skills/paper2astra/references/summarize_run.md
+++ b/claude/lightcone/skills/paper2astra/references/summarize_run.md
@@ -1,60 +1,106 @@
-# SUMMARIZE_RUN — final report and outcome draft
+# SUMMARIZE_RUN — interactive close-out
 
-The reproduction has converged (verdict `pass` or user-accepted `partial`). Write the final summary to disk, draft the constitution's outcome, and prepare the workdir for the post-loop interactive return. SUMMARIZE_RUN runs as a silent sub-agent — it produces the report cleanly and exits; the next phase, FINAL_REVIEW, picks up interactively to drive `/figure-comparison`, optionally `/check-sentence-by-sentence`, walk the user through `open-questions.md`, and finalize the outcome before closure.
+The reproduction has converged (verdict `pass` or user-accepted `partial`). Control returns to the user. SUMMARIZE_RUN is the second always-interactive bookend (INTERVIEW being the first); it runs in the main loop session, not as a sub-agent, so it can use `AskUserQuestion` and invoke sibling skills that need user reach. Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, draft the final report, and finalize the constitution outcome — in one interactive arc.
 
-The constitution's per-phase mode is **always sub-agent**. There are no decisions left for this phase; this is reportage that hands off to FINAL_REVIEW.
+The constitution's per-phase mode is **always interactive** for this phase. It does not run as a sub-agent. There is no "silent close-out" path; the close-out is the human's review.
 
 ## Inputs
 
-- `astra.yaml` — final spec
+- `astra.yaml` — final spec (validates with `--verify-evidence` if literature.yaml exists)
 - `comparison-report.yaml`, `comparison-report.md` — final verdict
-- `targets/targets.md` — what was being matched against
+- `targets/targets.md` — what was being matched against; reference figures / tables in `targets/`
+- `results/<universe>/<output_id>/` — reproduced figures / tables / metrics
+- `<paper-slug>/open-questions.md` — running report from sub-agent phases (paper-vs-code conflicts, ambiguities, anything sub-agents flagged for user resolution)
 - `work/notes/methodology.md` — for context
-- The constitution at the project root — its `outcome:` field needs rewriting
+- The constitution at the project root — its `outcome:` field needs the final write
+- `<paper-slug>/CLAUDE.md` — paper identity, code location
 
 ## Outputs
 
-- `REPRODUCTION-SUMMARY.md` (or whatever name fits the project) — final report; concise.
-- Draft `outcome:` on the constitution. FINAL_REVIEW refines it after the user has walked the validation surfaces.
-- A commit on the reproduction branch with a clear message.
+- `.lightcone/comparison.html` — `/figure-comparison`'s portable side-by-side report (paper artifacts vs reproduced)
+- (Optional) `.lightcone/check-sentence-by-sentence.md` — `/check-sentence-by-sentence`'s claim audit (file:line or NOT FOUND per sentence)
+- `<paper-slug>/open-questions.md` — same file, but with `## Resolutions` section appended capturing what the user said for each entry
+- Edits to `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` if any open-question resolution warrants a spec change
+- `REPRODUCTION-SUMMARY.md` — final report; concise (~1–2 pages)
+- Constitution `outcome:` rewritten to its final form
+- A commit closing out the reproduction
 
-## What the final report covers
+## Step 1: render the validation surfaces
+
+### `/figure-comparison` (mandatory)
+
+Invoke the `/figure-comparison` skill from this session. It builds a portable HTML side-by-side comparing paper artifacts (from `targets/`) to reproduced artifacts (from `results/<universe>/`). The skill uses `AskUserQuestion` for any inputs it can't infer from the workdir; that works because SUMMARIZE_RUN is interactive — the prompts land in this session.
+
+Output lands at `.lightcone/comparison.html`. Show the user the path and offer to open it (`open` on macOS, `xdg-open` on Linux, or just print the path so they can click it in their terminal).
+
+**Do not spawn `/figure-comparison` under the `Task` tool.** It has `AskUserQuestion` in its `allowed-tools`; a Task-tool sub-agent has no user-reach, so the prompt fires into nothing.
+
+### `/check-sentence-by-sentence` (opt-in)
+
+Ask the user via `AskUserQuestion` whether they want the claim audit. It's optional because for many reproductions the figure-comparison already settles "did it match?"; the sentence-by-sentence audit earns its keep when the paper makes many specific quantitative claims and the user wants each one anchored to a code location.
+
+If yes, invoke `/check-sentence-by-sentence`. Same discipline as `/figure-comparison` — it can prompt the user; do not spawn under `Task`.
+
+Output lands at `.lightcone/check-sentence-by-sentence.md` (or wherever the skill writes it). Show the user the path.
+
+## Step 2: walk `<paper-slug>/open-questions.md` with the user
+
+Read `<paper-slug>/open-questions.md`. For each unresolved entry, surface it via `AskUserQuestion` with:
+
+- **The question** (verbatim from the file)
+- **Origin** — which phase / sub-agent flagged it
+- **The default the loop applied** (if any — e.g. "code as canonical")
+- **Three options**: ratify the default, override (user spells out their choice), or defer (leave as a known limitation in the final report)
+
+Append a `## Resolutions` section to `<paper-slug>/open-questions.md` capturing what the user said for each entry. This makes the resolution durable — re-runs and future sessions see it.
+
+If a resolution warrants a spec change (the user picks an override), edit `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` accordingly and re-run `astra validate astra.yaml`. If the change would invalidate the comparison report (e.g. flips the canonical method for a primary output), surface that to the user — in most cases the reproduction is "done" and the override is a known limitation, but the user may choose to re-enter the loop.
+
+## Step 3: write `REPRODUCTION-SUMMARY.md`
 
 A single markdown file at the project root, ~1–2 pages. Sections:
 
 1. **What was reproduced** — the paper, the scope, the targets.
 2. **Verdict** — pass / partial. If partial, what failed and why we accepted it.
-3. **Material decisions** — the paper-vs-code conflicts the SPECIFY phase surfaced, what the user chose, and why.
-4. **Outputs** — pointers to the figures / tables / metrics produced. One bullet per primary target, with the path to the reproduced result.
-5. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut that's stricter than stated, etc.). This is where the reproduction's value to the broader literature gets recorded.
-6. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the constitution path, the relevant `astra.yaml`).
-7. **Open questions for FINAL_REVIEW** — short pointer to `<paper-slug>/open-questions.md`, with a count of unresolved entries. FINAL_REVIEW will walk these with the user; this section just flags that they're waiting.
+3. **Material decisions** — the paper-vs-code conflicts SPECIFY surfaced, what the user chose (interactively or by canonical-resolution default), and why.
+4. **Outputs** — pointers to the figures / tables / metrics produced. One bullet per primary target with the path to the reproduced result and a one-line match note from the comparison report.
+5. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut stricter than stated, etc.). This is where the reproduction's value to the broader literature gets recorded.
+6. **Resolved open questions** — pull from `<paper-slug>/open-questions.md`'s `## Resolutions` section. One bullet per question + its resolution.
+7. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the constitution path, the relevant `astra.yaml`).
 
 Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes; the summary is the door into them.
 
-## Constitution outcome (draft)
+## Step 4: finalize the constitution outcome
 
-Draft the constitution's `outcome:` field to reflect the realized state. A good outcome teaches:
+Rewrite the constitution's `outcome:` field to its final form. By this point the user has walked the validation surfaces, ratified the open questions, and accepted (or explicitly partially accepted) the reproduction. Write an outcome that teaches:
 
-> Reproduced <paper> against the targets in `targets/targets.md` with verdict `pass` (attempt 4). All 7 primary targets match within stated tolerance; 2 of 5 secondary targets show <5% offset attributable to <reason>. Material conflicts surfaced and resolved: <list>. Spec at `astra.yaml` (validates with `--verify-evidence`); reproduction summary at `REPRODUCTION-SUMMARY.md`. **FINAL_REVIEW pending: <N> open questions, `/figure-comparison` not yet rendered.**
+> Reproduced <paper> against the targets in `targets/targets.md` with verdict `pass` (attempt 4). All 7 primary targets match within stated tolerance; 2 of 5 secondary targets show <5% offset attributable to <reason>. Material conflicts surfaced and resolved: <list>. Open questions resolved: <count> (full chain in `open-questions.md`). Spec at `astra.yaml` (validates with `--verify-evidence`); side-by-side at `.lightcone/comparison.html`; full report at `REPRODUCTION-SUMMARY.md`.
 
-This is a draft. **FINAL_REVIEW refines it** after the user has walked the validation surfaces and ratified the open questions. The constitution's `status:` flips to `closed` only when the user accepts FINAL_REVIEW's surfacing. This sub-agent does not flip status, and does not finalize the outcome — it prepares the report and the outcome draft, then exits so FINAL_REVIEW can take over interactively.
+The outcome should stand on its own — someone reading just `felt show <reproduction-fiber>` (or the kanban card) should learn the verdict, the material decisions that landed, and where the artifacts live. No "see the body for details."
 
-## Commit
+## Step 5: commit
 
-Stage the report, the constitution outcome draft, the final `astra.yaml`, the comparison report, and any housekeeping changes. Commit with a message that names the verdict and signals the handoff:
+Stage `REPRODUCTION-SUMMARY.md`, `<paper-slug>/open-questions.md` (with resolutions), the constitution with the final outcome, the final `astra.yaml`, the comparison artifacts, and any housekeeping changes. Commit with a message that names the verdict and the close-out:
 
 ```
-summarize_run: <paper-short-name> verdict <verdict>, summary at REPRODUCTION-SUMMARY.md, final_review pending
+summarize_run: <paper-short-name> verdict <verdict>, summary at REPRODUCTION-SUMMARY.md
 ```
 
+After the commit, and only once the user has accepted the verdict, optionally flip the constitution's status to `closed` (or the equivalent name in the per-paper conventions) so future surveys recognize the reproduction is done.
+
 ## Survey signals (entry into SUMMARIZE_RUN)
 
-- `comparison-report.yaml` verdict is `pass` (or user has accepted `partial`) ⇒ ready
-- `REPRODUCTION-SUMMARY.md` exists, constitution outcome draft is in place ⇒ SUMMARIZE_RUN done; FINAL_REVIEW takes over interactively
+- `comparison-report.yaml` verdict is `pass` (or user has accepted `partial`) ⇒ ready to close out
+- `.lightcone/comparison.html` exists ⇒ `/figure-comparison` rendered
+- `<paper-slug>/open-questions.md` has a `## Resolutions` section covering every entry ⇒ open-questions walkthrough done
+- `REPRODUCTION-SUMMARY.md` exists ⇒ final report written
+- Constitution `outcome:` reflects the final state ⇒ SUMMARIZE_RUN done; reproduction complete
 
 ## Notes
 
-- **This phase does not flip the constitution's status to closed.** The user does that, after FINAL_REVIEW. SUMMARIZE_RUN's job is to produce the summary cleanly and hand off.
-- **Do not invoke `/figure-comparison` or `/check-sentence-by-sentence` from here.** Both have `AskUserQuestion` in their `allowed-tools`; spawning them under the `Task` tool fires prompts into nothing. They run in FINAL_REVIEW, where the user is reachable.
+- **This phase runs interactively in the main loop session.** Do not spawn it under `Task`. The whole point of SUMMARIZE_RUN is that the user is reachable — every step uses `AskUserQuestion` (directly, or via the sibling skills it invokes).
+- **`/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`.** That's why SUMMARIZE_RUN is the always-interactive close-out and they live here, not in the loop. Spawning either under `Task` from inside the loop fires prompts into nothing.
+- **The user owns the verdict-acceptance decision.** SUMMARIZE_RUN's purpose is to let the user see what the loop did and decide whether they accept it. The skill renders surfaces and asks; it does not unilaterally close.
+- **Open-question resolutions are durable.** Append to `<paper-slug>/open-questions.md`'s `## Resolutions` section so the next re-run / future session sees what was decided. Do not delete the original questions.
 - **Keep the report short.** Long reports get skimmed; short reports get read. Two pages is generous.
+- **Do not invent further work.** If the constitution's evidence checks all pass, the reproduction is done. A later session, the user, or a future revisit can decide whether the reproduction still serves them.
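
Annotation on the SUMMARIZE_RUN survey signals above: the file-based signals can be checked mechanically. The sketch below is illustrative only and not part of the patch. The three paths are the ones named in the text; the function name `close_out_signals`, the dictionary keys, and the assumption that `<paper-slug>/` sits directly under the workdir are hypothetical. It only checks that a `## Resolutions` heading exists, not that every entry is covered, and it does not read `comparison-report.yaml` or the constitution.

```python
from pathlib import Path
from typing import Dict


def close_out_signals(workdir: Path, paper_slug: str) -> Dict[str, bool]:
    """Report which file-based close-out signals are present in a workdir."""
    open_questions = workdir / paper_slug / "open-questions.md"
    # Look for the heading on a line of its own.
    has_resolutions = open_questions.is_file() and any(
        line.strip() == "## Resolutions"
        for line in open_questions.read_text(encoding="utf-8").splitlines()
    )
    return {
        "figure_comparison_rendered": (workdir / ".lightcone" / "comparison.html").is_file(),
        "open_questions_resolved": has_resolutions,
        "summary_written": (workdir / "REPRODUCTION-SUMMARY.md").is_file(),
    }
```

A survey step could call `close_out_signals(Path("."), "<paper-slug>")` and treat any `False` value as the next piece of close-out work.
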

From 50c0869b698a8da8193dfa6d1af046833e676591 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 6 May 2026 03:30:46 +0200
Subject: [PATCH 014/124] =?UTF-8?q?paper2astra:=20ARCHITECT-first=20phase?=
 =?UTF-8?q?=20shape=20=E2=80=94=20references/?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Rewires the loop's middle to architecture-first, two-pass-per-sub-analysis:

- references/architect.md (NEW, replaces study.md): two parallel Explore
  sub-agents (paper-side + code-side) feed a synthesis sub-agent that writes
  a stub astra.yaml — sub-analyses named, inputs/outputs declared at the
  sub-analysis level, narrative prose (no anchor refs yet). Rigor-dialed
  self-review pass.
- references/specify.md (REWRITE): per sub-analysis, paper pass authors
  decisions/prior_insights/findings + weaves astra-anchor refs; code pass
  surfaces material disagreements (canonical-resolution rule) and
  code-revealed insights; rigor-dialed self-review pass. Parallelizable
  across independent sub-analyses.
- references/review.md (the close-out, RENAMED from summarize_run.md):
  /figure-comparison + /check-sentence-by-sentence + open-questions
  walkthrough + REPRODUCTION-SUMMARY.md + outcome finalize. Header retitled.
- references/review.md (the old pre-implement audit) — DELETED. Discipline
  folds into ARCHITECT, SPECIFY, IMPLEMENT as their fresh-context
  rigor-dialed self-review passes.
- references/implement.md: sharpened wording so the rigor-dial discipline
  reads matched-shape with ARCHITECT/SPECIFY; index references switched
  from work/notes/study/ → work/notes/architect/.
- references/literature.md: cited_papers.yaml now produced by ARCHITECT;
  per-paper sub-agents target decision *clusters* from paper-index.md
  (concrete decisions don't exist until SPECIFY).
- references/acquire.md: STUDY references → ARCHITECT.
- references/compare.md: SUMMARIZE_RUN → REVIEW (close-out).

Constitution: [[lightcone/paper2astra-as-skill/architect-first]].

SKILL.md, interview.md, README.md, and the bundle constitution body land
in the next commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../skills/paper2astra/references/acquire.md  |   8 +-
 .../paper2astra/references/architect.md       | 285 ++++++++++++++++++
 .../skills/paper2astra/references/compare.md  |  10 +-
 .../paper2astra/references/implement.md       |  36 +--
 .../paper2astra/references/literature.md      |  22 +-
 .../skills/paper2astra/references/review.md   | 212 ++++++-------
 .../skills/paper2astra/references/specify.md  | 229 +++++++++-----
 .../skills/paper2astra/references/study.md    | 229 --------------
 .../paper2astra/references/summarize_run.md   | 106 -------
 9 files changed, 566 insertions(+), 571 deletions(-)
 create mode 100644 claude/lightcone/skills/paper2astra/references/architect.md
 delete mode 100644 claude/lightcone/skills/paper2astra/references/study.md
 delete mode 100644 claude/lightcone/skills/paper2astra/references/summarize_run.md

diff --git a/claude/lightcone/skills/paper2astra/references/acquire.md b/claude/lightcone/skills/paper2astra/references/acquire.md
index e294ac2c..3682ff0b 100644
--- a/claude/lightcone/skills/paper2astra/references/acquire.md
+++ b/claude/lightcone/skills/paper2astra/references/acquire.md
@@ -43,7 +43,7 @@ mkdir -p work/reference/source && cd work/reference/source && tar -xzf /tmp/<arx
 ls *.tex
 ```
 
-The LaTeX source gives clean equations, captions, tables, and bibliography — none of the math collapse, ligature artifacts, or caption flattening that plagues PDF extraction. **No conversion to markdown is needed.** Downstream phases (STUDY's section sub-agents, SPECIFY's evidence quotes) read `.tex` directly — Claude reads LaTeX fine, and rendering it to markdown only loses information. The tarball stays as `work/reference/source/`.
+The LaTeX source gives clean equations, captions, tables, and bibliography — none of the math collapse, ligature artifacts, or caption flattening that plagues PDF extraction. **No conversion to markdown is needed.** Downstream phases (ARCHITECT's paper-side Explore sub-agent, SPECIFY's evidence quotes) read `.tex` directly — Claude reads LaTeX fine, and rendering it to markdown only loses information. The tarball stays as `work/reference/source/`.
 
 If you want to identify the main `.tex` file for downstream tools:
 
@@ -60,7 +60,7 @@ cp "$(astra paper path 10.48550/arXiv.<arxiv-id>)" work/reference/paper.pdf
 
 `astra paper add` for arXiv DOIs fetches the PDF directly. The PDF stays as a backup for `astra validate --verify-evidence`, even though the LaTeX source is the primary text.
 
-There is no PARSE step on Path A. Equation numbers, section numbers, figure references — all preserved in the source. STUDY's sub-agents resolve `\ref{}` against `\label{}` directly in the source tree.
+There is no PARSE step on Path A. Equation numbers, section numbers, figure references — all preserved in the source. ARCHITECT's paper-side Explore sub-agent (and SPECIFY's evidence-quote pass) resolves `\ref{}` against `\label{}` directly in the source tree.
 
 ### Path B — non-arXiv paper (PDF + Docling fallback)
 
@@ -125,7 +125,7 @@ Skip Step 2 if `work/reference/code/` already exists.
 
 Run `ls work/reference/` first.
 
-- If `paper.pdf` is present and either `source/` (Path A) or `document.md` (Path B) is also present, ACQUIRE is done — proceed to STUDY.
+- If `paper.pdf` is present and either `source/` (Path A) or `document.md` (Path B) is also present, ACQUIRE is done — proceed to ARCHITECT.
 - If `paper.pdf` is present but neither structure exists, run the structuring step for the appropriate path.
 - If nothing is there, run the full ACQUIRE.
 
@@ -135,4 +135,4 @@ Run `ls work/reference/` first.
 - **Journal DOIs that 403 on Unpaywall** can be aliased to a locally-downloaded arXiv preprint via `astra paper add <JOURNAL_DOI> --pdf <path-to-arxiv-pdf>`.
 - **Path A is preferred whenever arXiv source is acquirable.** Math, ligatures, and caption fidelity all come through clean from the LaTeX source; PDF + Docling is the fallback for non-arXiv where there's no better source. The acquisition layer's ASTRA-side counterpart — `astra paper add` preferring LaTeX over PDF for the verification cache, and applying the same logic to bibliography references — is filed as a separate ASTRA issue; paper2astra inherits the improvement once it lands.
 - **Equation numbers and section numbers must match the rendered paper.** On Path A, the printed numbers come from the rendered tarball (look at the PDF if uncertain). On Path B, Docling preserves printed numbers in its markdown output. When citing "eq. N" or "§N" in any downstream phase, find the equation or heading by content, not by a naïve count of TeX blocks or markdown headings.
-- This phase's job is acquisition + structuring, not understanding. Do not start summarizing or comparing the paper here — that's STUDY.
+- This phase's job is acquisition + structuring, not understanding. Do not start indexing or comparing the paper here — that's ARCHITECT.
diff --git a/claude/lightcone/skills/paper2astra/references/architect.md b/claude/lightcone/skills/paper2astra/references/architect.md
new file mode 100644
index 00000000..2d31a7c4
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/architect.md
@@ -0,0 +1,285 @@
+# ARCHITECT — write the stub `astra.yaml`
+
+ARCHITECT is the structural seam: decide the sub-analysis decomposition, wire the inputs and outputs at the sub-analysis level, and author high-level narrative prose for each analysis — all in one stub `astra.yaml`. SPECIFY then fills the stub in with `decisions:`, `prior_insights:`, `findings:`, and `astra-anchor:` references. Splitting **structure** from **content** keeps the cognitive load on each phase manageable: ARCHITECT decides *what the analyses are*; SPECIFY decides *what's inside each one*.
+
+This phase replaces the old STUDY. The old shape wrote per-section paper-vs-code agreement-check files in markdown, the same content SPECIFY would then re-author into `astra.yaml`. The new shape skips the markdown intermediate: ARCHITECT writes the structural skeleton directly in YAML, and SPECIFY's per-sub-analysis paper pass and code pass author the content. One translation layer fewer.
+
+The constitution's per-phase mode is **always sub-agent** for this phase. The work is two parallel Explore sub-agents (one paper-side, one code-side), then one synthesis sub-agent that produces the stub. After the stub lands, a rigor-dialed self-review pass cross-checks it against paper + code before SPECIFY runs.
+
+## Inputs
+
+- `work/reference/source/` (Path A — arXiv LaTeX) **or** `work/reference/document.md` + `work/reference/figures/` + `work/reference/tables/` + `work/reference/metadata.json` (Path B — Docling)
+- `work/reference/code/` — the reference code repo (when present)
+- The per-paper constitution — names the user's intended replication targets (figures, tables, numbers) in its **Desired State**
+- `work/notes/notes.md` — user-supplied prior notes, if any (read by every phase if present)
+
+## Outputs
+
+- `astra.yaml` — **stub form**: sub-analyses named, architecture wired (inputs / outputs declared at the sub-analysis level), high-level `narrative:` prose blocks per analysis. **No `decisions:`, `prior_insights:`, `findings:`, or `astra-anchor:` references yet** — those entries don't exist for the narrative to reference.
+- `work/notes/architect/paper-index.md` — paper-side Explore output: section list, sub-analysis boundary candidates, decision clusters, result loci (figures / tables / quoted numerics)
+- `work/notes/architect/code-index.md` — code-side Explore output: top-level module map, natural decomposition, entry-points, where the analysis stages live
+- `work/notes/cited_papers.yaml` — citations worth following up on for prior insights (what LITERATURE consumes); populated from the paper-side index
+- `work/notes/architect/review-round-<N>.md` — each rigor-dialed self-review round's findings (rigor only; one file per round)
+
+## Step 1: Two parallel Explore sub-agents
+
+Spawn two Task-tool sub-agents in parallel. Each is bounded — neither tries to compare paper to code, and neither writes `astra.yaml`. Their job is to give the synthesis sub-agent enough indexed context to draft the stub.
+
+### Paper-side Explore — system prompt
+
+> You are a paper-indexing agent. Read the paper and produce an index that the architecture-synthesis agent will use to decide the `astra.yaml` sub-analysis decomposition. **Do NOT read code; do NOT write `astra.yaml`.**
+>
+> ### Inputs
+>
+> - Paper text: `work/reference/source/*.tex` (Path A) or `work/reference/document.md` (Path B). Read the methods, results, and analysis-bearing intro / discussion sections in full. Skip front-matter (abstract, acknowledgments, author list) and back-matter (references, supplementary).
+> - User-supplied notes: `work/notes/notes.md` if present.
+>
+> ### What to extract
+>
+> 1. **Section list** with anchors (`\label{}` for Path A; markdown heading for Path B).
+> 2. **Sub-analysis boundary candidates.** Where does the paper's pipeline have natural seams — places one stage's output flows as the next stage's input? Look for: a reconstruction stage producing a catalog consumed by a clustering stage; an MCMC producing a chain consumed by a parameter-estimation stage; a fit producing posteriors consumed by a comparison stage. Name each candidate with a noun phrase (`reconstruction`, `clustering`, `bao_fit`) and one-line description.
+> 3. **Decision clusters per sub-analysis.** Group the paper's choices by where they sit in the pipeline. Don't enumerate every choice — name the *clusters* (e.g. "fitting prior choices", "selection criteria for the catalog"). SPECIFY drills back into the paper to author each `decisions:` entry; you're indicating where to look.
+> 4. **Result loci.** Which figures / tables / in-text metrics report the paper's primary and secondary results? Use `path:line` for the `\includegraphics{}` or table source (Path A); use `metadata.json` indexes for Path B. Tag each as primary / secondary based on the paper's own emphasis.
+> 5. **Citations worth following up.** Citations that justify a method, parameter, or value (not general background). DOI when resolvable + one-line on why the citation matters. The synthesis agent merges your list into `work/notes/cited_papers.yaml` for LITERATURE to mine.
+> 6. **Data-flow shape.** A short prose paragraph: "Inputs flow from <source datasets> through <stage 1> producing <intermediate>, into <stage 2> producing <intermediate>, into <stage 3> producing <primary result>." This becomes the seed for the root narrative's data-flow paragraph.
+>
+> ### Output format — `work/notes/architect/paper-index.md`
+>
+> ```markdown
+> # Paper index
+>
+> ## Sections
+> - <NN. Section title> — anchor `<label>` in `<path>`. Phase: methods | results | discussion | other.
+>
+> ## Sub-analysis candidates
+> - **<noun phrase id>** — <one-line role>; spans sections <list>; produces <output(s)>; consumes <input(s)>.
+>
+> ## Decision clusters (per candidate sub-analysis)
+> ### <sub-analysis id>
+> - **<cluster name>** — <where in the paper>; <one-line shape of the choices>.
+>
+> ## Result loci (primary + secondary)
+> - **<figure / table / metric>** — `<source-path:line>` or `metadata.json#<id>`; reported in §<X>; primary | secondary.
+>
+> ## Citations worth following up
+> - **<citation>** — DOI: <doi> — <one-line on why this citation matters for replication>.
+>
+> ## Data-flow shape
+> <one-paragraph prose: how inputs flow through the pipeline to the primary result>.
+> ```
+>
+> ### Rules
+>
+> - **Bounded read.** Do not read the code repo. Your job is paper-side only.
+> - **Index, do not author.** No `decisions:`, no `prior_insights:`, no `findings:`. Those are SPECIFY's. Your output is markdown, not YAML.
+> - **Quote sparingly.** Brief paper quotes are OK to disambiguate a result locus or a sub-analysis boundary; verbatim claim quotes are SPECIFY's substrate, not yours.
+
+### Code-side Explore — system prompt
+
+> You are a code-indexing agent. Read the code repo and produce an index that the architecture-synthesis agent will use to decide the `astra.yaml` sub-analysis decomposition. **Do NOT read the paper; do NOT write `astra.yaml`.**
+>
+> ### Inputs
+>
+> - Code repo at `work/reference/code/`. Read the README, the entry-points, and follow imports to map the analysis pipeline. **Do NOT modify any code.**
+> - User-supplied notes: `work/notes/notes.md` if present.
+>
+> ### What to extract
+>
+> 1. **Top-level module map.** What lives where: each top-level directory or module file with a one-line role.
+> 2. **Natural decomposition.** Where does the code's pipeline split into independent stages? Most analysis pipelines have stage seams visible from imports — a `reconstruction/` module fed by `data/`, a `bao_fit/` module fed by `reconstruction/`. Name each stage with the same noun-phrase shape the paper-side index uses (the synthesis agent will reconcile names).
+> 3. **Entry-points.** Top-level scripts the user runs to produce primary results: `scripts/run_reconstruction.py`, `nbs/figure_4.ipynb`, etc. For each: which stage / output it produces, with a `path:line` to the main function.
+> 4. **External data dependencies.** What datasets the code expects to find at runtime — environment variables, config files, paths to catalogs. SPECIFY uses these for `inputs:`; this is the place to surface them.
+> 5. **Code-specific gotchas surfaced from the README or top-level docs.** Things the paper doesn't say but the code's own docs flag (a calibration version, a runtime requirement, a data preprocessing step). One bullet each, with `path:line`.
+>
+> ### Output format — `work/notes/architect/code-index.md`
+>
+> ```markdown
+> # Code index
+>
+> ## Module map
+> - `<path>` — <one-line role>.
+>
+> ## Natural decomposition
+> - **<noun phrase id>** — <one-line role>; entry-point `<path:line>`; consumes <input modules / data>; produces <output artifact paths or in-memory shapes>.
+>
+> ## Entry-points (top-level runnable scripts)
+> - **<script path>** — produces <output id>; main: `<path:line>`.
+>
+> ## External data dependencies
+> - **<dataset / env var / config path>** — read at `<path:line>`; <one-line on what's expected>.
+>
+> ## Code-specific gotchas
+> - **<gotcha>** — surfaced at `<path:line>`; <one-line on why it matters>.
+> ```
+>
+> ### Rules
+>
+> - **Bounded read.** Do not read the paper. Your job is code-side only.
+> - **Index, do not author.** No `decisions:`, no `prior_insights:`, no `findings:`, no recipes. Your output is markdown, not YAML.
+> - **Trust the imports.** Module dependencies tell the natural decomposition story more reliably than the README's prose summary.
+
+## Step 2: Synthesis sub-agent — write the stub `astra.yaml`
+
+Spawn one synthesis sub-agent that reads both index files and writes the stub. This is where the structural decisions actually get made: the synthesis agent reconciles paper-side vs code-side sub-analysis decompositions, picks the unified set of sub-analysis IDs, wires inputs and outputs at the sub-analysis level, and authors the high-level `narrative:` prose blocks.
+
+> You are an ASTRA architecture-synthesis agent. You read paper-side and code-side indexes and produce the stub `astra.yaml` that SPECIFY will fill in.
+>
+> ### Inputs
+>
+> - `work/notes/architect/paper-index.md` — paper-side Explore output
+> - `work/notes/architect/code-index.md` — code-side Explore output (when present)
+> - `work/notes/notes.md` — user-supplied notes (if present)
+> - The per-paper constitution at the project root — its **Desired State** names the user's intended replication targets
+>
+> ### What to do
+>
+> 1. **Reconcile sub-analysis decompositions.** Read both index files' sub-analysis candidates. Where paper and code agree on a stage, use that name (noun-phrase, e.g. `reconstruction`). Where they disagree, the code's structure is canonical for stage boundaries — the paper compresses; the code reveals the actual decomposition. Where the code is absent, follow the paper alone.
+> 2. **Choose: one analysis or sub-analyses?** If the paper has only one stage end-to-end (no clean intermediate handoffs), write a single analysis. If the paper has genuinely independent stages (each one's output flows as the next one's input), write sub-analyses. Sub-analysis IDs must be noun phrases (not verb phrases): `reconstruction`, `clustering`, `bao_fit`. Avoid reserved names (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
+> 3. **Wire inputs and outputs at the sub-analysis level.** For each sub-analysis:
+>    - Declare `inputs:` from the data-dependency list in the code-side index plus any paper-named external datasets. The depth (acquisition path, selection criteria) is SPECIFY's; ARCHITECT names the input and gives it a stable id.
+>    - Declare `outputs:` matching the result loci from the paper-side index plus any intermediate artifacts a downstream sub-analysis consumes. Tag each output's `priority:` from the paper's emphasis (primary / secondary). The reproduction's targeted scope from the constitution's Desired State takes precedence — if the user only wants Figure 3 and Table 2, only those land as `outputs:` (the rest are out-of-scope and noted as such).
+> 4. **Author the root and per-analysis narrative.** Use `/narrative` for prose authoring (it carries the discipline on reserved names, voice, the data-flow paragraph requirement). High-level prose only — *no `astra-anchor:` references yet, because the entries those would point at don't exist*. SPECIFY will weave in anchors as it authors `decisions:` / `prior_insights:` / `findings:` per sub-analysis. The root `narrative:` MUST include a top-down end-to-end data-flow paragraph (per the narrative skill's data-flow rules — closes lightcone-cli#108) when sub-analyses exist.
+> 5. **Build `work/notes/cited_papers.yaml`** from the paper-side index's "Citations worth following up" entries:
+>    ```yaml
+>    papers:
+>      - doi: "10.xxxx/yyyy"
+>        citation: "Smith et al. (2020)"
+>        relevance: "One-line description of why this paper matters for replication"
+>    ```
+>    This is what LITERATURE mines.
+> 6. **Validate** with `astra validate astra.yaml`. The stub MUST validate as written — even with empty `decisions:` / `prior_insights:` / `findings:` blocks, the structural fields and the narrative prose must pass schema checks.
+>
+> ### Stub shape — what `astra.yaml` looks like after ARCHITECT
+>
+> ```yaml
+> # Stub: structure + narrative; SPECIFY fills decisions, findings, prior_insights, evidence, anchors.
+> id: <paper-slug>
+> title: "<paper title>"
+> doi: <doi>
+>
+> narrative:
+>   summary: |
+>     <high-level paragraph for the root analysis>
+>   methods: |
+>     <data-flow paragraph; required when sub-analyses exist>
+>
+> analyses:
+>   <sub-analysis-id-1>:
+>     narrative:
+>       summary: |
+>         <prose for this sub-analysis>
+>     inputs:
+>       <input-id>:
+>         <stable name; depth lives in SPECIFY>
+>     outputs:
+>       <output-id>:
+>         type: figure | table | metric | data-product
+>         priority: primary | secondary
+>         description: |
+>           <one-line on what this output is>
+>     decisions: {}      # SPECIFY fills
+>     prior_insights: {} # LITERATURE → SPECIFY fills
+>     findings: {}       # SPECIFY fills
+>
+>   <sub-analysis-id-2>:
+>     ...
+> ```
+>
+> ### Rules
+>
+> - **Stub, not snapshot.** Don't try to author content for `decisions:`, `prior_insights:`, `findings:`. Those go in SPECIFY. Your job is the structural skeleton.
+> - **Reserved names.** Sub-analysis IDs are noun phrases; avoid the reserved set listed above. Each ID must be unique across the spec.
+> - **Code-as-canonical for structure.** Where paper and code disagree on the decomposition, the code's structure is canonical (the paper compresses for narrative; the code reveals real seams).
+> - **Targeted scope wins.** The constitution's Desired State scopes the reproduction. If the user only wants Figures 3 and 4 plus Table 2, only those land as `outputs:` in the stub.
+> - **Narrative prose, no anchors.** Author `narrative:` prose at the root and per-sub-analysis level. Do NOT add `astra-anchor:` references — the entries those would point at don't exist yet.
+> - **Validate before exit.** `astra validate astra.yaml` must return clean.
+
+## Step 3: Rigor-dialed self-review
+
+After the stub lands, a fresh-context sub-agent cross-checks it against paper + code: are the sub-analyses the right decomposition? Are the inputs and outputs declared at the sub-analysis level wired correctly? Does the narrative prose accurately describe what each sub-analysis does?
+
+The depth of self-review is set by the constitution's frugality / rigor dial:
+
+- **Frugal:** skip review entirely, or run a single fresh-context sub-agent pass and incorporate its fixes once.
+- **Rigor:** N rounds. Each round runs a fresh reviewer against `astra.yaml` + paper + code; ARCHITECT incorporates fixes (regenerate the stub or edit it directly for trivial cases); the next round runs another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes (the strong-termination criterion the loop already uses) or the 5-round system cap is reached.
+
+The discipline matches REVIEW's old shape (folded here): each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; a separate fix pass (the orchestrator inline for trivial fixes, or another ARCHITECT iteration for structural changes) edits the stub.
+
+### Per-round fresh sub-agent — system prompt
+
+> You are an ARCHITECT-stub reviewer. Read `astra.yaml` (the stub), the paper, and the code (when present), and report any structural inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
+>
+> ### Inputs
+>
+> - `astra.yaml` — the stub under review (sub-analyses, inputs, outputs, narrative; `decisions:` / `prior_insights:` / `findings:` are intentionally empty at this stage, do NOT flag those as missing)
+> - `work/notes/architect/paper-index.md` — paper-side Explore output
+> - `work/notes/architect/code-index.md` — code-side Explore output (when present)
+> - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
+> - `work/reference/code/` (when present) — canonical reference for stage boundaries + entry-points
+> - The per-paper constitution — for the Desired State scope fence
+>
+> ### What to check
+>
+> 1. **Sub-analysis decomposition.** Are the sub-analyses the right cuts? Where the code structure shows a clean stage boundary, is the stub's split consistent with it? Where the paper compresses across stages, is the stub's decomposition still defensible against the code? Where there is no code, does the stub's decomposition match the paper's natural seams?
+> 2. **Sub-analysis IDs.** Noun phrases, not verb phrases. No reserved-name collisions (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
+> 3. **Inputs at sub-analysis level.** Each declared input has a stable id; the data dependency is real (cross-check against `work/notes/architect/code-index.md`'s External-data-dependencies list and the paper's data section). No phantom inputs invented to round out the structure.
+> 4. **Outputs at sub-analysis level.** Each declared output corresponds to a result locus from the paper-side index OR an intermediate artifact a downstream sub-analysis consumes. The targeted scope from the constitution's Desired State is honored — no out-of-scope outputs sneaking in, no in-scope targets missed.
+> 5. **Narrative coverage.** The root narrative includes a data-flow paragraph (when sub-analyses exist). Each sub-analysis's `narrative:` accurately describes its role. No `astra-anchor:` references at this stage (those land in SPECIFY); flag any that snuck in.
+> 6. **Validates.** `astra validate astra.yaml` returns clean.
+>
+> ### What NOT to do
+>
+> - **Do not flag empty `decisions:` / `prior_insights:` / `findings:`.** That's SPECIFY's territory. Your job is structural correctness of the stub.
+> - **Do not edit any file.** Your output is a findings file; an ARCHITECT-fix pass responds to the findings.
+> - **Do not re-read the entire paper.** Use Grep + the index files.
+> - **Do not assume a prior reviewer has been here.** You are fresh. First-principles read only.
+>
+> ### Output format — `work/notes/architect/review-round-<N>.md`
+>
+> ```markdown
+> # Architect-review round <N>
+>
+> Reviewer ran fresh against astra.yaml (stub), paper, and code.
+>
+> ## Findings
+>
+> ### <category — e.g. "Sub-analysis decomposition" / "Outputs" / "Narrative">
+>
+> - **<one-line finding>**
+>   - **What's wrong**: <quote or location of the structural problem>
+>   - **Where to fix**: <`astra.yaml#path/to/key` or `work/notes/architect/paper-index.md` row>
+>   - **Suggested fix**: <one-line concrete change>
+>   - **Source**: <paper §X.Y "quote" + index row, or code `path:line`>
+>
+> ## Verdict
+>
+> - **fixes_needed**: <count>
+> - **clean** | **needs-fixes**
+> ```
+
+### Termination
+
+- `weak` (frugal): one pass. Done after fixes (or immediately, if `fixes_needed` was 0).
+- `strong` (rigor):
+  - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
+  - If round N is the first round (N=1), spawn round 2 unconditionally: termination needs two consecutive clean rounds, so a single round can never satisfy it.
+  - If round N produced fixes, spawn round (N+1) as a fresh sub-agent that does not see round N's findings or the fixes.
+  - If N hits the system cap of 5 rounds without two consecutive clean rounds, surface to the user via `AskUserQuestion`: "ARCHITECT review reached round cap with fixes still landing; continue, accept the current stub, or revise the constitution?" Default on user silence: accept the current stub, log the unfinished tail in `<paper-slug>/open-questions.md`, proceed to LITERATURE.
+
+## Survey signals (entry into ARCHITECT)
+
+- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) exists ⇒ ready to architect
+- `work/notes/architect/paper-index.md` and `work/notes/architect/code-index.md` (if code present) exist ⇒ Explore pass done
+- `astra.yaml` exists; `astra validate astra.yaml` returns clean; sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks are present-and-empty ⇒ stub written
+- For frugal: `work/notes/architect/review-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ ARCHITECT done
+- For rigor: two consecutive `work/notes/architect/review-round-<N>.md` files both have verdict `clean` ⇒ ARCHITECT done; proceed to LITERATURE
+- `work/notes/cited_papers.yaml` exists ⇒ LITERATURE has its input
+
+## Notes
+
+- **Run the Explore sub-agents in parallel.** They're fully independent (one reads paper-only, one reads code-only). The synthesis agent runs once, after both index files exist.
+- **The Explore agents do not write `astra.yaml`.** They write index markdown. Only the synthesis agent writes the stub. This separation keeps each Explore agent's context bounded — they don't have to think about ASTRA's schema, only the read.
+- **The stub's empty blocks are intentional.** `decisions: {}`, `prior_insights: {}`, `findings: {}` make it clear at a glance that ARCHITECT's job is structural, and that SPECIFY is what fills them. Don't try to half-author content — empty is honest.
+- **Code-as-canonical for structure, paper-as-canonical for narrative voice.** The code reveals where the real stage boundaries are; the paper provides the words to describe them. The stub uses both.
+- **Resume is automatic.** If `astra.yaml` already validates and has the structural fields populated, the orchestrator skips Step 1 and Step 2 and runs Step 3 (review) only.
+- **The narrative skill is the prose author, not the structure author.** Invoke `/narrative` for the prose blocks; ARCHITECT's job is the structural skeleton plus invoking `/narrative` to fill the `narrative:` keys cleanly.
diff --git a/claude/lightcone/skills/paper2astra/references/compare.md b/claude/lightcone/skills/paper2astra/references/compare.md
index 6d2f16b5..c2fdf2bb 100644
--- a/claude/lightcone/skills/paper2astra/references/compare.md
+++ b/claude/lightcone/skills/paper2astra/references/compare.md
@@ -2,7 +2,7 @@
 
 Compare reproduced results against the paper's replication targets. Produce a structured verdict the IMPLEMENT-retry loop consumes.
 
-The constitution's per-phase mode is **user choice** for this phase — defaults to interactive for verdict ratification (was the reproduction close enough?), but a user who set the loop up to drive itself to terminal verdict can flip it to sub-agent. When sub-agent, COMPARE writes the report and the loop continues per the report's verdict; SUMMARIZE_RUN ratifies the final verdict at close-out.
+The constitution's per-phase mode is **user choice** for this phase. It defaults to interactive for verdict ratification (was the reproduction close enough?), but a user who set the loop up to drive itself to terminal verdict can flip it to sub-agent. When sub-agent, COMPARE writes the report and the loop continues per the report's verdict; REVIEW (close-out) ratifies the final verdict.
 
 ## Inputs
 
@@ -73,19 +73,19 @@ Also write `comparison-report.md` with a human-readable summary. For figure / ta
 
 When COMPARE runs interactively, surface the verdict to the user via `AskUserQuestion` after writing the report:
 
-- **If `pass`**: confirm before exiting the COMPARE → IMPLEMENT loop. *"All high-priority targets match. Proceed to close-out?"* The user accepts → SUMMARIZE_RUN runs interactively (renders `/figure-comparison`, walks the open-questions ledger, lands resolutions, finalizes the constitution outcome); the user rejects → name what's still off and re-enter the loop.
+- **If `pass`**: confirm before exiting the COMPARE → IMPLEMENT loop. *"All high-priority targets match. Proceed to close-out?"* The user accepts → REVIEW (close-out) runs interactively (renders `/figure-comparison`, walks the open-questions ledger, lands resolutions, finalizes the constitution outcome); the user rejects → name what's still off and re-enter the loop.
 - **If `partial`**: show the user the failing targets and the diagnosis. *"Partial match. <N> outputs failing: <list>. Continue retrying or accept partial?"* If the attempt budget (from the constitution) is reached, this surfacing is mandatory.
 - **If `fail`**: same shape, but the loop's continuation should be questioned more sharply. A fundamental methodological issue may need a constitution amendment, not another implement retry.
 
-When COMPARE runs as a sub-agent, no `AskUserQuestion` — the report is the output. The loop reads the verdict and either retries (if budget remains and verdict is partial/fail) or proceeds to SUMMARIZE_RUN, where the user ratifies the final verdict during close-out.
+When COMPARE runs as a sub-agent, there is no `AskUserQuestion`; the report is the output. The loop reads the verdict and either retries (if budget remains and verdict is partial/fail) or proceeds to REVIEW (close-out), where the user ratifies the final verdict.
 
-The verdict is the agent's judgment; the **decision to keep iterating** is the user's, surfaced either at this seam (interactive COMPARE) or at SUMMARIZE_RUN's close-out (sub-agent COMPARE). Default on user silence: continue the loop until the attempt budget is exhausted, then mandatory user surfacing.
+The verdict is the agent's judgment; the **decision to keep iterating** is the user's, surfaced either at this seam (interactive COMPARE) or during REVIEW (close-out) (sub-agent COMPARE). Default on user silence: continue the loop until the attempt budget is exhausted, then mandatory user surfacing.
 
 ## Survey signals (entry into COMPARE)
 
 - All outputs in `lc status --universe baseline` are `ok` ⇒ ready to compare
 - `comparison-report.yaml` exists with current `attempt` ⇒ COMPARE done for this attempt
-- `comparison-report.yaml` verdict is `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN (interactive close-out)
+- `comparison-report.yaml` verdict is `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to REVIEW (interactive close-out)
 
 ## Notes
 
diff --git a/claude/lightcone/skills/paper2astra/references/implement.md b/claude/lightcone/skills/paper2astra/references/implement.md
index 31b4d7a4..ca73927a 100644
--- a/claude/lightcone/skills/paper2astra/references/implement.md
+++ b/claude/lightcone/skills/paper2astra/references/implement.md
@@ -1,15 +1,15 @@
 # IMPLEMENT — write scripts and recipes; rigor-dialed self-review
 
-Read `astra.yaml` (the spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, fresh-context sub-agents review the implementation against the paper and the code; SPECIFY-style fixes feed back into IMPLEMENT for the next iteration. The depth of self-review is set by the constitution's frugality / rigor dial.
+Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, a rigor-dialed self-review pass cross-checks the implementation against paper + code — same fresh-context-no-bias shape ARCHITECT and SPECIFY use. Fixes feed back into IMPLEMENT for the next iteration.
 
 The constitution's per-phase mode defaults this to **sub-agent**. Most implementation is mechanical (translate spec → script), but algorithm choices on tricky steps may want ratification. Where parallelization is feasible (multiple independent outputs from different scripts), spawn one sub-agent per output and merge.
 
 ## Inputs
 
-- `astra.yaml` — the structural spec
+- `astra.yaml` — the filled spec (sub-analyses, decisions, prior_insights, findings, narrative — all populated by SPECIFY)
 - `implementation-notes.md` — tricky algorithms, numerical gotchas, data-format quirks
-- `work/notes/methodology.md` — for context when the spec compresses
-- `work/notes/study/` — per-section paper-vs-code agreement-check files (Grep into for the verbatim claim and the canonical code location for any output you're implementing)
+- `work/notes/architect/paper-index.md` — for context when the spec compresses (sub-analysis decomposition, result loci, decision clusters)
+- `work/notes/architect/code-index.md` (when code present) — natural decomposition + entry-points + data dependencies + gotchas (the canonical map of where each sub-analysis's logic lives in `work/reference/code/`)
 - `work/reference/code/` (if present) — **canonical reference. Read on every iteration when implementing.** Where paper and code disagree, code wins for numerics, plotting, and method.
 
 ## Outputs
@@ -23,10 +23,10 @@ The constitution's per-phase mode defaults this to **sub-agent**. Most implement
 
 Read `astra.yaml` and `implementation-notes.md`. For each output, write a script in `scripts/` that produces it, and add a `recipe:` block to the output's entry in `astra.yaml` with `command:` and `inputs:`.
 
-If `work/reference/code/` exists, **read the relevant code on every iteration** — not just to resolve ambiguities but as the canonical source of truth for numerics + method. Write clean scripts following ASTRA conventions (not verbatim copies), but treat the code's behavior as authoritative when it disagrees with the paper. When you encounter a paper-vs-code disagreement that the running spec hasn't resolved:
+If `work/reference/code/` exists, **read the relevant code on every iteration** — not just to resolve ambiguities but as the canonical source of truth for numerics + method. Write clean scripts following ASTRA conventions (not verbatim copies), but treat the code's behavior as authoritative when it disagrees with the paper. When you encounter a paper-vs-code disagreement that SPECIFY's code pass missed:
 
 - **Interactive IMPLEMENT** (rare; usually sub-agent): surface via `AskUserQuestion`.
-- **Sub-agent IMPLEMENT** (default): continue with the code's behavior, append the disagreement to `<paper-slug>/open-questions.md`, and note it in `implementation-notes.md` so the next interactive seam can ratify or override.
+- **Sub-agent IMPLEMENT** (default): continue with the code's behavior, append the disagreement to `<paper-slug>/open-questions.md`, and note it in `implementation-notes.md` so REVIEW (close-out) can ratify or override.
 
 Without this discipline, iterations drift to "looks right" rather than "matches" — the failure mode the first-paper test surfaced.
 
@@ -34,9 +34,9 @@ Without this discipline, iterations drift to "looks right" rather than "matches"
 
 When outputs are produced by independent scripts (no shared expensive computation), spawn one Task-tool sub-agent per output. Each sub-agent gets:
 
-- The output's spec entry from `astra.yaml`
+- The output's spec entry from `astra.yaml` (including its sub-analysis's `decisions:` / `findings:` for context)
 - The relevant section of `implementation-notes.md`
-- The matching `work/notes/study/<NN>-<slug>.md` row(s) for the verbatim paper claim and the canonical code location
+- The matching entry in `work/notes/architect/code-index.md`'s natural-decomposition / entry-points block — that's the pointer back to the canonical code location for the sub-analysis the output lives in
 - The relevant code path(s) under `work/reference/code/`
 
 The orchestrator merges scripts and recipes after the per-output sub-agents finish. Tightly-coupled outputs (e.g. an MCMC producing both a chain and a summary statistic) stay in one sub-agent and one script.
@@ -57,7 +57,7 @@ After the first-pass implementation lands, the constitution's frugality / rigor
 - **Frugal:** one minimal review pass — a single fresh sub-agent reads `scripts/`, `astra.yaml`'s recipes, and the paper, and reports any obvious paper-vs-implementation inconsistencies. Fixes are applied once; no further iteration. If no fixes are needed, IMPLEMENT proceeds to RUN.
 - **Rigor:** N rounds of fresh-context sub-agent review + fix. Each round runs a fresh reviewer that does not see the prior round's findings or fixes. Stop when **two consecutive rounds find no fixes** (strong termination criterion), or after 5 rounds (system cap), whichever comes first.
 
-The discipline is the same as REVIEW's: each round's reviewer is fresh, prompted to check "is the implementation consistent with the paper and the code?", and outputs findings only — not edits. Fixes are applied between rounds by a separate IMPLEMENT-fix sub-agent (or the orchestrator inline for trivial mechanical fixes).
+The discipline has the same shape as the self-review ARCHITECT and SPECIFY use: each round's reviewer is fresh, prompted to check "is the implementation consistent with the paper and the code?", and outputs findings only, not edits. Fixes are applied between rounds by a separate IMPLEMENT-fix sub-agent (or the orchestrator inline for trivial mechanical fixes). Pattern-matching on prior fixes defeats the cross-check; the no-bias rule is load-bearing.
 
 ### Per-round fresh sub-agent — system prompt
 
@@ -66,18 +66,18 @@ The discipline is the same as REVIEW's: each round's reviewer is fresh, prompted
 > ### Inputs
 >
 > - `scripts/` — first-pass implementation
-> - `astra.yaml` — the spec (recipes are part of the implementation; structural fields are SPECIFY's)
+> - `astra.yaml` — the spec (recipes are part of the implementation; structural + content fields are ARCHITECT's and SPECIFY's)
 > - `implementation-notes.md`
-> - `work/notes/methodology.md` — Grep into; do not re-read whole
-> - `work/notes/study/` — per-section paper-vs-code agreement-check files (the verbatim claims and code locations you're checking against)
+> - `work/notes/architect/paper-index.md` — Grep into; do not re-read whole
+> - `work/notes/architect/code-index.md` (when present) — natural decomposition + entry-points + gotchas
 > - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep)
 > - `work/reference/code/` (when present) — canonical reference for numerics + method
 >
 > ### What to check
 >
 > 1. **Recipe coverage.** Every output in `astra.yaml` has a recipe; every recipe runs a script that exists in `scripts/`.
-> 2. **Method fidelity.** For each output, the script implements the method described in the matching `work/notes/study/<NN>-<slug>.md` row. Where paper and code disagreed (material-disagreement rows), the script follows the code's method (canonical-resolution rule), unless the spec explicitly recorded a different override in `decisions:` and `universes/baseline.yaml`.
-> 3. **Numerical correctness.** Constants, hyperparameters, threshold values match the paper (or the code, where the canonical-resolution rule applied). Flag mismatches with `path:line` of the script and the paper §/eq.
+> 2. **Method fidelity.** For each output, the script implements the method described by the relevant sub-analysis's `decisions:` and `findings:` in `astra.yaml` (which carry the verbatim paper quotes and code anchors). Where SPECIFY's code pass surfaced a material disagreement, the script follows the code's method (canonical-resolution rule), unless the spec recorded a different override in `decisions:` and `universes/baseline.yaml`.
+> 3. **Numerical correctness.** Constants, hyperparameters, and threshold values match the paper (or the code, where the canonical-resolution rule applied). Flag mismatches with the `path:line` of the script, the paper §/eq, and the relevant `astra.yaml#analyses.<sub-id>.decisions.<key>` entry.
 > 4. **Data acquisition.** Scripts that fetch data use the real acquisition path from `astra.yaml`'s inputs — no synthetic / mock substitutes.
 > 5. **Determinism.** Scripts set random seeds where the paper's method is stochastic. Library versions in `requirements.txt` are pinned where reproducibility requires it.
 > 6. **Recipe wiring.** Recipe `inputs:` references match the data-flow the scripts actually consume; no orphan dependencies, no missing dependencies.
@@ -85,7 +85,7 @@ The discipline is the same as REVIEW's: each round's reviewer is fresh, prompted
 > ### What NOT to do
 >
 > - **Do not edit any file.** Your output is a findings file; an IMPLEMENT-fix pass responds to the findings.
-> - **Do not re-read the entire paper.** Grep into `work/notes/study/` and `work/reference/source/` (or `document.md`) for the specific claims you want to verify.
+> - **Do not re-read the entire paper.** Grep into `work/notes/architect/` and `work/reference/source/` (or `document.md`) for the specific claims you want to verify; the filled `astra.yaml` is your primary source for what each sub-analysis is supposed to do.
 > - **Do not invent problems.** If the implementation matches paper + code, say so briefly.
 > - **Do not assume a prior reviewer has been here.** You are fresh. First-principles read only.
 >
@@ -104,7 +104,7 @@ The discipline is the same as REVIEW's: each round's reviewer is fresh, prompted
 >   - **What's wrong**: <quote or `script:line` of the implementation problem>
 >   - **Where to fix**: <`scripts/<file>.py:line` or `astra.yaml#path/to/recipe`>
 >   - **Suggested fix**: <one-line concrete change>
->   - **Source**: <paper §X.Y "quote" + `work/notes/study/<id>` row, or code `path:line`>
+>   - **Source**: <paper §X.Y "quote" + `astra.yaml#analyses.<sub-id>.decisions.<key>` evidence, or code `path:line`>
 >
 > ## Verdict
 >
@@ -145,12 +145,12 @@ A retry attempt re-runs the IMPLEMENT-review iterations on the changed scripts b
 - `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ first-pass IMPLEMENT done
 - For frugal: `work/notes/implement-review/round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ IMPLEMENT done
 - For rigor: two consecutive `work/notes/implement-review/round-<N>.md` files both have verdict `clean` ⇒ IMPLEMENT done; proceed to RUN
-- `comparison-report.yaml` returns `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN
+- `comparison-report.yaml` returns `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to REVIEW (close-out)
 
 ## Notes
 
 - **`lc run` is the canonical execution surface.** Scripts assume they will be invoked via the lightcone-cli runner. Do not hard-code working directories or assume environment activation.
 - **Determinism where possible.** Set random seeds, fix library versions, prefer reproducible installations. The IMPLEMENT goal is not just "produces output once" but "reproducibly produces output across runs."
 - **Tight coupling earns shared scripts.** When two outputs come from the same expensive computation (e.g. an MCMC produces both a parameter chain and a summary statistic), one script with multiple output paths is cleaner than two scripts that each re-do the work.
-- **The fresh-context discipline is the same as REVIEW's.** A reviewer that sees the prior round's findings stops finding the next class of inconsistency. Each round must spawn a brand-new sub-agent.
+- **The fresh-context discipline is the same as ARCHITECT's and SPECIFY's self-review.** A reviewer that sees the prior round's findings stops finding the next class of inconsistency. Each round must spawn a brand-new sub-agent.
 - **Minimize churn in fixes.** Targeted edits, not restructures. Big restructures defeat the round-over-round comparison the orchestrator uses to decide termination.
diff --git a/claude/lightcone/skills/paper2astra/references/literature.md b/claude/lightcone/skills/paper2astra/references/literature.md
index 5a0668cd..6d6b9753 100644
--- a/claude/lightcone/skills/paper2astra/references/literature.md
+++ b/claude/lightcone/skills/paper2astra/references/literature.md
@@ -6,8 +6,9 @@ The constitution's per-phase mode is **always sub-agent** for this phase. Spawn
 
 ## Inputs
 
-- `work/notes/cited_papers.yaml` — the list of papers to mine, from STUDY
-- `work/notes/methodology.md` — has the decision map; each per-paper sub-agent gets it as context
+- `work/notes/cited_papers.yaml` — the list of papers to mine, from ARCHITECT (paper-side Explore output, merged by the synthesis sub-agent)
+- `work/notes/architect/paper-index.md` — has the decision clusters per sub-analysis; each per-paper sub-agent gets it as context
+- `astra.yaml` — the stub from ARCHITECT (sub-analyses + outputs declared; `decisions:` empty); per-paper sub-agents read the structure to know what they're searching for evidence about
 - `work/reference/source/` (Path A — arXiv LaTeX) or `work/reference/document.md` (Path B — Docling) — the target paper (for reference)
 
 ## Outputs
@@ -22,11 +23,11 @@ The constitution's per-phase mode is **always sub-agent** for this phase. Spawn
 > ### Instructions
 >
 > 1. Read the PDF at the path provided below using the Read tool.
-> 2. Review the decision map provided below — these are the specific decisions you are looking for evidence about.
-> 3. Scan the cited paper for findings that support, contradict, or compare the options listed in those decisions. Focus on:
->    - Empirical comparisons between approaches listed as decision options
->    - Performance benchmarks or validation results relevant to the choices
->    - Recommendations or caveats about specific methods/parameters
+> 2. Review the **decision clusters** provided below (from `work/notes/architect/paper-index.md`) — these are the *areas* where the target paper makes choices that bear on numerical results. Concrete decision options haven't been authored yet (SPECIFY does that after LITERATURE) — your job is to find evidence about the cluster, and SPECIFY links it to specific options once they exist.
+> 3. Scan the cited paper for findings that support, contradict, or compare approaches within those clusters. Focus on:
+>    - Empirical comparisons between approaches that are candidates within a cluster
+>    - Performance benchmarks or validation results relevant to the choices the cluster represents
+>    - Recommendations or caveats about specific methods / parameters in the cluster's scope
 > 4. For each relevant finding, extract:
 >    - A clear claim (1–2 sentences stating what we learned)
 >    - An exact quote from the paper (verbatim, 1–3 sentences)
@@ -89,8 +90,8 @@ The constitution's per-phase mode is **always sub-agent** for this phase. Spawn
 >     scope: "<when this applies -- optional>"
 >
 > decision_links:
->   <decision_id>:
->     <option_id>:
+>   <decision_cluster_or_id>:
+>     <provisional_option_label>:
 >       - <insight_id>
 > ```
 >
@@ -100,7 +101,8 @@ The constitution's per-phase mode is **always sub-agent** for this phase. Spawn
 > - Quotes must be EXACT — copy verbatim from the PDF, no paraphrasing or whitespace normalization.
 > - Prefix and suffix must be real surrounding page text, not editorial parentheticals.
 > - One claim per insight — do not combine multiple findings.
-> - Only extract insights relevant to the target decisions listed below.
+> - Only extract insights relevant to the target decision clusters listed below.
+> - `decision_links` keys reference clusters by the names ARCHITECT used in `work/notes/architect/paper-index.md`. SPECIFY rewires these to concrete `decision_id:option_id` keys when it authors the actual decisions.
 > - If no relevant insights found, write `insights: {}` and `decision_links: {}`.
 > - prefix and suffix are REQUIRED for every TextQuoteSelector.
 
diff --git a/claude/lightcone/skills/paper2astra/references/review.md b/claude/lightcone/skills/paper2astra/references/review.md
index 50a16226..4f7f8b20 100644
--- a/claude/lightcone/skills/paper2astra/references/review.md
+++ b/claude/lightcone/skills/paper2astra/references/review.md
@@ -1,149 +1,109 @@
-# REVIEW — rigor-dialed fresh-context spec audit
+# REVIEW — interactive close-out
 
-A fresh-context sub-agent reads `astra.yaml` against the paper and the code and asks "is this consistent?" The reviewer never sees what was just implemented or fixed last round — its only job is first-principles cross-reference. SPECIFY incorporates fixes; a *fresh* reviewer re-runs; iterate until two consecutive rounds find nothing or a configured cap is hit.
+The reproduction has converged (verdict `pass` or user-accepted `partial`). Control returns to the user. REVIEW is the second always-interactive bookend (INTERVIEW being the first); it runs in the main loop session, not as a sub-agent, so it can use `AskUserQuestion` and invoke sibling skills that need user reach. Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, draft the final report, and finalize the constitution outcome — in one interactive arc.
 
-REVIEW's depth is set by the constitution's **frugality / rigor** dial (see "Rigor vs frugality" in `../SKILL.md`):
+The phase name **REVIEW** became available when the old pre-implement REVIEW phase was folded into ARCHITECT, SPECIFY, and IMPLEMENT as their rigor-dialed self-review passes. This close-out is what the previous design called SUMMARIZE_RUN.
 
-- **Frugal:** skip REVIEW entirely, or run a single fresh sub-agent pass and incorporate its fixes once.
-- **Rigor:** N rounds — each round runs a fresh reviewer; SPECIFY incorporates fixes; the next round runs *another* fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes (the strong termination criterion the loop already uses), or a system cap of 5 rounds, whichever is sooner.
+The constitution's per-phase mode is **always interactive** for this phase. It does not run as a sub-agent. There is no "silent close-out" path; the close-out is the human's review.
 
-The constitution's per-phase mode defaults this to **sub-agent**; interactive REVIEW is rare (a paper that hits the SPECIFY conflict-surfacing path heavily may want a human in the loop).
+## Inputs
 
-## Why fresh-context sub-agents
+- `astra.yaml` — final spec (validates with `--verify-evidence` if literature.yaml exists)
+- `comparison-report.yaml`, `comparison-report.md` — final verdict
+- `targets/targets.md` — what was being matched against; reference figures / tables in `targets/`
+- `results/<universe>/<output_id>/` — reproduced figures / tables / metrics
+- `<paper-slug>/open-questions.md` — running report from sub-agent phases (paper-vs-code conflicts, ambiguities, anything sub-agents flagged for user resolution)
+- `work/notes/architect/paper-index.md` and `work/notes/architect/code-index.md` — for context
+- The constitution at the project root — its `outcome:` field needs the final write
+- `<paper-slug>/CLAUDE.md` — paper identity, code location
 
-A reviewer that has just helped fix `astra.yaml` will pattern-match on its own fixes rather than re-reading the paper. Catching the *next* class of inconsistency requires a fresh context that doesn't carry the prior round's framing. The sub-agent's prompt must therefore say "check `astra.yaml` is consistent with the paper and the code" — never "here's what was just fixed; check it." The reviewer's only inputs are the paper, the code, and the spec.
+## Outputs
 
-This also bounds the work: each round is one fresh sub-agent over a bounded artifact. Rigor doesn't mean "longer review" — it means "more independent reviewers."
+- `.lightcone/comparison.html` — `/figure-comparison`'s portable side-by-side report (paper artifacts vs reproduced)
+- (Optional) `.lightcone/check-sentence-by-sentence.md` — `/check-sentence-by-sentence`'s claim audit (file:line or NOT FOUND per sentence)
+- `<paper-slug>/open-questions.md` — same file, with a `## Resolutions` section appended that captures what the user said for each entry
+- Edits to `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` if any open-question resolution warrants a spec change
+- `REPRODUCTION-SUMMARY.md` — final report; concise (~1–2 pages)
+- Constitution `outcome:` rewritten to its final form
+- A commit closing out the reproduction
 
-## Inputs
+## Step 1: render the validation surfaces
 
-- `astra.yaml` — the spec from SPECIFY (the artifact under review)
-- `universes/baseline.yaml` — the universe selection
-- `implementation-notes.md` — practical guidance for IMPLEMENT
-- `targets/targets.md` — coverage obligations
-- `work/notes/methodology.md` — consolidated decision map / results inventory / data sources (Grep into for cross-reference; do not re-read whole)
-- `work/notes/study/` — per-section paper-vs-code agreement-check files (Grep into for verbatim claims and code locations)
-- `work/notes/literature.yaml` (if present) — for evidence verification
-- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
-- `work/reference/code/` (if present) — original code, canonical reference for numerics + method
+### `/figure-comparison` (mandatory)
 
-## Outputs
+Invoke the `/figure-comparison` skill from this session. It builds a portable HTML side-by-side comparing paper artifacts (from `targets/`) to reproduced artifacts (from `results/<universe>/`). The skill uses `AskUserQuestion` for any inputs it can't infer from the workdir; that works because REVIEW is interactive — the prompts land in this session.
 
-- In-place edits to `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md` driven by reviewer findings — written by SPECIFY in response, **not** by the reviewer itself
-- `work/notes/review/round-<N>.md` — each round's reviewer findings (one file per round; the orchestrator passes round-N's findings to SPECIFY for fixing, then spawns round-(N+1) as a fresh sub-agent that does not see round-N's findings)
-
-## Step 1: orchestrator decides round count from the constitution's rigor dial
-
-Read the constitution's termination-criterion field:
-
-- `weak` (frugal) → at most one round; if no fixes found, REVIEW is done. If skipping is preferred (the user said "skip review"), the orchestrator records "REVIEW skipped per constitution" in the workdir and proceeds to IMPLEMENT.
-- `strong` (rigor) → iterate. Stop when **two consecutive rounds find no fixes**, or after 5 rounds (system cap), whichever comes first.
-
-## Step 2: per-round fresh sub-agent — system prompt
-
-Spawn one Task-tool sub-agent per round. Each round's sub-agent gets only the inputs above — never the prior round's findings, never a description of what was just fixed.
-
-> You are an ASTRA-spec reviewer. Read `astra.yaml`, the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
->
-> ### Inputs
->
-> - `astra.yaml` — the spec under review
-> - `universes/baseline.yaml`
-> - `implementation-notes.md`
-> - `targets/targets.md`
-> - `work/notes/methodology.md` — consolidated paper-derived decision map (Grep into; do not re-read whole)
-> - `work/notes/study/` — per-section paper-vs-code agreement-check files (Grep into for verbatim claims and code locations)
-> - `work/notes/literature.yaml` (if present)
-> - `work/reference/source/` (arXiv LaTeX; preferred) or `work/reference/document.md` (Docling fallback) — paper text (Grep into; do not re-read whole)
-> - `work/reference/code/` (when present) — canonical reference for numerics + method
->
-> ### What to check
->
-> 1. **Target coverage.** Every entry in `targets/targets.md` must appear in `astra.yaml` as an output, finding, input, decision, or universe default. Any missing target either earns a spec home or an explicit out-of-scope reason in `targets.md`.
-> 2. **Output definitions.** Each output has a clear `type` and sufficient description.
-> 3. **Methodology coverage.** Cross-check `work/notes/methodology.md` against the spec for gaps: missing hyperparameters, underspecified algorithms, vague data-processing steps. Grep targeted sections of the paper to confirm.
-> 4. **Decisions.** Decisions cover what affects reproducibility. Cosmetic / pure-tooling choices should not be decisions; anything material that is missing should be added. `universes/baseline.yaml` must be consistent with the paper's reported choices (or with the code's, when paper-vs-code resolution applied per the canonical-resolution rule).
-> 5. **Data acquisition.** Every input has a concrete acquisition path — a download URL, database query, API call, or package name. Vague references ("available upon request", no source named) are flagged.
-> 6. **Implementation-notes completeness.** Does `implementation-notes.md` flag the tricky parts the IMPLEMENT phase will hit? Cross-check against `work/notes/study/<NN>-<slug>.md` material-disagreement entries — every paper-vs-code material disagreement that landed in the spec should also appear in implementation-notes for IMPLEMENT.
-> 7. **Evidence verification.** If `work/notes/literature.yaml` exists, run `astra validate astra.yaml --verify-evidence`. Flag any misquotes or unsupported claims; these typically arise when a quote was paraphrased or when prefix/suffix carry editorial commentary instead of real surrounding text.
-> 8. **Code-as-canonical applied.** Where paper and code disagree on a material choice (per `work/notes/study/`'s material-disagreement rows), check that `universes/baseline.yaml` selects the code's choice, OR that an interactive seam recorded a different user choice. Flag any material disagreement where the spec silently picked the paper without recording an explicit override.
-> 9. **No synthetic data.** Unless the paper itself uses synthetic data, every input has a real acquisition source — no mock / synthetic substitutes anywhere in the spec, recipes, or implementation-notes.
->
-> ### What NOT to do
->
-> - **Do not edit `astra.yaml`** or any other file. Your output is a findings file; SPECIFY responds to the findings. Editing here defeats the multi-round-fresh-context discipline.
-> - **Do not re-read the entire paper.** Use Grep to look up specific claims you want to verify. Work primarily from `work/notes/methodology.md` and `work/notes/study/`.
-> - **Do not invent problems.** If the spec is consistent with paper + code, say so briefly.
-> - **Do not assume a prior reviewer has been here.** You are fresh. Treat this as a first-principles read.
->
-> ### Output format — `work/notes/review/round-<N>.md`
->
-> ```markdown
-> # Review round <N>
->
-> Reviewer ran fresh against `astra.yaml`, paper, and code.
->
-> ## Findings
->
-> ### <category — e.g. "Target coverage" / "Decisions" / "Data acquisition" / "Evidence">
->
-> - **<one-line finding>**
->   - **What's wrong**: <quote or location of the spec problem>
->   - **Where to fix**: <`astra.yaml#path/to/key` or `implementation-notes.md`>
->   - **Suggested fix**: <one-line concrete change>
->   - **Source**: <paper §X.Y "quote" + `work/notes/study/<id>` row, or code `path:line`>
->
-> ## No-fix sections
->
-> Brief one-liners for sections that look clean (so the orchestrator knows you actually checked).
->
-> ## Verdict
->
-> - **fixes_needed**: <count>
-> - **clean** | **needs-fixes**
-> ```
->
-> Be concise. The orchestrator reads this file to decide whether to spawn another round and what SPECIFY needs to fix.
-
-## Step 3: SPECIFY incorporates findings
-
-After the round's findings file lands, SPECIFY (or the orchestrator playing SPECIFY for trivial mechanical fixes) edits `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md` per the suggested fixes. After any change to `astra.yaml`, run:
-
-```bash
-astra validate astra.yaml
-```
+Output lands at `.lightcone/comparison.html`. Show the user the path and offer to open it (`open` on macOS, `xdg-open` on Linux, or just print the path so they can click it in their terminal).
 
-If literature.yaml is present:
+**Do not spawn `/figure-comparison` under the `Task` tool.** It has `AskUserQuestion` in its `allowed-tools`; a Task-tool sub-agent has no user-reach, so the prompt fires into nothing.
 
-```bash
-astra validate astra.yaml --verify-evidence
-```
+### `/check-sentence-by-sentence` (opt-in)
+
+Ask the user via `AskUserQuestion` whether they want the claim audit. It's optional because for many reproductions the figure-comparison already settles "did it match?"; the sentence-by-sentence audit earns its keep when the paper makes many specific quantitative claims and the user wants each one anchored to a code location.
+
+If yes, invoke `/check-sentence-by-sentence`. Same discipline as `/figure-comparison` — it can prompt the user; do not spawn under `Task`.
+
+Output lands at `.lightcone/check-sentence-by-sentence.md` (or wherever the skill writes it). Show the user the path.
+
+## Step 2: walk `<paper-slug>/open-questions.md` with the user
+
+Read `<paper-slug>/open-questions.md`. For each unresolved entry, surface it via `AskUserQuestion` with:
+
+- **The question** (verbatim from the file)
+- **Origin** — which phase / sub-agent flagged it
+- **The default the loop applied** (if any — e.g. "code as canonical")
+- **Three options**: ratify the default, override (user spells out their choice), or defer (leave as a known limitation in the final report)
 
-The orchestrator records what was fixed in a small commit per round so `git log` shows the chain.
+Append a `## Resolutions` section to `<paper-slug>/open-questions.md` capturing what the user said for each entry. This makes the resolution durable — re-runs and future sessions see it.
 
-## Step 4: termination check
+If a resolution warrants a spec change (the user picks an override), edit `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` accordingly and re-run `astra validate astra.yaml`. If the change would invalidate the comparison report (e.g. flips the canonical method for a primary output), surface that to the user — in most cases the reproduction is "done" and the override is a known limitation, but the user may choose to re-enter the loop.
 
-After SPECIFY incorporates the round's fixes, the orchestrator decides whether to spawn another round:
+## Step 3: write `REPRODUCTION-SUMMARY.md`
+
+A single markdown file at the project root, ~1–2 pages. Sections:
+
+1. **What was reproduced** — the paper, the scope, the targets.
+2. **Verdict** — pass / partial. If partial, what failed and why we accepted it.
+3. **Material decisions** — the paper-vs-code conflicts SPECIFY's code pass surfaced, what the user chose (interactively or by canonical-resolution default), and why.
+4. **Outputs** — pointers to the figures / tables / metrics produced. One bullet per primary target with the path to the reproduced result and a one-line match note from the comparison report.
+5. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut stricter than stated, etc.). This is where the reproduction's value to the broader literature gets recorded.
+6. **Resolved open questions** — pull from `<paper-slug>/open-questions.md`'s `## Resolutions` section. One bullet per question + its resolution.
+7. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the constitution path, the relevant `astra.yaml`).
+
+Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes; the summary is the door into them.
+
+## Step 4: finalize the constitution outcome
+
+Rewrite the constitution's `outcome:` field to its final form. By this point the user has walked the validation surfaces, resolved the open questions, and accepted (or explicitly partially accepted) the reproduction. Write an outcome that teaches:
+
+> Reproduced <paper> against the targets in `targets/targets.md` with verdict `pass` (attempt 4). All 7 primary targets match within stated tolerance; 2 of 5 secondary targets show <5% offset attributable to <reason>. Material conflicts surfaced and resolved: <list>. Open questions resolved: <count> (full chain in `open-questions.md`). Spec at `astra.yaml` (validates with `--verify-evidence`); side-by-side at `.lightcone/comparison.html`; full report at `REPRODUCTION-SUMMARY.md`.
+
+The outcome should stand on its own — someone reading just `felt show <reproduction-fiber>` (or the kanban card) should learn the verdict, the material decisions that landed, and where the artifacts live. No "see the body for details."
+
+## Step 5: commit
+
+Stage `REPRODUCTION-SUMMARY.md`, `<paper-slug>/open-questions.md` (with resolutions), the constitution with the final outcome, the final `astra.yaml`, the comparison artifacts, and any housekeeping changes. Commit with a message that names the verdict and the close-out:
+
+```
+review: <paper-short-name> verdict <verdict>, summary at REPRODUCTION-SUMMARY.md
+```
 
-- `weak` (frugal): one pass is enough. Done.
-- `strong` (rigor):
-  - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done (two consecutive clean rounds = strong termination criterion).
-  - If round N is the first round (N=1), spawn round 2 unconditionally so we can compare.
-  - If round N produced fixes, spawn round (N+1) as a fresh sub-agent that does not see round N's findings or the fixes.
-  - If N hits the system cap of 5 rounds without two consecutive clean rounds, surface to the user: "REVIEW reached round cap with N fixes still landing; continue, accept the current spec, or revise the constitution?" via `AskUserQuestion`. Default on user silence: accept the current spec, log the unfinished tail in `<paper-slug>/open-questions.md`, and proceed.
+After the commit, optionally flip the constitution's status to `closed` (or whatever name the per-paper conventions use) so future surveys recognize the reproduction is done.
 
 ## Survey signals (entry into REVIEW)
 
-- `astra.yaml` exists and `astra validate astra.yaml` returns clean ⇒ ready to review
-- `work/notes/review/round-1.md` exists ⇒ first round done
-- For frugal: `round-1.md` exists with verdict `clean` (or no fixes were incorporated) ⇒ REVIEW done
-- For rigor: two consecutive `round-<N>.md` and `round-<N-1>.md` files both have verdict `clean` ⇒ REVIEW done; proceed to IMPLEMENT
-- `astra validate astra.yaml --verify-evidence` returns clean (when literature.yaml exists) ⇒ evidence side validated
+- `comparison-report.yaml` verdict is `pass` (or user has accepted `partial`) ⇒ ready to close out
+- `.lightcone/comparison.html` exists ⇒ `/figure-comparison` rendered
+- `<paper-slug>/open-questions.md` has a `## Resolutions` section covering every entry ⇒ open-questions walkthrough done
+- `REPRODUCTION-SUMMARY.md` exists ⇒ final report written
+- Constitution `outcome:` reflects the final state ⇒ REVIEW done; reproduction complete
 
 ## Notes
 
-- **REVIEW does not write code.** Its outputs are findings; SPECIFY's edits to the spec / notes implement them.
-- **The fresh-context discipline is load-bearing.** A reviewer that sees the prior round's findings or fixes pattern-matches on them and stops finding the next class of inconsistency. Each round must spawn a brand-new sub-agent with only paper + code + spec as inputs.
-- **Minimize churn in fixes.** SPECIFY's edits should target the specific finding, not restructure surrounding spec. Big restructures defeat the round-over-round comparison the orchestrator uses to decide termination.
-- **A clean REVIEW reduces IMPLEMENT thrash.** It is worth running even when SPECIFY's output looked fine — fresh-context cross-checks catch "looks fine in isolation, breaks under full coverage" gaps.
-- **For frugal runs, REVIEW can be skipped when SPECIFY ran interactively** and the user already ratified material conflicts. The constitution records the skip; iterations honor it.
+- **This phase runs interactively in the main loop session.** Do not spawn it under `Task`. The whole point of REVIEW (close-out) is that the user is reachable — every step uses `AskUserQuestion` (directly, or via the sibling skills it invokes).
+- **`/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`.** That's why REVIEW is the always-interactive close-out and they live here, not in the loop. Spawning either under `Task` from inside the loop fires prompts into nothing.
+- **The user owns the verdict-acceptance decision.** REVIEW's purpose is to let the user see what the loop did and decide whether they accept it. The skill renders surfaces and asks; it does not unilaterally close.
+- **Don't confuse with the rigor-dialed self-reviews.** ARCHITECT, SPECIFY, and IMPLEMENT each run their own internal fresh-context self-review passes during the loop. Those are unrelated to this close-out — same word, different jobs. The phase boundary makes them unambiguous: rigor-dial reviews live inside their host phase's reference; this one is the always-interactive close-out.
+- **Open-question resolutions are durable.** Append to `<paper-slug>/open-questions.md`'s `## Resolutions` section so the next re-run / future session sees what was decided. Do not delete the original questions.
+- **Keep the report short.** Long reports get skimmed; short reports get read. Two pages is generous.
+- **Do not invent further work.** If the constitution's evidence checks all pass, the reproduction is done. The next session, the human, or a future revisit can decide whether the reproduction's place still serves them.
diff --git a/claude/lightcone/skills/paper2astra/references/specify.md b/claude/lightcone/skills/paper2astra/references/specify.md
index 2960eab8..ebe6b25c 100644
--- a/claude/lightcone/skills/paper2astra/references/specify.md
+++ b/claude/lightcone/skills/paper2astra/references/specify.md
@@ -1,127 +1,210 @@
-# SPECIFY — author the ASTRA spec (and formalize the targets)
+# SPECIFY — fill the stub `astra.yaml`, two passes per sub-analysis
 
-Read the paper and accumulated notes; produce the structured ASTRA spec, the baseline universe, the implementation notes, and the small target ledger COMPARE consumes. SPECIFY is the **first user-ratification seam** — material paper-vs-code conflicts surface here, target-formalization happens here, and the default mode is interactive so the user can ratify.
+Read the stub `astra.yaml` from ARCHITECT and fill in `decisions:`, `prior_insights:`, `findings:` per sub-analysis, weaving the existing narrative with `astra-anchor:` references as entries land. SPECIFY is the **first user-ratification seam** — material paper-vs-code conflicts surface here; the default mode is interactive so the user can ratify.
 
-The constitution's per-phase mode defaults to **interactive** for this phase, but the user can flip it. When SPECIFY runs as a sub-agent, it falls back to the canonical-resolution rule (code wins where paper and code disagree) and surfaces unresolved conflicts to `<paper-slug>/open-questions.md`.
+This phase replaces the old SPECIFY's monolithic shape. The new structure runs **two passes per sub-analysis** (paper, then code, when code exists), then a rigor-dialed self-review pass. The two passes are the cross-check: the paper pass authors what the paper says; the code pass surfaces where the code says something different; the difference is gold (it's where the reproduction has to make a decision).
+
+The constitution's per-phase mode defaults to **interactive** for this phase, but the user can flip it. When SPECIFY runs as a sub-agent, it falls back to the canonical-resolution rule (code wins where paper and code disagree on a material choice) and surfaces unresolved conflicts to `<paper-slug>/open-questions.md`.
+
+Per-sub-analysis work is parallelizable when sub-analyses are independent. Each sub-analysis's two passes (paper, then code) run sequentially within that sub-analysis; across sub-analyses the work fans out.
 
 ## Inputs
 
-- `work/notes/methodology.md` — consolidated decision map, results inventory, data sources (from STUDY)
-- `work/notes/study/<NN>-<slug>.md` — per-section paper-vs-code agreement-check files (the source of truth for evidence quotes and code locations; methodology.md points back to these)
+- `astra.yaml` — the stub from ARCHITECT (sub-analyses, inputs, outputs, narrative; empty `decisions:` / `prior_insights:` / `findings:` blocks)
+- `work/notes/architect/paper-index.md` — paper-side decision clusters, result loci, citations
+- `work/notes/architect/code-index.md` (when code present) — module map, natural decomposition, entry-points, gotchas
 - `work/notes/literature.yaml` (if present) — prior insights with evidence quotes and decision links (from LITERATURE)
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
-- `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (Path B only; Path A keeps figures inside the source tarball)
+- `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (Path B only)
 - `work/reference/code/` (if present) — original code, canonical reference for numerics + method
-- The per-paper constitution — names the user's intended replication targets (figures, tables, numbers) in its **Desired State**; SPECIFY formalizes them
+- The per-paper constitution — its **Desired State** + the per-phase mode + the rigor / frugality dial
 - `work/notes/notes.md` — user-supplied context (read by every phase if present)
 
 ## Outputs
 
-1. **`astra.yaml`** — the full ASTRA specification, with every replication target placed into its appropriate ASTRA home (see "Target formalization" below)
-2. **`universes/baseline.yaml`** — exactly the paper's choices (where paper and code disagree, see "Material conflicts" below)
-3. **`implementation-notes.md`** — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
-4. **`targets/targets.md`** — small target ledger COMPARE consumes; for each target a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure/table/metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
+- `astra.yaml` — **filled form**: each sub-analysis's `decisions:`, `prior_insights:`, `findings:` populated with `evidence:` selectors; `narrative:` keys updated to weave `astra-anchor:` references into prose as entries land; validates with `astra validate astra.yaml --verify-evidence` when literature.yaml is present
+- `universes/baseline.yaml` — selects the paper's choices (where paper and code disagree, the canonical-resolution rule decides; see "Pass B — code pass" below)
+- `implementation-notes.md` — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
+- `targets/targets.md` — small target ledger COMPARE consumes: per output (already declared by ARCHITECT), a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure / table / metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
+- `work/notes/specify-review/<sub-analysis>-round-<N>.md` — each rigor-dialed review round's findings (rigor only; one file per round per sub-analysis)
 
 ## Substrate skills to invoke
 
-- **`/narrative`** — narrative authoring (any of the five `narrative.{summary,inputs,methods,findings,outputs}` keys, plus decision `rationale:` fields) is owned by the narrative skill. Invoke it when authoring the prose. The narrative skill teaches reserved entity names, the tree-path anchor grammar, the conditional-narrative requirement (which keys are required when), the five-key authoring order, paper-reproduction fidelity discipline, and the new downstream-consumer discipline (lightcone-cli#108). Do not duplicate that content.
+- **`/narrative`** — narrative authoring (any of the five `narrative.{summary,inputs,methods,findings,outputs}` keys, plus decision `rationale:` fields) is owned by the narrative skill. Invoke it during the **paper pass** when authoring or extending narrative prose. The narrative skill teaches reserved entity names, the tree-path anchor grammar, the conditional-narrative requirement (which keys are required when), the five-key authoring order, paper-reproduction fidelity discipline, and the new downstream-consumer discipline (lightcone-cli#108). Do not duplicate that content.
 
-Your responsibility in this phase is the **structure**: build a spec whose entities are narrative-ready (human-readable labels, no ID collisions with reserved names, sub-analysis IDs as noun phrases) so `/narrative` can author cleanly downstream.
+Your responsibility in this phase is the **content**: build out the `decisions:` / `prior_insights:` / `findings:` for each sub-analysis with verbatim paper quotes anchored to the paper as evidence, and weave `astra-anchor:` references back into the narrative as entries land. ARCHITECT already settled the structure.
 
-## Decisions
+## The two-pass-per-sub-analysis structure
 
-The notes identify many candidate decisions. Include every choice where a different defensible option could plausibly shift a numerical result — algorithmic methods, thresholds, statistical approaches, data selection criteria, calibration choices.
+For each sub-analysis (parallelizable across independent sub-analyses):
 
-Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for the full definition of what counts. **Only exclude pure tooling choices** (language, library, file format) and fixed constraints. Use `when`, `incompatible_with`, and `requires` constraints for non-independent decisions. A typical analysis has 8–20 decisions; if you have fewer than 5, revisit `methodology.md` and reconsider what you excluded.
+### Pass A — paper pass
 
-## Prior insights from literature
+Read the paper's section(s) covering this sub-analysis. Author:
+
+1. **`decisions:`** — every choice in this sub-analysis where a different defensible option could plausibly shift a numerical result: algorithmic methods, thresholds, statistical approaches, data selection criteria, calibration choices. Use `when`, `incompatible_with`, and `requires` constraints for non-independent decisions.
+
+   For each decision, the paper pass authors:
+   - The chosen option with its name + a `rationale:` block (use `/narrative` for the prose).
+   - Sibling alternatives mentioned in the paper, each as a separate option.
+   - `evidence:` for the chosen option using `TextQuoteSelector` against the paper text — verbatim quote + `prefix` / `suffix` from real surrounding text + page or section anchor.
 
-If `work/notes/literature.yaml` exists, incorporate its `prior_insights` into `astra.yaml`. Use the `decision_links` mapping to attach each insight to the relevant decision options, so the multiverse captures evidence-backed alternative choices from the literature.
+   Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for the full definition of what counts. **Only exclude pure tooling choices** (language, library, file format) and fixed constraints. A typical sub-analysis has 2–6 decisions; if a sub-analysis has fewer than 2, revisit `work/notes/architect/paper-index.md` and reconsider.
 
-## Target formalization
+2. **`prior_insights:`** — incorporate insights from `work/notes/literature.yaml` (when present) that bear on this sub-analysis's decisions. Use the `decision_links` mapping to attach each insight to the relevant decision options, so the multiverse captures evidence-backed alternative choices from the literature.
+
+3. **`findings:`** — paper-level claims and quantitative results scoped to this sub-analysis, each with source-anchored `evidence:` (verbatim quote against the paper). Pull the verbatim claims for each output's expected value from the paper text + the result loci in `paper-index.md`.
 
-There is no separate target-extraction phase — the targets the user named in INTERVIEW (recorded in the constitution's **Desired State**) get formalized into `astra.yaml`'s structure here. The work has two layers:
+4. **Weave `astra-anchor:` references into the existing narrative.** ARCHITECT wrote `narrative:` prose without anchors because the entries didn't exist. Now they do — extend the narrative to point at the new `decisions:` / `prior_insights:` / `findings:` entries via the tree-path anchor grammar. Use `/narrative` for this pass; it carries the discipline.
 
-**Layer 1 — place each target into its ASTRA home.** Targets are coverage obligations, not necessarily outputs. Map each target to the right ASTRA home:
+5. **Verify evidence quotes against the paper source by Grep** — `astra validate --verify-evidence` currently verifies `prior_insights` evidence; artifact-anchored `findings` evidence still needs a manual quote check before the code pass.
 
-- **Figures, tables, equations-as-artifacts, generated data products** → `outputs`
-- **Paper-level claims and quantitative results** → `findings` with source-anchored evidence
-- **Constants and configuration values** → `inputs`, `decisions`, `universes/baseline.yaml`
+### Pass B — code pass (when `work/reference/code/` exists)
+
+Read the code that implements this sub-analysis (`work/notes/architect/code-index.md`'s natural-decomposition rows point at the relevant modules / scripts). Augment / amend:
 
-The methodology.md "Results inventory" already split primary vs secondary; use that split to set priorities. For each result in the inventory, find the corresponding figure / table / in-text metric (Path A: `\label{}` in the source; Path B: `metadata.json` index) and place it. Read the per-section files in `work/notes/study/` for the verbatim claim quotes — those become the `findings` evidence.
+1. **Code-as-canonical material disagreements.** For each decision authored in the paper pass, locate its implementation in the code. Where paper and code disagree:
+   - **Material** = a different choice would plausibly change a numeric result the paper reports.
+   - **Stylistic / cosmetic / pure-tooling** = not material; record in `implementation-notes.md` and move on.
 
-**Layer 2 — write `targets/targets.md` as a small ledger for COMPARE.** Only an index, not a derivation of the spec; the depth lives in `astra.yaml`. For each target, a brief entry:
+   For **material** disagreements, behavior depends on whether SPECIFY is interactive:
+   - **Interactive SPECIFY** (default): pause and surface via `AskUserQuestion`. Present the paper's stated method (with quote + section), the code's actual method (with `path:line`), the plausible impact ("changes the BAO peak amplitude by ~5%"), and three options: paper, code, *something else* (custom, with the user's choice spelled out). **Default on user silence is code when `work/reference/code/` exists, otherwise paper.**
+   - **Sub-agent SPECIFY** (rare; the constitution lists this only when the user explicitly chose it): take **code as canonical** per the canonical-resolution rule, append the conflict to `<paper-slug>/open-questions.md` so the user sees it at the next session boundary, and let `universes/baseline.yaml` select the code's method. The user can flip the baseline at REVIEW (close-out).
 
-- What it is (one line); the reference file's path (relative to `targets/` when the file is copied into `targets/`, or pointing at `work/reference/figures/...` when not)
-- Type: `metric` | `figure` | `table`
-- Priority: `primary` | `secondary` (from the methodology.md split)
-- Expected value / trend (paper-side); how to judge a match (numerical tolerance for metrics; shape / axis ranges / key features for figures; specific values for tables)
-- Spec home: which `outputs:` entry in `astra.yaml` this target maps to, so COMPARE can find the reproduced result at `results/<universe>/<output_id>/`
+   Either way, the override is recorded in `astra.yaml` as a `decisions:` entry with both options preserved, plus `universes/baseline.yaml` selecting whichever option won. A `findings:` entry (or an insight if the conflict matters for replication discipline broadly) records the conflict with quote + line evidence.
 
-Copy reference figure / table files from `work/reference/` into `targets/` so COMPARE has a self-contained reference set. For Path A, files are in `work/reference/source/` (extract by `\includegraphics{}` filename); for Path B, in `work/reference/figures/` / `work/reference/tables/`.
-
-Out-of-scope targets stay in `targets/targets.md` with an explicit reason and should not be forced into the spec. Keep the target ledger's "spec home" pointers specific enough that a later reviewer can tell which claim was discharged where.
+2. **Code-revealed insights and findings.** Things the code does that the paper doesn't describe (a calibration version, a cut stricter than stated, a hyperparameter the paper compressed). These earn `findings:` entries with `path:line` evidence anchors against the code (when an output corresponds), or `implementation-notes.md` bullets (when no formal output corresponds).
 
----
+3. **Decision-option augmentation.** Where the code reveals an option the paper didn't mention but is defensible (a sibling implementation alternative used in the codebase or referenced in a comment), add it as a sibling option to the relevant `decisions:` entry. Do not pre-emptively author every code variant; only the ones that bear on a real choice.
 
-## Material conflicts — the user-ratification seam
+4. **Surface paper-vs-code material disagreements** to `<paper-slug>/open-questions.md` (sub-agent) or via `AskUserQuestion` (interactive) per the canonical-resolution rule above. The verbatim paper quote, the `path:line` code anchor, and the plausible-impact one-liner should all make it into the open-questions entry so the user sees enough to decide at REVIEW (close-out).
 
-When `methodology.md` or `code-analysis.md` mentions a paper-vs-code disagreement, **classify it before writing**:
+### Pass C — rigor-dialed self-review
 
-- **Material**: a different choice would plausibly change a numeric result the paper reports.
-- **Stylistic / cosmetic / pure-tooling**: not material — record in `implementation-notes.md` and move on.
+After the paper + code passes land for a sub-analysis, a fresh-context sub-agent cross-checks: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
 
-For **material** conflicts, behaviour depends on whether SPECIFY is running interactively:
+Self-review depth follows the constitution's frugality / rigor dial — same shape as ARCHITECT's review pass and IMPLEMENT's:
 
-- **Interactive SPECIFY** (default): pause and surface via `AskUserQuestion`. Present:
-  - The paper's stated method (with quote / section reference)
-  - The code's actual method (with file / line reference)
-  - The plausible impact ("changes the BAO peak amplitude by ~5%")
-  - Three options: paper, code, *something else* (custom, with the user's choice spelled out)
+- **Frugal:** skip self-review, or run a single fresh sub-agent pass and incorporate its fixes once.
+- **Rigor:** N rounds — each round runs a fresh reviewer; fixes are incorporated; the next round runs another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes (the strong-termination criterion the loop already uses), or a 5-round system cap. Each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check.
 
-- **Sub-agent SPECIFY** (rare; the constitution lists this only when the user explicitly chose it): take **code as canonical** per the canonical-resolution rule, append the conflict to `<paper-slug>/open-questions.md` so the user sees it at the next session boundary, and let `universes/baseline.yaml` select the code's method. The user can flip the baseline at the next interactive seam.
+#### Per-round fresh sub-agent — system prompt
 
-**Default on user silence in interactive SPECIFY is code when `work/reference/code/` exists, otherwise paper.** This is the canonical-resolution rule: where paper and code disagree, code wins for numerics + method. (Older versions of this skill defaulted to paper; the new default reflects what the first-paper test surfaced — the code is what produced the published numbers.)
+> You are a SPECIFY reviewer for one sub-analysis. Read the relevant slice of `astra.yaml`, the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
+>
+> ### Inputs
+>
+> - `astra.yaml` — focus on `analyses.<sub-analysis-id>` (`decisions:`, `prior_insights:`, `findings:`, `narrative:`, `inputs:`, `outputs:`)
+> - `universes/baseline.yaml`
+> - `implementation-notes.md`
+> - `work/notes/architect/paper-index.md` — the decision clusters and result loci that scoped the work
+> - `work/notes/architect/code-index.md` (when code present)
+> - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
+> - `work/reference/code/` (when present) — canonical reference for numerics + method
+> - `work/notes/literature.yaml` (if present) — for evidence verification
+>
+> ### What to check
+>
+> 1. **Decision coverage.** Does this sub-analysis's `decisions:` block cover every choice in the paper-side index's decision clusters? Cosmetic / pure-tooling choices should NOT be decisions; anything material that's missing should be added.
+> 2. **Decision options.** Each decision has the chosen option plus any sibling alternatives the paper discusses or the code reveals. The chosen option's `rationale:` is grounded in the paper's stated reasoning (or the code's, where canonical-resolution applied).
+> 3. **Evidence verification.** Every `evidence:` block uses `TextQuoteSelector` with a verbatim `exact:` quote, real surrounding-text `prefix:` / `suffix:`, and a real page or section anchor. Quotes that are paraphrased or whose prefix / suffix are editorial parentheticals will fail `--verify-evidence`. Run `astra validate astra.yaml --verify-evidence` when literature.yaml is present.
+> 4. **Findings traceability.** Each `findings:` entry's `evidence:` resolves to a real paper claim (verbatim quote + source anchor) or a real code location (`path:line`).
+> 5. **Material-disagreement surfacing.** Where paper and code disagree on a material choice, the spec records both options under the relevant `decisions:` entry. `universes/baseline.yaml` selects the code's option (canonical-resolution default), unless an interactive seam recorded a different user choice. Flag any material disagreement that got silently dropped or where the spec picked the paper without an explicit user override.
+> 6. **Narrative anchors.** The sub-analysis's `narrative:` weaves `astra-anchor:` references to the new `decisions:` / `prior_insights:` / `findings:` entries — the tree-path grammar must be valid, and entries actually exist at the referenced paths.
+> 7. **`narrative:` voice fidelity.** Hedges and qualifiers from the paper survive (per the narrative skill's discipline). Editorial commentary added beyond what the paper supports gets flagged.
+> 8. **No synthetic data.** Unless the paper itself uses synthetic data, every input has a real acquisition source — no mock / synthetic substitutes anywhere in the sub-analysis's inputs, decisions, or implementation-notes.
+>
+> ### What NOT to do
+>
+> - **Do not edit `astra.yaml`** or any other file. Your output is a findings file; a SPECIFY-fix pass responds to the findings. Editing here defeats the multi-round-fresh-context discipline.
+> - **Do not flag missing `recipes:`.** Recipes are IMPLEMENT's, not SPECIFY's.
+> - **Do not re-read the entire paper.** Use Grep on `work/reference/source/` (or `document.md`) for the specific claims you want to verify; lean on `work/notes/architect/paper-index.md`.
+> - **Do not invent problems.** If the sub-analysis is consistent with paper + code, say so briefly.
+> - **Do not assume a prior reviewer has been here.** You are fresh. First-principles read only.
+>
+> ### Output format — `work/notes/specify-review/<sub-analysis>-round-<N>.md`
+>
+> ```markdown
+> # Specify-review round <N> — <sub-analysis-id>
+>
+> Reviewer ran fresh against astra.yaml's <sub-analysis-id> slice, paper, and code.
+>
+> ## Findings
+>
+> ### <category — e.g. "Decision coverage" / "Evidence" / "Material disagreement">
+>
+> - **<one-line finding>**
+>   - **What's wrong**: <quote or location of the spec problem>
+>   - **Where to fix**: <`astra.yaml#analyses.<sub-id>.path/to/key` or `implementation-notes.md`>
+>   - **Suggested fix**: <one-line concrete change>
+>   - **Source**: <paper §X.Y "quote" + index row, or code `path:line`>
+>
+> ## Verdict
+>
+> - **fixes_needed**: <count>
+> - **clean** | **needs-fixes**
+> ```
+
+#### SPECIFY-fix pass between rounds
+
+After each round's findings file lands, a SPECIFY-fix pass (or the orchestrator inline for trivial mechanical fixes) edits `astra.yaml` for the sub-analysis, plus `universes/baseline.yaml` and `implementation-notes.md` per the suggested fixes. After any change to `astra.yaml`:
+
+```bash
+astra validate astra.yaml
+astra validate astra.yaml --verify-evidence  # when literature.yaml exists
+```
+
+#### Termination
+
+- `weak` (frugal): one pass per sub-analysis. Done after fixes (or immediately, if `fixes_needed` was 0).
+- `strong` (rigor):
+  - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
+  - If round N is the first round (N=1), spawn round 2 unconditionally so we can compare.
+  - If round N produced fixes, spawn round (N+1) as a fresh sub-agent that does not see round N's findings or the fixes.
+  - If N hits the system cap of 5 rounds without two consecutive clean rounds, surface to the user via `AskUserQuestion`: "SPECIFY review for <sub-analysis-id> reached the round cap with fixes still landing; continue, accept the current spec, or revise the constitution?" Default on user silence: accept the current sub-analysis spec, log the unfinished tail in `<paper-slug>/open-questions.md`, and proceed.
+
+When all sub-analyses' reviews terminate, SPECIFY produces the final outputs:
+
+## Target-ledger output
+
+After every sub-analysis is filled and self-reviewed, write `targets/targets.md` as a small ledger COMPARE consumes. Only an index, not a derivation of the spec; the depth lives in `astra.yaml`. For each `outputs:` entry across all sub-analyses (already declared by ARCHITECT), a brief entry:
 
-Either way, the override is preserved in `astra.yaml` as:
+- What it is (one line); the reference file's path (relative to `targets/` when the file is copied into `targets/`, or pointing at `work/reference/figures/...` when not)
+- Type: `metric` | `figure` | `table`
+- Priority: `primary` | `secondary` (from ARCHITECT's tagging)
+- Expected value / trend (paper-side); how to judge a match (numerical tolerance for metrics; shape / axis ranges / key features for figures; specific values for tables)
+- Spec home: which `analyses.<sub-id>.outputs.<output-id>` entry in `astra.yaml` this target maps to, so COMPARE can find the reproduced result at `results/<universe>/<output_id>/`
 
-- A `decisions:` entry with both options preserved
-- The `universes/baseline.yaml` selecting whichever option won (chosen by the user, or canonical default on silence)
-- A finding (or an insight if the conflict matters for replication discipline broadly) that records the conflict with quote / line evidence
+Copy reference figure / table files from `work/reference/` into `targets/` so COMPARE has a self-contained reference set. For Path A, files are in `work/reference/source/` (extract by `\includegraphics{}` filename); for Path B, in `work/reference/figures/` / `work/reference/tables/`.
 
-This makes the override surface in any later review of the spec — *"the paper says X, the code does Y, the user chose Z, here's why."* The fidelity-of-prose side of this (voice seams, hedge preservation, evidence-quote verification) is the `/narrative` skill's job.
+Out-of-scope targets stay in `targets/targets.md` with an explicit reason and should not be forced into the spec.
 
 ---
 
-## Sub-analysis structure
-
-Split into sub-analyses **only if the paper has genuinely independent analysis stages**. Examples:
-
-- A reconstruction stage that produces a catalog consumed by a clustering stage which produces inputs to a BAO fit — three sub-analyses.
-- A monolithic analysis that runs end-to-end with no clean intermediate handoff — one analysis.
-
-Sub-analysis IDs should be **noun phrases** (not verb phrases): `reconstruction`, `clustering`, `bao_fit`. Avoid reserved names (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
-
-When sub-analyses exist, the root narrative MUST include a top-down end-to-end data-flow paragraph (per the narrative skill's data-flow rules — closes lightcone-cli#108).
-
 ## Other rules
 
 - **Do NOT add executable implementation code or invented run commands.** Do add concise provenance / recipe descriptions where ASTRA fields support them, especially for paper-derived calculations, figure generation, imported constants, and values that IMPLEMENT will need to regenerate.
 - **Equation and section numbers must match the rendered paper / PDF**, not a naïve count of TeX blocks or markdown headings. When citing "eq. N" or "§N", find the equation or heading by content in the rendered paper and use the printed number.
-- **When adding finding evidence**, verify the quoted text against the paper source by Grep or PDF search. `astra validate --verify-evidence` currently verifies `prior_insights` evidence; artifact-anchored `findings` evidence still needs a manual quote check.
-- **Validate** with `astra validate astra.yaml` and fix until it passes.
-- **Work primarily from `work/notes/`** — STUDY has already distilled the paper section by section. Use `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) only to look up specific details (Grep for terms, or read targeted sections with offset/limit). Do not re-read the whole paper.
+- **Validate** with `astra validate astra.yaml` after each pass.
+- **Work primarily from `work/notes/architect/`** — the index files distilled the relevant scope per sub-analysis. Use `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) only to look up specific details (Grep for terms, or read targeted sections with offset/limit). Do not re-read the whole paper.
+- **The narrative skill is the prose author, not the structure author.** SPECIFY weaves anchors into the prose ARCHITECT wrote — the structural surface is fixed, the anchored references are SPECIFY's contribution.
 
 ## Survey signals (entry into SPECIFY)
 
-- `work/notes/methodology.md` exists ⇒ ready to specify
-- `astra.yaml` exists; `astra validate astra.yaml` returns clean ⇒ structural SPECIFY done
-- `targets/targets.md` exists with each entry mapped to a spec home ⇒ target-formalization done
+- `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative; empty decisions / prior_insights / findings) ⇒ ready to specify
+- For each sub-analysis: `decisions:` / `findings:` populated AND, if literature.yaml exists, `prior_insights:` populated ⇒ paper pass done
+- For each sub-analysis: when `work/reference/code/` exists, code-pass material-disagreement entries land in `decisions:` (with both options) and `universes/baseline.yaml` selects the canonical-resolution choice; `implementation-notes.md` carries non-material gotchas ⇒ code pass done
+- For frugal: each sub-analysis has at least a `work/notes/specify-review/<sub>-round-1.md` with verdict `clean`, or with verdict `needs-fixes` and its fixes incorporated ⇒ SPECIFY review done
+- For rigor: each sub-analysis has two consecutive `<sub>-round-<N>.md` files with verdict `clean` ⇒ SPECIFY review done
+- `astra validate astra.yaml --verify-evidence` returns clean (when literature.yaml exists) ⇒ evidence side validated
+- `targets/targets.md` exists with each entry mapped to a spec home ⇒ target-ledger done
 - `implementation-notes.md` exists ⇒ practical-guidance side done
-- All four ⇒ SPECIFY complete; proceed to REVIEW
+- All of the above ⇒ SPECIFY complete; proceed to IMPLEMENT
 
 ## Notes
 
-- **Material conflicts that the user explicitly defers** are appended to `<paper-slug>/open-questions.md` (the running report read at session boundaries). The next iteration sees them and either re-surfaces them or notes their continued deferral; the user resolves at SUMMARIZE_RUN.
-- **The narrative skill is the prose author, not the structure author.** SPECIFY's job is structural correctness; `/narrative` invocation comes after the structural skeleton exists.
-- **The target ledger is a derivation, not a separate phase's output.** Treat `targets/targets.md` as a small index produced alongside `astra.yaml`, not a heavyweight artifact. The depth lives in `astra.yaml`'s `outputs:` / `findings:` / `decisions:` and in the per-section study files.
+- **Material conflicts that the user explicitly defers** are appended to `<paper-slug>/open-questions.md` (the running report read at session boundaries). The next iteration sees them and either re-surfaces them or notes their continued deferral; the user resolves at REVIEW (close-out).
+- **The narrative skill is the prose author, not the structure author.** SPECIFY's job is content correctness; `/narrative` invocation comes during the paper pass when authoring or extending the narrative prose to weave in anchor references.
+- **The target ledger is a derivation, not a separate phase's output.** Treat `targets/targets.md` as a small index produced alongside the filled `astra.yaml`, not a heavyweight artifact. The depth lives in `astra.yaml`'s `outputs:` / `findings:` / `decisions:`.
+- **Two-pass discipline is the cross-check.** Skipping the code pass (when code exists) loses the canonical-resolution surface and lets paper-vs-code material disagreements slip through. The fresh-context self-review can recover *some* of these but not all — the disciplined sequence (paper → code → self-review) catches more.
+- **Per-sub-analysis parallelism is opt-in.** When sub-analyses are independent (no shared decision blocks, no cross-sub-analysis findings), spawn one Task-tool sub-agent per sub-analysis to run its passes in parallel. When they share material decisions or findings (rare), serialize.
diff --git a/claude/lightcone/skills/paper2astra/references/study.md b/claude/lightcone/skills/paper2astra/references/study.md
deleted file mode 100644
index c1c30ccc..00000000
--- a/claude/lightcone/skills/paper2astra/references/study.md
+++ /dev/null
@@ -1,229 +0,0 @@
-# STUDY — section-parallel paper-vs-code agreement check
-
-Read the parsed paper and the reference code together — by section, with sub-agents fanning out across the paper's structure — and produce a cross-referenced agreement check that the rest of the pipeline consumes. STUDY is paper2astra's load-bearing read phase: its value isn't "summarize the paper" but **measure the level of agreement between paper and code at the section level**.
-
-This phase replaces the old SUMMARIZE. The old shape parallelized one sub-agent on the paper and another on the code; that loses the cross-reference, since "the whole paper" and "the whole code" are too much context for one agent to compare meaningfully. The new shape parallelizes by **paper section + matching code**, so each sub-agent carries enough context to surface disagreements at its own level.
-
-The constitution's per-phase mode is **always sub-agent (parallel by paper-section)** for this phase. Spawn one Task-tool sub-agent per paper section. After they finish, spawn a single synthesis sub-agent.
-
-## Inputs
-
-- `work/reference/source/` (Path A — arXiv LaTeX) **or** `work/reference/document.md` + `work/reference/figures/` + `work/reference/tables/` + `work/reference/metadata.json` (Path B — Docling)
-- `work/reference/code/` — the reference code repo (when present)
-- `work/notes/notes.md` — user-supplied prior notes, if any (read by every phase if present)
-
-## Outputs
-
-- `work/notes/study/<NN>-<section-slug>.md` — one file per paper section, with the cross-referenced agreement check
-- `work/notes/methodology.md` — consolidated decision map, results inventory, data sources (derived from the per-section files; what SPECIFY consumes)
-- `work/notes/cited_papers.yaml` — citations worth following up on for prior insights (what LITERATURE consumes)
-
-## Step 1: Identify paper sections and their matching code
-
-Before fanning out, the orchestrating call (the loop iteration that enters STUDY, before spawning sub-agents) does a quick survey:
-
-1. **List the paper's sections.** Path A: `grep -E '^\\section\{' work/reference/source/*.tex`. Path B: read the `##` headings in `work/reference/document.md`. Skip front-matter (abstract, acknowledgments, author list) and back-matter (references, supplementary). Keep methods, results, and any analysis-bearing intro/discussion.
-2. **Locate matching code per section.** Two routes:
-   - **Code's own structure**: most analysis pipelines mirror the paper's flow (a `reconstruction/` module → reconstruction section, a `bao_fit.py` → BAO-fit section). Walk the code repo's top-level layout and infer the mapping.
-   - **Paper-side bibliography**: when the paper cites a specific module, function, or commit (e.g. "the fitting code at `https://github.com/.../bao_fit.py:42`"), record that.
-3. **Build a section→code map** as a small YAML file at `work/notes/study/section-map.yaml`:
-   ```yaml
-   sections:
-     - id: 01-data
-       paper_section: "Data and Sample Selection"
-       paper_anchor: "section:data"        # \label{} or markdown anchor
-       code_paths:
-         - work/reference/code/data/
-         - work/reference/code/scripts/load_catalog.py
-     - id: 02-methods
-       paper_section: "BAO Fitting Methodology"
-       paper_anchor: "section:methods"
-       code_paths:
-         - work/reference/code/bao_fit/
-       notes: "Paper §3 cites the fitting code in footnote 7."
-   ```
-
-   When a section has no obviously matching code (e.g. "Discussion"), record `code_paths: []` and let the section sub-agent flag claims that imply implementation but have no code anchor — those are signal.
-
-This step matters because it sets the unit of work. A bad map (one sub-agent gets all the code, another gets none) loses the parallelism's value.
-
-## Step 2: Fan out — one sub-agent per section
-
-Spawn one Task-tool sub-agent per entry in `section-map.yaml`. Each sub-agent gets:
-
-- The paper-section reference (the `.tex` file path + `\section{}` anchor for Path A; the `document.md` file + heading anchor for Path B)
-- The list of code paths from the section map
-- The decision-map context structure (so claims and code locations align with what SPECIFY will need)
-
-### Per-section sub-agent — system prompt
-
-> You are a paper-vs-code agreement-check agent for one section of a research paper. Your job is to read the paper section *together with* its matching code and produce a cross-referenced agreement assessment.
->
-> ### Inputs
->
-> - Paper section: `<path-to-tex-or-md>` anchored at `<section-anchor>`. Read this section in full — it is bounded; do not stray into other sections.
-> - Code paths: `<list of paths>`. Read each in full; for directories, read the entry-points and follow imports as needed. **Do NOT modify any code.**
->
-> ### What to extract
->
-> For each material claim or choice in the paper section, locate its implementation in the code (or note its absence) and record an agreement assessment.
->
-> A "claim" is anything where a different choice would plausibly change a numerical result the paper reports — methods, parameters, data cuts, calibrations, statistical approaches, hyperparameters, software versions.
->
-> A "code location" is a `file:line` reference (or `file:line-line` range) to where the code implements (or fails to implement) that claim.
->
-> An "agreement level" is one of:
->
-> - `matches`: paper says X, code does X. Cite the line(s); brief one-line note.
-> - `minor-deviation`: paper and code differ in a way that does not change the numerical result (e.g. variable named differently, equivalent algorithm, refactored computation). Cite both, name the equivalence.
-> - `material-disagreement`: paper and code differ in a way that plausibly changes a numerical result. Cite both verbatim. **Surface this prominently** — these are SPECIFY's seams.
-> - `paper-only`: paper claims something the code does not implement. May indicate a methodological description not yet realized in the available code.
-> - `code-only`: code does something the paper does not describe. Often a critical detail the paper compressed; flag it.
->
-> ### Output format — `work/notes/study/<id>-<slug>.md`
->
-> ```markdown
-> # Study: <Section title>
->
-> Paper anchor: `<section-anchor>` in `<paper-source-path>`.
-> Code paths: <list>.
->
-> ## Agreement table
->
-> | Claim | Paper | Code | Agreement | Notes |
-> |---|---|---|---|---|
-> | <one-line claim> | §X.Y "<short quote>" | `path:line` | matches \| minor-deviation \| material-disagreement \| paper-only \| code-only | <one-line gloss> |
->
-> ## Material disagreements
->
-> For every `material-disagreement` row, expand here:
->
-> ### <Claim>
->
-> - **Paper says** (quote): "..." (page N, eq. M)
-> - **Code does** (quote): `path:line-line`:
->   ```python
->   <code excerpt>
->   ```
-> - **Why it matters** (one-line plausible-impact): <e.g. "changes the BAO peak amplitude by ~5%">
-> - **Default per canonical-resolution rule**: <code | paper> — applied if SPECIFY runs sub-agent.
->
-> ## Decisions surfaced
->
-> Bullet list of choices in this section that should become first-class decisions in `astra.yaml`. Group by "what" + "why" + "alternatives" (mirroring the methodology.md decision-map shape).
->
-> ## Cited papers worth following up
->
-> List citations from this section that informed a decision (not general background). DOI when resolvable + one-line on why.
->
-> ## Data sources (this section)
->
-> Any external dataset, catalog, or archive this section's analysis consumes. For each: name + version, exact acquisition path (URL / query / package name), selection criteria.
->
-> ## Open questions
->
-> Anything ambiguous, missing, or contradictory that this section couldn't resolve from paper + code alone. Append to `<paper-slug>/open-questions.md` from outside the sub-agent (the orchestrator does this; sub-agents append silently to this section).
-> ```
->
-> ### Style
->
-> Be concise but precise. Use bullets and tables. Quote the paper verbatim and cite `path:line` for the code. Do NOT pad with background.
->
-> ### Rules
->
-> - **Stay in your section.** Cross-references to other sections are notes, not extractions. If a section's claim depends on a definition from another section, note the dependency and continue.
-> - **Quote, don't paraphrase**, when surfacing a paper-vs-code disagreement. SPECIFY needs the verbatim claim to author evidence-quote-backed findings.
-> - **Code-as-canonical when both exist.** Where paper and code disagree, the code wins for numerics + method (the canonical-resolution rule). Record both, mark the agreement level as `material-disagreement`, surface the disagreement.
-> - **Never block on `AskUserQuestion`.** You're a sub-agent; the user is not in this conversation. Append to the section's `## Open questions` block instead.
-
-## Step 3: Synthesize — single sub-agent merges into methodology.md and cited_papers.yaml
-
-Spawn one synthesis sub-agent that reads all `work/notes/study/<id>-<slug>.md` files and writes:
-
-- `work/notes/methodology.md` — consolidated decision map (every "Decisions surfaced" entry merged across sections), results inventory (split into primary / secondary), data sources (every "Data sources" entry merged).
-- `work/notes/cited_papers.yaml` — every "Cited papers worth following up" entry merged and de-duplicated.
-
-### Synthesis sub-agent — system prompt
-
-> You are a research-paper synthesis agent. Read every per-section file in `work/notes/study/` and merge them into a single `work/notes/methodology.md` and `work/notes/cited_papers.yaml`.
->
-> ### Task
->
-> 1. Read every `work/notes/study/<id>-<slug>.md` file (skip `section-map.yaml`).
-> 2. Build `work/notes/methodology.md` with three sections:
->    - **Decision map**: every "Decisions surfaced" entry across all sections, grouped by pipeline stage. For each decision: what was chosen, why (cite the section + paper-citation), alternatives mentioned, and any *material-disagreement* with the code (cite the section's `Material disagreements` block).
->    - **Results inventory**: every primary and secondary result the paper reports, grouped primary/secondary, with which decisions feed into each.
->    - **Data sources**: every external dataset across sections, with name + version, acquisition path, selection criteria, format. **This section is critical** — IMPLEMENT will use it to write data download scripts. If acquisition is vague, flag it.
-> 3. Build `work/notes/cited_papers.yaml` from the de-duplicated cited-papers entries:
->
->    ```yaml
->    papers:
->      - doi: "10.xxxx/yyyy"
->        citation: "Smith et al. (2020)"
->        relevance: "One-line description of why this paper matters for replication"
->    ```
->
-> ### Style
->
-> Cross-reference back to the per-section files (`see work/notes/study/03-bao-fit.md`) for the verbatim quotes and code locations. methodology.md is the consolidated view; the per-section files are the source of truth for evidence.
->
-> ### Output skeleton — `work/notes/methodology.md`
->
-> ```markdown
-> # Methodology — consolidated study
->
-> ## Decision map
->
-> ### <Pipeline stage>
->
-> - **<Decision name>**
->   - **What**: <chosen value/method>
->   - **Why**: <citation, e.g. "Smith+2020">. Section: `work/notes/study/<NN>-<slug>.md`
->   - **Alternatives**: <list>
->   - **Code agreement**: matches | minor-deviation | material-disagreement (see `work/notes/study/<NN>-<slug>.md#material-disagreements`)
->
-> ## Results inventory
->
-> ### Primary
-> - <result> — feeds from <decisions>; expected: <values>; section: `work/notes/study/<NN>-<slug>.md`
->
-> ### Secondary
-> - <result> — feeds from <decisions>; expected: <values>; section: `work/notes/study/<NN>-<slug>.md`
->
-> ## Data sources
->
-> - **<Dataset name + version>**
->   - Obtain: <URL / query / package>
->   - Selection: <cuts>
->   - Format: <columns/fields>
->   - Used in: <list of sections>
-> ```
->
-> ### Rules
->
-> - Preserve paper citations exactly as they appear in the source per-section files.
-> - Do NOT introduce decisions that aren't in any per-section file's "Decisions surfaced" block.
-> - When two sections name the same decision, merge — do not duplicate.
-
-## Step 4: Append open questions to the running report
-
-After the per-section sub-agents finish (and before the synthesis runs), the orchestrator scans each `work/notes/study/<id>-<slug>.md` for `## Open questions` entries and appends them to `<paper-slug>/open-questions.md` with the section as origin. The user resolves these in SUMMARIZE_RUN.
-
-## Survey signals (entry into STUDY)
-
-- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) exists ⇒ ready to study
-- `work/notes/study/section-map.yaml` exists ⇒ section identification done
-- Every section in `section-map.yaml` has a corresponding `work/notes/study/<id>-<slug>.md` ⇒ per-section pass done
-- `work/notes/methodology.md` and `work/notes/cited_papers.yaml` exist ⇒ STUDY done; proceed to LITERATURE
-
-## Notes
-
-- **Run the section sub-agents in parallel.** They're fully independent (each reads its own paper section + code paths). The synthesis sub-agent runs once, after all per-section files exist.
-- **The agreement check is the value.** A study that reads only the paper or only the code is a regression to the old SUMMARIZE — do not allow a section sub-agent to skip the code (or vice versa) unless that section genuinely has no matching code (and that absence itself is information, recorded as `paper-only` rows).
-- **methodology.md is the door, not the source of truth.** SPECIFY drills back into the per-section files via the `see work/notes/study/...` pointers when authoring evidence-quote-backed findings. Do not bloat methodology.md with verbatim quotes; keep it as the consolidated view and let the per-section files carry the evidence.
-- **Section granularity earns separate insights.** When a section's analysis builds on a method defined in another section, file the agreement check for the *defining* section there and note the dependency in the using section. Do not collapse all the borrowed pieces into the application section's row.
-- **Resume is automatic.** If a per-section file already exists, the orchestrator skips its sub-agent. The synthesis re-runs whenever the set of per-section files changes.
-
-## Output format — open question
-
-The constitution flags whether `astra.yaml`'s `prior_insights` shape can absorb STUDY's per-section output directly. The current answer is **no**: `prior_insights` is for *cited* papers' findings supporting the *target* paper's decisions; STUDY's output is the *target paper's own claims* checked against *its own code*. The natural ASTRA homes for STUDY's output are downstream, in SPECIFY: paper-claim quotes become `findings` evidence in `astra.yaml`; code locations become decision-option metadata or implementation-notes. The per-section files stay as the source of truth; methodology.md is the consolidated derivation. Revisit if the spec gains a structure for "paper-vs-code agreement-check evidence" as a first-class entity.
diff --git a/claude/lightcone/skills/paper2astra/references/summarize_run.md b/claude/lightcone/skills/paper2astra/references/summarize_run.md
deleted file mode 100644
index 4332a8e1..00000000
--- a/claude/lightcone/skills/paper2astra/references/summarize_run.md
+++ /dev/null
@@ -1,106 +0,0 @@
-# SUMMARIZE_RUN — interactive close-out
-
-The reproduction has converged (verdict `pass` or user-accepted `partial`). Control returns to the user. SUMMARIZE_RUN is the second always-interactive bookend (INTERVIEW being the first); it runs in the main loop session, not as a sub-agent, so it can use `AskUserQuestion` and invoke sibling skills that need user reach. Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, draft the final report, and finalize the constitution outcome — in one interactive arc.
-
-The constitution's per-phase mode is **always interactive** for this phase. It does not run as a sub-agent. There is no "silent close-out" path; the close-out is the human's review.
-
-## Inputs
-
-- `astra.yaml` — final spec (validates with `--verify-evidence` if literature.yaml exists)
-- `comparison-report.yaml`, `comparison-report.md` — final verdict
-- `targets/targets.md` — what was being matched against; reference figures / tables in `targets/`
-- `results/<universe>/<output_id>/` — reproduced figures / tables / metrics
-- `<paper-slug>/open-questions.md` — running report from sub-agent phases (paper-vs-code conflicts, ambiguities, anything sub-agents flagged for user resolution)
-- `work/notes/methodology.md` — for context
-- The constitution at the project root — its `outcome:` field needs the final write
-- `<paper-slug>/CLAUDE.md` — paper identity, code location
-
-## Outputs
-
-- `.lightcone/comparison.html` — `/figure-comparison`'s portable side-by-side report (paper artifacts vs reproduced)
-- (Optional) `.lightcone/check-sentence-by-sentence.md` — `/check-sentence-by-sentence`'s claim audit (file:line or NOT FOUND per sentence)
-- `<paper-slug>/open-questions.md` — same file, but with `## Resolutions` section appended capturing what the user said for each entry
-- Edits to `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` if any open-question resolution warrants a spec change
-- `REPRODUCTION-SUMMARY.md` — final report; concise (~1–2 pages)
-- Constitution `outcome:` rewritten to its final form
-- A commit closing out the reproduction
-
-## Step 1: render the validation surfaces
-
-### `/figure-comparison` (mandatory)
-
-Invoke the `/figure-comparison` skill from this session. It builds a portable HTML side-by-side comparing paper artifacts (from `targets/`) to reproduced artifacts (from `results/<universe>/`). The skill uses `AskUserQuestion` for any inputs it can't infer from the workdir; that works because SUMMARIZE_RUN is interactive — the prompts land in this session.
-
-Output lands at `.lightcone/comparison.html`. Show the user the path and offer to open it (`open` on macOS, `xdg-open` on Linux, or just print the path so they click in their terminal).
-
-**Do not spawn `/figure-comparison` under the `Task` tool.** It has `AskUserQuestion` in its `allowed-tools`; a Task-tool sub-agent has no user-reach, so the prompt fires into nothing.
-
-### `/check-sentence-by-sentence` (opt-in)
-
-Ask the user via `AskUserQuestion` whether they want the claim audit. It's optional because for many reproductions the figure-comparison already settles "did it match?"; the sentence-by-sentence audit earns its keep when the paper makes many specific quantitative claims and the user wants each one anchored to a code location.
-
-If yes, invoke `/check-sentence-by-sentence`. Same discipline as `/figure-comparison` — it can prompt the user; do not spawn under `Task`.
-
-Output lands at `.lightcone/check-sentence-by-sentence.md` (or wherever the skill writes it). Show the user the path.
-
-## Step 2: walk `<paper-slug>/open-questions.md` with the user
-
-Read `<paper-slug>/open-questions.md`. For each unresolved entry, surface it via `AskUserQuestion` with:
-
-- **The question** (verbatim from the file)
-- **Origin** — which phase / sub-agent flagged it
-- **The default the loop applied** (if any — e.g. "code as canonical")
-- **Three options**: ratify the default, override (user spells out their choice), or defer (leave as a known limitation in the final report)
-
-Append a `## Resolutions` section to `<paper-slug>/open-questions.md` capturing what the user said for each entry. This makes the resolution durable — re-runs and future sessions see it.
-
-If a resolution warrants a spec change (the user picks an override), edit `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` accordingly and re-run `astra validate astra.yaml`. If the change would invalidate the comparison report (e.g. flips the canonical method for a primary output), surface that to the user — in most cases the reproduction is "done" and the override is a known limitation, but the user may choose to re-enter the loop.
-
-## Step 3: write `REPRODUCTION-SUMMARY.md`
-
-A single markdown file at the project root, ~1–2 pages. Sections:
-
-1. **What was reproduced** — the paper, the scope, the targets.
-2. **Verdict** — pass / partial. If partial, what failed and why we accepted it.
-3. **Material decisions** — the paper-vs-code conflicts SPECIFY surfaced, what the user chose (interactively or by canonical-resolution default), and why.
-4. **Outputs** — pointers to the figures / tables / metrics produced. One bullet per primary target with the path to the reproduced result and a one-line match note from the comparison report.
-5. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut stricter than stated, etc.). This is where the reproduction's value to the broader literature gets recorded.
-6. **Resolved open questions** — pull from `<paper-slug>/open-questions.md`'s `## Resolutions` section. One bullet per question + its resolution.
-7. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the constitution path, the relevant `astra.yaml`).
-
-Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes; the summary is the door into them.
-
-## Step 4: finalize the constitution outcome
-
-Rewrite the constitution's `outcome:` field to its final form. Now the user has walked the validation surfaces, ratified the open questions, and accepted (or explicitly partially-accepted) the reproduction. Write the outcome that teaches:
-
-> Reproduced <paper> against the targets in `targets/targets.md` with verdict `pass` (attempt 4). All 7 primary targets match within stated tolerance; 2 of 5 secondary targets show <5% offset attributable to <reason>. Material conflicts surfaced and resolved: <list>. Open questions resolved: <count> (full chain in `open-questions.md`). Spec at `astra.yaml` (validates with `--verify-evidence`); side-by-side at `.lightcone/comparison.html`; full report at `REPRODUCTION-SUMMARY.md`.
-
-The outcome should stand on its own — someone reading just `felt show <reproduction-fiber>` (or the kanban card) should learn the verdict, the material decisions that landed, and where the artifacts live. No "see the body for details."
-
-## Step 5: commit
-
-Stage `REPRODUCTION-SUMMARY.md`, `<paper-slug>/open-questions.md` (with resolutions), the constitution with the final outcome, the final `astra.yaml`, the comparison artifacts, and any housekeeping changes. Commit with a message that names the verdict and the close-out:
-
-```
-summarize_run: <paper-short-name> verdict <verdict>, summary at REPRODUCTION-SUMMARY.md
-```
-
-After the commit, optionally flip the constitution's status to `closed` (or whatever the per-paper conventions name) so future surveys recognize the reproduction is done.
-
-## Survey signals (entry into SUMMARIZE_RUN)
-
-- `comparison-report.yaml` verdict is `pass` (or user has accepted `partial`) ⇒ ready to close out
-- `.lightcone/comparison.html` exists ⇒ `/figure-comparison` rendered
-- `<paper-slug>/open-questions.md` has a `## Resolutions` section covering every entry ⇒ open-questions walkthrough done
-- `REPRODUCTION-SUMMARY.md` exists ⇒ final report written
-- Constitution `outcome:` reflects the final state ⇒ SUMMARIZE_RUN done; reproduction complete
-
-## Notes
-
-- **This phase runs interactively in the main loop session.** Do not spawn it under `Task`. The whole point of SUMMARIZE_RUN is that the user is reachable — every step uses `AskUserQuestion` (directly, or via the sibling skills it invokes).
-- **`/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`.** That's why SUMMARIZE_RUN is the always-interactive close-out and they live here, not in the loop. Spawning either under `Task` from inside the loop fires prompts into nothing.
-- **The user owns the verdict-acceptance decision.** SUMMARIZE_RUN's purpose is to let the user see what the loop did and decide whether they accept it. The skill renders surfaces and asks; it does not unilaterally close.
-- **Open-question resolutions are durable.** Append to `<paper-slug>/open-questions.md`'s `## Resolutions` section so the next re-run / future session sees what was decided. Do not delete the original questions.
-- **Keep the report short.** Long reports get skimmed; short reports get read. Two pages is generous.
-- **Do not invent further work.** If the constitution's evidence checks all pass, the reproduction is done. The next session, the human, or a future revisit can decide whether the reproduction's place still serves them.

From 7fb82fbd0d6cbc8e5198589de818d132916c1cea Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 6 May 2026 03:36:04 +0200
Subject: [PATCH 015/124] =?UTF-8?q?paper2astra:=20ARCHITECT-first=20phase?=
 =?UTF-8?q?=20shape=20=E2=80=94=20SKILL.md=20+=20interview=20+=20README?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- paper2astra/SKILL.md: per-phase mode table is 9 rows (INTERVIEW, ACQUIRE,
  ARCHITECT, LITERATURE, SPECIFY, IMPLEMENT, RUN, COMPARE, REVIEW); ARCHITECT
  replaces STUDY's row with the two-parallel-Explore + synthesis +
  rigor-dialed-self-review shape; REVIEW row is the close-out (replacing
  SUMMARIZE_RUN); pre-implement REVIEW row removed (folded into
  ARCHITECT/SPECIFY/IMPLEMENT). Description, phase reference table, resume
  signals, discipline bullets, and surface-pointers all updated. The "rigor
  vs frugality" subsection retitled to thread through all three
  artifact-producing phases.
- paper2astra/references/interview.md: scope-question wording mentions
  ARCHITECT instead of STUDY/SPECIFY for structure; rigor-dial discussion
  threads through three phases; per-phase mode table (constitution template)
  shows the 9-phase shape; evidence-checks list updated to ARCHITECT-stub
  + filled-spec signals; CLAUDE.md template's open-questions pointer points
  at REVIEW (close-out).
- skills/README.md: paper2astra row mentions 9 phases and the
  ARCHITECT/SPECIFY/IMPLEMENT rigor-dial; figure-comparison and
  check-sentence-by-sentence rows reference REVIEW (close-out).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/README.md             |   6 +-
 claude/lightcone/skills/paper2astra/SKILL.md  | 102 +++++++++---------
 .../paper2astra/references/interview.md       |  49 ++++-----
 3 files changed, 81 insertions(+), 76 deletions(-)

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 79e58851..2e473038 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -18,13 +18,13 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 
 | Skill | Role |
 |---|---|
-| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and per-paper `CLAUDE.md`, then launches one of three runtime modes (interactive, bash-loop, tmux-orchestrated) against the constitution. The constitution carries 10 phases, bookended by two always-interactive seams (INTERVIEW at start, SUMMARIZE_RUN at close-out); every other phase is configurable per the user's per-phase mode choice, with REVIEW and IMPLEMENT additionally tuned by a frugality / rigor dial. |
+| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and per-paper `CLAUDE.md`, then launches one of three runtime modes (interactive, bash-loop, tmux-orchestrated) against the constitution. The constitution carries 9 phases, bookended by two always-interactive seams (INTERVIEW at start, REVIEW at close-out); every other phase is configurable per the user's per-phase mode choice, with ARCHITECT, SPECIFY, and IMPLEMENT additionally tuned by a frugality / rigor dial that drives each phase's internal fresh-context self-review. |
 | [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by paper2astra during SPECIFY. |
 | [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. |
 | [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Used by paper2astra's bash-loop and tmux-orchestrated runtime modes. |
 | [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. Primary acquisition path for paper2astra's ACQUIRE phase. |
-| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from paper2astra's SUMMARIZE_RUN close-out (opt-in); also user-invokable directly. |
-| [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from paper2astra's SUMMARIZE_RUN close-out (mandatory); also user-invokable directly. |
+| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from paper2astra's REVIEW close-out (opt-in); also user-invokable directly. |
+| [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from paper2astra's REVIEW close-out (mandatory); also user-invokable directly. |
 
 The full reproduction story spans these seven skills. paper2astra's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about paper2astra.
 
diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/paper2astra/SKILL.md
index 890b4439..ef767087 100644
--- a/claude/lightcone/skills/paper2astra/SKILL.md
+++ b/claude/lightcone/skills/paper2astra/SKILL.md
@@ -4,12 +4,13 @@ description: >
   Reproduce a published scientific paper in ASTRA. Interview the user
   about the paper and the intended scope, draft a per-paper reproduction
   constitution, then launch a ralph loop that drives the multi-session
-  reproduction work. Composes sibling skills for each phase: managing-
-  bibliography for ACQUIRE and narrative for SPECIFY. COMPARE follows the
-  original Paper2ASTRA target-ledger structure directly rather than requiring
-  sibling comparison skills. Use when the user wants to reproduce a paper,
-  has a DOI or arXiv ID and wants to start a reproduction project, or asks
-  to "reproduce <paper>", "set up reproduction", "paper2astra",
+  reproduction work. The loop has 9 phases bookended by two always-interactive
+  seams (INTERVIEW at start, REVIEW at close-out); ARCHITECT writes a stub
+  astra.yaml decomposition, and SPECIFY fills it in with two passes per
+  sub-analysis. Composes sibling skills for each phase: managing-bibliography for
+  ACQUIRE and narrative for SPECIFY. Use when the user wants to reproduce
+  a paper, has a DOI or arXiv ID and wants to start a reproduction project,
+  or asks to "reproduce <paper>", "set up reproduction", "paper2astra",
   "/paper2astra <doi>", or hands you a published paper as a starting point
   for ASTRA work.
 ---
@@ -42,21 +43,23 @@ paper2astra composes the rest of the lightcone-cli paper-reproduction bundle. Al
 
 paper2astra does not re-implement what these skills already do — it tells the agent at each phase to invoke them. The siblings stand alone; they don't know about paper2astra.
 
-Two further siblings are invoked from **SUMMARIZE_RUN**, the always-interactive close-out phase that runs after the COMPARE → IMPLEMENT loop terminates: [`/figure-comparison`](../figure-comparison/SKILL.md) builds a portable side-by-side HTML report (paper artifacts vs reproduced), and [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) (optional) audits paper claims against code locations. Both have `AskUserQuestion` in their `allowed-tools`, so SUMMARIZE_RUN runs interactively in the main loop session — spawning them under the `Task` tool would fire prompts into nothing.
+Two further siblings are invoked from **REVIEW** (the close-out), the always-interactive phase that runs after the COMPARE → IMPLEMENT loop terminates: [`/figure-comparison`](../figure-comparison/SKILL.md) builds a portable side-by-side HTML report (paper artifacts vs reproduced), and [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) (optional) audits paper claims against code locations. Both have `AskUserQuestion` in their `allowed-tools`, so REVIEW runs interactively in the main loop session — spawning them under the `Task` tool would fire prompts into nothing.
+
+The phase name **REVIEW** now denotes the close-out (replacing what was briefly called SUMMARIZE_RUN). The rigor-dialed self-review pass that previously lived in a pre-implement REVIEW phase is folded into ARCHITECT, SPECIFY, and IMPLEMENT as their internal cross-check. Same word, different jobs: the close-out is named by phase boundary, while the self-reviews are named by their host phase.
 
 ## Workflow
 
 ### Interview (interactive — once per project)
 
-The interview is the first of two always-interactive bookends — INTERVIEW at the start, SUMMARIZE_RUN at the end. Every phase between them is configurable per the user's per-phase mode choice. Read [`references/interview.md`](references/interview.md) in full before starting.
+The interview is the first of two always-interactive bookends — INTERVIEW at the start, REVIEW at the close-out. Every phase between them is configurable per the user's per-phase mode choice. Read [`references/interview.md`](references/interview.md) in full before starting.
 
 The interview has six jobs:
 
 1. **Identify the paper** — DOI / arXiv ID / title; whether code is available; whether the user has prior experience with this paper.
-2. **Scope the reproduction** — full reproduction vs targeted (e.g. only the BAO fit), which figures/tables/numbers are the targets. The user's named targets get formalized into `astra.yaml`'s `outputs:`, `findings:`, `inputs:`, and `decisions:` structure during SPECIFY — there is no separate target-extraction phase.
+2. **Scope the reproduction** — full reproduction vs targeted (e.g. only the BAO fit), which figures/tables/numbers are the targets. The user's named targets get declared as `outputs:` in the stub `astra.yaml` during ARCHITECT and filled with evidence-backed `findings:` / `decisions:` during SPECIFY — there is no separate target-extraction phase.
 3. **Pick a runtime mode** — interactive / bash-loop / tmux-orchestrated. See "Runtime modes" below.
-4. **Pick a termination criterion** — frugality (weak) vs rigor (strong). The dial threads through REVIEW and IMPLEMENT, scaling iteration depth. See "Frugality vs rigor" below.
-5. **Choose interactive vs sub-agent per phase** — see "Per-phase mode" below. Only INTERVIEW and SUMMARIZE_RUN are mandatory-interactive; every other phase is the user's call.
+4. **Pick a termination criterion** — frugality (weak) vs rigor (strong). The dial threads through ARCHITECT, SPECIFY, and IMPLEMENT, scaling each phase's internal self-review depth. See "Frugality vs rigor" below.
+5. **Choose interactive vs sub-agent per phase** — see "Per-phase mode" below. Only INTERVIEW and REVIEW (close-out) are mandatory-interactive; every other phase is the user's call.
 6. **Draft the per-paper constitution and CLAUDE.md** — invoke `/constitution` to draft the constitution. Author the per-paper `CLAUDE.md` from the same conversation. The two files have separate jobs and don't overlap:
 
    - **`CLAUDE.md`** is *info and rules* — paper identity (DOI / arXiv ID / title / authors), where the original code lives (`work/reference/code/`), the code-as-canonical rule, the never-block-on-`AskUserQuestion`-mid-sub-agent rule, any paper-specific conventions or warnings, pointers to the constitution and `open-questions.md`. Auto-loaded by Claude Code on every walk-up to this directory. **Evolves over time** — iterations that learn new conventions or surface paper-specific gotchas can add lines so future sessions don't re-derive the same context.
@@ -94,22 +97,21 @@ Inside each ralph iteration, the agent reads the per-paper constitution, surveys
 | # | Phase | Reference | Outputs |
 |---|---|---|---|
 | 1 | ACQUIRE | [`references/acquire.md`](references/acquire.md) | `work/reference/{source/ \| document.md, paper.pdf, figures/, tables/, metadata.json, code/, code-status.yaml}` |
-| 2 | STUDY | [`references/study.md`](references/study.md) | `work/notes/study/<NN>-<section>.md` (one per paper section, paper-vs-code agreement-check) + `work/notes/methodology.md` + `work/notes/cited_papers.yaml` |
+| 2 | ARCHITECT | [`references/architect.md`](references/architect.md) | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative — no anchors yet); `work/notes/architect/{paper-index.md, code-index.md}`; `work/notes/cited_papers.yaml`; rigor-dialed self-review |
 | 3 | LITERATURE | [`references/literature.md`](references/literature.md) | `work/notes/literature.yaml` + per-paper YAMLs |
-| 4 | SPECIFY | [`references/specify.md`](references/specify.md) | `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md`, `targets/targets.md` |
-| 5 | REVIEW | [`references/review.md`](references/review.md) | (in-place edits to spec + notes; rigor-dialed iterations) |
-| 6 | IMPLEMENT | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml`; rigor-dialed paper-vs-implementation review iterations |
-| 7 | RUN | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
-| 8 | COMPARE | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
-| 9 | SUMMARIZE_RUN | [`references/summarize_run.md`](references/summarize_run.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, (optional) sentence audit, resolved `open-questions.md`, finalized constitution outcome |
+| 4 | SPECIFY | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (decisions, prior_insights, findings, anchored narrative); `universes/baseline.yaml`; `implementation-notes.md`; `targets/targets.md`; per-sub-analysis rigor-dialed self-review |
+| 5 | IMPLEMENT | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml`; rigor-dialed paper-vs-implementation review iterations |
+| 6 | RUN | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
+| 7 | COMPARE | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
+| 8 | REVIEW (close-out) | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, (optional) sentence audit, resolved `open-questions.md`, finalized constitution outcome |
 
-The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or attempts are exhausted. The constitution carries the attempt budget; the ralph iterations consult it. On pass (or user-accepted partial), control returns to the user and SUMMARIZE_RUN runs interactively in the main session — drafting the report, invoking `/figure-comparison`, optionally `/check-sentence-by-sentence`, walking accumulated questions, and finalizing the constitution outcome.
+The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or attempts are exhausted. The constitution carries the attempt budget; the ralph iterations consult it. On pass (or user-accepted partial), control returns to the user and REVIEW runs interactively in the main session — drafting the report, invoking `/figure-comparison`, optionally `/check-sentence-by-sentence`, walking accumulated questions, and finalizing the constitution outcome.
 
-ACQUIRE folds in what was previously a separate PARSE phase: arxiv-LaTeX papers come pre-structured in their tarball (no Docling needed), and PDF-fallback papers run Docling inside ACQUIRE itself to produce `document.md` + extracted figures/tables. SPECIFY folds in target-formalization (what was a separate EXTRACT_TARGETS phase): the targets the user named in INTERVIEW become explicit `outputs:`, `findings:`, `inputs:`, and `decisions:` in `astra.yaml`, plus a small `targets/targets.md` ledger as a derivation for COMPARE.
+ACQUIRE folds in what was previously a separate PARSE phase: arxiv-LaTeX papers come pre-structured in their tarball (no Docling needed), and PDF-fallback papers run Docling inside ACQUIRE itself to produce `document.md` + extracted figures/tables. ARCHITECT replaces the old STUDY: instead of writing per-section paper-vs-code agreement-check files in markdown that SPECIFY would re-author into YAML, ARCHITECT writes the structural skeleton of `astra.yaml` directly (sub-analyses, inputs, outputs, narrative prose). SPECIFY then fills it in with `decisions:` / `prior_insights:` / `findings:` and `astra-anchor:` references. This is the same content the old STUDY produced, but authored once in YAML rather than twice (markdown, then YAML). The pre-implement REVIEW phase is folded into ARCHITECT, SPECIFY, and IMPLEMENT as a rigor-dialed self-review discipline at every artifact-producing seam, which frees the REVIEW *name* for the close-out (replacing SUMMARIZE_RUN, a name that described only one piece of what the close-out actually does).
 
 ### Per-phase mode (interactive vs sub-agent)
 
-A reproduction's most consequential decisions show up at known seams. Only the bookends are mandatory-interactive — INTERVIEW at the start, SUMMARIZE_RUN at the end. Every phase between them is configurable: the interview decides which run interactively (in the main loop session, the user reachable via `AskUserQuestion`) and which delegate to a sub-agent (Task tool with fresh context, no user reach).
+A reproduction's most consequential decisions show up at known seams. Only the bookends are mandatory-interactive — INTERVIEW at the start, REVIEW (close-out) at the end. Every phase between them is configurable: the interview decides which run interactively (in the main loop session, the user reachable via `AskUserQuestion`) and which delegate to a sub-agent (Task tool with fresh context, no user reach).
 
 Defaults the constitution starts with:
 
@@ -117,25 +119,26 @@ Defaults the constitution starts with:
 |---|---|---|---|
 | 0 | INTERVIEW | **interactive — *always*** | The first bookend. Scope, runtime, rigor, per-phase mode all decided here. |
 | 1 | ACQUIRE | user choice | Mostly mechanical (LaTeX-tarball download / Docling fallback / code clone); surfacing happens only on download failures. |
-| 2 | STUDY | sub-agent (parallel by paper-section) | One sub-agent per paper section, each reading the section *together with* its matching code. The value is the section-level paper-vs-code agreement check; parallel fresh context fits naturally. |
+| 2 | ARCHITECT | sub-agent (two parallel Explore + synthesis; rigor-dialed self-review) | Two Task-tool sub-agents fan out (one paper-side, one code-side) and produce indexes; a synthesis sub-agent writes the stub `astra.yaml`. A rigor-dialed fresh-context self-review pass then cross-checks the stub before SPECIFY runs. |
 | 3 | LITERATURE | sub-agent | One sub-agent per cited paper — pure parallel grunt-work. Core, not opt-in: verifiability against citations is what `prior_insights` evidence depends on. |
-| 4 | SPECIFY | user choice (default interactive) | Material paper-vs-code conflicts and target-formalization happen here; the user usually wants to ratify. |
-| 5 | REVIEW | sub-agent (rigor-dialed) | Fresh-context sub-agent reads `astra.yaml` against paper + code and asks "is this consistent?" — frugal: skip or one pass; rigor: N rounds, each with a fresh reviewer + SPECIFY incorporating fixes. |
-| 6 | IMPLEMENT | sub-agent (rigor-dialed review iterations) | Writes recipes + scripts (parallelized by output where feasible). Frugal: minimal review pass after. Rigor: N rounds of "is the implementation consistent with the paper?" sub-agent review + fix iterations. |
-| 7 | RUN | user choice | Mechanical, but failures need diagnosis. |
-| 8 | COMPARE | user choice | Verdict (was the reproduction close enough?) is the user's call when interactive; sub-agent COMPARE writes the verdict and lets SUMMARIZE_RUN ratify. |
-| 9 | SUMMARIZE_RUN | **interactive — *always*** | The closing bookend. Drafts the report, runs `/figure-comparison` (mandatory) and `/check-sentence-by-sentence` (opt-in), walks `open-questions.md` with `AskUserQuestion`, finalizes the constitution outcome. |
+| 4 | SPECIFY | user choice (default interactive); two-pass-per-sub-analysis | **Paper pass**: authors decisions / prior_insights / findings with paper-anchored evidence; weaves `astra-anchor:` references into the existing narrative. **Code pass** (when code is present): augments / amends with code-as-canonical insights and material-disagreement entries; surfaces material conflicts via `AskUserQuestion` (interactive) or `<paper-slug>/open-questions.md` (sub-agent). **Self-review** (rigor-dialed): fresh-context sub-agent per sub-analysis. Independent sub-analyses run in parallel. |
+| 5 | IMPLEMENT | sub-agent (rigor-dialed review iterations) | Writes recipes + scripts (parallelized by output where feasible). Frugal: minimal review pass after. Rigor: N rounds of fresh-context "is the implementation consistent with the paper?" review + fix iterations. |
+| 6 | RUN | user choice | Mechanical, but failures need diagnosis. |
+| 7 | COMPARE | user choice | Verdict (was the reproduction close enough?) is the user's call when interactive; sub-agent COMPARE writes the verdict and lets REVIEW (close-out) ratify. |
+| 8 | REVIEW (close-out) | **interactive — *always*** | The closing bookend. Drafts the report, runs `/figure-comparison` (mandatory) and `/check-sentence-by-sentence` (opt-in), walks `open-questions.md` with `AskUserQuestion`, finalizes the constitution outcome. |
 
 The constitution records the choice; iterations honor it. Sub-agent phases are spawned via the `Task` tool from inside the main loop session — that gives them fresh context but no user-reach. Interactive phases run inline in the loop session and may pause with `AskUserQuestion` at material seams.
 
-### Rigor vs frugality threads through REVIEW and IMPLEMENT
+### Rigor vs frugality threads through ARCHITECT, SPECIFY, and IMPLEMENT
+
+The frugality/rigor dial picked in INTERVIEW is not just a termination criterion for the COMPARE → IMPLEMENT loop. It also tunes how aggressively each artifact-producing phase self-checks. The shape is the same at every seam:
 
-The frugality/rigor dial picked in INTERVIEW is not just a termination criterion for the COMPARE → IMPLEMENT loop. It also tunes how aggressively REVIEW and IMPLEMENT self-check:
+- **Frugal**: skip self-review, or run one fresh-context sub-agent pass and incorporate fixes once.
+- **Rigor**: N rounds of fresh-context sub-agent review + fix. Each round runs a brand-new reviewer that does NOT see prior rounds' findings or fixes. Stop when two consecutive rounds find no fixes (strong-termination), or after 5 rounds (system cap), whichever comes first.
 
-- **Frugal**: REVIEW runs once or is skipped; IMPLEMENT does no extra review iterations after writing.
-- **Rigor**: REVIEW iterates — fresh-context sub-agent reads `astra.yaml` against paper + code; SPECIFY incorporates fixes; a *fresh* sub-agent re-reviews; repeat until two consecutive rounds find no fixes (or a configured cap is hit). IMPLEMENT does the same shape after writing recipes — sub-agent reads the implementation against the paper + code, fixes are incorporated, fresh sub-agent re-reviews.
+The artifact under review changes per phase — ARCHITECT reviews the stub `astra.yaml`; SPECIFY reviews each sub-analysis's filled spec; IMPLEMENT reviews `scripts/` + recipes against paper + code — but the cross-check shape is constant.
 
-The discipline is **never bias the reviewing sub-agent**: each round runs from fresh context with the prompt "check the spec/implementation is consistent with the paper and the code" — not "here's what was just fixed; check it." Otherwise the reviewer pattern-matches on prior fixes rather than thinking from first principles.
+The discipline is **never bias the reviewing sub-agent**: each round runs from fresh context with the prompt "check the artifact is consistent with the paper and the code" — not "here's what was just fixed; check it." Otherwise the reviewer pattern-matches on prior fixes rather than thinking from first principles.
 
 ### Code-as-canonical
 
@@ -143,25 +146,25 @@ When the original codebase is available at `work/reference/code/`, **the agent r
 
 This is the load-bearing fidelity discipline. Without it, iterations drift to "looks right" rather than "matches" — the failure mode the first-paper test surfaced (plot styles off, numerical results off). The per-paper CLAUDE.md restates the rule so every iteration's Claude session walks up to it.
 
-### Two surfaces for user attention: open-questions and SUMMARIZE_RUN
+### Two surfaces for user attention: open-questions and REVIEW (close-out)
 
-The reproduction has two periods of human reach — the bookends. INTERVIEW at the start, SUMMARIZE_RUN at the end. In between, the loop runs without a human in the conversation. The discipline has two surfaces to match:
+The reproduction has two periods of human reach — the bookends. INTERVIEW at the start, REVIEW (close-out) at the end. In between, the loop runs without a human in the conversation. The discipline has two surfaces to match:
 
 - **`<paper-slug>/open-questions.md` — the during-loop accumulator.** When a sub-agent or loop iteration would normally surface a question to the user (paper-vs-code conflicts, figures whose intent isn't obvious, ambiguities the constitution doesn't resolve), it appends the question to `open-questions.md` and continues with the best-judgment default. Never block on `AskUserQuestion` from inside a sub-agent — the prompt fires into nothing.
 
-- **SUMMARIZE_RUN — the post-loop interactive close-out.** When the COMPARE→IMPLEMENT loop terminates (verdict=pass or budget exhausted), control returns to the user. SUMMARIZE_RUN invokes `/figure-comparison` and (optionally) `/check-sentence-by-sentence` interactively — these skills can use `AskUserQuestion` because the human is back. Then it walks the user through `open-questions.md` with `AskUserQuestion`, lands resolutions, updates `astra.yaml` or `implementation-notes.md` accordingly, drafts `REPRODUCTION-SUMMARY.md`, and finalizes the constitution outcome.
+- **REVIEW (close-out) — the post-loop interactive close-out.** When the COMPARE→IMPLEMENT loop terminates (verdict=pass or budget exhausted), control returns to the user. REVIEW invokes `/figure-comparison` and (optionally) `/check-sentence-by-sentence` interactively — these skills can use `AskUserQuestion` because the human is back. Then it walks the user through `open-questions.md` with `AskUserQuestion`, lands resolutions, updates `astra.yaml` or `implementation-notes.md` accordingly, drafts `REPRODUCTION-SUMMARY.md`, and finalizes the constitution outcome.
 
 Stays in the conversation while the seams are still soft, walks away while the loop grinds, comes back to a rich review surface plus a list of "things you'd want to know."
 
-### Material conflicts (the SPECIFY seam)
+### Material conflicts (the SPECIFY code-pass seam)
 
-Inside SPECIFY, when paper and code disagree on something material:
+SPECIFY's code pass (per sub-analysis) is where paper-vs-code material disagreements surface. The paper pass authors decisions / findings from the paper alone; the code pass cross-checks them against the implementation. When paper and code disagree on something material:
 
 - **Material** = a different choice would plausibly change a numeric result the paper reports.
 - **Stylistic / cosmetic / pure-tooling differences** are not material — record them in `implementation-notes.md` and move on.
 - **Code is canonical** for numerics and method per "Code-as-canonical" above.
 - **Interactive SPECIFY**: surface the conflict with `AskUserQuestion`. The user picks which option `universes/baseline.yaml` selects.
-- **Sub-agent SPECIFY** (rare; default is interactive): take code as canonical, record the conflict in `open-questions.md`, and preserve both options in `astra.yaml` so the user can flip baseline at the next interactive seam.
+- **Sub-agent SPECIFY** (rare; default is interactive): take code as canonical, record the conflict in `open-questions.md`, and preserve both options in `astra.yaml` so the user can flip baseline at REVIEW (close-out).
 
 Both choices land in `astra.yaml` as decision options. Whichever the user picks becomes the option selected by `universes/baseline.yaml`; the alternative is preserved as a sibling option for future universe runs. See `references/specify.md` for the full SPECIFY discipline.
 
@@ -179,14 +182,15 @@ Workdir signals (file existence implies the phase has been done):
 |---|---|
 | `work/reference/source/` (arxiv tarball) **or** `work/reference/document.md` (Docling fallback) | ACQUIRE |
 | `work/reference/code/` | ACQUIRE (code clone) |
-| `work/notes/study/<NN>-<section>.md` files | STUDY (per-section paper-vs-code agreement-check) |
-| `work/notes/methodology.md` | STUDY (consolidated decision map + results inventory) |
+| `work/notes/architect/{paper-index.md,code-index.md}` | ARCHITECT (Explore pass) |
+| `astra.yaml` validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
+| `work/notes/cited_papers.yaml` | ARCHITECT (citation extraction) |
 | `work/notes/literature.yaml` | LITERATURE |
-| `astra.yaml` valid (`astra validate astra.yaml`) + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
+| `astra.yaml` validates with non-empty `decisions:` per sub-analysis + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
 | recipes present in `astra.yaml` | IMPLEMENT |
 | `results/<universe>/<output>/` | RUN |
 | `comparison-report.yaml` | COMPARE |
-| `REPRODUCTION-SUMMARY.md` + `.lightcone/comparison.html` + resolved `open-questions.md` | SUMMARIZE_RUN |
+| `REPRODUCTION-SUMMARY.md` + `.lightcone/comparison.html` + resolved `open-questions.md` | REVIEW (close-out) |
 
 `git log --oneline` complements this — phase commits are the chronological view.
 
@@ -196,8 +200,8 @@ Workdir signals (file existence implies the phase has been done):
 - [`/ralph-loops`](../ralph-loops/SKILL.md) — for the bash-loop and tmux-orchestrated runtime modes
 - [`/managing-bibliography`](../managing-bibliography/SKILL.md) — for ACQUIRE
 - [`/narrative`](../narrative/SKILL.md) — for SPECIFY
-- [`/figure-comparison`](../figure-comparison/SKILL.md) — for SUMMARIZE_RUN (mandatory)
-- [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) — for SUMMARIZE_RUN (opt-in)
+- [`/figure-comparison`](../figure-comparison/SKILL.md) — for REVIEW (close-out, mandatory)
+- [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) — for REVIEW (close-out, opt-in)
 
 ## Discipline
 
@@ -207,10 +211,10 @@ Workdir signals (file existence implies the phase has been done):
 - **Use the up-to-date CLI surfaces, not skill-specific wrappers.** When `astra validate` already does the job, call it directly. Specifically: `astra validate <file>`, `astra validate --verify-evidence`, `astra paper add`. Use whatever the current `astra --help` surfaces.
 - **arxiv-LaTeX-first acquisition.** When the paper is on arxiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arxiv where there's no better source.
 - **The original code goes into `work/reference/code/`** during ACQUIRE when available, and stays there as the canonical reference for every subsequent iteration (see "Code-as-canonical" above).
-- **`/figure-comparison` and `/check-sentence-by-sentence` run inside SUMMARIZE_RUN, not inside the loop.** Both have `AskUserQuestion` in their `allowed-tools`; SUMMARIZE_RUN is the always-interactive close-out bookend that runs them in the main session so the prompts land. Don't try to spawn either under the `Task` tool from inside the loop.
-- **Only the bookends are mandatory-interactive.** INTERVIEW (start) and SUMMARIZE_RUN (close). Every other phase is configurable per the interview's per-phase mode choice — no "always interactive" flag on anything in between. The dial that does the heavy lifting on quality is rigor/frugality, threaded through REVIEW and IMPLEMENT.
-- **Don't bias review sub-agents.** REVIEW and IMPLEMENT's review iterations spawn fresh sub-agents whose prompt is "check `astra.yaml` (or the implementation) is consistent with the paper and the code" — never "here's what was just implemented or fixed last round." Each round runs from a fresh reviewing context. Otherwise the reviewer pattern-matches on prior fixes rather than thinking from first principles.
-- **STUDY parallelizes by paper-section, not by source.** A single sub-agent that reads "the whole paper" can't compare with "the whole code" — too much context. A sub-agent that reads paper-section A *plus* the matching code (located via the bibliography or the code's own structure) is the right unit. Sub-agents fan out across the paper's sections; each one carries enough context to surface paper-vs-code disagreements at its own level.
+- **`/figure-comparison` and `/check-sentence-by-sentence` run inside REVIEW (close-out), not inside the loop.** Both have `AskUserQuestion` in their `allowed-tools`; REVIEW is the always-interactive close-out bookend that runs them in the main session so the prompts land. Don't try to spawn either under the `Task` tool from inside the loop.
+- **Only the bookends are mandatory-interactive.** INTERVIEW (start) and REVIEW (close). Every other phase is configurable per the interview's per-phase mode choice — no "always interactive" flag on anything in between. The dial that does the heavy lifting on quality is rigor/frugality, threaded through ARCHITECT, SPECIFY, and IMPLEMENT's internal self-review passes.
+- **Don't bias review sub-agents.** ARCHITECT, SPECIFY, and IMPLEMENT's self-review iterations spawn fresh sub-agents whose prompt is "check the artifact is consistent with the paper and the code" — never "here's what was just authored or fixed last round." Each round runs from a fresh reviewing context. Otherwise the reviewer pattern-matches on prior fixes rather than thinking from first principles.
+- **ARCHITECT decides structure; SPECIFY decides content.** ARCHITECT's two parallel Explore sub-agents (paper-side + code-side) feed a synthesis sub-agent that writes the stub `astra.yaml` — sub-analyses, inputs, outputs, narrative prose. SPECIFY's per-sub-analysis paper pass + code pass + self-review fills in `decisions:`, `prior_insights:`, `findings:` and weaves anchor references into the narrative. Splitting **structure** from **content** keeps each phase's cognitive load bounded.
 - **No synthetic data.** Unless the paper itself uses synthetic data as its input, every input dataset must be real (downloaded, queried, or fetched from a real archive). The implement phase reference repeats this; treat it as load-bearing.
 - **Tmux preferred-when-available, never required.** Modes (1) and (2) work without it.
 - **The siblings don't know about paper2astra.** Each SKILL stands on its own.
diff --git a/claude/lightcone/skills/paper2astra/references/interview.md b/claude/lightcone/skills/paper2astra/references/interview.md
index 4984685a..434ecf7f 100644
--- a/claude/lightcone/skills/paper2astra/references/interview.md
+++ b/claude/lightcone/skills/paper2astra/references/interview.md
@@ -33,8 +33,8 @@ Use `AskUserQuestion` if the user did not supply enough on `/paper2astra` invoca
 
 - **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).
 - **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) **If code is available, every implementing iteration will read from `work/reference/code/`** and treat code as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md).
-- **User's prior familiarity.** Has the user reproduced this paper before? Read the paper recently? Worked with the original authors? This affects how much of the STUDY / SPECIFY work needs human ratification.
-- **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; STUDY will read it.
+- **User's prior familiarity.** Has the user reproduced this paper before? Read the paper recently? Worked with the original authors? This affects how much of the ARCHITECT / SPECIFY work needs human ratification.
+- **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; ARCHITECT will read it.
 
 ### 2. Scope the reproduction
 
@@ -44,9 +44,9 @@ Ask:
 
 - **Full reproduction or targeted?** Full = every primary result the paper reports. Targeted = "I only care about figures 3, 4, 7 and the headline number in Table 2." Targeted is cheaper and produces a tighter astra.yaml.
 - **Specific decisions of interest.** A paper makes many choices. The user may care most about a few — e.g. "I want the BAO fit to use a different damping prior than the paper." These become first-class decisions in the spec, with the alternative preserved as a sibling option.
-- **Sub-analysis structure.** Does the paper have genuinely independent stages (e.g. reconstruction → clustering → BAO fit)? If so, the spec wants sub-analyses; SPECIFY will mirror the structure. If the paper is monolithic, one analysis suffices.
+- **Sub-analysis structure.** Does the paper have genuinely independent stages (e.g. reconstruction → clustering → BAO fit)? If so, the spec wants sub-analyses; ARCHITECT will mirror the structure as the stub's decomposition. If the paper is monolithic, one analysis suffices.
 
-These answers live in the constitution's **Desired State** section. There is no separate target-extraction phase — the targets the user names here become explicit `outputs:`, `findings:`, `inputs:`, and `decisions:` in `astra.yaml` during SPECIFY.
+These answers live in the constitution's **Desired State** section. There is no separate target-extraction phase — the targets the user names here become explicit `outputs:` declared in the stub `astra.yaml` during ARCHITECT, then filled with paper-anchored `findings:` / `decisions:` during SPECIFY.
 
 ### 3. Pick a runtime mode
 
@@ -68,21 +68,21 @@ If tmux isn't installed, only (1) and (2) appear in the question. The chosen mod
 
 Ask:
 
-- **Weak (frugal):** "run until the checklist of tasks has been completed." Cheaper. Susceptible to one-shot oversights. REVIEW skips or runs once; IMPLEMENT does no extra review iterations.
-- **Strong (rigorous):** "run until you can't find any further contributions, fixes, or improvements that align with the goal." Almost always catches mistakes the one-shot left behind, but burns more tokens. REVIEW runs N rounds — each round a fresh sub-agent reads `astra.yaml` against paper + code, fixes are incorporated, then a *fresh* sub-agent re-reviews; iterate until two consecutive rounds find no fixes. IMPLEMENT does the same shape after writing recipes.
+- **Weak (frugal):** "run until the checklist of tasks has been completed." Cheaper. Susceptible to one-shot oversights. ARCHITECT, SPECIFY, and IMPLEMENT each either skip their internal self-review pass or run it once.
+- **Strong (rigorous):** "run until you can't find any further contributions, fixes, or improvements that align with the goal." Almost always catches mistakes the one-shot left behind, but burns more tokens. ARCHITECT, SPECIFY, and IMPLEMENT each iterate their internal self-review: a fresh-context sub-agent reviews each round, fixes are incorporated, and a *fresh* sub-agent re-reviews, repeating until two consecutive rounds find no fixes or the 5-round system cap is reached.
 
-Default to strong for fidelity-critical reproductions; weak when the user wants to cap token spend. The choice goes into the per-paper constitution and is read by both REVIEW and IMPLEMENT.
+Default to strong for fidelity-critical reproductions; weak when the user wants to cap token spend. The choice goes into the per-paper constitution and is read by ARCHITECT, SPECIFY, and IMPLEMENT.
 
 ### 5. Choose interactive vs sub-agent per phase
 
 Read the "Per-phase mode" table in `../SKILL.md`. The defaults are reasonable. Walk the user through it briefly:
 
-- **The two bookends are always interactive:** INTERVIEW (now) and SUMMARIZE_RUN (the close-out). These are the only mandatory user-reach phases — every other phase is the user's call.
-- **Phases whose defaults are sub-agent (parallel fresh context fits the work):** STUDY (parallelized by paper-section + matching code), LITERATURE (one sub-agent per cited paper), REVIEW (rigor-dialed; fresh-context reviewers per round), IMPLEMENT (recipe-writing parallelized by output where feasible, with rigor-dialed review iterations after).
-- **Phases whose default is interactive:** SPECIFY (material paper-vs-code conflicts and target-formalization want ratification).
+- **The two bookends are always interactive:** INTERVIEW (now) and REVIEW (close-out). These are the only mandatory user-reach phases — every other phase is the user's call.
+- **Phases whose defaults are sub-agent (parallel fresh context fits the work):** ARCHITECT (two parallel Explore sub-agents — paper-side + code-side — feed a synthesis sub-agent that writes the stub `astra.yaml`; rigor-dialed self-review pass after), LITERATURE (one sub-agent per cited paper), IMPLEMENT (recipe-writing parallelized by output where feasible, with rigor-dialed self-review iterations after).
+- **Phases whose default is interactive:** SPECIFY (material paper-vs-code conflicts in the code pass want ratification; per-sub-analysis self-review pass is rigor-dialed regardless of mode).
 - **Phases the user genuinely chooses:** ACQUIRE, RUN, COMPARE. These can run either way without losing the surface that matters most.
 
-If the user has no opinion, take the defaults. The choice goes into the constitution's **Context** section as a per-phase mode table. Phases marked sub-agent that hit a question they'd normally surface to the user **append the question to `<paper-slug>/open-questions.md`** rather than blocking; the user resolves them in SUMMARIZE_RUN.
+If the user has no opinion, take the defaults. The choice goes into the constitution's **Context** section as a per-phase mode table. Phases marked sub-agent that hit a question they'd normally surface to the user **append the question to `<paper-slug>/open-questions.md`** rather than blocking; the user resolves them in REVIEW (close-out).
 
 ### 6. Draft the constitution and CLAUDE.md
 
@@ -129,30 +129,31 @@ The COMPARE → IMPLEMENT loop iterates until verdict is `pass` or the attempt b
 |---|---|---|
 | 0 | INTERVIEW | interactive (always) |
 | 1 | ACQUIRE | <per user> |
-| 2 | STUDY | sub-agent (parallel by paper-section) |
+| 2 | ARCHITECT | sub-agent (two parallel Explore + synthesis; rigor-dialed self-review) |
 | 3 | LITERATURE | sub-agent |
-| 4 | SPECIFY | interactive |
-| 5 | REVIEW | sub-agent (rigor-dialed) |
-| 6 | IMPLEMENT | sub-agent (rigor-dialed review iterations) |
-| 7 | RUN | <per user> |
-| 8 | COMPARE | <per user> |
-| 9 | SUMMARIZE_RUN | interactive (always) |
+| 4 | SPECIFY | interactive (two-pass per sub-analysis: paper, code, rigor-dialed self-review) |
+| 5 | IMPLEMENT | sub-agent (rigor-dialed review iterations) |
+| 6 | RUN | <per user> |
+| 7 | COMPARE | <per user> |
+| 8 | REVIEW (close-out) | interactive (always) |
 
 ## Evidence
 
 - `ls work/reference/source/ || ls work/reference/document.md` — ACQUIRE done (arxiv-LaTeX tarball or Docling fallback)
 - `ls work/reference/code/` — original code present (canonical reference)
-- `ls work/notes/study/*.md && ls work/notes/methodology.md` — STUDY done (per-section paper-vs-code agreement-check + consolidated methodology)
+- `ls work/notes/architect/paper-index.md && ls work/notes/architect/code-index.md` — ARCHITECT Explore pass done
+- `ls astra.yaml && astra validate astra.yaml` (with empty `decisions:`/`prior_insights:`/`findings:` blocks) — ARCHITECT stub written
+- `ls work/notes/cited_papers.yaml` — ARCHITECT citation list ready for LITERATURE
 - `ls work/notes/literature.yaml` — LITERATURE done
-- `ls astra.yaml && astra validate astra.yaml && ls targets/targets.md && ls implementation-notes.md` — SPECIFY done (target-formalization included)
+- `astra validate astra.yaml` (with non-empty `decisions:` per sub-analysis) `&& ls targets/targets.md && ls implementation-notes.md` — SPECIFY done
 - `astra validate astra.yaml --verify-evidence` — evidence quotes match source PDFs
 - `ls comparison-report.yaml && yq '.verdict' comparison-report.yaml` — most-recent COMPARE verdict
-- `ls REPRODUCTION-SUMMARY.md && ls .lightcone/comparison.html` — SUMMARIZE_RUN done
+- `ls REPRODUCTION-SUMMARY.md && ls .lightcone/comparison.html` — REVIEW (close-out) done
 - `git log --oneline` — chronological view of phase commits
 
 ## Open Questions
 
-(empty — populated as the loop runs; questions accrete in `<paper-slug>/open-questions.md`, the running report the user resolves in SUMMARIZE_RUN before the constitution closes.)
+(empty — populated as the loop runs; questions accrete in `<paper-slug>/open-questions.md`, the running report the user resolves in REVIEW (close-out) before the constitution closes.)
 ```
 
 Then author the per-paper `<paper-slug>/CLAUDE.md` from the same conversation. The CLAUDE.md is *info and rules*, not desired state — paper identity, where things live, disciplines that always apply. Approximate shape:
@@ -172,12 +173,12 @@ Reproduction of <paper title> (<arXiv ID>). DOI: <doi>.
 
 - Workdir layout follows Paper2ASTRA conventions: `work/reference/`, `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`.
 - The constitution (desired state, runtime mode, scope, evidence, per-phase mode) lives at `<constitution>.md` in this directory.
-- The during-loop questions log lives at `open-questions.md`. The user reviews it in SUMMARIZE_RUN.
+- The during-loop questions log lives at `open-questions.md`. The user reviews it in REVIEW (close-out).
 
 ## Rules
 
 - **Code-as-canonical when `work/reference/code/` exists.** Every implementing iteration reads relevant code. Where paper and code disagree, code is canonical for numerics, plotting, and method.
-- **Never block on `AskUserQuestion` mid-sub-agent.** When a sub-agent or loop phase would surface a question to the user, append it to `open-questions.md` and continue with the best-judgment default. The user resolves in SUMMARIZE_RUN.
+- **Never block on `AskUserQuestion` mid-sub-agent.** When a sub-agent or loop phase would surface a question to the user, append it to `open-questions.md` and continue with the best-judgment default. The user resolves in REVIEW (close-out).
 - **arxiv-LaTeX-first acquisition.** PDF + Docling is a fallback for non-arxiv only.
 - **`astra validate --verify-evidence`** is the fidelity gate; evidence quotes must match source PDFs.
 

From 7329524d9dcd51a6f1a1cd83752dd54b154df60e Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 6 May 2026 04:04:36 +0200
Subject: [PATCH 016/124] paper2astra: move LITERATURE after SPECIFY in the
 skill
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The architect-first commits (50c0869 + 7fb82fb) realized the rest of the
redesign but kept LITERATURE at phase 3 (before SPECIFY). The constitution's
intent is LITERATURE-after-SPECIFY because relevant prior_insights are defined
by the decisions and findings they justify, not pre-known — fetching cited
papers speculatively before SPECIFY would do work for citations that may
never end up needed.

Final ordering: INTERVIEW → ACQUIRE → ARCHITECT → SPECIFY → LITERATURE →
IMPLEMENT → RUN → COMPARE → REVIEW.

- SKILL.md: swap rows 3↔4 in both per-phase tables (outputs description +
  default-mode description); refresh workdir signals; rewrite the architect
  paragraph to say SPECIFY records prior_insights placeholders, LITERATURE
  resolves them.
- references/literature.md: full rewrite for post-SPECIFY position. New shape
  consumes astra.yaml's prior_insights placeholders, caches cited PDFs via
  `astra paper add`, and authors resolved evidence: selectors back into
  astra.yaml's prior_insights[<id>].evidence[]. Per-paper sub-agents write to
  work/notes/literature/<doi-slug>.yaml (resume-by-existence preserved); a
  single merge step writes back to astra.yaml (avoids YAML round-trip
  conflicts). Rigor-dialed self-review pass after merge.
- references/specify.md: paper pass records prior_insights as citation-only
  placeholders (id, claim, doi from cited_papers.yaml, decision_links — no
  evidence: yet). Drop literature.yaml as input; --verify-evidence runs after
  LITERATURE.
- references/architect.md: cited_papers.yaml is now the marker→DOI map for
  SPECIFY's placeholders + LITERATURE's fetch list (was: list LITERATURE
  mines).
- references/interview.md: phase-mode table swap; evidence checks reflect
  placeholder vs resolved-evidence states.
- references/review.md: --verify-evidence prerequisite phrased as
  "after LITERATURE".
- skills/README.md: paper2astra row enumerates the full 9-phase order;
  rigor-dial bullet adds LITERATURE alongside ARCHITECT, SPECIFY, IMPLEMENT.
---
 claude/lightcone/skills/README.md             |   2 +-
 claude/lightcone/skills/paper2astra/SKILL.md  |  14 +-
 .../paper2astra/references/architect.md       |   4 +-
 .../paper2astra/references/interview.md       |  12 +-
 .../paper2astra/references/literature.md      | 221 +++++++++++-------
 .../skills/paper2astra/references/review.md   |   2 +-
 .../skills/paper2astra/references/specify.md  |  31 ++-
 7 files changed, 174 insertions(+), 112 deletions(-)

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 2e473038..07588636 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -18,7 +18,7 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 
 | Skill | Role |
 |---|---|
-| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and per-paper `CLAUDE.md`, then launches one of three runtime modes (interactive, bash-loop, tmux-orchestrated) against the constitution. The constitution carries 9 phases, bookended by two always-interactive seams (INTERVIEW at start, REVIEW at close-out); every other phase is configurable per the user's per-phase mode choice, with ARCHITECT, SPECIFY, and IMPLEMENT additionally tuned by a frugality / rigor dial that drives each phase's internal fresh-context self-review. |
+| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and per-paper `CLAUDE.md`, then launches one of three runtime modes (interactive, bash-loop, tmux-orchestrated) against the constitution. The constitution carries 9 phases — INTERVIEW → ACQUIRE → ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE → REVIEW — bookended by two always-interactive seams (INTERVIEW at start, REVIEW at close-out); every other phase is configurable per the user's per-phase mode choice, with ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT additionally tuned by a frugality / rigor dial that drives each phase's internal fresh-context self-review. |
 | [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by paper2astra during SPECIFY. |
 | [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. |
 | [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Used by paper2astra's bash-loop and tmux-orchestrated runtime modes. |
diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/paper2astra/SKILL.md
index ef767087..b6b62b94 100644
--- a/claude/lightcone/skills/paper2astra/SKILL.md
+++ b/claude/lightcone/skills/paper2astra/SKILL.md
@@ -98,8 +98,8 @@ Inside each ralph iteration, the agent reads the per-paper constitution, surveys
 |---|---|---|---|
 | 1 | ACQUIRE | [`references/acquire.md`](references/acquire.md) | `work/reference/{source/ \| document.md, paper.pdf, figures/, tables/, metadata.json, code/, code-status.yaml}` |
 | 2 | ARCHITECT | [`references/architect.md`](references/architect.md) | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative — no anchors yet); `work/notes/architect/{paper-index.md, code-index.md}`; `work/notes/cited_papers.yaml`; rigor-dialed self-review |
-| 3 | LITERATURE | [`references/literature.md`](references/literature.md) | `work/notes/literature.yaml` + per-paper YAMLs |
-| 4 | SPECIFY | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (decisions, prior_insights, findings, anchored narrative); `universes/baseline.yaml`; `implementation-notes.md`; `targets/targets.md`; per-sub-analysis rigor-dialed self-review |
+| 3 | SPECIFY | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (decisions + findings authored from the paper, prior_insights as citation-only **placeholders**, anchored narrative); `universes/baseline.yaml`; `implementation-notes.md`; `targets/targets.md`; per-sub-analysis rigor-dialed self-review |
+| 4 | LITERATURE | [`references/literature.md`](references/literature.md) | `astra.yaml` with `prior_insights:` placeholders **resolved** (`evidence:` selectors authored against the cited papers); per-paper PDFs cached via `astra paper add`; rigor-dialed self-review |
 | 5 | IMPLEMENT | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml`; rigor-dialed paper-vs-implementation review iterations |
 | 6 | RUN | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
 | 7 | COMPARE | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
@@ -107,7 +107,7 @@ Inside each ralph iteration, the agent reads the per-paper constitution, surveys
 
 The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or attempts are exhausted. The constitution carries the attempt budget; the ralph iterations consult it. On pass (or user-accepted partial), control returns to the user and REVIEW runs interactively in the main session — drafting the report, invoking `/figure-comparison`, optionally `/check-sentence-by-sentence`, walking accumulated questions, and finalizing the constitution outcome.
 
-ACQUIRE folds in what was previously a separate PARSE phase: arxiv-LaTeX papers come pre-structured in their tarball (no Docling needed), and PDF-fallback papers run Docling inside ACQUIRE itself to produce `document.md` + extracted figures/tables. ARCHITECT replaces the old STUDY: instead of writing per-section paper-vs-code agreement-check files in markdown that SPECIFY would re-author into YAML, ARCHITECT writes the structural skeleton of `astra.yaml` directly (sub-analyses, inputs, outputs, narrative prose). SPECIFY then fills it in with `decisions:` / `prior_insights:` / `findings:` and `astra-anchor:` references — same content the old STUDY produced, but authored once in YAML rather than twice (markdown then YAML). The pre-implement REVIEW phase folded into ARCHITECT, SPECIFY, and IMPLEMENT as a rigor-dialed self-review discipline at every artifact-producing seam, freeing the REVIEW *name* for the close-out (replacing SUMMARIZE_RUN, whose name was a verb stuck describing one piece of what the close-out actually does).
+ACQUIRE folds in what was previously a separate PARSE phase: arxiv-LaTeX papers come pre-structured in their tarball (no Docling needed), and PDF-fallback papers run Docling inside ACQUIRE itself to produce `document.md` + extracted figures/tables. ARCHITECT replaces the old STUDY: instead of writing per-section paper-vs-code agreement-check files in markdown that SPECIFY would re-author into YAML, ARCHITECT writes the structural skeleton of `astra.yaml` directly (sub-analyses, inputs, outputs, narrative prose). SPECIFY then fills it in with `decisions:`, `findings:`, and `astra-anchor:` references, and records `prior_insights:` as citation-only placeholders for LITERATURE to resolve next. LITERATURE now comes *after* SPECIFY because fetching cited papers speculatively before SPECIFY would do work for citations that may never end up needed. LITERATURE iterates over each placeholder, caches the cited paper via `astra paper add`, and authors the resolved `evidence:` selectors back into `astra.yaml`. The pre-implement REVIEW phase folded into ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT as a rigor-dialed self-review discipline at every artifact-producing seam, freeing the REVIEW *name* for the close-out (replacing SUMMARIZE_RUN, whose name was a verb stuck describing one piece of what the close-out actually does).
 
 ### Per-phase mode (interactive vs sub-agent)
 
@@ -120,8 +120,8 @@ Defaults the constitution starts with:
 | 0 | INTERVIEW | **interactive — *always*** | The first bookend. Scope, runtime, rigor, per-phase mode all decided here. |
 | 1 | ACQUIRE | user choice | Mostly mechanical (LaTeX-tarball download / Docling fallback / code clone); surfacing happens only on download failures. |
 | 2 | ARCHITECT | sub-agent (two parallel Explore + synthesis; rigor-dialed self-review) | Two Task-tool sub-agents fan out (one paper-side, one code-side) and produce indexes; a synthesis sub-agent writes the stub `astra.yaml`. Rigor-dialed fresh-context self-review pass cross-checks the stub before SPECIFY runs. |
-| 3 | LITERATURE | sub-agent | One sub-agent per cited paper — pure parallel grunt-work. Core, not opt-in: verifiability against citations is what `prior_insights` evidence depends on. |
-| 4 | SPECIFY | user choice (default interactive); two-pass-per-sub-analysis | **Paper pass**: authors decisions / prior_insights / findings with paper-anchored evidence; weaves `astra-anchor:` references into the existing narrative. **Code pass** (when code present): augments / amends with code-as-canonical insights and material-disagreement entries; surfaces material conflicts via `AskUserQuestion` (interactive) or `<paper-slug>/open-questions.md` (sub-agent). **Self-review** (rigor-dialed): fresh-context sub-agent per sub-analysis. Per-sub-analysis parallelism when independent. |
+| 3 | SPECIFY | user choice (default interactive); two-pass-per-sub-analysis | **Paper pass**: authors `decisions:` and `findings:` with paper-anchored evidence; records citation markers (`[12]`, `Smith+24`) as `prior_insights:` placeholders (citation-only — no `evidence:` selector yet, LITERATURE fills those in); weaves `astra-anchor:` references into the existing narrative. **Code pass** (when code present): augments / amends with code-as-canonical insights and material-disagreement entries; surfaces material conflicts via `AskUserQuestion` (interactive) or `<paper-slug>/open-questions.md` (sub-agent). **Self-review** (rigor-dialed): fresh-context sub-agent per sub-analysis. Per-sub-analysis parallelism when independent. |
+| 4 | LITERATURE | sub-agent (rigor-dialed self-review) | Reads SPECIFY's `prior_insights:` placeholders, caches each cited paper via `astra paper add`, and authors the resolved `evidence:` selectors back into `astra.yaml`'s `prior_insights[<id>].evidence[]` so each placeholder becomes a verified citation. One sub-agent per cited paper — pure parallel grunt-work. Self-review: fresh-context sub-agent reads each `prior_insights:` entry against its cited paper and asks "does this evidence actually justify the decision/finding it's attached to?" Core, not opt-in: verifiability against citations is what `prior_insights` evidence depends on. |
 | 5 | IMPLEMENT | sub-agent (rigor-dialed review iterations) | Writes recipes + scripts (parallelized by output where feasible). Frugal: minimal review pass after. Rigor: N rounds of fresh-context "is the implementation consistent with the paper?" review + fix iterations. |
 | 6 | RUN | user choice | Mechanical, but failures need diagnosis. |
 | 7 | COMPARE | user choice | Verdict (was the reproduction close enough?) is the user's call when interactive; sub-agent COMPARE writes the verdict and lets REVIEW (close-out) ratify. |
@@ -185,8 +185,8 @@ Workdir signals (file existence implies the phase has been done):
 | `work/notes/architect/{paper-index.md,code-index.md}` | ARCHITECT (Explore pass) |
 | `astra.yaml` validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
 | `work/notes/cited_papers.yaml` | ARCHITECT (citation extraction) |
-| `work/notes/literature.yaml` | LITERATURE |
-| `astra.yaml` validates with non-empty `decisions:` per sub-analysis + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
+| `astra.yaml` has non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` entries present as citation-only placeholders + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
+| `astra.yaml`'s `prior_insights:` entries each have a resolved `evidence:` selector (verified by `astra validate --verify-evidence`); `work/notes/literature/<doi-slug>.yaml` files present (one per cited paper) | LITERATURE |
 | recipes present in `astra.yaml` | IMPLEMENT |
 | `results/<universe>/<output>/` | RUN |
 | `comparison-report.yaml` | COMPARE |
diff --git a/claude/lightcone/skills/paper2astra/references/architect.md b/claude/lightcone/skills/paper2astra/references/architect.md
index 2d31a7c4..1110ed2c 100644
--- a/claude/lightcone/skills/paper2astra/references/architect.md
+++ b/claude/lightcone/skills/paper2astra/references/architect.md
@@ -146,7 +146,7 @@ Spawn one synthesis sub-agent that reads both index files and writes the stub. T
 >        citation: "Smith et al. (2020)"
 >        relevance: "One-line description of why this paper matters for replication"
 >    ```
->    This is what LITERATURE mines.
+>    This is the marker→DOI map SPECIFY uses to write each `prior_insights:` placeholder's `doi:` field, and LITERATURE consumes when fetching the cited papers to resolve those placeholders.
 > 6. **Validate** with `astra validate astra.yaml`. The stub MUST validate as written — even with empty `decisions:` / `prior_insights:` / `findings:` blocks, the structural fields and the narrative prose must pass schema checks.
 >
 > ### Stub shape — what `astra.yaml` looks like after ARCHITECT
@@ -178,7 +178,7 @@ Spawn one synthesis sub-agent that reads both index files and writes the stub. T
 >         description: |
 >           <one-line on what this output is>
 >     decisions: {}      # SPECIFY fills
->     prior_insights: {} # LITERATURE → SPECIFY fills
+>     prior_insights: {} # SPECIFY records placeholders (citation only), LITERATURE resolves evidence
 >     findings: {}       # SPECIFY fills
 >
 >   <sub-analysis-id-2>:
diff --git a/claude/lightcone/skills/paper2astra/references/interview.md b/claude/lightcone/skills/paper2astra/references/interview.md
index 434ecf7f..7afb7a59 100644
--- a/claude/lightcone/skills/paper2astra/references/interview.md
+++ b/claude/lightcone/skills/paper2astra/references/interview.md
@@ -130,8 +130,8 @@ The COMPARE → IMPLEMENT loop iterates until verdict is `pass` or the attempt b
 | 0 | INTERVIEW | interactive (always) |
 | 1 | ACQUIRE | <per user> |
 | 2 | ARCHITECT | sub-agent (two parallel Explore + synthesis; rigor-dialed self-review) |
-| 3 | LITERATURE | sub-agent |
-| 4 | SPECIFY | interactive (two-pass per sub-analysis: paper, code, rigor-dialed self-review) |
+| 3 | SPECIFY | interactive (two-pass per sub-analysis: paper pass, code pass; then rigor-dialed self-review) |
+| 4 | LITERATURE | sub-agent (rigor-dialed self-review) |
 | 5 | IMPLEMENT | sub-agent (rigor-dialed review iterations) |
 | 6 | RUN | <per user> |
 | 7 | COMPARE | <per user> |
@@ -143,10 +143,10 @@ The COMPARE → IMPLEMENT loop iterates until verdict is `pass` or the attempt b
 - `ls work/reference/code/` — original code present (canonical reference)
 - `ls work/notes/architect/paper-index.md && ls work/notes/architect/code-index.md` — ARCHITECT Explore pass done
 - `ls astra.yaml && astra validate astra.yaml` (with empty `decisions:`/`prior_insights:`/`findings:` blocks) — ARCHITECT stub written
-- `ls work/notes/cited_papers.yaml` — ARCHITECT citation list ready for LITERATURE
-- `ls work/notes/literature.yaml` — LITERATURE done
-- `astra validate astra.yaml` (with non-empty `decisions:` per sub-analysis) `&& ls targets/targets.md && ls implementation-notes.md` — SPECIFY done
-- `astra validate astra.yaml --verify-evidence` — evidence quotes match source PDFs
+- `ls work/notes/cited_papers.yaml` — ARCHITECT citation list (used by SPECIFY for marker→DOI mapping; consumed by LITERATURE for placeholder resolution)
+- `astra validate astra.yaml` (with non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` populated as citation-only placeholders) `&& ls targets/targets.md && ls implementation-notes.md` — SPECIFY done
+- `ls work/notes/literature/` (one `<doi-slug>.yaml` per cited DOI) and `astra.yaml`'s `prior_insights:` entries each have a resolved `evidence:` selector — LITERATURE done
+- `astra validate astra.yaml --verify-evidence` — evidence quotes match source PDFs (runs after LITERATURE)
 - `ls comparison-report.yaml && yq '.verdict' comparison-report.yaml` — most-recent COMPARE verdict
 - `ls REPRODUCTION-SUMMARY.md && ls .lightcone/comparison.html` — REVIEW (close-out) done
 - `git log --oneline` — chronological view of phase commits
diff --git a/claude/lightcone/skills/paper2astra/references/literature.md b/claude/lightcone/skills/paper2astra/references/literature.md
index 6d6b9753..9b392593 100644
--- a/claude/lightcone/skills/paper2astra/references/literature.md
+++ b/claude/lightcone/skills/paper2astra/references/literature.md
@@ -1,44 +1,61 @@
-# LITERATURE — extract prior insights from cited papers
+# LITERATURE — resolve `prior_insights:` placeholders against the cited papers
 
-For each cited paper that informed a methodological decision, extract evidence-quote-backed insights and link them to the relevant decisions and options. Synthesize across papers into `work/notes/literature.yaml`, which SPECIFY consumes when authoring `astra.yaml`'s `prior_insights` block.
+After SPECIFY's paper pass records each citation marker as a `prior_insights:` *placeholder* (id, claim, doi, decision_links — no `evidence:` selector), LITERATURE fetches each cited paper, finds the verbatim quote that justifies the placeholder's claim, and authors the resolved `evidence:` selector back into `astra.yaml`'s `prior_insights[<id>].evidence[]`. After LITERATURE, every `prior_insights:` entry is a verified citation; `astra validate astra.yaml --verify-evidence` should pass.
 
-The constitution's per-phase mode is **always sub-agent** for this phase. Spawn one Task-tool sub-agent per cited paper for parallel extraction; spawn a final sub-agent for synthesis. This is pure parallel grunt-work.
+LITERATURE runs **after SPECIFY**, not before — relevant `prior_insights:` are defined by the decisions and findings they justify. Fetching cited papers speculatively before SPECIFY would do work for citations that may never end up needed.
+
+The constitution's per-phase mode is **always sub-agent** for this phase. Spawn one Task-tool sub-agent per cited paper for parallel resolution; each resolves a disjoint subset of `astra.yaml`'s `prior_insights:` entries (only the placeholders whose `doi:` matches the sub-agent's paper) and writes its results to its own per-paper YAML rather than to `astra.yaml`. A merge step (orchestrator-inline) writes the per-paper resolutions back into `astra.yaml` after all sub-agents complete; a final fresh-context sub-agent runs the rigor-dialed self-review.
 
 ## Inputs
 
-- `work/notes/cited_papers.yaml` — the list of papers to mine, from ARCHITECT (paper-side Explore output, merged by the synthesis sub-agent)
-- `work/notes/architect/paper-index.md` — has the decision clusters per sub-analysis; each per-paper sub-agent gets it as context
-- `astra.yaml` — the stub from ARCHITECT (sub-analyses + outputs declared; `decisions:` empty); per-paper sub-agents read the structure to know what they're searching for evidence about
-- `work/reference/source/` (Path A — arXiv LaTeX) or `work/reference/document.md` (Path B — Docling) — the target paper (for reference)
+- `astra.yaml` — filled by SPECIFY's paper (and code) passes; each sub-analysis has `prior_insights:` entries with `claim:` + `doi:` + `decision_links:` but no `evidence:` selector. These are the placeholders LITERATURE resolves.
+- `work/notes/cited_papers.yaml` — citation marker → DOI mapping from ARCHITECT (used to discover which DOIs need fetching, complementing the per-placeholder `doi:` lookup).
+- `work/notes/architect/paper-index.md` — has the decision clusters per sub-analysis; per-paper sub-agents get it as context.
+- `work/reference/source/` (Path A — arXiv LaTeX) or `work/reference/document.md` (Path B — Docling) — the target paper (for context on how the cited paper is invoked).
 
 ## Outputs
 
-- `work/notes/literature/<doi-slug>.yaml` — one file per cited paper (per-paper extraction)
-- `work/notes/literature.yaml` — synthesized merged view (final output)
+- `astra.yaml` — `prior_insights:` placeholders **resolved**: each placeholder now has at least one `evidence:` entry with `TextQuoteSelector` (`exact:`, `prefix:`, `suffix:`) plus `FragmentSelector` (`page:`) pointing at the cited paper. `astra validate astra.yaml --verify-evidence` returns clean.
+- `work/notes/literature/<doi-slug>.yaml` — one file per cited paper carrying that paper's per-placeholder evidence resolutions (intermediate artifact; resume-by-existence — re-running LITERATURE skips a paper whose YAML already exists).
+- Cached PDFs registered with `astra paper add` so `astra validate --verify-evidence` and downstream auditors can find them.
+
+## How it runs
+
+1. **Discovery.** Read `astra.yaml` and collect every `prior_insights:` entry whose `evidence:` is missing or empty. Group by `doi:`. Each group becomes a per-paper sub-agent invocation.
+2. **Per-paper resolution (parallel).** Spawn one Task-tool sub-agent per DOI group. Each sub-agent: caches the PDF via `astra paper add`, reads the cited paper, finds verbatim quote(s) supporting each placeholder claim in its group, and writes the per-placeholder `evidence:` resolutions to `work/notes/literature/<doi-slug>.yaml`. Sub-agents do not edit `astra.yaml` directly — they write their per-paper YAML and exit.
+3. **Merge.** A short orchestrator pass (or a single merge sub-agent) reads each `work/notes/literature/<doi-slug>.yaml` and writes the resolved `evidence:` entries back into `astra.yaml`'s `prior_insights[<insight_id>].evidence[]`. Single writer, no merge conflicts.
+4. **Rigor-dialed self-review.** A fresh-context sub-agent reads each `prior_insights:` entry against its cited paper and asks "does this evidence actually justify the claim it's attached to?" Iterate per the rigor dial — frugal: one pass; rigor: N rounds until two consecutive rounds find no fixes (or a 5-round system cap).
 
-## Per-paper extraction sub-agent — system prompt
+## Per-paper resolution sub-agent — system prompt
 
-> You are an ASTRA insight extraction agent with self-validation capability. Your task is to extract scientific insights from a single cited paper that bear on specific methodological decisions already identified in the target paper.
+> You are an ASTRA evidence-resolution agent. Your task is to find the verbatim quotes in a single cited paper that justify a set of `prior_insights:` placeholders authored by SPECIFY.
+>
+> ### Inputs
+>
+> You are given:
+>
+> - The path to the cited paper's PDF (cached via `astra paper add`).
+> - A list of placeholder claims to resolve, each carrying:
+>   - `id:` — the placeholder's unique id within `astra.yaml`.
+>   - `claim:` — what the cited paper supports about a decision in the target paper (the target paper's framing, written by SPECIFY).
+>   - `decision_links:` — which decision option(s) in `astra.yaml` this placeholder backs (for context — helps you find the right passage).
+> - The path to the target paper (`work/reference/source/` or `work/reference/document.md`) for context on how the cited paper is invoked.
+> - `work/notes/architect/paper-index.md` — the decision clusters from ARCHITECT.
 >
 > ### Instructions
 >
-> 1. Read the PDF at the path provided below using the Read tool.
-> 2. Review the **decision clusters** provided below (from `work/notes/architect/paper-index.md`) — these are the *areas* where the target paper makes choices that bear on numerical results. Concrete decision options haven't been authored yet (SPECIFY does that after LITERATURE) — your job is to find evidence about the cluster, and SPECIFY links it to specific options once they exist.
-> 3. Scan the cited paper for findings that support, contradict, or compare approaches within those clusters. Focus on:
->    - Empirical comparisons between approaches that are candidates within a cluster
->    - Performance benchmarks or validation results relevant to the choices the cluster represents
->    - Recommendations or caveats about specific methods / parameters in the cluster's scope
-> 4. For each relevant finding, extract:
->    - A clear claim (1–2 sentences stating what we learned)
->    - An exact quote from the paper (verbatim, 1–3 sentences)
->    - The page number where the quote appears
->    - Prefix and suffix context — REAL surrounding text from the page (~20–100 chars each), used to disambiguate the quote among similar passages. This follows the W3C TextQuoteSelector convention: prefix and suffix are literal substrings of the source page, NOT editorial parentheticals. Wording like "(Section 3.1 of Foo+19)" or "(see Figure 4)" will fail verification because the validator concatenates `prefix + quote + suffix` and matches against actual page text.
-> 5. Cache the paper so spec-level verification can find it (see below).
-> 6. Write the extracted insights as YAML to the specified output file.
+> 1. Read the cited PDF using the Read tool.
+> 2. For each placeholder claim, locate verbatim passage(s) in the cited paper that support it. Focus on:
+>    - Empirical comparisons between approaches the placeholder's `decision_links` reference.
+>    - Performance benchmarks or validation results relevant to the choices.
+>    - Recommendations or caveats about specific methods / parameters.
+> 3. For each supporting passage, build a `TextQuoteSelector` (`exact:` + `prefix:` + `suffix:`) and `FragmentSelector` (`page:`).
+> 4. If a placeholder's claim has no supporting evidence in the paper (the citation was loose or the claim was paraphrased beyond what the paper actually says), record it under `unresolved:` with a brief note rather than fabricating evidence. The merge step surfaces these to the user via `<paper-slug>/open-questions.md`.
+> 5. Write the per-placeholder resolutions to the specified output file.
 >
 > ### Caching the source PDF
 >
-> Before extraction completes, register each paper with the validator's PDF cache so downstream evidence verification can find it:
+> Before resolution, register the paper with the validator's PDF cache:
 >
 > ```bash
 > astra paper add "<DOI>"
@@ -52,30 +69,28 @@ The constitution's per-phase mode is **always sub-agent** for this phase. Spawn
 >
 > ### Quote fidelity rules
 >
-> Quotes are NOT verified during this per-paper extraction phase — verification is spec-level (`astra validate astra.yaml --verify-evidence`) and runs once SPECIFY has authored `astra.yaml` referencing each paper. Your job here is to extract quotes that will pass that verification cleanly. The checks are:
+> Quotes are verified at the spec level (`astra validate astra.yaml --verify-evidence`). Your job here is to extract quotes that pass that verification cleanly. The checks are:
 >
 > - Each `exact` quote must be present on the cited page, fuzzy-matched at RapidFuzz `partial_ratio` ≥ 70. Copy verbatim from the PDF; do not paraphrase, normalize whitespace, or strip mathematical typesetting.
-> - The validator concatenates `prefix + quote + suffix` and matches that against the page text at a context score ≥ 80. Choose prefix/suffix as REAL surrounding page text (W3C TextQuoteSelector convention), not editorial commentary. Wording like "(Section 3.1 of Foo+19)" or "(see Figure 4)" silently lowers the context score below threshold even when the quote itself is in the PDF.
+> - The validator concatenates `prefix + quote + suffix` and matches that against the page text at a context score ≥ 80. Choose `prefix` / `suffix` as REAL surrounding page text (W3C TextQuoteSelector convention), not editorial commentary. Wording like "(Section 3.1 of Foo+19)" or "(see Figure 4)" silently lowers the context score below threshold even when the quote itself is in the PDF.
 > - Avoid YAML `|` block-literal style for `exact`, `prefix`, and `suffix` values: embedded newlines from block-literal folding can mishandle the context-score concatenation. Single-line strings or `>` folded-block style are safer.
-> - Math-formula quotes (with superscripts, subscripts, inline footnote markers) are likely to fail because the PDF text extractor collapses these. Quote the surrounding English narrative instead, or skip that piece of evidence if a sibling quote already establishes the finding.
+> - Math-formula quotes (with superscripts, subscripts, inline footnote markers) are likely to fail because the PDF text extractor collapses these. Quote the surrounding English narrative instead, or skip that piece of evidence if a sibling quote already establishes the claim.
 >
 > The verification cache is keyed by `(doi, version, sha256(quote_text))` plus `pdf_sha256`, so any edit to a quote in the eventual YAML automatically invalidates that entry — there is no need to delete the cache between runs.
 >
-> ### Quote granularity and finding attribution
+> ### Quote granularity rules
 >
-> - **Quotes carry the claim on their own.** A four-word fragment ("two widely used fitting codes", "the actual quantity being fit") satisfies fuzzy-match but fails the reader: lift the quote out of context and the claim it supports must still stand. The validator is happy with any string that fuzzy-matches; a downstream agent or human reader following the evidence pointer needs to learn what the paper actually said. Default to full sentences with TeX-anchored prefix/suffix; split a long passage into two evidence rows rather than truncate a quote into a fragment that depends on context. Fragments creep in at exactly the spots where inline math forces shrinking, which is also where claims hide.
-> - **Cross-section methodology gets separate insights.** When a paper's relevant methodology is split across multiple sections — a methods chapter defining a tool, a results chapter setting a threshold, an application chapter running it — file one insight per piece, each citing the section where that piece is *defined*. Do not collapse all the borrowed pieces into the application section's number. The application section gets all the credit and the methodology section disappears, which is a real fidelity-sweep failure mode.
+> - **Quotes carry the claim on their own.** A four-word fragment satisfies fuzzy-match but fails the reader: lift the quote out of context and the claim it supports must still stand. Default to full sentences with TeX-anchored prefix/suffix; split a long passage into two evidence rows rather than truncate a quote into a fragment that depends on context. Fragments creep in at exactly the spots where inline math forces shrinking, which is also where claims hide.
+> - **Cross-section methodology gets separate evidence rows.** When a paper's relevant methodology is split across multiple sections — a methods chapter defining a tool, a results chapter setting a threshold, an application chapter running it — file one evidence row per piece, each citing the section where that piece is *defined*. Do not collapse all the borrowed pieces into the application section's number.
 >
 > ### Output format
 >
 > Write ONLY this YAML structure to the output file. No other text.
 >
 > ```yaml
-> insights:
+> resolutions:
 >   <insight_id>:
 >     id: <insight_id>
->     claim: "<What we learned from this finding>"
->     created_at: "<ISO 8601 timestamp>"
 >     evidence:
 >       - id: ev1
 >         doi: "<DOI>"
@@ -87,79 +102,113 @@ The constitution's per-phase mode is **always sub-agent** for this phase. Spawn
 >         location:
 >           type: FragmentSelector
 >           page: <page number>
->     scope: "<when this applies -- optional>"
 >
-> decision_links:
->   <decision_cluster_or_id>:
->     <provisional_option_label>:
->       - <insight_id>
+> unresolved:
+>   <insight_id>:
+>     reason: "<one-line: why no supporting evidence was found>"
 > ```
 >
 > ### Rules
 >
-> - Use `lowercase_with_underscores` for insight IDs.
+> - The keys under `resolutions:` and `unresolved:` are the placeholder `id:` values from `astra.yaml`'s `prior_insights:` — preserve them exactly. The merge step uses these as the join key.
+> - One placeholder lands in either `resolutions:` or `unresolved:`, never both. If two passages support the same claim, list both as siblings under one placeholder's `evidence:`.
 > - Quotes must be EXACT — copy verbatim from the PDF, no paraphrasing or whitespace normalization.
 > - Prefix and suffix must be real surrounding page text, not editorial parentheticals.
-> - One claim per insight — do not combine multiple findings.
-> - Only extract insights relevant to the target decision clusters listed below.
-> - `decision_links` keys reference clusters by the names ARCHITECT used in `work/notes/architect/paper-index.md`. SPECIFY rewires these to concrete `decision_id:option_id` keys when it authors the actual decisions.
-> - If no relevant insights found, write `insights: {}` and `decision_links: {}`.
-> - prefix and suffix are REQUIRED for every TextQuoteSelector.
+> - `prefix:` and `suffix:` are REQUIRED for every `TextQuoteSelector`.
+> - Do NOT edit `astra.yaml`. The merge step does that.
+
+## Merge step
+
+After all per-paper sub-agents complete, the orchestrator (or a single merge sub-agent) reads each `work/notes/literature/<doi-slug>.yaml` and writes the resolutions back into `astra.yaml`:
+
+- For each entry in `resolutions:`, locate `prior_insights[<insight_id>]` in `astra.yaml` (sub-analysis ownership is implicit in the id; the placeholder already lives there) and set its `evidence:` field to the resolved selectors.
+- For each entry in `unresolved:`, append a line to `<paper-slug>/open-questions.md` describing the unresolved placeholder and the reason. The user resolves it at REVIEW (close-out) by supplying a different citation, weakening the placeholder's `claim:`, or removing the placeholder entirely.
+- Re-run `astra validate astra.yaml` after each per-paper merge to catch any structural breakage early.
 
-## Synthesis sub-agent — system prompt
+A single writer (the merge step) avoids YAML round-trip conflicts that parallel writes would produce.
 
-> You are a literature synthesis agent. Read all per-paper extraction YAML files in `work/notes/literature/` and merge them into a single `work/notes/literature.yaml` that consolidates insights from all cited papers.
+## Rigor-dialed self-review
+
+After the merge lands, a fresh-context sub-agent cross-checks each resolved `prior_insights:` entry against its cited paper:
+
+- Does the `evidence:` quote belong to the cited paper at the cited page? (`astra validate --verify-evidence` does the deterministic check; the sub-agent does the semantic check.)
+- Does the quote actually justify the placeholder's `claim:`? Or is the quote technically present but tangential?
+- Does the placeholder's `claim:` actually support the decision option it's linked to via `decision_links:`?
+
+The depth of self-review is set by the constitution's frugality / rigor dial:
+
+- **Frugal:** skip review entirely, or run a single fresh-context sub-agent pass and incorporate its fixes once.
+- **Rigor:** N rounds — each round runs a fresh reviewer against the resolved `prior_insights:` + the cited papers + the target paper; LITERATURE incorporates fixes (re-spawn the per-paper sub-agent for entries that need a different quote, or adjust unresolved entries); the next round runs another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes (the strong-termination criterion the loop already uses), or a 5-round system cap.
+
+The discipline matches ARCHITECT's and SPECIFY's self-review shape: each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; a separate fix pass (the orchestrator inline for trivial fixes, or another LITERATURE iteration for substantive changes) edits `astra.yaml`.
+
+### Per-round fresh sub-agent — system prompt
+
+> You are a LITERATURE reviewer. Read `astra.yaml`'s `prior_insights:` entries, the cited papers (cached via `astra paper add`), and the target paper, and report any inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
 >
-> ### Task
+> ### Inputs
 >
-> 1. Read all per-paper YAML files in `work/notes/literature/`.
-> 2. Merge insights, de-duplicating where multiple papers support the same claim.
-> 3. Merge decision links across all papers.
-> 4. Write the consolidated output to `work/notes/literature.yaml`.
+> - `astra.yaml` — focus on every `analyses.<sub-analysis-id>.prior_insights:` entry. Each should have a resolved `evidence:` block.
+> - The cited papers (cached PDFs).
+> - `work/notes/cited_papers.yaml` — DOI lookups.
+> - `<paper-slug>/open-questions.md` — to see which placeholders the resolution sub-agents flagged unresolved.
+> - `work/reference/source/` (or `document.md`) — the target paper, for context on how the cited paper is invoked.
 >
-> ### Output format
+> ### What to check
 >
-> ```yaml
-> prior_insights:
->   <insight_id>:
->     id: <insight_id>
->     claim: "<What the literature says>"
->     evidence:
->       - id: e1
->         doi: "<DOI of source paper>"
->         quote:
->           type: TextQuoteSelector
->           exact: "<Exact quote from paper>"
->           prefix: "<~20-100 chars before>"
->           suffix: "<~20-100 chars after>"
->         location:
->           type: FragmentSelector
->           page: <page number>
->     scope: "<When this applies -- optional>"
+> 1. **Evidence integrity.** `astra validate astra.yaml --verify-evidence` returns clean. (Do not run it yourself — your job is the semantic check beyond what `--verify-evidence` does.)
+> 2. **Evidence justifies claim.** For each `prior_insights:` entry, does the quote actually support the `claim:`? Or is it tangential / weaker than the claim asserts?
+> 3. **Claim supports the decision.** For each placeholder's `decision_links:`, does the placeholder's claim actually justify the linked decision option(s)? Or is the link a leap?
+> 4. **Cited paper is the right paper.** Does the target paper actually invoke this DOI for this claim? (Sometimes a citation marker is misread; the wrong paper gets cached.)
+> 5. **Unresolved entries are honest.** For entries in `<paper-slug>/open-questions.md` flagged unresolved, does a closer read of the cited paper actually find supporting evidence? (If yes, the resolution sub-agent missed it; flag for re-resolution.)
+>
+> ### Output
 >
-> decision_links:
->   <decision_id>:
->     <option_id>: [insight_id1, insight_id2]
+> Write your findings to `work/notes/literature-review/round-<N>.md`:
+>
+> ```markdown
+> # LITERATURE review — round <N>
+>
+> ## verdict: clean | <count> fixes
+>
+> ## findings (one per fix needed)
+>
+> ### F-1 — <one-line summary>
+>
+> - placeholder: `prior_insights.<id>` (sub-analysis: `<sub-analysis-id>`)
+> - issue: <evidence integrity | evidence-claim mismatch | claim-decision mismatch | wrong paper | unresolved-but-resolvable>
+> - paper: `<DOI>` (page <page number>)
+> - what's wrong: <2–3 sentences>
+> - suggested fix: <re-resolve with a different quote | adjust the claim | re-link decision | flag for human review>
 > ```
 >
 > ### Rules
 >
-> - Preserve all verified evidence exactly as-is (do not rewrite quotes).
-> - When two papers support the same claim, merge their evidence lists under a single insight entry.
-> - When papers support different but related claims, keep them as separate insights.
-> - `decision_links` should map decision IDs to option IDs to lists of insight IDs. Merge across all papers so each decision collects all relevant insights.
-> - Use consistent insight IDs (`lowercase_with_underscores`).
-> - Drop any insights that had zero verified quotes.
-> - If no papers produced insights, write `prior_insights: {}` and `decision_links: {}`.
+> - **Output findings only — do not edit `astra.yaml`.** A separate fix pass responds to your findings. Editing here defeats the multi-round-fresh-context discipline.
+> - **Verdict is `clean` or a count.** "clean" means no fixes; otherwise enumerate.
+> - **One fix per `F-N`.** Do not bundle.
+> - **Cite specifically.** Always reference the placeholder by id, the cited paper by DOI + page, and the target paper's invocation site by section / page.
+
+### LITERATURE-fix pass between rounds
+
+After each round's findings file lands, a LITERATURE-fix pass (or the orchestrator inline for trivial mechanical fixes) responds to the findings — re-resolving placeholders with different quotes, adjusting claims, re-linking decisions, or surfacing unresolvable entries to `<paper-slug>/open-questions.md`. After any change to `astra.yaml`, re-run `astra validate astra.yaml --verify-evidence` to confirm the structural and quote-fidelity checks still pass.
+
+If the review hits the system cap of 5 rounds without two consecutive clean rounds, surface to the user via `AskUserQuestion`: "LITERATURE review reached round cap with <count> fixes still landing; continue, accept the current resolutions, or revise the constitution?" Default on user silence: accept current state, log the unfinished tail in `<paper-slug>/open-questions.md`, and proceed to IMPLEMENT.
 
 ## Survey signals (entry into LITERATURE)
 
-- `work/notes/cited_papers.yaml` exists ⇒ ready to extract
-- `work/notes/literature/` directory has one YAML per paper in `cited_papers.yaml` ⇒ extraction done
-- `work/notes/literature.yaml` exists ⇒ synthesis done; LITERATURE complete
+- `astra.yaml` has `prior_insights:` placeholders — entries with `claim:` + `doi:` but no `evidence:` ⇒ ready to resolve
+- `work/notes/literature/<doi-slug>.yaml` files exist (one per cited DOI) ⇒ per-paper resolution done
+- `astra.yaml`'s `prior_insights:` entries each have a resolved `evidence:` selector ⇒ merge done
+- `astra validate astra.yaml --verify-evidence` returns clean ⇒ structural and quote-fidelity validation done
+- For frugal: at least a `work/notes/literature-review/round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ LITERATURE review done
+- For rigor: two consecutive `round-<N>.md` files with verdict `clean` ⇒ LITERATURE review done
+
+When all of the above hold ⇒ LITERATURE complete; proceed to IMPLEMENT.
 
 ## Notes
 
-- **Run per-paper extractions in parallel.** One sub-agent per entry in `cited_papers.yaml`. They are fully independent.
-- **Synthesis is a single sub-agent.** It reads everything in `work/notes/literature/` and writes one merged `literature.yaml`.
-- **Resume is automatic.** If `work/notes/literature/<doi-slug>.yaml` already exists, skip the per-paper extraction for that paper. The synthesis re-runs whenever new per-paper files appear.
+- **Run per-paper resolutions in parallel.** One sub-agent per cited DOI; each resolves a disjoint subset of `prior_insights:` and writes only its own per-paper YAML, so write conflicts don't arise. The merge step then serializes the writes back to `astra.yaml` to keep the YAML round-trip safe.
+- **Resume is automatic.** If `work/notes/literature/<doi-slug>.yaml` already exists, skip the per-paper resolution for that DOI. The merge re-runs whenever new per-paper files appear.
+- **Unresolved is not failure.** A placeholder that no quote in the cited paper supports is a real signal — the target paper cited loosely, or paraphrased beyond what the source actually says. Surface to `<paper-slug>/open-questions.md`; don't fabricate evidence to make it green.
+- **`astra validate --verify-evidence` runs after the merge, not after each per-paper sub-agent.** Sub-agents write to per-paper YAMLs; the deterministic check happens once `astra.yaml` is updated.
diff --git a/claude/lightcone/skills/paper2astra/references/review.md b/claude/lightcone/skills/paper2astra/references/review.md
index 4f7f8b20..bd709be1 100644
--- a/claude/lightcone/skills/paper2astra/references/review.md
+++ b/claude/lightcone/skills/paper2astra/references/review.md
@@ -8,7 +8,7 @@ The constitution's per-phase mode is **always interactive** for this phase. It d
 
 ## Inputs
 
-- `astra.yaml` — final spec (validates with `--verify-evidence` if literature.yaml exists)
+- `astra.yaml` — final spec (validates with `--verify-evidence` once LITERATURE has resolved every `prior_insights:` placeholder's `evidence:` selector)
 - `comparison-report.yaml`, `comparison-report.md` — final verdict
 - `targets/targets.md` — what was being matched against; reference figures / tables in `targets/`
 - `results/<universe>/<output_id>/` — reproduced figures / tables / metrics
diff --git a/claude/lightcone/skills/paper2astra/references/specify.md b/claude/lightcone/skills/paper2astra/references/specify.md
index ebe6b25c..392fe03d 100644
--- a/claude/lightcone/skills/paper2astra/references/specify.md
+++ b/claude/lightcone/skills/paper2astra/references/specify.md
@@ -13,7 +13,7 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 - `astra.yaml` — the stub from ARCHITECT (sub-analyses, inputs, outputs, narrative; empty `decisions:` / `prior_insights:` / `findings:` blocks)
 - `work/notes/architect/paper-index.md` — paper-side decision clusters, result loci, citations
 - `work/notes/architect/code-index.md` (when code present) — module map, natural decomposition, entry-points, gotchas
-- `work/notes/literature.yaml` (if present) — prior insights with evidence quotes and decision links (from LITERATURE)
+- `work/notes/cited_papers.yaml` — citation marker → DOI mapping (from ARCHITECT); SPECIFY uses it to write each `prior_insights:` placeholder's `doi:` so LITERATURE knows which paper to fetch
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
 - `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (Path B only)
 - `work/reference/code/` (if present) — original code, canonical reference for numerics + method
@@ -22,7 +22,7 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 
 ## Outputs
 
-- `astra.yaml` — **filled form**: each sub-analysis's `decisions:`, `prior_insights:`, `findings:` populated with `evidence:` selectors; `narrative:` keys updated to weave `astra-anchor:` references into prose as entries land; validates with `astra validate astra.yaml --verify-evidence` when literature.yaml is present
+- `astra.yaml` — **filled form**: each sub-analysis's `decisions:` and `findings:` populated with paper-anchored `evidence:` selectors; `prior_insights:` populated as citation-only **placeholders** (id, claim, decision_links, `doi:` lookup from `cited_papers.yaml` — but no `evidence:` selector yet, LITERATURE fills those next); `narrative:` keys updated to weave `astra-anchor:` references into prose as entries land. `astra validate astra.yaml` returns clean; `astra validate astra.yaml --verify-evidence` runs after LITERATURE has resolved the placeholders.
 - `universes/baseline.yaml` — selects the paper's choices (where paper and code disagree per the canonical-resolution rule, see "Material conflicts" below)
 - `implementation-notes.md` — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
 - `targets/targets.md` — small target ledger COMPARE consumes: per output (already declared by ARCHITECT), a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure / table / metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
@@ -51,13 +51,26 @@ Read the paper's section(s) covering this sub-analysis. Author:
 
    Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for the full definition of what counts. **Only exclude pure tooling choices** (language, library, file format) and fixed constraints. A typical sub-analysis has 2–6 decisions; if a sub-analysis has fewer than 2, revisit `work/notes/architect/paper-index.md` and reconsider.
 
-2. **`prior_insights:`** — incorporate insights from `work/notes/literature.yaml` (when present) that bear on this sub-analysis's decisions. Use the `decision_links` mapping to attach each insight to the relevant decision options, so the multiverse captures evidence-backed alternative choices from the literature.
+2. **`prior_insights:`** — for every citation marker the paper invokes that bears on a decision in this sub-analysis (`[12]`, `Smith+24`, `(Doe & Lee 2023)`), record a **placeholder**: an `id:`, a `claim:` describing what the cited paper supports about the decision (the target paper's framing of why it cites that paper here), a `doi:` looked up from `work/notes/cited_papers.yaml`, and `decision_links:` mapping the placeholder to the relevant decision option(s). **Do not author the `evidence:` selector** — that's LITERATURE's job. Leave `evidence:` absent or empty; LITERATURE fetches the cited paper, finds the supporting quote, and authors the resolved selector back into this placeholder. The placeholder shape:
+
+   ```yaml
+   prior_insights:
+     <insight_id>:
+       id: <insight_id>
+       claim: "<what the cited paper supports about the decision>"
+       doi: "<DOI from cited_papers.yaml>"
+       # evidence: omitted — LITERATURE fills this in
+       decision_links:
+         <decision_id>: [<option_id>, ...]
+   ```
+
+   Don't pre-emptively fetch the cited paper or guess its content; LITERATURE does that with fresh context per paper.
 
 3. **`findings:`** — paper-level claims and quantitative results scoped to this sub-analysis, each with source-anchored `evidence:` (verbatim quote against the paper). Pull the verbatim claims for each output's expected value from the paper text + the result loci in `paper-index.md`.
 
 4. **Weave `astra-anchor:` references into the existing narrative.** ARCHITECT wrote `narrative:` prose without anchors because the entries didn't exist. Now they do — extend the narrative to point at the new `decisions:` / `prior_insights:` / `findings:` entries via the tree-path anchor grammar. Use `/narrative` for this pass; it carries the discipline.
 
-5. **Verify evidence quotes against the paper source by Grep** — `astra validate --verify-evidence` currently verifies `prior_insights` evidence; artifact-anchored `findings` evidence still needs a manual quote check before the code pass.
+5. **Verify evidence quotes against the paper source by Grep** — `astra validate --verify-evidence` will verify `prior_insights` evidence after LITERATURE resolves the placeholders; for now, manually Grep the paper source to confirm each `decisions:` and `findings:` `evidence:` quote is verbatim. Artifact-anchored `findings` evidence still needs a manual quote check before the code pass.
 
 ### Pass B — code pass (when `work/reference/code/` exists)
 
@@ -101,13 +114,13 @@ Self-review depth follows the constitution's frugality / rigor dial — same sha
 > - `work/notes/architect/code-index.md` (when code present)
 > - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
 > - `work/reference/code/` (when present) — canonical reference for numerics + method
-> - `work/notes/literature.yaml` (if present) — for evidence verification
+> - `work/notes/cited_papers.yaml` — citation marker → DOI mapping (use to confirm each `prior_insights:` placeholder's `doi:` matches what the paper cites)
 >
 > ### What to check
 >
 > 1. **Decision coverage.** Does this sub-analysis's `decisions:` block cover every choice in the paper-side index's decision clusters? Cosmetic / pure-tooling choices should NOT be decisions; anything material that's missing should be added.
 > 2. **Decision options.** Each decision has the chosen option plus any sibling alternatives the paper discusses or the code reveals. The chosen option's `rationale:` is grounded in the paper's stated reasoning (or the code's, where canonical-resolution applied).
-> 3. **Evidence verification.** Every `evidence:` block uses `TextQuoteSelector` with a verbatim `exact:` quote, real surrounding-text `prefix:` / `suffix:`, and a real page or section anchor. Quotes that are paraphrased or whose prefix / suffix are editorial parentheticals will fail `--verify-evidence`. Run `astra validate astra.yaml --verify-evidence` when literature.yaml is present.
+> 3. **Evidence verification.** Every `evidence:` block uses `TextQuoteSelector` with a verbatim `exact:` quote, real surrounding-text `prefix:` / `suffix:`, and a real page or section anchor. Quotes that are paraphrased or whose prefix / suffix are editorial parentheticals will fail `--verify-evidence`. Note that `prior_insights:` placeholders intentionally have no `evidence:` block at this stage (LITERATURE authors them), so do not flag missing `evidence:` on placeholder entries. After LITERATURE resolves the placeholders, run `astra validate astra.yaml --verify-evidence`.
 > 4. **Findings traceability.** Each `findings:` entry's `evidence:` resolves to a real paper claim (verbatim quote + source anchor) or a real code location (`path:line`).
 > 5. **Material-disagreement surfacing.** Where paper and code disagree on a material choice, the spec records both options under the relevant `decisions:` entry. `universes/baseline.yaml` selects the code's option (canonical-resolution default), unless an interactive seam recorded a different user choice. Flag any material disagreement that got silently dropped or where the spec picked the paper without an explicit user override.
 > 6. **Narrative anchors.** The sub-analysis's `narrative:` weaves `astra-anchor:` references to the new `decisions:` / `prior_insights:` / `findings:` entries — the tree-path grammar must be valid, and entries actually exist at the referenced paths.
@@ -151,7 +164,7 @@ After each round's findings file lands, a SPECIFY-fix pass (or the orchestrator
 
 ```bash
 astra validate astra.yaml
-astra validate astra.yaml --verify-evidence  # when literature.yaml exists
+astra validate astra.yaml --verify-evidence  # after LITERATURE has resolved the prior_insights placeholders
 ```
 
 #### Termination
@@ -192,11 +205,11 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 ## Survey signals (entry into SPECIFY)
 
 - `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative; empty decisions / prior_insights / findings) ⇒ ready to specify
-- For each sub-analysis: `decisions:` / `findings:` populated AND, if literature.yaml exists, `prior_insights:` populated ⇒ paper pass done
+- For each sub-analysis: `decisions:` and `findings:` populated with paper-anchored `evidence:` selectors AND `prior_insights:` populated as citation-only placeholders (id, claim, doi, decision_links — no `evidence:` selector yet, LITERATURE fills those next) ⇒ paper pass done
 - For each sub-analysis: when `work/reference/code/` exists, code-pass material-disagreement entries land in `decisions:` (with both options) and `universes/baseline.yaml` selects the canonical-resolution choice; `implementation-notes.md` carries non-material gotchas ⇒ code pass done
 - For frugal: each sub-analysis has at least a `work/notes/specify-review/<sub>-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ SPECIFY review done
 - For rigor: each sub-analysis has two consecutive `<sub>-round-<N>.md` files with verdict `clean` ⇒ SPECIFY review done
-- `astra validate astra.yaml --verify-evidence` returns clean (when literature.yaml exists) ⇒ evidence side validated
+- `astra validate astra.yaml` returns clean (placeholders without `evidence:` are valid at this stage) ⇒ structural side validated; `--verify-evidence` waits until LITERATURE has authored the resolved `evidence:` selectors
 - `targets/targets.md` exists with each entry mapped to a spec home ⇒ target-ledger done
 - `implementation-notes.md` exists ⇒ practical-guidance side done
 - All of the above ⇒ SPECIFY complete; proceed to IMPLEMENT

From 6bacfb648ff8935f151aa1dcdfb5479cf47394c4 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Fri, 8 May 2026 19:21:13 +0200
Subject: [PATCH 017/124] paper-extraction skill: structural extraction from
 arXiv source / PDF fallback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Introduces /paper-extraction as a standalone skill that turns an arXiv ID or
DOI into a standardized work/reference/ directory: one entry-point that
produces a structural index and a stub astra.yaml in one pass. Subsumes the
arXiv-LaTeX-fetch path that lived inside managing-bibliography (which is
retired in a follow-up commit) and serves as paper2astra's ACQUIRE
substrate.

Two surfaces per paper:

- index.json: figures (with copied files + line numbers + multi-graphic
  panels), tables (one .tex per block, including AAS deluxetable), section
  outline (in paper-reading order, with line numbers), citation keys (with
  every file+line they appear on; natbib + biblatex variants), abstract,
  title, paths.
- astra.yaml: stub ASTRA artifact (id derived from arxiv-id/DOI, version,
  name from \title{}, narrative.summary from abstract, empty findings:).
  Validates with `astra validate` as-is. Optional Step 5: agent walks the
  paper for findings and fills in Insight + Evidence with verbatim
  quote.exact.

Robustness handled by extract-paper-substrate.py:
- LaTeX comments stripped before regex passes (newlines preserved for line
  numbers); prevents commented-out figures/tables/sections/citations from
  leaking.
- Multi-file source read in paper-reading order via main.tex's
  \input{} / \include{} chain (not alphabetical filename order).
- Simple no-arg \newcommand macros expanded in title/abstract/captions/
  section-titles. Args-form passes through.
- Standard table envs (table, table*, deluxetable, deluxetable*) and
  citation commands (natbib + biblatex autocite/textcite/parencite/
  footcite/smartcite) recognized.
- Multi-\includegraphics figures (subfloats, multi-panel) capture all
  files into figures[].files.
- AASTeX and PGF figure references handled.

Tested end-to-end on UNIONS arXiv:2604.03227 (8 figs / 1 tab / 14 sections /
52 citations / 0 warnings) and DESI DR1 BAO arXiv:2404.03000 (multi-tex,
15 figs / 22 tabs / 40 sections / 161 citations / 0 warnings). Both produce
valid astra.yaml that passes `astra validate`. UNIONS Step 5 worked example
included under examples/.

Scope is structural extraction only — ADS BibTeX management is intentionally
not folded in. PDF acquisition uses direct curl from arxiv.org/doi.org so
this skill has no implicit dependency on `astra paper add`. Anyone wanting
the paper in the ASTRA cache for `astra validate --verify-evidence` registers
it themselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../skills/paper-extraction/SKILL.md          | 236 ++++++
 .../examples/unions-bmodes-astra.yaml         | 106 +++
 .../references/arxiv-source.md                |  47 ++
 .../references/pdf-fallback.md                |  66 ++
 .../scripts/extract-paper-substrate.py        | 699 ++++++++++++++++++
 5 files changed, 1154 insertions(+)
 create mode 100644 claude/lightcone/skills/paper-extraction/SKILL.md
 create mode 100644 claude/lightcone/skills/paper-extraction/examples/unions-bmodes-astra.yaml
 create mode 100644 claude/lightcone/skills/paper-extraction/references/arxiv-source.md
 create mode 100644 claude/lightcone/skills/paper-extraction/references/pdf-fallback.md
 create mode 100755 claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py

diff --git a/claude/lightcone/skills/paper-extraction/SKILL.md b/claude/lightcone/skills/paper-extraction/SKILL.md
new file mode 100644
index 00000000..034cd279
--- /dev/null
+++ b/claude/lightcone/skills/paper-extraction/SKILL.md
@@ -0,0 +1,236 @@
+---
+name: paper-extraction
+description: >
+  Turn an arXiv ID or DOI into a standardized `work/reference/` directory:
+  paper substrate (arXiv LaTeX source primary, PDF + Docling fallback),
+  copied figure files, per-table `.tex` files, section outline with line
+  numbers, deduplicated citation keys with every location they appear,
+  abstract, embedded bibliography (when present in source), and a valid
+  `astra.yaml` representing the paper as an ASTRA artifact (with the
+  paper's claimed numerical findings as ASTRA `findings:`). Emits a
+  top-level `index.json` for the structural surface plus the `astra.yaml`
+  for the semantic surface. Triggers on: "read paper", "prep paper",
+  "ingest paper", "extract paper", "set up paper", "fetch arxiv", "arxiv
+  id", "DOI", "find paper", or `/paper-extraction <id>`.
+allowed-tools: Read, Write, Edit, Bash, Grep, Glob, WebFetch, WebSearch
+---
+
+# paper-extraction
+
+Turn a DOI or arXiv ID into a standardized, indexed `work/reference/` directory. One entry-point, idempotent, self-contained.
+
+The output is a predictable surface anyone can rely on without re-parsing LaTeX. What a consumer does with that surface is their concern — paper-extraction's job ends at the index.
+
+## When to use
+
+- "Read [paper] end-to-end" / "I want to verify a claim in [paper]" — full source plus structured artifacts so you're reading the actual paper, not a flattened PDF
+- "Set up reading materials for [paper]" — when the next thing you'll do involves browsing figures, citations, or section structure and you don't want to grep the tarball every time
+- Any workflow where another skill or process needs a known directory shape per paper
+
+## Outputs
+
+Under `work/reference/` (idempotent — skips work already done):
+
+```
+work/reference/
+├── index.json                # structural index — figures, tables, outline, citations, paths
+├── astra.yaml                # ASTRA-shape representation: the paper as an ASTRA artifact, including findings
+├── paper.pdf                 # always
+├── paper.tex                 # Path A — symlink to the main .tex file
+│   (or)
+├── document.md               # Path B — Docling-extracted markdown
+├── source/                   # Path A — extracted arXiv tarball (full source tree)
+├── figures/                  # figure files (copied from LaTeX or rendered by Docling)
+├── tables/                   # one .tex file per `\begin{table}` block (Path A)
+├── bibliography-source.bib   # Path A only — copy of any .bib found in source/
+└── bibliography-source.bbl   # Path A only — copy of any .bbl found in source/
+```
+
+The skill produces only the paper's own reading materials. Anything not contained in or derived from the paper itself — code repositories, supplementary datasets, related papers — is out of scope; the caller handles those.
+
+### Two surfaces: `index.json` (structural) and `astra.yaml` (semantic)
+
+**`index.json` is structural and machine-friendly.** Everything the script could mechanically extract: figures, tables, section outline with line numbers, citation keys with every location, abstract, paths. Read this when you want to know "what's in this paper, where do I find it." Sample shape:
+
+```json
+{
+  "path": "A",                                  // or "B"
+  "paper_pdf": "paper.pdf",
+  "paper_tex": "paper.tex",                     // null on Path B
+  "source_dir": "source",                       // null on Path B
+  "document_md": null,                          // "document.md" on Path B
+  "bibliography_source_bib": "bibliography-source.bib",
+  "bibliography_source_bbl": null,
+  "astra_yaml": "astra.yaml",
+  "title": "UNIONS-3500 Weak Lensing: B-mode validation",
+  "abstract": "At Stage-III sensitivities, cosmic shear B modes ...",
+  "figures": [
+    {"id": "fig1", "label": "fig:bao", "caption": "...", "source_path": "fig_bao",
+     "file": "figures/fig_bao.pdf", "block_origin": "main.tex", "line": 412}
+  ],
+  "tables": [
+    {"id": "tab1", "label": "tab:cosmo", "caption": "...", "file": "tables/tab-cosmo.tex",
+     "block_origin": "main.tex", "line": 487}
+  ],
+  "outline": [
+    {"level": 1, "title": "Introduction", "label": "sec:intro", "source_file": "main.tex", "line": 157}
+  ],
+  "citations": {
+    "asgari17": [{"file": "main.tex", "line": 178}, {"file": "main.tex", "line": 561}],
+    "smith2024": [{"file": "main.tex", "line": 92}]
+  },
+  "extraction_warnings": [
+    "figure fig3: \\includegraphics{...} could not resolve to a file in source/"
+  ]
+}
+```
+
+**`astra.yaml` is semantic and ASTRA-validating.** Treats the paper as an ASTRA artifact: `id`, `version`, `name`, `narrative.summary`, and `findings:` carrying the paper's claimed numerical results in ASTRA's Insight + Evidence shape. Read this when you want to know "what does this paper claim, with quote evidence anchored to the source." The script writes a stub (id, version, name, narrative.summary from abstract, empty findings); Step 5 fills in `findings:`.
+
+Why both: the structural index is queryable by any consumer (`grep`, `jq`, agent code) without needing to know about ASTRA. The ASTRA file composes directly into reproductions, MySTRA, and any other ASTRA-aware tool — and the verbosity of the Insight + Evidence shape *is* the back-pressure against hallucinated numerical claims (the agent has to find and quote the actual text).
+
+## Workflow
+
+### Step 1 — Survey
+
+Always start with `ls work/reference/` and read `index.json` if present. Skip the work that's already done:
+
+| File present | Step to skip |
+|---|---|
+| `source/` (Path A) or `document.md` (Path B) + `paper.pdf` | Substrate acquired (Step 2) |
+| `index.json` with non-empty figures/tables/outline | Structural extraction done (Step 3) |
+| `astra.yaml` exists | Stub written; never overwritten on re-run (preserves agent edits) |
+| `astra.yaml` has non-empty `findings:` and `narrative.findings:` populated | Findings step done (Step 5, optional) |
+
+If nothing is present, run the full workflow.
+
+### Step 2 — Acquire substrate
+
+Pick the path on entry from the input form:
+
+- **arXiv ID** (e.g. `2503.19441`) → **Path A** (LaTeX source primary)
+- **DOI** for an arXiv paper (e.g. `10.48550/arXiv.2503.19441`) → Path A (resolve to arXiv ID first)
+- **Journal DOI** without arXiv preprint → **Path B** (PDF + Docling fallback)
+
+Read [`references/arxiv-source.md`](references/arxiv-source.md) for Path A; [`references/pdf-fallback.md`](references/pdf-fallback.md) for Path B. Both end with `work/reference/paper.pdf` and a structured-text representation under `work/reference/`.
+
+### Step 3 — Run the extraction script
+
+`scripts/extract-paper-substrate.py` does the deterministic structural pass and writes the `astra.yaml` stub:
+
+```bash
+python3 .claude/skills/paper-extraction/scripts/extract-paper-substrate.py \
+  --arxiv-id <arxiv-id>   # or --doi <doi>
+```
+
+The script detects the path automatically and produces:
+
+- `figures/` populated with copied figure files (Path A) or untouched (Path B — Docling already populated it)
+- `tables/<label-slug>.tex` — one file per `\begin{table}` block (Path A only)
+- `bibliography-source.{bib,bbl}` if present in the source tarball (Path A only)
+- `index.json` — the unified structural index
+- `astra.yaml` — stub ASTRA representation: id, version, name (from `\title{}`), narrative.summary (from abstract), empty `findings: {}` for Step 5
+
+The `--arxiv-id` / `--doi` argument populates the `id` and the evidence `doi:` field in `astra.yaml`. If neither is provided, the script writes placeholder text the agent can fix.
+
+### Step 4 — Review the script's output and fix structural gaps
+
+The script is purely deterministic. It walks the structural surface but does not understand the paper. Read `index.json`'s `extraction_warnings` and address each:
+
+- **`figure figN: \includegraphics{X} could not resolve`** — the LaTeX referenced a file the script couldn't find. Search the source tree manually (sometimes figures live in non-standard subdirectories with non-standard extensions); copy the file into `figures/` and update the corresponding `index.json` entry's `file` so it's no longer null.
+- **`figure figN: no \caption found`** — composite figures (subfloats) sometimes lack a top-level caption; verify the figure block in source and either record the per-subfigure captions in `caption` or note that the figure is composite.
+- **`table tabN: no \label`** — verify the table is intentional (some `\begin{table}` blocks are non-tabular layout); rename or annotate as needed.
+- **Path B caveat** — outline + citation extraction are not yet implemented for the Docling fallback; the warnings list flags this. For now, on Path B, those fields are empty.
+
+Also eyeball `astra.yaml`'s `name:` and `narrative.summary:`. The title or abstract may contain unresolved custom macros: the script expands only simple no-argument `\newcommand` definitions, so anything else, such as macros that take arguments, passes through verbatim. Clean them up if you need pretty rendering downstream; none of this blocks validation.
+
+### Step 5 — *(Optional)* Walk the paper for findings, append to `astra.yaml`
+
+**Skip this step unless a downstream consumer needs it.** Steps 1–4 produce a complete `work/reference/` plus a valid (empty-findings) `astra.yaml` on their own. Step 5 fills in the paper's claimed numerical findings — useful when the next thing you'll do is reproduce the paper (the findings become reproduction targets) or compare against it (the findings become diff anchors). Skip when you just want to read the paper or have the structural index for browsing.
+
+When you do run Step 5: this is the agent's central interpretive step and the one piece the script can't do.
+
+For each **central numerical claim the paper makes about its results**, append a finding to `astra.yaml`'s `findings:` map. The shape (per ASTRA's [Insight + Evidence](https://w3id.org/ASTRA/insight) classes):
+
+```yaml
+findings:
+  s8_constraint:
+    id: s8_constraint
+    claim: "S_8 = sigma_8 (Omega_m / 0.3)^0.5 = 0.795 ± 0.014 from the fiducial pure E/B analysis"
+    created_at: "2026-04-04T00:00:00Z"
+    evidence:
+      - id: abstract_quote
+        doi: "10.48550/arXiv.2604.03227"
+        version: 1
+        quote:
+          exact: "we find $S_8 = 0.795 \\pm 0.014$"
+  bmode_pte_fiducial:
+    id: bmode_pte_fiducial
+    claim: "Minimum B-mode PTE = 0.18 across configuration-space, COSEBI, and harmonic-space statistics at fiducial scale cuts"
+    created_at: "2026-04-04T00:00:00Z"
+    evidence:
+      - id: abstract_pte
+        doi: "10.48550/arXiv.2604.03227"
+        version: 1
+        quote:
+          exact: "all three statistics pass the null test (minimum PTE $= \\configPteSixThreeCombined$)"
+```
+
+**What counts as a finding:** a numerical or specific qualitative result the paper claims, of the kind a reproduction would have to match (or document divergence from). Headline results (S_8, PTEs, χ²), structural conclusions ("we detect X at Y σ"), validated null-test outcomes. *Not* methodology choices, *not* dataset descriptions — those live elsewhere.
+
+**Discipline:**
+
+1. **Read the abstract and conclusions first.** The paper's own framing of its results lives there. Most central findings can be quoted from one of those two surfaces.
+2. **Use `quote.exact` literally.** Copy the LaTeX text as it appears in `paper.tex` — don't paraphrase, don't expand macros, don't normalize math. The `exact` is what `astra validate --verify-evidence` will look for in the source PDF; if you paraphrase, evidence verification fails. If the quote is hard to make unique, add `prefix:` and `suffix:` (~20–100 chars before/after) per the W3C TextQuoteSelector spec.
+3. **Anchor to the source.** Every finding's evidence carries a `doi:` (the paper's own DOI, e.g. `10.48550/arXiv.2604.03227`) and `version:` (paper version — `1` for v1, `2` for v2 of an arXiv preprint).
+4. **`created_at`** is the timestamp of the finding's creation in this file (i.e., when the agent wrote it). ISO 8601.
+5. **Add the `narrative.findings:` cross-link.** ASTRA requires that when `findings:` is non-empty, `narrative.findings:` exists and references at least one finding. Shape: `narrative: { findings: "The fiducial analysis yields the [S_8 constraint](#findings.s8_constraint); B-mode null tests pass with [minimum PTE = 0.18](#findings.bmode_pte_fiducial)." }`
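+6. **Validate.** Run `astra validate work/reference/astra.yaml`. If it passes, the file is a valid ASTRA artifact. Add `--verify-evidence` to confirm each `quote.exact` is actually findable in the cached PDF; this skill does not register the paper in the ASTRA cache, so do that yourself first (for example with `astra paper add`, as in the worked example under `examples/`).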
+
+**How many findings?** Aim for the central results, not exhaustive coverage. A paper with one headline measurement (e.g. an S_8 constraint) plus a few supporting null-test outcomes typically has 3–8 findings. A paper covering multiple separate analyses may have more.
+
+
+## Inputs
+
+The skill accepts:
+
+1. An **arXiv ID** (`YYMM.NNNNN`, or the old-style form used before April 2007, like `astro-ph/0607021`)
+2. A **DOI** — either an arXiv DOI (`10.48550/arXiv.<id>`) or a journal DOI
+
+The slash-command form is `/paper-extraction <arxiv-id-or-doi>`.
+
+## What the script does vs what the agent does
+
+**Script (`extract-paper-substrate.py`):** walks LaTeX (Path A) or Docling output (Path B) and emits two things:
+
+1. `index.json` — figures (with copied files + line numbers + multi-graphic panels), tables (one `.tex` per block, including AAS `deluxetable`), section outline (with line numbers, in paper-reading order), citation keys (with every file+line they appear on, including biblatex commands), abstract, title, paths.
+2. `astra.yaml` — a stub ASTRA artifact: `id` (derived from arxiv-id/DOI), `version`, `name` (from `\title{}`), `narrative.summary` (from abstract), empty `inputs:`/`outputs:`/`findings:`. Validates as-is.
+
+The script handles a few realities of LaTeX papers automatically:
+
+- **Comments are stripped** before regex passes, so commented-out `\includegraphics` / `\cite` / `\section` don't leak into extraction. Newlines are preserved so line numbers stay accurate.
+- **Multi-file source** (`\input{}` / `\include{}` chains) is read in **paper-reading order** by walking `main.tex`'s input tree, not alphabetical filename order.
+- **Simple `\newcommand{\name}{body}` macros** are expanded in extracted titles, abstracts, captions, and section names. Macros with arguments (`\newcommand{\foo}[1]{...}`) pass through unexpanded — handling those would require evaluating arbitrary LaTeX.
+- **Standard table envs** (`table`, `table*`, `deluxetable`, `deluxetable*`) and **standard citation commands** (natbib family + biblatex `\autocite` / `\textcite` / `\parencite` / `\footcite` / `\smartcite`) are all recognized.
+
+What the script does *not* do: understand what figures show, identify findings, infer methodology, or handle substrate acquisition (Step 2). It also doesn't expand macros with arguments, resolve `\graphicspath{}` overrides, or parse non-LaTeX abstract metadata blocks.
+
+**Agent (Steps 4 + 5):** reads `index.json`'s `extraction_warnings` and fixes structural gaps (Step 4), then walks the paper and writes `findings:` into `astra.yaml` with quote-anchored evidence (Step 5). The verbosity of the Insight + Evidence shape *is* the back-pressure: the agent has to find and quote actual paper text, not invent.
+
+## Discipline
+
+- **One entry-point.** `/paper-extraction <id>` is the whole surface. Don't have callers reach into `scripts/` or `references/` directly. The skill orchestrates; consumers trust `index.json`.
+- **Self-contained.** This skill takes an arXiv ID or DOI and produces a standardized directory. It doesn't know who calls it or what they do with the result. Don't add caller-specific logic.
+- **Idempotent.** Survey-first, skip-if-done. Re-invoking on the same paper does no work and produces no errors.
+- **arXiv-LaTeX is primary.** When an arXiv source tarball is acquirable, Path A wins. PDF + Docling is the fallback, used only for papers with no arXiv source.
+- **Reading materials only.** The skill produces what's structurally in the paper itself — substrate, figures, tables, outline, citations, embedded bibliography. Adjacent assets (code repos, supplementary datasets, related papers, project bibliography management) are explicitly out of scope.
+- **Script is dumb on purpose.** The deterministic pieces (figure/table blocks, section headings, `\cite{}` keys) belong to the script. Anything that requires understanding what the paper is *about* lives outside this skill — paper-extraction sets the table; it doesn't read the meal.
+- **`extraction_warnings` is the agent surface.** When the script can't resolve something, it doesn't fail or guess — it warns. The agent reads the warnings and decides whether to fix or surface.
+
+## Anti-patterns
+
+- **Re-fetching what's already there.** Always survey `work/reference/` and read `index.json` first.
+- **Adding numerical-finding extraction to the script.** Macro-based extraction (`\newcommand{\Omegam}{0.315}`) works for almost no real papers; inline-value extraction needs semantic judgment about what's a *result* versus what's incidental. Findings live in `astra.yaml`, written by the agent in Step 5.
+- **Paraphrasing the `quote.exact` text.** Copy the paper's LaTeX text verbatim. Paraphrasing breaks `astra validate --verify-evidence` and weakens the back-pressure that justified the ASTRA shape in the first place.
+- **Surfacing partial state silently.** If `paper.pdf` was fetched but the LaTeX-source download failed, write `work/reference/extraction-error.txt` with a clear cause and stop, rather than producing a half-populated `work/reference/` with no signal that more was intended.
+- **Knowing about the caller.** The skill's contract is the directory + index. If you're tempted to write logic that depends on a particular invoker, push that logic into the invoker instead.
diff --git a/claude/lightcone/skills/paper-extraction/examples/unions-bmodes-astra.yaml b/claude/lightcone/skills/paper-extraction/examples/unions-bmodes-astra.yaml
new file mode 100644
index 00000000..8fa35ceb
--- /dev/null
+++ b/claude/lightcone/skills/paper-extraction/examples/unions-bmodes-astra.yaml
@@ -0,0 +1,106 @@
+# Worked Step 5 example for paper-extraction.
+#
+# Generated from arXiv:2604.03227, then filled with 6 quote-anchored
+# findings. Verified with:
+#
+#   astra paper add 10.48550/arXiv.2604.03227 --version 1 --pdf paper.pdf
+#   astra validate astra.yaml --verify-evidence
+
+id: arxiv_2604_03227
+version: "0.0.7"
+name: "UNIONS-3500 Weak Lensing: II. B-mode validation for cosmic shear"
+
+narrative:
+  summary: |
+    At Stage-III sensitivities, cosmic shear $B$ modes unambiguously indicate systematic contamination and are often used to inform data selection and scale cuts for cosmological inference.
+    We validate $B$ modes for the Ultraviolet Near-Infrared Optical Northern Survey (UNIONS)-3500 (\SI{2894}{\square\deg}, $n_\mathrm{eff} \approx \SI{5.0}{arcmin\tothe{-2}}$) using three $E$/$B$-separable statistics: pure-mode correlation functions $\xi_\pm^{\mathrm{B}}(\theta)$, Complete Orthogonal Sets of $E$/$B$-mode Integrals (COSEBI) $B$-mode amplitudes $B_n$, and harmonic-space power spectra $C_\ell^{BB}$.
+    For each statistic, we compute probability-to-exceed (PTE) values over a two-dimensional grid of scale-cut boundaries; our adopted cuts lie in broad stable regions of acceptable PTE.
+    $B$-mode detections and PTE failures on initial catalog versions led us to investigate galaxy size cuts and stellar halo masking.
+    After cuts, all three statistics pass the null test (minimum PTE $= \num{0.18}$).
+    Before scale cuts, we measure an oscillatory COSEBI $B$-mode pattern consistent with repeating additive shear bias, a detector-level effect seen across multiple Stage-III surveys including CFHTLenS, which used the same MegaCam camera; scale cuts that exclude the charge-coupled device (CCD) angular scale suppress it.
+    Although these statistics probe the same two-point shear field, scale cuts in one do not map exactly onto cuts in another, because their respective filter functions weight angular scales differently.
+    The most conservative validation therefore requires scale and sample selections that pass null tests across all frameworks simultaneously, an approach that applies directly to Stage-IV surveys where systematic errors dominate.
+  findings: |
+    The paper validates the UNIONS-3500 weak-lensing B-mode surface across three estimators: the [adopted cuts pass all three null tests](#findings.adopted_cuts_pass_all_statistics), [the conclusion restates consistency with zero at those cuts](#findings.bmodes_consistent_with_zero_at_adopted_cuts), and [the cuts remain acceptable under a two-parameter scale-cut accounting](#findings.scale_cut_degrees_of_freedom_still_pass). The central systematic is an [oscillatory full-range COSEBI pattern consistent with repeating additive shear bias](#findings.full_range_cosebi_repeating_additive_pattern). The paper also shows that [only the fiducial catalog passes in every representation](#findings.fiducial_catalog_only_full_pass) and that [COSEBI versus harmonic-space disagreement is driven by filter-function sensitivity](#findings.filter_functions_drive_representation_disagreement).
+
+inputs: []
+outputs: []
+
+findings:
+  adopted_cuts_pass_all_statistics:
+    id: adopted_cuts_pass_all_statistics
+    label: "Adopted cuts pass all three B-mode null tests"
+    claim: |
+      After galaxy-size cuts and stellar-halo masking choices, the adopted UNIONS-3500 scale cuts pass pure-mode correlation-function, COSEBI, and harmonic-space B-mode null tests, with a minimum PTE of 0.18.
+    created_at: "2026-05-08T03:32:00+02:00"
+    evidence:
+      - id: abstract_minimum_pte
+        doi: "10.48550/arXiv.2604.03227"
+        version: 1
+        quote:
+          exact: "After cuts, all three statistics pass the null test (minimum PTE = 0.18)."
+
+  bmodes_consistent_with_zero_at_adopted_cuts:
+    id: bmodes_consistent_with_zero_at_adopted_cuts
+    label: "B modes are consistent with zero at the adopted scale cuts"
+    claim: |
+      In the paper's conclusion, the adopted scale cuts leave B modes consistent with zero across pure-mode correlation functions, COSEBIs, and harmonic-space power spectra.
+    created_at: "2026-05-08T03:32:00+02:00"
+    evidence:
+      - id: conclusion_consistency
+        doi: "10.48550/arXiv.2604.03227"
+        version: 1
+        quote:
+          exact: "UNIONS-3500 weak-lensing B modes are consistent with zero at the adopted scale cuts across pure-mode correlation functions, COSEBIs, and harmonic-space power spectra."
+
+  scale_cut_degrees_of_freedom_still_pass:
+    id: scale_cut_degrees_of_freedom_still_pass
+    label: "Scale-cut degree-of-freedom accounting still passes"
+    claim: |
+      Treating the selected scale-cut boundaries as two fitted parameters lowers the minimum PTE from 0.18 to 0.09, which remains above the 0.05 failure threshold.
+    created_at: "2026-05-08T03:32:00+02:00"
+    evidence:
+      - id: pte_two_parameter_accounting
+        doi: "10.48550/arXiv.2604.03227"
+        version: 1
+        quote:
+          exact: "Doing so lowers the minimum PTE across all statistics from 0.18 to 0.09, still above the 0.05 threshold."
+
+  full_range_cosebi_repeating_additive_pattern:
+    id: full_range_cosebi_repeating_additive_pattern
+    label: "Full-range COSEBIs show repeating-additive pattern"
+    claim: |
+      Before the adopted cuts, all catalog versions show an oscillatory full-range COSEBI B-mode pattern consistent with repeating additive shear bias at CCD angular scales.
+    created_at: "2026-05-08T03:32:00+02:00"
+    evidence:
+      - id: conclusion_full_range_pattern
+        doi: "10.48550/arXiv.2604.03227"
+        version: 1
+        quote:
+          exact: "On the full angular range, all catalog versions show an oscillatory COSEBI B-mode pattern consistent with repeating additive shear bias at CCD angular scales"
+
+  fiducial_catalog_only_full_pass:
+    id: fiducial_catalog_only_full_pass
+    label: "Only the fiducial catalog passes every representation"
+    claim: |
+      Of the four catalog variants tested, only the fiducial size-cut catalog passes the full set of B-mode validation representations at the adopted cuts.
+    created_at: "2026-05-08T03:32:00+02:00"
+    evidence:
+      - id: conclusion_fiducial_only
+        doi: "10.48550/arXiv.2604.03227"
+        version: 1
+        quote:
+          exact: "Of the four catalog variants tested, only the fiducial passes in every representation."
+
+  filter_functions_drive_representation_disagreement:
+    id: filter_functions_drive_representation_disagreement
+    label: "Filter functions explain representation disagreement"
+    claim: |
+      The paper argues that COSEBI versus harmonic-space disagreement is not a real-space versus harmonic-space basis effect; COSEBI filter functions concentrate sensitivity on contaminated angular scales.
+    created_at: "2026-05-08T03:32:00+02:00"
+    evidence:
+      - id: discussion_harmonic_cosebi_comparison
+        doi: "10.48550/arXiv.2604.03227"
+        version: 1
+        quote:
+          exact: "By computing COSEBIs from the harmonic-space bandpowers, we confirm that the disagreement is not a matter of harmonic versus real space"
diff --git a/claude/lightcone/skills/paper-extraction/references/arxiv-source.md b/claude/lightcone/skills/paper-extraction/references/arxiv-source.md
new file mode 100644
index 00000000..0d4515a0
--- /dev/null
+++ b/claude/lightcone/skills/paper-extraction/references/arxiv-source.md
@@ -0,0 +1,47 @@
+# Path A — arXiv LaTeX source (primary)
+
+When the paper has an arXiv ID, the LaTeX source tarball is the substrate. Math, ligatures, captions, tables, and bibliography all come through clean — none of the rendering artifacts that plague PDF extraction.
+
+## Acquire the source tarball
+
+```bash
+ARXIV_ID="2503.19441"  # adapt
+curl -L -o /tmp/${ARXIV_ID}.tar.gz "https://arxiv.org/src/${ARXIV_ID}"
+mkdir -p work/reference/source
+tar -xzf "/tmp/${ARXIV_ID}.tar.gz" -C work/reference/source
+```
+
+Identify the main `.tex` file (the one with `\documentclass`):
+
+```bash
+grep -l '\\documentclass' work/reference/source/*.tex | head -1
+```
+
+Symlink that file as `work/reference/paper.tex` so downstream consumers have a stable handle:
+
+```bash
+MAIN_TEX=$(grep -l '\\documentclass' work/reference/source/*.tex | head -1)
+ln -sf "source/$(basename "$MAIN_TEX")" work/reference/paper.tex
+```
+
+## Fetch the PDF
+
+```bash
+curl -L -o work/reference/paper.pdf "https://arxiv.org/pdf/${ARXIV_ID}"
+file work/reference/paper.pdf  # must say "PDF document"
+```
+
+## What downstream gets
+
+- `work/reference/source/` — the full extracted tarball (everything: `.tex`, `.bbl`, `.bib`, figure files, tables, supplementary `.tex` files).
+- `work/reference/paper.tex` — symlink to the main `.tex` file so consumers don't have to re-detect it.
+- `work/reference/paper.pdf` — cached PDF for evidence verification.
+
+No conversion to markdown is needed. Claude reads LaTeX directly; converting to markdown only loses information (math collapse, label resolution, caption flattening). Consumers of `work/reference/` read `.tex` and resolve `\ref{}` against `\label{}` in the source tree.
+
+## Notes
+
+- **arXiv DOI form is `10.48550/arXiv.<id>`.** Useful when downstream tools want a DOI rather than an arXiv ID.
+- **Equation numbers and section numbers must match the rendered paper.** When a downstream consumer cites "eq. N" or "§N", they should find the equation by content, not by counting TeX blocks. Reach for the cached PDF if you need to confirm a printed number.
+- **`\input{}` and `\include{}` chains** are common — the main `.tex` may pull section content from sibling files. Downstream consumers should grep across the whole `source/` tree, not just `paper.tex`, when searching for content.
+- **If the tarball download fails** (rare: typically a transient HTTP error or a paper still in moderation), retry once. If it still fails, the paper may need to come in as Path B (DOI-only). Write `work/reference/extraction-error.txt` with the cause and surface the failure to the user.
diff --git a/claude/lightcone/skills/paper-extraction/references/pdf-fallback.md b/claude/lightcone/skills/paper-extraction/references/pdf-fallback.md
new file mode 100644
index 00000000..c80d776e
--- /dev/null
+++ b/claude/lightcone/skills/paper-extraction/references/pdf-fallback.md
@@ -0,0 +1,66 @@
+# Path B — PDF + Docling (fallback for non-arXiv)
+
+When the paper does not have an arXiv preprint, the PDF is the only substrate. Docling produces a structured representation (markdown + figures + tables + metadata) that downstream consumers read instead of the raw PDF.
+
+This path is a **fallback**. Whenever Path A is available, prefer it.
+
+## Acquire the PDF
+
+Resolve the DOI to a PDF. The straightforward path:
+
+```bash
+curl -L -o work/reference/paper.pdf "https://doi.org/<DOI>"
+file work/reference/paper.pdf
+```
+
+The `file` output must say "PDF document". If it says "HTML document" or anything else, the download was blocked (CAPTCHA, paywall, journal redirect):
+
+1. Search for an open-access copy: NASA ADS, arXiv, Unpaywall, Semantic Scholar, or the journal's open-access link.
+2. Download with `curl -L -o work/reference/paper.pdf <url>`.
+3. Re-check with `file work/reference/paper.pdf`.
+
+If a valid PDF cannot be obtained, write a clear error to `work/reference/extraction-error.txt` and stop. Do not try to extract structure from a non-PDF.
+
+## Run Docling
+
+```bash
+docling --output work/reference work/reference/paper.pdf
+```
+
+Docling produces, directly into `work/reference/`:
+
+- `document.md` — paper as markdown
+- `figures/` — extracted figures (one file per figure)
+- `tables/` — extracted tables (one file per table)
+- `metadata.json` — figure / table index with captions, page numbers, and labels (where Docling can extract them)
+
+The `metadata.json` shape Docling emits:
+
+```json
+{
+  "figures": [
+    {"id": "fig1", "caption": "...", "file": "figures/fig1.pdf", "label": "fig:bao"}
+  ],
+  "tables": [
+    {"id": "tab1", "caption": "...", "file": "tables/tab1.csv", "label": "tab:results"}
+  ]
+}
+```
+
+The `label` field is the source label where Docling can extract it; consumers reading `index.json` use it to anchor references back to the paper.
+
+If Docling fails, the PDF may be corrupt: re-download once, then surface the failure to the user if it still fails.
+
+## What downstream gets
+
+- `work/reference/document.md` — paper as markdown.
+- `work/reference/figures/`, `work/reference/tables/` — already populated by Docling.
+- `work/reference/metadata.json` — Docling's own index; the extraction script reads this and folds figures + tables into the unified `work/reference/index.json`.
+- `work/reference/paper.pdf` — the PDF.
+
+No `paper.tex` and no `source/` on Path B. Consumers detect the path by reading `index.json`'s `path` field (`"A"` or `"B"`).
+
+## Notes
+
+- **Outline + citation extraction don't run on Path B in the launch script.** No LaTeX source means no `\section{}` or `\cite{}` markers to walk. `index.json` includes an `extraction_warnings` entry flagging this; a future LLM pass over `document.md` would fill the gap.
+- **Journal DOIs that 403 on Unpaywall** sometimes have an arXiv preprint twin. When that's available, treat the paper as Path A using the arXiv ID — the LaTeX-source surface is far cleaner than any PDF extraction.
diff --git a/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py b/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
new file mode 100755
index 00000000..de02e272
--- /dev/null
+++ b/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
@@ -0,0 +1,699 @@
+#!/usr/bin/env python3
+"""
+extract-paper-substrate.py — deterministic structural extraction for the
+paper-extraction skill.
+
+Reads `work/reference/` and produces:
+
+  - figures/                        # figure files copied from source/
+  - tables/<label-slug>.tex         # one file per LaTeX table block
+  - bibliography-source.bib         # copy of any .bib found in source/ (Path A only)
+  - bibliography-source.bbl         # copy of any .bbl found in source/ (Path A only)
+  - index.json                      # single top-level index of everything extracted
+
+Path A (arXiv LaTeX source): reads from work/reference/source/.
+Path B (Docling fallback):   reads from work/reference/document.md and Docling's
+                             pre-existing figures/ + tables/ + metadata.json.
+
+The script handles only the deterministic pieces. Semantic interpretation —
+"what does this figure show", "which findings are central", numerical-claim
+extraction — is the agent's job after this script runs. The agent reads
+index.json (specifically extraction_warnings) and fixes or surfaces gaps.
+
+Usage:
+    python extract-paper-substrate.py [--reference-dir work/reference] [--arxiv-id ID] [--doi DOI]
+
+Idempotent — skips files that already exist.
+"""
+
+import argparse
+import json
+import re
+import shutil
+import sys
+from pathlib import Path
+
+
+# ---------------------------------------------------------------------------
+# Patterns
+# ---------------------------------------------------------------------------
+
+FIGURE_BLOCK = re.compile(r"\\begin\{figure\*?\}(.*?)\\end\{figure\*?\}", re.DOTALL)
+# Tables: include AAS-specific `deluxetable` (ApJ, ApJL, ApJS) alongside the standard `table`.
+TABLE_BLOCK = re.compile(
+    r"\\begin\{(?:table|deluxetable)\*?\}(.*?)\\end\{(?:table|deluxetable)\*?\}",
+    re.DOTALL,
+)
+ABSTRACT_BLOCK = re.compile(r"\\begin\{abstract\}(.*?)\\end\{abstract\}", re.DOTALL)
+TITLE_CMD = re.compile(r"\\title\*?\s*(?:\[[^\]]*\])?\s*\{")
+# Citations: natbib family + biblatex (autocite, textcite, parencite, footcite, smartcite).
+CITE = re.compile(
+    r"\\(?:cite|citep|citet|citealp|citealt|citeauthor|citeyear|citeyearpar|"
+    r"autocite|textcite|parencite|footcite|smartcite)\*?"
+    r"(?:\[[^\]]*\]){0,2}\{([^}]+)\}"
+)
+ASTRA_SCHEMA_VERSION = "0.0.7"  # bump when the ASTRA spec version we target changes
+CAPTION = re.compile(r"\\(?:table)?caption\*?(?:\[[^\]]*\])?\{((?:[^{}]|\{[^}]*\})*)\}", re.DOTALL)
+LABEL = re.compile(r"\\label\{([^}]+)\}")
+INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")
+PLOTONE = re.compile(r"\\plotone\{([^}]+)\}")
+PLOTTWO = re.compile(r"\\plottwo\{([^}]+)\}\{([^}]+)\}")
+FIGURE_INPUT = re.compile(r"\\input\{([^}]+\.(?:pgf|tex|tikz))\}")
+SECTION = re.compile(r"\\(section|subsection|subsubsection)\*?\{((?:[^{}]|\{[^}]*\})*)\}")
+
+
+def line_at(content: str, offset: int) -> int:
+    """1-indexed line number of `offset` within `content`."""
+    return content.count("\n", 0, offset) + 1
+
+
+def first_match(pattern: re.Pattern, text: str) -> str | None:
+    m = pattern.search(text)
+    return m.group(1).strip() if m else None
+
+
+def extract_caption(text: str, macros: dict[str, str]) -> str:
+    """Return the last non-empty caption in a block.
+
+    Composite figures often have empty subfigure captions before the real
+    top-level caption; taking the first caption produces a false warning.
+    """
+    captions = [m.group(1).strip() for m in CAPTION.finditer(text)]
+    nonempty = [caption for caption in captions if caption]
+    return expand_macros(nonempty[-1], macros) if nonempty else ""
+
+
+# ---------------------------------------------------------------------------
+# Path detection
+# ---------------------------------------------------------------------------
+
+
+def detect_path(reference_dir: Path) -> str:
+    if (reference_dir / "source").is_dir():
+        return "A"
+    if (reference_dir / "document.md").is_file():
+        return "B"
+    sys.exit(
+        f"error: neither {reference_dir}/source/ nor {reference_dir}/document.md exists "
+        f"— run paper-extraction Step 1 (substrate acquisition) first"
+    )
+
+
+# ---------------------------------------------------------------------------
+# Path A — LaTeX source
+# ---------------------------------------------------------------------------
+
+
+def list_tex_files(source_dir: Path) -> list[Path]:
+    return sorted(source_dir.rglob("*.tex"))
+
+
+# A `%` not preceded by a backslash (i.e. not the escaped `\%`) starts a LaTeX comment running to end-of-line.
+# We strip comment *content* but keep the `\n` so line numbers are preserved.
+COMMENT = re.compile(r"(?<!\\)%[^\n]*")
+
+
+def strip_comments(content: str) -> str:
+    """Strip LaTeX comments (line content after unescaped `%`), preserving newlines."""
+    return COMMENT.sub("", content)
+
+
+# Match `\newcommand[*]{\name}{body}` — no-args form only. Args (`[2]`) are skipped.
+NEWCOMMAND = re.compile(
+    r"\\(?:newcommand|renewcommand|providecommand)\*?\s*\{?\s*\\([A-Za-z]+)\s*\}?\s*\{",
+)
+
+
+def collect_simple_macros(tex_files: list[tuple[Path, str]]) -> dict[str, str]:
+    """Build a `\\name -> body` dict for no-arg `\\newcommand` macros across the source.
+
+    Skips macros with arguments (e.g. `\\newcommand{\\foo}[2]{...}`); handling those
+    requires argument substitution, which is out of scope. Skips macros whose body is
+    their own name (e.g. `\\newcommand{\\foo}{\\foo}`), which would never reach a fixed point.
+    """
+    macros: dict[str, str] = {}
+    for _, content in tex_files:
+        for match in NEWCOMMAND.finditer(content):
+            name = match.group(1)
+            # Walk balanced braces to find the body.
+            body = walk_balanced_braces(content, match.end() - 1)
+            if body is None:
+                continue
+            # Defensive guard against the args form (`\newcommand{\foo}[2]{...}`).
+            # NEWCOMMAND requires the body's opening `{` right after the name, so an
+            # arg-count specifier normally prevents a match in the first place. Keep
+            # the check anyway in case the pattern is loosened: skip the macro if
+            # `[N]` lies between the end of the name and the start of the body.
+            between_start = match.end(1)
+            between_end = match.end() - 1
+            between = content[between_start:between_end]
+            if re.search(r"\[\s*\d+\s*\]", between):
+                continue  # args-form, skip
+            if body.strip() == f"\\{name}":
+                continue  # self-referential
+            macros[name] = body
+    return macros
+
+
+def expand_macros(text: str, macros: dict[str, str], max_iterations: int = 5) -> str:
+    """Substitute `\\name` (where name is in `macros`) iteratively. Stops at fixed point or
+    `max_iterations` (handles nested macros, prevents infinite loops on pathological input).
+    """
+    if not text or not macros:
+        return text
+    # Match `\name` where name is in our table. Order longest-first so `\desidrone`
+    # wins over `\desi` if both exist.
+    names = sorted(macros.keys(), key=len, reverse=True)
+    pattern = re.compile(r"\\(" + "|".join(re.escape(n) for n in names) + r")(?![A-Za-z])")
+    out = text
+    for _ in range(max_iterations):
+        new = pattern.sub(lambda m: macros[m.group(1)], out)
+        if new == out:
+            return out
+        out = new
+    return out
+
+
+def read_tex_with_origin(source_dir: Path) -> list[tuple[Path, str]]:
+    """Read each .tex file (stripped of comments) in *paper-reading order*.
+
+    Order is determined by walking the main file's `\\input{}` / `\\include{}` chain.
+    The main file is the one containing `\\documentclass`. Files not reachable from
+    the input chain are appended at the end (alphabetical) as orphans.
+
+    Comments are stripped at read time to prevent commented-out LaTeX from leaking
+    into figure / table / section / citation extraction. Newlines are preserved so
+    line numbers are still meaningful.
+    """
+    paths = list_tex_files(source_dir)
+    if not paths:
+        return []
+
+    contents: dict[Path, str] = {}
+    for p in paths:
+        try:
+            contents[p] = strip_comments(p.read_text(errors="replace"))
+        except OSError as e:
+            print(f"warn: could not read {p}: {e}", file=sys.stderr)
+
+    # Find the main file (contains \documentclass, after comment stripping).
+    main = next((p for p in paths if r"\documentclass" in contents.get(p, "")), None)
+    if main is None:
+        # No main file detected — fall back to alphabetical order.
+        return [(p, contents[p]) for p in paths if p in contents]
+
+    # Map basename (without extension) → path, for resolving \input{name} or \input{path/name}.
+    by_stem: dict[str, Path] = {}
+    for p in paths:
+        by_stem.setdefault(p.stem, p)
+
+    INPUT_CMD = re.compile(r"\\(?:input|include)\{([^}]+)\}")
+    ordered: list[Path] = []
+    seen: set[Path] = set()
+
+    def walk(p: Path) -> None:
+        if p in seen or p not in contents:
+            return
+        seen.add(p)
+        ordered.append(p)
+        for match in INPUT_CMD.finditer(contents[p]):
+            target = match.group(1).strip()
+            target = target.removesuffix(".tex")
+            stem = Path(target).name  # last path component; ".tex" was stripped above
+            sub = by_stem.get(stem)
+            if sub is not None:
+                walk(sub)
+
+    walk(main)
+    # Append unreached files (orphans — supplementary, unused, etc.) at the end.
+    for p in paths:
+        if p not in seen and p in contents:
+            ordered.append(p)
+
+    return [(p, contents[p]) for p in ordered]
+
+
+def join_tex(tex_files: list[tuple[Path, str]]) -> str:
+    return "\n".join(content for _, content in tex_files)
+
+
+def extract_figures(
+    reference_dir: Path,
+    source_dir: Path,
+    tex_files: list[tuple[Path, str]],
+    macros: dict[str, str],
+) -> tuple[list[dict], list[str]]:
+    """Walk every figure block; copy resolved figure files; return (entries, warnings)."""
+    fig_dir = reference_dir / "figures"
+    fig_dir.mkdir(exist_ok=True)
+    entries: list[dict] = []
+    warnings: list[str] = []
+    counter = 0
+
+    for tex_path, content in tex_files:
+        for match in FIGURE_BLOCK.finditer(content):
+            counter += 1
+            block = match.group(1)
+            caption = extract_caption(block, macros)
+            label = first_match(LABEL, block)
+
+            # Capture every external figure reference in the block. Besides
+            # \includegraphics, AASTeX/emulateapj papers often use \plotone /
+            # \plottwo, while ML papers often \input Matplotlib/PGF exports.
+            # Multi-panel / subfloat figures routinely have several.
+            graphic_matches = external_figure_refs(block)
+            files_rel: list[str] = []
+            for graphic in graphic_matches:
+                resolved = resolve_graphic(source_dir, graphic)
+                if resolved:
+                    dest = fig_dir / resolved.name
+                    if not dest.exists():
+                        shutil.copy2(resolved, dest)
+                    files_rel.append(f"figures/{resolved.name}")
+                else:
+                    warnings.append(
+                        f"figure fig{counter}: figure reference {{{graphic}}} could not resolve to a file in source/"
+                    )
+
+            inline_figure = bool(re.search(r"\\begin\{(?:tikzpicture|picture|pspicture)\}", block))
+            if not graphic_matches and not inline_figure:
+                warnings.append(f"figure fig{counter}: no external figure file found in block")
+            if not caption:
+                warnings.append(f"figure fig{counter}: no \\caption found")
+
+            entries.append(
+                {
+                    "id": f"fig{counter}",
+                    "label": label,
+                    "caption": caption,
+                    # Single-graphic figures keep the simple shape (the common case);
+                    # multi-graphic figures expose all panels under "files".
+                    "source_path": graphic_matches[0] if graphic_matches else None,
+                    "file": files_rel[0] if files_rel else None,
+                    "files": files_rel if len(files_rel) > 1 else None,
+                    "block_origin": str(tex_path.relative_to(source_dir)),
+                    "line": line_at(content, match.start()),
+                }
+            )
+
+    return entries, warnings
+
+
+def external_figure_refs(block: str) -> list[str]:
+    """Return external figure-like files referenced inside a figure block."""
+    refs: list[str] = []
+    refs.extend(INCLUDEGRAPHICS.findall(block))
+    refs.extend(PLOTONE.findall(block))
+    for first, second in PLOTTWO.findall(block):
+        refs.extend([first, second])
+    refs.extend(FIGURE_INPUT.findall(block))
+    # Preserve order while de-duplicating repeated panels.
+    seen: set[str] = set()
+    out = []
+    for ref in refs:
+        if ref not in seen:
+            seen.add(ref)
+            out.append(ref)
+    return out
+
+
+def resolve_graphic(source_dir: Path, graphic: str) -> Path | None:
+    """LaTeX \\includegraphics filenames can omit the extension; try common ones."""
+    base = source_dir / graphic
+    if base.exists():
+        return base
+    for ext in (".pdf", ".png", ".jpg", ".jpeg", ".eps"):
+        candidate = base.with_name(base.name + ext)  # append; keeps dots in the stem
+        if candidate.exists():
+            return candidate
+    matches = sorted(source_dir.rglob(f"{Path(graphic).stem}.*"))  # sorted for determinism
+    return matches[0] if matches else None
+
+
+def extract_tables(
+    reference_dir: Path,
+    tex_files: list[tuple[Path, str]],
+    source_dir: Path,
+    macros: dict[str, str],
+) -> tuple[list[dict], list[str]]:
+    tab_dir = reference_dir / "tables"
+    tab_dir.mkdir(exist_ok=True)
+    entries: list[dict] = []
+    warnings: list[str] = []
+    counter = 0
+
+    for tex_path, content in tex_files:
+        for match in TABLE_BLOCK.finditer(content):
+            counter += 1
+            block = match.group(0)  # full \begin{table}...\end{table}
+            body = match.group(1)
+            label = first_match(LABEL, body)
+            caption = extract_caption(body, macros)
+            slug = re.sub(r"[^\w.-]", "-", label.replace(" ", "_")) if label else f"tab{counter}"
+            out = tab_dir / f"{slug}.tex"
+            if not out.exists():
+                out.write_text(block)
+            if not caption:
+                warnings.append(f"table tab{counter}: no \\caption found")
+            if not label:
+                warnings.append(f"table tab{counter}: no \\label — wrote as {slug}.tex")
+            entries.append(
+                {
+                    "id": f"tab{counter}",
+                    "label": label,
+                    "caption": caption,
+                    "file": f"tables/{slug}.tex",
+                    "block_origin": str(tex_path.relative_to(source_dir)),
+                    "line": line_at(content, match.start()),
+                }
+            )
+
+    return entries, warnings
+
+
+def extract_outline(
+    tex_files: list[tuple[Path, str]], source_dir: Path, macros: dict[str, str]
+) -> list[dict]:
+    """Walk \\section{}, \\subsection{}, \\subsubsection{} in source order.
+
+    Attach a \\label{} only when it directly follows the section command (whitespace
+    between is fine, but no other content). The convention is `\\section{Foo}\\label{sec:foo}`
+    or with one newline between — anything more, and the label belongs elsewhere.
+    """
+    level_map = {"section": 1, "subsection": 2, "subsubsection": 3}
+    immediate_label = re.compile(r"\A\s*\\label\{([^}]+)\}")
+    out = []
+    for tex_path, content in tex_files:
+        for match in SECTION.finditer(content):
+            kind, title = match.group(1), expand_macros(match.group(2).strip(), macros)
+            tail = content[match.end() : match.end() + 200]
+            label_match = immediate_label.match(tail)
+            label = label_match.group(1) if label_match else None
+            out.append(
+                {
+                    "level": level_map[kind],
+                    "title": title,
+                    "label": label,
+                    "source_file": str(tex_path.relative_to(source_dir)),
+                    "line": line_at(content, match.start()),
+                }
+            )
+    return out
+
+
+def extract_citations(
+    tex_files: list[tuple[Path, str]], source_dir: Path
+) -> dict[str, list[dict]]:
+    """Map each citation key to every (file, line) location it's cited.
+
+    Shape: {"smith24": [{"file": "main.tex", "line": 42}, {"file": "main.tex", "line": 89}], ...}
+    """
+    out: dict[str, list[dict]] = {}
+    for tex_path, content in tex_files:
+        rel_file = str(tex_path.relative_to(source_dir))
+        for match in CITE.finditer(content):
+            line = line_at(content, match.start())
+            for key in match.group(1).split(","):
+                k = key.strip()
+                if not k:
+                    continue
+                out.setdefault(k, []).append({"file": rel_file, "line": line})
+    # Sort keys for stable output
+    return {k: out[k] for k in sorted(out)}
+
+
+def walk_balanced_braces(content: str, start: int) -> str | None:
+    """Given the index of the opening `{`, return the content between matched
+    braces (exclusive of the braces themselves), or None if unbalanced.
+    Honors escaped braces (`\\{`, `\\}`).
+    """
+    depth = 1
+    i = start + 1
+    while i < len(content) and depth > 0:
+        c = content[i]
+        if c == "\\" and i + 1 < len(content):
+            i += 2  # skip escaped char
+            continue
+        if c == "{":
+            depth += 1
+        elif c == "}":
+            depth -= 1
+        i += 1
+    if depth == 0:
+        return content[start + 1 : i - 1]
+    return None
+
+
+def extract_abstract(tex_files: list[tuple[Path, str]], macros: dict[str, str]) -> str | None:
+    """Extract abstract content. Supports two LaTeX forms:
+
+    - environment: `\\begin{abstract}...\\end{abstract}` (most journals)
+    - command:    `\\abstract{...}` (A&A's aa.cls and similar)
+    """
+    for _, content in tex_files:
+        # Form 1: environment
+        match = ABSTRACT_BLOCK.search(content)
+        if match:
+            return expand_macros(match.group(1).strip(), macros)
+
+        # Form 2: command — balanced-brace walk
+        cmd = re.search(r"\\abstract\s*\{", content)
+        if cmd:
+            body = walk_balanced_braces(content, cmd.end() - 1)
+            if body is not None:
+                return expand_macros(body.strip(), macros)
+    return None
+
+
+def extract_title(tex_files: list[tuple[Path, str]], macros: dict[str, str]) -> str | None:
+    """Extract \\title{...} (or \\title[short]{full}) content with balanced braces."""
+    for _, content in tex_files:
+        match = TITLE_CMD.search(content)
+        if match:
+            body = walk_balanced_braces(content, match.end() - 1)
+            if body is not None:
+                expanded = expand_macros(" ".join(body.split()), macros)
+                # Strip common font-style wrappers that a `\\boldmath`-prefixed title
+                # leaves behind after macro expansion (no-op if not present).
+                expanded = re.sub(r"^\\boldmath\s*", "", expanded)
+                return expanded
+    return None
+
+
+def derive_astra_id(arxiv_id: str | None, doi: str | None) -> str:
+    """Stable ASTRA id from arXiv ID or DOI. Lowercase, [a-z0-9_]+, leading letter."""
+    if arxiv_id:
+        slug = "arxiv_" + arxiv_id.replace(".", "_").replace("/", "_").lower()
+    elif doi:
+        slug = "doi_" + re.sub(r"[^a-z0-9]+", "_", doi.lower()).strip("_")
+    else:
+        slug = "paper_unknown"
+    # Ensure leading letter, only [a-z0-9_]
+    slug = re.sub(r"[^a-z0-9_]+", "_", slug)
+    if not slug or not slug[0].isalpha():
+        slug = "paper_" + slug
+    return slug
+
+
+def write_astra_yaml_stub(
+    reference_dir: Path,
+    arxiv_id: str | None,
+    doi: str | None,
+    title: str | None,
+    abstract: str | None,
+) -> str:
+    """Emit a stub `work/reference/astra.yaml` that the agent fills in.
+
+    The script populates: id, version, name, narrative.summary (from abstract),
+    inputs/outputs as empty lists, and an empty findings map. The agent's job
+    (Step 5 in SKILL.md) is to walk the paper and append findings entries with
+    quote evidence, plus a `narrative.findings:` cross-link. Once that's in,
+    `astra validate work/reference/astra.yaml` should pass.
+
+    If the file already exists, leave it alone — it may have agent edits.
+    """
+    out = reference_dir / "astra.yaml"
+    if out.exists():
+        return "astra.yaml"
+
+    astra_id = derive_astra_id(arxiv_id, doi)
+    title_str = title or "TODO: paper title (script could not extract \\title{})"
+    summary_str = abstract or "TODO: one-paragraph summary of the paper (no abstract extracted)"
+
+    # Indent the summary as a block scalar so multi-line abstracts round-trip
+    summary_indented = "\n".join("    " + line for line in summary_str.splitlines())
+
+    content = f"""# Stub ASTRA representation of the source paper.
+#
+# Populated by paper-extraction's script: id, version, name, narrative.summary.
+# The agent (paper-extraction Step 5) fills in `findings:` with the paper's
+# claimed numerical results plus a `narrative.findings:` cross-link, then runs
+# `astra validate astra.yaml` to confirm.
+
+id: {astra_id}
+version: "{ASTRA_SCHEMA_VERSION}"
+name: {json.dumps(title_str)}
+
+narrative:
+  summary: |
+{summary_indented}
+
+inputs: []
+outputs: []
+
+# Agent: append entries here, one per central numerical claim the paper makes.
+# Shape: see https://w3id.org/ASTRA/insight (Insight + Evidence). Minimal entry:
+#
+#   <id>:
+#     id: <id>
+#     claim: "<1-2 sentences capturing the result>"
+#     created_at: "<ISO 8601 datetime>"
+#     evidence:
+#       - id: <evidence_id>
+#         doi: "<paper DOI>"
+#         version: <paper version, integer>
+#         quote:
+#           exact: "<exact text from the paper that supports the claim>"
+findings: {{}}
+"""
+    out.write_text(content)
+    return "astra.yaml"
+
+
+def copy_embedded_bibliography(reference_dir: Path, source_dir: Path) -> tuple[str | None, str | None]:
+    """Copy any .bib / .bbl files from source/ into work/reference/."""
+    bib_src = min(source_dir.rglob("*.bib"), default=None)  # lexicographically first, for determinism
+    bbl_src = min(source_dir.rglob("*.bbl"), default=None)
+
+    bib_rel = None
+    bbl_rel = None
+    if bib_src:
+        dest = reference_dir / "bibliography-source.bib"
+        if not dest.exists():
+            shutil.copy2(bib_src, dest)
+        bib_rel = "bibliography-source.bib"
+    if bbl_src:
+        dest = reference_dir / "bibliography-source.bbl"
+        if not dest.exists():
+            shutil.copy2(bbl_src, dest)
+        bbl_rel = "bibliography-source.bbl"
+    return bib_rel, bbl_rel
+
+
+# ---------------------------------------------------------------------------
+# Path B — Docling fallback
+# ---------------------------------------------------------------------------
+
+
+def extract_path_b(reference_dir: Path) -> dict:
+    """Path B: Docling already produced figures/ + tables/ + metadata.json. Build index from those."""
+    metadata_path = reference_dir / "metadata.json"
+    if not metadata_path.exists():
+        sys.exit(
+            f"error: {metadata_path} not found — Path B requires Docling output. Re-run substrate acquisition."
+        )
+    docling = json.loads(metadata_path.read_text())
+
+    astra_rel = write_astra_yaml_stub(
+        reference_dir, arxiv_id=None, doi=None, title=None, abstract=None
+    )
+    index = {
+        "path": "B",
+        "paper_pdf": "paper.pdf" if (reference_dir / "paper.pdf").exists() else None,
+        "paper_tex": None,
+        "source_dir": None,
+        "document_md": "document.md" if (reference_dir / "document.md").exists() else None,
+        "bibliography_source_bib": None,
+        "bibliography_source_bbl": None,
+        "astra_yaml": astra_rel,
+        "title": None,  # Future refinement: parse from Docling's markdown
+        "abstract": None,  # Future refinement: parse from Docling's markdown
+        "figures": docling.get("figures", []),
+        "tables": docling.get("tables", []),
+        "outline": [],  # Future refinement: parse Docling's markdown headings
+        "citations": {},  # Future refinement: extract citation markers from document.md
+        "extraction_warnings": [
+            "Path B (Docling fallback): title + abstract + outline + citations not yet extracted from document.md; that's a future refinement."
+        ],
+    }
+    return index
+
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+
+def main() -> None:
+    p = argparse.ArgumentParser(
+        description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
+    )
+    p.add_argument("--reference-dir", type=Path, default=Path("work/reference"))
+    p.add_argument("--arxiv-id", help="arXiv ID, used to populate astra.yaml id and evidence.doi")
+    p.add_argument("--doi", help="paper DOI (used when arXiv ID is unavailable)")
+    args = p.parse_args()
+
+    reference_dir = args.reference_dir
+    if not reference_dir.is_dir():
+        sys.exit(f"error: {reference_dir} not found — run paper-extraction Step 1 first")
+
+    path = detect_path(reference_dir)
+    print(f"detected path: {path}")
+
+    if path == "A":
+        source_dir = reference_dir / "source"
+        tex_files = read_tex_with_origin(source_dir)
+        if not tex_files:
+            sys.exit(f"error: no .tex content found in {source_dir}")
+
+        macros = collect_simple_macros(tex_files)
+        figures, fig_warnings = extract_figures(reference_dir, source_dir, tex_files, macros)
+        tables, tab_warnings = extract_tables(reference_dir, tex_files, source_dir, macros)
+        outline = extract_outline(tex_files, source_dir, macros)
+        citations = extract_citations(tex_files, source_dir)
+        abstract = extract_abstract(tex_files, macros)
+        title = extract_title(tex_files, macros)
+        bib_rel, bbl_rel = copy_embedded_bibliography(reference_dir, source_dir)
+        astra_rel = write_astra_yaml_stub(
+            reference_dir, args.arxiv_id, args.doi, title, abstract
+        )
+
+        paper_tex = reference_dir / "paper.tex"
+        index = {
+            "path": "A",
+            "paper_pdf": "paper.pdf" if (reference_dir / "paper.pdf").exists() else None,
+            "paper_tex": "paper.tex" if paper_tex.exists() or paper_tex.is_symlink() else None,
+            "source_dir": "source",
+            "document_md": None,
+            "bibliography_source_bib": bib_rel,
+            "bibliography_source_bbl": bbl_rel,
+            "astra_yaml": astra_rel,
+            "title": title,
+            "abstract": abstract,
+            "figures": figures,
+            "tables": tables,
+            "outline": outline,
+            "citations": citations,
+            "extraction_warnings": fig_warnings + tab_warnings,
+        }
+
+        print(
+            f"  figures: {len(figures)}, tables: {len(tables)}, "
+            f"sections: {len(outline)}, citation-keys: {len(citations)}, "
+            f"title: {'yes' if title else 'no'}, abstract: {'yes' if abstract else 'no'}, "
+            f"warnings: {len(index['extraction_warnings'])}"
+        )
+    else:
+        index = extract_path_b(reference_dir)
+        print(
+            f"  figures: {len(index['figures'])}, tables: {len(index['tables'])} (from Docling), "
+            f"warnings: {len(index['extraction_warnings'])}"
+        )
+
+    index_path = reference_dir / "index.json"
+    index_path.write_text(json.dumps(index, indent=2))
+    print(f"wrote {index_path}")
+
+
+if __name__ == "__main__":
+    main()

From 8a4decad8fd752fe9ab893ffa7f56bfbc3aeca40 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Fri, 8 May 2026 19:22:35 +0200
Subject: [PATCH 018/124] paper2astra: rewire ACQUIRE to /paper-extraction
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

paper2astra delegates all paper-substrate work — arXiv LaTeX download,
Docling fallback, figures/tables/outline/citations, embedded bibliography,
the paper-as-ASTRA-artifact — to /paper-extraction. ACQUIRE keeps Step 2
(code-clone), which is reproduction-specific.

- SKILL.md: swap /managing-bibliography → /paper-extraction in the
  description, the bundle table (with the new entry-point summary), and
  the "Skills (activate before working)" list.
- references/acquire.md: replace the long LaTeX/Docling Step 1 with a
  thin "invoke /paper-extraction" delegation. Survey signals updated:
  presence of work/reference/index.json now indicates Step 1 done.

Substrate authority moves to paper-extraction: if something needs fixing
about how the paper gets read, the fix lands in /paper-extraction, not in
ACQUIRE. Equation- and section-number conventions stay in ACQUIRE notes
because they're consumed by downstream phases regardless of substrate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/paper2astra/SKILL.md  |   6 +-
 .../skills/paper2astra/references/acquire.md  | 113 +++++-------------
 2 files changed, 30 insertions(+), 89 deletions(-)
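
Note for reviewers (sits after the diffstat, so it is not part of the commit):
the survey rule added to references/acquire.md can be read as a pure
file-existence check. The sketch below is illustrative only; `acquire_status`
is a hypothetical helper, not code from this series, and it assumes that
`code-status.yaml` records a missing repo with a literal `found: false` line
rather than parsing the YAML properly.

```python
from pathlib import Path


def acquire_status(reference_dir: Path) -> dict[str, bool]:
    """Report which ACQUIRE steps look complete, judged by file existence only."""
    has_pdf = (reference_dir / "paper.pdf").exists()
    # Path indicator: source/ for Path A (arXiv LaTeX), document.md for Path B (Docling).
    has_path_indicator = (
        (reference_dir / "source").is_dir()
        or (reference_dir / "document.md").exists()
    )
    has_index = (reference_dir / "index.json").exists()
    step1_done = has_pdf and has_path_indicator and has_index

    status_file = reference_dir / "code-status.yaml"
    # Assumption: a plain-text line match stands in for real YAML parsing.
    recorded_not_found = status_file.exists() and any(
        line.strip() == "found: false"
        for line in status_file.read_text(encoding="utf-8").splitlines()
    )
    step2_done = (reference_dir / "code").is_dir() or recorded_not_found

    return {
        "step1_done": step1_done,
        "step2_done": step2_done,
        "acquire_done": step1_done and step2_done,
    }
```

Calling `acquire_status(Path("work/reference"))` returns three booleans; when
`acquire_done` is false, the other two say which step still needs to run.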

diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/paper2astra/SKILL.md
index b6b62b94..a9688740 100644
--- a/claude/lightcone/skills/paper2astra/SKILL.md
+++ b/claude/lightcone/skills/paper2astra/SKILL.md
@@ -7,7 +7,7 @@ description: >
   reproduction work. The loop is 9 phases bookended by two always-interactive
   seams (INTERVIEW at start, REVIEW at close-out); ARCHITECT writes a stub
   astra.yaml decomposition before SPECIFY's two-pass-per-sub-analysis fills
-  it in. Composes sibling skills for each phase: managing-bibliography for
+  it in. Composes sibling skills for each phase: paper-extraction for
   ACQUIRE and narrative for SPECIFY. Use when the user wants to reproduce
   a paper, has a DOI or arXiv ID and wants to start a reproduction project,
   or asks to "reproduce <paper>", "set up reproduction", "paper2astra",
@@ -36,7 +36,7 @@ paper2astra composes the rest of the lightcone-cli paper-reproduction bundle. Al
 
 | Sibling skill | Where it's invoked |
 |---|---|
-| [`/managing-bibliography`](../managing-bibliography/SKILL.md) | ACQUIRE — arXiv LaTeX source download (primary) and BibTeX caching |
+| [`/paper-extraction`](../paper-extraction/SKILL.md) | ACQUIRE — turns an arXiv ID or DOI into `work/reference/` (structural index + stub `astra.yaml`); arXiv LaTeX source primary, PDF + Docling fallback |
 | [`/constitution`](../constitution/SKILL.md) | INTERVIEW — drafting the per-paper reproduction constitution |
 | [`/ralph-loops`](../ralph-loops/SKILL.md) | After interview — launches the loop that drives all subsequent phases (when the chosen runtime mode is one of the loop modes) |
 | [`/narrative`](../narrative/SKILL.md) | SPECIFY — authoring the `narrative:` and `rationale:` prose in `astra.yaml` |
@@ -198,7 +198,7 @@ Workdir signals (file existence implies the phase has been done):
 
 - [`/constitution`](../constitution/SKILL.md) — for the interview's drafting phase
 - [`/ralph-loops`](../ralph-loops/SKILL.md) — for the bash-loop and tmux-orchestrated runtime modes
-- [`/managing-bibliography`](../managing-bibliography/SKILL.md) — for ACQUIRE
+- [`/paper-extraction`](../paper-extraction/SKILL.md) — for ACQUIRE
 - [`/narrative`](../narrative/SKILL.md) — for SPECIFY
 - [`/figure-comparison`](../figure-comparison/SKILL.md) — for REVIEW (close-out, mandatory)
 - [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) — for REVIEW (close-out, opt-in)
diff --git a/claude/lightcone/skills/paper2astra/references/acquire.md b/claude/lightcone/skills/paper2astra/references/acquire.md
index 3682ff0b..89bf6b2e 100644
--- a/claude/lightcone/skills/paper2astra/references/acquire.md
+++ b/claude/lightcone/skills/paper2astra/references/acquire.md
@@ -1,6 +1,6 @@
 # ACQUIRE — fetch the paper, structure it, clone the code
 
-Acquire the paper's full text, structure it for downstream consumption, and (when available) clone the reference code repository. The bundle's primary acquisition path is **arXiv LaTeX source via `/managing-bibliography`**; PDF + Docling is the fallback for non-arXiv papers. ACQUIRE folds in what was previously a separate PARSE phase — for arXiv-LaTeX, the structure is already in the tarball (no extra work); for the PDF fallback, ACQUIRE runs Docling itself.
+Acquire the paper's reading materials and (when available) clone the reference code repository. The substrate work — LaTeX-source download, Docling fallback, figures, tables, outline, citations, embedded bibliography, paper-as-ASTRA-artifact — is delegated to **`/paper-extraction`**, which paper2astra trusts blindly. ACQUIRE adds **Step 2: code-clone**, which is reproduction-specific and stays here.
 
 The constitution's per-phase mode controls whether this runs interactively or as a sub-agent. Default is sub-agent — surfacing happens only on download failures.
 
@@ -11,95 +11,35 @@ The constitution's per-phase mode controls whether this runs interactively or as
 
 ## Outputs
 
-Two shapes depending on the acquisition path:
+After Step 1 (`/paper-extraction`):
 
-**Path A — arXiv LaTeX source:**
+- `work/reference/index.json` — structural index (figures, tables, outline, citations with line numbers, paths)
+- `work/reference/astra.yaml` — ASTRA-shape representation of the paper, including the paper's claimed numerical findings as ASTRA `findings:` (when paper-extraction's optional Step 5 is run)
+- `work/reference/paper.pdf` — always
+- `work/reference/paper.tex` + `work/reference/source/` — Path A (arXiv LaTeX)
+- `work/reference/document.md` — Path B (PDF + Docling)
+- `work/reference/figures/` — figure files
+- `work/reference/tables/` — one .tex file per `\begin{table}` block
+- `work/reference/bibliography-source.{bib,bbl}` — Path A only, copied from source tarball when present
 
-- `work/reference/source/` — extracted arXiv tarball (the canonical text source: `.tex`, `.bbl`, figure files, etc.)
-- `work/reference/paper.pdf` — paper PDF (kept as a backup for `astra validate --verify-evidence`)
+After Step 2 (this phase):
 
-**Path B — PDF + Docling fallback:**
-
-- `work/reference/document.md` — paper as markdown (Docling-extracted)
-- `work/reference/figures/` — extracted figures
-- `work/reference/tables/` — extracted tables
-- `work/reference/metadata.json` — figure / table index with captions and page numbers
-- `work/reference/paper.pdf` — paper PDF
-
-**Both paths:**
-
-- `work/reference/code/` — clone of the code repo (or absent if not found)
+- `work/reference/code/` — cloned reference repo (or absent if not found)
 - `work/reference/code-status.yaml` — record of where the code came from
 
-## Step 1: Acquire and structure the paper text
-
-### Path A — arXiv ID is available (preferred)
-
-Invoke `/managing-bibliography`. Use it to download the arXiv LaTeX source tarball:
-
-```bash
-curl -L -o /tmp/<arxiv-id>.tar.gz "https://arxiv.org/src/<arxiv-id>"
-mkdir -p work/reference/source && cd work/reference/source && tar -xzf /tmp/<arxiv-id>.tar.gz
-ls *.tex
-```
-
-The LaTeX source gives clean equations, captions, tables, and bibliography — none of the math collapse, ligature artifacts, or caption flattening that plagues PDF extraction. **No conversion to markdown is needed.** Downstream phases (ARCHITECT's paper-side Explore sub-agent, SPECIFY's evidence quotes) read `.tex` directly — Claude reads LaTeX fine, and rendering it to markdown only loses information. The tarball stays as `work/reference/source/`.
+## Step 1 — Stand up the paper's reading materials
 
-If you want to identify the main `.tex` file for downstream tools:
+Invoke `/paper-extraction <arxiv-id-or-doi>`. The skill is idempotent — it surveys `work/reference/` first and skips work that's already done.
 
-```bash
-grep -l '\\documentclass' work/reference/source/*.tex
 ```
-
-Cache the paper for ASTRA's evidence-verification surface:
-
-```bash
-astra paper add 10.48550/arXiv.<arxiv-id>
-cp "$(astra paper path 10.48550/arXiv.<arxiv-id>)" work/reference/paper.pdf
+/paper-extraction <arxiv-id-or-doi>
 ```
 
-`astra paper add` for arXiv DOIs fetches the PDF directly. The PDF stays as a backup for `astra validate --verify-evidence`, even though the LaTeX source is the primary text.
-
-There is no PARSE step on Path A. Equation numbers, section numbers, figure references — all preserved in the source. ARCHITECT's paper-side Explore sub-agent (and SPECIFY's evidence-quote pass) resolves `\ref{}` against `\label{}` directly in the source tree.
-
-### Path B — non-arXiv paper (PDF + Docling fallback)
-
-```bash
-astra paper add <DOI>
-cp "$(astra paper path <DOI>)" work/reference/paper.pdf
-file work/reference/paper.pdf
-```
-
-The `file` output must say "PDF document". If it says "HTML document" or anything else, the download was blocked (CAPTCHA, paywall). Search the web for an open-access copy (NASA ADS, arXiv, Unpaywall, Semantic Scholar, the journal's open-access link), download with `curl -L -o work/reference/paper.pdf <url>`, re-validate, then `astra paper add <DOI> --pdf work/reference/paper.pdf` to register the resolved file.
-
-If a valid PDF cannot be obtained, write a clear error to `work/reference/acquire-error.txt` and stop.
-
-Then run Docling to structure the PDF — without this, downstream phases have nothing to read but the raw PDF:
-
-```bash
-docling --output work/reference work/reference/paper.pdf
-```
-
-Docling produces `document.md`, `figures/`, `tables/`, and `metadata.json` directly into `work/reference/`. The `metadata.json` index has the shape:
-
-```json
-{
-  "figures": [
-    {"id": "fig1", "caption": "...", "file": "figures/fig1.pdf", "label": "fig:bao"}
-  ],
-  "tables": [
-    {"id": "tab1", "caption": "...", "file": "tables/tab1.csv", "label": "tab:results"}
-  ]
-}
-```
-
-The `label` field is the source label (where Docling can extract it) so SPECIFY's anchor work can reference the same artifact.
-
-If Docling fails, the PDF may be corrupt — re-download before giving up.
+This produces everything under `work/reference/` *except* the code clone. paper2astra ACQUIRE does not re-implement the substrate logic; if something is wrong with the substrate, fix it in `/paper-extraction`, not here.
 
-Skip Step 1 if the path's outputs already exist (`work/reference/source/` for Path A, `work/reference/document.md` for Path B).
+Two starting surfaces: `work/reference/index.json` (structural — figures, tables, outline, citations with line numbers) and `work/reference/astra.yaml` (semantic — the paper as an ASTRA artifact, with `findings:` carrying the paper's central numerical claims as quote-anchored evidence). ARCHITECT reads index.json when its Explore sub-agents fan out across the paper; SPECIFY reads astra.yaml when authoring `prior_insights:` against the paper's claims (the paper's `findings:` map directly to a reproduction's `prior_insights:`).
 
-## Step 2: Search for the code repository
+## Step 2 — Clone the reference code repository
 
 This step matters more than its size suggests. When `work/reference/code/` exists, every implementing iteration treats it as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md). Without it, iterations have only the paper to anchor to and drift toward "looks right" rather than "matches."
 
@@ -125,14 +65,15 @@ Skip Step 2 if `work/reference/code/` already exists.
 
 Run `ls work/reference/` first.
 
-- If `paper.pdf` is present and either `source/` (Path A) or `document.md` (Path B) is also present, ACQUIRE is done — proceed to ARCHITECT.
-- If `paper.pdf` is present but neither structure exists, run the structuring step for the appropriate path.
-- If nothing is there, run the full ACQUIRE.
+- If `paper.pdf` is present, **and** the path indicator (`source/` for Path A or `document.md` for Path B) is present, **and** `index.json` is present → Step 1 is done.
+- If `work/reference/code/` is present (or `code-status.yaml` records `found: false`) → Step 2 is done.
+- When both are done, ACQUIRE is complete; proceed to ARCHITECT.
+- Otherwise, run whichever step is missing. `/paper-extraction` handles its own idempotency for Step 1.
 
 ## Notes
 
-- **arXiv DOI form is `10.48550/arXiv.<id>`.** `astra paper add` accepts that form directly.
-- **Journal DOIs that 403 on Unpaywall** can be aliased to a locally-downloaded arXiv preprint via `astra paper add <JOURNAL_DOI> --pdf <path-to-arxiv-pdf>`.
-- **Path A is preferred whenever arXiv source is acquirable.** Math, ligatures, and caption fidelity all come through clean from the LaTeX source; PDF + Docling is the fallback for non-arXiv where there's no better source. The acquisition layer's ASTRA-side counterpart — `astra paper add` preferring LaTeX over PDF for the verification cache, and applying the same logic to bibliography references — is filed as a separate ASTRA issue; paper2astra inherits the improvement once it lands.
-- **Equation numbers and section numbers must match the rendered paper.** On Path A, the printed numbers come from the rendered tarball (look at the PDF if uncertain). On Path B, Docling preserves printed numbers in its markdown output. When citing "eq. N" or "§N" in any downstream phase, find the equation or heading by content, not by a naïve count of TeX blocks or markdown headings.
-- This phase's job is acquisition + structuring, not understanding. Do not start indexing or comparing the paper here — that's ARCHITECT.
+- **paper-extraction is the substrate authority.** Don't re-fetch the LaTeX source, don't re-run Docling, don't re-parse the paper from inside ACQUIRE. If a substrate need surfaces that paper-extraction doesn't cover, file it as paper-extraction work — not as ACQUIRE work.
+- **arXiv DOI form is `10.48550/arXiv.<id>`.** Useful when downstream tools want a DOI rather than an arXiv ID.
+- **Equation numbers and section numbers must match the rendered paper.** When citing "eq. N" or "§N" in any downstream phase, find the equation or heading by content, not by a naïve count of TeX blocks or markdown headings. Path A: the source identifies equations and sections by symbolic `\label{}`s, not printed numbers, so confirm the printed number against `paper.pdf` when uncertain. Path B: Docling preserves printed numbers in its markdown.
+- **This phase is acquisition + code-clone, not understanding.** Do not start indexing or comparing the paper here — that's ARCHITECT.
+- **Code-as-canonical** is loaded by every subsequent phase. The per-paper `CLAUDE.md` restates the rule; ACQUIRE just makes sure `work/reference/code/` exists when possible.

From cd5b83ae84ea40b11f58b4ed519e468010e8d9b5 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Fri, 8 May 2026 19:23:46 +0200
Subject: [PATCH 019/124] retire managing-bibliography; rewire bundle
 cross-refs to paper-extraction

paper-extraction subsumes managing-bibliography's only load-bearing role
(arXiv LaTeX fetching for paper2astra ACQUIRE) and is better-shaped:
structural extraction with multi-file source handling, comment-aware regex
passes, AASTeX/PGF figure recognition, biblatex citation commands,
optional findings extraction with verbatim-quote evidence. ADS BibTeX
management was the other piece, and per ea6ee52 has been deferred to a
future skill rather than folded into paper-extraction.

- Delete claude/lightcone/skills/managing-bibliography/.
- skills/README.md: replace the managing-bibliography row with
  paper-extraction.
- skills/narrative/SKILL.md: bundle list now names paper-extraction.
- top-level CLAUDE.md: bundle ASCII layout updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CLAUDE.md                                     |   2 +-
 claude/lightcone/skills/README.md             |   2 +-
 .../skills/managing-bibliography/SKILL.md     | 152 ------------------
 claude/lightcone/skills/narrative/SKILL.md    |   2 +-
 4 files changed, 3 insertions(+), 155 deletions(-)
 delete mode 100644 claude/lightcone/skills/managing-bibliography/SKILL.md

diff --git a/CLAUDE.md b/CLAUDE.md
index 9b695a0d..b7a8e693 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -69,7 +69,7 @@ src/lightcone/              # namespace — NO __init__.py
 claude/lightcone/           # Claude plugin source — force-included into the wheel
 ├── skills/                 # lc-new, lc-build, lc-verify, lc-migrate, lc-feedback;
 │                            # paper-reproduction bundle: paper2astra, narrative,
-│                            # constitution, ralph-loops, managing-bibliography
+│                            # constitution, ralph-loops, paper-extraction
 │                            # (see skills/README.md for the full bundle map)
 ├── agents/                 # lc-extractor
 ├── guides/                 # astra-reference, lightcone-cli-reference, ui-brand
diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 07588636..7219c38c 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -22,7 +22,7 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 | [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by paper2astra during SPECIFY. |
 | [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. |
 | [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Used by paper2astra's bash-loop and tmux-orchestrated runtime modes. |
-| [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. Primary acquisition path for paper2astra's ACQUIRE phase. |
+| [`paper-extraction`](paper-extraction/SKILL.md) | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: structural index (figures, tables, outline, citations) plus a stub `astra.yaml` for the paper. Primary acquisition path for paper2astra's ACQUIRE phase. |
 | [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from paper2astra's REVIEW close-out (opt-in); also user-invokable directly. |
 | [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from paper2astra's REVIEW close-out (mandatory); also user-invokable directly. |
 
diff --git a/claude/lightcone/skills/managing-bibliography/SKILL.md b/claude/lightcone/skills/managing-bibliography/SKILL.md
deleted file mode 100644
index 1924693a..00000000
--- a/claude/lightcone/skills/managing-bibliography/SKILL.md
+++ /dev/null
@@ -1,152 +0,0 @@
----
-name: managing-bibliography
-description: >
-  Read arXiv paper source and add BibTeX entries via ADS API. Use for
-  research that requires reading full paper text and managing citations.
-  Also the canonical paper-acquisition path inside the lightcone-cli
-  paper-reproduction bundle: `/paper2astra` calls this during the ACQUIRE
-  phase to fetch arXiv LaTeX source, with PDF + Docling as the non-arXiv
-  fallback. Triggers on: "read paper", "cite", "add to bibliography",
-  "bibtex", "ADS", "arXiv", "find paper", "add citation", or any request
-  to read scientific papers or manage references.
----
-
-Read scientific papers and manage citations. Two capabilities:
-
-1. **Read papers** — Download arXiv LaTeX source to read full text, verify claims, understand methodology
-2. **Cite papers** — Fetch BibTeX from NASA ADS and add to bibliography
-
-**Activation**: Use this skill when you need to:
-- Read a paper's full text (not just abstract)
-- Verify a claim before citing it
-- Add citations to your bibliography
-- Research how other papers phrase similar findings
-- Acquire a paper for the `/paper2astra` reproduction pipeline (ACQUIRE phase)
-
-**Usage pattern**:
-- "Read the KiDS-Legacy paper to see how they report B-mode PTEs"
-- "Add [paper description] to the bibliography"
-- "Find and cite [author name] [year] [topic]"
-
----
-
-## Reading Papers
-
-Download arXiv LaTeX source to read full paper text:
-
-```bash
-# Download source (replace ID as needed)
-curl -L -o /tmp/2503.19441.tar.gz "https://arxiv.org/src/2503.19441"
-
-# Extract
-mkdir -p /tmp/2503.19441 && cd /tmp/2503.19441 && tar -xzf /tmp/2503.19441.tar.gz
-
-# Find the main tex file
-ls *.tex
-```
-
-This gives you:
-- Full paper text (not just abstract)
-- Equations and methodology details
-- How authors phrased specific claims
-- Their bibliography (.bib or .bbl files)
-
-Use when you need to:
-- Verify a claim before citing
-- See exact phrasing in another paper
-- Understand methodology not in abstract
-- Cross-reference their citations
-
----
-
-## ADS API Setup
-
-The ADS API requires an API token. Before using citation features:
-
-1. **Check for token**: The skill reads `$ADS_API_TOKEN` from the environment
-2. **If missing**: Tell the user to create one at https://ui.adsabs.harvard.edu/user/settings/token and set it:
-   ```bash
-   # Add to ~/.zshrc or ~/.bashrc
-   export ADS_API_TOKEN="your-token-here"
-   ```
-3. **Do not proceed** with ADS API calls until the token is available — check with `echo $ADS_API_TOKEN`
-
----
-
-## Citing Papers
-
-When adding a paper to the bibliography:
-
-1. **Web search** for the paper using description + "arxiv"
-   - Look for arXiv ID in format `YYMM.NNNNN`
-   - If multiple results, show options and ask user to select
-
-2. **Query ADS API** to get bibcode using arXiv ID
-   ```bash
-   curl -H "Authorization: Bearer $ADS_API_TOKEN" \
-     'https://api.adsabs.harvard.edu/v1/search/query?q=arXiv:YYMM.NNNNN&fl=bibcode'
-   ```
-
-3. **Fetch BibTeX entry** with abstract from ADS
-   ```bash
-   curl -H "Authorization: Bearer $ADS_API_TOKEN" \
-     'https://api.adsabs.harvard.edu/v1/export/bibtexabs/{bibcode}'
-   ```
-
-4. **Parse BibTeX** to extract author names and year:
-   - Parse `author = {...}` field for last names
-   - Parse `year = YYYY` field for publication year
-   - Generate citation key based on author count:
-     - 1 author: `firstauthor{YY}` (e.g., `asgari17`)
-     - 2 authors: `firstauthor.secondauthor{YY}` (e.g., `schneider.kilbinger12`)
-     - 3+ authors: `firstauthor.etal{YY}` (e.g., `wright.etal25`)
-   - Use only last names, lowercase, final 2 digits of year
-
-5. **Replace citation key** in BibTeX entry
-   - Update the entry key on the first line (before the opening brace)
-   - Keep all other fields unchanged
-
-6. **Append to bibliography** file
-   - Add the modified entry to the project's `.bib` file
-   - Check for duplicate keys first and warn if found
-
-7. **Report success**
-   - Show the user the complete entry that was added
-   - Confirm file location
-
-## Citation Key Generation
-
-**Examples from BibTeX parsing**:
-- `author = {{Wright}, Angus H. and {Stölzner}, Benjamin and ...}` + `year = 2025` → `wright.etal25`
-- `author = {{Schneider}, Peter and {Kilbinger}, Martin}` + `year = 2012` → `schneider.kilbinger12`
-- `author = {{Asgari}, Marika}` + `year = 2017` → `asgari17`
-
-## Error Handling
-
-- **No arXiv ID found**: Ask user to provide it manually or search for the paper directly
-- **Multiple search results**: Show options and ask user to select the correct paper
-- **ADS API fails**: Show error and suggest manual bibcode lookup or entry
-- **Duplicate citation key**: Warn user, show existing entry, offer to replace or rename
-- **Missing bibliography file**: Report error and ask for correct file path
-
-## Key Configuration Points
-
-- **ADS API Token**: Read from `$ADS_API_TOKEN` environment variable
-- **ADS Search endpoint**: `https://api.adsabs.harvard.edu/v1/search/query`
-- **ADS Export endpoint**: `https://api.adsabs.harvard.edu/v1/export/bibtexabs/{bibcode}`
-- **Export format**: Use `bibtexabs` endpoint to include abstracts
-
-## Bibliography File Paths
-
-Adapt to your project structure:
-- `docs/unions_bmodes/unions_bmodes.bib` (example UNIONS project)
-- `references/bibliography.bib` (common alternative)
-- User should specify their bibliography file path
-
-## Notes
-
-- Always use the `bibtexabs` endpoint to include abstract in the entry
-- Parse author list carefully: format is `author = {{LastName}, FirstName and {LastName}, FirstName ...}`
-- Year is straightforward: `year = YYYY`
-- Before appending, verify file exists and has proper BibTeX format
-- Preserve existing entries when appending new ones
diff --git a/claude/lightcone/skills/narrative/SKILL.md b/claude/lightcone/skills/narrative/SKILL.md
index 32524830..441336b6 100644
--- a/claude/lightcone/skills/narrative/SKILL.md
+++ b/claude/lightcone/skills/narrative/SKILL.md
@@ -26,7 +26,7 @@ Per-element prose (what each `Input`, `Output`, `Decision`, `Option`, or `Insigh
 This skill is also part of the lightcone-cli paper-reproduction bundle: the
 `/paper2astra` orchestrator invokes it during the SPECIFY phase to author the
 narrative for the spec it has just crafted. Sibling skills in the bundle —
-`constitution`, `ralph-loops`, `managing-bibliography`,
+`constitution`, `ralph-loops`, `paper-extraction`,
 `check-sentence-by-sentence`, `figure-comparison` — solve adjacent pieces of
 the reproduction story; this skill stands alone and does not need to know
 about them.

From 7bfc649ef8d860a0cc097091bd47f6356017868a Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sat, 9 May 2026 01:42:44 +0200
Subject: [PATCH 020/124] lc-new: skill-creator-shaped description (scoped to
 fresh-idea)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Update the description per skill-creator skill guidance: imperative
"Use this skill when..." framing, pushy "even if user doesn't say X
explicitly", concrete trigger examples, exclusion clause for non-entry
use. Validated in an interactive Claude Code session: the skill
auto-triggers cleanly on natural phrases like "I'd like to start working
on a new analysis for cross-correlations with euclid".

Trigger surface narrowed to verbs (`new`, `start`, `scope`) × nouns
(`analysis`, `project`, `question`, `research`). Scoped to fresh-idea
projects (research-question driven); migrate-existing-code and
reproduce-paper intents stay routed to lc-migrate and paper2astra
respectively via their own descriptions.

Body of lc-new is unchanged from origin (Liam #98).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-new/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-new/SKILL.md b/claude/lightcone/skills/lc-new/SKILL.md
index 78073ad9..b4060cde 100644
--- a/claude/lightcone/skills/lc-new/SKILL.md
+++ b/claude/lightcone/skills/lc-new/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: lc-new
-description: Create a new ASTRA analysis project with integrated literature support. Scope the research question through conversation, structure outputs and decisions, search for and extract evidence from scientific papers, and build a complete astra.yaml specification. Use when starting a new analysis, when the user says "new project", "new analysis", or "scope". Triggers on "new", "scope", "research question", "start analysis".
+description: Use this skill whenever the user starts a new ASTRA analysis from a research question — scoping the question, structuring inputs and outputs, identifying decisions through literature, and landing astra.yaml + project CLAUDE.md. Triggers on verbs (`new`, `start`, `scope`) combined with nouns (`analysis`, `project`, `question`, `research`) — e.g. "new analysis", "start project", "scope research question" — even if the user doesn't say "project" explicitly. Don't use this for working inside an existing ASTRA project; this is for fresh scoping only.
 allowed-tools: Read, Write(astra.yaml), Write(universes/*), Write(CLAUDE.md), Edit(astra.yaml), Edit(universes/*), Edit(CLAUDE.md), Glob, Grep, Bash(astra:*), Bash(lc:*), WebSearch, WebFetch, AskUserQuestion, Agent
 ---
 

From 604267d79ec5e9d00f4aee09887f981c03876127 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sat, 9 May 2026 02:48:27 +0200
Subject: [PATCH 021/124] rename lc-new -> lc-from-question; lc-migrate ->
 lc-from-code; paper2astra -> lc-from-paper
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The /lc-from-* family is parallel by what the user starts from — a
question, code, or a paper. The previous names were asymmetric
(method-named or substrate-named); the new family lines up by
starting condition.

Renames the three skill directories under claude/lightcone/skills/
with git mv so history follows, updates SKILL.md frontmatter, and
threads the new names through:

- sibling SKILL.mds (narrative, figure-comparison,
  check-sentence-by-sentence, paper-extraction's referrer chain)
- the bundle README (claude/lightcone/skills/README.md)
- top-level CLAUDE.md, README.md
- the lc-extractor agent definition
- internal references inside lc-from-paper/references/*
- the Zensical docs site (zensical.toml nav, docs/skills/* renames,
  docs/index.md, docs/user/*, docs/cli/init.md)

Mentions of the historical Paper2ASTRA Python package (the separate
repo / legacy provenance) are preserved verbatim in lc-from-paper's
SKILL.md and references — the external repo's name is unchanged.
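
The name threading can be pictured as a case-sensitive, whole-name
substitution. This is a sketch under assumptions, not the tooling used
for this commit: `RENAMES` is taken from the subject line, while
`rename_skills` and the boundary rule are hypothetical.

```python
import re

# Old -> new skill names, as listed in the subject line of this commit.
RENAMES = {
    "lc-new": "lc-from-question",
    "lc-migrate": "lc-from-code",
    "paper2astra": "lc-from-paper",
}

# Case-sensitive match on whole names only: the lookarounds reject a match
# that is glued to another word character or hyphen, and "Paper2ASTRA"
# (the legacy package name) never matches the lowercase key.
_PATTERN = re.compile(
    r"(?<![\w-])(" + "|".join(map(re.escape, RENAMES)) + r")(?![\w-])"
)


def rename_skills(text: str) -> str:
    """Replace old skill names with their /lc-from-* equivalents."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], text)
```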
---
 CLAUDE.md                                     |  7 +--
 README.md                                     | 12 +++--
 claude/lightcone/agents/lc-extractor.md       |  2 +-
 claude/lightcone/skills/README.md             | 23 ++++-----
 .../check-sentence-by-sentence/SKILL.md       |  8 ++--
 .../skills/figure-comparison/SKILL.md         | 12 ++---
 .../{lc-migrate => lc-from-code}/SKILL.md     |  6 +--
 .../{paper2astra => lc-from-paper}/SKILL.md   | 24 +++++-----
 .../references/acquire.md                     |  4 +-
 .../references/architect.md                   |  0
 .../references/compare.md                     |  0
 .../references/implement.md                   |  0
 .../references/interview.md                   | 10 ++--
 .../references/literature.md                  |  0
 .../references/review.md                      |  0
 .../references/run.md                         |  0
 .../references/specify.md                     |  0
 .../{lc-new => lc-from-question}/SKILL.md     |  4 +-
 claude/lightcone/skills/narrative/SKILL.md    |  2 +-
 docs/cli/init.md                              |  8 ++--
 docs/index.md                                 |  5 +-
 docs/skills/index.md                          | 20 +++++---
 .../skills/{lc-migrate.md => lc-from-code.md} | 14 +++---
 .../skills/{lc-new.md => lc-from-question.md} | 13 ++---
 docs/user/agent-workflow.md                   | 47 ++++++++++++++-----
 docs/user/getting-started.md                  | 11 +++--
 docs/user/glossary.md                         | 16 ++++---
 docs/user/index.md                            | 10 ++--
 docs/user/install.md                          |  4 +-
 docs/user/multiverse.md                       |  8 ++--
 docs/user/troubleshooting.md                  |  4 +-
 docs/user/tutorial.md                         |  4 +-
 zensical.toml                                 |  4 +-
 33 files changed, 163 insertions(+), 119 deletions(-)
 rename claude/lightcone/skills/{lc-migrate => lc-from-code}/SKILL.md (95%)
 rename claude/lightcone/skills/{paper2astra => lc-from-paper}/SKILL.md (93%)
 rename claude/lightcone/skills/{paper2astra => lc-from-paper}/references/acquire.md (94%)
 rename claude/lightcone/skills/{paper2astra => lc-from-paper}/references/architect.md (100%)
 rename claude/lightcone/skills/{paper2astra => lc-from-paper}/references/compare.md (100%)
 rename claude/lightcone/skills/{paper2astra => lc-from-paper}/references/implement.md (100%)
 rename claude/lightcone/skills/{paper2astra => lc-from-paper}/references/interview.md (94%)
 rename claude/lightcone/skills/{paper2astra => lc-from-paper}/references/literature.md (100%)
 rename claude/lightcone/skills/{paper2astra => lc-from-paper}/references/review.md (100%)
 rename claude/lightcone/skills/{paper2astra => lc-from-paper}/references/run.md (100%)
 rename claude/lightcone/skills/{paper2astra => lc-from-paper}/references/specify.md (100%)
 rename claude/lightcone/skills/{lc-new => lc-from-question}/SKILL.md (99%)
 rename docs/skills/{lc-migrate.md => lc-from-code.md} (84%)
 rename docs/skills/{lc-new.md => lc-from-question.md} (86%)

diff --git a/CLAUDE.md b/CLAUDE.md
index b7a8e693..7612b47b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -67,9 +67,10 @@ src/lightcone/              # namespace — NO __init__.py
     ├── harness.py, sandbox.py, graders.py, build.py, report.py, models.py
 
 claude/lightcone/           # Claude plugin source — force-included into the wheel
-├── skills/                 # lc-new, lc-build, lc-verify, lc-migrate, lc-feedback;
-│                            # paper-reproduction bundle: paper2astra, narrative,
-│                            # constitution, ralph-loops, paper-extraction
+├── skills/                 # lc-from-question, lc-from-code, lc-from-paper,
+│                            # lc-build, lc-verify, lc-feedback;
+│                            # paper-reproduction bundle: lc-from-paper (entry),
+│                            # narrative, constitution, ralph-loops, paper-extraction
 │                            # (see skills/README.md for the full bundle map)
 ├── agents/                 # lc-extractor
 ├── guides/                 # astra-reference, lightcone-cli-reference, ui-brand
diff --git a/README.md b/README.md
index b4c0db3d..75a3feae 100644
--- a/README.md
+++ b/README.md
@@ -18,11 +18,13 @@ cd my-analysis
 claude
 ```
 
-Then tell the agent `/lc-new` to scope your research question. After the spec exists, just tell the agent to build it — implementation is a normal Claude Code workflow guided by `.claude/guides/`.
+Then tell the agent `/lc-from-question` to scope your research question. After the spec exists, just tell the agent to build it — implementation is a normal Claude Code workflow guided by `.claude/guides/`.
 
 ## Skills
 
-### `/lc-new` — Scope and specify an analysis
+The `/lc-from-*` family is parallel by what you start from: a question, code, or a paper.
+
+### `/lc-from-question` — Scope and specify an analysis
 
 Guides you from a research question to a complete `astra.yaml` specification through interactive conversation. The agent will:
 
@@ -34,10 +36,14 @@ Guides you from a research question to a complete `astra.yaml` specification thr
 
 You don't write any code or YAML during this phase — the agent produces the full specification.
 
-### `/lc-migrate` — Bring an existing project into ASTRA
+### `/lc-from-code` — Bring an existing project into ASTRA
 
 Scans an existing codebase, drafts an `astra.yaml` that captures its inputs, outputs, and analytical decisions, parameterizes the code so decisions can vary across universes, and runs the analysis through `lc` until every output materializes. Existing logic is left intact — changes are confined to parameter plumbing.
 
+### `/lc-from-paper` — Reproduce a published paper
+
+Interview-first orchestrator for reproducing a published paper in ASTRA. Drafts a per-paper reproduction constitution and `CLAUDE.md`, then drives a multi-session loop through the eight phases that follow the interview (ACQUIRE → ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE → REVIEW). Composes a bundle of sibling skills (paper-extraction, constitution, ralph-loops, narrative, figure-comparison, check-sentence-by-sentence). See [`claude/lightcone/skills/README.md`](claude/lightcone/skills/README.md) for the full bundle map.
+
 ### `/lc-feedback` — Report a bug
 
 Files a GitHub issue against the right repo (ASTRA or lightcone-cli) with version info and error context auto-collected from your session.
diff --git a/claude/lightcone/agents/lc-extractor.md b/claude/lightcone/agents/lc-extractor.md
index f377a922..b13c3a33 100644
--- a/claude/lightcone/agents/lc-extractor.md
+++ b/claude/lightcone/agents/lc-extractor.md
@@ -1,6 +1,6 @@
 ---
 name: lc-extractor
-description: Extract prior insights from scientific papers for ASTRA analyses. Reads PDFs, identifies claims relevant to target decisions, extracts verbatim quotes, and verifies them. Use for literature extraction during /lc-new.
+description: Extract prior insights from scientific papers for ASTRA analyses. Reads PDFs, identifies claims relevant to target decisions, extracts verbatim quotes, and verifies them. Use for literature extraction during /lc-from-question.
 tools: Read, Bash
 model: sonnet
 ---
diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 7219c38c..53652a32 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -6,10 +6,11 @@ Each subdirectory is one Claude Code skill: `SKILL.md` plus optional `references
 
 | Skill | Role |
 |---|---|
-| `lc-new` | Scaffold a new ASTRA-shaped project from scratch. |
+| `lc-from-question` | Scaffold a new ASTRA-shaped project from a research question. |
+| `lc-from-code` | Bring an existing codebase into ASTRA — scan, spec, parameterize. |
+| `lc-from-paper` | Reproduce a published paper in ASTRA (paper-reproduction bundle entry point — see below). |
 | `lc-build` | Build container images and dependencies for a project. |
 | `lc-verify` | Run validation across an ASTRA project. |
-| `lc-migrate` | Migrate legacy projects to current conventions. |
 | `lc-feedback` | Report bugs and feature requests upstream. |
 
 ## Paper-reproduction bundle
@@ -18,18 +19,18 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 
 | Skill | Role |
 |---|---|
-| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and per-paper `CLAUDE.md`, then launches one of three runtime modes (interactive, bash-loop, tmux-orchestrated) against the constitution. The constitution carries 9 phases — INTERVIEW → ACQUIRE → ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE → REVIEW — bookended by two always-interactive seams (INTERVIEW at start, REVIEW at close-out); every other phase is configurable per the user's per-phase mode choice, with ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT additionally tuned by a frugality / rigor dial that drives each phase's internal fresh-context self-review. |
-| [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by paper2astra during SPECIFY. |
-| [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. |
-| [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Used by paper2astra's bash-loop and tmux-orchestrated runtime modes. |
-| [`paper-extraction`](paper-extraction/SKILL.md) | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: structural index (figures, tables, outline, citations) plus a stub `astra.yaml` for the paper. Primary acquisition path for paper2astra's ACQUIRE phase. |
-| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from paper2astra's REVIEW close-out (opt-in); also user-invokable directly. |
-| [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from paper2astra's REVIEW close-out (mandatory); also user-invokable directly. |
+| [`lc-from-paper`](lc-from-paper/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and per-paper `CLAUDE.md`, then launches one of three runtime modes (interactive, bash-loop, tmux-orchestrated) against the constitution. The constitution carries 9 phases — INTERVIEW → ACQUIRE → ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE → REVIEW — bookended by two always-interactive seams (INTERVIEW at start, REVIEW at close-out); every other phase is configurable per the user's per-phase mode choice, with ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT additionally tuned by a frugality / rigor dial that drives each phase's internal fresh-context self-review. |
+| [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by lc-from-paper during SPECIFY. |
+| [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by lc-from-paper during the interview. |
+| [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Used by lc-from-paper's bash-loop and tmux-orchestrated runtime modes. |
+| [`paper-extraction`](paper-extraction/SKILL.md) | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: structural index (figures, tables, outline, citations) plus a stub `astra.yaml` for the paper. Primary acquisition path for lc-from-paper's ACQUIRE phase. |
+| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from lc-from-paper's REVIEW close-out (opt-in); also user-invokable directly. |
+| [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from lc-from-paper's REVIEW close-out (mandatory); also user-invokable directly. |
 
-The full reproduction story spans these seven skills. paper2astra's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about paper2astra.
+The full reproduction story spans these seven skills. lc-from-paper's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about lc-from-paper.
 
 ### Why bundle (not depend on plugin install)
 
-- **Testability.** We want to verify paper2astra invokes constitution + ralph-loops + the others correctly. That only works when all are in the same checkout.
+- **Testability.** We want to verify lc-from-paper invokes constitution + ralph-loops + the others correctly. That only works when all are in the same checkout.
 - **Single install path.** `lc init` brings the full toolkit. Adding a separate plugin-marketplace step is friction we don't need.
 - **Future consolidation is open.** The long-run shape may be `astra` ships skills in `astra`, `lc` ships skills in `lightcone-cli`, plus a centralized external-skills list. Today: bundle it all. See [[lightcone/skills-location-policy]].
diff --git a/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md b/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
index e7a2ca2b..9a9cd09c 100644
--- a/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
+++ b/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
@@ -6,7 +6,7 @@ description: >
   discussion, and appendices, locate the corresponding code (file:line) or
   mark NOT FOUND. Use when the user says "check reproduction", "verify the
   paper line by line", or "sentence-by-sentence audit". Run from the project
-  folder containing astra.yaml. In paper2astra projects, read paper sources
+  folder containing astra.yaml. In lc-from-paper projects, read paper sources
   from work/reference/: prefer arXiv TeX under work/reference/source/, fall
   back to Docling/Pandoc markdown at work/reference/document.md.
 allowed-tools: Read, Glob, Grep, Bash(ls:*), Bash(wc:*), Bash(grep:*), Bash(find:*), AskUserQuestion, Agent
@@ -20,7 +20,7 @@ Every sentence that asserts an implementation detail or a numerical/empirical
 result is located in the code (`file:line`) or marked NOT FOUND. The agent
 does NOT run any code -- this is a static reading audit.
 
-In paper2astra projects, the paper substrate comes from `work/reference/`.
+In lc-from-paper projects, the paper substrate comes from `work/reference/`.
 Path A is arXiv source at `work/reference/source/`; Path B is the parsed
 markdown fallback at `work/reference/document.md`, produced by Docling or
 Pandoc.
@@ -41,7 +41,7 @@ Pandoc.
    1. If the argument is a `.tex` file, use it in `tex` mode.
    2. If the argument is `work/reference/` or another directory, first look
       for TeX source under `<dir>/source/`, then for `<dir>/document.md`.
-   3. If no argument was supplied, prefer the paper2astra layout:
+   3. If no argument was supplied, prefer the lc-from-paper layout:
       - `work/reference/source/<main>.tex` if TeX source exists. Identify the
         main file with `grep -l '\\documentclass' work/reference/source/*.tex`;
         if exactly one file matches, use it. If multiple files match, ask the
@@ -51,7 +51,7 @@ Pandoc.
         main TeX wrapper.
       - `work/reference/document.md` if there is no TeX source. This is the
         Docling/Pandoc fallback and should be audited in `markdown` mode.
-   4. Only after those paper2astra paths fail, look for an obvious legacy
+   4. Only after those lc-from-paper paths fail, look for an obvious legacy
       `.tex` source in cwd: a top-level `*.tex`, or one inside `paper/`,
       `tex/`, or a similarly named subdirectory. If exactly one obvious
       candidate is found, use it in `tex` mode.
diff --git a/claude/lightcone/skills/figure-comparison/SKILL.md b/claude/lightcone/skills/figure-comparison/SKILL.md
index dfed8dd4..e9638ce2 100644
--- a/claude/lightcone/skills/figure-comparison/SKILL.md
+++ b/claude/lightcone/skills/figure-comparison/SKILL.md
@@ -2,7 +2,7 @@
 name: figure-comparison
 description: >
   Build a self-contained HTML report comparing the figures, tables, and
-  numerical results in paper2astra's `work/reference/` paper substrate
+  numerical results in lc-from-paper's `work/reference/` paper substrate
   against artifacts produced under `results/<universe>/`. When
   `comparison-report.yaml` or `targets/targets.md` exists, use that scoped
   target set first; otherwise fall back to paper-driven inventory from arXiv
@@ -55,22 +55,22 @@ results.
    2. If the argument is an arXiv source directory containing `.tex` files,
       use it as `source_root`, and use its parent `work/reference/` as the
       paper reference root when that parent exists.
-   3. If no argument was supplied, prefer paper2astra's layout:
+   3. If no argument was supplied, prefer lc-from-paper's layout:
       - `work/reference/source/` when arXiv TeX source exists. Use the TeX
         files there for labels/captions and the parsed artifacts under
         `work/reference/{figures,tables,metadata.json}` for renderable
         reference files.
       - `work/reference/document.md` plus
         `work/reference/{figures,tables,metadata.json}` when no TeX source
-        exists. This is the PDF + Docling fallback from paper2astra.
-   4. Only after paper2astra paths fail, look for a legacy unzipped arXiv
+        exists. This is the PDF + Docling fallback from lc-from-paper.
+   4. Only after lc-from-paper paths fail, look for a legacy unzipped arXiv
       dir in cwd: a directory containing both a `*.tex` file and figure
       files (`*.pdf`, `*.png`, `*.eps`). Common names: `paper_source/`,
       `arxiv_source/`, `*_Original_Paper/`.
 
    If no usable reference substrate is found, ask:
 
-   > "Where is the paper reference directory? In a paper2astra project this
+   > "Where is the paper reference directory? In an lc-from-paper project this
    > should usually be `work/reference/`, containing `document.md`,
    > `metadata.json`, and extracted `figures/` / `tables/`."
 
@@ -84,7 +84,7 @@ Read, in this order:
 
 1. **Scoped comparison artifacts, if present.**
    - If `comparison-report.yaml` exists, treat it as the highest-priority
-     scope because it records what paper2astra actually compared. Use its
+     scope because it records what lc-from-paper actually compared. Use its
      `outputs:` entries, including `type`, `priority`, `paper_value`,
      `reproduced_value`, `reference_file`, `reproduced_file`, `match`, and
      `notes` when present.
diff --git a/claude/lightcone/skills/lc-migrate/SKILL.md b/claude/lightcone/skills/lc-from-code/SKILL.md
similarity index 95%
rename from claude/lightcone/skills/lc-migrate/SKILL.md
rename to claude/lightcone/skills/lc-from-code/SKILL.md
index d5f42389..eb622671 100644
--- a/claude/lightcone/skills/lc-migrate/SKILL.md
+++ b/claude/lightcone/skills/lc-from-code/SKILL.md
@@ -1,10 +1,10 @@
 ---
-name: lc-migrate
-description: Migrate an existing project into ASTRA / lightcone-cli. Scans code, generates astra.yaml, parameterizes decisions, and runs until outputs materialize. Triggers on "migrate", "convert", "existing project".
+name: lc-from-code
+description: Bring an existing project into ASTRA / lightcone-cli, starting from the code. Scans the codebase, generates astra.yaml, parameterizes decisions, and runs until outputs materialize. Triggers on "migrate", "convert", "existing project", "wrap this code", "start from code".
 allowed-tools: Read, Write, Edit, Glob, Grep, Bash(astra:*), Bash(lc:*), Bash(python:*), Bash(pip:*), Bash(git:*), Bash(mkdir:*), Bash(ls:*), Agent, AskUserQuestion
 ---
 
-# /lc-migrate
+# /lc-from-code
 
 End-to-end migration: scan existing code, generate the ASTRA spec, parameterize decisions in the code, and run until everything materializes. The user's existing logic stays intact — changes should be minimal.
 
diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
similarity index 93%
rename from claude/lightcone/skills/paper2astra/SKILL.md
rename to claude/lightcone/skills/lc-from-paper/SKILL.md
index a9688740..9cecc9fb 100644
--- a/claude/lightcone/skills/paper2astra/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -1,5 +1,5 @@
 ---
-name: paper2astra
+name: lc-from-paper
 description: >
   Reproduce a published scientific paper in ASTRA. Interview the user
   about the paper and the intended scope, draft a per-paper reproduction
@@ -10,14 +10,14 @@ description: >
   it in. Composes sibling skills for each phase: paper-extraction for
   ACQUIRE and narrative for SPECIFY. Use when the user wants to reproduce
   a paper, has a DOI or arXiv ID and wants to start a reproduction project,
-  or asks to "reproduce <paper>", "set up reproduction", "paper2astra",
-  "/paper2astra <doi>", or hands you a published paper as a starting point
+  or asks to "reproduce <paper>", "set up reproduction", "lc-from-paper",
+  "/lc-from-paper <doi>", or hands you a published paper as a starting point
   for ASTRA work.
 ---
 
-# paper2astra
+# lc-from-paper
 
-Reproduce a published paper in ASTRA. The skill is **interview-first**: a short interactive crafting phase up front that produces both a **per-paper reproduction constitution** and a **per-paper `CLAUDE.md`**. After the interview, paper2astra hands the constitution to a multi-session loop that drives the reproduction. Successive iterations survey the workdir, execute one or two phases, exit cleanly, and re-spawn with fresh context until the constitution is realized.
+Reproduce a published paper in ASTRA. The skill is **interview-first**: a short interactive crafting phase up front that produces both a **per-paper reproduction constitution** and a **per-paper `CLAUDE.md`**. After the interview, lc-from-paper hands the constitution to a multi-session loop that drives the reproduction. Successive iterations survey the workdir, execute one or two phases, exit cleanly, and re-spawn with fresh context until the constitution is realized.
 
 This is a Claude-Code-native skill. There is no Python orchestrator, no state machine, no resume mechanic — the workdir on disk + git history are the substrate.
 
@@ -26,13 +26,13 @@ A reproduction does not fit in one context window. The loop is, in its simplest
 ## When to use this skill
 
 - The user has a paper (DOI, arXiv ID, or PDF) and wants to reproduce its analysis
-- The user invokes `/paper2astra` (with or without an argument)
+- The user invokes `/lc-from-paper` (with or without an argument)
 - The user is starting a fresh reproduction project under `Reproductions/<collab>/<short-name>/`
 - An existing paper-reproduction workdir needs the next phase driven forward (in which case skip the interview, see "Resuming an in-flight reproduction" below)
 
 ## The bundle
 
-paper2astra composes the rest of the lightcone-cli paper-reproduction bundle. All siblings live in the same `claude/lightcone/skills/` directory and are available without separate installs:
+lc-from-paper composes the rest of the lightcone-cli paper-reproduction bundle. All siblings live in the same `claude/lightcone/skills/` directory and are available without separate installs:
 
 | Sibling skill | Where it's invoked |
 |---|---|
@@ -41,7 +41,7 @@ paper2astra composes the rest of the lightcone-cli paper-reproduction bundle. Al
 | [`/ralph-loops`](../ralph-loops/SKILL.md) | After interview — launches the loop that drives all subsequent phases (when the chosen runtime mode is one of the loop modes) |
 | [`/narrative`](../narrative/SKILL.md) | SPECIFY — authoring the `narrative:` and `rationale:` prose in `astra.yaml` |
 
-paper2astra does not re-implement what these skills already do — it tells the agent at each phase to invoke them. The siblings stand alone; they don't know about paper2astra.
+lc-from-paper does not re-implement what these skills already do — it tells the agent at each phase to invoke them. The siblings stand alone; they don't know about lc-from-paper.
 
 Two further siblings are invoked from **REVIEW** (the close-out), the always-interactive phase that runs after the COMPARE → IMPLEMENT loop terminates: [`/figure-comparison`](../figure-comparison/SKILL.md) builds a portable side-by-side HTML report (paper artifacts vs reproduced), and [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) (optional) audits paper claims against code locations. Both have `AskUserQuestion` in their `allowed-tools`, so REVIEW runs interactively in the main loop session — spawning them under the `Task` tool would fire prompts into nothing.
 
@@ -67,7 +67,7 @@ The interview has six jobs:
 
    CLAUDE.md tells you *what kind of place this is*; the constitution tells you *what we're doing here and when we're done*.
 
-Both files live inside the reproduction's directory. After they are approved the interview ends, and paper2astra launches whichever runtime the user chose.
+Both files live inside the reproduction's directory. After they are approved the interview ends, and lc-from-paper launches whichever runtime the user chose.
 
 ### Runtime modes
 
@@ -77,7 +77,7 @@ The interview asks the user to pick *how* the loop runs. Three modes, picked fro
 |---|---|---|
 | **(1) Interactive** | No autonomous loop. The user prompts through phases by hand from the same Claude session, one or two phases at a time. | Tight control, small paper, or token budget is tight. No new substrate beyond Claude itself. |
 | **(2) Bash-loop** | A plain shell loop the user pastes into a terminal (`while …; do claude --dangerously-skip-permissions … ; done`-shaped). No tmux dependency. | Tmux isn't available locally and the connection is stable. Fragile across SSH disconnects unless wrapped in `nohup` — and `nohup` blocks interaction, so for unstable connections this isn't really a fix; mode (3) is. |
-| **(3) Tmux-orchestrated** | A loop inside a tmux session paper2astra drives directly via `../ralph-loops/scripts/ralph`. Survives SSH disconnects; the skill sends keystrokes to the tmux pane, monitors, intervenes. | The smoothest path whenever tmux is available. Becomes the de-facto default once `lc launch claude` ships its registry-shipped python-slim agent container with tmux pre-installed. |
+| **(3) Tmux-orchestrated** | A loop inside a tmux session lc-from-paper drives directly via `../ralph-loops/scripts/ralph`. Survives SSH disconnects; the skill sends keystrokes to the tmux pane, monitors, intervenes. | The smoothest path whenever tmux is available. Becomes the de-facto default once `lc launch claude` ships its registry-shipped python-slim agent container with tmux pre-installed. |
 
 The interview probes for tmux availability with `command -v tmux` and only offers mode (3) when present. Mode (3) is preferred when it's available; it isn't required.
 
@@ -205,7 +205,7 @@ Workdir signals (file existence implies the phase has been done):
 
 ## Discipline
 
-- **paper2astra is the workflow story; phase references are the depth.** SKILL.md tells you when to read which reference; the references carry the prompt prose ported from the legacy Paper2ASTRA Python package.
+- **lc-from-paper is the workflow story; phase references are the depth.** SKILL.md tells you when to read which reference; the references carry the prompt prose ported from the legacy Paper2ASTRA Python package.
 - **Workdir is the state.** No state machine, no resume mechanic — file existence + `git log` + `astra validate` answer "what phase am I on" deterministically. Each iteration's first move is *survey*.
 - **Deterministic checks live in scripts.** When the answer is yes/no, call the script — `astra validate`, `git log`, `yq`, `ls`. Don't ask the agent to introspect what a deterministic check would tell you.
 - **Use the up-to-date CLI surfaces, not skill-specific wrappers.** When `astra validate` already does the job, call it directly. Specifically: `astra validate <file>`, `astra validate --verify-evidence`, `astra paper add`. Use whatever the current `astra --help` surfaces.
@@ -217,7 +217,7 @@ Workdir signals (file existence implies the phase has been done):
 - **ARCHITECT decides structure; SPECIFY decides content.** ARCHITECT's two parallel Explore sub-agents (paper-side + code-side) feed a synthesis sub-agent that writes the stub `astra.yaml` — sub-analyses, inputs, outputs, narrative prose. SPECIFY's per-sub-analysis paper pass + code pass + self-review fills in `decisions:`, `prior_insights:`, `findings:` and weaves anchor references into the narrative. Splitting **structure** from **content** keeps each phase's cognitive load bounded.
 - **No synthetic data.** Unless the paper itself uses synthetic data as its input, every input dataset must be real (downloaded, queried, or fetched from a real archive). The implement phase reference repeats this; treat it as load-bearing.
 - **Tmux preferred-when-available, never required.** Modes (1) and (2) work without it.
-- **The siblings don't know about paper2astra.** Each SKILL stands on its own.
+- **The siblings don't know about lc-from-paper.** Each SKILL stands on its own.
 - **Workdir conventions stay.** The phase references preserve Paper2ASTRA's workdir layout (`work/reference/`, `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`) so workdirs from the legacy Paper2ASTRA package are interoperable with workdirs driven by this skill.
 
 ## Anti-patterns
diff --git a/claude/lightcone/skills/paper2astra/references/acquire.md b/claude/lightcone/skills/lc-from-paper/references/acquire.md
similarity index 94%
rename from claude/lightcone/skills/paper2astra/references/acquire.md
rename to claude/lightcone/skills/lc-from-paper/references/acquire.md
index 89bf6b2e..6828dc5d 100644
--- a/claude/lightcone/skills/paper2astra/references/acquire.md
+++ b/claude/lightcone/skills/lc-from-paper/references/acquire.md
@@ -1,6 +1,6 @@
 # ACQUIRE — fetch the paper, structure it, clone the code
 
-Acquire the paper's reading materials and (when available) clone the reference code repository. The substrate work — LaTeX-source download, Docling fallback, figures, tables, outline, citations, embedded bibliography, paper-as-ASTRA-artifact — is delegated to **`/paper-extraction`**, which paper2astra trusts blindly. ACQUIRE adds **Step 2: code-clone**, which is reproduction-specific and stays here.
+Acquire the paper's reading materials and (when available) clone the reference code repository. The substrate work — LaTeX-source download, Docling fallback, figures, tables, outline, citations, embedded bibliography, paper-as-ASTRA-artifact — is delegated to **`/paper-extraction`**, which lc-from-paper trusts blindly. ACQUIRE adds **Step 2: code-clone**, which is reproduction-specific and stays here.
 
 The constitution's per-phase mode controls whether this runs interactively or as a sub-agent. Default is sub-agent — surfacing happens only on download failures.
 
@@ -35,7 +35,7 @@ Invoke `/paper-extraction <arxiv-id-or-doi>`. The skill is idempotent — it sur
 /paper-extraction <arxiv-id-or-doi>
 ```
 
-This produces everything under `work/reference/` *except* the code clone. paper2astra ACQUIRE does not re-implement the substrate logic; if something is wrong with the substrate, fix it in `/paper-extraction`, not here.
+This produces everything under `work/reference/` *except* the code clone. lc-from-paper ACQUIRE does not re-implement the substrate logic; if something is wrong with the substrate, fix it in `/paper-extraction`, not here.
 
 Two starting surfaces: `work/reference/index.json` (structural — figures, tables, outline, citations with line numbers) and `work/reference/astra.yaml` (semantic — the paper as an ASTRA artifact, with `findings:` carrying the paper's central numerical claims as quote-anchored evidence). ARCHITECT reads index.json when its Explore sub-agents fan out across the paper; SPECIFY reads astra.yaml when authoring `prior_insights:` against the paper's claims (the paper's `findings:` map directly to a reproduction's `prior_insights:`).
 
diff --git a/claude/lightcone/skills/paper2astra/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
similarity index 100%
rename from claude/lightcone/skills/paper2astra/references/architect.md
rename to claude/lightcone/skills/lc-from-paper/references/architect.md
diff --git a/claude/lightcone/skills/paper2astra/references/compare.md b/claude/lightcone/skills/lc-from-paper/references/compare.md
similarity index 100%
rename from claude/lightcone/skills/paper2astra/references/compare.md
rename to claude/lightcone/skills/lc-from-paper/references/compare.md
diff --git a/claude/lightcone/skills/paper2astra/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
similarity index 100%
rename from claude/lightcone/skills/paper2astra/references/implement.md
rename to claude/lightcone/skills/lc-from-paper/references/implement.md
diff --git a/claude/lightcone/skills/paper2astra/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/interview.md
similarity index 94%
rename from claude/lightcone/skills/paper2astra/references/interview.md
rename to claude/lightcone/skills/lc-from-paper/references/interview.md
index 7afb7a59..d29f839b 100644
--- a/claude/lightcone/skills/paper2astra/references/interview.md
+++ b/claude/lightcone/skills/lc-from-paper/references/interview.md
@@ -1,6 +1,6 @@
 # Interview — drafting the per-paper reproduction constitution and CLAUDE.md
 
-The interview is the only phase paper2astra runs interactively. It happens once per project, up front, before any loop is launched. Its job is to crystallize what the user actually wants — which paper, what scope, which runtime, which seams want their attention, which they want delegated — and bake that into the artifacts every iteration walks up to.
+The interview is the only phase lc-from-paper runs interactively. It happens once per project, up front, before any loop is launched. Its job is to crystallize what the user actually wants — which paper, what scope, which runtime, which seams want their attention, which they want delegated — and bake that into the artifacts every iteration walks up to.
 
 Use the [`/constitution`](../../constitution/SKILL.md) skill to draft the constitution. The interview's job is to *gather* the inputs both the constitution and the per-paper `CLAUDE.md` need; the constitution skill carries the discipline of writing the constitution.
 
@@ -13,13 +13,13 @@ The interview produces a **directory for the reproduction** containing two markd
 - **`<paper-slug>/CLAUDE.md`** — *info and rules.* Paper identity (DOI / arxiv id / authors / one-line subject), where the original code lives (`work/reference/code/`), the canonical-resolution rule (code-as-canonical when `work/reference/code/` exists), the never-block-on-`AskUserQuestion`-mid-sub-agent rule, any paper-specific conventions or warnings, pointers to the constitution and `open-questions.md`. Auto-loaded by Claude Code on every walk-up to this directory. **Evolves over time** — iterations that learn new conventions or surface paper-specific gotchas can add lines so future sessions don't re-derive the same context.
 - **`<paper-slug>/<constitution>.md`** — *desired state.* Pointers (not snapshots) for the runner: what "done" looks like, evidence checks, scope fence, the runtime mode the user chose, the termination criterion (weak/strong), the per-phase mode table, and the open-questions section iterations resolve. Read by the runner each iteration as the explicit task.
 
-Both are written at the end of the interview from the same conversation. CLAUDE.md tells you *what kind of place this is*; the constitution tells you *what we're doing here and when we're done*. After they are approved, paper2astra launches whichever runtime the user chose:
+Both are written at the end of the interview from the same conversation. CLAUDE.md tells you *what kind of place this is*; the constitution tells you *what we're doing here and when we're done*. After they are approved, lc-from-paper launches whichever runtime the user chose:
 
 | Runtime | Launch |
 |---|---|
 | **(1) Interactive** | No launch. The user prompts through phases by hand from this Claude session. |
 | **(2) Bash-loop** | Show the user the loop snippet to paste into a terminal — `while …; do claude --dangerously-skip-permissions … ; done`-shaped. |
-| **(3) Tmux-orchestrated** | `../ralph-loops/scripts/ralph <constitution>.md` — paper2astra drives the tmux session directly. |
+| **(3) Tmux-orchestrated** | `../ralph-loops/scripts/ralph <constitution>.md` — lc-from-paper drives the tmux session directly. |
 
 There is no separate "interview state" file. Everything lives in the two artifacts and the workdir.
 
@@ -29,7 +29,7 @@ There is no separate "interview state" file. Everything lives in the two artifac
 
 ### 1. Identify the paper
 
-Use `AskUserQuestion` if the user did not supply enough on `/paper2astra` invocation:
+Use `AskUserQuestion` if the user did not supply enough on `/lc-from-paper` invocation:
 
 - **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).
 - **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) **If code is available, every implementing iteration will read from `work/reference/code/`** and treat code as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md).
@@ -60,7 +60,7 @@ Offer the modes the environment supports:
 
 - **(1) Interactive** — no autonomous loop; the user prompts through phases by hand from this Claude session. Right when control is tight, the paper is small, or the token budget is constrained.
 - **(2) Bash-loop** — a plain shell loop the user pastes into a terminal. No tmux dependency. Right when tmux isn't available *and* the connection is stable. Fragile across SSH disconnects unless wrapped in `nohup`, and `nohup` blocks interaction — so for unstable connections, mode (3) is the answer, not this.
-- **(3) Tmux-orchestrated** — paper2astra drives a tmux session directly via `../ralph-loops/scripts/ralph`. Survives SSH disconnects; the skill sends keystrokes to the pane, monitors, intervenes. Preferred when tmux is available.
+- **(3) Tmux-orchestrated** — lc-from-paper drives a tmux session directly via `../ralph-loops/scripts/ralph`. Survives SSH disconnects; the skill sends keystrokes to the pane, monitors, intervenes. Preferred when tmux is available.
 
 If tmux isn't installed, only (1) and (2) appear in the question. The chosen mode goes into the per-paper constitution.
 
diff --git a/claude/lightcone/skills/paper2astra/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
similarity index 100%
rename from claude/lightcone/skills/paper2astra/references/literature.md
rename to claude/lightcone/skills/lc-from-paper/references/literature.md
diff --git a/claude/lightcone/skills/paper2astra/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
similarity index 100%
rename from claude/lightcone/skills/paper2astra/references/review.md
rename to claude/lightcone/skills/lc-from-paper/references/review.md
diff --git a/claude/lightcone/skills/paper2astra/references/run.md b/claude/lightcone/skills/lc-from-paper/references/run.md
similarity index 100%
rename from claude/lightcone/skills/paper2astra/references/run.md
rename to claude/lightcone/skills/lc-from-paper/references/run.md
diff --git a/claude/lightcone/skills/paper2astra/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
similarity index 100%
rename from claude/lightcone/skills/paper2astra/references/specify.md
rename to claude/lightcone/skills/lc-from-paper/references/specify.md
diff --git a/claude/lightcone/skills/lc-new/SKILL.md b/claude/lightcone/skills/lc-from-question/SKILL.md
similarity index 99%
rename from claude/lightcone/skills/lc-new/SKILL.md
rename to claude/lightcone/skills/lc-from-question/SKILL.md
index b4060cde..50fcaa3c 100644
--- a/claude/lightcone/skills/lc-new/SKILL.md
+++ b/claude/lightcone/skills/lc-from-question/SKILL.md
@@ -1,10 +1,10 @@
 ---
-name: lc-new
+name: lc-from-question
 description: Use this skill whenever the user starts a new ASTRA analysis from a research question — scoping the question, structuring inputs and outputs, identifying decisions through literature, and landing astra.yaml + project CLAUDE.md. Triggers on verbs (`new`, `start`, `scope`) combined with nouns (`analysis`, `project`, `question`, `research`) — e.g. "new analysis", "start project", "scope research question" — even if the user doesn't say "project" explicitly. Don't use this for working inside an existing ASTRA project; this is for fresh scoping only.
 allowed-tools: Read, Write(astra.yaml), Write(universes/*), Write(CLAUDE.md), Edit(astra.yaml), Edit(universes/*), Edit(CLAUDE.md), Glob, Grep, Bash(astra:*), Bash(lc:*), WebSearch, WebFetch, AskUserQuestion, Agent
 ---
 
-# /lc-new
+# /lc-from-question
 
 Create a new ASTRA analysis project through conversation. Build the spec iteratively -- write to `astra.yaml` after each phase so the user sees progress. Literature search and decision identification happen in distinct phases -- talk first, then extract papers, then identify decisions informed by both conversation and literature.
 
diff --git a/claude/lightcone/skills/narrative/SKILL.md b/claude/lightcone/skills/narrative/SKILL.md
index 441336b6..456d8923 100644
--- a/claude/lightcone/skills/narrative/SKILL.md
+++ b/claude/lightcone/skills/narrative/SKILL.md
@@ -24,7 +24,7 @@ Per-element prose (what each `Input`, `Output`, `Decision`, `Option`, or `Insigh
 `narrative` is the analysis-level story that weaves the pieces together.
 
 This skill is also part of the lightcone-cli paper-reproduction bundle: the
-`/paper2astra` orchestrator invokes it during the SPECIFY phase to author the
+`/lc-from-paper` orchestrator invokes it during the SPECIFY phase to author the
 narrative for the spec it has just crafted. Sibling skills in the bundle —
 `constitution`, `ralph-loops`, `paper-extraction`,
 `check-sentence-by-sentence`, `figure-comparison` — solve adjacent pieces of
diff --git a/docs/cli/init.md b/docs/cli/init.md
index 98f5abf9..aee28326 100644
--- a/docs/cli/init.md
+++ b/docs/cli/init.md
@@ -41,7 +41,7 @@ universes/                    # placeholder; populate via `astra universe genera
 > The historical `--target`, `--existing-project`, and `--sub-analysis`
 > flags have been removed; today's `lc init` only knows the three flags
 > above. For migrating an existing project, run `lc init` in a fresh
-> directory and use the `/lc-migrate` skill from inside Claude Code.
+> directory and use the `/lc-from-code` skill from inside Claude Code.
 
 ## Permission tiers
 
@@ -70,7 +70,7 @@ lc init . --permissions yolo           # for autonomous loops you trust
 cd my-analysis
 claude           # open Claude Code
 # Inside Claude Code:
-/lc-new          # scope a research question into astra.yaml
-/lc-build        # implement and run it
-/lc-verify       # audit the result
+/lc-from-question  # scope a research question into astra.yaml
+/lc-build          # implement and run it
+/lc-verify         # audit the result
 ```
diff --git a/docs/index.md b/docs/index.md
index 56144227..b25d2e0e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -56,7 +56,8 @@ src/lightcone/                  # PEP 420 namespace package — NO __init__.py
 src/snakemake_executor_plugin_dask/   # Snakemake executor → dask.distributed
 
 claude/lightcone/               # Claude Code plugin (force-included into the wheel)
-├── skills/                     # lc-new, lc-build, lc-verify, lc-migrate, lc-feedback
+├── skills/                     # lc-from-question, lc-from-code, lc-from-paper,
+│                                # lc-build, lc-verify, lc-feedback (+ bundle siblings)
 ├── agents/                     # lc-extractor (literature subagent)
 ├── guides/                     # astra-reference, lightcone-cli-reference, ui-brand
 ├── templates/                  # project CLAUDE.md template
@@ -129,5 +130,5 @@ just docs-serve     # live docs preview
 - [Architecture](architecture.md) — the full execution and integrity story
 - [CLI Reference](cli/index.md) — every command currently shipped
 - [Python API](api/index.md) — the engine modules
-- [Skills](skills/index.md) — what each `/lc-*` skill is supposed to do
+- [Skills](skills/index.md) — what each `/lc-*` skill does (including the `/lc-from-*` family)
 - [Contributing](contributing/setup.md) — getting the dev loop running
diff --git a/docs/skills/index.md b/docs/skills/index.md
index 5100013b..1270a2fc 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -10,12 +10,16 @@ guide is the friendly version. This page is for maintainers.
 
 ## Available skills
 
+The `/lc-from-*` family is parallel by what you start from: a question,
+code, or a paper.
+
 | Skill | Command | Purpose |
 |-------|---------|---------|
-| [lc-new](lc-new.md) | `/lc-new` | Scope a research question into an `astra.yaml`, with optional literature extraction. |
+| [lc-from-question](lc-from-question.md) | `/lc-from-question` | Scope a research question into an `astra.yaml`, with optional literature extraction. |
+| [lc-from-code](lc-from-code.md) | `/lc-from-code` | Wrap an existing codebase in ASTRA: scan, generate spec, parameterize, run. |
+| lc-from-paper | `/lc-from-paper` | Reproduce a published paper in ASTRA — interview-first orchestrator, multi-session loop. (See the paper-reproduction bundle in [`claude/lightcone/skills/README.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md) for the full bundle map.) |
 | [lc-build](lc-build.md) | `/lc-build` | Plan + autonomous loop until all outputs in a universe materialize. |
 | [lc-verify](lc-verify.md) | `/lc-verify` | Read-only audit: spec validity, materialization status, decision-code alignment, result file shapes. |
-| [lc-migrate](lc-migrate.md) | `/lc-migrate` | Wrap an existing codebase in ASTRA: scan, generate spec, parameterize, run. |
 | [lc-feedback](lc-feedback.md) | `/lc-feedback` | File a GitHub issue against the right Lightcone repo with auto-collected context. |
 
 ## How a skill is wired
@@ -44,11 +48,13 @@ files, anti-patterns. The skill bundles its own helper scripts under
 ```
 claude/lightcone/
 ├── skills/
-│   ├── lc-new/SKILL.md
+│   ├── lc-from-question/SKILL.md
+│   ├── lc-from-code/SKILL.md
+│   ├── lc-from-paper/{SKILL.md, references/*.md}
 │   ├── lc-build/{SKILL.md, assets/loop-prompt.md, scripts/setup-lc-build.sh}
 │   ├── lc-verify/SKILL.md
-│   ├── lc-migrate/SKILL.md
-│   └── lc-feedback/SKILL.md
+│   ├── lc-feedback/SKILL.md
+│   └── …                              # paper-reproduction bundle siblings
 ├── agents/lc-extractor.md             # subagent definition
 ├── guides/                            # reference docs loaded by skills
 ├── templates/CLAUDE.md                # the project CLAUDE.md template
@@ -63,10 +69,10 @@ The plugin is force-included into the wheel via
 
 | File | Purpose |
 |------|---------|
-| `claude/lightcone/guides/astra-reference.md` | Full `astra.yaml` schema reference. Loaded by `lc-new`, `lc-build`, `lc-migrate`. |
+| `claude/lightcone/guides/astra-reference.md` | Full `astra.yaml` schema reference. Loaded by `lc-from-question`, `lc-build`, `lc-from-code`. |
 | `claude/lightcone/guides/lightcone-cli-reference.md` | CLI commands, status interpretation, failure diagnosis. Loaded by build/verify skills. |
 | `claude/lightcone/guides/ui-brand.md` | Visual formatting conventions for skill output. |
-| `claude/lightcone/agents/lc-extractor.md` | Literature extraction subagent invoked by `/lc-new`. |
+| `claude/lightcone/agents/lc-extractor.md` | Literature extraction subagent invoked by `/lc-from-question`. |
 
 ## Authoring a new skill
 
diff --git a/docs/skills/lc-migrate.md b/docs/skills/lc-from-code.md
similarity index 84%
rename from docs/skills/lc-migrate.md
rename to docs/skills/lc-from-code.md
index 3ba5f9f2..0d8281d4 100644
--- a/docs/skills/lc-migrate.md
+++ b/docs/skills/lc-from-code.md
@@ -1,11 +1,11 @@
-# /lc-migrate
+# /lc-from-code
 
-Migrate an existing project into ASTRA / lightcone-cli. Scans the
-code, generates `astra.yaml`, parameterizes hardcoded analytical
-choices, and runs until outputs materialize. Existing logic stays
-intact — changes should be minimal.
+Bring an existing project into ASTRA / lightcone-cli, starting from the
+code. Scans the codebase, generates `astra.yaml`, parameterizes hardcoded
+analytical choices, and runs until outputs materialize. Existing logic
+stays intact — changes should be minimal.
 
-Source: [`claude/lightcone/skills/lc-migrate/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-migrate/SKILL.md).
+Source: [`claude/lightcone/skills/lc-from-code/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-from-code/SKILL.md).
 
 ## Allowed tools
 
@@ -80,6 +80,6 @@ present the summary.
 
 ## Related
 
-- [`/lc-new`](lc-new.md) — for greenfield analyses.
+- [`/lc-from-question`](lc-from-question.md) — for greenfield analyses.
 - [`/lc-verify`](lc-verify.md) — run after migration to confirm
   spec-code-results alignment.
diff --git a/docs/skills/lc-new.md b/docs/skills/lc-from-question.md
similarity index 86%
rename from docs/skills/lc-new.md
rename to docs/skills/lc-from-question.md
index a86bcf4a..144f8fa6 100644
--- a/docs/skills/lc-new.md
+++ b/docs/skills/lc-from-question.md
@@ -1,10 +1,10 @@
-# /lc-new
+# /lc-from-question
 
-Scope a new ASTRA analysis through conversation. Produces a complete
-`astra.yaml` (and optionally a literature evidence trail) with no code
-written.
+Scope a new ASTRA analysis from a research question through conversation.
+Produces a complete `astra.yaml` (and optionally a literature evidence
+trail) with no code written.
 
-Source: [`claude/lightcone/skills/lc-new/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-new/SKILL.md).
+Source: [`claude/lightcone/skills/lc-from-question/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-from-question/SKILL.md).
 
 ## Allowed tools
 
@@ -42,6 +42,7 @@ at the end so the user has something visible to review at every step.
 
 ## Hard restrictions (from the SKILL.md)
 
+
 - Specification agent only — cannot write Python, R, or other
   implementation code.
 - Files it may touch: `astra.yaml`, `universes/*.yaml`, `CLAUDE.md`
@@ -62,6 +63,6 @@ at the end so the user has something visible to review at every step.
 
 ## Related
 
-- [`/lc-build`](lc-build.md) — the next step after `/lc-new`.
+- [`/lc-build`](lc-build.md) — the next step after `/lc-from-question`.
 - [`claude/lightcone/guides/astra-reference.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/astra-reference.md) — `astra.yaml` schema, decision criteria, prior insights / findings, universe management.
 - [`claude/lightcone/agents/lc-extractor.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/agents/lc-extractor.md) — the literature extraction subagent definition.
diff --git a/docs/user/agent-workflow.md b/docs/user/agent-workflow.md
index 502194cc..33ad6ff2 100644
--- a/docs/user/agent-workflow.md
+++ b/docs/user/agent-workflow.md
@@ -1,7 +1,9 @@
 # The Agent Workflow
 
-The agentic surface is five slash commands. Each one is a structured
-prompt — the agent follows a specific phased flow, not free-form chat.
+The agentic surface is six slash commands. The `/lc-from-*` family is
+parallel by what you start from — a question, code, or a paper — and
+the build/verify/feedback skills follow. Each one is a structured
+prompt: the agent follows a specific phased flow, not free-form chat.
 This page walks through each of them in the order you'd naturally hit
 them.
 
@@ -9,7 +11,7 @@ them.
 > writes to disk. You stay in charge of approving everything; the agent
 > never publishes a paper for you.
 
-## `/lc-new` — scope a new analysis
+## `/lc-from-question` — scope a new analysis
 
 **You start with a research question. You end with a complete
 `astra.yaml` (and optionally a literature evidence trail).**
@@ -37,9 +39,9 @@ The skill walks you through four phases:
    universe; the `## Working Notes` section of `CLAUDE.md` gets the
    conversational context that wouldn't otherwise survive a `/clear`.
 
-You don't write any code or YAML during `/lc-new`. By the time it
-finishes, you have a precise specification. The agent enforces this:
-the skill is *only allowed* to edit `astra.yaml`, files in
+You don't write any code or YAML during `/lc-from-question`. By the
+time it finishes, you have a precise specification. The agent enforces
+this: the skill is *only allowed* to edit `astra.yaml`, files in
 `universes/`, and `CLAUDE.md`.
 
 ## `/lc-build` — implement and run
@@ -84,14 +86,14 @@ The skill never modifies anything. If it finds a discrepancy, it
 suggests concrete fixes; you re-run `/lc-build` (or fix by hand) and
 re-verify.
 
-## `/lc-migrate` — wrap existing code
+## `/lc-from-code` — wrap existing code
 
 **You have a folder of scripts. You end with an ASTRA project around
 them.**
 
 When you have an existing analysis (a notebook, a folder of `.py`
-files, a config-driven pipeline), `/lc-migrate` does the wrapping for
-you. Three phases:
+files, a config-driven pipeline), `/lc-from-code` does the wrapping
+for you. Three phases:
 
 1. **Scan.** A subagent reads every script and notebook and returns a
    structured inventory: what each script reads, writes, and contains
@@ -104,9 +106,30 @@ you. Three phases:
    identified decisions, leaves the actual analytical logic alone, and
    iterates on `lc run` until everything materializes.
 
-The hard rule of `/lc-migrate` is **minimal changes**: the skill never
-refactors, renames, or "improves" your code. It only adds the parameter
-plumbing.
+The hard rule of `/lc-from-code` is **minimal changes**: the skill
+never refactors, renames, or "improves" your code. It only adds the
+parameter plumbing.
+
+## `/lc-from-paper` — reproduce a published paper
+
+**You have a DOI or arXiv ID. You end with a reproduction project
+driven by a multi-session loop.**
+
+`/lc-from-paper` is the entry point of the paper-reproduction bundle.
+It opens with a short interactive interview — paper identity, scope
+(full vs targeted), runtime mode (interactive, bash-loop, or
+tmux-orchestrated), termination criterion (frugality vs rigor), and
+per-phase mode — then drafts a per-paper reproduction constitution and
+a per-paper `CLAUDE.md`. After approval, the loop drives eight phases
+(ACQUIRE → ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN →
+COMPARE → REVIEW); with the up-front INTERVIEW that makes nine. INTERVIEW
+and REVIEW are the always-interactive seams.
+
+The bundle composes sibling skills: `paper-extraction`, `constitution`,
+`ralph-loops`, `narrative`, `figure-comparison`, and
+`check-sentence-by-sentence`. See
+[`claude/lightcone/skills/README.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md)
+for the full bundle map.
 
 ## `/lc-feedback` — file an issue without context-switching
 
diff --git a/docs/user/getting-started.md b/docs/user/getting-started.md
index 3f022e0b..dde6fdfd 100644
--- a/docs/user/getting-started.md
+++ b/docs/user/getting-started.md
@@ -57,16 +57,19 @@ claude
 That opens an interactive session inside `my-analysis/`. Claude Code
 reads `astra.yaml` and `CLAUDE.md` so it has context.
 
-## 4. The five slash commands
+## 4. The slash commands
 
-Inside Claude Code:
+Inside Claude Code, the `/lc-from-*` family is parallel by what you
+start from — a question, code, or a paper — and the build/verify/feedback
+skills follow.
 
 | Command | Use it when… |
 |---------|--------------|
-| `/lc-new` | You're starting from a research question and an empty `astra.yaml`. |
+| `/lc-from-question` | You're starting from a research question and an empty `astra.yaml`. |
+| `/lc-from-code` | You have an existing codebase you want wrapped in ASTRA. |
+| `/lc-from-paper` | You have a published paper (DOI / arXiv ID) you want to reproduce. |
 | `/lc-build` | You have a scoped `astra.yaml` and you want the analysis implemented and run. |
 | `/lc-verify` | You finished a build and want a read-only audit. |
-| `/lc-migrate` | You have an existing codebase you want wrapped in ASTRA. |
 | `/lc-feedback` | Something broke and you want to file a GitHub issue without leaving the session. |
 
 The next page, [The Claude Code Workflow](claude-workflow.md),
diff --git a/docs/user/glossary.md b/docs/user/glossary.md
index be072f4c..f2e94eec 100644
--- a/docs/user/glossary.md
+++ b/docs/user/glossary.md
@@ -139,18 +139,20 @@ node launched via `srun`.
 
 ## Skill
 
-A Claude Code slash command bundled with the lightcone-cli plugin
-(`/lc-new`, `/lc-build`, `/lc-verify`, `/lc-migrate`,
-`/lc-feedback`). Each one is a structured prompt that drives the
-agent through a specific phased workflow.
+A Claude Code slash command bundled with the lightcone-cli plugin.
+The `/lc-from-*` family is parallel by what you start from — a question
+(`/lc-from-question`), code (`/lc-from-code`), or a paper
+(`/lc-from-paper`). The build/verify/feedback skills (`/lc-build`,
+`/lc-verify`, `/lc-feedback`) follow. Each one is a structured prompt
+that drives the agent through a specific phased workflow.
 
 ## Subagent
 
 A Claude Code agent invoked by another agent via the `Task` tool. The
 `lc-extractor` subagent reads PDFs and pulls verifiable quotes; it's
-spawned by `/lc-new` during the literature deep-dive phase. Subagents
-have isolated context, which is why `/lc-new` uses one per paper —
-PDFs are big.
+spawned by `/lc-from-question` during the literature deep-dive phase.
+Subagents have isolated context, which is why `/lc-from-question` uses
+one per paper — PDFs are big.
 
 ## Prior insight
 
diff --git a/docs/user/index.md b/docs/user/index.md
index c96a3316..af21edaa 100644
--- a/docs/user/index.md
+++ b/docs/user/index.md
@@ -15,9 +15,9 @@ implementation; you stay in charge of the scientific choices.
   machine.
 - [Getting Started](getting-started.md) — your first `lc init` and
   what every directory means.
-- [The Claude Code Workflow](claude-workflow.md) — `/lc-new`,
-  `/lc-build`, `/lc-verify`, `/lc-migrate`, `/lc-feedback` — what each
-  one does and when to reach for it.
+- [The Claude Code Workflow](claude-workflow.md) — `/lc-from-question`,
+  `/lc-from-code`, `/lc-from-paper`, `/lc-build`, `/lc-verify`,
+  `/lc-feedback` — what each one does and when to reach for it.
 - [Tutorial: Your First Analysis](tutorial.md) — an end-to-end worked
   example, written so you can read it without running anything.
 - [Multiverse Analyses](multiverse.md) — how to explore alternative
@@ -36,14 +36,14 @@ implementation; you stay in charge of the scientific choices.
         ```bash
         uv tool install lightcone-cli
         lc init my-analysis && cd my-analysis
-        claude                                 # then, inside Claude Code: /lc-new
+        claude                                 # then, inside Claude Code: /lc-from-question
         ```
 
     === "pip"
         ```bash
         pip install lightcone-cli
         lc init my-analysis && cd my-analysis
-        claude                                # then, inside Claude Code: /lc-new
+        claude                                # then, inside Claude Code: /lc-from-question
         ```
 
 That's the shortest possible path. The rest of the guide is the
diff --git a/docs/user/install.md b/docs/user/install.md
index c9708150..1a3b5a7c 100644
--- a/docs/user/install.md
+++ b/docs/user/install.md
@@ -78,8 +78,8 @@ Open a project (in the next page we make one) with:
 claude
 ```
 
-Inside Claude Code you'll type slash commands like `/lc-new` and
-`/lc-build` — see [The Claude Code Workflow](claude-workflow.md).
+Inside Claude Code you'll type slash commands like `/lc-from-question`
+and `/lc-build` — see [The Claude Code Workflow](claude-workflow.md).
 
 ## 5. (Optional) Docker or Podman
 
diff --git a/docs/user/multiverse.md b/docs/user/multiverse.md
index e36e1ce4..77ccc233 100644
--- a/docs/user/multiverse.md
+++ b/docs/user/multiverse.md
@@ -28,7 +28,7 @@ decisions:
       none:  { label: "No outlier removal" }
 ```
 
-A few rules of thumb (the `/lc-new` skill enforces these):
+A few rules of thumb (the `/lc-from-question` skill enforces these):
 
 - **One choice, one decision.** Don't bundle "preprocessing strategy"
   into one decision with five options that mix different axes.
@@ -140,9 +140,9 @@ with its own `astra.yaml` and own decisions. The full tree is
 resolved automatically; sub-analyses can refer to each other's
 outputs.
 
-`/lc-new` will ask "should this be one analysis or several?" and
-help you split. The default answer is one — split only when each part
-genuinely has different inputs and outputs.
+`/lc-from-question` will ask "should this be one analysis or several?"
+and help you split. The default answer is one — split only when each
+part genuinely has different inputs and outputs.
 
 ## What lightcone-cli does *not* do
 
diff --git a/docs/user/troubleshooting.md b/docs/user/troubleshooting.md
index 64848509..fa759aa6 100644
--- a/docs/user/troubleshooting.md
+++ b/docs/user/troubleshooting.md
@@ -161,12 +161,12 @@ PY
 ## I want to start the spec over
 
 Move `astra.yaml` aside (don't delete it — agents like having context
-about what you tried), then `/lc-new` again:
+about what you tried), then `/lc-from-question` again:
 
 ```bash
 mv astra.yaml astra.previous.yaml
 claude
-# /lc-new
+# /lc-from-question
 ```
 
 ## File a bug from inside the session
diff --git a/docs/user/tutorial.md b/docs/user/tutorial.md
index 2076f6b8..2d66c68d 100644
--- a/docs/user/tutorial.md
+++ b/docs/user/tutorial.md
@@ -31,12 +31,12 @@ You're now in Claude Code, sitting in a fresh project. The first line
 on the screen is the session start banner, which probably says "no
 recipes yet."
 
-## 2. Scope the analysis with `/lc-new`
+## 2. Scope the analysis with `/lc-from-question`
 
 Type:
 
 ```
-/lc-new
+/lc-from-question
 ```
 
 The agent banner switches to **RESEARCH QUESTION** and asks something
diff --git a/zensical.toml b/zensical.toml
index f38bc093..962ad903 100644
--- a/zensical.toml
+++ b/zensical.toml
@@ -46,10 +46,10 @@ nav = [
     ]},
     {"Skills" = [
       {"Overview" = "skills/index.md"},
-      {"lc-new" = "skills/lc-new.md"},
+      {"lc-from-question" = "skills/lc-from-question.md"},
+      {"lc-from-code" = "skills/lc-from-code.md"},
       {"lc-build" = "skills/lc-build.md"},
       {"lc-verify" = "skills/lc-verify.md"},
-      {"lc-migrate" = "skills/lc-migrate.md"},
       {"lc-feedback" = "skills/lc-feedback.md"},
       {"Authoring Skills" = "skills/authoring.md"},
     ]},

From 0b5b39c27480d104755be8cad860e7012d84d101 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sat, 9 May 2026 02:50:41 +0200
Subject: [PATCH 022/124] restore lc-new (don't rename to lc-from-question)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous commit renamed three skills, but lc-new should stay as is:
the lc-from-* family is for substrate-bearing entries (code, paper),
while lc-new is the fresh-idea entry — "I'm starting something" rather
than "I have this material to ASTRA-ify." The asymmetry is meaningful
and the name reads cleanly as the natural verb when there's no substrate
yet.

Reverts only the lc-new portion. Keeps:
  lc-migrate -> lc-from-code
  paper2astra -> lc-from-paper

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CLAUDE.md                                                 | 2 +-
 README.md                                                 | 4 ++--
 claude/lightcone/agents/lc-extractor.md                   | 2 +-
 claude/lightcone/skills/README.md                         | 2 +-
 .../skills/{lc-from-question => lc-new}/SKILL.md          | 4 ++--
 docs/cli/init.md                                          | 2 +-
 docs/index.md                                             | 2 +-
 docs/skills/index.md                                      | 8 ++++----
 docs/skills/lc-from-code.md                               | 2 +-
 docs/skills/{lc-from-question.md => lc-new.md}            | 6 +++---
 docs/user/agent-workflow.md                               | 4 ++--
 docs/user/getting-started.md                              | 2 +-
 docs/user/glossary.md                                     | 6 +++---
 docs/user/index.md                                        | 6 +++---
 docs/user/install.md                                      | 2 +-
 docs/user/multiverse.md                                   | 4 ++--
 docs/user/troubleshooting.md                              | 4 ++--
 docs/user/tutorial.md                                     | 4 ++--
 zensical.toml                                             | 2 +-
 19 files changed, 34 insertions(+), 34 deletions(-)
 rename claude/lightcone/skills/{lc-from-question => lc-new}/SKILL.md (99%)
 rename docs/skills/{lc-from-question.md => lc-new.md} (91%)

diff --git a/CLAUDE.md b/CLAUDE.md
index 7612b47b..42de7cf9 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -67,7 +67,7 @@ src/lightcone/              # namespace — NO __init__.py
     ├── harness.py, sandbox.py, graders.py, build.py, report.py, models.py
 
 claude/lightcone/           # Claude plugin source — force-included into the wheel
-├── skills/                 # lc-from-question, lc-from-code, lc-from-paper,
+├── skills/                 # lc-new, lc-from-code, lc-from-paper,
 │                            # lc-build, lc-verify, lc-feedback;
 │                            # paper-reproduction bundle: lc-from-paper (entry),
 │                            # narrative, constitution, ralph-loops, paper-extraction
diff --git a/README.md b/README.md
index 75a3feae..8efaea78 100644
--- a/README.md
+++ b/README.md
@@ -18,13 +18,13 @@ cd my-analysis
 claude
 ```
 
-Then tell the agent `/lc-from-question` to scope your research question. After the spec exists, just tell the agent to build it — implementation is a normal Claude Code workflow guided by `.claude/guides/`.
+Then tell the agent `/lc-new` to scope your research question. After the spec exists, just tell the agent to build it — implementation is a normal Claude Code workflow guided by `.claude/guides/`.
 
 ## Skills
 
 The `/lc-from-*` family is parallel by what you start from: a question, code, or a paper.
 
-### `/lc-from-question` — Scope and specify an analysis
+### `/lc-new` — Scope and specify an analysis
 
 Guides you from a research question to a complete `astra.yaml` specification through interactive conversation. The agent will:
 
diff --git a/claude/lightcone/agents/lc-extractor.md b/claude/lightcone/agents/lc-extractor.md
index b13c3a33..f377a922 100644
--- a/claude/lightcone/agents/lc-extractor.md
+++ b/claude/lightcone/agents/lc-extractor.md
@@ -1,6 +1,6 @@
 ---
 name: lc-extractor
-description: Extract prior insights from scientific papers for ASTRA analyses. Reads PDFs, identifies claims relevant to target decisions, extracts verbatim quotes, and verifies them. Use for literature extraction during /lc-from-question.
+description: Extract prior insights from scientific papers for ASTRA analyses. Reads PDFs, identifies claims relevant to target decisions, extracts verbatim quotes, and verifies them. Use for literature extraction during /lc-new.
 tools: Read, Bash
 model: sonnet
 ---
diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 53652a32..acfc6a45 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -6,7 +6,7 @@ Each subdirectory is one Claude Code skill: `SKILL.md` plus optional `references
 
 | Skill | Role |
 |---|---|
-| `lc-from-question` | Scaffold a new ASTRA-shaped project from a research question. |
+| `lc-new` | Scaffold a new ASTRA-shaped project from a research question. |
 | `lc-from-code` | Bring an existing codebase into ASTRA — scan, spec, parameterize. |
 | `lc-from-paper` | Reproduce a published paper in ASTRA (paper-reproduction bundle entry point — see below). |
 | `lc-build` | Build container images and dependencies for a project. |
diff --git a/claude/lightcone/skills/lc-from-question/SKILL.md b/claude/lightcone/skills/lc-new/SKILL.md
similarity index 99%
rename from claude/lightcone/skills/lc-from-question/SKILL.md
rename to claude/lightcone/skills/lc-new/SKILL.md
index 50fcaa3c..b4060cde 100644
--- a/claude/lightcone/skills/lc-from-question/SKILL.md
+++ b/claude/lightcone/skills/lc-new/SKILL.md
@@ -1,10 +1,10 @@
 ---
-name: lc-from-question
+name: lc-new
 description: Use this skill whenever the user starts a new ASTRA analysis from a research question — scoping the question, structuring inputs and outputs, identifying decisions through literature, and landing astra.yaml + project CLAUDE.md. Triggers on verbs (`new`, `start`, `scope`) combined with nouns (`analysis`, `project`, `question`, `research`) — e.g. "new analysis", "start project", "scope research question" — even if the user doesn't say "project" explicitly. Don't use this for working inside an existing ASTRA project; this is for fresh scoping only.
 allowed-tools: Read, Write(astra.yaml), Write(universes/*), Write(CLAUDE.md), Edit(astra.yaml), Edit(universes/*), Edit(CLAUDE.md), Glob, Grep, Bash(astra:*), Bash(lc:*), WebSearch, WebFetch, AskUserQuestion, Agent
 ---
 
-# /lc-from-question
+# /lc-new
 
 Create a new ASTRA analysis project through conversation. Build the spec iteratively -- write to `astra.yaml` after each phase so the user sees progress. Literature search and decision identification happen in distinct phases -- talk first, then extract papers, then identify decisions informed by both conversation and literature.
 
diff --git a/docs/cli/init.md b/docs/cli/init.md
index aee28326..0520252c 100644
--- a/docs/cli/init.md
+++ b/docs/cli/init.md
@@ -70,7 +70,7 @@ lc init . --permissions yolo           # for autonomous loops you trust
 cd my-analysis
 claude           # open Claude Code
 # Inside Claude Code:
-/lc-from-question  # scope a research question into astra.yaml
+/lc-new  # scope a research question into astra.yaml
 /lc-build          # implement and run it
 /lc-verify         # audit the result
 ```
diff --git a/docs/index.md b/docs/index.md
index b25d2e0e..ff9789ee 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -56,7 +56,7 @@ src/lightcone/                  # PEP 420 namespace package — NO __init__.py
 src/snakemake_executor_plugin_dask/   # Snakemake executor → dask.distributed
 
 claude/lightcone/               # Claude Code plugin (force-included into the wheel)
-├── skills/                     # lc-from-question, lc-from-code, lc-from-paper,
+├── skills/                     # lc-new, lc-from-code, lc-from-paper,
 │                                # lc-build, lc-verify, lc-feedback (+ bundle siblings)
 ├── agents/                     # lc-extractor (literature subagent)
 ├── guides/                     # astra-reference, lightcone-cli-reference, ui-brand
diff --git a/docs/skills/index.md b/docs/skills/index.md
index 1270a2fc..917ec88a 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -15,7 +15,7 @@ code, or a paper.
 
 | Skill | Command | Purpose |
 |-------|---------|---------|
-| [lc-from-question](lc-from-question.md) | `/lc-from-question` | Scope a research question into an `astra.yaml`, with optional literature extraction. |
+| [lc-new](lc-new.md) | `/lc-new` | Scope a research question into an `astra.yaml`, with optional literature extraction. |
 | [lc-from-code](lc-from-code.md) | `/lc-from-code` | Wrap an existing codebase in ASTRA: scan, generate spec, parameterize, run. |
 | lc-from-paper | `/lc-from-paper` | Reproduce a published paper in ASTRA — interview-first orchestrator, multi-session loop. (See the paper-reproduction bundle in [`claude/lightcone/skills/README.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md) for the full bundle map.) |
 | [lc-build](lc-build.md) | `/lc-build` | Plan + autonomous loop until all outputs in a universe materialize. |
@@ -48,7 +48,7 @@ files, anti-patterns. The skill bundles its own helper scripts under
 ```
 claude/lightcone/
 ├── skills/
-│   ├── lc-from-question/SKILL.md
+│   ├── lc-new/SKILL.md
 │   ├── lc-from-code/SKILL.md
 │   ├── lc-from-paper/{SKILL.md, references/*.md}
 │   ├── lc-build/{SKILL.md, assets/loop-prompt.md, scripts/setup-lc-build.sh}
@@ -69,10 +69,10 @@ The plugin is force-included into the wheel via
 
 | File | Purpose |
 |------|---------|
-| `claude/lightcone/guides/astra-reference.md` | Full `astra.yaml` schema reference. Loaded by `lc-from-question`, `lc-build`, `lc-from-code`. |
+| `claude/lightcone/guides/astra-reference.md` | Full `astra.yaml` schema reference. Loaded by `lc-new`, `lc-build`, `lc-from-code`. |
 | `claude/lightcone/guides/lightcone-cli-reference.md` | CLI commands, status interpretation, failure diagnosis. Loaded by build/verify skills. |
 | `claude/lightcone/guides/ui-brand.md` | Visual formatting conventions for skill output. |
-| `claude/lightcone/agents/lc-extractor.md` | Literature extraction subagent invoked by `/lc-from-question`. |
+| `claude/lightcone/agents/lc-extractor.md` | Literature extraction subagent invoked by `/lc-new`. |
 
 ## Authoring a new skill
 
diff --git a/docs/skills/lc-from-code.md b/docs/skills/lc-from-code.md
index 0d8281d4..17552643 100644
--- a/docs/skills/lc-from-code.md
+++ b/docs/skills/lc-from-code.md
@@ -80,6 +80,6 @@ present the summary.
 
 ## Related
 
-- [`/lc-from-question`](lc-from-question.md) — for greenfield analyses.
+- [`/lc-new`](lc-new.md) — for greenfield analyses.
 - [`/lc-verify`](lc-verify.md) — run after migration to confirm
   spec-code-results alignment.
diff --git a/docs/skills/lc-from-question.md b/docs/skills/lc-new.md
similarity index 91%
rename from docs/skills/lc-from-question.md
rename to docs/skills/lc-new.md
index 144f8fa6..fd322819 100644
--- a/docs/skills/lc-from-question.md
+++ b/docs/skills/lc-new.md
@@ -1,10 +1,10 @@
-# /lc-from-question
+# /lc-new
 
 Scope a new ASTRA analysis from a research question through conversation.
 Produces a complete `astra.yaml` (and optionally a literature evidence
 trail) with no code written.
 
-Source: [`claude/lightcone/skills/lc-from-question/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-from-question/SKILL.md).
+Source: [`claude/lightcone/skills/lc-new/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-new/SKILL.md).
 
 ## Allowed tools
 
@@ -63,6 +63,6 @@ at the end so the user has something visible to review at every step.
 
 ## Related
 
-- [`/lc-build`](lc-build.md) — the next step after `/lc-from-question`.
+- [`/lc-build`](lc-build.md) — the next step after `/lc-new`.
 - [`claude/lightcone/guides/astra-reference.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/astra-reference.md) — `astra.yaml` schema, decision criteria, prior insights / findings, universe management.
 - [`claude/lightcone/agents/lc-extractor.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/agents/lc-extractor.md) — the literature extraction subagent definition.
diff --git a/docs/user/agent-workflow.md b/docs/user/agent-workflow.md
index 33ad6ff2..4ee7404c 100644
--- a/docs/user/agent-workflow.md
+++ b/docs/user/agent-workflow.md
@@ -11,7 +11,7 @@ them.
 > writes to disk. You stay in charge of approving everything; the agent
 > never publishes a paper for you.
 
-## `/lc-from-question` — scope a new analysis
+## `/lc-new` — scope a new analysis
 
 **You start with a research question. You end with a complete
 `astra.yaml` (and optionally a literature evidence trail).**
@@ -39,7 +39,7 @@ The skill walks you through four phases:
    universe; the `## Working Notes` section of `CLAUDE.md` gets the
    conversational context that wouldn't otherwise survive a `/clear`.
 
-You don't write any code or YAML during `/lc-from-question`. By the
+You don't write any code or YAML during `/lc-new`. By the
 time it finishes, you have a precise specification. The agent enforces
 this: the skill is *only allowed* to edit `astra.yaml`, files in
 `universes/`, and `CLAUDE.md`.
diff --git a/docs/user/getting-started.md b/docs/user/getting-started.md
index dde6fdfd..512f65c7 100644
--- a/docs/user/getting-started.md
+++ b/docs/user/getting-started.md
@@ -65,7 +65,7 @@ skills follow.
 
 | Command | Use it when… |
 |---------|--------------|
-| `/lc-from-question` | You're starting from a research question and an empty `astra.yaml`. |
+| `/lc-new` | You're starting from a research question and an empty `astra.yaml`. |
 | `/lc-from-code` | You have an existing codebase you want wrapped in ASTRA. |
 | `/lc-from-paper` | You have a published paper (DOI / arXiv ID) you want to reproduce. |
 | `/lc-build` | You have a scoped `astra.yaml` and you want the analysis implemented and run. |
diff --git a/docs/user/glossary.md b/docs/user/glossary.md
index f2e94eec..a08426f1 100644
--- a/docs/user/glossary.md
+++ b/docs/user/glossary.md
@@ -141,7 +141,7 @@ node launched via `srun`.
 
 A Claude Code slash command bundled with the lightcone-cli plugin.
 The `/lc-from-*` family is parallel by what you start from — a question
-(`/lc-from-question`), code (`/lc-from-code`), or a paper
+(`/lc-new`), code (`/lc-from-code`), or a paper
 (`/lc-from-paper`). The build/verify/feedback skills (`/lc-build`,
 `/lc-verify`, `/lc-feedback`) follow. Each one is a structured prompt
 that drives the agent through a specific phased workflow.
@@ -150,8 +150,8 @@ that drives the agent through a specific phased workflow.
 
 A Claude Code agent invoked by another agent via the `Task` tool. The
 `lc-extractor` subagent reads PDFs and pulls verifiable quotes; it's
-spawned by `/lc-from-question` during the literature deep-dive phase.
-Subagents have isolated context, which is why `/lc-from-question` uses
+spawned by `/lc-new` during the literature deep-dive phase.
+Subagents have isolated context, which is why `/lc-new` uses
 one per paper — PDFs are big.
 
 ## Prior insight
diff --git a/docs/user/index.md b/docs/user/index.md
index af21edaa..f69f51c9 100644
--- a/docs/user/index.md
+++ b/docs/user/index.md
@@ -15,7 +15,7 @@ implementation; you stay in charge of the scientific choices.
   machine.
 - [Getting Started](getting-started.md) — your first `lc init` and
   what every directory means.
-- [The Claude Code Workflow](claude-workflow.md) — `/lc-from-question`,
+- [The Claude Code Workflow](claude-workflow.md) — `/lc-new`,
   `/lc-from-code`, `/lc-from-paper`, `/lc-build`, `/lc-verify`,
   `/lc-feedback` — what each one does and when to reach for it.
 - [Tutorial: Your First Analysis](tutorial.md) — an end-to-end worked
@@ -36,14 +36,14 @@ implementation; you stay in charge of the scientific choices.
         ```bash
         uv tool install lightcone-cli
         lc init my-analysis && cd my-analysis
-        claude                                 # then, inside Claude Code: /lc-from-question
+        claude                                 # then, inside Claude Code: /lc-new
         ```
 
     === "pip"
         ```bash
         pip install lightcone-cli
         lc init my-analysis && cd my-analysis
-        claude                                # then, inside Claude Code: /lc-from-question
+        claude                                # then, inside Claude Code: /lc-new
         ```
 
 That's the shortest possible path. The rest of the guide is the
diff --git a/docs/user/install.md b/docs/user/install.md
index 1a3b5a7c..ebc28618 100644
--- a/docs/user/install.md
+++ b/docs/user/install.md
@@ -78,7 +78,7 @@ Open a project (in the next page we make one) with:
 claude
 ```
 
-Inside Claude Code you'll type slash commands like `/lc-from-question`
+Inside Claude Code you'll type slash commands like `/lc-new`
 and `/lc-build` — see [The Claude Code Workflow](claude-workflow.md).
 
 ## 5. (Optional) Docker or Podman
diff --git a/docs/user/multiverse.md b/docs/user/multiverse.md
index 77ccc233..4183af9b 100644
--- a/docs/user/multiverse.md
+++ b/docs/user/multiverse.md
@@ -28,7 +28,7 @@ decisions:
       none:  { label: "No outlier removal" }
 ```
 
-A few rules of thumb (the `/lc-from-question` skill enforces these):
+A few rules of thumb (the `/lc-new` skill enforces these):
 
 - **One choice, one decision.** Don't bundle "preprocessing strategy"
   into one decision with five options that mix different axes.
@@ -140,7 +140,7 @@ with its own `astra.yaml` and own decisions. The full tree is
 resolved automatically; sub-analyses can refer to each other's
 outputs.
 
-`/lc-from-question` will ask "should this be one analysis or several?"
+`/lc-new` will ask "should this be one analysis or several?"
 and help you split. The default answer is one — split only when each
 part genuinely has different inputs and outputs.
 
diff --git a/docs/user/troubleshooting.md b/docs/user/troubleshooting.md
index fa759aa6..64848509 100644
--- a/docs/user/troubleshooting.md
+++ b/docs/user/troubleshooting.md
@@ -161,12 +161,12 @@ PY
 ## I want to start the spec over
 
 Move `astra.yaml` aside (don't delete it — agents like having context
-about what you tried), then `/lc-from-question` again:
+about what you tried), then `/lc-new` again:
 
 ```bash
 mv astra.yaml astra.previous.yaml
 claude
-# /lc-from-question
+# /lc-new
 ```
 
 ## File a bug from inside the session
diff --git a/docs/user/tutorial.md b/docs/user/tutorial.md
index 2d66c68d..2076f6b8 100644
--- a/docs/user/tutorial.md
+++ b/docs/user/tutorial.md
@@ -31,12 +31,12 @@ You're now in Claude Code, sitting in a fresh project. The first line
 on the screen is the session start banner, which probably says "no
 recipes yet."
 
-## 2. Scope the analysis with `/lc-from-question`
+## 2. Scope the analysis with `/lc-new`
 
 Type:
 
 ```
-/lc-from-question
+/lc-new
 ```
 
 The agent banner switches to **RESEARCH QUESTION** and asks something
diff --git a/zensical.toml b/zensical.toml
index 962ad903..da4716ef 100644
--- a/zensical.toml
+++ b/zensical.toml
@@ -46,7 +46,7 @@ nav = [
     ]},
     {"Skills" = [
       {"Overview" = "skills/index.md"},
-      {"lc-from-question" = "skills/lc-from-question.md"},
+      {"lc-new" = "skills/lc-new.md"},
       {"lc-from-code" = "skills/lc-from-code.md"},
       {"lc-build" = "skills/lc-build.md"},
       {"lc-verify" = "skills/lc-verify.md"},

From d13479b6095f7220cb441aae4db2c1cda95250c7 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sat, 9 May 2026 03:06:43 +0200
Subject: [PATCH 023/124] docs: drop retired skill pages from nav

Remove the orphaned docs pages and navigation entries for the retired
`/lc-build` and `/lc-verify` slash-command wrappers. The CLI commands
themselves remain: implement through Claude Code, run `lc run` and
`lc status`, then validate with `astra validate` and `lc verify`.
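
A minimal sketch of that flow, for illustration only. The function name
`build_and_check` is hypothetical and not part of lightcone-cli, and the
sketch assumes each command exits non-zero on failure, which this patch
does not confirm:

```shell
# Hypothetical helper: run the four CLI steps in order and stop at the
# first one that fails (assumes each command signals failure via its
# exit status).
build_and_check() {
  lc run &&                     # execute the analysis
  lc status &&                  # report the status of each output
  astra validate astra.yaml &&  # check that the spec is valid
  lc verify                     # audit the results from the CLI side
}
```

Call `build_and_check` from the project root, next to `astra.yaml`.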

Co-Authored-By: Codex <noreply@openai.com>
---
 docs/skills/lc-build.md  | 83 ----------------------------------------
 docs/skills/lc-verify.md | 59 ----------------------------
 zensical.toml            |  2 -
 3 files changed, 144 deletions(-)
 delete mode 100644 docs/skills/lc-build.md
 delete mode 100644 docs/skills/lc-verify.md

diff --git a/docs/skills/lc-build.md b/docs/skills/lc-build.md
deleted file mode 100644
index f1d8524a..00000000
--- a/docs/skills/lc-build.md
+++ /dev/null
@@ -1,83 +0,0 @@
-# /lc-build
-
-Build an ASTRA analysis from spec to materialized results. Plans
-interactively, then loops autonomously via the ralph-wiggum stop hook
-until all outputs are materialized or `--max-iterations` is reached.
-
-Source: [`claude/lightcone/skills/lc-build/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-build/SKILL.md).
-
-Argument hint: `[DESCRIPTION] [--universe NAME] [--max-iterations N]`.
-Defaults: universe `baseline`, max-iterations `25`.
-
-## Allowed tools
-
-```
-Read, Write, Edit, Glob, Grep,
-Bash(astra:*), Bash(lc:*), Bash(python:*), Bash(git:*), Bash(pip:*), Bash(mkdir:*),
-Bash(setup-lc-build:*),
-Agent, AskUserQuestion
-```
-
-## Phases
-
-### Phase 0 — Resume an interrupted loop
-
-If `.claude/ralph-loop.local.md` exists, ask the user via
-`AskUserQuestion` whether to resume or start fresh. Resume runs
-`setup-lc-build.sh --resume`; fresh deletes the state file.
-
-### Phase 1 — Plan (interactive)
-
-1. **Validate prerequisites** via `setup-lc-build.sh --validate
-   --universe <U> --max-iterations <N>`. Bails out with actionable
-   error messages if `astra.yaml`, the universe file, or required
-   tools are missing.
-2. **Read context** — `astra.yaml`, `CLAUDE.md`,
-   `.claude/guides/astra-reference.md`,
-   `.claude/guides/lightcone-cli-reference.md`,
-   `universes/<U>.yaml`, any existing `scripts/`.
-3. **Produce a plan** at `.lightcone/plans/build-plan-<U>.md` with:
-   analysis overview; dependency graph; decision selections; ordered
-   build checklist with per-output script / decisions / dependencies /
-   estimated cost; verification checklist.
-4. **Get approval** via `AskUserQuestion`: "Approve and start building"
-   vs "Let me edit the plan first."
-
-**Rule:** Phase 1 is read-only exploration. No code, no spec edits
-until the user approves.
-
-### Phase 2 — Loop (autonomous)
-
-Once approved, `setup-lc-build.sh --activate` writes
-`.claude/ralph-loop.local.md`. The Claude Code stop hook intercepts
-session exits and re-injects the loop prompt
-([`assets/loop-prompt.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-build/assets/loop-prompt.md))
-until the agent emits `<promise>BUILD_COMPLETE</promise>` or
-max-iterations is hit.
-
-Each iteration: survey state, decide what to do next, work, commit,
-exit. The plan file persists across crashes for easy resumption and
-is deleted on successful completion.
-
-## State files
-
-| File | Purpose |
-|------|---------|
-| `.lightcone/plans/build-plan-<universe>.md` | The user-approved plan. Persists across crashes. Deleted on completion. |
-| `.claude/ralph-loop.local.md` | Loop state: iteration count, max iterations, session id, universe. Used by the session-start hook to detect interruptions. |
-
-## Cancellation
-
-Mid-loop: `/cancel-ralph` (provided by the ralph-loop plugin).
-
-## Dependency on the ralph-loop plugin
-
-The loop machinery (the stop hook, `/cancel-ralph`) ships in a
-separate Claude Code plugin. `setup-lc-build.sh` will attempt to
-install it on demand from the marketplace; if installation fails it
-errors out and cleans up.
-
-## Related
-
-- [`/lc-verify`](lc-verify.md) — read-only audit, run after a successful build.
-- [`claude/lightcone/guides/lightcone-cli-reference.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/lightcone-cli-reference.md) — CLI and execution reference loaded by the skill.
diff --git a/docs/skills/lc-verify.md b/docs/skills/lc-verify.md
deleted file mode 100644
index 47ea30a1..00000000
--- a/docs/skills/lc-verify.md
+++ /dev/null
@@ -1,59 +0,0 @@
-# /lc-verify
-
-Read-only audit. Checks that `astra.yaml`, the code, and the
-materialized results all agree.
-
-Source: [`claude/lightcone/skills/lc-verify/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-verify/SKILL.md).
-
-## Allowed tools
-
-```
-Read, Glob, Grep,
-Bash(astra:*), Bash(lc:*), Bash(python:*), Bash(ls:*),
-AskUserQuestion
-```
-
-No `Write`, no `Edit`. The skill cannot modify the project.
-
-## What it checks (per universe; default `baseline`)
-
-1. **Spec validation** — `astra validate astra.yaml`. Fix and iterate
-   until clean.
-2. **Materialization status** — `lc status --universe <U>`. Every
-   output should be `ok`. Anything `stale`, `missing`, or `alias`
-   that's not expected gets flagged.
-3. **Decision-code alignment** — *the core value*. For every decision
-   in `astra.yaml`, confirm the code accepts it as a parameter rather
-   than hardcoding the value. Cross-checks `astra info --decisions`
-   against argparse usage in `scripts/`.
-4. **Results match spec** — for every output, verify the result files
-   exist and look well-formed. For `type: metric` outputs, check that
-   each JSON file parses and contains a `{"value": …}` entry.
-
-## Report format
-
-```
-| Check                    | Status |
-|--------------------------|--------|
-| Spec validation          | ✓/✗    |
-| Materialization (N/N)    | ✓/✗    |
-| Decision-code alignment  | ✓/⚠/✗  |
-| Results match spec (N/N) | ✓/✗    |
-```
-
-The skill lists each finding with file paths and line numbers, and
-suggests concrete fixes when something fails.
-
-## Hard rules
-
-- Read-only — never modifies files.
-- One universe at a time.
-- Never skips the decision-code alignment check.
-- Always reads actual result files; never infers from code.
-
-## Related
-
-- [`/lc-build`](lc-build.md) — fix anything `/lc-verify` flags.
-- [`lc verify`](../cli/verify.md) — the deeper, hash-based audit on the
-  CLI side. They complement each other: the skill checks
-  spec-vs-code-vs-results alignment; the CLI checks data integrity.
diff --git a/zensical.toml b/zensical.toml
index da4716ef..c99589c1 100644
--- a/zensical.toml
+++ b/zensical.toml
@@ -48,8 +48,6 @@ nav = [
       {"Overview" = "skills/index.md"},
       {"lc-new" = "skills/lc-new.md"},
       {"lc-from-code" = "skills/lc-from-code.md"},
-      {"lc-build" = "skills/lc-build.md"},
-      {"lc-verify" = "skills/lc-verify.md"},
       {"lc-feedback" = "skills/lc-feedback.md"},
       {"Authoring Skills" = "skills/authoring.md"},
     ]},

From 223d9ec03a6bbf3780f5de43fdfa4e51df5c7612 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sat, 9 May 2026 03:07:59 +0200
Subject: [PATCH 024/124] docs: align skill indexes with current bundle

Remove stale `/lc-build` and `/lc-verify` wrapper references from the
overview surfaces. The docs now describe the current flow in positive
terms: entry skills create or migrate the spec, then the agent
implements directly with `lc run`, `lc status`, `astra validate`, and
`lc verify`.

Co-Authored-By: Codex <noreply@openai.com>
---
 CLAUDE.md                         |  2 +-
 README.md                         |  2 +-
 claude/lightcone/skills/README.md |  2 --
 docs/index.md                     |  2 +-
 docs/skills/index.md              | 16 ++++++----------
 5 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 42de7cf9..92d2a3c6 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -68,7 +68,7 @@ src/lightcone/              # namespace — NO __init__.py
 
 claude/lightcone/           # Claude plugin source — force-included into the wheel
 ├── skills/                 # lc-new, lc-from-code, lc-from-paper,
-│                            # lc-build, lc-verify, lc-feedback;
+│                            # lc-feedback;
 │                            # paper-reproduction bundle: lc-from-paper (entry),
 │                            # narrative, constitution, ralph-loops, paper-extraction
 │                            # (see skills/README.md for the full bundle map)
diff --git a/README.md b/README.md
index 8efaea78..105879fa 100644
--- a/README.md
+++ b/README.md
@@ -50,7 +50,7 @@ Files a GitHub issue against the right repo (ASTRA or lightcone-cli) with versio
 
 ### Building and verifying
 
-There is no `/lc-build` or `/lc-verify` skill — building and verifying are part of the normal Claude Code workflow once `astra.yaml` exists. The agent reads `.claude/guides/lightcone-cli-reference.md` (workflow, commands, status meanings) and `.claude/guides/astra-reference.md` (spec syntax) and drives the build directly: write scripts under `src/`, run `lc run`, watch `lc status` until every output is `ok`, then `astra validate astra.yaml` and `lc verify` to confirm the spec is valid and the provenance chain is intact.
+Once `astra.yaml` exists, the agent reads `.claude/guides/lightcone-cli-reference.md` (workflow, commands, status meanings) and `.claude/guides/astra-reference.md` (spec syntax), writes the analysis scripts under `src/`, runs `lc run`, watches `lc status` until every output is `ok`, then runs `astra validate astra.yaml` and `lc verify` to confirm the spec is valid and the provenance chain is intact.
 
 ## CLI Reference
 
diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index acfc6a45..262c49e3 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -9,8 +9,6 @@ Each subdirectory is one Claude Code skill: `SKILL.md` plus optional `references
 | `lc-new` | Scaffold a new ASTRA-shaped project from a research question. |
 | `lc-from-code` | Bring an existing codebase into ASTRA — scan, spec, parameterize. |
 | `lc-from-paper` | Reproduce a published paper in ASTRA (paper-reproduction bundle entry point — see below). |
-| `lc-build` | Build container images and dependencies for a project. |
-| `lc-verify` | Run validation across an ASTRA project. |
 | `lc-feedback` | Report bugs and feature requests upstream. |
 
 ## Paper-reproduction bundle
diff --git a/docs/index.md b/docs/index.md
index ff9789ee..ec18104e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -57,7 +57,7 @@ src/snakemake_executor_plugin_dask/   # Snakemake executor → dask.distributed
 
 claude/lightcone/               # Claude Code plugin (force-included into the wheel)
 ├── skills/                     # lc-new, lc-from-code, lc-from-paper,
-│                                # lc-build, lc-verify, lc-feedback (+ bundle siblings)
+│                                # lc-feedback (+ bundle siblings)
 ├── agents/                     # lc-extractor (literature subagent)
 ├── guides/                     # astra-reference, lightcone-cli-reference, ui-brand
 ├── templates/                  # project CLAUDE.md template
diff --git a/docs/skills/index.md b/docs/skills/index.md
index 917ec88a..f900428a 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -18,8 +18,6 @@ code, or a paper.
 | [lc-new](lc-new.md) | `/lc-new` | Scope a research question into an `astra.yaml`, with optional literature extraction. |
 | [lc-from-code](lc-from-code.md) | `/lc-from-code` | Wrap an existing codebase in ASTRA: scan, generate spec, parameterize, run. |
 | lc-from-paper | `/lc-from-paper` | Reproduce a published paper in ASTRA — interview-first orchestrator, multi-session loop. (See the paper-reproduction bundle in [`claude/lightcone/skills/README.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md) for the full bundle map.) |
-| [lc-build](lc-build.md) | `/lc-build` | Plan + autonomous loop until all outputs in a universe materialize. |
-| [lc-verify](lc-verify.md) | `/lc-verify` | Read-only audit: spec validity, materialization status, decision-code alignment, result file shapes. |
 | [lc-feedback](lc-feedback.md) | `/lc-feedback` | File a GitHub issue against the right Lightcone repo with auto-collected context. |
 
 ## How a skill is wired
@@ -29,11 +27,11 @@ YAML frontmatter:
 
 ```yaml
 ---
-name: lc-build
+name: lc-new
 description: >
-  Build an ASTRA analysis from spec to materialized results...
-allowed-tools: Read, Write, Edit, Glob, Grep, Bash(astra:*), Bash(lc:*), ...
-argument-hint: "[DESCRIPTION] [--universe NAME] [--max-iterations N]"
+  Scope a new ASTRA analysis from a research question...
+allowed-tools: Read, Write(astra.yaml), Edit(astra.yaml), Glob, Grep, Bash(astra:*), ...
+argument-hint: "[DESCRIPTION]"
 ---
 ```
 
@@ -51,8 +49,6 @@ claude/lightcone/
 │   ├── lc-new/SKILL.md
 │   ├── lc-from-code/SKILL.md
 │   ├── lc-from-paper/{SKILL.md, references/*.md}
-│   ├── lc-build/{SKILL.md, assets/loop-prompt.md, scripts/setup-lc-build.sh}
-│   ├── lc-verify/SKILL.md
 │   ├── lc-feedback/SKILL.md
 │   └── …                              # paper-reproduction bundle siblings
 ├── agents/lc-extractor.md             # subagent definition
@@ -69,8 +65,8 @@ The plugin is force-included into the wheel via
 
 | File | Purpose |
 |------|---------|
-| `claude/lightcone/guides/astra-reference.md` | Full `astra.yaml` schema reference. Loaded by `lc-new`, `lc-build`, `lc-from-code`. |
-| `claude/lightcone/guides/lightcone-cli-reference.md` | CLI commands, status interpretation, failure diagnosis. Loaded by build/verify skills. |
+| `claude/lightcone/guides/astra-reference.md` | Full `astra.yaml` schema reference. Loaded by `lc-new` and `lc-from-code`. |
+| `claude/lightcone/guides/lightcone-cli-reference.md` | CLI commands, status interpretation, failure diagnosis. Loaded by implementation and validation workflows. |
 | `claude/lightcone/guides/ui-brand.md` | Visual formatting conventions for skill output. |
 | `claude/lightcone/agents/lc-extractor.md` | Literature extraction subagent invoked by `/lc-new`. |
 

From 8f0ec83b2bfede1a1428b041581438d79b8c38e3 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sat, 9 May 2026 03:09:28 +0200
Subject: [PATCH 025/124] docs: rewrite user workflow around current commands

Remove prose for the retired build and verify slash-command wrappers from user-facing docs. The tutorial and CLI init path now show the current workflow directly: implement in Claude Code, run `lc run` and `lc status`, then run `astra validate astra.yaml` and `lc verify`.

Co-Authored-By: Codex <noreply@openai.com>
---
 docs/cli/init.md             |  4 +-
 docs/skills/lc-from-code.md  |  4 +-
 docs/skills/lc-new.md        |  3 +-
 docs/user/agent-workflow.md  | 59 +++++------------------------
 docs/user/getting-started.md |  6 +--
 docs/user/glossary.md        | 18 ++++-----
 docs/user/index.md           |  4 +-
 docs/user/install.md         |  5 ++-
 docs/user/multiverse.md      |  2 +-
 docs/user/troubleshooting.md | 24 ------------
 docs/user/tutorial.md        | 73 +++++++++++++++---------------------
 11 files changed, 63 insertions(+), 139 deletions(-)

diff --git a/docs/cli/init.md b/docs/cli/init.md
index 0520252c..f408bbaf 100644
--- a/docs/cli/init.md
+++ b/docs/cli/init.md
@@ -71,6 +71,6 @@ cd my-analysis
 claude           # open Claude Code
 # Inside Claude Code:
 /lc-new  # scope a research question into astra.yaml
-/lc-build          # implement and run it
-/lc-verify         # audit the result
+# Then ask the agent to implement the spec.
+# It will run lc run, watch lc status, then validate and verify.
 ```
diff --git a/docs/skills/lc-from-code.md b/docs/skills/lc-from-code.md
index 17552643..52276c7a 100644
--- a/docs/skills/lc-from-code.md
+++ b/docs/skills/lc-from-code.md
@@ -81,5 +81,5 @@ present the summary.
 ## Related
 
 - [`/lc-new`](lc-new.md) — for greenfield analyses.
-- [`/lc-verify`](lc-verify.md) — run after migration to confirm
-  spec-code-results alignment.
+- After migration, run `astra validate astra.yaml` and `lc verify` to
+  confirm the spec is valid and the provenance chain is intact.
diff --git a/docs/skills/lc-new.md b/docs/skills/lc-new.md
index fd322819..a3de0a38 100644
--- a/docs/skills/lc-new.md
+++ b/docs/skills/lc-new.md
@@ -63,6 +63,7 @@ at the end so the user has something visible to review at every step.
 
 ## Related
 
-- [`/lc-build`](lc-build.md) — the next step after `/lc-new`.
+- After `/lc-new`, ask the agent to implement the spec through the
+  normal Claude Code workflow.
 - [`claude/lightcone/guides/astra-reference.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/astra-reference.md) — `astra.yaml` schema, decision criteria, prior insights / findings, universe management.
 - [`claude/lightcone/agents/lc-extractor.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/agents/lc-extractor.md) — the literature extraction subagent definition.
diff --git a/docs/user/agent-workflow.md b/docs/user/agent-workflow.md
index 4ee7404c..01ff6444 100644
--- a/docs/user/agent-workflow.md
+++ b/docs/user/agent-workflow.md
@@ -1,11 +1,11 @@
 # The Agent Workflow
 
-The agentic surface is six slash commands. The `/lc-from-*` family is
-parallel by what you start from — a question, code, or a paper — and
-the build/verify/feedback skills follow. Each one is a structured
-prompt: the agent follows a specific phased flow, not free-form chat.
-This page walks through each of them in the order you'd naturally hit
-them.
+The agentic surface is three entry slash commands plus feedback. The
+`/lc-from-*` family is parallel by what you start from — a question,
+code, or a paper — and `/lc-feedback` handles bug reports. Each one is
+a structured prompt: the agent follows a specific phased flow, not
+free-form chat. This page walks through each of them in the order you'd
+naturally hit them.
 
 > The bracketed `→ astra.yaml` etc. notes show what each phase actually
 > writes to disk. You stay in charge of approving everything; the agent
@@ -44,48 +44,6 @@ time it finishes, you have a precise specification. The agent enforces
 this: the skill is *only allowed* to edit `astra.yaml`, files in
 `universes/`, and `CLAUDE.md`.
 
-## `/lc-build` — implement and run
-
-**You have a scoped `astra.yaml`. You end with materialized outputs.**
-
-This is the longest-running skill. It has two phases.
-
-**Phase 1: plan.** The agent reads the spec, the universe file, and
-your existing scripts (if any), and writes a plan to
-`.lightcone/plans/build-plan-<universe>.md`. The plan covers
-dependencies, decision selections, ordered build checklist, and
-verification steps. It asks you to approve before doing anything else.
-
-**Phase 2: loop.** Once you approve, the skill activates an
-*autonomous loop*: the agent works through the plan, writes scripts,
-runs `lc run` to materialize outputs, fixes failures, and commits as
-it goes. The loop keeps going until either every output is
-materialized or it hits its iteration limit (default 25).
-
-You can interrupt the loop at any time. If you do, the next time you
-run `/lc-build` it asks whether to resume or start fresh.
-
-The plan file persists across crashes; only successful completion
-deletes it.
-
-## `/lc-verify` — audit a finished build
-
-**You have materialized outputs. You end with a verification report.**
-
-Read-only. Four checks:
-
-1. `astra validate astra.yaml` passes.
-2. `lc status` shows every output `ok` for the universe in question.
-3. **Decision-code alignment** (the most important check). For every
-   decision in the spec, the agent verifies the code accepts that
-   decision as a parameter — i.e. the value isn't silently hardcoded.
-4. Result files exist and look well-formed (a `type: metric` output
-   should be parseable JSON, etc.).
-
-The skill never modifies anything. If it finds a discrepancy, it
-suggests concrete fixes; you re-run `/lc-build` (or fix by hand) and
-re-verify.
-
 ## `/lc-from-code` — wrap existing code
 
 **You have a folder of scripts. You end with an ASTRA project around
@@ -153,5 +111,6 @@ interruptible — every phase writes to disk so a `/clear` (which frees
 up context) doesn't lose your work.
 
 If a skill seems stuck, a quick `/clear` followed by reinvoking the
-slash command is often the right move: the spec, plan, and universe
-files are all on disk, so the agent picks up exactly where it left off.
+slash command is often the right move: the spec, universe files, and
+written work products are all on disk, so the agent can pick up where
+it left off.
diff --git a/docs/user/getting-started.md b/docs/user/getting-started.md
index 512f65c7..b4c5cfa1 100644
--- a/docs/user/getting-started.md
+++ b/docs/user/getting-started.md
@@ -60,16 +60,14 @@ reads `astra.yaml` and `CLAUDE.md` so it has context.
 ## 4. The slash commands
 
 Inside Claude Code. The `/lc-from-*` family is parallel by what you
-start from — a question, code, or a paper — and the build/verify/feedback
-skills follow.
+start from — a question, code, or a paper — and `/lc-feedback` handles
+bug reports without leaving the session.
 
 | Command | Use it when… |
 |---------|--------------|
 | `/lc-new` | You're starting from a research question and an empty `astra.yaml`. |
 | `/lc-from-code` | You have an existing codebase you want wrapped in ASTRA. |
 | `/lc-from-paper` | You have a published paper (DOI / arXiv ID) you want to reproduce. |
-| `/lc-build` | You have a scoped `astra.yaml` and you want the analysis implemented and run. |
-| `/lc-verify` | You finished a build and want a read-only audit. |
 | `/lc-feedback` | Something broke and you want to file a GitHub issue without leaving the session. |
 
 The next page, [The Claude Code Workflow](claude-workflow.md),
diff --git a/docs/user/glossary.md b/docs/user/glossary.md
index a08426f1..4515139f 100644
--- a/docs/user/glossary.md
+++ b/docs/user/glossary.md
@@ -142,9 +142,9 @@ node launched via `srun`.
 A Claude Code slash command bundled with the lightcone-cli plugin.
 The `/lc-from-*` family is parallel by what you start from — a question
 (`/lc-new`), code (`/lc-from-code`), or a paper
-(`/lc-from-paper`). The build/verify/feedback skills (`/lc-build`,
-`/lc-verify`, `/lc-feedback`) follow. Each one is a structured prompt
-that drives the agent through a specific phased workflow.
+(`/lc-from-paper`). `/lc-feedback` files upstream issues from inside
+the session. Each one is a structured prompt that drives the agent
+through a specific phased workflow.
 
 ## Subagent
 
@@ -192,12 +192,12 @@ The three labels `lc verify` produces when something's wrong:
 
 ## Ralph loop
 
-The autonomous build loop driven by `/lc-build`. Each iteration:
-survey state, decide what to do next, write/run code, commit, exit.
-The Claude Code stop hook re-injects the loop prompt until the agent
-emits `BUILD_COMPLETE` or hits its iteration limit. State persists
-across crashes in `.claude/ralph-loop.local.md`. Cancel with
-`/cancel-ralph`.
+A reusable autonomous iteration pattern for long-running agent work.
+Each iteration surveys state, decides what to do next, writes or runs
+code, commits, and exits. The Claude Code stop hook can re-inject the
+loop prompt until the agent emits its completion signal or hits an
+iteration limit. State persists across crashes in
+`.claude/ralph-loop.local.md`. Cancel with `/cancel-ralph`.
 
 ## Permission tier
 
diff --git a/docs/user/index.md b/docs/user/index.md
index f69f51c9..c1c73705 100644
--- a/docs/user/index.md
+++ b/docs/user/index.md
@@ -16,8 +16,8 @@ implementation; you stay in charge of the scientific choices.
 - [Getting Started](getting-started.md) — your first `lc init` and
   what every directory means.
 - [The Claude Code Workflow](claude-workflow.md) — `/lc-new`,
-  `/lc-from-code`, `/lc-from-paper`, `/lc-build`, `/lc-verify`,
-  `/lc-feedback` — what each one does and when to reach for it.
+  `/lc-from-code`, `/lc-from-paper`, and `/lc-feedback` — what each
+  one does and when to reach for it.
 - [Tutorial: Your First Analysis](tutorial.md) — an end-to-end worked
   example, written so you can read it without running anything.
 - [Multiverse Analyses](multiverse.md) — how to explore alternative
diff --git a/docs/user/install.md b/docs/user/install.md
index ebc28618..7b06f33d 100644
--- a/docs/user/install.md
+++ b/docs/user/install.md
@@ -78,8 +78,9 @@ Open a project (in the next page we make one) with:
 claude
 ```
 
-Inside Claude Code you'll type slash commands like `/lc-new`
-and `/lc-build` — see [The Claude Code Workflow](claude-workflow.md).
+Inside Claude Code you'll type slash commands like `/lc-new`,
+`/lc-from-code`, and `/lc-from-paper` — see
+[The Claude Code Workflow](claude-workflow.md).
 
 ## 5. (Optional) Docker or Podman
 
diff --git a/docs/user/multiverse.md b/docs/user/multiverse.md
index 4183af9b..c98bbee7 100644
--- a/docs/user/multiverse.md
+++ b/docs/user/multiverse.md
@@ -107,7 +107,7 @@ The comparison itself is your code: read the per-universe
 `r2.json` files, plot them, write up the result. ASTRA's job is to
 make sure the comparison is *fair* — every universe used the same
 spec, the same container image, the same recipe text. If anything
-drifted, `lc verify` will flag it.
+drifted, `lc status` and `lc verify` will flag it.
 
 ## Decision constraints
 
diff --git a/docs/user/troubleshooting.md b/docs/user/troubleshooting.md
index 64848509..6bad6371 100644
--- a/docs/user/troubleshooting.md
+++ b/docs/user/troubleshooting.md
@@ -104,30 +104,6 @@ no longer exists. Usually caused by:
 
 Fix: `lc run` the downstream output. The chain will re-anchor.
 
-## "Active lc-build loop detected"
-
-You're picking up a session where a previous `/lc-build` was
-interrupted. The session-start hook prints this in the banner. To
-resume the loop, run `/lc-build --universe <name>`. To cancel it,
-`/cancel-ralph`.
-
-## The build loop runs forever / never says complete
-
-`/lc-build` defaults to a 25-iteration cap. If it's not making
-progress, that's a sign the analysis hit a real problem the agent
-can't resolve on its own — typically a missing dependency, an
-unparseable error, or a step that needs a human decision.
-
-What helps:
-
-- Read the last few iterations carefully — the agent usually
-  describes the blocker.
-- If there's an "open question" the agent flagged, answer it and
-  reinvoke `/lc-build`. The plan file persists; the loop picks up
-  where it left off.
-- A `/clear` followed by `/lc-build` doesn't lose state — only
-  context.
-
 ## Claude Code says it can't write a file
 
 The default permission tier (`recommended`) blocks edits to a few
diff --git a/docs/user/tutorial.md b/docs/user/tutorial.md
index 2076f6b8..29c1667d 100644
--- a/docs/user/tutorial.md
+++ b/docs/user/tutorial.md
@@ -100,70 +100,59 @@ Phase 4 (**FINALIZE**) runs `astra validate astra.yaml` and writes
 `universes/baseline.yaml`. You're handed back a short summary table —
 two outputs, one decision, zero prior insights.
 
-The agent suggests `/clear` to free up context, then `/lc-build`. Take
-its advice.
+The agent may suggest `/clear` to free up context. Take its advice,
+then ask Claude Code to implement the spec.
 
-## 3. Build it with `/lc-build`
+## 3. Build it
 
 ```
 /clear
-/lc-build
+Implement this analysis from astra.yaml. Write the scripts, run the baseline universe, and verify the result.
 ```
 
-**Phase 1: plan.** The agent reads everything (spec, universe file,
-empty `scripts/` dir, the references in `.claude/guides/`) and writes a
-build plan to `.lightcone/plans/build-plan-baseline.md`. It might look
-like this:
+The agent reads everything (spec, universe file, empty `scripts/` dir,
+and the references in `.claude/guides/`) and makes an implementation
+checklist. It might look like this:
 
 ```
 1. Add Python deps (scikit-learn, matplotlib) to requirements.txt
 2. Write Containerfile if missing
 3. scripts/fit.py — accepts --standardize {standardized,raw}, writes r2.json
 4. scripts/plot.py — reads r2_dir, writes fit_plot.png
-5. lc build to build the container
-6. lc run --universe baseline
-7. /lc-verify
+5. lc run --universe baseline
+6. lc status
+7. astra validate astra.yaml
+8. lc verify
 ```
 
-It asks you to approve. Pick "Approve and start building."
+The agent works through the checklist one item at a time. You'll see
+commands like:
 
-**Phase 2: loop.** The agent works through the plan one item at a
-time. You'll see lines like:
-
-```
-▶ scripts/fit.py — writing
-▶ lc build — building image lc-r2-decision-demo-9a1f3...
-▶ lc run accuracy --universe baseline
-▶ ✓ ok    r2
-▶ ▶ scripts/plot.py — writing
-▶ ✓ ok    fit_plot
-✓ build complete
+```bash
+lc run --universe baseline
+lc status
+astra validate astra.yaml
+lc verify
 ```
 
-The agent commits after each successful output, so your `git log` is a
-clean record of the build.
-
-## 4. Verify it with `/lc-verify`
+Expected `lc status` output:
 
 ```
-/lc-verify
+Universe baseline
+  ✓ ok    r2
+  ✓ ok    fit_plot
 ```
 
-Read-only audit:
-
-```
-| Check                    | Status |
-|--------------------------|--------|
-| Spec validation          | ✓      |
-| Materialization (2/2)    | ✓      |
-| Decision-code alignment  | ✓      |
-| Results match spec (2/2) | ✓      |
-```
+Expected validation and verification output is boring in the best way:
+`astra validate astra.yaml` exits cleanly, and `lc verify` reports no
+tampering, broken provenance chain, or missing manifests. If anything
+fails, ask the agent to fix the concrete error and rerun the same
+commands.
 
-If anything fails, the agent suggests a fix. Re-run `/lc-build` or fix
-by hand.
+The agent commits after each successful output, so your `git log` is a
+clean record of the build.
 
-## 5. Add the second universe
+## 4. Add the second universe
 
 The whole point of decisions is to sweep them. Drop out of Claude
 Code (`Ctrl+D` or `/exit`) and create the second universe:
@@ -194,7 +183,7 @@ Universe raw
 Each universe has its own `results/<universe>/` tree. The two `r2.json`
 files are the comparison your paper figure needs.
 
-## 6. Verify integrity
+## 5. Verify integrity
 
 ```bash
 lc verify

From a97f8146806f75ef078595ebc8c25ec52b83bcfa Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sat, 9 May 2026 12:44:32 +0200
Subject: [PATCH 026/124] Clarify code augmentation in paper reproduction
 skills

---
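Note: the mode choice this patch adds to Phase 1 of `/lc-from-code` (fresh migration versus augmenting an existing spec) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the skill's implementation. The skill leaves the judgment of whether a "meaningful" `astra.yaml` exists to the agent; the sketch approximates it as "the file exists and contains non-whitespace text". The helper name `choose_mode` and the labels it returns are hypothetical.

```python
from pathlib import Path


def choose_mode(project_dir: str) -> str:
    """Pick a migration mode for a project directory (hypothetical helper).

    Assumption: "meaningful" is approximated as "astra.yaml exists and
    contains non-whitespace text"; the skill itself leaves this to judgment.
    """
    spec = Path(project_dir) / "astra.yaml"
    if spec.is_file() and spec.read_text(encoding="utf-8").strip():
        # Augment mode: add to the existing spec, never create a second one.
        return "augment"
    # Fresh mode: draft astra.yaml and universes/baseline.yaml from the scan.
    return "fresh"
```

A real check would likely also look at whether the existing spec already declares inputs, outputs, or decisions, which is why the skill text leaves the call to the agent.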
 claude/lightcone/skills/lc-from-code/SKILL.md   | 17 ++++++++++++-----
 claude/lightcone/skills/lc-from-paper/SKILL.md  |  2 ++
 .../lc-from-paper/references/implement.md       |  2 ++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-code/SKILL.md b/claude/lightcone/skills/lc-from-code/SKILL.md
index eb622671..45b01f0f 100644
--- a/claude/lightcone/skills/lc-from-code/SKILL.md
+++ b/claude/lightcone/skills/lc-from-code/SKILL.md
@@ -1,12 +1,12 @@
 ---
 name: lc-from-code
-description: Bring an existing project into ASTRA / lightcone-cli, starting from the code. Scans the codebase, generates astra.yaml, parameterizes decisions, and runs until outputs materialize. Triggers on "migrate", "convert", "existing project", "wrap this code", "start from code".
+description: Bring an existing project into ASTRA / lightcone-cli, starting from the code. Scans the codebase, drafts or augments astra.yaml, parameterizes decisions, and runs until outputs materialize. Triggers on "migrate", "convert", "existing project", "wrap this code", "start from code".
 allowed-tools: Read, Write, Edit, Glob, Grep, Bash(astra:*), Bash(lc:*), Bash(python:*), Bash(pip:*), Bash(git:*), Bash(mkdir:*), Bash(ls:*), Agent, AskUserQuestion
 ---
 
 # /lc-from-code
 
-End-to-end migration: scan existing code, generate the ASTRA spec, parameterize decisions in the code, and run until everything materializes. The user's existing logic stays intact — changes should be minimal.
+End-to-end migration: scan existing code, draft or add to `astra.yaml`, parameterize decisions in the code, and run until everything materializes. This works both as a fresh start from code and as an augmenting pass inside an existing ASTRA project. The user's existing logic stays intact — changes should be minimal.
 
 ## References
 
@@ -14,7 +14,12 @@ End-to-end migration: scan existing code, generate the ASTRA spec, parameterize
 
 ## Phase 1: Scan & Spec
 
-First, read the Decisions section of [ASTRA Reference](../../guides/astra-reference.md), then spawn an Explore subagent to scan the project. Include the decision criteria in the prompt so the subagent can classify candidates:
+First, read the Decisions section of [ASTRA Reference](../../guides/astra-reference.md), then decide which mode applies:
+
+- **Fresh migration:** no meaningful `astra.yaml` exists yet. Use the code scan to draft `astra.yaml` and `universes/baseline.yaml`.
+- **Augment existing ASTRA:** `astra.yaml` already exists from a paper, user interview, or prior ASTRA work. Use the code scan to add to the current spec — recipes, dependencies, containers, code-backed decision options, baseline selections, implementation notes, and missing inputs / outputs where they naturally belong. Do not create a second `astra.yaml`, do not replace the existing structure wholesale, and surface major structure conflicts to the user before reshaping the spec.
+
+Then spawn an Explore subagent to scan the project. Include the decision criteria in the prompt so the subagent can classify candidates:
 
 ```
 Agent(subagent_type="Explore", prompt="""
@@ -49,7 +54,9 @@ For reference, here are the decision criteria for classifying candidates:
 """)
 ```
 
-Write the scan results to `CLAUDE.md` under `## Project Notes` as a script inventory, then draft `astra.yaml` from the scan results following the spec structure documented in `.claude/guides/astra-reference.md`. Use the decision criteria from [ASTRA Reference](../../guides/astra-reference.md) to filter the subagent's candidate decisions down to only true analytical choices — most hardcoded values are implementation details, not decisions. Use current hardcoded values as defaults.
+Write the scan results to `CLAUDE.md` under `## Project Notes` as a script inventory, then draft or add to `astra.yaml` from the scan results following the spec structure documented in `.claude/guides/astra-reference.md`. Use the decision criteria from [ASTRA Reference](../../guides/astra-reference.md) to filter the subagent's candidate decisions down to only true analytical choices — most hardcoded values are implementation details, not decisions. Use current hardcoded values as defaults.
+
+In augment mode, preserve the existing paper-derived or user-derived `inputs`, `outputs`, `decisions`, `findings`, and `narrative` unless the code scan shows a real conflict. Attach code evidence to the nearest existing home first. Create new ASTRA structure only when the code reveals a real analysis object that has no suitable home in the current spec.
 
 For each output, list the upstream artifacts it depends on under `Output.inputs: [...]` and the decisions it consumes under `Output.decisions: [...]`. Then add a `recipe.command` template that references each via `{inputs.<id>}` / `{decisions.<id>}` and writes to `{output}`. Example:
 
@@ -68,7 +75,7 @@ outputs:
         --output {output}
 ```
 
-Also generate `universes/baseline.yaml` with all defaults matching the current hardcoded values (so the first run reproduces existing behavior).
+Also generate or update `universes/baseline.yaml` with all defaults matching the current hardcoded values (so the first run reproduces existing behavior).
 
 Write to `astra.yaml` and `universes/baseline.yaml`, then validate: `astra validate astra.yaml`. Fix any errors.
 
diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 9cecc9fb..0434b532 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -19,6 +19,8 @@ description: >
 
 Reproduce a published paper in ASTRA. The skill is **interview-first**: a short interactive crafting phase up front that produces both a **per-paper reproduction constitution** and a **per-paper `CLAUDE.md`**. After the interview, lc-from-paper hands the constitution to a multi-session loop that drives the reproduction. Successive iterations survey the workdir, execute one or two phases, exit cleanly, and re-spawn with fresh context until the constitution is realized.
 
+This is a **composer skill**. It coordinates a paper reproduction by proactively invoking the relevant sibling skills at the right stage — `/paper-extraction` for ACQUIRE, `/constitution` during INTERVIEW, `/narrative` during SPECIFY, `/lc-from-code` strategies when substantial reference code needs migrating into the current `astra.yaml`, and the review skills at close-out. Do not silently re-implement sibling skill behavior inside lc-from-paper; call the skill or explicitly follow its workflow where the phase says to.
+
 This is a Claude-Code-native skill. There is no Python orchestrator, no state machine, no resume mechanic — the workdir on disk + git history are the substrate.
 
 A reproduction does not fit in one context window. The loop is, in its simplest form, a way to split one goal across many context windows so each iteration starts uncluttered. That's the substrate, not an aesthetic.
diff --git a/claude/lightcone/skills/lc-from-paper/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
index ca73927a..008d1a09 100644
--- a/claude/lightcone/skills/lc-from-paper/references/implement.md
+++ b/claude/lightcone/skills/lc-from-paper/references/implement.md
@@ -30,6 +30,8 @@ If `work/reference/code/` exists, **read the relevant code on every iteration**
 
 Without this discipline, iterations drift to "looks right" rather than "matches" — the failure mode the first-paper test surfaced.
 
+When the reference code is substantial enough that implementation is really a migration of an existing codebase, follow `/lc-from-code`'s migration workflow in **augment existing ASTRA** mode. Use its code scan, minimal parameter-plumbing, dependency/container, and baseline-preservation strategies, but apply them to this reproduction's existing `astra.yaml`. Do not create a second ASTRA project or duplicate the spec; add recipes, code-backed options, implementation notes, and missing structure to the current reproduction artifact.
+
 ### Parallelize where feasible
 
 When outputs are produced by independent scripts (no shared expensive computation), spawn one Task-tool sub-agent per output. Each sub-agent gets:

From 41e5183e2cd39d4ca4b23407df81c0c790242149 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sat, 9 May 2026 13:18:21 +0200
Subject: [PATCH 027/124] Clarify baseline universe skill behavior

---
 claude/lightcone/skills/lc-from-code/SKILL.md | 2 +-
 claude/lightcone/skills/lc-new/SKILL.md       | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-code/SKILL.md b/claude/lightcone/skills/lc-from-code/SKILL.md
index 45b01f0f..4dae304c 100644
--- a/claude/lightcone/skills/lc-from-code/SKILL.md
+++ b/claude/lightcone/skills/lc-from-code/SKILL.md
@@ -83,7 +83,7 @@ Use `AskUserQuestion` to ask the user to review the spec — they can open `astr
 
 ## Phase 2: Implement
 
-Parameterize the code so decisions can be varied across universes. The goal is minimal changes to user code. Use your best judgement for the approach — the options below are not exhaustive:
+Parameterize the code from ASTRA decisions so the baseline run reproduces the existing behavior. The goal is minimal changes to user code. Use your best judgement for the approach — the options below are not exhaustive:
 
 **For scripts with hardcoded values:** Add argparse (or extend existing argument parsing) and replace hardcoded values with the parsed args. This is the simplest case.
 
diff --git a/claude/lightcone/skills/lc-new/SKILL.md b/claude/lightcone/skills/lc-new/SKILL.md
index b4060cde..db95c21a 100644
--- a/claude/lightcone/skills/lc-new/SKILL.md
+++ b/claude/lightcone/skills/lc-new/SKILL.md
@@ -118,6 +118,8 @@ Stage banner: FINALIZING
 astra universe generate -n baseline
 ```
 
+Generate only `baseline` unless the user explicitly asks for additional universes.
+
 ### Populate Narrative
 
 Replace the TODO entries in `astra.yaml`'s `narrative:` block now that structure is stable: `summary` (one-paragraph framing), `methods` (decisions and sub-analyses), `inputs`, `outputs`. Use `#path.to.element` anchors for cross-references. Leave `findings` as TODO until results exist.

From d8adffa1567cfcb7acaf4130bdbd8f757922b9c0 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sat, 9 May 2026 13:31:14 +0200
Subject: [PATCH 028/124] Make from-paper skill instructions imperative

---
 .../lightcone/skills/lc-from-paper/SKILL.md   | 34 ++++++-------------
 1 file changed, 11 insertions(+), 23 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 0434b532..f6d51d9a 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -1,36 +1,24 @@
 ---
 name: lc-from-paper
 description: >
-  Reproduce a published scientific paper in ASTRA. Interview the user
-  about the paper and the intended scope, draft a per-paper reproduction
-  constitution, then launch a ralph loop that drives the multi-session
-  reproduction work. The loop is 9 phases bookended by two always-interactive
-  seams (INTERVIEW at start, REVIEW at close-out); ARCHITECT writes a stub
-  astra.yaml decomposition before SPECIFY's two-pass-per-sub-analysis fills
-  it in. Composes sibling skills for each phase: paper-extraction for
-  ACQUIRE and narrative for SPECIFY. Use when the user wants to reproduce
-  a paper, has a DOI or arXiv ID and wants to start a reproduction project,
-  or asks to "reproduce <paper>", "set up reproduction", "lc-from-paper",
-  "/lc-from-paper <doi>", or hands you a published paper as a starting point
-  for ASTRA work.
+  This skill should be used when the user wants to reproduce a published
+  scientific paper in ASTRA, has a DOI/arXiv ID/PDF and wants to start or
+  resume a reproduction project, asks to "reproduce <paper>", "set up
+  reproduction", or "import a paper", or hands over a published paper as the
+  starting point for ASTRA work. It should also be used for existing
+  paper-reproduction workdirs when the user asks to continue, resume, drive the
+  next phase, or close out the reproduction.
 ---
 
 # lc-from-paper
 
-Reproduce a published paper in ASTRA. The skill is **interview-first**: a short interactive crafting phase up front that produces both a **per-paper reproduction constitution** and a **per-paper `CLAUDE.md`**. After the interview, lc-from-paper hands the constitution to a multi-session loop that drives the reproduction. Successive iterations survey the workdir, execute one or two phases, exit cleanly, and re-spawn with fresh context until the constitution is realized.
+Run an interview-first paper reproduction workflow in ASTRA. Start with a short interactive crafting phase that produces both a **per-paper reproduction constitution** and a **per-paper `CLAUDE.md`**. After the interview, hand the constitution to a multi-session loop that drives the reproduction. On each iteration, survey the workdir, execute one or two phases, exit cleanly, and re-spawn with fresh context until the constitution is realized.
 
-This is a **composer skill**. It coordinates a paper reproduction by proactively invoking the relevant sibling skills at the right stage — `/paper-extraction` for ACQUIRE, `/constitution` during INTERVIEW, `/narrative` during SPECIFY, `/lc-from-code` strategies when substantial reference code needs migrating into the current `astra.yaml`, and the review skills at close-out. Do not silently re-implement sibling skill behavior inside lc-from-paper; call the skill or explicitly follow its workflow where the phase says to.
+Treat lc-from-paper as a **composer skill**. Coordinate the reproduction by proactively invoking the relevant sibling skills at the right stage: `/paper-extraction` for ACQUIRE, `/constitution` during INTERVIEW, `/narrative` during SPECIFY, `/lc-from-code` strategies when substantial reference code needs migrating into the current `astra.yaml`, and the review skills at close-out. Do not silently re-implement sibling skill behavior inside lc-from-paper; call the skill or explicitly follow its workflow where the phase says to.
 
-This is a Claude-Code-native skill. There is no Python orchestrator, no state machine, no resume mechanic — the workdir on disk + git history are the substrate.
+Keep the workflow Claude-Code-native. Use the workdir on disk plus git history as the substrate; do not introduce a Python orchestrator, state machine, or separate resume mechanic.
 
-A reproduction does not fit in one context window. The loop is, in its simplest form, a way to split one goal across many context windows so each iteration starts uncluttered. That's the substrate, not an aesthetic.
-
-## When to use this skill
-
-- The user has a paper (DOI, arXiv ID, or PDF) and wants to reproduce its analysis
-- The user invokes `/lc-from-paper` (with or without an argument)
-- The user is starting a fresh reproduction project under `Reproductions/<collab>/<short-name>/`
-- An existing paper-reproduction workdir needs the next phase driven forward (in which case skip the interview, see "Resuming an in-flight reproduction" below)
+Split the reproduction across context windows deliberately. Use the loop to keep each iteration uncluttered while preserving continuity through the workdir, constitution, `CLAUDE.md`, and git history.
 
 ## The bundle
 

From b9089c67d138b8f49a96bf628455577891d0df78 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sun, 10 May 2026 13:49:57 +0200
Subject: [PATCH 029/124] Streamline from-paper phase workflow

---
 .../lightcone/skills/lc-from-paper/SKILL.md   | 228 ++++--------------
 1 file changed, 45 insertions(+), 183 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index f6d51d9a..d50094a2 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -12,207 +12,69 @@ description: >
 
 # lc-from-paper
 
-Run an interview-first paper reproduction workflow in ASTRA. Start with a short interactive crafting phase that produces both a **per-paper reproduction constitution** and a **per-paper `CLAUDE.md`**. After the interview, hand the constitution to a multi-session loop that drives the reproduction. On each iteration, survey the workdir, execute one or two phases, exit cleanly, and re-spawn with fresh context until the constitution is realized.
+Run an interview-first paper reproduction workflow in ASTRA. Use the workdir, the per-paper constitution, `CLAUDE.md`, and git history as the continuity layer across sessions. Survey the workdir at the start of each session, choose the current phase, read that phase's reference in full, and execute one or two phases before stopping at a clean handoff point.
 
-Treat lc-from-paper as a **composer skill**. Coordinate the reproduction by proactively invoking the relevant sibling skills at the right stage: `/paper-extraction` for ACQUIRE, `/constitution` during INTERVIEW, `/narrative` during SPECIFY, `/lc-from-code` strategies when substantial reference code needs migrating into the current `astra.yaml`, and the review skills at close-out. Do not silently re-implement sibling skill behavior inside lc-from-paper; call the skill or explicitly follow its workflow where the phase says to.
+## Phase Workflow
 
-Keep the workflow Claude-Code-native. Use the workdir on disk plus git history as the substrate; do not introduce a Python orchestrator, state machine, or separate resume mechanic.
+Read [`references/interview.md`](references/interview.md) before starting a fresh reproduction. The interview identifies the paper, scopes the target outputs, chooses runtime and rigor settings, decides which phases should run inline or in sub-agents, drafts the per-paper constitution with [`/constitution`](../constitution/SKILL.md), and writes the per-paper `CLAUDE.md`.
 
-Split the reproduction across context windows deliberately. Use the loop to keep each iteration uncluttered while preserving continuity through the workdir, constitution, `CLAUDE.md`, and git history.
+After the interview, drive the reproduction through these phases. Invoke the named sibling skills when the phase reaches their work; they carry the phase-local procedure.
 
-## The bundle
+| # | Phase | Reference | Skill composition | Outputs |
+|---|---|---|---|---|
+| 1 | ACQUIRE | [`references/acquire.md`](references/acquire.md) | Use [`/paper-extraction`](../paper-extraction/SKILL.md). | `work/reference/{source/ \| document.md, paper.pdf, figures/, tables/, metadata.json, code/, code-status.yaml}` |
+| 2 | ARCHITECT | [`references/architect.md`](references/architect.md) | Use exploration sub-agents for paper/code indexing when helpful. | stub `astra.yaml`; `work/notes/architect/{paper-index.md, code-index.md}`; `work/notes/cited_papers.yaml` |
+| 3 | SPECIFY | [`references/specify.md`](references/specify.md) | Use [`/narrative`](../narrative/SKILL.md). Use [`/lc-from-code`](../lc-from-code/SKILL.md) in augment mode when substantial reference code should add to the current `astra.yaml`. | filled `astra.yaml`; `universes/baseline.yaml`; `targets/targets.md`; `implementation-notes.md` |
+| 4 | LITERATURE | [`references/literature.md`](references/literature.md) | Use parallel sub-agents for cited-paper resolution when useful. | `prior_insights:` evidence selectors resolved in `astra.yaml`; cited-paper notes under `work/notes/literature/` |
+| 5 | IMPLEMENT | [`references/implement.md`](references/implement.md) | Use implementation and review sub-agents according to the rigor setting. | `scripts/`, `requirements.txt`, executable recipes in `astra.yaml` |
+| 6 | RUN | [`references/run.md`](references/run.md) | Run the declared recipes and diagnose failures from command output. | `results/baseline/<output>/` |
+| 7 | COMPARE | [`references/compare.md`](references/compare.md) | Compare reproduced artifacts against the paper targets. | `comparison-report.{yaml,md}` |
+| 8 | REVIEW | [`references/review.md`](references/review.md) | Use [`/figure-comparison`](../figure-comparison/SKILL.md); optionally use [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md). | `REPRODUCTION-SUMMARY.md`, `.lightcone/comparison.html`, resolved `open-questions.md`, finalized constitution outcome |
 
-lc-from-paper composes the rest of the lightcone-cli paper-reproduction bundle. All siblings live in the same `claude/lightcone/skills/` directory and are available without separate installs:
+Iterate COMPARE -> IMPLEMENT -> RUN -> COMPARE until the verdict passes, the attempt budget is exhausted, or the user accepts a partial reproduction. Run REVIEW as the close-out phase after the comparison loop terminates.
 
-| Sibling skill | Where it's invoked |
-|---|---|
-| [`/paper-extraction`](../paper-extraction/SKILL.md) | ACQUIRE — turns an arXiv ID or DOI into `work/reference/` (structural index + stub `astra.yaml`); arXiv LaTeX source primary, PDF + Docling fallback |
-| [`/constitution`](../constitution/SKILL.md) | INTERVIEW — drafting the per-paper reproduction constitution |
-| [`/ralph-loops`](../ralph-loops/SKILL.md) | After interview — launches the loop that drives all subsequent phases (when the chosen runtime mode is one of the loop modes) |
-| [`/narrative`](../narrative/SKILL.md) | SPECIFY — authoring the `narrative:` and `rationale:` prose in `astra.yaml` |
-
-lc-from-paper does not re-implement what these skills already do — it tells the agent at each phase to invoke them. The siblings stand alone; they don't know about lc-from-paper.
-
-Two further siblings are invoked from **REVIEW** (the close-out), the always-interactive phase that runs after the COMPARE → IMPLEMENT loop terminates: [`/figure-comparison`](../figure-comparison/SKILL.md) builds a portable side-by-side HTML report (paper artifacts vs reproduced), and [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) (optional) audits paper claims against code locations. Both have `AskUserQuestion` in their `allowed-tools`, so REVIEW runs interactively in the main loop session — spawning them under the `Task` tool would fire prompts into nothing.
-
-The phase name **REVIEW** is the close-out (replacing what was briefly called SUMMARIZE_RUN); the rigor-dialed self-review pass that previously lived in a pre-implement REVIEW phase folded into ARCHITECT, SPECIFY, and IMPLEMENT as their internal cross-check. Same word, different jobs — the close-out is named by phase boundary, the self-reviews are named by their host phase.
-
-## Workflow
-
-### Interview (interactive — once per project)
-
-The interview is the first of two always-interactive bookends — INTERVIEW at the start, REVIEW at the close-out. Every phase between them is configurable per the user's per-phase mode choice. Read [`references/interview.md`](references/interview.md) in full before starting.
-
-The interview has six jobs:
-
-1. **Identify the paper** — DOI / arXiv ID / title; whether code is available; whether the user has prior experience with this paper.
-2. **Scope the reproduction** — full reproduction vs targeted (e.g. only the BAO fit), which figures/tables/numbers are the targets. The user's named targets get declared as `outputs:` in the stub `astra.yaml` during ARCHITECT and filled with evidence-backed `findings:` / `decisions:` during SPECIFY — there is no separate target-extraction phase.
-3. **Pick a runtime mode** — interactive / bash-loop / tmux-orchestrated. See "Runtime modes" below.
-4. **Pick a termination criterion** — frugality (weak) vs rigor (strong). The dial threads through ARCHITECT, SPECIFY, and IMPLEMENT, scaling each phase's internal self-review depth. See "Frugality vs rigor" below.
-5. **Choose interactive vs sub-agent per phase** — see "Per-phase mode" below. Only INTERVIEW and REVIEW (close-out) are mandatory-interactive; every other phase is the user's call.
-6. **Draft the per-paper constitution and CLAUDE.md** — invoke `/constitution` to draft the constitution. Author the per-paper `CLAUDE.md` from the same conversation. The two files have separate jobs and don't overlap:
-
-   - **`CLAUDE.md`** is *info and rules* — paper identity (DOI / arXiv ID / title / authors), where the original code lives (`work/reference/code/`), the code-as-canonical rule, the never-block-on-`AskUserQuestion`-mid-sub-agent rule, any paper-specific conventions or warnings, pointers to the constitution and `open-questions.md`. Auto-loaded by Claude Code on every walk-up to this directory. **Evolves over time** — iterations that learn new conventions or surface paper-specific gotchas can add lines so future sessions don't re-derive the same context.
-   - **The constitution** is *desired state* — what "done" looks like, evidence checks, scope fence, the runtime mode the user chose, the termination criterion (weak/strong), per-phase routing (interactive vs sub-agent), and the open-questions section iterations resolve. Read by the runner each iteration as the explicit task.
-
-   CLAUDE.md tells you *what kind of place this is*; the constitution tells you *what we're doing here and when we're done*.
+## Runtime and Rigor
 
-Both files live inside the reproduction's directory. After they are approved the interview ends, and lc-from-paper launches whichever runtime the user chose.
-
-### Runtime modes
-
-The interview asks the user to pick *how* the loop runs. Three modes, picked from environment + preference:
+Offer three runtime modes during the interview:
 
 | Mode | What runs | Right when |
 |---|---|---|
-| **(1) Interactive** | No autonomous loop. The user prompts through phases by hand from the same Claude session, one or two phases at a time. | Tight control, small paper, or token budget is tight. No new substrate beyond Claude itself. |
-| **(2) Bash-loop** | A plain shell loop the user pastes into a terminal (`while …; do claude --dangerously-skip-permissions … ; done`-shaped). No tmux dependency. | Tmux isn't available locally and the connection is stable. Fragile across SSH disconnects unless wrapped in `nohup` — and `nohup` blocks interaction, so for unstable connections this isn't really a fix; mode (3) is. |
-| **(3) Tmux-orchestrated** | A loop inside a tmux session lc-from-paper drives directly via `../ralph-loops/scripts/ralph`. Survives SSH disconnects; the skill sends keystrokes to the tmux pane, monitors, intervenes. | The smoothest path whenever tmux is available. Becomes the de-facto default once `lc launch claude` ships its registry-shipped python-slim agent container with tmux pre-installed. |
-
-The interview probes for tmux availability with `command -v tmux` and only offers mode (3) when present. Mode (3) is preferred when it's available; it isn't required.
-
-### Frugality vs rigor
-
-Independent of mode, the interview asks the user to pick the loop's termination criterion:
-
-- **Weak (frugal):** "run until the checklist of tasks has been completed." Cheaper. Susceptible to one-shot oversights.
-- **Strong (rigorous):** "run until you can't find any further contributions, fixes, or improvements that align with the goal." Almost always catches mistakes the one-shot left behind, but burns more tokens.
-
-Strong is the default for fidelity-critical reproductions; weak is the default when the user explicitly wants to cap token spend. The choice goes into the per-paper constitution (alongside the runtime-mode choice) and is honored by every iteration.
-
-### Phases (driven by ralph iterations after the interview)
-
-Inside each ralph iteration, the agent reads the per-paper constitution, surveys the workdir to determine which phase is current (file existence + git log), and runs that phase's reference. Each phase reference is self-contained — read the matching one in full before working:
-
-| # | Phase | Reference | Outputs |
-|---|---|---|---|
-| 1 | ACQUIRE | [`references/acquire.md`](references/acquire.md) | `work/reference/{source/ \| document.md, paper.pdf, figures/, tables/, metadata.json, code/, code-status.yaml}` |
-| 2 | ARCHITECT | [`references/architect.md`](references/architect.md) | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative — no anchors yet); `work/notes/architect/{paper-index.md, code-index.md}`; `work/notes/cited_papers.yaml`; rigor-dialed self-review |
-| 3 | SPECIFY | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (decisions + findings authored from the paper, prior_insights as citation-only **placeholders**, anchored narrative); `universes/baseline.yaml`; `implementation-notes.md`; `targets/targets.md`; per-sub-analysis rigor-dialed self-review |
-| 4 | LITERATURE | [`references/literature.md`](references/literature.md) | `astra.yaml` with `prior_insights:` placeholders **resolved** (`evidence:` selectors authored against the cited papers); per-paper PDFs cached via `astra paper add`; rigor-dialed self-review |
-| 5 | IMPLEMENT | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml`; rigor-dialed paper-vs-implementation review iterations |
-| 6 | RUN | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
-| 7 | COMPARE | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
-| 8 | REVIEW (close-out) | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, (optional) sentence audit, resolved `open-questions.md`, finalized constitution outcome |
-
-The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or attempts are exhausted. The constitution carries the attempt budget; the ralph iterations consult it. On pass (or user-accepted partial), control returns to the user and REVIEW runs interactively in the main session — drafting the report, invoking `/figure-comparison`, optionally `/check-sentence-by-sentence`, walking accumulated questions, and finalizing the constitution outcome.
-
-ACQUIRE folds in what was previously a separate PARSE phase: arxiv-LaTeX papers come pre-structured in their tarball (no Docling needed), and PDF-fallback papers run Docling inside ACQUIRE itself to produce `document.md` + extracted figures/tables. ARCHITECT replaces the old STUDY: instead of writing per-section paper-vs-code agreement-check files in markdown that SPECIFY would re-author into YAML, ARCHITECT writes the structural skeleton of `astra.yaml` directly (sub-analyses, inputs, outputs, narrative prose). SPECIFY then fills it in with `decisions:` and `findings:` and `astra-anchor:` references; `prior_insights:` are recorded as citation-only placeholders for LITERATURE to resolve next — fetching cited papers speculatively before SPECIFY would do work for citations that may never end up needed, so LITERATURE comes *after* SPECIFY now. LITERATURE then iterates over each placeholder, caches the cited paper via `astra paper add`, and authors the resolved `evidence:` selectors back into `astra.yaml`. The pre-implement REVIEW phase folded into ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT as a rigor-dialed self-review discipline at every artifact-producing seam, freeing the REVIEW *name* for the close-out (replacing SUMMARIZE_RUN, whose name was a verb stuck describing one piece of what the close-out actually does).
-
-### Per-phase mode (interactive vs sub-agent)
-
-A reproduction's most consequential decisions show up at known seams. Only the bookends are mandatory-interactive — INTERVIEW at the start, REVIEW (close-out) at the end. Every phase between them is configurable: the interview decides which run interactively (in the main loop session, the user reachable via `AskUserQuestion`) and which delegate to a sub-agent (Task tool with fresh context, no user reach).
-
-Defaults the constitution starts with:
-
-| # | Phase | Default | Why |
-|---|---|---|---|
-| 0 | INTERVIEW | **interactive — *always*** | The first bookend. Scope, runtime, rigor, per-phase mode all decided here. |
-| 1 | ACQUIRE | user choice | Mostly mechanical (LaTeX-tarball download / Docling fallback / code clone); surfacing happens only on download failures. |
-| 2 | ARCHITECT | sub-agent (two parallel Explore + synthesis; rigor-dialed self-review) | Two Task-tool sub-agents fan out (one paper-side, one code-side) and produce indexes; a synthesis sub-agent writes the stub `astra.yaml`. Rigor-dialed fresh-context self-review pass cross-checks the stub before SPECIFY runs. |
-| 3 | SPECIFY | user choice (default interactive); two-pass-per-sub-analysis | **Paper pass**: authors `decisions:` and `findings:` with paper-anchored evidence; records citation markers (`[12]`, `Smith+24`) as `prior_insights:` placeholders (citation-only — no `evidence:` selector yet, LITERATURE fills those in); weaves `astra-anchor:` references into the existing narrative. **Code pass** (when code present): augments / amends with code-as-canonical insights and material-disagreement entries; surfaces material conflicts via `AskUserQuestion` (interactive) or `<paper-slug>/open-questions.md` (sub-agent). **Self-review** (rigor-dialed): fresh-context sub-agent per sub-analysis. Per-sub-analysis parallelism when independent. |
-| 4 | LITERATURE | sub-agent (rigor-dialed self-review) | Reads SPECIFY's `prior_insights:` placeholders, caches each cited paper via `astra paper add`, and authors the resolved `evidence:` selectors back into `astra.yaml`'s `prior_insights[<id>].evidence[]` so each placeholder becomes a verified citation. One sub-agent per cited paper — pure parallel grunt-work. Self-review: fresh-context sub-agent reads each `prior_insights:` entry against its cited paper and asks "does this evidence actually justify the decision/finding it's attached to?" Core, not opt-in: verifiability against citations is what `prior_insights` evidence depends on. |
-| 5 | IMPLEMENT | sub-agent (rigor-dialed review iterations) | Writes recipes + scripts (parallelized by output where feasible). Frugal: minimal review pass after. Rigor: N rounds of fresh-context "is the implementation consistent with the paper?" review + fix iterations. |
-| 6 | RUN | user choice | Mechanical, but failures need diagnosis. |
-| 7 | COMPARE | user choice | Verdict (was the reproduction close enough?) is the user's call when interactive; sub-agent COMPARE writes the verdict and lets REVIEW (close-out) ratify. |
-| 8 | REVIEW (close-out) | **interactive — *always*** | The closing bookend. Drafts the report, runs `/figure-comparison` (mandatory) and `/check-sentence-by-sentence` (opt-in), walks `open-questions.md` with `AskUserQuestion`, finalizes the constitution outcome. |
-
-The constitution records the choice; iterations honor it. Sub-agent phases are spawned via the `Task` tool from inside the main loop session — that gives them fresh context but no user-reach. Interactive phases run inline in the loop session and may pause with `AskUserQuestion` at material seams.
-
-### Rigor vs frugality threads through ARCHITECT, SPECIFY, and IMPLEMENT
-
-The frugality/rigor dial picked in INTERVIEW is not just a termination criterion for the COMPARE → IMPLEMENT loop. It also tunes how aggressively each artifact-producing phase self-checks. Same shape at every seam:
-
-- **Frugal**: skip self-review, or run one fresh-context sub-agent pass and incorporate fixes once.
-- **Rigor**: N rounds of fresh-context sub-agent review + fix. Each round runs a brand-new reviewer that does NOT see prior rounds' findings or fixes. Stop when two consecutive rounds find no fixes (strong-termination), or after 5 rounds (system cap), whichever comes first.
-
-The artifact under review changes per phase — ARCHITECT reviews the stub `astra.yaml`; SPECIFY reviews each sub-analysis's filled spec; IMPLEMENT reviews `scripts/` + recipes against paper + code — but the cross-check shape is constant.
-
-The discipline is **never bias the reviewing sub-agent**: each round runs from fresh context with the prompt "check the artifact is consistent with the paper and the code" — not "here's what was just fixed; check it." Otherwise the reviewer pattern-matches on prior fixes rather than thinking from first principles.
-
-### Code-as-canonical
-
-When the original codebase is available at `work/reference/code/`, **the agent reads relevant code on every iteration when implementing**. Where paper and code disagree, the **code is canonical** for numerics, plotting, and method; the agent continues with the code's behavior and either ratifies (interactive phases) or logs (sub-agent / loop phases) the disagreement so the user resolves at the next interactive seam.
-
-This is the load-bearing fidelity discipline. Without it, iterations drift to "looks right" rather than "matches" — the failure mode the first-paper test surfaced (plot styles off, numerical results off). The per-paper CLAUDE.md restates the rule so every iteration's Claude session walks up to it.
-
-### Two surfaces for user attention: open-questions and REVIEW (close-out)
-
-The reproduction has two periods of human reach — the bookends. INTERVIEW at the start, REVIEW (close-out) at the end. In between, the loop runs without a human in the conversation. The discipline has two surfaces to match:
-
-- **`<paper-slug>/open-questions.md` — the during-loop accumulator.** When a sub-agent or loop iteration would normally surface a question to the user (paper-vs-code conflicts, figures whose intent isn't obvious, ambiguities the constitution doesn't resolve), it appends the question to `open-questions.md` and continues with the best-judgment default. Never block on `AskUserQuestion` from inside a sub-agent — the prompt fires into nothing.
-
-- **REVIEW (close-out) — the post-loop interactive close-out.** When the COMPARE→IMPLEMENT loop terminates (verdict=pass or budget exhausted), control returns to the user. REVIEW invokes `/figure-comparison` and (optionally) `/check-sentence-by-sentence` interactively — these skills can use `AskUserQuestion` because the human is back. Then it walks the user through `open-questions.md` with `AskUserQuestion`, lands resolutions, updates `astra.yaml` or `implementation-notes.md` accordingly, drafts `REPRODUCTION-SUMMARY.md`, and finalizes the constitution outcome.
-
-Stays in the conversation while the seams are still soft, walks away while the loop grinds, comes back to a rich review surface plus a list of "things you'd want to know."
-
-### Material conflicts (the SPECIFY code-pass seam)
+| Interactive | The user prompts through phases by hand from the current session. | Tight control, small paper, or token budget is tight. |
+| Bash-loop | A plain shell loop runs one session after another. | Tmux is unavailable and the connection is stable. |
+| Tmux-orchestrated | [`/ralph-loops`](../ralph-loops/SKILL.md) runs the loop inside a tmux session. | Preferred when tmux is available. |
 
-SPECIFY's code pass (per sub-analysis) is where paper-vs-code material disagreements surface. The paper pass authors decisions / findings from the paper alone; the code pass cross-checks them against the implementation. When paper and code disagree on something material:
+Set the rigor dial in the constitution:
 
-- **Material** = a different choice would plausibly change a numeric result the paper reports.
-- **Stylistic / cosmetic / pure-tooling differences** are not material — record them in `implementation-notes.md` and move on.
-- **Code is canonical** for numerics and method per "Code-as-canonical" above.
-- **Interactive SPECIFY**: surface the conflict with `AskUserQuestion`. The user picks which option `universes/baseline.yaml` selects.
-- **Sub-agent SPECIFY** (rare; default is interactive): take code as canonical, record the conflict in `open-questions.md`, and preserve both options in `astra.yaml` so the user can flip baseline at REVIEW (close-out).
+- **Frugal:** complete the phase checklist with minimal self-review.
+- **Rigorous:** run fresh-context review and fix rounds for artifact-producing phases until consecutive rounds find no fixes, or until the constitution's cap is reached.
 
-Both choices land in `astra.yaml` as decision options. Whichever the user picks becomes the option selected by `universes/baseline.yaml`; the alternative is preserved as a sibling option for future universe runs. See `references/specify.md` for the full SPECIFY discipline.
+Thread the rigor setting through ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT. Review the current artifact against the paper and code from fresh context; incorporate fixes before advancing phases.
 
-### Resuming an in-flight reproduction
+## Operating Discipline
 
-If the workdir already exists (`work/reference/source/` or `work/reference/document.md` is present, `astra.yaml` exists, etc.):
+- **Workdir survey first.** Determine the current phase from file existence, `git log`, and validation output before acting.
+- **ASTRA CLI checks are the authority.** Use `astra validate <file>` and `astra validate --verify-evidence` for deterministic checks, `astra paper add` to cache cited papers, and the current `astra --help` output to confirm which commands are available.
+- **Acquire from the richest source.** Prefer arXiv source tarballs when available; use PDF + Docling fallback when source is unavailable.
+- **Code is canonical when present.** Keep original code under `work/reference/code/`; read relevant code during SPECIFY and IMPLEMENT; model numerics, plotting, and method on the code when paper prose and code disagree.
+- **Material disagreements become decisions.** Represent paper-vs-code conflicts as `decisions:` options in `astra.yaml`; select the baseline option according to the user's choice or the code-as-canonical default, and preserve alternatives for later exploration.
+- **ARCHITECT sets structure; SPECIFY fills content.** ARCHITECT writes the sub-analysis skeleton, inputs, outputs, and narrative scaffold. SPECIFY fills `decisions:`, `prior_insights:`, `findings:`, and ASTRA anchors.
+- **Use real inputs.** Unless the paper itself uses synthetic data as input, fetch or query real datasets during IMPLEMENT.
+- **Keep handoffs crisp.** Each session should leave the constitution, `CLAUDE.md`, `open-questions.md`, git history, and phase artifacts clear enough for the next session to resume.
 
-1. **Skip the interview** unless the user explicitly wants to revise scope.
-2. Read the per-paper constitution if it exists; if it does not, draft a minimal one from the current workdir state.
-3. Launch (or re-attach to) the ralph loop. Each iteration's first move is to survey the workdir and determine the current phase.
+## Resuming
 
-Workdir signals (file existence implies the phase has been done):
+If the workdir already exists, read the per-paper constitution and `CLAUDE.md`, survey the files below, and continue from the first incomplete phase. Draft a minimal constitution from current state when one is missing.
 
 | Signal | Phase done |
 |---|---|
-| `work/reference/source/` (arxiv tarball) **or** `work/reference/document.md` (Docling fallback) | ACQUIRE |
-| `work/reference/code/` | ACQUIRE (code clone) |
-| `work/notes/architect/{paper-index.md,code-index.md}` | ARCHITECT (Explore pass) |
-| `astra.yaml` validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
-| `work/notes/cited_papers.yaml` | ARCHITECT (citation extraction) |
-| `astra.yaml` has non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` entries present as citation-only placeholders + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
-| `astra.yaml`'s `prior_insights:` entries each have a resolved `evidence:` selector (verified by `astra validate --verify-evidence`); `work/notes/literature/<doi-slug>.yaml` files present (one per cited paper) | LITERATURE |
-| recipes present in `astra.yaml` | IMPLEMENT |
-| `results/<universe>/<output>/` | RUN |
+| `work/reference/source/` or `work/reference/document.md` | ACQUIRE |
+| `work/reference/code/` | ACQUIRE code clone |
+| `work/notes/architect/{paper-index.md,code-index.md}` | ARCHITECT indexing |
+| `astra.yaml` validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT stub |
+| `work/notes/cited_papers.yaml` | ARCHITECT citation extraction |
+| `astra.yaml` has non-empty `decisions:` and `findings:` per sub-analysis, citation-placeholder `prior_insights:`, `targets/targets.md`, and `implementation-notes.md` | SPECIFY |
+| `prior_insights:` entries have resolved `evidence:` selectors verified by `astra validate --verify-evidence`; `work/notes/literature/<doi-slug>.yaml` files exist | LITERATURE |
+| recipes exist in `astra.yaml` | IMPLEMENT |
+| `results/baseline/<output>/` | RUN |
 | `comparison-report.yaml` | COMPARE |
-| `REPRODUCTION-SUMMARY.md` + `.lightcone/comparison.html` + resolved `open-questions.md` | REVIEW (close-out) |
-
-`git log --oneline` complements this — phase commits are the chronological view.
-
-## Skills (activate before working)
-
-- [`/constitution`](../constitution/SKILL.md) — for the interview's drafting phase
-- [`/ralph-loops`](../ralph-loops/SKILL.md) — for the bash-loop and tmux-orchestrated runtime modes
-- [`/paper-extraction`](../paper-extraction/SKILL.md) — for ACQUIRE
-- [`/narrative`](../narrative/SKILL.md) — for SPECIFY
-- [`/figure-comparison`](../figure-comparison/SKILL.md) — for REVIEW (close-out, mandatory)
-- [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) — for REVIEW (close-out, opt-in)
-
-## Discipline
-
-- **lc-from-paper is the workflow story; phase references are the depth.** SKILL.md tells you when to read which reference; the references carry the prompt prose ported from the legacy Paper2ASTRA Python package.
-- **Workdir is the state.** No state machine, no resume mechanic — file existence + `git log` + `astra validate` answer "what phase am I on" deterministically. Each iteration's first move is *survey*.
-- **Deterministic checks live in scripts.** When the answer is yes/no, call the script — `astra validate`, `git log`, `yq`, `ls`. Don't ask the agent to introspect what a deterministic check would tell you.
-- **Use the up-to-date CLI surfaces, not skill-specific wrappers.** When `astra validate` already does the job, call it directly. Specifically: `astra validate <file>`, `astra validate --verify-evidence`, `astra paper add`. Use whatever the current `astra --help` surfaces.
-- **arxiv-LaTeX-first acquisition.** When the paper is on arxiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arxiv where there's no better source.
-- **The original code goes into `work/reference/code/`** during ACQUIRE when available, and stays there as the canonical reference for every subsequent iteration (see "Code-as-canonical" above).
-- **`/figure-comparison` and `/check-sentence-by-sentence` run inside REVIEW (close-out), not inside the loop.** Both have `AskUserQuestion` in their `allowed-tools`; REVIEW is the always-interactive close-out bookend that runs them in the main session so the prompts land. Don't try to spawn either under the `Task` tool from inside the loop.
-- **Only the bookends are mandatory-interactive.** INTERVIEW (start) and REVIEW (close). Every other phase is configurable per the interview's per-phase mode choice — no "always interactive" flag on anything in between. The dial that does the heavy lifting on quality is rigor/frugality, threaded through ARCHITECT, SPECIFY, and IMPLEMENT's internal self-review passes.
-- **Don't bias review sub-agents.** ARCHITECT, SPECIFY, and IMPLEMENT's self-review iterations spawn fresh sub-agents whose prompt is "check the artifact is consistent with the paper and the code" — never "here's what was just authored or fixed last round." Each round runs from a fresh reviewing context. Otherwise the reviewer pattern-matches on prior fixes rather than thinking from first principles.
-- **ARCHITECT decides structure; SPECIFY decides content.** ARCHITECT's two parallel Explore sub-agents (paper-side + code-side) feed a synthesis sub-agent that writes the stub `astra.yaml` — sub-analyses, inputs, outputs, narrative prose. SPECIFY's per-sub-analysis paper pass + code pass + self-review fills in `decisions:`, `prior_insights:`, `findings:` and weaves anchor references into the narrative. Splitting **structure** from **content** keeps each phase's cognitive load bounded.
-- **No synthetic data.** Unless the paper itself uses synthetic data as its input, every input dataset must be real (downloaded, queried, or fetched from a real archive). The implement phase reference repeats this; treat it as load-bearing.
-- **Tmux preferred-when-available, never required.** Modes (1) and (2) work without it.
-- **The siblings don't know about lc-from-paper.** Each SKILL stands on its own.
-- **Workdir conventions stay.** The phase references preserve Paper2ASTRA's workdir layout (`work/reference/`, `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`) so workdirs from the legacy Paper2ASTRA package are interoperable with workdirs driven by this skill.
-
-## Anti-patterns
-
-- **Asking the user mid-sub-agent.** Sub-agent phases cannot reach the user. If a material conflict surfaces in a sub-agent phase, take the code's behavior (or paper's, if no code) as canonical, record the conflict in `open-questions.md` and as a `decisions:` block with both options preserved in `astra.yaml`, and let the next interactive phase ratify. Never make the sub-agent pick silently and discard the alternative.
-- **Re-implementing what astra already does.** If `astra validate` returns clean, do not write a separate validator. If `astra paper add` caches the PDF, do not write a separate cache.
-- **Treating Paper2ASTRA workdir as legacy.** It is not legacy — it is the substrate. The phase references inherit its conventions intentionally.
-- **Bundling everything into one iteration.** Each iteration runs one or two phases, then exits. The constitution is realized across many iterations.
+| `REPRODUCTION-SUMMARY.md` and `.lightcone/comparison.html` | REVIEW |

From 44923a8baa457671324cabee4359bde6b1d7f5ab Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sun, 10 May 2026 17:40:17 +0200
Subject: [PATCH 030/124] Rewrite lc-from-paper as orchestrator + named
 sub-agents

Replace the ralph-loop runtime with a single orchestrator session that
spawns named per-phase sub-agents the user can drop into directly. Per-paper
artifact collapsed from constitution + CLAUDE.md to a single CLAUDE.md
(Goal / Rigor / Disagreements / Rules / Pointers); template extracted to
templates/CLAUDE.md so it's discoverable. Frugal-vs-rigorous as a global
termination dial reframed to "rigor is continuous, chosen per spawn."
COMPARE produces verdict + opportunity assessment instead of pass/fail-with-
budget. Per-phase interactive/sub-agent matrix removed; runtime-mode
trichotomy removed; open-questions.md demoted to autonomous-mode fallback.
Added git-tracked workdir + commit-as-you-go discipline.

References under references/ haven't caught up yet (especially interview.md);
that's the next round.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../lightcone/skills/lc-from-paper/SKILL.md   | 160 ++++++++++++------
 .../skills/lc-from-paper/templates/CLAUDE.md  |  46 +++++
 2 files changed, 154 insertions(+), 52 deletions(-)
 create mode 100644 claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index d50094a2..a892e8bc 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -2,79 +2,135 @@
 name: lc-from-paper
 description: >
   This skill should be used when the user wants to reproduce a published
-  scientific paper in ASTRA, has a DOI/arXiv ID/PDF and wants to start or
-  resume a reproduction project, asks to "reproduce <paper>", "set up
-  reproduction", or "import a paper", or hands over a published paper as the
-  starting point for ASTRA work. It should also be used for existing
-  paper-reproduction workdirs when the user asks to continue, resume, drive the
-  next phase, or close out the reproduction.
+  scientific paper in ASTRA — has a DOI, arXiv ID, or PDF — or asks to
+  "reproduce <paper>", "set up reproduction", or "import a paper". Also
+  use when continuing or resuming an existing reproduction workdir. The
+  skill instructs Claude to act as an orchestrator that drives the
+  reproduction across phases by spawning named sub-agents per phase, with
+  the user able to drop into any sub-agent's chat directly to steer.
 ---
 
 # lc-from-paper
 
-Run an interview-first paper reproduction workflow in ASTRA. Use the workdir, the per-paper constitution, `CLAUDE.md`, and git history as the continuity layer across sessions. Survey the workdir at the start of each session, choose the current phase, read that phase's reference in full, and execute one or two phases before stopping at a clean handoff point.
+You are helping the user reproduce a published scientific paper as a complete ASTRA project. This is a long, complex task that won't fit in a single context window — it spans discrete phases: acquire the paper and its code, architect the spec, specify decisions and findings, resolve cited literature, implement, run, compare, review. The complexity is exactly why your role matters. As **orchestrator**, you hold the whole shape for the user — guiding them through the workflow, explaining what's happening, tracking what's been done and what's next, deciding how to delegate. Each sub-agent only ever sees its own slice; you keep the through-line.
 
-## Phase Workflow
+The heavy lifting of any phase is done by a sub-agent: you spawn it pointed at the workdir (where its `CLAUDE.md` auto-loads), let it work in its own context window, and read what it returns when it's done. Your own context stays light — you carry user intent forward, watch the workdir, and choose what to spawn next.
 
-Read [`references/interview.md`](references/interview.md) before starting a fresh reproduction. The interview identifies the paper, scopes the target outputs, chooses runtime and rigor settings, decides which phases should run inline or in sub-agents, drafts the per-paper constitution with [`/constitution`](../constitution/SKILL.md), and writes the per-paper `CLAUDE.md`.
+**The user can interact with any sub-agent directly.** When you spawn one, it appears as a chat surface the user can switch into (typically at the bottom of the screen). Tell them explicitly: *"I'm launching the X sub-agent now — if you want to interact with it, switch to its chat before its first turn finishes."* While the user stays in that chat, the sub-agent stays active — natural turn-by-turn dialogue, prose questions, the user steering directly. When they switch back to you and the sub-agent goes idle, the surface goes away from their view; the sub-agent stays addressable from your side, and addressing it via SendMessage reopens the surface for the user too. **Sub-agents can be resumed at any time, with full context preserved** — if the user wants to drop into any earlier phase, you pull that phase's sub-agent back and it shows up in their chat exactly where it left off.
 
-After the interview, drive the reproduction through these phases. Invoke the named sibling skills when the phase reaches their work; they carry the phase-local procedure.
+**As orchestrator, keep your context lean.** Your job is to coordinate, not to absorb sub-agent outputs or the codebase in detail. The paper itself is the exception worth making — it's among the highest-value text in the workflow, the canonical source the spec is being built against, and worth reading carefully at the start. Your other regular reads are short and load-bearing: the paper-extraction index, `CLAUDE.md`, and what sub-agents return. For everything else, delegate: a quick `grep` or single-file lookup is fine to do directly, but anything more open-ended — cross-cutting search, repeated reads of large content — goes to an Explore sub-agent that reads on your behalf and returns a summary. The failure mode to avoid is the orchestrator quietly turning into "just another iteration" by reading everything itself.
 
-| # | Phase | Reference | Skill composition | Outputs |
+## Setup: git-tracked workdir
+
+The reproduction's directory should be a git repo — if not already, `git init` it locally before spawning the first sub-agent. Every sub-agent commits its work as it goes — small, descriptive commits per significant change. The git log is the chronological trail of the reproduction; `git diff` makes each sub-agent's work auditable from your side without you having to read source files directly. Don't push to a remote unless the user has set one up; local-only is the default.
+
+## The phases
+
+The reproduction runs through nine phases (zero-indexed). Phase 0 (INTERVIEW) and Phase 8 (REVIEW) are the bookends — they happen in your own session because they're short, interactive, and depend on the through-line context only you hold. Phases 1–7 are sub-agent dispatches: you spawn each as a named sub-agent, point it at the matching reference file in `references/`, and let it work in its own context with the per-paper `CLAUDE.md` auto-loading from the workdir.
+
+| # | Phase | Where it runs | Reference | Primary outputs |
 |---|---|---|---|---|
-| 1 | ACQUIRE | [`references/acquire.md`](references/acquire.md) | Use [`/paper-extraction`](../paper-extraction/SKILL.md). | `work/reference/{source/ \| document.md, paper.pdf, figures/, tables/, metadata.json, code/, code-status.yaml}` |
-| 2 | ARCHITECT | [`references/architect.md`](references/architect.md) | Use exploration sub-agents for paper/code indexing when helpful. | stub `astra.yaml`; `work/notes/architect/{paper-index.md, code-index.md}`; `work/notes/cited_papers.yaml` |
-| 3 | SPECIFY | [`references/specify.md`](references/specify.md) | Use [`/narrative`](../narrative/SKILL.md). Use [`/lc-from-code`](../lc-from-code/SKILL.md) in augment mode when substantial reference code should add to the current `astra.yaml`. | filled `astra.yaml`; `universes/baseline.yaml`; `targets/targets.md`; `implementation-notes.md` |
-| 4 | LITERATURE | [`references/literature.md`](references/literature.md) | Use parallel sub-agents for cited-paper resolution when useful. | `prior_insights:` evidence selectors resolved in `astra.yaml`; cited-paper notes under `work/notes/literature/` |
-| 5 | IMPLEMENT | [`references/implement.md`](references/implement.md) | Use implementation and review sub-agents according to the rigor setting. | `scripts/`, `requirements.txt`, executable recipes in `astra.yaml` |
-| 6 | RUN | [`references/run.md`](references/run.md) | Run the declared recipes and diagnose failures from command output. | `results/baseline/<output>/` |
-| 7 | COMPARE | [`references/compare.md`](references/compare.md) | Compare reproduced artifacts against the paper targets. | `comparison-report.{yaml,md}` |
-| 8 | REVIEW | [`references/review.md`](references/review.md) | Use [`/figure-comparison`](../figure-comparison/SKILL.md); optionally use [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md). | `REPRODUCTION-SUMMARY.md`, `.lightcone/comparison.html`, resolved `open-questions.md`, finalized constitution outcome |
+| 0 | INTERVIEW | orchestrator session | [`references/interview.md`](references/interview.md) | per-paper `CLAUDE.md` |
+| 1 | ACQUIRE | sub-agent | [`references/acquire.md`](references/acquire.md) | `work/reference/{source/, paper.pdf, figures/, tables/, metadata.json, code/, code-status.yaml}`; `work/notes/cited_papers.yaml` |
+| 2 | ARCHITECT | sub-agent | [`references/architect.md`](references/architect.md) | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative); `work/notes/architect/{paper-index.md, code-index.md}` |
+| 3 | SPECIFY | sub-agent | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
+| 4 | LITERATURE | sub-agent | [`references/literature.md`](references/literature.md) | `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
+| 5 | IMPLEMENT | sub-agent | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
+| 6 | RUN | sub-agent | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
+| 7 | COMPARE | sub-agent | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
+| 8 | REVIEW | orchestrator session | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
+
+COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are and how much they likely matter. You and the user decide together whether to spend another IMPLEMENT round now (close a high-leverage gap) or land the reproduction at its current rigor level and log the gap as an open opportunity in CLAUDE.md's Rigor section. Either way, control eventually passes to REVIEW.
+
+## Spawning a phase sub-agent
+
+When you launch a phase, spawn a named sub-agent in the background with the phase reference as its working spec:
+
+- **Name** the sub-agent after the phase: `architect`, `specify`, `implement`, etc. The name is what the user sees in their chat list. If you re-spawn under the same name, the previous instance becomes addressable only by ID.
+- **Prompt** the sub-agent to read its phase reference file (`references/<phase>.md`). The reproduction's `CLAUDE.md` auto-loads from the workdir, so it doesn't need to be passed explicitly. Trust the sub-agent to read what else it needs.
+- **Run in background** so the user can switch into the sub-agent's chat without you blocking on it.
+- **Announce the spawn to the user** before it starts: *"I'm launching the &lt;phase&gt; sub-agent now — switch to its chat now if you want to interact, otherwise it'll work autonomously and report back."*
+- **Note the agent ID** when you spawn it. Names are user-facing — if the user dismisses a sub-agent's surface (escape), the name binding goes away and `SendMessage` by name fails. The agent ID + on-disk transcript persist regardless; `SendMessage` by ID resumes the sub-agent from full context and reopens the surface for the user.
+
+When the sub-agent's turn closes you receive a notification with its full response in the `result` field. Read that, then decide: spawn the next phase, ask the user a clarifying question, or revisit a previous phase.
+
+## Per-paper artifact: CLAUDE.md
+
+The reproduction's directory holds a single `CLAUDE.md` that sub-agents and future orchestrator sessions discover automatically by walking up from the workdir. It is the durable spec for the reproduction, drafted during INTERVIEW and revised as sub-agents learn paper-specific gotchas. The starting shape is in [`templates/CLAUDE.md`](templates/CLAUDE.md). Sections:
+
+- **Paper identity** — DOI, arXiv ID, title, authors, one-line subject; where the original code lives (`work/reference/code/`).
+- **Goal** — what the reproduction is aiming for. Desired state, scope (in / out). Stays static once approved at INTERVIEW.
+- **Rigor** — where the reproduction currently stands and what's worth tightening if attention returns. *Current state* per output or per phase (e.g. *sketch / baseline / tightened / canonical*). *Open opportunities* — what could benefit from more attention, with a sense of leverage ("Figure 3's systematics treatment is sketch-level; tightening it would change the headline number by ~10%"). Updated by sub-agents as they work; mined during REVIEW for what's worth coming back for.
+- **Disagreements** — paper-vs-code material disagreements logged by sub-agents as they find them. Code is canonical for numerics; both options are preserved as decision options in `astra.yaml`. CLAUDE.md just summarizes them so every walk-up sees them at a glance. Surfaced to the user when they're around.
+- **Rules** — the code-as-canonical discipline, the never-block-on-`AskUserQuestion`-mid-sub-agent rule (with `open-questions.md` as the autonomous-mode fallback), arxiv-LaTeX-first acquisition, `astra validate --verify-evidence` as the fidelity gate.
+- **Pointers** — to `open-questions.md`, and any paper-specific conventions or warnings the user surfaced during the interview.
+
+Keep it short. Pointers, not snapshots.
 
-Iterate COMPARE -> IMPLEMENT -> RUN -> COMPARE until the verdict passes, the attempt budget is exhausted, or the user accepts a partial reproduction. Run REVIEW as the close-out phase after the comparison loop terminates.
+## The two bookends
 
-## Runtime and Rigor
+### Interview (Phase 0)
 
-Offer three runtime modes during the interview:
+The opening interactive phase. Read [`references/interview.md`](references/interview.md) in full before starting. The interview gathers: (1) the paper (DOI / arXiv ID / code repo URL / prior context), (2) scope (full vs targeted, sub-analysis structure), (3) any paper-specific conventions or warnings.
 
-| Mode | What runs | Right when |
-|---|---|---|
-| Interactive | The user prompts through phases by hand from the current session. | Tight control, small paper, or token budget is tight. |
-| Bash-loop | A plain shell loop runs one session after another. | Tmux is unavailable and the connection is stable. |
-| Tmux-orchestrated | [`/ralph-loops`](../ralph-loops/SKILL.md) runs the loop inside a tmux session. | Preferred when tmux is available. |
+These get drafted into the per-paper `CLAUDE.md`: paper identity, Goal, Rules, and Pointers (including any paper-specific conventions or warnings). The Rigor section starts empty; sub-agents fill it in as they work. Show the user the draft, take corrections, refine, then save.
 
-Set the rigor dial in the constitution:
+After the user approves, launch the first sub-agent (typically ACQUIRE).
 
-- **Frugal:** complete the phase checklist with minimal self-review.
-- **Rigorous:** run fresh-context review and fix rounds for artifact-producing phases until consecutive rounds find no fixes, or until the constitution's cap is reached.
+### Review (Phase 8, close-out)
 
-Thread the rigor setting through ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT. Review the current artifact against the paper and code from fresh context; incorporate fixes before advancing phases.
+The closing interactive phase. Drafts `REPRODUCTION-SUMMARY.md`, invokes [`/figure-comparison`](../figure-comparison/SKILL.md) (mandatory) and optionally [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md), walks `open-questions.md` with the user, and finalizes the reproduction outcome.
 
-## Operating Discipline
+REVIEW runs in the orchestrator session because both `/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`, which isn't available to sub-agents.
 
-- **Workdir survey first.** Determine the current phase from file existence, `git log`, and validation output before acting.
-- **ASTRA CLI checks are the authority.** Use `astra validate <file>`, `astra validate --verify-evidence`, `astra paper add`, and the current `astra --help` surfaces for deterministic checks.
-- **Acquire from the richest source.** Prefer arXiv source tarballs when available; use PDF + Docling fallback when source is unavailable.
-- **Code is canonical when present.** Keep original code under `work/reference/code/`; read relevant code during SPECIFY and IMPLEMENT; model numerics, plotting, and method on the code when paper prose and code disagree.
-- **Material disagreements become decisions.** Represent paper-vs-code conflicts as `decisions:` options in `astra.yaml`; select the baseline option according to the user's choice or the code-as-canonical default, and preserve alternatives for later exploration.
-- **ARCHITECT sets structure; SPECIFY fills content.** ARCHITECT writes the sub-analysis skeleton, inputs, outputs, and narrative scaffold. SPECIFY fills `decisions:`, `prior_insights:`, `findings:`, and ASTRA anchors.
-- **Use real inputs.** Unless the paper itself uses synthetic data as input, fetch or query real datasets during IMPLEMENT.
-- **Keep handoffs crisp.** Each session should leave the constitution, `CLAUDE.md`, `open-questions.md`, git history, and phase artifacts clear enough for the next session to resume.
+## Disciplines
 
-## Resuming
+**Workdir is the state.** No state machine, no resume mechanic — file existence + `git log` + `astra validate` answer "what phase am I on" deterministically. Each phase sub-agent's first move is to survey the workdir on entry; you (orchestrator) survey at startup and after each completion notification.
 
-If the workdir already exists, read the per-paper constitution and `CLAUDE.md`, survey the files below, and continue from the first incomplete phase. Draft a minimal constitution from current state when one is missing.
+**Code-as-canonical, with disagreements recorded.** When the original codebase is at `work/reference/code/`, every implementing sub-agent reads relevant code on entry. Where paper and code disagree on something material (a different choice would plausibly change a numeric result the paper reports), **code is canonical** for numerics, plotting, and method — but the disagreement is recorded: as a decision option in `astra.yaml` with both alternatives preserved, and as an entry in CLAUDE.md's *Disagreements* section so it's visible to every sub-agent and to the user. Surface it to the user the next time they're around. Stylistic / cosmetic / pure-tooling differences aren't material — note them in `implementation-notes.md` and move on. Without this discipline, iterations drift to "looks right" rather than "matches" and material disagreements get silently absorbed.
+
+**Rigor is continuous, chosen per spawn.** A reproduction isn't one-shot — it reaches a baseline, then accumulates rigor as the user comes back. When you spawn an artifact-producing sub-agent (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT), choose how much fresh-context self-review to ask of it based on where the artifact currently stands (CLAUDE.md's Rigor section) and what the user wants to invest now. *Cheap:* skip self-review or run one fresh-context pass. *Heavy:* iterate fresh-context review + fix until two consecutive rounds find no fixes (capped at 5 rounds). The reviewing sub-agent never sees prior rounds' fixes — fresh context each round, with the prompt "check the artifact is consistent with the paper and the code." Each spawn that produces an artifact updates CLAUDE.md's Rigor section so the picture stays honest across context windows.
+
+**arXiv-LaTeX-first acquisition.** When the paper is on arXiv, the source tarball is the substrate; equations, ligatures, captions, and tables come through clean. PDF + Docling is a fallback for non-arXiv papers only.
+
+**Use the up-to-date `astra` CLI surfaces.** When `astra validate` already does the job, call it directly. Specifically: `astra validate <file>`, `astra validate --verify-evidence`, `astra paper add`. Use whatever the current `astra --help` surfaces — don't write skill-specific wrappers.
+
+**No synthetic data.** Unless the paper itself uses synthetic data as input, every input dataset must be real (downloaded, queried, or fetched from a real archive). The implement reference repeats this; treat it as load-bearing.
+
+**Open-questions for autonomous mode only.** When the user is reachable (in the sub-agent's chat or in your orchestrator session), questions are asked directly in prose. The `<paper-slug>/open-questions.md` accumulator is for autonomous mode — when the user has explicitly stepped away. The user resolves accumulated questions in REVIEW before the reproduction closes.
+
+## Resuming an in-flight reproduction
+
+When you walk into a workdir that already has artifacts:
+
+1. **Skip INTERVIEW** unless the user explicitly wants to revise scope.
+2. CLAUDE.md auto-loads from the workdir — that's the spec.
+3. Survey the workdir to determine the current phase (table below).
+4. Spawn the appropriate next sub-agent.
+
+Workdir signals — file existence implies the phase has been done:
 
 | Signal | Phase done |
 |---|---|
-| `work/reference/source/` or `work/reference/document.md` | ACQUIRE |
-| `work/reference/code/` | ACQUIRE code clone |
-| `work/notes/architect/{paper-index.md,code-index.md}` | ARCHITECT indexing |
-| `astra.yaml` validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT stub |
-| `work/notes/cited_papers.yaml` | ARCHITECT citation extraction |
-| `astra.yaml` has non-empty `decisions:` and `findings:` per sub-analysis, citation-placeholder `prior_insights:`, `targets/targets.md`, and `implementation-notes.md` | SPECIFY |
-| `prior_insights:` entries have resolved `evidence:` selectors verified by `astra validate --verify-evidence`; `work/notes/literature/<doi-slug>.yaml` files exist | LITERATURE |
-| recipes exist in `astra.yaml` | IMPLEMENT |
-| `results/baseline/<output>/` | RUN |
+| `work/reference/source/` (arXiv tarball) **or** `work/reference/document.md` (Docling fallback) | ACQUIRE |
+| `work/reference/code/` | ACQUIRE (code clone) |
+| `work/notes/architect/{paper-index.md,code-index.md}` | ARCHITECT (Explore pass) |
+| `astra.yaml` validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
+| `work/notes/cited_papers.yaml` | ACQUIRE (citation extraction) |
+| `astra.yaml` has non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` placeholders + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
+| `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; `work/notes/literature/<doi-slug>.yaml` files present | LITERATURE |
+| recipes present in `astra.yaml` | IMPLEMENT |
+| `results/<universe>/<output>/` | RUN |
 | `comparison-report.yaml` | COMPARE |
-| `REPRODUCTION-SUMMARY.md` and `.lightcone/comparison.html` | REVIEW |
+| `REPRODUCTION-SUMMARY.md` + `.lightcone/comparison.html` + resolved `open-questions.md` | REVIEW |
+
+`git log --oneline` complements this — phase commits are the chronological view.
+
+## Anti-patterns
+
+- **Reading content the orchestrator doesn't need.** If the answer fits in a sub-agent's return, don't re-read the source yourself. Dispatch Explore for open-ended search.
+- **Doing phase work in the orchestrator session.** The orchestrator spawns and routes; phase work happens in sub-agents. Exception: INTERVIEW and REVIEW (the bookends).
+- **Asking a sub-agent to use `AskUserQuestion`.** Sub-agents don't have it. They ask in prose, or surface the question to you so you call `AskUserQuestion` from the orchestrator session.
+- **Re-implementing what `astra` already does.** If `astra validate` returns clean, don't write a separate validator. If `astra paper add` caches the PDF, don't write a separate cache.
+- **Bundling phases into one sub-agent.** Each sub-agent runs one phase. The granularity is what keeps each context window manageable; conflating phases re-creates the failure mode this architecture exists to avoid.
+- **Forgetting to announce the spawn to the user.** They need to know a sub-agent has launched and that they can switch into its chat before it finishes its first turn. Without the announcement, the surface comes and goes invisibly.
diff --git a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
new file mode 100644
index 00000000..0117c21c
--- /dev/null
+++ b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
@@ -0,0 +1,46 @@
+# <paper-slug>
+
+Reproduction of <paper title> (<arXiv ID>). DOI: <doi>.
+
+## Paper
+
+- Authors: <list>
+- One-line subject: <e.g. "BAO scale measurement from DESI DR1">
+- Code repo: <url> (cloned to `work/reference/code/` during ACQUIRE)
+
+## Goal
+
+<What "done" looks like for this reproduction. Concrete: which targets, what verdict against them, what validation passes. E.g.: "A complete `astra.yaml` with recipes that produce reproduced versions of <list of targets>, validated by `astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml` verdict `pass` against the targets in `targets/targets.md`.">
+
+**In scope:** <targeted figures / tables / numbers, methodological span being reproduced.>
+
+**Out of scope:** <explicit exclusions, fenced from drift.>
+
+## Rigor
+
+*Current state* — populated by sub-agents as they produce artifacts. Coarse adjectives per output or per phase: *sketch / baseline / tightened / canonical*. Empty until the first phase produces something:
+
+- (none yet)
+
+*Open opportunities* — what could benefit from more attention if the user comes back, with a sense of leverage. Format: `<area> — <what could be tightened> — <leverage>`. Empty until a sub-agent surfaces a gap:
+
+- (none yet)
+
+## Paper-vs-code disagreements
+
+Material disagreements between paper and code, logged here as sub-agents find them. Code is canonical for numerics, plotting, and method (per the discipline below); both options are preserved in `astra.yaml` as decision alternatives. Each entry summarizes the disagreement and points to the corresponding decision so any sub-agent or future orchestrator session can see them at a glance. Surfaced to the user the next time they're around.
+
+- (none yet)
+
+## Rules
+
+- **Code-as-canonical when `work/reference/code/` exists.** Every implementing sub-agent reads relevant code on entry. Where paper and code disagree, code is canonical for numerics, plotting, and method.
+- **Never block on `AskUserQuestion` mid-sub-agent.** Sub-agents don't have `AskUserQuestion`. Ask in prose if the user is reachable; otherwise append the question to `open-questions.md` and continue with the best-judgment default. The user resolves accumulated questions in REVIEW.
+- **arXiv-LaTeX-first acquisition.** PDF + Docling is a fallback for non-arXiv papers only.
+- **`astra validate --verify-evidence`** is the fidelity gate; evidence quotes must match source PDFs.
+- **Commit as you go.** Small, descriptive commits per significant change. The git log is the chronological trail of the reproduction.
+
+## Pointers
+
+- `open-questions.md` — accumulated questions from autonomous-mode runs, resolved in REVIEW.
+- <any paper-specific conventions or warnings the user surfaced during the interview>

From 333fde18f27022d941d2ca5338929988c97ecb85 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sun, 10 May 2026 18:12:14 +0200
Subject: [PATCH 031/124] Rewrite lc-from-paper interview reference

Aligns interview.md with the orchestrator + named sub-agents architecture
landed in the SKILL.md rewrite. Drops the constitution/CLAUDE.md duality
(only CLAUDE.md now), the runtime-mode trichotomy, the global
frugal/rigorous termination criterion, and the per-phase mode matrix.
Collapses the six interview jobs to three (identify, scope, conventions)
and points at templates/CLAUDE.md instead of the /constitution skill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../lc-from-paper/references/interview.md     | 202 +++---------------
 .../narrative/references/co-drafting.md       |  79 +++++++
 .../narrative/references/interactive.md       | 184 ----------------
 3 files changed, 114 insertions(+), 351 deletions(-)
 create mode 100644 claude/lightcone/skills/narrative/references/co-drafting.md
 delete mode 100644 claude/lightcone/skills/narrative/references/interactive.md

diff --git a/claude/lightcone/skills/lc-from-paper/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/interview.md
index d29f839b..768701d5 100644
--- a/claude/lightcone/skills/lc-from-paper/references/interview.md
+++ b/claude/lightcone/skills/lc-from-paper/references/interview.md
@@ -1,209 +1,79 @@
-# Interview — drafting the per-paper reproduction constitution and CLAUDE.md
+# Interview — Phase 0
 
-The interview is the only phase lc-from-paper runs interactively. It happens once per project, up front, before any loop is launched. Its job is to crystallize what the user actually wants — which paper, what scope, which runtime, which seams want their attention, which they want delegated — and bake that into the artifacts every iteration walks up to.
+The opening interactive phase. Run from the orchestrator session, before any sub-agent is spawned. Its job is to crystallize what the user actually wants — which paper, what scope, any paper-specific gotchas — and bake that into the per-paper `CLAUDE.md` every sub-agent walks up to.
 
-Use the [`/constitution`](../../constitution/SKILL.md) skill to draft the constitution. The interview's job is to *gather* the inputs both the constitution and the per-paper `CLAUDE.md` need; the constitution skill carries the discipline of writing the constitution.
+The interview is short. Three to six `AskUserQuestion` rounds, total. The user does not need to teach you the paper; they need to tell you what they want reproduced.
 
 ---
 
 ## What the interview produces
 
-The interview produces a **directory for the reproduction** containing two markdown files. They have separate jobs and don't overlap:
+A single `<paper-slug>/CLAUDE.md`, drafted from the template at [`../templates/CLAUDE.md`](../templates/CLAUDE.md). It carries:
 
-- **`<paper-slug>/CLAUDE.md`** — *info and rules.* Paper identity (DOI / arxiv id / authors / one-line subject), where the original code lives (`work/reference/code/`), the canonical-resolution rule (code-as-canonical when `work/reference/code/` exists), the never-block-on-`AskUserQuestion`-mid-sub-agent rule, any paper-specific conventions or warnings, pointers to the constitution and `open-questions.md`. Auto-loaded by Claude Code on every walk-up to this directory. **Evolves over time** — iterations that learn new conventions or surface paper-specific gotchas can add lines so future sessions don't re-derive the same context.
-- **`<paper-slug>/<constitution>.md`** — *desired state.* Pointers (not snapshots) for the runner: what "done" looks like, evidence checks, scope fence, the runtime mode the user chose, the termination criterion (weak/strong), the per-phase mode table, and the open-questions section iterations resolve. Read by the runner each iteration as the explicit task.
+- **Paper identity** — DOI, arXiv ID, title, authors, one-line subject; where the original code lives.
+- **Goal** — what "done" looks like for this reproduction; in-scope and out-of-scope targets.
+- **Pointers** — any paper-specific conventions or warnings the user surfaced.
 
-Both are written at the end of the interview from the same conversation. CLAUDE.md tells you *what kind of place this is*; the constitution tells you *what we're doing here and when we're done*. After they are approved, lc-from-paper launches whichever runtime the user chose:
+The Rigor and Disagreements sections start empty — sub-agents fill them in as they work. The Rules section is standing discipline (universal across reproductions); leave it as the template provides.
 
-| Runtime | Launch |
-|---|---|
-| **(1) Interactive** | No launch. The user prompts through phases by hand from this Claude session. |
-| **(2) Bash-loop** | Show the user the loop snippet to paste into a terminal — `while …; do claude --dangerously-skip-permissions … ; done`-shaped. |
-| **(3) Tmux-orchestrated** | `../ralph-loops/scripts/ralph <constitution>.md` — lc-from-paper drives the tmux session directly. |
+There is no separate constitution, no runtime-mode choice, no global termination criterion. The architecture is fixed (orchestrator + named per-phase sub-agents) and rigor is chosen per spawn — see SKILL.md's *Rigor is continuous, chosen per spawn* discipline.
 
-There is no separate "interview state" file. Everything lives in the two artifacts and the workdir.
+After the user approves the draft, save it, ensure the workdir is a git repo (`git init` if needed) and commit `CLAUDE.md` as the first commit, then launch the ACQUIRE sub-agent.
 
 ---
 
-## The six jobs
+## The three jobs
 
 ### 1. Identify the paper
 
-Use `AskUserQuestion` if the user did not supply enough on `/lc-from-paper` invocation:
+Use `AskUserQuestion` for whatever the user did not supply on `/lc-from-paper` invocation:
 
 - **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).
-- **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) **If code is available, every implementing iteration will read from `work/reference/code/`** and treat code as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md).
-- **User's prior familiarity.** Has the user reproduced this paper before? Read the paper recently? Worked with the original authors? This affects how much of the ARCHITECT / SPECIFY work needs human ratification.
+- **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) When code is available, every implementing sub-agent reads from `work/reference/code/` and treats code as canonical for numerics + method. This is recorded in CLAUDE.md's Rules.
+- **User's prior familiarity.** Has the user reproduced this paper before? Read it recently? Worked with the original authors? This affects how much of ARCHITECT / SPECIFY benefits from heavier rigor settings on first spawn.
 - **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; ARCHITECT will read it.
 
 ### 2. Scope the reproduction
 
-A paper has many figures, tables, and numbers. The user usually does not want all of them.
+A paper has many figures, tables, numbers. The user usually does not want all of them.
 
 Ask:
 
-- **Full reproduction or targeted?** Full = every primary result the paper reports. Targeted = "I only care about figures 3, 4, 7 and the headline number in Table 2." Targeted is cheaper and produces a tighter astra.yaml.
+- **Full reproduction or targeted?** Full = every primary result the paper reports. Targeted = "I only care about figures 3, 4, 7 and the headline number in Table 2." Targeted is cheaper and produces a tighter `astra.yaml`.
 - **Specific decisions of interest.** A paper makes many choices. The user may care most about a few — e.g. "I want the BAO fit to use a different damping prior than the paper." These become first-class decisions in the spec, with the alternative preserved as a sibling option.
-- **Sub-analysis structure.** Does the paper have genuinely independent stages (e.g. reconstruction → clustering → BAO fit)? If so, the spec wants sub-analyses; ARCHITECT will mirror the structure as the stub's decomposition. If the paper is monolithic, one analysis suffices.
+- **Sub-analysis structure.** Does the paper have genuinely independent stages (e.g. reconstruction → clustering → BAO fit)? If so, the spec wants sub-analyses; ARCHITECT will mirror that structure as the stub's decomposition. If the paper is monolithic, one analysis suffices.
 
-These answers live in the constitution's **Desired State** section. There is no separate target-extraction phase — the targets the user names here become explicit `outputs:` declared in the stub `astra.yaml` during ARCHITECT, then filled with paper-anchored `findings:` / `decisions:` during SPECIFY.
+These answers go into CLAUDE.md's **Goal** section as "in scope" / "out of scope". There is no separate target-extraction phase — what the user names here becomes explicit `outputs:` declared in the stub `astra.yaml` during ARCHITECT, then filled with paper-anchored `findings:` / `decisions:` during SPECIFY.
 
-### 3. Pick a runtime mode
+### 3. Paper-specific conventions or warnings
 
-Probe for tmux first:
+Light touch. Ask the user if there's anything they want every sub-agent to know about this paper up front: a known pitfall, a non-obvious convention, something the authors did in an unusual way. These go into CLAUDE.md's **Pointers** section as one-line notes. Skip cleanly if nothing comes to mind; sub-agents surface their own as they work.
 
-```bash
-command -v tmux
-```
-
-Offer the modes the environment supports:
-
-- **(1) Interactive** — no autonomous loop; the user prompts through phases by hand from this Claude session. Right when control is tight, the paper is small, or the token budget is constrained.
-- **(2) Bash-loop** — a plain shell loop the user pastes into a terminal. No tmux dependency. Right when tmux isn't available *and* the connection is stable. Fragile across SSH disconnects unless wrapped in `nohup`, and `nohup` blocks interaction — so for unstable connections, mode (3) is the answer, not this.
-- **(3) Tmux-orchestrated** — lc-from-paper drives a tmux session directly via `../ralph-loops/scripts/ralph`. Survives SSH disconnects; the skill sends keystrokes to the pane, monitors, intervenes. Preferred when tmux is available.
-
-If tmux isn't installed, only (1) and (2) appear in the question. The chosen mode goes into the per-paper constitution.
-
-### 4. Pick a termination criterion (frugality vs rigor)
-
-Ask:
-
-- **Weak (frugal):** "run until the checklist of tasks has been completed." Cheaper. Susceptible to one-shot oversights. ARCHITECT, SPECIFY, and IMPLEMENT each skip or run their internal self-review pass once.
-- **Strong (rigorous):** "run until you can't find any further contributions, fixes, or improvements that align with the goal." Almost always catches mistakes the one-shot left behind, but burns more tokens. ARCHITECT, SPECIFY, and IMPLEMENT each iterate their internal self-review — fresh-context sub-agent per round; fixes incorporated; a *fresh* sub-agent re-reviews; iterate until two consecutive rounds find no fixes (or a 5-round system cap).
-
-Default to strong for fidelity-critical reproductions; weak when the user wants to cap token spend. The choice goes into the per-paper constitution and is read by ARCHITECT, SPECIFY, and IMPLEMENT.
-
-### 5. Choose interactive vs sub-agent per phase
-
-Read the "Per-phase mode" table in `../SKILL.md`. The defaults are reasonable. Walk the user through it briefly:
-
-- **The two bookends are always interactive:** INTERVIEW (now) and REVIEW (close-out). These are the only mandatory user-reach phases — every other phase is the user's call.
-- **Phases whose defaults are sub-agent (parallel fresh context fits the work):** ARCHITECT (two parallel Explore sub-agents — paper-side + code-side — feed a synthesis sub-agent that writes the stub `astra.yaml`; rigor-dialed self-review pass after), LITERATURE (one sub-agent per cited paper), IMPLEMENT (recipe-writing parallelized by output where feasible, with rigor-dialed self-review iterations after).
-- **Phases whose default is interactive:** SPECIFY (material paper-vs-code conflicts in the code pass want ratification; per-sub-analysis self-review pass is rigor-dialed regardless of mode).
-- **Phases the user genuinely chooses:** ACQUIRE, RUN, COMPARE. These can run either way without losing the surface that matters most.
-
-If the user has no opinion, take the defaults. The choice goes into the constitution's **Context** section as a per-phase mode table. Phases marked sub-agent that hit a question they'd normally surface to the user **append the question to `<paper-slug>/open-questions.md`** rather than blocking; the user resolves them in REVIEW (close-out).
-
-### 6. Draft the constitution and CLAUDE.md
-
-Invoke `/constitution`. Pass in:
-
-- The paper identity (DOI, arXiv ID, code URL)
-- The scope (full vs targeted, sub-analysis structure if known)
-- The per-phase mode table
-- Any prior context the user has shared
-
-The constitution skill carries the discipline of section voice (pointers, not snapshots; constitution, not plan; constraints with reasons). The constitution it produces will look approximately like:
-
-```markdown
 ---
-status: open
----
-
-# Reproduce <paper title> (<arXiv ID>)
-
-## Desired State
-
-A complete `astra.yaml` for <paper> at this workdir, with recipes that produce reproduced versions of <list of targets>, validated by `astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml` verdict `pass` against the targets in `targets/targets.md`.
-
-Non-goals: <e.g., reproducing Figure 12's MCMC stack — out of scope because compute too large for available targets>.
-
-## Scope
-
-In: <list — the targeted figures / tables / numbers, the methodological span being reproduced>.
-Out: <list — explicit exclusions, fenced from drift>.
-
-## Runtime mode
-
-<(1) interactive | (2) bash-loop | (3) tmux-orchestrated>
-
-## Termination criterion
-
-<weak | strong>
-
-The COMPARE → IMPLEMENT loop iterates until verdict is `pass` or the attempt budget (default 5) is exhausted, with the chosen termination shaping how aggressively iterations self-check.
-
-## Per-phase mode
-
-| # | Phase | Mode |
-|---|---|---|
-| 0 | INTERVIEW | interactive (always) |
-| 1 | ACQUIRE | <per user> |
-| 2 | ARCHITECT | sub-agent (two parallel Explore + synthesis; rigor-dialed self-review) |
-| 3 | SPECIFY | interactive (two-pass per sub-analysis: paper, code, rigor-dialed self-review) |
-| 4 | LITERATURE | sub-agent (rigor-dialed self-review) |
-| 5 | IMPLEMENT | sub-agent (rigor-dialed review iterations) |
-| 6 | RUN | <per user> |
-| 7 | COMPARE | <per user> |
-| 8 | REVIEW (close-out) | interactive (always) |
-
-## Evidence
-
-- `ls work/reference/source/ || ls work/reference/document.md` — ACQUIRE done (arxiv-LaTeX tarball or Docling fallback)
-- `ls work/reference/code/` — original code present (canonical reference)
-- `ls work/notes/architect/paper-index.md && ls work/notes/architect/code-index.md` — ARCHITECT Explore pass done
-- `ls astra.yaml && astra validate astra.yaml` (with empty `decisions:`/`prior_insights:`/`findings:` blocks) — ARCHITECT stub written
-- `ls work/notes/cited_papers.yaml` — ARCHITECT citation list (used by SPECIFY for marker→DOI mapping; consumed by LITERATURE for placeholder resolution)
-- `astra validate astra.yaml` (with non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` populated as citation-only placeholders) `&& ls targets/targets.md && ls implementation-notes.md` — SPECIFY done
-- `ls work/notes/literature/` (one `<doi-slug>.yaml` per cited DOI) and `astra.yaml`'s `prior_insights:` entries each have a resolved `evidence:` selector — LITERATURE done
-- `astra validate astra.yaml --verify-evidence` — evidence quotes match source PDFs (runs after LITERATURE)
-- `ls comparison-report.yaml && yq '.verdict' comparison-report.yaml` — most-recent COMPARE verdict
-- `ls REPRODUCTION-SUMMARY.md && ls .lightcone/comparison.html` — REVIEW (close-out) done
-- `git log --oneline` — chronological view of phase commits
-
-## Open Questions
-
-(empty — populated as the loop runs; questions accrete in `<paper-slug>/open-questions.md`, the running report the user resolves in REVIEW (close-out) before the constitution closes.)
-```
-
-Then author the per-paper `<paper-slug>/CLAUDE.md` from the same conversation. The CLAUDE.md is *info and rules*, not desired state — paper identity, where things live, disciplines that always apply. Approximate shape:
-
-```markdown
-# <paper-slug>
-
-Reproduction of <paper title> (<arXiv ID>). DOI: <doi>.
-
-## Paper
-
-- Authors: <list>
-- One-line subject: <e.g. "BAO scale measurement from DESI DR1">
-- Code repo: <url> (cloned to `work/reference/code/` during ACQUIRE)
-
-## Where things live
-
-- Workdir layout follows Paper2ASTRA conventions: `work/reference/`, `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`.
-- The constitution (desired state, runtime mode, scope, evidence, per-phase mode) lives at `<constitution>.md` in this directory.
-- The during-loop questions log lives at `open-questions.md`. The user reviews it in REVIEW (close-out).
-
-## Rules
 
-- **Code-as-canonical when `work/reference/code/` exists.** Every implementing iteration reads relevant code. Where paper and code disagree, code is canonical for numerics, plotting, and method.
-- **Never block on `AskUserQuestion` mid-sub-agent.** When a sub-agent or loop phase would surface a question to the user, append it to `open-questions.md` and continue with the best-judgment default. The user resolves in REVIEW (close-out).
-- **arxiv-LaTeX-first acquisition.** PDF + Docling is a fallback for non-arxiv only.
-- **`astra validate --verify-evidence`** is the fidelity gate; evidence quotes must match source PDFs.
+## Drafting CLAUDE.md
 
-## Conventions and warnings
+Open the template at [`../templates/CLAUDE.md`](../templates/CLAUDE.md) and fill in:
 
-- <any paper-specific notes the user surfaced during the interview>
-```
+- The header (`<paper-slug>`, paper title, arXiv ID, DOI).
+- **Paper** — authors, one-line subject, code repo URL.
+- **Goal** — what "done" looks like; in-scope and out-of-scope.
+- **Pointers** — any paper-specific conventions the user surfaced.
 
-Show both drafts, take corrections, refine. When the user is happy:
+Leave the **Rigor**, **Paper-vs-code disagreements**, and **Rules** sections in their template state. Rigor and Disagreements grow as sub-agents work; Rules are universal.
 
-- Save both files inside the reproduction's directory.
-- For mode (3), optionally launch ralph: `../ralph-loops/scripts/ralph <constitution>.md`.
-- For mode (2), show the user the bash-loop snippet to paste.
-- For mode (1), tell the user the interview is done and they can prompt through phases from this session.
+Show the draft to the user, take corrections, refine, save to `<paper-slug>/CLAUDE.md`. Then `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline) and commit `CLAUDE.md` as the first commit.
 
-The interview ends here. Subsequent work happens inside iterations (modes 2 and 3) or in the same session (mode 1).
+After the user approves and the workdir is initialized, launch the ACQUIRE sub-agent. Follow SKILL.md's *Spawning a phase sub-agent* for the announcement pattern — the user needs to know the sub-agent has launched and that they can switch into its chat before its first turn finishes.
 
 ---
 
 ## Discipline
 
-- **The interview is short.** Do not turn it into a full paper-summarization session. The user does not need to teach you the paper — they need to tell you what they want reproduced. Three to six `AskUserQuestion` rounds, total. If the user is grinding through detail, gently steer back to scope.
-- **The constitution and CLAUDE.md are the work products.** Do not file separate "interview notes" or "scope document" files. Everything goes into one of those two artifacts. CLAUDE.md is durable project memory; constitution is the runner's spec.
-- **The defaults are the path.** When the user says "I don't know, you choose," take the defaults — runtime (3) when tmux is available else (2) for stable / (1) for unstable connections; rigor (strong) for fidelity-critical work; the per-phase mode table from `../SKILL.md`. The defaults reflect what the loops have learned about which seams matter.
-- **One paper at a time.** A single constitution covers one paper. If the user wants two, run the interview twice — two reproduction directories, two CLAUDE.mds, two constitutions.
+- **The interview is short.** Three to six `AskUserQuestion` rounds, total. If the user is grinding through detail, gently steer back to scope.
+- **CLAUDE.md is the only artifact.** No separate scope document, no interview notes, no constitution. Everything goes in CLAUDE.md.
+- **Defaults are the path.** When the user says "you choose," take the defaults — full reproduction, the paper's natural sub-analysis structure if any. The defaults reflect what the architecture has learned about which seams matter.
+- **One paper at a time.** A single CLAUDE.md covers one paper. If the user wants two, run the interview twice — two reproduction directories, two CLAUDE.mds.
 
 ---
 
@@ -213,8 +83,6 @@ Most failure modes resolve into "the user has not yet decided what 'reproduce' m
 
 - *"If we ran this and it produced figure 3 plus the headline number in Table 2, would you be done?"* — pins targeted vs full.
 - *"Is there a specific decision in the paper you want to vary, or are we trying to match the paper exactly?"* — pins whether universes need to span alternatives.
-- *"Do you want to look at every paper-vs-code conflict, or just the ones I think are material?"* — pins SPECIFY mode.
-- *"Do you want a quick run that stops at the checklist, or a thorough one that keeps looking for fixes?"* — pins frugality vs rigor.
-- *"Are you running this somewhere with a stable connection, or do you want it to survive disconnects?"* — pins runtime mode (when tmux is available).
+- *"Is there anything weird about this paper you want every sub-agent to know up front?"* — pins paper-specific conventions.
 
-When these answer cleanly, the constitution and CLAUDE.md write themselves.
+When these answer cleanly, CLAUDE.md writes itself.
diff --git a/claude/lightcone/skills/narrative/references/co-drafting.md b/claude/lightcone/skills/narrative/references/co-drafting.md
new file mode 100644
index 00000000..007353fb
--- /dev/null
+++ b/claude/lightcone/skills/narrative/references/co-drafting.md
@@ -0,0 +1,79 @@
+# Co-drafting mode (stub)
+
+> **Status: under development.** Use paper reproduction (the default flow when a paper exists) when applicable. This file names what's distinct about co-drafting and the open questions; it isn't yet production guidance.
+
+The narrative is being drafted in dialogue with the user, against an `astra.yaml` whose structure already exists. There's no paper to harvest from and no body of code or fibers to mine; the spec's structure is the only artifact, and the user is the source for everything the structure doesn't already carry.
+
+This mode covers a spectrum:
+
+- **Fresh scoping.** `astra.yaml` was scaffolded by `/lc-new` (or by hand); decisions and outputs are sketched but the analysis hasn't run. Narrative drafted against intent, not results.
+- **Live in-flight research.** Work is happening; data is coming in, decisions are settling, results are landing. Spec moves between conversations, narrative moves with it.
+- **Newly-stable analysis.** Work has finished or paused; the user wants to write a narrative for what they did. No paper, no fibers — they remember it, and that memory is the source.
+
+Pure greenfield (no `astra.yaml` at all) isn't a coherent narrative-skill task — there's nothing to cite into. If a user is at that stage, route them to `/lc-new` to scaffold structure first.
+
+## What's distinct from paper reproduction
+
+- **Source is conversation, not prose.** The paper-reproduction harvest move (paraphrase from a written source) doesn't apply. Draft moves come from `AskUserQuestion`-batched dialogue, not from extracting prose.
+- **Voice depends on stage.** Reproduction is always declarative ("The pipeline runs in…"). Co-drafting voice tracks where the work is: present tense for live work, past tense for completed steps, provisional markers when content is volatile.
+- **Spec and narrative move together.** In reproduction the spec is fixed (or close to it) and the narrative reconstructs the paper. In co-drafting the spec may shift between drafts; expect to revisit narrative when a decision lands or a sub-analysis splits.
+
+## The ask-first discipline
+
+Co-drafting is the one mode where authoring without asking produces fiction. The user is available; ask. Use `AskUserQuestion` to batch up the load-bearing reads before drafting:
+
+- **Research question.** What are you trying to learn? One sentence.
+- **Current headline finding** (if any). What's been established so far? One sentence; a gesture is fine.
+- **Movement so far.** What pivots, abandoned options, surprises belong in the record?
+- **Implications.** What would you claim today about what this means? Premature strong claims aren't required; honest gestures are.
+
+The user's framing is the substrate. Don't draft around a guess at it.
+
+## Provisional voice
+
+When content is moving, make incompleteness visible. Three moves:
+
+**Phrasing carries confidence.** Not "we constrain X to 3%"; rather "our current best constraint on X is 3%, pending validation of the covariance in [reconstruction](#analyses.reconstruction)." Hedge what's uncertain; claim what's settled.
+
+**Explicit markers.** At the top of `summary` (or any volatile key), an italic note:
+
+```yaml
+summary: |
+  _(Provisional — revisit after bao_fitting. Last updated 2026-04-23.)_
+  We are measuring the BAO scale...
+```
+
+The `_(Provisional)_` prefix is a convention, not a spec field. It reads as expected-to-change without breaking the narrative shape.
+
+**Decision rationales can be open.** "We are currently running with option X, pending validation of Y. See [[fiber-or-sub-analysis]]." A `rationale:` doesn't have to be retrospective.
+
+When work stabilizes (a paper draft lands, results publish), revise into reproduction voice — past tense, declarative, scope clear. Co-drafting was scaffolding; the final narrative reads as a stable artifact.
+
+## Open questions before this is production-ready
+
+- **Provisional markers — convention or schema?** Today they're prose conventions (`_(Provisional)_`); whether they belong as structured metadata is open.
+- **What's a `tempered`-style flag for narrative?** `tempered: true` on fibers signals "solid enough to build on." A narrative-level analog could let renderers display freshness state.
+- **Anchor coverage for elements that don't exist yet.** "Once [reconstruction](#analyses.reconstruction) is run, we expect X." The validator currently requires anchors to resolve — co-drafting may need a "planned" sub-analysis form, or the prose may need to avoid forward-anchoring entirely.
+- **Boundary with `/lc-new`.** `/lc-new` does conversational scoping but defers narrative ("filled in later, once structural pieces have settled"). When does the user finish `/lc-new` and switch to `/narrative` for the prose pass? Unclear today.
+- **Boundary with retrofit.** A user co-drafting a narrative for completed work is reaching for the same artifacts retrofit mines. The line between "harvest from your own memory" (co-drafting) and "harvest from artifacts you produced" (retrofit) is fuzzy when the user is the artifacts' author.
+
+## Pointers when authoring today
+
+The substrate from SKILL.md applies in full: five keys, length cap, anchor grammar, reserved IDs, data flow, validation, craft. What changes is the *source* of content (dialogue) and the *voice* (provisional where moving).
+
+- Use first-person plural and present tense for live work; past tense for completed steps.
+- Hedge when uncertain; claim when confident. Over-hedging is its own failure mode.
+- Mark sub-analyses that don't exist yet with provisional language rather than fake anchors.
+- Inverted draft order can help: write `summary` first as a stub (to fix intent), then draft the rest, then return to `summary` last to revise. This is the opposite of reproduction's compress-last because the substrate is moving.
+
+## Anti-patterns (co-drafting-specific)
+
+- **Solo drafting.** The user is available; ask before guessing motivation, headline finding, or implications.
+- **False completeness.** Writing in reproduction voice ("we measure," "we constrain") when the measurement is in flight. Use "we are measuring" / "our current constraint is X, pending Y."
+- **Provisional everywhere.** If every sentence is hedged, the narrative reads as afraid of itself. Hedge the genuinely uncertain claims; state the settled ones plainly.
+- **Stale markers.** A "revisit after X" comment left in place after X has landed is worse than no marker at all. Revise on each touch.
+- **Over-committing to implications.** Promising what results will mean before they land. A gesture is honest; a claim before evidence is not.
+
+## Report friction
+
+If you hit co-drafting cases this stub doesn't cover, file a fiber or GitHub issue against `lightcone-cli` with `narrative` in the title so the next pass can firm this up.
diff --git a/claude/lightcone/skills/narrative/references/interactive.md b/claude/lightcone/skills/narrative/references/interactive.md
deleted file mode 100644
index e0861db2..00000000
--- a/claude/lightcone/skills/narrative/references/interactive.md
+++ /dev/null
@@ -1,184 +0,0 @@
-# Interactive mode — in-flight new research
-
-> **Status: under development.** This mode is scaffolded but not yet
-> production-ready. The workflow below is a working draft — treat it
-> as a starting point, not a locked spec. For the production-ready
-> path, use paper reproduction mode if applicable. Report friction
-> back so this reference can firm up.
-
-Research is being done now. A narrative is being drafted alongside the
-work, not reconstructed from a paper or archaeological sources. The
-narrative is expected to change as results land.
-
-Read the main SKILL.md first. This file adds what's specific to
-interactive.
-
-Interactive differs from reproduction (no source paper to reconstruct
-from — the narrative is the researcher's own) and from retrofit (the
-work is still happening, not finished — you are authoring live, with
-the researcher in the loop).
-
-The core discipline is **provisional voice**: the narrative makes its
-own incompleteness visible, so a reader can tell at a glance what's
-settled and what's pending.
-
-## Workflow
-
-### 1 · Orient
-
-1. `astra.yaml` and each sub-analysis — whole files. Note where
-   `findings` are stub-level, where decisions are unresolved, where
-   outputs don't exist yet.
-2. Any project `CLAUDE.md` / working notes.
-3. Active fibers at `.felt/` (if present). Fibers are the best
-   substrate in interactive mode — they carry the researcher's live
-   thinking, recent pivots, open questions. Read the relevant
-   top-level fiber and anything it wikilinks.
-4. Existing narrative, if any. Revision preserves what lands.
-
-### 2 · Ask first, draft second
-
-Interactive mode is not archaeology. The researcher is available.
-Don't guess at motivation or the headline finding — ask. Use
-`AskUserQuestion` to batch:
-
-- **Research question.** What are we trying to learn? One sentence.
-- **Current headline finding.** What, if anything, has been
-  established so far? One sentence.
-- **Movement so far.** What has already happened in the work that
-  belongs in movement-of-learning? (Pivots, abandoned options, things
-  that surprised the researcher.)
-- **Implications the researcher would claim today.** What does the
-  result — as far as it's gone — *mean*? A gesture is fine; a
-  premature strong claim is not.
-
-The researcher's framing is the substrate. Don't draft around a guess
-at it.
-
-### 3 · Draft order (inverted from reproduction)
-
-In interactive mode, the executive summary is drafted *first* (as a
-stub, to fix intent) and revised last. This is the opposite of
-reproduction.
-
-1. **`summary` — stub.** One paragraph, provisional. States
-   the question and the current best-guess outcome. Explicitly marked
-   provisional (see below). Useful because it forces a clear statement
-   of intent the rest of the narrative can align with.
-2. **`methods`** — the substance. The process is live; methods is
-   where the live thinking goes. Name decisions in flight. Name
-   pivots. Use first-person plural, with dates where iteration
-   matters. Use `[<date>: <what changed>]` inline if it's load-bearing.
-3. **`findings`** — what's been established so far, with anchors to
-   `findings.<id>` that actually exist. Phrase claims to make
-   dependency visible: "pending validation in
-   [reconstruction](#analyses.reconstruction)."
-4. **`inputs`** — what the work rests on.
-5. **`outputs`** — thin; what's been promoted to the top level, if
-   any.
-6. **Return to `summary`** and revise it against the rest of
-   the draft. Re-mark provisional.
-
-For a decision in flight, `rationale:` can explicitly call out
-open-ness: "We are currently running with option X, pending validation
-of Y. See [[fiber or sub-analysis]]."
-
-### 4 · Provisional voice
-
-Make incompleteness visible in three ways:
-
-**Phrasing.** Not "we constrain X to 3%"; rather "our current best
-constraint on X is 3%, pending validation of the covariance in
-[reconstruction](#analyses.reconstruction)." Not "we detect Y"; rather
-"we detect Y at the 4σ level in the current fit, with the fit being
-revisited after the prior rescope lands."
-
-**Explicit markers.** At the top of `summary` (and optionally
-on any key that's unusually volatile), an italic note:
-
-```yaml
-summary: >
-  _(Provisional — revisit after bao_fitting.  Last updated 2026-04-23.)_
-  We are measuring the BAO scale in the DESI DR1 LRG tracer as a
-  warm-up before folding in ELGs and QSOs.  Current best result is
-  [an 8σ detection of the acoustic peak at z = 0.7
-  ](#findings.lrg_bao_detection), with the aggregate precision
-  constraint pending completion of the covariance validation in
-  [reconstruction](#analyses.reconstruction).
-```
-
-The `_(Provisional ...)_` prefix is a convention, not a spec field. It
-reads as expected-to-change without breaking the narrative shape.
-
-### 5 · Revision cadence
-
-Interactive narratives accrete. File fibers for:
-
-- The ceiling date for next revision.
-- Open questions that will force rewrites when they close.
-- Decisions in flight and what a different resolution would change in
-  the narrative.
-
-When a major result lands (headline finding solidified, pivotal
-decision settled), a full revision pass — including re-drafting the
-executive summary in reproduction-style (past tense, declarative) for
-the now-settled content, while keeping provisional markers on what's
-still open.
-
-### 6 · Voice
-
-- **First person plural** ("we are measuring," "we found"), present
-  tense for live work, past tense for completed steps.
-- **Hedge when uncertain; claim when confident.** Interactive mode has
-  a sharper hedging signal than reproduction — the author's current
-  confidence *is* what the reader needs to know. Don't over-hedge
-  defensively and don't under-hedge performatively.
-- **Name sub-analyses that don't exist yet.** If the plan is to run
-  `reconstruction` next and the current narrative anticipates its
-  output, say so: "Once [reconstruction](#analyses.reconstruction) is
-  run, we expect X; if the expectation fails, Y follows." This is
-  legitimate movement-of-learning: it captures what a result is being
-  interpreted *against*.
-
-### 7 · Critique (adds to SKILL.md base)
-
-**Provisional audit.**
-
-- Is every claim phrased consistently with the actual confidence level?
-- Are provisional markers present where the content is volatile?
-- Will a reader one week from now know which pieces need revisiting
-  vs. which are settled?
-
-**Freshness audit.**
-
-- Any "last updated" or "revisit after" markers still current, or
-  stale?
-- Any referenced sub-analysis or finding that has since changed but
-  the narrative still reflects the old state?
-
-## Anti-patterns (interactive-specific)
-
-- **False completeness.** Writing in reproduction voice ("we measure,"
-  "we constrain") when the measurement is in flight. Use "we are
-  measuring" / "our current constraint is X, pending Y."
-- **Over-committing to implications.** Promising what results will
-  mean before they land. A gesture is honest; a claim before evidence
-  is not.
-- **Skipping movement-of-learning because "it's still moving."** The
-  live process *is* the movement. Capture it while it's cheap; it's
-  the hardest content to reconstruct later.
-- **Solo drafting.** Interactive is the one mode where authoring
-  without asking produces fiction. The researcher is available; ask.
-- **Provisional everywhere.** If every sentence is hedged, the
-  narrative reads as afraid of itself. Hedge the genuinely uncertain
-  claims; state the settled ones plainly.
-- **Stale markers.** A "revisit after X" comment left in place after
-  X has landed is worse than no marker at all. Revise on each touch.
-
-## When interactive stabilizes
-
-When the work is done (paper draft ready, results published, project
-wrapping up), the narrative should be rewritten in reproduction voice.
-Interactive was scaffolding; the final narrative reads as a stable
-artifact. That rewrite is its own pass — switch modes and treat the
-project's own prior drafts as a source, like a paper.

From bf4df04676439defc5f11731cc7d001e0f534fe4 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sun, 10 May 2026 18:17:25 +0200
Subject: [PATCH 032/124] Realign acquire/architect references; move
 cited_papers.yaml to ACQUIRE

Aligns ACQUIRE and ARCHITECT references with the orchestrator + named
sub-agents architecture. Drops the constitution / per-phase mode / global
rigor-dial framing throughout; re-points scope/identity references at
CLAUDE.md; renames the rigor levels to cheap/heavy and frames them as
chosen per spawn from CLAUDE.md's Rigor section.

ACQUIRE picks up cited_papers.yaml (mechanical: marker, citation, DOI
from the bibliography). ARCHITECT's paper-side Explore now augments the
existing file in place with relevance notes rather than building it from
scratch. The synthesis step no longer writes cited_papers.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../lc-from-paper/references/acquire.md       | 47 +++++++++--
 .../lc-from-paper/references/architect.md     | 82 +++++++++----------
 2 files changed, 75 insertions(+), 54 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/acquire.md b/claude/lightcone/skills/lc-from-paper/references/acquire.md
index 6828dc5d..adf0d0cb 100644
--- a/claude/lightcone/skills/lc-from-paper/references/acquire.md
+++ b/claude/lightcone/skills/lc-from-paper/references/acquire.md
@@ -1,13 +1,13 @@
 # ACQUIRE — fetch the paper, structure it, clone the code
 
-Acquire the paper's reading materials and (when available) clone the reference code repository. The substrate work — LaTeX-source download, Docling fallback, figures, tables, outline, citations, embedded bibliography, paper-as-ASTRA-artifact — is delegated to **`/paper-extraction`**, which lc-from-paper trusts blindly. ACQUIRE adds **Step 2: code-clone**, which is reproduction-specific and stays here.
+Acquire the paper's reading materials and (when available) clone the reference code repository. The substrate work — LaTeX-source download, Docling fallback, figures, tables, outline, citations, embedded bibliography, paper-as-ASTRA-artifact — is delegated to **`/paper-extraction`**, which lc-from-paper trusts blindly. ACQUIRE adds **Step 2: code-clone** and **Step 3: cited-papers index** on top.
 
-The constitution's per-phase mode controls whether this runs interactively or as a sub-agent. Default is sub-agent — surfacing happens only on download failures.
+This phase runs as the orchestrator-spawned `acquire` sub-agent. The orchestrator launches it, the user can drop into its chat for any failures (download issues, missing code repo), and it commits each artifact as it lands.
 
 ## Inputs
 
-- The paper's DOI or arXiv ID (from the constitution)
-- An optional code repo URL (from the interview, if the user knew it)
+- The paper's DOI or arXiv ID (from CLAUDE.md's Paper section)
+- An optional code repo URL (from the interview, if the user knew it; recorded in CLAUDE.md)
 
 ## Outputs
 
@@ -27,6 +27,10 @@ After Step 2 (this phase):
 - `work/reference/code/` — cloned reference repo (or absent if not found)
 - `work/reference/code-status.yaml` — record of where the code came from
 
+After Step 3 (this phase):
+
+- `work/notes/cited_papers.yaml` — citation marker → DOI mapping for every paper cited by the target paper. SPECIFY consumes this when authoring `prior_insights:` placeholders; LITERATURE consumes it when fetching cited papers to resolve those placeholders.
+
 ## Step 1 — Stand up the paper's reading materials
 
 Invoke `/paper-extraction <arxiv-id-or-doi>`. The skill is idempotent — it surveys `work/reference/` first and skips work that's already done.
@@ -41,7 +45,7 @@ Two starting surfaces: `work/reference/index.json` (structural — figures, tabl
 
 ## Step 2 — Clone the reference code repository
 
-This step matters more than its size suggests. When `work/reference/code/` exists, every implementing iteration treats it as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md). Without it, iterations have only the paper to anchor to and drift toward "looks right" rather than "matches."
+This step matters more than its size suggests. When `work/reference/code/` exists, every implementing sub-agent treats it as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md's Rules). Without it, sub-agents have only the paper to anchor to and drift toward "looks right" rather than "matches."
 
 1. Search the paper text for repository URLs — abstract, intro, conclusion, footnotes, "Code Availability" or "Data Availability" sections. (Path A: grep across `work/reference/source/*.tex`. Path B: grep `work/reference/document.md`.)
 2. If none found, web search: paper title + "github", Papers With Code, or the first author's GitHub profile.
@@ -61,13 +65,37 @@ Spend no more than a few searches before recording failure and moving on. **Do N
 
 Skip Step 2 if `work/reference/code/` already exists.
 
+## Step 3 — Build the cited-papers index
+
+Walk the paper's bibliography and produce `work/notes/cited_papers.yaml`: one entry per paper the target paper cites, keyed by citation marker form (the same form the paper invokes — `Smith+24`, `(Doe & Lee 2023)`, numeric `[12]`), each entry carrying the cited paper's DOI when resolvable.
+
+Sources for the bibliography:
+
+- **Path A (arXiv LaTeX):** `work/reference/bibliography-source.{bib,bbl}` carries the bibliography. Each `\bibitem{}` (or `.bib` entry) yields one record. DOIs come from the `doi` field where present; for entries without one, resolve the DOI by searching for the title via Crossref / arXiv / ADS.
+- **Path B (PDF + Docling):** the references section is at the end of `work/reference/document.md`. Each citation block yields one record; DOIs are resolved by title search.
+
+The `relevance:` field is **not authored here**; it is filled in downstream. ARCHITECT's paper-side Explore adds a one-line note to each citation that justifies a method, parameter, or value; SPECIFY adds per-citation relevance notes when it links a citation to a decision (the `prior_insights:` placeholder authoring step); and LITERATURE deepens those notes as it fetches each cited paper. ACQUIRE just lays the index down.
+
+Output shape:
+
+```yaml
+papers:
+  - marker: "Smith+24"          # or "[12]" or whatever form the target paper uses
+    citation: "Smith et al. (2024), J. Cosmology"
+    doi: "10.xxxx/yyyy"          # null if unresolvable
+    # relevance: filled in downstream by ARCHITECT / SPECIFY / LITERATURE
+```
+
+Skip Step 3 if `work/notes/cited_papers.yaml` already exists.
+
 ## Survey signals (entry into ACQUIRE)
 
-Run `ls work/reference/` first.
+Run `ls work/reference/ work/notes/` first.
 
 - If `paper.pdf` is present, **and** the path indicator (`source/` for Path A or `document.md` for Path B) is present, **and** `index.json` is present → Step 1 is done.
 - If `work/reference/code/` is present (or `code-status.yaml` records `found: false`) → Step 2 is done.
-- When both are done, ACQUIRE is complete; proceed to ARCHITECT.
+- If `work/notes/cited_papers.yaml` is present → Step 3 is done.
+- When all three are done, ACQUIRE is complete; the orchestrator proceeds to ARCHITECT.
 - Otherwise, run whichever step is missing. `/paper-extraction` handles its own idempotency for Step 1.
 
 ## Notes
@@ -75,5 +103,6 @@ Run `ls work/reference/` first.
 - **paper-extraction is the substrate authority.** Don't re-fetch the LaTeX source, don't re-run Docling, don't re-parse the paper from inside ACQUIRE. If a substrate need surfaces that paper-extraction doesn't cover, file it as paper-extraction work — not as ACQUIRE work.
 - **arXiv DOI form is `10.48550/arXiv.<id>`.** Useful when downstream tools want a DOI rather than an arXiv ID.
 - **Equation numbers and section numbers must match the rendered paper.** When citing "eq. N" or "§N" in any downstream phase, find the equation or heading by content, not by a naïve count of TeX blocks or markdown headings. Path A: source preserves printed numbers in `\label{}`s. Path B: Docling preserves printed numbers in its markdown.
-- **This phase is acquisition + code-clone, not understanding.** Do not start indexing or comparing the paper here — that's ARCHITECT.
-- **Code-as-canonical** is loaded by every subsequent phase. The per-paper `CLAUDE.md` restates the rule; ACQUIRE just makes sure `work/reference/code/` exists when possible.
+- **This phase is acquisition + code-clone + bibliography, not understanding.** Do not start indexing or comparing the paper here — that's ARCHITECT. The cited-papers index is mechanical: marker, citation, DOI. Relevance per citation lands later, where it's actually being used.
+- **Code-as-canonical** is loaded by every subsequent sub-agent. The per-paper `CLAUDE.md` restates the rule; ACQUIRE just makes sure `work/reference/code/` exists when possible.
+- **Commit each step as it lands.** ACQUIRE runs as a sub-agent; the orchestrator reads `git log` to see how far it got. One commit per artifact (paper materials, code clone, cited-papers index) keeps the trail readable.
diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 1110ed2c..d40f9f12 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -1,29 +1,27 @@
 # ARCHITECT — write the stub `astra.yaml`
 
-ARCHITECT is the structural seam: decide the sub-analysis decomposition, wire the inputs and outputs at the sub-analysis level, and author high-level narrative prose for each analysis — all in one stub `astra.yaml`. SPECIFY then fills the stub in with `decisions:`, `prior_insights:`, `findings:`, and `astra-anchor:` references. Splitting **structure** from **content** keeps the cognitive load on each phase manageable: ARCHITECT decides *what the analyses are*; SPECIFY decides *what's inside each one*.
+ARCHITECT is the structural seam: decide the sub-analysis decomposition, wire the inputs and outputs at the sub-analysis level, and author high-level narrative prose for each analysis — all in one stub `astra.yaml`. SPECIFY then fills the stub in with `decisions:`, `prior_insights:`, `findings:`, and `astra-anchor:` references. Splitting **structure** from **content** keeps the cognitive load on each sub-agent manageable: ARCHITECT decides *what the analyses are*; SPECIFY decides *what's inside each one*.
 
-This phase replaces the old STUDY. The old shape wrote per-section paper-vs-code agreement-check files in markdown — same content SPECIFY would re-author into `astra.yaml` next. The new shape skips the markdown intermediate: ARCHITECT writes the structural skeleton directly in YAML, and SPECIFY's per-sub-analysis paper-pass / code-pass authors the content. One translation layer fewer.
-
-The constitution's per-phase mode is **always sub-agent** for this phase. The work is two parallel Explore sub-agents (one paper-side, one code-side), then one synthesis sub-agent that produces the stub. After the stub lands, a rigor-dialed self-review pass cross-checks it against paper + code before SPECIFY runs.
+This phase runs as the orchestrator-spawned `architect` sub-agent. Internally it does its work in three steps: paper-side index (Explore), code-side index (Explore), synthesis (write the stub). The two Explore reads run in parallel; synthesis runs once both indexes exist. After the stub lands, a self-review pass cross-checks it against paper + code; the orchestrator picks how heavy that review is per spawn, from CLAUDE.md's Rigor section.
 
 ## Inputs
 
 - `work/reference/source/` (Path A — arXiv LaTeX) **or** `work/reference/document.md` + `work/reference/figures/` + `work/reference/tables/` + `work/reference/metadata.json` (Path B — Docling)
 - `work/reference/code/` — the reference code repo (when present)
-- The per-paper constitution — names the user's intended replication targets (figures, tables, numbers) in its **Desired State**
-- `work/notes/notes.md` — user-supplied prior notes, if any (read by every phase if present)
+- `work/notes/cited_papers.yaml` — citation marker → DOI mapping from ACQUIRE (consumed by the paper-side Explore for cross-referencing citations against decision clusters)
+- CLAUDE.md — the per-paper artifact at the workdir root; its **Goal** section names the user's intended replication targets
+- `work/notes/notes.md` — user-supplied prior notes, if any (read by every sub-agent if present)
 
 ## Outputs
 
 - `astra.yaml` — **stub form**: sub-analyses named, architecture wired (inputs / outputs declared at the sub-analysis level), high-level `narrative:` prose blocks per analysis. **No `decisions:`, `prior_insights:`, `findings:`, or `astra-anchor:` references yet** — those entries don't exist for the narrative to reference.
 - `work/notes/architect/paper-index.md` — paper-side Explore output: section list, sub-analysis boundary candidates, decision clusters, result loci (figures / tables / quoted numerics)
 - `work/notes/architect/code-index.md` — code-side Explore output: top-level module map, natural decomposition, entry-points, where the analysis stages live
-- `work/notes/cited_papers.yaml` — citations worth following up on for prior insights (what LITERATURE consumes); populated from the paper-side index
-- `work/notes/architect/review-round-<N>.md` — each rigor-dialed self-review round's findings (rigor only; one file per round)
+- `work/notes/architect/review-round-<N>.md` — each self-review round's findings (one file per round; how many rounds depends on the rigor setting the orchestrator chose for this spawn)
 
-## Step 1: Two parallel Explore sub-agents
+## Step 1: Two parallel Explore reads
 
-Spawn two Task-tool sub-agents in parallel. Each is bounded — neither tries to compare paper to code, and neither writes `astra.yaml`. Their job is to give the synthesis sub-agent enough indexed context to draft the stub.
+From inside the architect sub-agent's session, spawn two Task-tool Explore sub-agents in parallel. Each is bounded — neither tries to compare paper to code, and neither writes `astra.yaml`. Their job is to give the synthesis step (next) enough indexed context to draft the stub.
 
 ### Paper-side Explore — system prompt
 
@@ -40,7 +38,7 @@ Spawn two Task-tool sub-agents in parallel. Each is bounded — neither tries to
 > 2. **Sub-analysis boundary candidates.** Where does the paper's pipeline have natural seams — places one stage's output flows as the next stage's input? Look for: a reconstruction stage producing a catalog consumed by a clustering stage; an MCMC producing a chain consumed by a parameter-estimation stage; a fit producing posteriors consumed by a comparison stage. Name each candidate with a noun phrase (`reconstruction`, `clustering`, `bao_fit`) and one-line description.
 > 3. **Decision clusters per sub-analysis.** Group the paper's choices by where they sit in the pipeline. Don't enumerate every choice — name the *clusters* (e.g. "fitting prior choices", "selection criteria for the catalog"). SPECIFY drills back into the paper to author each `decisions:` entry; you're indicating where to look.
 > 4. **Result loci.** Which figures / tables / in-text metrics report the paper's primary and secondary results? Use `path:line` for the `\includegraphics{}` or table source (Path A); use `metadata.json` indexes for Path B. Tag each as primary / secondary based on the paper's own emphasis.
-> 5. **Citations worth following up.** Citations that justify a method, parameter, or value (not general background). DOI when resolvable + one-line on why the citation matters. The synthesis agent merges your list into `work/notes/cited_papers.yaml` for LITERATURE to mine.
+> 5. **Augment `work/notes/cited_papers.yaml` with relevance notes.** Read the file (already populated by ACQUIRE with marker → citation → DOI for every citation in the bibliography). For each citation that justifies a method, parameter, or value used by the analysis (not general background), add a `relevance:` field with a one-line note on why the citation matters for replication. Skip citations that are pure background. Edit the file in place and preserve every entry: delete nothing, including the entries you leave without a `relevance:` note.
 > 6. **Data-flow shape.** A short prose paragraph: "Inputs flow from <source datasets> through <stage 1> producing <intermediate>, into <stage 2> producing <intermediate>, into <stage 3> producing <primary result>." This becomes the seed for the root narrative's data-flow paragraph.
 >
 > ### Output format — `work/notes/architect/paper-index.md`
@@ -61,17 +59,16 @@ Spawn two Task-tool sub-agents in parallel. Each is bounded — neither tries to
 > ## Result loci (primary + secondary)
 > - **<figure / table / metric>** — `<source-path:line>` or `metadata.json#<id>`; reported in §<X>; primary | secondary.
 >
-> ## Citations worth following up
-> - **<citation>** — DOI: <doi> — <one-line on why this citation matters for replication>.
->
 > ## Data-flow shape
 > <one-paragraph prose: how inputs flow through the pipeline to the primary result>.
 > ```
 >
+> Augmented relevance notes go directly into `work/notes/cited_papers.yaml`, not the index file.
+>
 > ### Rules
 >
 > - **Bounded read.** Do not read the code repo. Your job is paper-side only.
-> - **Index, do not author.** No `decisions:`, no `prior_insights:`, no `findings:`. Those are SPECIFY's. Your output is markdown, not YAML.
+> - **Index, do not author.** No `decisions:`, no `prior_insights:`, no `findings:`. Those are SPECIFY's. Your primary output is markdown (the index); the only YAML you touch is `work/notes/cited_papers.yaml`, and there only the `relevance:` field per existing entry.
 > - **Quote sparingly.** Brief paper quotes are OK to disambiguate a result locus or a sub-analysis boundary; verbatim claim quotes are SPECIFY's substrate, not yours.
 
 ### Code-side Explore — system prompt
@@ -118,9 +115,9 @@ Spawn two Task-tool sub-agents in parallel. Each is bounded — neither tries to
 > - **Index, do not author.** No `decisions:`, no `prior_insights:`, no `findings:`, no recipes. Your output is markdown, not YAML.
 > - **Trust the imports.** Module dependencies tell the natural decomposition story more reliably than the README's prose summary.
 
-## Step 2: Synthesis sub-agent — write the stub `astra.yaml`
+## Step 2: Synthesis — write the stub `astra.yaml`
 
-Spawn one synthesis sub-agent that reads both index files and writes the stub. This is where the structural decisions actually get made: the synthesis agent reconciles paper-side vs code-side sub-analysis decompositions, picks the unified set of sub-analysis IDs, wires inputs and outputs at the sub-analysis level, and authors the high-level `narrative:` prose blocks.
+Once both index files exist, the architect sub-agent does the synthesis directly (no further fan-out). This is where the structural decisions actually get made: reconcile paper-side vs code-side sub-analysis decompositions, pick the unified set of sub-analysis IDs, wire inputs and outputs at the sub-analysis level, and author the high-level `narrative:` prose blocks. If the architect sub-agent prefers to delegate the synthesis to a fresh Task-tool sub-agent for a clean context window, that is fine — the prompt below covers either case.
 
 > You are an ASTRA architecture-synthesis agent. You read paper-side and code-side indexes and produce the stub `astra.yaml` that SPECIFY will fill in.
 >
@@ -129,7 +126,7 @@ Spawn one synthesis sub-agent that reads both index files and writes the stub. T
 > - `work/notes/architect/paper-index.md` — paper-side Explore output
 > - `work/notes/architect/code-index.md` — code-side Explore output (when present)
 > - `work/notes/notes.md` — user-supplied notes (if present)
-> - The per-paper constitution at the project root — its **Desired State** names the user's intended replication targets
+> - CLAUDE.md at the workdir root — its **Goal** section names the user's intended replication targets
 >
 > ### What to do
 >
@@ -137,17 +134,9 @@ Spawn one synthesis sub-agent that reads both index files and writes the stub. T
 > 2. **Choose: one analysis or sub-analyses?** If the paper has only one stage end-to-end (no clean intermediate handoffs), write a single analysis. If the paper has genuinely independent stages (each one's output flows as the next one's input), write sub-analyses. Sub-analysis IDs must be noun phrases (not verb phrases): `reconstruction`, `clustering`, `bao_fit`. Avoid reserved names (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
 > 3. **Wire inputs and outputs at the sub-analysis level.** For each sub-analysis:
 >    - Declare `inputs:` from the data-dependency list in the code-side index plus any paper-named external datasets. The depth (acquisition path, selection criteria) is SPECIFY's; ARCHITECT names the input and gives it a stable id.
->    - Declare `outputs:` matching the result loci from the paper-side index plus any intermediate artifacts a downstream sub-analysis consumes. Tag each output's `priority:` from the paper's emphasis (primary / secondary). The reproduction's targeted scope from the constitution's Desired State takes precedence — if the user only wants Figure 3 and Table 2, only those land as `outputs:` (the rest are out-of-scope and noted as such).
+>    - Declare `outputs:` matching the result loci from the paper-side index plus any intermediate artifacts a downstream sub-analysis consumes. Tag each output's `priority:` from the paper's emphasis (primary / secondary). The reproduction's targeted scope from CLAUDE.md's **Goal** takes precedence — if the user only wants Figure 3 and Table 2, only those land as `outputs:` (the rest are out-of-scope and noted as such).
 > 4. **Author the root and per-analysis narrative.** Use `/narrative` for prose authoring (it carries the discipline on reserved names, voice, the data-flow paragraph requirement). High-level prose only — *no `astra-anchor:` references yet, because the entries those would point at don't exist*. SPECIFY will weave in anchors as it authors `decisions:` / `prior_insights:` / `findings:` per sub-analysis. The root `narrative:` MUST include a top-down end-to-end data-flow paragraph (per the narrative skill's data-flow rules — closes lightcone-cli#108) when sub-analyses exist.
-> 5. **Build `work/notes/cited_papers.yaml`** from the paper-side index's "Citations worth following up" entries:
->    ```yaml
->    papers:
->      - doi: "10.xxxx/yyyy"
->        citation: "Smith et al. (2020)"
->        relevance: "One-line description of why this paper matters for replication"
->    ```
->    This is the marker→DOI map SPECIFY uses to write each `prior_insights:` placeholder's `doi:` field, and LITERATURE consumes when fetching the cited papers to resolve those placeholders.
-> 6. **Validate** with `astra validate astra.yaml`. The stub MUST validate as written — even with empty `decisions:` / `prior_insights:` / `findings:` blocks, the structural fields and the narrative prose must pass schema checks.
+> 5. **Validate** with `astra validate astra.yaml`. The stub MUST validate as written — even with empty `decisions:` / `prior_insights:` / `findings:` blocks, the structural fields and the narrative prose must pass schema checks.
 >
 > ### Stub shape — what `astra.yaml` looks like after ARCHITECT
 >
@@ -190,20 +179,22 @@ Spawn one synthesis sub-agent that reads both index files and writes the stub. T
 > - **Stub, not snapshot.** Don't try to author content for `decisions:`, `prior_insights:`, `findings:`. Those go in SPECIFY. Your job is the structural skeleton.
 > - **Reserved names.** Sub-analysis IDs are noun phrases; avoid the reserved set listed above. Each ID must be unique across the spec.
 > - **Code-as-canonical for structure.** Where paper and code disagree on the decomposition, the code's structure is canonical (the paper compresses for narrative; the code reveals real seams).
-> - **Targeted scope wins.** The constitution's Desired State scopes the reproduction. If the user only wants Figures 3 and 4 plus Table 2, only those land as `outputs:` in the stub.
+> - **Targeted scope wins.** CLAUDE.md's **Goal** scopes the reproduction. If the user only wants Figures 3 and 4 plus Table 2, only those land as `outputs:` in the stub.
 > - **Narrative prose, no anchors.** Author `narrative:` prose at the root and per-sub-analysis level. Do NOT add `astra-anchor:` references — the entries those would point at don't exist yet.
 > - **Validate before exit.** `astra validate astra.yaml` must return clean.
 
-## Step 3: Rigor-dialed self-review
+## Step 3: Self-review (rigor chosen per spawn)
 
 After the stub lands, a fresh-context sub-agent cross-checks it against paper + code: are the sub-analyses the right decomposition? Are the inputs and outputs declared at the sub-analysis level wired correctly? Does the narrative prose accurately describe what each sub-analysis does?
 
-The depth of self-review is set by the constitution's frugality / rigor dial:
+The depth of self-review is set by the rigor level the orchestrator picked when it spawned this `architect` sub-agent — read CLAUDE.md's **Rigor** section for the current state and what the orchestrator flagged as the chosen rigor for this spawn:
+
+- **Cheap:** skip review entirely, or run a single fresh-context Task-tool sub-agent pass and incorporate its fixes once.
+- **Heavy:** N rounds. Each round spawns a fresh Task-tool reviewer against `astra.yaml` + paper + code; the architect sub-agent incorporates fixes (regenerate the stub or edit it directly for trivial cases); the next round spawns another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes or the 5-round system cap is reached.
 
-- **Frugal:** skip review entirely, or run a single fresh-context sub-agent pass and incorporate its fixes once.
-- **Rigor:** N rounds — each round runs a fresh reviewer against `astra.yaml` + paper + code; ARCHITECT incorporates fixes (regenerate the stub or edit it directly for trivial cases); the next round runs another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes (the strong-termination criterion the loop already uses), or a 5-round system cap.
+The discipline: each round spawns a brand-new Task-tool sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; the architect sub-agent edits the stub between rounds (or for trivial mechanical fixes, the orchestrator can do the edit directly).
 
-The discipline matches REVIEW's old shape (folded here): each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; a separate fix pass (the orchestrator inline for trivial fixes, or another ARCHITECT iteration for structural changes) edits the stub.
+After the self-review terminates, the architect sub-agent updates CLAUDE.md's **Rigor** section with the post-spawn state of `astra.yaml` (e.g. *stub: baseline* after a cheap pass, *stub: tightened* after heavy review). That keeps the picture honest across sub-agents.
 
 ### Per-round fresh sub-agent — system prompt
 
@@ -216,14 +207,14 @@ The discipline matches REVIEW's old shape (folded here): each round runs a brand
 > - `work/notes/architect/code-index.md` — code-side Explore output (when present)
 > - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
 > - `work/reference/code/` (when present) — canonical reference for stage boundaries + entry-points
-> - The per-paper constitution — for the Desired State scope fence
+> - CLAUDE.md — for the **Goal** section's scope fence
 >
 > ### What to check
 >
 > 1. **Sub-analysis decomposition.** Are the sub-analyses the right cuts? Where the code structure shows a clean stage boundary, is the stub's split consistent with it? Where the paper compresses across stages, is the stub's decomposition still defensible against the code? Where there is no code, does the stub's decomposition match the paper's natural seams?
 > 2. **Sub-analysis IDs.** Noun phrases, not verb phrases. No reserved-name collisions (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
 > 3. **Inputs at sub-analysis level.** Each declared input has a stable id; the data dependency is real (cross-check against `work/notes/architect/code-index.md`'s External-data-dependencies list and the paper's data section). No phantom inputs invented to round out the structure.
-> 4. **Outputs at sub-analysis level.** Each declared output corresponds to a result locus from the paper-side index OR an intermediate artifact a downstream sub-analysis consumes. The targeted scope from the constitution's Desired State is honored — no out-of-scope outputs sneaking in, no in-scope targets missed.
+> 4. **Outputs at sub-analysis level.** Each declared output corresponds to a result locus from the paper-side index OR an intermediate artifact a downstream sub-analysis consumes. The targeted scope from CLAUDE.md's **Goal** is honored — no out-of-scope outputs sneaking in, no in-scope targets missed.
 > 5. **Narrative coverage.** The root narrative includes a data-flow paragraph (when sub-analyses exist). Each sub-analysis's `narrative:` accurately describes its role. No `astra-anchor:` references at this stage (those land in SPECIFY); flag any that snuck in.
 > 6. **Validates.** `astra validate astra.yaml` returns clean.
 >
@@ -259,27 +250,28 @@ The discipline matches REVIEW's old shape (folded here): each round runs a brand
 
 ### Termination
 
-- `weak` (frugal): one pass. Done after fixes (or immediately, if `fixes_needed` was 0).
-- `strong` (rigor):
+- **Cheap:** one pass. Done after fixes (or immediately, if `fixes_needed` was 0).
+- **Heavy:**
   - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
   - If round N is the first round (N=1), spawn round 2 unconditionally so we can compare.
   - If round N produced fixes, spawn round (N+1) as a fresh sub-agent that does not see round N's findings or the fixes.
-  - If N hits the system cap of 5 rounds without two consecutive clean rounds, surface to the user via `AskUserQuestion`: "ARCHITECT review reached round cap with N fixes still landing; continue, accept the current stub, or revise the constitution?" Default on user silence: accept the current stub, log the unfinished tail in `<paper-slug>/open-questions.md`, proceed to LITERATURE.
+  - If N hits the 5-round system cap without two consecutive clean rounds, the architect sub-agent stops and reports back to the orchestrator. If the user is reachable in the architect sub-agent's chat or the orchestrator session, ask in prose: "ARCHITECT review reached round cap with N fixes still landing; continue, accept the current stub, or revise scope?" If the user is unreachable, accept the current stub, log the unfinished tail in `open-questions.md` at the workdir root, and let the orchestrator decide whether to proceed to SPECIFY or re-spawn ARCHITECT later.
 
 ## Survey signals (entry into ARCHITECT)
 
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) exists ⇒ ready to architect
+- `work/notes/cited_papers.yaml` exists from ACQUIRE ⇒ paper-side Explore can augment it with `relevance:` notes
 - `work/notes/architect/paper-index.md` and `work/notes/architect/code-index.md` (if code present) exist ⇒ Explore pass done
 - `astra.yaml` exists; `astra validate astra.yaml` returns clean; sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks are present-and-empty ⇒ stub written
-- For frugal: `work/notes/architect/review-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ ARCHITECT done
-- For rigor: two consecutive `work/notes/architect/review-round-<N>.md` files both have verdict `clean` ⇒ ARCHITECT done; proceed to LITERATURE
-- `work/notes/cited_papers.yaml` exists ⇒ LITERATURE has its input
+- For cheap: `work/notes/architect/review-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ ARCHITECT done
+- For heavy: two consecutive `work/notes/architect/review-round-<N>.md` files both have verdict `clean` ⇒ ARCHITECT done; orchestrator proceeds to SPECIFY
 
 ## Notes
 
-- **Run the Explore sub-agents in parallel.** They're fully independent (one reads paper-only, one reads code-only). The synthesis agent runs once, after both index files exist.
-- **The Explore agents do not write `astra.yaml`.** They write index markdown. Only the synthesis agent writes the stub. This separation keeps each Explore agent's context bounded — they don't have to think about ASTRA's schema, only the read.
+- **Run the Explore reads in parallel.** They're fully independent (one reads paper-only, one reads code-only). Synthesis runs once, after both index files exist.
+- **The Explore reads do not write `astra.yaml`.** They write index markdown (and the paper-side read adds `relevance:` notes to `work/notes/cited_papers.yaml`). Only the synthesis step writes the stub. This separation keeps each Explore read's context bounded: it doesn't have to think about ASTRA's schema, only the read.
 - **The stub's empty blocks are intentional.** `decisions: {}`, `prior_insights: {}`, `findings: {}` make it clear at a glance that ARCHITECT's job is structural, and that SPECIFY is what fills them. Don't try to half-author content — empty is honest.
 - **Code-as-canonical for structure, paper-as-canonical for narrative voice.** The code reveals where the real stage boundaries are; the paper provides the words to describe them. The stub uses both.
-- **Resume is automatic.** If `astra.yaml` already validates and has the structural fields populated, the orchestrator skips Step 1 and Step 2 and runs Step 3 (review) only.
+- **Resume is automatic.** If `astra.yaml` already validates and has the structural fields populated, on re-spawn the architect sub-agent skips Step 1 and Step 2 and runs Step 3 (review) only.
 - **The narrative skill is the prose author, not the structure author.** Invoke `/narrative` for the prose blocks; ARCHITECT's job is the structural skeleton plus invoking `/narrative` to fill the `narrative:` keys cleanly.
+- **Commit each artifact as it lands.** The orchestrator reads `git log` to see how far the architect sub-agent got. Indexes commit before the stub; the stub commits before any review-round files; review-round files commit one per round. Small, descriptive commits keep the trail readable.

From bc6c266309be8a929e06b1a1e0dbaee4793c397b Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sun, 10 May 2026 18:24:09 +0200
Subject: [PATCH 033/124] Realign specify/literature/implement references to
 orchestrator architecture
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reframes the user-reachability story for the three artifact-producing
sub-agents that follow ARCHITECT. The interactive-vs-sub-agent dichotomy
goes away: every phase is a sub-agent, the user can drop into its chat
for ratification, and questions are asked in prose when reachable or
logged to CLAUDE.md's Paper-vs-code disagreements + open-questions.md
when not.

Renames the rigor levels weak/strong (frugal/rigorous) → cheap/heavy and
roots them in CLAUDE.md's Rigor section, chosen per spawn rather than as
a global constitution dial. Drops AskUserQuestion-on-cap-hit (sub-agents
can't call it); termination instead reports back to the orchestrator.

Internal fan-out becomes Task-tool sub-sub-agents from inside the named
sub-agent's session (per-paper resolution in LITERATURE, per-output
implementation in IMPLEMENT, per-sub-analysis review in SPECIFY).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../lc-from-paper/references/implement.md     | 55 +++++++++---------
 .../lc-from-paper/references/literature.md    | 56 ++++++++++---------
 .../lc-from-paper/references/specify.md       | 50 +++++++++--------
 3 files changed, 84 insertions(+), 77 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
index 008d1a09..5bab21d7 100644
--- a/claude/lightcone/skills/lc-from-paper/references/implement.md
+++ b/claude/lightcone/skills/lc-from-paper/references/implement.md
@@ -1,8 +1,8 @@
-# IMPLEMENT — write scripts and recipes; rigor-dialed self-review
+# IMPLEMENT — write scripts and recipes; per-spawn self-review
 
-Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, a rigor-dialed self-review pass cross-checks the implementation against paper + code — same fresh-context-no-bias shape ARCHITECT and SPECIFY use. Fixes feed back into IMPLEMENT for the next iteration.
+Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, a self-review pass cross-checks the implementation against paper + code — same fresh-context-no-bias shape ARCHITECT, SPECIFY, and LITERATURE use. Fixes feed back inside the same implement sub-agent for the next iteration.
 
-The constitution's per-phase mode defaults this to **sub-agent**. Most implementation is mechanical (translate spec → script), but algorithm choices on tricky steps may want ratification. Where parallelization is feasible (multiple independent outputs from different scripts), spawn one sub-agent per output and merge.
+This phase runs as the orchestrator-spawned `implement` sub-agent. Most implementation is mechanical (translate spec → script), but algorithm choices on tricky steps may want user ratification — the user can drop into the implement sub-agent's chat for that. Where parallelization is feasible (multiple independent outputs from different scripts), the implement sub-agent fans out to one Task-tool sub-sub-agent per output and merges.
 
 ## Inputs
 
@@ -10,38 +10,40 @@ The constitution's per-phase mode defaults this to **sub-agent**. Most implement
 - `implementation-notes.md` — tricky algorithms, numerical gotchas, data-format quirks
 - `work/notes/architect/paper-index.md` — for context when the spec compresses (sub-analysis decomposition, result loci, decision clusters)
 - `work/notes/architect/code-index.md` (when code present) — natural decomposition + entry-points + data dependencies + gotchas (the canonical map of where each sub-analysis's logic lives in `work/reference/code/`)
-- `work/reference/code/` (if present) — **canonical reference. Read on every iteration when implementing.** Where paper and code disagree, code wins for numerics, plotting, and method.
+- `work/reference/code/` (if present) — **canonical reference. Read it when implementing each output.** Where paper and code disagree, code wins for numerics, plotting, and method.
+- CLAUDE.md — **Rigor** for this spawn's chosen rigor level; **Paper-vs-code disagreements** for prior conflicts already logged.
 
 ## Outputs
 
 - `scripts/<output>.py` (or `.sh`, or whatever fits) — one script per output (or shared scripts for tightly-coupled outputs)
 - `requirements.txt` — Python dependencies
 - Recipes in `astra.yaml` — each output gets a `recipe:` block with `command:` and `inputs:`
-- `work/notes/implement-review/round-<N>.md` — each rigor-dialed review round's findings (rigor only; one file per round)
+- `work/notes/implement-review/round-<N>.md` — each review round's findings (one file per round; how many rounds depends on the rigor level)
+- CLAUDE.md updates — append to **Paper-vs-code disagreements** for any new conflict surfaced during implementation; update **Rigor** with the post-spawn state per output (e.g. *baseline* after a cheap pass, *tightened* after heavy review).
 
 ## Step 1: write recipes + scripts
 
 Read `astra.yaml` and `implementation-notes.md`. For each output, write a script in `scripts/` that produces it, and add a `recipe:` block to the output's entry in `astra.yaml` with `command:` and `inputs:`.
 
-If `work/reference/code/` exists, **read the relevant code on every iteration** — not just to resolve ambiguities but as the canonical source of truth for numerics + method. Write clean scripts following ASTRA conventions (not verbatim copies), but treat the code's behavior as authoritative when it disagrees with the paper. When you encounter a paper-vs-code disagreement that SPECIFY's code pass missed:
+If `work/reference/code/` exists, **read the relevant code when implementing each output** — not just to resolve ambiguities but as the canonical source of truth for numerics + method. Write clean scripts following ASTRA conventions (not verbatim copies), but treat the code's behavior as authoritative when it disagrees with the paper. When you encounter a paper-vs-code disagreement that SPECIFY's code pass missed:
 
-- **Interactive IMPLEMENT** (rare; usually sub-agent): surface via `AskUserQuestion`.
-- **Sub-agent IMPLEMENT** (default): continue with the code's behavior, append the disagreement to `<paper-slug>/open-questions.md`, and note it in `implementation-notes.md` so REVIEW (close-out) can ratify or override.
+- **User reachable** (in the implement sub-agent's chat): ask in prose — paper method + code method + plausible impact + which one to take.
+- **User unreachable**: continue with the code's behavior, append the disagreement to CLAUDE.md's **Paper-vs-code disagreements** AND `open-questions.md`, and note it in `implementation-notes.md` so REVIEW close-out can ratify or override.
 
-Without this discipline, iterations drift to "looks right" rather than "matches" — the failure mode the first-paper test surfaced.
+Without this discipline, the implementation drifts to "looks right" rather than "matches" — the failure mode the first-paper test surfaced.
 
 When the reference code is substantial enough that implementation is really a migration of an existing codebase, follow `/lc-from-code`'s migration workflow in **augment existing ASTRA** mode. Use its code scan, minimal parameter-plumbing, dependency/container, and baseline-preservation strategies, but apply them to this reproduction's existing `astra.yaml`. Do not create a second ASTRA project or duplicate the spec; add recipes, code-backed options, implementation notes, and missing structure to the current reproduction artifact.
 
 ### Parallelize where feasible
 
-When outputs are produced by independent scripts (no shared expensive computation), spawn one Task-tool sub-agent per output. Each sub-agent gets:
+When outputs are produced by independent scripts (no shared expensive computation), the implement sub-agent spawns one Task-tool sub-sub-agent per output. Each sub-sub-agent gets:
 
 - The output's spec entry from `astra.yaml` (including its sub-analysis's `decisions:` / `findings:` for context)
 - The relevant section of `implementation-notes.md`
 - The matching entry in `work/notes/architect/code-index.md`'s natural-decomposition / entry-points block — that's the pointer back to the canonical code location for the sub-analysis the output lives in
 - The relevant code path(s) under `work/reference/code/`
 
-The orchestrator merges scripts and recipes after the per-output sub-agents finish. Tightly-coupled outputs (e.g. an MCMC producing both a chain and a summary statistic) stay in one sub-agent and one script.
+The implement sub-agent merges scripts and recipes after the per-output sub-sub-agents finish. Tightly-coupled outputs (e.g. an MCMC producing both a chain and a summary statistic) stay in one sub-sub-agent and one script.
 
 ### Rules for the first pass
 
@@ -52,14 +54,14 @@ The orchestrator merges scripts and recipes after the per-output sub-agents fini
 5. **Do not execute scripts** — the RUN phase handles execution via `lc run`.
 6. **Validate** with `astra validate astra.yaml` after adding recipes.
 
-## Step 2: rigor-dialed self-review
+## Step 2: self-review (rigor chosen per spawn)
 
-After the first-pass implementation lands, the constitution's frugality / rigor dial decides what happens next:
+After the first-pass implementation lands, the rigor level the orchestrator picked for this spawn (read CLAUDE.md's **Rigor** section) decides what happens next:
 
-- **Frugal:** one minimal review pass — a single fresh sub-agent reads `scripts/`, `astra.yaml`'s recipes, and the paper, and reports any obvious paper-vs-implementation inconsistencies. Fixes are applied once; no further iteration. If no fixes are needed, IMPLEMENT proceeds to RUN.
-- **Rigor:** N rounds of fresh-context sub-agent review + fix. Each round runs a fresh reviewer that does not see the prior round's findings or fixes. Stop when **two consecutive rounds find no fixes** (strong termination criterion), or after 5 rounds (system cap), whichever comes first.
+- **Cheap:** one minimal review pass — a single fresh Task-tool sub-agent reads `scripts/`, `astra.yaml`'s recipes, and the paper, and reports any obvious paper-vs-implementation inconsistencies. Fixes are applied once; no further iteration. If no fixes are needed, IMPLEMENT proceeds to RUN.
+- **Heavy:** N rounds of fresh-context Task-tool sub-agent review + fix. Each round spawns a fresh reviewer that does not see the prior round's findings or fixes. Stop when **two consecutive rounds find no fixes**, or after 5 rounds (system cap), whichever comes first.
 
-The discipline is the same shape ARCHITECT and SPECIFY use: each round's reviewer is fresh, prompted to check "is the implementation consistent with the paper and the code?", and outputs findings only — not edits. Fixes are applied between rounds by a separate IMPLEMENT-fix sub-agent (or the orchestrator inline for trivial mechanical fixes). Pattern-matching on prior fixes defeats the cross-check; the no-bias rule is load-bearing.
+The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: each round's reviewer is fresh, prompted to check "is the implementation consistent with the paper and the code?", and outputs findings only — not edits. Fixes are applied between rounds by the implement sub-agent itself (or the orchestrator inline for trivial mechanical fixes). Pattern-matching on prior fixes defeats the cross-check; the no-bias rule is load-bearing.
 
 ### Per-round fresh sub-agent — system prompt
 
@@ -116,14 +118,14 @@ The discipline is the same shape ARCHITECT and SPECIFY use: each round's reviewe
 
 ### Step 3: IMPLEMENT-fix pass between rounds
 
-After each round's findings file lands, an IMPLEMENT-fix sub-agent (or the orchestrator inline for trivial fixes) edits `scripts/`, `astra.yaml` recipes, `requirements.txt`, and `implementation-notes.md` per the suggested fixes. After any change to `astra.yaml`, run `astra validate astra.yaml`.
+After each round's findings file lands, the implement sub-agent (or the orchestrator inline for trivial fixes) edits `scripts/`, `astra.yaml` recipes, `requirements.txt`, and `implementation-notes.md` per the suggested fixes. After any change to `astra.yaml`, run `astra validate astra.yaml`.
 
 ### Step 4: termination check
 
-- `weak` (frugal): one pass. Done after fixes (or immediately, if `fixes_needed` was 0).
-- `strong` (rigor):
+- **Cheap:** one pass. Done after fixes (or immediately, if `fixes_needed` was 0).
+- **Heavy:**
   - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
-  - If N hits the system cap of 5 without two consecutive clean rounds, surface to the user via `AskUserQuestion`: "implement-review reached round cap with N fixes still landing; continue, accept the current implementation, or revise the constitution?" Default on user silence: accept current implementation, log the unfinished tail in `<paper-slug>/open-questions.md`, proceed.
+  - If N hits the 5-round system cap without two consecutive clean rounds, the implement sub-agent stops and reports back to the orchestrator. If the user is reachable, ask in prose: "implement-review reached round cap with N fixes still landing; continue, accept the current implementation, or revise scope?" If the user is unreachable, accept the current implementation, log the unfinished tail in `open-questions.md`, and let the orchestrator decide whether to proceed or re-spawn.
 
 The IMPLEMENT-review iterations are independent of the COMPARE → IMPLEMENT retry loop — review iterations run before RUN, on the spec/implementation alignment side; COMPARE retries run after RUN, on the result-matching side.
 
@@ -137,7 +139,7 @@ If a dataset is behind a paywall, requires registration, or is "available upon r
 
 ## Retry attempts (post-COMPARE)
 
-If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, the IMPLEMENT iteration is a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. The constitution carries the attempt budget (default 5); the iteration's first move is to check whether `attempt` in the report has reached the budget. If it has, surface to the user via `AskUserQuestion` ("verdict still failing after N attempts — continue, change scope, or accept partial?") rather than burning more cycles.
+If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, the orchestrator may re-spawn `implement` as a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. Default attempt budget is 5 (the orchestrator can override per spawn); the implement sub-agent's first move is to check whether `attempt` in the report has reached the budget. If it has, stop and report back — if the user is reachable, ask in prose ("verdict still failing after N attempts — continue, change scope, or accept partial?"); if not, accept partial, log the failure in CLAUDE.md's **Rigor** section as an open opportunity, and let the orchestrator decide.
 
 A retry attempt re-runs the IMPLEMENT-review iterations on the changed scripts before proceeding to RUN.
 
@@ -145,14 +147,15 @@ A retry attempt re-runs the IMPLEMENT-review iterations on the changed scripts b
 
 - `astra.yaml` validates and `implementation-notes.md` exists ⇒ ready to implement first pass
 - `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ first-pass IMPLEMENT done
-- For frugal: `work/notes/implement-review/round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ IMPLEMENT done
-- For rigor: two consecutive `work/notes/implement-review/round-<N>.md` files both have verdict `clean` ⇒ IMPLEMENT done; proceed to RUN
-- `comparison-report.yaml` returns `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to REVIEW (close-out)
+- For cheap: `work/notes/implement-review/round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ IMPLEMENT done
+- For heavy: two consecutive `work/notes/implement-review/round-<N>.md` files both have verdict `clean` ⇒ IMPLEMENT done; orchestrator proceeds to RUN
+- `comparison-report.yaml` returns `pass` ⇒ COMPARE → IMPLEMENT loop terminated; orchestrator proceeds to REVIEW close-out
 
 ## Notes
 
 - **`lc run` is the canonical execution surface.** Scripts assume they will be invoked via the lightcone-cli runner. Do not hard-code working directories or assume environment activation.
 - **Determinism where possible.** Set random seeds, fix library versions, prefer reproducible installations. The IMPLEMENT goal is not just "produces output once" but "reproducibly produces output across runs."
 - **Tight coupling earns shared scripts.** When two outputs come from the same expensive computation (e.g. an MCMC produces both a parameter chain and a summary statistic), one script with multiple output paths is cleaner than two scripts that each re-do the work.
-- **The fresh-context discipline is the same as ARCHITECT's and SPECIFY's self-review.** A reviewer that sees the prior round's findings stops finding the next class of inconsistency. Each round must spawn a brand-new sub-agent.
-- **Minimize churn in fixes.** Targeted edits, not restructures. Big restructures defeat the round-over-round comparison the orchestrator uses to decide termination.
+- **The fresh-context discipline is the same as ARCHITECT's, SPECIFY's, and LITERATURE's self-review.** A reviewer that sees the prior round's findings stops finding the next class of inconsistency. Each round must spawn a brand-new Task-tool sub-agent.
+- **Minimize churn in fixes.** Targeted edits, not restructures. Big restructures defeat the round-over-round comparison the implement sub-agent uses to decide termination.
+- **Commit per output as it lands.** One commit per script + recipe wiring; one commit per review-round file; one commit per fix pass. The orchestrator reads `git log` to track progress.
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index 9b392593..ebda5c23 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -4,14 +4,15 @@ After SPECIFY's paper pass records each citation marker as a `prior_insights:` *
 
 LITERATURE runs **after SPECIFY**, not before — relevant `prior_insights:` are defined by the decisions and findings they justify. Fetching cited papers speculatively before SPECIFY would do work for citations that may never end up needed.
 
-The constitution's per-phase mode is **always sub-agent** for this phase. Spawn one Task-tool sub-agent per cited paper for parallel resolution — they edit disjoint subsets of `astra.yaml`'s `prior_insights:` entries (only the placeholders whose `doi:` matches the sub-agent's paper). A merge step (orchestrator-inline) writes the per-paper resolutions back into `astra.yaml` after all sub-agents complete; a final fresh-context sub-agent runs the rigor-dialed self-review.
+This phase runs as the orchestrator-spawned `literature` sub-agent. Internally it fans out: one Task-tool sub-sub-agent per cited paper for parallel resolution — they edit disjoint subsets of `astra.yaml`'s `prior_insights:` entries (only the placeholders whose `doi:` matches the sub-sub-agent's paper). A merge step (the literature sub-agent itself) writes the per-paper resolutions back into `astra.yaml` after all sub-sub-agents complete; a final fresh-context Task-tool sub-agent runs the self-review at the rigor level the orchestrator picked for this spawn.
 
 ## Inputs
 
 - `astra.yaml` — filled by SPECIFY's paper (and code) passes; each sub-analysis has `prior_insights:` entries with `claim:` + `doi:` + `decision_links:` but no `evidence:` selector. These are the placeholders LITERATURE resolves.
-- `work/notes/cited_papers.yaml` — citation marker → DOI mapping from ARCHITECT (used to discover which DOIs need fetching, complementing the per-placeholder `doi:` lookup).
-- `work/notes/architect/paper-index.md` — has the decision clusters per sub-analysis; per-paper sub-agents get it as context.
+- `work/notes/cited_papers.yaml` — citation marker → DOI → relevance mapping (built by ACQUIRE, augmented with relevance notes by ARCHITECT's paper-side Explore). Used to discover which DOIs need fetching, complementing the per-placeholder `doi:` lookup.
+- `work/notes/architect/paper-index.md` — has the decision clusters per sub-analysis; per-paper sub-sub-agents get it as context.
 - `work/reference/source/` (Path A — arXiv LaTeX) or `work/reference/document.md` (Path B — Docling) — the target paper (for context on how the cited paper is invoked).
+- CLAUDE.md — **Rigor** for this spawn's chosen rigor level.
 
 ## Outputs
 
@@ -21,12 +22,12 @@ The constitution's per-phase mode is **always sub-agent** for this phase. Spawn
 
 ## How it runs
 
-1. **Discovery.** Read `astra.yaml` and collect every `prior_insights:` entry whose `evidence:` is missing or empty. Group by `doi:`. Each group becomes a per-paper sub-agent invocation.
-2. **Per-paper resolution (parallel).** Spawn one Task-tool sub-agent per DOI group. Each sub-agent: caches the PDF via `astra paper add`, reads the cited paper, finds verbatim quote(s) supporting each placeholder claim in its group, and writes the per-placeholder `evidence:` resolutions to `work/notes/literature/<doi-slug>.yaml`. Sub-agents do not edit `astra.yaml` directly — they write their per-paper YAML and exit.
-3. **Merge.** A short orchestrator pass (or a single merge sub-agent) reads each `work/notes/literature/<doi-slug>.yaml` and writes the resolved `evidence:` entries back into `astra.yaml`'s `prior_insights[<insight_id>].evidence[]`. Single writer, no merge conflicts.
-4. **Rigor-dialed self-review.** A fresh-context sub-agent reads each `prior_insights:` entry against its cited paper and asks "does this evidence actually justify the claim it's attached to?" Iterate per the rigor dial — frugal: one pass; rigor: N rounds until two consecutive rounds find no fixes (or a 5-round system cap).
+1. **Discovery.** Read `astra.yaml` and collect every `prior_insights:` entry whose `evidence:` is missing or empty. Group by `doi:`. Each group becomes a per-paper sub-sub-agent invocation.
+2. **Per-paper resolution (parallel).** Spawn one Task-tool sub-sub-agent per DOI group. Each one: caches the PDF via `astra paper add`, reads the cited paper, finds verbatim quote(s) supporting each placeholder claim in its group, and writes the per-placeholder `evidence:` resolutions to `work/notes/literature/<doi-slug>.yaml`. Sub-sub-agents do not edit `astra.yaml` directly — they write their per-paper YAML and exit.
+3. **Merge.** The literature sub-agent itself reads each `work/notes/literature/<doi-slug>.yaml` and writes the resolved `evidence:` entries back into `astra.yaml`'s `prior_insights[<insight_id>].evidence[]`. Single writer, no merge conflicts.
+4. **Self-review (rigor chosen per spawn).** A fresh-context Task-tool sub-agent reads each `prior_insights:` entry against its cited paper and asks "does this evidence actually justify the claim it's attached to?" Iterate per the rigor level the orchestrator chose: cheap runs at most one pass; heavy repeats rounds until two consecutive rounds find no fixes or the 5-round system cap is reached.
 
-## Per-paper resolution sub-agent — system prompt
+## Per-paper resolution sub-sub-agent — system prompt
 
 > You are an ASTRA evidence-resolution agent. Your task is to find the verbatim quotes in a single cited paper that justify a set of `prior_insights:` placeholders authored by SPECIFY.
 >
@@ -50,7 +51,7 @@ The constitution's per-phase mode is **always sub-agent** for this phase. Spawn
 >    - Performance benchmarks or validation results relevant to the choices.
 >    - Recommendations or caveats about specific methods / parameters.
 > 3. For each supporting passage, build a `TextQuoteSelector` (`exact:` + `prefix:` + `suffix:`) and `FragmentSelector` (`page:`).
-> 4. If a placeholder's claim has no supporting evidence in the paper (the citation was loose or the claim was paraphrased beyond what the paper actually says), record it under `unresolved:` with a brief note rather than fabricating evidence. The self-review pass surfaces these to the user via `<paper-slug>/open-questions.md`.
+> 4. If a placeholder's claim has no supporting evidence in the paper (the citation was loose or the claim was paraphrased beyond what the paper actually says), record it under `unresolved:` with a brief note rather than fabricating evidence. The self-review pass surfaces these to `open-questions.md` for the user to resolve at REVIEW close-out.
 > 5. Write the per-placeholder resolutions to the specified output file.
 >
 > ### Caching the source PDF
@@ -119,28 +120,28 @@ The constitution's per-phase mode is **always sub-agent** for this phase. Spawn
 
 ## Merge step
 
-After all per-paper sub-agents complete, the orchestrator (or a single merge sub-agent) reads each `work/notes/literature/<doi-slug>.yaml` and writes the resolutions back into `astra.yaml`:
+After all per-paper sub-sub-agents complete, the literature sub-agent reads each `work/notes/literature/<doi-slug>.yaml` and writes the resolutions back into `astra.yaml`:
 
 - For each entry in `resolutions:`, locate `prior_insights[<insight_id>]` in `astra.yaml` (sub-analysis ownership is implicit in the id; the placeholder already lives there) and set its `evidence:` field to the resolved selectors.
-- For each entry in `unresolved:`, append a line to `<paper-slug>/open-questions.md` describing the unresolved placeholder and the reason — the user resolves at REVIEW (close-out) by either supplying a different citation, weakening the placeholder's `claim:`, or removing the placeholder entirely.
+- For each entry in `unresolved:`, append a line to `open-questions.md` describing the unresolved placeholder and the reason — the user resolves at REVIEW close-out by either supplying a different citation, weakening the placeholder's `claim:`, or removing the placeholder entirely.
 - Re-run `astra validate astra.yaml` after each per-paper merge to catch any structural breakage early.
 
 A single writer (the merge step) avoids YAML round-trip conflicts that parallel writes would produce.
 
-## Rigor-dialed self-review
+## Self-review (rigor chosen per spawn)
 
-After the merge lands, a fresh-context sub-agent cross-checks each resolved `prior_insights:` entry against its cited paper:
+After the merge lands, a fresh-context Task-tool sub-agent cross-checks each resolved `prior_insights:` entry against its cited paper:
 
 - Does the `evidence:` quote belong to the cited paper at the cited page? (`astra validate --verify-evidence` does the deterministic check; the sub-agent does the semantic check.)
 - Does the quote actually justify the placeholder's `claim:`? Or is the quote technically present but tangential?
 - Does the placeholder's `claim:` actually support the decision option it's linked to via `decision_links:`?
 
-The depth of self-review is set by the constitution's frugality / rigor dial:
+The depth of self-review follows the rigor level the orchestrator picked for this spawn (read CLAUDE.md's **Rigor** section):
 
-- **Frugal:** skip review entirely, or run a single fresh-context sub-agent pass and incorporate its fixes once.
-- **Rigor:** N rounds — each round runs a fresh reviewer against the resolved `prior_insights:` + the cited papers + the target paper; LITERATURE incorporates fixes (re-spawn the per-paper sub-agent for entries that need a different quote, or adjust unresolved entries); the next round runs another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes (the strong-termination criterion the loop already uses), or a 5-round system cap.
+- **Cheap:** skip review entirely, or run a single fresh-context Task-tool sub-agent pass and incorporate its fixes once.
+- **Heavy:** N rounds. Each round spawns a fresh Task-tool reviewer against the resolved `prior_insights:` + the cited papers + the target paper; the literature sub-agent incorporates fixes (re-spawning the per-paper sub-sub-agent for entries that need a different quote, or adjusting unresolved entries); the next round spawns another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes or the 5-round system cap is reached.
 
-The discipline matches ARCHITECT's and SPECIFY's self-review shape: each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; a separate fix pass (the orchestrator inline for trivial fixes, or another LITERATURE iteration for substantive changes) edits `astra.yaml`.
+The discipline matches ARCHITECT's and SPECIFY's self-review shape: each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; the literature sub-agent edits `astra.yaml` between rounds for trivial mechanical fixes, or re-spawns the relevant per-paper sub-sub-agent for substantive changes.
 
 ### Per-round fresh sub-agent — system prompt
 
@@ -151,7 +152,7 @@ The discipline matches ARCHITECT's and SPECIFY's self-review shape: each round r
 > - `astra.yaml` — focus on every `analyses.<sub-analysis-id>.prior_insights:` entry. Each should have a resolved `evidence:` block.
 > - The cited papers (cached PDFs).
 > - `work/notes/cited_papers.yaml` — DOI lookups.
-> - `<paper-slug>/open-questions.md` — to see which placeholders the resolution sub-agents flagged unresolved.
+> - `open-questions.md` — to see which placeholders the resolution sub-sub-agents flagged unresolved.
 > - `work/reference/source/` (or `document.md`) — the target paper, for context on how the cited paper is invoked.
 >
 > ### What to check
@@ -160,7 +161,7 @@ The discipline matches ARCHITECT's and SPECIFY's self-review shape: each round r
 > 2. **Evidence justifies claim.** For each `prior_insights:` entry, does the quote actually support the `claim:`? Or is it tangential / weaker than the claim asserts?
 > 3. **Claim supports the decision.** For each placeholder's `decision_links:`, does the placeholder's claim actually justify the linked decision option(s)? Or is the link a leap?
 > 4. **Cited paper is the right paper.** Does the target paper actually invoke this DOI for this claim? (Sometimes a citation marker is misread; the wrong paper gets cached.)
-> 5. **Unresolved entries are honest.** For entries in `<paper-slug>/open-questions.md` flagged unresolved, does a closer read of the cited paper actually find supporting evidence? (If yes, the resolution sub-agent missed it; flag for re-resolution.)
+> 5. **Unresolved entries are honest.** For entries in `open-questions.md` flagged unresolved, does a closer read of the cited paper actually find supporting evidence? (If yes, the resolution sub-sub-agent missed it; flag for re-resolution.)
 >
 > ### Output
 >
@@ -191,9 +192,9 @@ The discipline matches ARCHITECT's and SPECIFY's self-review shape: each round r
 
 ### LITERATURE-fix pass between rounds
 
-After each round's findings file lands, a LITERATURE-fix pass (or the orchestrator inline for trivial mechanical fixes) responds to the findings — re-resolving placeholders with different quotes, adjusting claims, re-linking decisions, or surfacing unresolvable entries to `<paper-slug>/open-questions.md`. After any change to `astra.yaml`, re-run `astra validate astra.yaml --verify-evidence` to confirm the structural and quote-fidelity checks still pass.
+After each round's findings file lands, the literature sub-agent responds to the findings — re-resolving placeholders with different quotes, adjusting claims, re-linking decisions, or surfacing unresolvable entries to `open-questions.md`. After any change to `astra.yaml`, re-run `astra validate astra.yaml --verify-evidence` to confirm the structural and quote-fidelity checks still pass.
 
-If N hits the system cap of 5 rounds without two consecutive clean rounds, surface to the user via `AskUserQuestion`: "LITERATURE review reached round cap with N fixes still landing; continue, accept the current resolutions, or revise the constitution?" Default on user silence: accept current state, log the unfinished tail in `<paper-slug>/open-questions.md`, and proceed to IMPLEMENT.
+If the review reaches the 5-round system cap without two consecutive clean rounds, the literature sub-agent stops and reports back to the orchestrator. If the user is reachable, ask in prose: "LITERATURE review reached round cap with N fixes still landing; continue, accept the current resolutions, or revise scope?" If the user is unreachable, accept the current state, log the unfinished tail in `open-questions.md`, and let the orchestrator decide whether to proceed or re-spawn.
 
 ## Survey signals (entry into LITERATURE)
 
@@ -201,14 +202,15 @@ If N hits the system cap of 5 rounds without two consecutive clean rounds, surfa
 - `work/notes/literature/<doi-slug>.yaml` files exist (one per cited DOI) ⇒ per-paper resolution done
 - `astra.yaml`'s `prior_insights:` entries each have a resolved `evidence:` selector ⇒ merge done
 - `astra validate astra.yaml --verify-evidence` returns clean ⇒ structural validation done
-- For frugal: at least a `work/notes/literature-review/round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ LITERATURE review done
-- For rigor: two consecutive `round-<N>.md` files with verdict `clean` ⇒ LITERATURE review done
+- For cheap: at least a `work/notes/literature-review/round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ LITERATURE review done
+- For heavy: two consecutive `round-<N>.md` files with verdict `clean` ⇒ LITERATURE review done
 
-When all of the above hold ⇒ LITERATURE complete; proceed to IMPLEMENT.
+When all of the above hold ⇒ LITERATURE complete; orchestrator proceeds to IMPLEMENT.
 
 ## Notes
 
-- **Run per-paper resolutions in parallel.** One sub-agent per cited DOI; they edit disjoint subsets of `prior_insights:` so write conflicts don't arise — but the merge step still serializes the writes back to `astra.yaml` to keep YAML round-trip safe.
+- **Run per-paper resolutions in parallel.** One Task-tool sub-sub-agent per cited DOI; each resolves a disjoint subset of `prior_insights:` and writes only its own per-paper YAML, so write conflicts don't arise. The merge step then serializes the writes back to `astra.yaml` to keep the YAML round-trip safe.
 - **Resume is automatic.** If `work/notes/literature/<doi-slug>.yaml` already exists, skip the per-paper resolution for that DOI. The merge re-runs whenever new per-paper files appear.
-- **Unresolved is not failure.** A placeholder that no quote in the cited paper supports is a real signal — the target paper cited loosely, or paraphrased beyond what the source actually says. Surface to `<paper-slug>/open-questions.md`; don't fabricate evidence to make it green.
-- **`astra validate --verify-evidence` runs after the merge, not after each per-paper sub-agent.** Sub-agents write to per-paper YAMLs; the deterministic check happens once `astra.yaml` is updated.
+- **Unresolved is not failure.** A placeholder that no quote in the cited paper supports is a real signal — the target paper cited loosely, or paraphrased beyond what the source actually says. Surface to `open-questions.md`; don't fabricate evidence to make it green.
+- **`astra validate --verify-evidence` runs after the merge, not after each per-paper sub-sub-agent.** Sub-sub-agents write to per-paper YAMLs; the deterministic check happens once `astra.yaml` is updated.
+- **Commit each per-paper resolution as it lands.** Also commit the merge as a single commit and each review-round file as it lands. The orchestrator reads `git log` to see how far the literature sub-agent got.
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 392fe03d..878a5ab1 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -1,24 +1,24 @@
 # SPECIFY — fill the stub `astra.yaml`, two passes per sub-analysis
 
-Read the stub `astra.yaml` from ARCHITECT and fill in `decisions:`, `prior_insights:`, `findings:` per sub-analysis, weaving the existing narrative with `astra-anchor:` references as entries land. SPECIFY is the **first user-ratification seam** — material paper-vs-code conflicts surface here; the default mode is interactive so the user can ratify.
+Read the stub `astra.yaml` from ARCHITECT and fill in `decisions:`, `prior_insights:`, `findings:` per sub-analysis, weaving the existing narrative with `astra-anchor:` references as entries land. SPECIFY is the **first user-ratification seam** — material paper-vs-code conflicts surface here, and they're often the highest-value moments for the user to weigh in on directly.
 
-This phase replaces the old SPECIFY's monolithic shape. The new structure runs **two passes per sub-analysis** (paper, then code, when code exists), then a rigor-dialed self-review pass. The two passes are the cross-check: the paper pass authors what the paper says; the code pass surfaces where the code says something different; the difference is gold (it's where the reproduction has to make a decision).
+This phase runs as the orchestrator-spawned `specify` sub-agent. On launch, the orchestrator announces to the user: *"specify is the natural seam for paper-vs-code conflicts — drop into its chat if you want to ratify them as they come up; otherwise it'll take code as canonical and log disagreements to CLAUDE.md."* If the user is reachable in the sub-agent's chat, SPECIFY raises paper-vs-code conflicts in prose. If not, it takes the canonical-resolution default (code wins where paper and code disagree on a material choice) and logs the disagreement to CLAUDE.md's **Paper-vs-code disagreements** section plus `open-questions.md` for REVIEW close-out.
 
-The constitution's per-phase mode defaults to **interactive** for this phase, but the user can flip it. When SPECIFY runs as a sub-agent, it falls back to the canonical-resolution rule (code wins where paper and code disagree on a material choice) and surfaces unresolved conflicts to `<paper-slug>/open-questions.md`.
+SPECIFY runs **two passes per sub-analysis** (paper, then code, when code exists), then a self-review pass whose depth follows the rigor level the orchestrator picked for this spawn. The two passes are the cross-check: the paper pass authors what the paper says; the code pass surfaces where the code says something different; the difference is gold (it's where the reproduction has to make a decision).
 
-Per-sub-analysis work is parallelizable when sub-analyses are independent. Each sub-analysis's two passes (paper, then code) run sequentially within that sub-analysis; across sub-analyses the work fans out.
+Per-sub-analysis work is parallelizable when sub-analyses are independent. Each sub-analysis's two passes (paper, then code) run sequentially within that sub-analysis; across sub-analyses the work fans out via Task-tool sub-sub-agents from inside the specify session.
 
 ## Inputs
 
 - `astra.yaml` — the stub from ARCHITECT (sub-analyses, inputs, outputs, narrative; empty `decisions:` / `prior_insights:` / `findings:` blocks)
 - `work/notes/architect/paper-index.md` — paper-side decision clusters, result loci, citations
 - `work/notes/architect/code-index.md` (when code present) — module map, natural decomposition, entry-points, gotchas
-- `work/notes/cited_papers.yaml` — citation marker → DOI mapping (from ARCHITECT); SPECIFY uses it to write each `prior_insights:` placeholder's `doi:` so LITERATURE knows which paper to fetch
+- `work/notes/cited_papers.yaml` — citation marker → DOI mapping (from ACQUIRE, with `relevance:` notes added by ARCHITECT's paper-side Explore); SPECIFY uses it to write each `prior_insights:` placeholder's `doi:` so LITERATURE knows which paper to fetch
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
 - `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (Path B only)
 - `work/reference/code/` (if present) — original code, canonical reference for numerics + method
-- The per-paper constitution — its **Desired State** + the per-phase mode + the rigor / frugality dial
-- `work/notes/notes.md` — user-supplied context (read by every phase if present)
+- CLAUDE.md — **Goal** for scope, **Rigor** for the rigor level the orchestrator chose for this spawn, **Paper-vs-code disagreements** for prior-spawn entries
+- `work/notes/notes.md` — user-supplied context (read by every sub-agent if present)
 
 ## Outputs
 
@@ -26,7 +26,8 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 - `universes/baseline.yaml` — selects the paper's choices (where paper and code disagree per the canonical-resolution rule, see "Material conflicts" below)
 - `implementation-notes.md` — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
 - `targets/targets.md` — small target ledger COMPARE consumes: per output (already declared by ARCHITECT), a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure / table / metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
-- `work/notes/specify-review/<sub-analysis>-round-<N>.md` — each rigor-dialed review round's findings (rigor only; one file per round per sub-analysis)
+- CLAUDE.md updates — append entries to **Paper-vs-code disagreements** for each material conflict surfaced; update **Rigor** with the post-spawn state of `astra.yaml` per sub-analysis (e.g. *baseline* after a cheap pass, *tightened* after heavy review)
+- `work/notes/specify-review/<sub-analysis>-round-<N>.md` — each review round's findings (one file per round per sub-analysis; how many rounds depends on the rigor level)
 
 ## Substrate skills to invoke
 
@@ -80,9 +81,9 @@ Read the code that implements this sub-analysis (`work/notes/architect/code-inde
    - **Material** = a different choice would plausibly change a numeric result the paper reports.
    - **Stylistic / cosmetic / pure-tooling** = not material; record in `implementation-notes.md` and move on.
 
-   For **material** disagreements, behavior depends on whether SPECIFY is interactive:
-   - **Interactive SPECIFY** (default): pause and surface via `AskUserQuestion`. Present the paper's stated method (with quote + section), the code's actual method (with `path:line`), the plausible impact ("changes the BAO peak amplitude by ~5%"), and three options: paper, code, *something else* (custom, with the user's choice spelled out). **Default on user silence is code when `work/reference/code/` exists, otherwise paper.**
-   - **Sub-agent SPECIFY** (rare; the constitution lists this only when the user explicitly chose it): take **code as canonical** per the canonical-resolution rule, append the conflict to `<paper-slug>/open-questions.md` so the user sees it at the next session boundary, and let `universes/baseline.yaml` select the code's method. The user can flip the baseline at REVIEW (close-out).
+   For **material** disagreements, behavior depends on whether the user is reachable:
+   - **User reachable** (in the specify sub-agent's chat or in the orchestrator session): ask in prose — present the paper's stated method (with quote + section), the code's actual method (with `path:line`), the plausible impact ("changes the BAO peak amplitude by ~5%"), and offer three paths: paper, code, or something custom. The user can also defer ("just take code, I'll look in REVIEW"). **Default on user silence is code when `work/reference/code/` exists, otherwise paper.**
+   - **User unreachable** (sub-agent surface dismissed and orchestrator hasn't relayed): take **code as canonical** per the canonical-resolution rule, append the conflict to CLAUDE.md's **Paper-vs-code disagreements** section AND to `open-questions.md` so the user sees it at the next session boundary, and let `universes/baseline.yaml` select the code's method. The user can flip the baseline at REVIEW close-out.
 
    Either way, the override is preserved in `astra.yaml` as a `decisions:` entry with both options preserved, plus the `universes/baseline.yaml` selecting whichever option won. A `findings:` entry (or an insight if the conflict matters for replication discipline broadly) records the conflict with quote + line evidence.
 
@@ -90,16 +91,16 @@ Read the code that implements this sub-analysis (`work/notes/architect/code-inde
 
 3. **Decision-option augmentation.** Where the code reveals an option the paper didn't mention but is defensible (a sibling implementation alternative used in the codebase or referenced in a comment), add it as a sibling option to the relevant `decisions:` entry. Do not pre-emptively author every code variant; only the ones that bear on a real choice.
 
-4. **Surface paper-vs-code material disagreements** to `<paper-slug>/open-questions.md` (sub-agent) or via `AskUserQuestion` (interactive) per the canonical-resolution rule above. The verbatim paper quote + the `path:line` code anchor + the plausible-impact one-liner should both make it into the open-questions entry so the user sees enough to decide at REVIEW (close-out).
+4. **Surface paper-vs-code material disagreements** in prose (when user reachable) or to CLAUDE.md's **Paper-vs-code disagreements** section + `open-questions.md` (when user unreachable). The verbatim paper quote + the `path:line` code anchor + the plausible-impact one-liner should make it into both surfaces so the user sees enough to decide at REVIEW close-out.
 
-### Pass C — rigor-dialed self-review
+### Pass C — self-review (rigor chosen per spawn)
 
-After the paper + code passes land for a sub-analysis, a fresh-context sub-agent cross-checks: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
+After the paper + code passes land for a sub-analysis, a fresh-context Task-tool sub-agent cross-checks: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
 
-Self-review depth follows the constitution's frugality / rigor dial — same shape as ARCHITECT's review pass and IMPLEMENT's:
+Self-review depth follows the rigor level the orchestrator picked for this spawn (read CLAUDE.md's **Rigor** section). Same shape as ARCHITECT's review pass and IMPLEMENT's:
 
-- **Frugal:** skip self-review, or run a single fresh sub-agent pass and incorporate its fixes once.
-- **Rigor:** N rounds — each round runs a fresh reviewer; fixes are incorporated; the next round runs another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes (the strong-termination criterion the loop already uses), or a 5-round system cap. Each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check.
+- **Cheap:** skip self-review, or run a single fresh Task-tool sub-agent pass and incorporate its fixes once.
+- **Heavy:** N rounds. Each round spawns a brand-new Task-tool reviewer that does NOT see prior rounds' findings or fixes (pattern-matching on prior fixes defeats the cross-check); fixes are incorporated between rounds. Iterate until two consecutive rounds find no fixes or the 5-round system cap is reached.
 
 #### Per-round fresh sub-agent — system prompt
 
@@ -169,12 +170,12 @@ astra validate astra.yaml --verify-evidence  # after LITERATURE has resolved the
 
 #### Termination
 
-- `weak` (frugal): one pass per sub-analysis. Done after fixes (or immediately, if `fixes_needed` was 0).
-- `strong` (rigor):
+- **Cheap:** one pass per sub-analysis. Done after fixes (or immediately, if `fixes_needed` was 0).
+- **Heavy:**
   - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
   - If round N is the first round (N=1), spawn round 2 unconditionally so we can compare.
   - If round N produced fixes, spawn round (N+1) as a fresh sub-agent that does not see round N's findings or the fixes.
-  - If N hits the system cap of 5 rounds without two consecutive clean rounds, surface to the user via `AskUserQuestion`: "SPECIFY review for <sub-analysis-id> reached round cap with N fixes still landing; continue, accept the current spec, or revise the constitution?" Default on user silence: accept the current sub-analysis spec, log the unfinished tail in `<paper-slug>/open-questions.md`, and proceed.
+  - If the review reaches the 5-round system cap without two consecutive clean rounds, the specify sub-agent stops on this sub-analysis and reports back to the orchestrator. If the user is reachable, ask in prose: "SPECIFY review for <sub-analysis-id> reached round cap with N fixes still landing; continue, accept the current spec, or revise scope?" If the user is unreachable, accept the current sub-analysis spec, log the unfinished tail in `open-questions.md`, and let the orchestrator decide whether to proceed or re-spawn.
 
 When all sub-analyses' reviews terminate, SPECIFY produces the final outputs:
 
@@ -207,8 +208,8 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 - `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative; empty decisions / prior_insights / findings) ⇒ ready to specify
 - For each sub-analysis: `decisions:` and `findings:` populated with paper-anchored `evidence:` selectors AND `prior_insights:` populated as citation-only placeholders (id, claim, doi, decision_links — no `evidence:` selector yet, LITERATURE fills those next) ⇒ paper pass done
 - For each sub-analysis: when `work/reference/code/` exists, code-pass material-disagreement entries land in `decisions:` (with both options) and `universes/baseline.yaml` selects the canonical-resolution choice; `implementation-notes.md` carries non-material gotchas ⇒ code pass done
-- For frugal: each sub-analysis has at least a `work/notes/specify-review/<sub>-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ SPECIFY review done
-- For rigor: each sub-analysis has two consecutive `<sub>-round-<N>.md` files with verdict `clean` ⇒ SPECIFY review done
+- For cheap: each sub-analysis has at least a `work/notes/specify-review/<sub>-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ SPECIFY review done
+- For heavy: each sub-analysis has two consecutive `<sub>-round-<N>.md` files with verdict `clean` ⇒ SPECIFY review done
 - `astra validate astra.yaml` returns clean (placeholders without `evidence:` are valid at this stage) ⇒ structural side validated; `--verify-evidence` waits until LITERATURE has authored the resolved `evidence:` selectors
 - `targets/targets.md` exists with each entry mapped to a spec home ⇒ target-ledger done
 - `implementation-notes.md` exists ⇒ practical-guidance side done
@@ -216,8 +217,9 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 
 ## Notes
 
-- **Material conflicts that the user explicitly defers** are appended to `<paper-slug>/open-questions.md` (the running report read at session boundaries). The next iteration sees them and either re-surfaces them or notes their continued deferral; the user resolves at REVIEW (close-out).
+- **Material conflicts that the user explicitly defers** are appended to CLAUDE.md's **Paper-vs-code disagreements** section AND `open-questions.md`. CLAUDE.md is the at-a-glance summary every future sub-agent or orchestrator session sees; `open-questions.md` is the autonomous-mode resolution accumulator. Both lead to the same place: the user resolves at REVIEW close-out.
 - **The narrative skill is the prose author, not the structure author.** SPECIFY's job is content correctness; `/narrative` invocation comes during the paper pass when authoring or extending the narrative prose to weave in anchor references.
 - **The target ledger is a derivation, not a separate phase's output.** Treat `targets/targets.md` as a small index produced alongside the filled `astra.yaml`, not a heavyweight artifact. The depth lives in `astra.yaml`'s `outputs:` / `findings:` / `decisions:`.
 - **Two-pass discipline is the cross-check.** Skipping the code pass (when code exists) loses the canonical-resolution surface and lets paper-vs-code material disagreements slip through. The fresh-context self-review can recover *some* of these but not all — the disciplined sequence (paper → code → self-review) catches more.
-- **Per-sub-analysis parallelism is opt-in.** When sub-analyses are independent (no shared decision blocks, no cross-sub-analysis findings), spawn one Task-tool sub-agent per sub-analysis to run its passes in parallel. When they share material decisions or findings (rare), serialize.
+- **Per-sub-analysis parallelism is opt-in.** When sub-analyses are independent (no shared decision blocks, no cross-sub-analysis findings), spawn one Task-tool sub-sub-agent per sub-analysis from inside specify's session to run its passes in parallel. When they share material decisions or findings (rare), serialize.
+- **Commit per sub-analysis as it lands.** Each sub-analysis's filled-in `astra.yaml` slice, together with its targets / implementation-notes / baseline updates, earns one commit; review-round files get one commit per round. The orchestrator reads `git log` to track progress; small commits keep the trail readable.

From 0f549bc8ab5e46da9f3482f75ad9bb32c52565d5 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sun, 10 May 2026 18:28:20 +0200
Subject: [PATCH 034/124] Realign run/compare/review references; add
 opportunity assessment to COMPARE

RUN and REVIEW pick up the orchestrator + sub-agent vocabulary; constitution
references go away. REVIEW's close-out drops the constitution outcome
finalization and replaces it with a CLAUDE.md Rigor-section opportunity
propagation step plus a marking commit.

COMPARE gains the opportunity assessment described in SKILL.md: a structured
opportunities: block surfaces high-leverage gaps even on `pass`, so the
orchestrator and user can decide whether to spend another IMPLEMENT round
or land at the current rigor and log opportunities to CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../lc-from-paper/references/compare.md       | 42 ++++++----
 .../skills/lc-from-paper/references/review.md | 79 +++++++++----------
 .../skills/lc-from-paper/references/run.md    | 11 +--
 3 files changed, 73 insertions(+), 59 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/compare.md b/claude/lightcone/skills/lc-from-paper/references/compare.md
index c2fdf2bb..dd55e4cd 100644
--- a/claude/lightcone/skills/lc-from-paper/references/compare.md
+++ b/claude/lightcone/skills/lc-from-paper/references/compare.md
@@ -1,8 +1,8 @@
-# COMPARE — judge whether the reproduction matches
+# COMPARE — judge the match, name the opportunities
 
-Compare reproduced results against the paper's replication targets. Produce a structured verdict the IMPLEMENT-retry loop consumes.
+Compare reproduced results against the paper's replication targets. COMPARE returns two things: a **verdict** (pass / partial / fail) and an **opportunity assessment** — where the gaps are and how much they likely matter. The verdict drives whether the orchestrator re-spawns IMPLEMENT for another retry attempt; the opportunity assessment tells the orchestrator (and the user) which gaps would be high-leverage to close, even on `pass`. Together they replace the old yes/no framing.
 
-The constitution's per-phase mode is **user choice** for this phase — defaults to interactive for verdict ratification (was the reproduction close enough?), but a user who set the loop up to drive itself to terminal verdict can flip it to sub-agent. When sub-agent, COMPARE writes the report and the loop continues per the report's verdict; REVIEW (close-out) ratifies the final verdict at close-out.
+This phase runs as the orchestrator-spawned `compare` sub-agent. The orchestrator and the user together decide what to do with COMPARE's output — spend another IMPLEMENT round now (close a high-leverage gap), accept the current verdict and proceed to REVIEW, or land at the current rigor level and log the gap as an open opportunity in CLAUDE.md's **Rigor** section. The user can drop into the compare sub-agent's chat for the verdict ratification conversation, or wait until REVIEW close-out.
 
 ## Inputs
 
@@ -57,6 +57,11 @@ outputs:
 failure_diagnosis: null|"<root cause>"
 fix_suggestions:
   - "<specific actionable suggestion with script and line number>"
+opportunities:
+  - area: "<which output / sub-analysis / decision>"
+    gap: "<what could be tightened — even if the target matched>"
+    leverage: "<rough sense of impact: 'changes headline number by ~10%' / 'cosmetic only' / 'unknown'>"
+    fix_pointer: "<where the fix would land — script:line, decision id, or implementation-notes section>"
 ```
 
 ## Verdict rules
@@ -67,27 +72,36 @@ fix_suggestions:
 
 If verdict is not `pass`, **`fix_suggestions` MUST reference specific scripts and line numbers**. "The result is wrong" is not actionable; "scripts/bao_fit.py:42 uses `damping_prior=flat`, paper specifies Gaussian; change to gaussian per Howlett+2017 §4.2" is.
 
-Also write `comparison-report.md` with a human-readable summary. For figure / table comparisons, describe what you see in both and explain your match judgment.
+## Opportunity assessment rules
 
-## Verdict ratification (interactive COMPARE)
+The `opportunities:` block surfaces **gaps that didn't necessarily fail the verdict but would be high-leverage to close**. Examples worth flagging:
 
-When COMPARE runs interactively, surface the verdict to the user via `AskUserQuestion` after writing the report:
+- A primary-target match was within tolerance but the underlying method is a sketch (e.g. simplified noise model that happens to land in the right range — tightening it would change the headline by O(10%)).
+- A secondary target failed but is plausibly fixable from the same root cause as a primary that passed (one fix, two outputs).
+- A decision that SPECIFY recorded as code-as-canonical, where the underlying disagreement is still unresolved in `open-questions.md` and could move the result.
+- A sub-analysis whose evidence quotes are paraphrased rather than verbatim (would fail `--verify-evidence` if pushed harder).
 
-- **If `pass`**: confirm before exiting the COMPARE → IMPLEMENT loop. *"All high-priority targets match. Proceed to close-out?"* The user accepts → REVIEW (close-out) runs interactively (renders `/figure-comparison`, walks the open-questions ledger, lands resolutions, finalizes the constitution outcome); the user rejects → name what's still off and re-enter the loop.
-- **If `partial`**: show the user the failing targets and the diagnosis. *"Partial match. <N> outputs failing: <list>. Continue retrying or accept partial?"* If the attempt budget (from the constitution) is reached, this surfacing is mandatory.
-- **If `fail`**: same shape, but the loop's continuation should be questioned more sharply. A fundamental methodological issue may need a constitution amendment, not another implement retry.
+Each opportunity gets a leverage one-liner so the orchestrator and user can decide where to spend attention. Empty `opportunities:` is a strong signal — say "the reproduction is at canonical rigor across the targets" rather than padding.
 
-When COMPARE runs as a sub-agent, no `AskUserQuestion` — the report is the output. The loop reads the verdict and either retries (if budget remains and verdict is partial/fail) or proceeds to REVIEW (close-out), where the user ratifies the final verdict during close-out.
+Also write `comparison-report.md` with a human-readable summary. For figure / table comparisons, describe what you see in both and explain your match judgment. Include the opportunity assessment as its own section.
 
-The verdict is the agent's judgment; the **decision to keep iterating** is the user's, surfaced either at this seam (interactive COMPARE) or at REVIEW (close-out)'s close-out (sub-agent COMPARE). Default on user silence: continue the loop until the attempt budget is exhausted, then mandatory user surfacing.
+## Verdict + opportunity surfacing
+
+After writing the report, the compare sub-agent reports back to the orchestrator with the verdict, the failing-output count (if any), and the headline opportunities. The orchestrator either:
+
+- **Carries the report to the user** (if the user is reachable in the orchestrator session or the compare sub-agent's chat) for ratification: present the verdict, the failing outputs (if `partial` / `fail`), and the top opportunities; ask whether to spend another IMPLEMENT round on a high-leverage gap, accept and proceed to REVIEW, or land at this rigor level and log the gaps as open opportunities in CLAUDE.md.
+- **Acts on standing rigor settings** (if the user is unreachable): if attempt < budget AND verdict is `partial` / `fail`, re-spawn `implement` for a retry; if verdict is `pass` OR attempt >= budget, log opportunities in CLAUDE.md's **Rigor** section as open opportunities and proceed to REVIEW.
+
+The verdict is the compare sub-agent's judgment; the **decision to keep iterating or move on** is the orchestrator's (in dialogue with the user). The opportunity assessment is the bridge — it turns a binary verdict into a graded picture the user can navigate.
 
 ## Survey signals (entry into COMPARE)
 
 - All outputs in `lc status --universe baseline` are `ok` ⇒ ready to compare
 - `comparison-report.yaml` exists with current `attempt` ⇒ COMPARE done for this attempt
-- `comparison-report.yaml` verdict is `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to REVIEW (close-out) (interactive close-out)
+- `comparison-report.yaml` verdict is `pass` (or `partial` accepted) ⇒ COMPARE → IMPLEMENT loop terminated; orchestrator proceeds to REVIEW close-out
 
 ## Notes
 
-- **One COMPARE per IMPLEMENT.** Each IMPLEMENT retry produces a fresh COMPARE; the report's `attempt` field increments. Do not overwrite prior reports — keep them at `comparison-report-attempt-<N>.yaml` if useful, or commit each between iterations so git carries the history.
-- **The verdict is the agent's; the keep-iterating decision is the user's.** Treat them as separate.
+- **One COMPARE per IMPLEMENT.** Each IMPLEMENT retry produces a fresh COMPARE; the report's `attempt` field increments. Do not overwrite prior reports — keep them at `comparison-report-attempt-<N>.yaml` if useful, or commit each between attempts so `git log` carries the history.
+- **The verdict is the compare sub-agent's; the keep-iterating decision is the orchestrator's** (in dialogue with the user, when reachable). Treat them as separate.
+- **The opportunity assessment is part of the durable record.** When the user accepts the current verdict, propagate the un-acted-on opportunities into CLAUDE.md's **Rigor** section's *Open opportunities* list. Future sessions and future-Cail returning to this reproduction see them; tightening any becomes a re-spawn of IMPLEMENT against a clearer target.
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index bd709be1..34302997 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -1,109 +1,108 @@
-# REVIEW — interactive close-out
+# REVIEW — orchestrator-session close-out
 
-The reproduction has converged (verdict `pass` or user-accepted `partial`). Control returns to the user. REVIEW is the second always-interactive bookend (INTERVIEW being the first); it runs in the main loop session, not as a sub-agent, so it can use `AskUserQuestion` and invoke sibling skills that need user reach. Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, draft the final report, and finalize the constitution outcome — in one interactive arc.
+The reproduction has converged (verdict `pass` or user-accepted `partial`). Control returns to the user. REVIEW is the second of two bookends that run in the orchestrator session itself, not as a named sub-agent (INTERVIEW being the first). It runs orchestrator-side because both `/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`, which sub-agents don't have.
 
-The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding into ARCHITECT, SPECIFY, and IMPLEMENT as their rigor-dialed self-review passes. This close-out is what the previous shape called SUMMARIZE_RUN.
+Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, draft the final report, and propagate any un-acted-on opportunities from the latest COMPARE into CLAUDE.md's **Rigor** section — in one interactive arc.
 
-The constitution's per-phase mode is **always interactive** for this phase. It does not run as a sub-agent. There is no "silent close-out" path; the close-out is the human's review.
+The phase name **REVIEW** became available when the old pre-implement REVIEW phase folded into ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT as their per-spawn self-review passes. This close-out is what the previous shape called SUMMARIZE_RUN.
 
 ## Inputs
 
 - `astra.yaml` — final spec (validates with `--verify-evidence` once LITERATURE has resolved every `prior_insights:` placeholder's `evidence:` selector)
-- `comparison-report.yaml`, `comparison-report.md` — final verdict
+- `comparison-report.yaml`, `comparison-report.md` — final verdict + opportunity assessment
 - `targets/targets.md` — what was being matched against; reference figures / tables in `targets/`
 - `results/<universe>/<output_id>/` — reproduced figures / tables / metrics
-- `<paper-slug>/open-questions.md` — running report from sub-agent phases (paper-vs-code conflicts, ambiguities, anything sub-agents flagged for user resolution)
+- `open-questions.md` at the workdir root — running report from sub-agent phases (paper-vs-code conflicts, ambiguities, anything sub-agents flagged for user resolution)
 - `work/notes/architect/paper-index.md` and `work/notes/architect/code-index.md` — for context
-- The constitution at the project root — its `outcome:` field needs the final write
-- `<paper-slug>/CLAUDE.md` — paper identity, code location
+- `CLAUDE.md` at the workdir root — paper identity, Goal, Rigor, Paper-vs-code disagreements (the at-a-glance summary that's accumulated across all sub-agent spawns)
 
 ## Outputs
 
 - `.lightcone/comparison.html` — `/figure-comparison`'s portable side-by-side report (paper artifacts vs reproduced)
 - (Optional) `.lightcone/check-sentence-by-sentence.md` — `/check-sentence-by-sentence`'s claim audit (file:line or NOT FOUND per sentence)
-- `<paper-slug>/open-questions.md` — same file, but with `## Resolutions` section appended capturing what the user said for each entry
+- `open-questions.md` — same file, now with a `## Resolutions` section appended capturing what the user said for each entry
 - Edits to `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` if any open-question resolution warrants a spec change
-- `REPRODUCTION-SUMMARY.md` — final report; concise (~1–2 pages)
-- Constitution `outcome:` rewritten to its final form
+- `REPRODUCTION-SUMMARY.md` — final report; concise (~1–2 pages); the canonical record of what the reproduction landed on
+- CLAUDE.md updates — *Open opportunities* list under **Rigor** propagated from COMPARE's un-acted-on opportunities; **Paper-vs-code disagreements** entries reconciled with their resolutions
 - A commit closing out the reproduction
 
 ## Step 1: render the validation surfaces
 
 ### `/figure-comparison` (mandatory)
 
-Invoke the `/figure-comparison` skill from this session. It builds a portable HTML side-by-side comparing paper artifacts (from `targets/`) to reproduced artifacts (from `results/<universe>/`). The skill uses `AskUserQuestion` for any inputs it can't infer from the workdir; that works because REVIEW is interactive — the prompts land in this session.
+Invoke the `/figure-comparison` skill from the orchestrator session. It builds a portable HTML side-by-side comparing paper artifacts (from `targets/`) to reproduced artifacts (from `results/<universe>/`). The skill uses `AskUserQuestion` for any inputs it can't infer from the workdir; that works because REVIEW runs orchestrator-side — the prompts land in this session, not in a sub-agent's chat.
 
 Output lands at `.lightcone/comparison.html`. Show the user the path and offer to open it (`open` on macOS, `xdg-open` on Linux, or just print the path so they click in their terminal).
 
-**Do not spawn `/figure-comparison` under the `Task` tool.** It has `AskUserQuestion` in its `allowed-tools`; a Task-tool sub-agent has no user-reach, so the prompt fires into nothing.
+**Do not spawn `/figure-comparison` under the `Task` tool or as a named sub-agent.** It has `AskUserQuestion` in its `allowed-tools`; sub-agents have no user-reach, so the prompt fires into nothing.
 
 ### `/check-sentence-by-sentence` (opt-in)
 
 Ask the user via `AskUserQuestion` whether they want the claim audit. It's optional because for many reproductions the figure-comparison already settles "did it match?"; the sentence-by-sentence audit earns its keep when the paper makes many specific quantitative claims and the user wants each one anchored to a code location.
 
-If yes, invoke `/check-sentence-by-sentence`. Same discipline as `/figure-comparison` — it can prompt the user; do not spawn under `Task`.
+If yes, invoke `/check-sentence-by-sentence`. Same discipline as `/figure-comparison` — it can prompt the user; do not spawn under `Task` or as a named sub-agent.
 
 Output lands at `.lightcone/check-sentence-by-sentence.md` (or wherever the skill writes it). Show the user the path.
 
-## Step 2: walk `<paper-slug>/open-questions.md` with the user
+## Step 2: walk `open-questions.md` with the user
 
-Read `<paper-slug>/open-questions.md`. For each unresolved entry, surface it via `AskUserQuestion` with:
+Read `open-questions.md` at the workdir root. For each unresolved entry, surface it via `AskUserQuestion` with:
 
 - **The question** (verbatim from the file)
-- **Origin** — which phase / sub-agent flagged it
-- **The default the loop applied** (if any — e.g. "code as canonical")
+- **Origin** — which sub-agent flagged it
+- **The default the sub-agent applied** (if any — e.g. "code as canonical")
 - **Three options**: ratify the default, override (user spells out their choice), or defer (leave as a known limitation in the final report)
 
-Append a `## Resolutions` section to `<paper-slug>/open-questions.md` capturing what the user said for each entry. This makes the resolution durable — re-runs and future sessions see it.
+Append a `## Resolutions` section to `open-questions.md` capturing what the user said for each entry. This makes the resolution durable — re-runs and future sessions see it. Cross-reference with CLAUDE.md's **Paper-vs-code disagreements** section: every entry there should now have its resolution recorded, either inline (if the user picked the canonical default) or in `open-questions.md`.
 
-If a resolution warrants a spec change (the user picks an override), edit `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` accordingly and re-run `astra validate astra.yaml`. If the change would invalidate the comparison report (e.g. flips the canonical method for a primary output), surface that to the user — in most cases the reproduction is "done" and the override is a known limitation, but the user may choose to re-enter the loop.
+If a resolution warrants a spec change (the user picks an override), edit `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` accordingly and re-run `astra validate astra.yaml`. If the change would invalidate the comparison report (e.g. flips the canonical method for a primary output), surface that to the user — in most cases the reproduction is "done" and the override is a known limitation, but the user may choose to re-spawn IMPLEMENT.
 
 ## Step 3: write `REPRODUCTION-SUMMARY.md`
 
-A single markdown file at the project root, ~1–2 pages. Sections:
+A single markdown file at the project root, ~1–2 pages. The canonical record of what this reproduction landed on. Sections:
 
 1. **What was reproduced** — the paper, the scope, the targets.
 2. **Verdict** — pass / partial. If partial, what failed and why we accepted it.
-3. **Material decisions** — the paper-vs-code conflicts SPECIFY's code pass surfaced, what the user chose (interactively or by canonical-resolution default), and why.
+3. **Material decisions** — the paper-vs-code conflicts SPECIFY's code pass (and any IMPLEMENT pass) surfaced, what the user chose (in prose ratification or by canonical-resolution default), and why.
 4. **Outputs** — pointers to the figures / tables / metrics produced. One bullet per primary target with the path to the reproduced result and a one-line match note from the comparison report.
-5. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut stricter than stated, etc.). This is where the reproduction's value to the broader literature gets recorded.
-6. **Resolved open questions** — pull from `<paper-slug>/open-questions.md`'s `## Resolutions` section. One bullet per question + its resolution.
-7. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the constitution path, the relevant `astra.yaml`).
+5. **Open opportunities** — pull from `comparison-report.yaml`'s `opportunities:` block, plus anything in CLAUDE.md's **Rigor** section's *Open opportunities* list. One bullet each with the leverage assessment. This is what a future session (or a future-Cail revisiting) would tighten next.
+6. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut stricter than stated, etc.). This is where the reproduction's value to the broader literature gets recorded.
+7. **Resolved open questions** — pull from `open-questions.md`'s `## Resolutions` section. One bullet per question + its resolution.
+8. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the relevant `astra.yaml`, where CLAUDE.md lives so future sub-agents auto-load it).
 
 Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes; the summary is the door into them.
 
-## Step 4: finalize the constitution outcome
+## Step 4: propagate opportunities into CLAUDE.md
 
-Rewrite the constitution's `outcome:` field to its final form. Now the user has walked the validation surfaces, ratified the open questions, and accepted (or explicitly partially-accepted) the reproduction. Write the outcome that teaches:
+For each opportunity in `comparison-report.yaml`'s `opportunities:` block that the user did NOT act on (i.e. they accepted the current verdict and chose to land here), append it to CLAUDE.md's **Rigor** section's *Open opportunities* list. Format: `<area> — <what could be tightened> — <leverage>`. This is what future sessions and future re-spawns walk up to; it's how the reproduction stays honest about what's at sketch / baseline / tightened / canonical rigor across its outputs.
 
-> Reproduced <paper> against the targets in `targets/targets.md` with verdict `pass` (attempt 4). All 7 primary targets match within stated tolerance; 2 of 5 secondary targets show <5% offset attributable to <reason>. Material conflicts surfaced and resolved: <list>. Open questions resolved: <count> (full chain in `open-questions.md`). Spec at `astra.yaml` (validates with `--verify-evidence`); side-by-side at `.lightcone/comparison.html`; full report at `REPRODUCTION-SUMMARY.md`.
-
-The outcome should stand on its own — someone reading just `felt show <reproduction-fiber>` (or the kanban card) should learn the verdict, the material decisions that landed, and where the artifacts live. No "see the body for details."
+If the user acted on an opportunity (e.g. authorized one more IMPLEMENT round to close a gap), it doesn't go in the open list — but its closure is worth noting in *Current state* (e.g. *Figure 3: tightened* if the systematics treatment got a heavier pass).
 
 ## Step 5: commit
 
-Stage `REPRODUCTION-SUMMARY.md`, `<paper-slug>/open-questions.md` (with resolutions), the constitution with the final outcome, the final `astra.yaml`, the comparison artifacts, and any housekeeping changes. Commit with a message that names the verdict and the close-out:
+Stage `REPRODUCTION-SUMMARY.md`, `open-questions.md` (with resolutions), the updated CLAUDE.md, the final `astra.yaml`, the comparison artifacts, and any housekeeping changes. Commit with a message that names the verdict and the close-out:
 
 ```
 review: <paper-short-name> verdict <verdict>, summary at REPRODUCTION-SUMMARY.md
 ```
 
-After the commit, optionally flip the constitution's status to `closed` (or whatever the per-paper conventions name) so future surveys recognize the reproduction is done.
+This commit is the durable mark that the reproduction has reached close-out. Future walk-ups read CLAUDE.md and `git log` to know where the reproduction stands; the close-out commit + REPRODUCTION-SUMMARY.md together stand in for the old constitution `outcome:` field.
 
 ## Survey signals (entry into REVIEW)
 
 - `comparison-report.yaml` verdict is `pass` (or user has accepted `partial`) ⇒ ready to close out
 - `.lightcone/comparison.html` exists ⇒ `/figure-comparison` rendered
-- `<paper-slug>/open-questions.md` has a `## Resolutions` section covering every entry ⇒ open-questions walkthrough done
+- `open-questions.md` has a `## Resolutions` section covering every entry ⇒ open-questions walkthrough done
 - `REPRODUCTION-SUMMARY.md` exists ⇒ final report written
-- Constitution `outcome:` reflects the final state ⇒ REVIEW done; reproduction complete
+- CLAUDE.md's **Rigor** section's *Open opportunities* list reflects the un-acted-on opportunities from the latest COMPARE ⇒ propagation done
+- A `review:` commit exists in `git log` ⇒ REVIEW done; reproduction complete
 
 ## Notes
 
-- **This phase runs interactively in the main loop session.** Do not spawn it under `Task`. The whole point of REVIEW (close-out) is that the user is reachable — every step uses `AskUserQuestion` (directly, or via the sibling skills it invokes).
-- **`/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`.** That's why REVIEW is the always-interactive close-out and they live here, not in the loop. Spawning either under `Task` from inside the loop fires prompts into nothing.
-- **The user owns the verdict-acceptance decision.** REVIEW's purpose is to let the user see what the loop did and decide whether they accept it. The skill renders surfaces and asks; it does not unilaterally close.
-- **Don't confuse with the rigor-dialed self-reviews.** ARCHITECT, SPECIFY, and IMPLEMENT each run their own internal fresh-context self-review passes during the loop. Those are unrelated to this close-out — same word, different jobs. The phase boundary makes them unambiguous: rigor-dial reviews live inside their host phase's reference; this one is the always-interactive close-out.
-- **Open-question resolutions are durable.** Append to `<paper-slug>/open-questions.md`'s `## Resolutions` section so the next re-run / future session sees what was decided. Do not delete the original questions.
+- **This phase runs in the orchestrator session.** Do not spawn it as a named sub-agent. The whole point of REVIEW is that the user is reachable — every step uses `AskUserQuestion` (directly, or via the sibling skills it invokes).
+- **`/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`.** That's why REVIEW runs orchestrator-side and they live here, not in any sub-agent. Spawning either as a sub-agent fires prompts into nothing.
+- **The user owns the verdict-acceptance decision.** REVIEW's purpose is to let the user see what the sub-agents did and decide whether they accept it. The skill renders surfaces and asks; it does not unilaterally close.
+- **Don't confuse with the per-spawn self-reviews.** ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT each run their own internal fresh-context self-review passes during their work. Those are unrelated to this close-out — same word, different jobs. The phase boundary makes them unambiguous: per-spawn self-reviews live inside their host phase's reference; this one is the orchestrator-session close-out.
+- **Open-question resolutions are durable.** Append to `open-questions.md`'s `## Resolutions` section so the next re-run / future session sees what was decided. Do not delete the original questions.
 - **Keep the report short.** Long reports get skimmed; short reports get read. Two pages is generous.
-- **Do not invent further work.** If the constitution's evidence checks all pass, the reproduction is done. The next session, the human, or a future revisit can decide whether the reproduction's place still serves them.
+- **Do not invent further work.** If the user has accepted the verdict and the opportunities are propagated, the reproduction is done. The next session, the user, or a future revisit can decide whether tightening any open opportunity still serves them.
diff --git a/claude/lightcone/skills/lc-from-paper/references/run.md b/claude/lightcone/skills/lc-from-paper/references/run.md
index 7f3240ef..a4e7e456 100644
--- a/claude/lightcone/skills/lc-from-paper/references/run.md
+++ b/claude/lightcone/skills/lc-from-paper/references/run.md
@@ -2,7 +2,7 @@
 
 Materialize every output in `astra.yaml` for the requested universe. RUN is mostly mechanical — `lc run --universe <id>` does the heavy lifting. The phase exists as a discrete step so failures get diagnosed and re-run before COMPARE.
 
-The constitution's per-phase mode is **user choice** — defaults to sub-agent. Failures may want diagnosis support; the user chooses based on how much trust they have in IMPLEMENT's first pass.
+This phase runs as the orchestrator-spawned `run` sub-agent. The user can drop into its chat if execution failures want diagnosis support; otherwise it logs failures, attempts targeted fixes within scope, and reports back. Universe defaults to `baseline` unless the orchestrator passes a different one when spawning.
 
 ## Inputs
 
@@ -21,7 +21,7 @@ Execute all recipes:
 lc run --universe baseline
 ```
 
-(Use whatever the constitution's `universe` field says; `baseline` is the default.)
+(Use whatever universe the orchestrator passed when spawning; `baseline` is the default.)
 
 Check status:
 
@@ -43,14 +43,15 @@ If outputs fail:
 
 - **Always use `lc run`** — do not run scripts directly. The runner manages dependencies, environments, and artifact paths; bypassing it produces inconsistent results.
 - **Re-runs are idempotent.** `lc run` skips outputs that are already materialized. To force re-execution, the runner has a flag for that — check `lc run --help`.
-- **Failures stay failures until fixed.** Do not "move on" past a failed output by editing it out of `astra.yaml`. Either fix the script or surface the failure as a constitution Open Question and stop.
+- **Failures stay failures until fixed.** Do not "move on" past a failed output by editing it out of `astra.yaml`. Fix the script, ask the user in prose if they are reachable, or log the failure to `open-questions.md` and stop.
 
 ## Survey signals (entry into RUN)
 
 - `astra.yaml` has recipes and validates ⇒ ready to run
-- `lc status --universe baseline` returns all `ok` ⇒ RUN done; proceed to COMPARE
+- `lc status --universe baseline` returns all `ok` ⇒ RUN done; orchestrator proceeds to COMPARE
 
 ## Notes
 
 - The runner backend (Docker / local / SLURM) comes from the project's target configuration — `~/.lightcone/config.yaml` and `.lightcone/lightcone.yaml`. RUN does not need to choose; the runner picks based on config.
-- For long-running computations, the script's stdout / stderr stream into the result directory's log file. The phase agent should `tail` the log file to monitor progress, not poll `lc status` repeatedly.
+- For long-running computations, the script's stdout / stderr stream into the result directory's log file. The run sub-agent should `tail` the log file to monitor progress, not poll `lc status` repeatedly.
+- **Commit the run-level outcome when RUN settles.** The `results/` artifacts themselves are heavy, gitignored data, but the run-level outcome (which outputs reached `ok`, any failures logged) is worth a commit so the orchestrator can read `git log` to confirm that RUN landed.

From 56a8b4e52d4f2572089dc3ea770549603a5efb02 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sun, 10 May 2026 18:30:24 +0200
Subject: [PATCH 035/124] Drop constitution and ralph-loops skills; realign
 top-level docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The lc-from-paper rewrite as orchestrator + named per-phase sub-agents
removed every callsite for both skills:

- /constitution was invoked by the old interview phase to draft a
  per-paper reproduction constitution. The new interview produces only
  CLAUDE.md from a template; no constitution skill needed.
- ralph-loops drove the bash-loop and tmux-orchestrated runtime modes.
  The new architecture has no runtime-mode trichotomy — the orchestrator
  session is persistent and sub-agents run in the background natively,
  so neither autonomous-loop driver applies.

Top-level README, CLAUDE.md, skills/README, and docs/user/agent-workflow
all carried references to the old shape (constitution, ralph-loops,
runtime modes, frugal/rigorous dial, per-phase mode matrix) and have
been realigned to the new orchestrator + sub-agents framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CLAUDE.md                                     |   3 +-
 README.md                                     |   2 +-
 claude/lightcone/skills/README.md             |   8 +-
 claude/lightcone/skills/constitution/SKILL.md | 122 -----------
 .../constitution/references/constitution.md   | 139 -------------
 .../constitution/references/crafting.md       | 193 ------------------
 claude/lightcone/skills/ralph-loops/SKILL.md  |  56 -----
 .../skills/ralph-loops/assets/spec.md         |  29 ---
 .../skills/ralph-loops/scripts/ralph          | 124 -----------
 docs/user/agent-workflow.md                   |  31 +--
 10 files changed, 25 insertions(+), 682 deletions(-)
 delete mode 100644 claude/lightcone/skills/constitution/SKILL.md
 delete mode 100644 claude/lightcone/skills/constitution/references/constitution.md
 delete mode 100644 claude/lightcone/skills/constitution/references/crafting.md
 delete mode 100644 claude/lightcone/skills/ralph-loops/SKILL.md
 delete mode 100644 claude/lightcone/skills/ralph-loops/assets/spec.md
 delete mode 100755 claude/lightcone/skills/ralph-loops/scripts/ralph

diff --git a/CLAUDE.md b/CLAUDE.md
index 92d2a3c6..feb9b56c 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -70,7 +70,8 @@ claude/lightcone/           # Claude plugin source — force-included into the w
 ├── skills/                 # lc-new, lc-from-code, lc-from-paper,
 │                            # lc-feedback;
 │                            # paper-reproduction bundle: lc-from-paper (entry),
-│                            # narrative, constitution, ralph-loops, paper-extraction
+│                            # narrative, paper-extraction, figure-comparison,
+│                            # check-sentence-by-sentence
 │                            # (see skills/README.md for the full bundle map)
 ├── agents/                 # lc-extractor
 ├── guides/                 # astra-reference, lightcone-cli-reference, ui-brand
diff --git a/README.md b/README.md
index 105879fa..6c05aafd 100644
--- a/README.md
+++ b/README.md
@@ -42,7 +42,7 @@ Scans an existing codebase, drafts an `astra.yaml` that captures its inputs, out
 
 ### `/lc-from-paper` — Reproduce a published paper
 
-Interview-first orchestrator for reproducing a published paper in ASTRA. Drafts a per-paper reproduction constitution and `CLAUDE.md`, then drives a multi-session loop through nine phases (ACQUIRE → ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE → REVIEW). Composes a bundle of sibling skills (paper-extraction, constitution, ralph-loops, narrative, figure-comparison, check-sentence-by-sentence). See [`claude/lightcone/skills/README.md`](claude/lightcone/skills/README.md) for the full bundle map.
+Interview-first orchestrator for reproducing a published paper in ASTRA. Drafts a per-paper `CLAUDE.md`, then runs as a persistent orchestrator session that spawns named per-phase sub-agents (acquire, architect, specify, literature, implement, run, compare) the user can drop into directly. The two bookends — INTERVIEW and REVIEW — run in the orchestrator session itself. Composes a bundle of sibling skills (paper-extraction, narrative, figure-comparison, check-sentence-by-sentence). See [`claude/lightcone/skills/README.md`](claude/lightcone/skills/README.md) for the full bundle map.
 
 ### `/lc-feedback` — Report a bug
 
diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 262c49e3..9e223ff4 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -17,18 +17,16 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 
 | Skill | Role |
 |---|---|
-| [`lc-from-paper`](lc-from-paper/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and per-paper `CLAUDE.md`, then launches one of three runtime modes (interactive, bash-loop, tmux-orchestrated) against the constitution. The constitution carries 9 phases — INTERVIEW → ACQUIRE → ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE → REVIEW — bookended by two always-interactive seams (INTERVIEW at start, REVIEW at close-out); every other phase is configurable per the user's per-phase mode choice, with ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT additionally tuned by a frugality / rigor dial that drives each phase's internal fresh-context self-review. |
+| [`lc-from-paper`](lc-from-paper/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper `CLAUDE.md`, then runs as a persistent orchestrator session that spawns named per-phase sub-agents the user can drop into directly. Nine phases — INTERVIEW → ACQUIRE → ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE → REVIEW — bookended by INTERVIEW and REVIEW running in the orchestrator session itself; the seven phases between are sub-agent dispatches. Rigor is chosen per spawn from CLAUDE.md's Rigor section, not as a global dial. |
 | [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by lc-from-paper during SPECIFY. |
-| [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by lc-from-paper during the interview. |
-| [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Used by lc-from-paper's bash-loop and tmux-orchestrated runtime modes. |
 | [`paper-extraction`](paper-extraction/SKILL.md) | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: structural index (figures, tables, outline, citations) plus a stub `astra.yaml` for the paper. Primary acquisition path for lc-from-paper's ACQUIRE phase. |
 | [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from lc-from-paper's REVIEW close-out (opt-in); also user-invokable directly. |
 | [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from lc-from-paper's REVIEW close-out (mandatory); also user-invokable directly. |
 
-The full reproduction story spans these seven skills. lc-from-paper's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about lc-from-paper.
+The full reproduction story spans these five skills. lc-from-paper's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about lc-from-paper.
 
 ### Why bundle (not depend on plugin install)
 
-- **Testability.** We want to verify lc-from-paper invokes constitution + ralph-loops + the others correctly. That only works when all are in the same checkout.
+- **Testability.** We want to verify lc-from-paper invokes its sibling skills correctly. That only works when all are in the same checkout.
 - **Single install path.** `lc init` brings the full toolkit. Adding a separate plugin-marketplace step is friction we don't need.
 - **Future consolidation is open.** The long-run shape may be `astra` ships skills in `astra`, `lc` ships skills in `lightcone-cli`, plus a centralized external-skills list. Today: bundle it all. See [[lightcone/skills-location-policy]].
diff --git a/claude/lightcone/skills/constitution/SKILL.md b/claude/lightcone/skills/constitution/SKILL.md
deleted file mode 100644
index 7fa24df3..00000000
--- a/claude/lightcone/skills/constitution/SKILL.md
+++ /dev/null
@@ -1,122 +0,0 @@
----
-name: constitution
-description: >
-  Draft a constitution — a markdown document describing a desired state
-  for autonomous iteration. Study the problem space, shape the
-  constitution interactively (two-diamonds rhythm; six stances on
-  demand), then hand it to a runner — `/ralph-loops` for a tmux loop,
-  felt's `/shuttle` for fiber-tracked dispatch, or any other
-  iteration-runner. Use for any work where adaptation matters more than
-  a fixed plan: science, refactoring, exploration, creative work,
-  research narratives.
-  Triggers: "constitution", "constitute", "draft a constitution",
-  "ralph spec", "set up a ralph", "shuttle this", "write a spec for
-  autonomous iteration".
----
-
-# Constitution
-
-A constitution is a design document with trust built in. Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It's designed to outlast any single agent or iteration and remain valid as the world changes around it. A good constitution never says "50 files remain" because that's a snapshot that goes stale; it says "check `grep -r 'old_pattern'`" because that's a principle that stays true until the work is done.
-
-Constitutions don't prescribe steps. They describe what the system looks like when it's right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what's highest value. Each iteration of the work does this with fresh context.
-
-This matters most in science and exploratory work, where each decision is informed by the result just before it. A plan assumes you know the path; a constitution trusts the agent to find it — with taste, judgment, and fresh eyes each time.
-
-**Separation of context: if you craft, you never do the work yourself.**
-
-## Workflow
-
-1. **Study** — Read relevant files, understand existing patterns. This informs the *constitution*, not implementation. The goal is pointers that iterations will follow.
-
-2. **Draft** — Create a markdown file for the constitution. The bundled template lives in the sibling `ralph-loops` skill:
-   ```bash
-   cp ../ralph-loops/assets/spec.md my-constitution.md
-   ```
-   If felt is installed and you're working in a felt-tracked project, you can author the constitution as a fiber instead — `felt add <slug> "Constitution title" -s open -t constitution` — and runners that read fibers (felt-shuttle) will pick it up. Fill in what you can; don't wait until it's perfect.
-
-3. **Refine** — Show the draft, get feedback, revise. Use AskUserQuestion for structured choices. The two-diamonds rhythm and six stances in [`references/crafting.md`](references/crafting.md) help most when the user is deciding something non-trivial. Apply the qualitative ambiguity self-check before launching.
-
-4. **Launch** — When approved, hand the constitution to whichever runner is appropriate. Common options:
-
-   - **`/ralph-loops`** — bundled tmux loop runner. Re-spawns iterations against the constitution until the runner sees its done-conditions met.
-     ```bash
-     ../ralph-loops/scripts/ralph my-constitution.md [--backend claude|codex] [-- extra-flags...]
-     ```
-     Add `-- --chrome` for visual/frontend work. Session: `ralph-<spec-name>`. Attach: `tmux attach -t ralph-<spec-name>`.
-   - **`/shuttle`** (felt-aware) — fiber-tracked dispatch. Reads the `shuttle:` block from the fiber's frontmatter and spawns single-shot workers across sessions; the kanban surfaces what's in flight.
-   - **Other dispatchers** — anything that reads a markdown spec or fiber and spawns iterations. Their configuration is owned outside this skill.
-
-   The constitution stays editable while iteration runs; successive iterations re-read it each cycle, so refinements between iterations are normal.
-
-## What goes in a constitution
-
-A constitution needs enough structure that an iteration landing cold can orient itself, and enough freedom that it can adapt. Common sections — use what fits, skip what doesn't, add what's missing:
-
-```markdown
-## Desired State
-What the system looks like when it's done. Invariants, quality bar,
-done-conditions. Fence the scope — what to aim for AND what to leave alone.
-
-## Context
-File paths, existing patterns, architectural constraints. Things iterations
-need to *find* but not *achieve*.
-
-## Skills
-Which skills to activate before working.
-
-## Evidence
-How to check progress — commands, test suites, grep patterns. Pointers to
-the ground truth that iterations measure themselves against.
-
-## Open Questions
-Uncertainties the user should weigh in on. Iterations add to this; the user
-resolves between loops.
-```
-
-For deeper reference on each section's voice and the discipline that keeps a constitution from drifting into a plan, see [`references/constitution.md`](references/constitution.md).
-
-## Principles
-
-**Constitution, not plan.** Say what the system looks like when it's right. Never describe the current state — anything that becomes false or irrelevant as work progresses doesn't belong. If a section would be outdated after one iteration, it's a snapshot — replace it with a pointer.
-
-**Pointers, not snapshots.** "Check `grep -r 'old_pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done.
-
-**Reshape, don't accrete.** When the desired state evolves — testing surfaces a gap, a meeting changes the priority, a sibling decision lands — rewrite the affected sections so the body still reads as today's desired state. Don't tack on a "Round 2" section; don't add an "Amendments" appendix; don't keep the old framing alongside the new one as a sediment. A green-field constitution will change a lot as it matures. The chronology lives in the runner's history surface; the body lives in *now*.
-
-**Prefer existing systems.** Before designing anything new: can what's there handle this?
-
-**Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies.
-
-**Scope is a gift.** A clear fence — "only rename, don't refactor" — saves iterations from well-intentioned drift. Explicit scope frees the agent to work confidently within it.
-
-## Constitutions that shape artifacts
-
-Some constitutions don't build code — they shape artifacts like documentation, dashboards, or research narratives. These have different rhythms:
-
-- **The desired state is comprehension, not correctness.** "A reviewer can follow the narrative cold" is harder to test than "all tests pass" — but it's the right bar. Evidence for progress: fewer redundant plots, clearer prose, more natural flow.
-- **The artifact continues to grow.** Unlike a refactoring (which finishes), a research narrative keeps acquiring nodes. The constitution shapes how growth presents itself, not when growth stops.
-
-## Anti-patterns
-
-**Checklists.** "1. Add X, 2. Add Y" — iterations race through without judgment.
-
-**Vague done.** "Make it better" — when does iteration stop?
-
-**Over-specification.** Prescribing *how* instead of *what*. Trust the agent's taste.
-
-**Snapshot language.** "Currently 50 files" — will be wrong after one iteration.
-
-**Decision logs in the body.** "Resolved choices" / "Process notes" sections turn the constitution into a process journal. When a question gets answered, fold the answer into the narrative where it's contextually relevant — into Invariants, Desired State, Context — and let the runner's history surface (`felt history`, commits, etc.) carry the chronology.
-
-**Amendment scaffolding.** "Round 2", "v2 deltas", "Updates 2026-05-04 →". The same failure as a decision log, played out across edits: the body becomes a sediment of layered framings instead of the current desired state. When the desired state shifts, *reshape* the affected sections — rewrite headings, update prose, drop what no longer applies — so the document still reads as one coherent description of now.
-
----
-
-## References
-
-- [`references/constitution.md`](references/constitution.md) — depth on drafting voice, sections, and the crafting workflow. Felt-aware where felt is installed; the procedural steps work without felt too.
-- [`references/crafting.md`](references/crafting.md) — two-diamonds
-  rhythm, six stances, the funnel ledger, and the qualitative ambiguity
-  self-check. Use this when the conversation has careful-thinking
-  character — not every constitution drafting needs it, but the ones that
-  do are the ones that benefit most.
diff --git a/claude/lightcone/skills/constitution/references/constitution.md b/claude/lightcone/skills/constitution/references/constitution.md
deleted file mode 100644
index 09e63568..00000000
--- a/claude/lightcone/skills/constitution/references/constitution.md
+++ /dev/null
@@ -1,139 +0,0 @@
-# Constitution — depth reference
-
-Drafting a constitution. The SKILL body covers the procedural backbone (Study → Draft → Refine → Launch). This reference goes deeper on voice, sections, and the discipline that keeps a constitution from sliding into a plan.
-
-The constitution itself is just a markdown file with YAML frontmatter that a runner reads on each iteration. Common runners: the bundled `ralph-loops` (tmux loop, `scripts/ralph`), or external dispatchers like felt-shuttle (when felt is installed). The runner is interchangeable; the constitution is what matters.
-
----
-
-## What a constitution is
-
-A constitution is a design document with trust built in. Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It is designed to outlast any single iteration and remain valid as the world changes around it.
-
-**A good constitution never says "50 files remain"** — that is a snapshot that goes stale. It says `check "grep -r 'old_pattern'"` — that is a principle that stays true until the work is done.
-
-Constitutions do not prescribe steps. They describe what the system looks like when it is right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what is highest value. Each iteration of the work does this with fresh context.
-
-**Constitution, not plan.** Plans assume you know the path; constitutions trust the agent to find it — with taste, judgment, and fresh eyes each time. This matters most in science and exploratory work, where each decision is informed by the result just before it.
-
-**Separation of context: if you craft, you never do the work yourself.** The constitution is designed by one role; iterations are run by another.
-
----
-
-## When to write a constitution
-
-- Work where adaptation matters more than a fixed plan: scientific investigation, exploratory refactoring, creative writing
-- The desired state is clear (or can be made clear) but the path is not
-- Iterations need to re-read with fresh context and make judgment calls
-- A checklist would either be wrong after one step or race through without judgment
-
-Don't write a constitution for: clearly-scoped atomic tasks, anything where a checklist or a plan is genuinely the right shape.
-
----
-
-## Workflow (deeper)
-
-### 1. Study
-
-Read relevant files, understand existing patterns. This informs the **constitution**, not implementation — the goal is pointers that iterations will follow, not a head start on the work.
-
-### 2. Draft
-
-Create the spec file from the bundled template:
-
-```bash
-cp ../ralph-loops/assets/spec.md my-spec.md
-```
-
-(Or, if felt is installed and you are working in a felt-tracked project, you can create the constitution as a fiber and the runner will treat it as the spec — `felt add <slug> "Constitution title" -s open -t constitution` then edit the body. This is felt-only; the bundled template above works without felt.)
-
-Use the crafting process from [`crafting.md`](crafting.md):
-
-- **Wonder → Ontology:** what IS the desired state? Name it precisely.
-- **Design → Delivery:** what sections does this constitution need? Which are pointers vs snapshots?
-
-Stances that help most during constitution drafting:
-
-- **Ontologist** for naming the desired state ("what IS 'done' here?")
-- **Simplifier** for fencing scope ("what are we explicitly leaving alone?")
-- **Contrarian** for pressure-testing whether the whole framing is right
-- **Architect** when the constitution is about refactoring structure
-
-### 3. Refine
-
-Show the draft, get feedback, revise. Use AskUserQuestion for structured choices. Apply the qualitative ambiguity self-check from `crafting.md` — goal, constraints, success — before launching.
-
-Repeat until it feels solid. It does not have to be complete; open questions belong in the Open Questions section.
-
-### 4. Launch
-
-When approved, hand to a runner. Bundled option: `../ralph-loops/scripts/ralph my-spec.md`. The runner re-reads the spec each iteration, so refinements between iterations are normal.
-
----
-
-## Constitutional sections
-
-A constitution needs enough structure that an iteration landing cold can orient itself, and enough freedom that it can adapt. Common sections — use what fits, skip what does not, add what is missing:
-
-```markdown
-## Desired State
-What the system looks like when it is done. Invariants, quality bar,
-done-conditions. Fence the scope — what to aim for AND what to leave alone.
-
-## Context
-File paths, existing patterns, architectural constraints. Things iterations
-need to *find* but not *achieve*.
-
-## Skills
-Which skills to activate before working.
-
-## Evidence
-How to check progress — commands, test suites, grep patterns. Pointers to
-ground truth that iterations measure themselves against.
-
-## Open Questions
-Uncertainties the user should weigh in on. Iterations add to this; the user
-resolves between loops.
-```
-
----
-
-## Principles (deeper)
-
-**Pointers, not snapshots.** `check "grep -r 'old_pattern'"` not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done.
-
-**Reshape, don't accrete.** When the desired state evolves — testing surfaces a gap, a meeting changes the priority, a sibling decision lands — rewrite the affected sections so the body still reads as today's desired state. Don't tack on a "Round 2" section; don't add an "Amendments" appendix; don't keep the old framing alongside the new one as a sediment. A green-field constitution will change a lot as it matures, and a mature one will keep changing as reality does. The chronology lives in the runner's history surface (commits, `felt history` if felt is in use); the body lives in *now*.
-
-**Prefer existing systems.** Before designing anything new: can what is there handle this?
-
-**Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies.
-
-**Scope is a gift.** A clear fence — "only rename, don't refactor" — saves iterations from well-intentioned drift. Explicit scope frees the agent to work confidently within it.
-
----
-
-## Constitutions that shape artifacts
-
-Some constitutions do not build code — they shape artifacts like documentation or research narratives. These have different rhythms:
-
-- **The desired state is comprehension, not correctness.** "A reviewer can follow the narrative cold" is harder to test than "all tests pass" — but it is the right bar. Evidence for progress: fewer redundant plots, clearer prose, more natural flow.
-- **The artifact continues to grow.** Unlike a refactoring (which finishes), a research narrative keeps acquiring nodes. The constitution shapes how growth presents itself, not when growth stops.
-
----
-
-## Anti-patterns
-
-- **Checklists.** "1. Add X, 2. Add Y" — iterations race through without judgment.
-- **Vague done.** "Make it better" — when does iteration stop? What would a reader see?
-- **Over-specification.** Prescribing *how* instead of *what*. Trust the agent's taste.
-- **Snapshot language.** "Currently 50 files" — will be wrong after one iteration.
-- **Immutable seed.** Not our shape. The constitution is meant to be edited between iterations; do not treat it as frozen.
-- **Numerical convergence.** "Iteration stops when similarity ≥ 0.95" — wrong shape for science. Stop when the Evidence section says the desired state has been reached.
-- **Decision logs in the body.** "Resolved choices" / "Decisions made" / "Process notes" sections turn the constitution into a process journal. When a question gets answered (in conversation, via `AskUserQuestion`, in a review), fold the answer into the narrative where it is contextually relevant — into Invariants, Desired State, Context — and let the runner's chronological surface (commits, `felt history` if felt is in use) carry the chronology. The constitution describes *what is*, not *how we got here*; an "Open Questions" section that has been fully resolved should be deleted, not left as a victory log.
-- **Amendment scaffolding.** "Round 2", "v2 deltas", "Updates 2026-05-04 →", "Second round amendments". The same failure as a decision log, played out across edits: the body becomes a sediment of layered framings instead of the current desired state. When the desired state shifts, *reshape* the affected sections — rewrite headings, update prose, drop what no longer applies — so the document still reads as one coherent description of now. The story of how it got here is what `felt history append` (or commit messages, when felt isn't in use) and the outcome blurb are for.
-
----
-
-## When crafting lands here
-
-The crafting rhythm in [`crafting.md`](crafting.md) applies to all careful interactive thinking; this reference kicks in when the target artifact is specifically a constitution. The diamonds do most of the work — the funnel mechanic used for open-ended exploration is not the primary move here, because there is already one specific artifact being produced. See the Workflow section above for which stances help most at each drafting phase.
diff --git a/claude/lightcone/skills/constitution/references/crafting.md b/claude/lightcone/skills/constitution/references/crafting.md
deleted file mode 100644
index 21595414..00000000
--- a/claude/lightcone/skills/constitution/references/crafting.md
+++ /dev/null
@@ -1,193 +0,0 @@
-# Crafting
-
-How to help the user think through something that hasn't crystallized, and turn the result into structured commitments — frontmatter on a fiber if felt is in use, otherwise inline structure in the constitution itself (decisions with excluded options, evidence pointers, scoped findings).
-
-Use it when the user is deciding something non-trivial, scoping a sub-analysis, drafting a living spec, or talking through an open question — any time careful interactive thinking is happening and the output can land in structured form.
-
-The rhythm is two diamonds: first understand what the thing IS, then decide what to DO about it. Each diamond diverges to explore and converges to commit. The ontological question — *what IS this, really?* — is the convergence point of the first diamond, and it is the most practical question you can ask.
-
-```
-    ◇ Wonder              ◇ Design
-   ╱  (diverge)          ╱  (diverge)
-  ╱    surface          ╱    alternatives
- ╱     questions       ╱     trade-offs
-●─────────────────────●─────────────────────●
- ╲                     ╲
-  ╲    crystallize      ╲    commit
-   ╲   the name          ╲   with reasons
-    ◇  (converge)         ◇  (converge)
-    Ontology              Delivery
-```
-
-Diamond 1 diverges into questions and converges on a name (*"this IS a decision about covariance estimation"*). Diamond 2 diverges into alternatives and converges on a commit (a default with `excluded_reason` for each rejection). The second diamond inherits the ontological commit from the first.
-
----
-
-## The two diamonds
-
-### Diamond 1: Wonder → Ontology
-
-**Wonder (diverge).** What are we actually trying to figure out? Surface questions, assumptions, ambiguities. Do not propose answers yet. If the user is already pitching solutions, back them up to the question.
-
-**Ontology (converge).** What IS this, really? Crystallize into a claim, decision, or question specific enough to act on. The convergence is complete when you can **name** the thing precisely — "this is a decision about covariance estimation" or "this is a question about whether leakage matters below ℓ=100." A good name is often the entire output of Diamond 1.
-
-**Output of Diamond 1:** a stub with a real name and at least one structural placeholder — a decision label, an insight claim, or input/output IDs. Not a full block — just the hook that identifies what kind of thing this is.
-
-### Diamond 2: Design → Delivery
-
-**Design (diverge).** What are the real alternatives? For each, what would make it right or wrong? Trade-offs, excluded options, edge cases. This is where the Contrarian and Simplifier stances are most useful.
-
-**Delivery (converge).** Commit to a default, write the `excluded_reason` for each rejected option, identify inputs and outputs, stage the evidence. The structure is now formalizable.
-
-**Output of Diamond 2:** structured fields populated — `decisions` with options and default, `inputs`/`outputs` with IDs and types, `insights` with claim and evidence. (If felt is in use, these go on the fiber; otherwise they live in the spec itself or in `astra.yaml`.)
-
-The two diamonds are sequential but the boundary is soft. If you find yourself naming alternatives before the thing is clear, back up to the ontology convergence point. If you converge too early on "this is a decision" when it is actually a question, the Design phase will feel forced — that is the cue to re-enter Wonder.
-
----
-
-## Stances
-
-Six lightweight lenses for when the conversation needs pressure. **Default is no stance** — straight conversation. Invoke a stance when pressure would help, announce it in one sentence, drop it when it has done its work. Do not stack or pipeline them.
-
-### Socratic — *"What are you assuming?"*
-
-Question-only. Never proposes answers. Surfaces the assumptions under the user's framing.
-
-- What are you assuming is true that might not be?
-- What would make option A right vs option B? What is the actual fork?
-- If you had to write the `excluded_reason` for the option you are about to reject, what would it say?
-
-**Use in Wonder and early Design.** When the user is about to commit to a path and you want the reasons made explicit.
-
-### Ontologist — *"What IS this, really?"*
-
-Pushes on definition before mechanism. Four questions:
-
-1. **Essence** — what is the true nature, stripping away accidental properties?
-2. **Root cause or symptom** — is this the fundamental issue or a surface effect?
-3. **Prerequisites** — what must exist first for this even to make sense?
-4. **Hidden assumptions** — what implicit beliefs is the framing resting on?
-
-**Use at the Ontology convergence point.** When a word is doing heavy lifting and may mean different things in different sentences.
-
-### Contrarian — *"What if the opposite were true?"*
-
-Challenges premises, not details.
-
-- What if the choice does not actually matter for your signal?
-- What if the constraint you are designing around is not real?
-- What if the simplest version is already good enough?
-
-**Use in Design.** When the conversation is burning effort on a distinction that may not matter, or a third option (do nothing, use the default) is being ignored.
-
-### Simplifier — *"Is this complexity earning its keep?"*
-
-YAGNI, concrete first, data over code.
-
-- What can we remove without losing the core value?
-- What is the simplest version that would work?
-- Can a data structure replace this logic?
-
-**Use in Design and early Delivery.** When the design is drifting toward over-engineering or a feature list is growing without anchoring reasons.
-
-### Researcher — *"What do we actually know?"*
-
-Evidence before interpretation. Especially useful for scientific work where a claim needs to be defensible.
-
-- What does the actual source say, not what we remember?
-- What would count as evidence here? What would falsify the claim?
-- What is the most specific claim we can make with the data in hand?
-
-**Use in Delivery.** When an insight needs a defensible claim, or when the user is about to write an outcome that is stronger than the evidence supports.
-
-### Architect — *"If we started over, would we build it this way?"*
-
-Structural root cause. The question behind the question when friction keeps recurring.
-
-- Is the same problem showing up in different forms?
-- Which abstraction does not match reality?
-- What assumption was wrong from the start?
-
-**Use when a debate keeps returning.** The user is circling a decision they have already made three times and cannot stick to — the real question is probably structural, not tactical.
-
----
-
-## The funnel
-
-When the conversation is exploratory — no single topic, things are accumulating — keep a private running ledger of what is falling out, classified by destination:
-
-| Item kind | What it looks like | Destination |
-|-----------|--------------------|-------------|
-| **Decision** | A choice between real alternatives | `decisions` block in spec / `astra.yaml` / fiber |
-| **Finding** | A claim with at least the start of evidence | `insights` block / fiber |
-| **Sub-analysis** | "Compute X from Y" with identifiable inputs/outputs | New `astra.yaml` sub-analysis or new fiber with `inputs`/`outputs` stubs |
-| **Question** | An open thread worth tracking, not yet answered | "Open Questions" section of the constitution / annotated fiber |
-| **Root-fiber change** | A pattern or gotcha that belongs in CLAUDE.md | Edit CLAUDE.md / root fiber |
-
-The ledger is your own working memory. **Do not surface it mid-conversation** unless the user asks or a flush cue fires.
-
-**Flush cues:**
-
-- User says "OK we should write this down" or similar
-- Three or more items have accumulated and the topic is about to shift
-- A natural pause after a decision or finding lands
-
-On flush, present the ledger grouped by destination, then file with the user's assent. If the user declines an item, discard it without argument.
-
----
-
-## Qualitative ambiguity self-check
-
-Before committing to a path — filing a decision, launching an iteration loop, sealing an outcome — check three things qualitatively. **No scoring, no thresholds.** If any feels fuzzy, resolve it with AskUserQuestion.
-
-1. **Goal.** Is what the user wants specific enough that two competent people would build the same thing from it? If not, what would pin it down?
-2. **Constraints.** Are the limits named? What cannot change, what must be preserved, what would break everything? Missing constraints tend to show up as "oh wait, we also need…" after the commit.
-3. **Success.** How will we know it is done or right? What is the evidence condition? Qualitative is fine ("a reviewer can follow the narrative cold"), but it has to be checkable.
-
-When one is fuzzy, use AskUserQuestion with concrete options rather than open prose questions. Iterate until the answer is "yeah, that's it." **Stop when the fuzziness resolves, not when a score crosses a threshold.** Scores on qualitative priors add false precision; the honest signal is whether the user knows what they want.
-
-This is a mirror, not a gate. If the user wants to file anyway with one dimension still fuzzy, file it — the fuzziness itself can live in an Open Questions section, and future iterations can refine it.
-
----
-
-## When to bring in /confer
-
-`/confer` routes a prompt through Codex for adversarial review. Good fits inside a crafting session:
-
-- A design choice where two plausible paths both look right and the user is stuck
-- Validating that an insight claim actually follows from its evidence
-- Pressure-testing a constitution's desired state before launching iteration
-
-Bad fits: routine decisions, the user has already committed, the dispute is stylistic, or the answer only needs three more seconds of thought. `/confer` is not a substitute for the user's taste — it is a second opinion when the first opinion is honestly unsure.
-
----
-
-## Mapping outputs to structure
-
-What comes out of the diamonds maps onto wherever you keep structured commitments:
-
-| Diamond output | Destination |
-|----------------|-------------|
-| Wonder questions left open | "Open Questions" section in the constitution; or a fiber with `status: open` (felt) |
-| Ontology convergence — "this IS a decision about X" | A `decisions.<key>.label` entry — in `astra.yaml`, in the constitution body, or on a fiber |
-| Design alternatives with trade-offs | `decisions.<key>.options`; rejected options get `excluded_reason` |
-| Delivery — the commit | `decisions.<key>.default` |
-| Finding at end of Delivery | `insights.<key>` with `claim` + `evidence` (or finding in `astra.yaml`) |
-| Sub-analysis scope | New sub-analysis in `astra.yaml`, or a new fiber with `inputs`/`outputs` |
-| Process-level lesson that generalizes | Edit to root CLAUDE.md / root fiber |
-
-The same shapes apply directly inline in `astra.yaml` or the constitution itself; no separate substrate is required.
-
----
-
-## Anti-patterns
-
-- **Ambiguity gates.** Do not withhold help until the user clarifies N dimensions. The self-check is a mirror, not a door.
-- **Numerical scoring.** Do not introduce 0–1 clarity scores with thresholds. The underlying signal is qualitative and the number adds false precision.
-- **Stance pipelines.** Do not run Socratic → Ontologist → Contrarian in sequence. Pick one when it helps; drop it when it has.
-- **Mandatory interview.** No prepared question list. Stances are responsive to the actual conversation.
-- **Surfacing the ledger too early.** A single item is not a flush. Wait for accumulation or a pause.
-- **Immutable outputs.** Nothing filed here is locked. Everything is editable; reversals are normal.
-- **Nine-minds overload.** Six stances is already generous. Add more only when a specific gap shows up, never preemptively.
-- **Interrogation without a ceiling.** Three questions is usually enough. If the user is getting irritated, stop asking and file what you have.
-- **Converging before the name is clear.** If Diamond 2 feels forced, Diamond 1 has not finished. Back up.
diff --git a/claude/lightcone/skills/ralph-loops/SKILL.md b/claude/lightcone/skills/ralph-loops/SKILL.md
deleted file mode 100644
index 2b011fc5..00000000
--- a/claude/lightcone/skills/ralph-loops/SKILL.md
+++ /dev/null
@@ -1,56 +0,0 @@
----
-name: ralph-loops
-description: >
-  Autonomous loop iteration toward a desired state. You are inside a ralph
-  loop — your spec is in the system prompt. Survey, contribute, update state
-  discoverably, exit. Activated automatically inside ralph loops, or when
-  launching one against an existing spec via scripts/ralph; for drafting
-  the spec itself, use /constitution.
-  Triggers: "ralph-loops", "launch ralph", "run ralph", "ralph loop on <spec>".
----
-
-# Ralph Loops
-
-You are inside a loop. Your spec is in the system prompt above. Each iteration: survey freely, work substantially, update state discoverably, exit.
-
-## Loop
-
-1. **Survey** — Fresh eyes. Explore agents, git log, tests. You decide what to check.
-2. **Contribute** — Work on 1–3 substantial pieces. Do NOT try to clear the whole queue in one iteration.
-3. **Update** — Before exiting: commit your work, update CLAUDE.md if warranted.
-4. **Exit** — `kill $PPID`
-
-**CRITICAL: Exit before compaction.** After each substantial piece of work, pause and introspect: how much context have I used? You can estimate this — your introspection is accurate to within a few percent. If you feel past 50%, wrap up and exit. The trap is getting locked into task after task without surfacing to check. Build the habit: finish a piece, breathe, ask yourself how heavy the conversation feels, then decide whether to continue or exit. Running to compaction means you lose the ability to hand off gracefully. The loop continues — you don't have to finish everything.
-
-## Rules
-
-**State, not checklist.** The spec describes what "done" looks like. Survey reality, decide what's highest value, work on that.
-
-**Discoverable updates.** Commits, test results, documentation — not notes or progress files. The next iteration finds what changed by inspecting the system.
-
-**Pointers, not snapshots.** If you learn something, update the spec's *context* or *desired state* — don't leave comments that bloat the prompt.
-
-**You have authority.** Trust the spec, don't ask permission. Make substantial contributions. Don't avoid ambitious solutions just because they span multiple iterations.
-
-**File uncertain decisions** so the user can answer after the loop. Use AskUserQuestion to batch up to 4 high-leverage questions before exiting — choices where user input redirects substantial work.
-
-### Long-Running Jobs
-
-Some iterations require waiting on computation (builds, cluster jobs, CI). When jobs are running:
-
-1. **Check state** — tail logs, check output
-2. **Sleep** — interval proportional to expected runtime (30s for minute-scale, 5m for hour-scale)
-3. **Check again** — look for errors or completion
-4. **Repeat** until jobs finish or fail
-
-Stay and shepherd computation through. Don't exit and hope the next iteration picks it up.
-
-## Exit
-
-If you **made substantial contributions**, `kill $PPID`. Do NOT close the spec — the loop continues.
-
-If you **cannot find any remaining work**, update the spec's YAML frontmatter to `status: closed` with a summary of what was accomplished.
-
----
-
-Pattern adapted from [Ralph Wiggum](https://ghuntley.com/ralph/).
diff --git a/claude/lightcone/skills/ralph-loops/assets/spec.md b/claude/lightcone/skills/ralph-loops/assets/spec.md
deleted file mode 100644
index 0da84d2a..00000000
--- a/claude/lightcone/skills/ralph-loops/assets/spec.md
+++ /dev/null
@@ -1,29 +0,0 @@
----
-status: open
----
-
-This is your spec for an autonomous iteration loop, a meditative iteration toward a desired state.
-
-## Desired State
-
-[Describe what you're building and why. Someone unfamiliar with the project should understand the goal from this section alone.
-
-Be detailed about "done": the architecture, behavior, constraints, quality bar. You'll check reality against this and work to close the gap.
-
-Use pointers, not snapshots. Say "check `grep -r 'pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid.]
-
-## Context
-
-[Point to relevant files and existing patterns. When you see real implementations, you build coherently on them rather than introducing alien patterns.]
-
-## Skills
-
-[Skills to activate before working. Use `/skill-name`.]
-
-## Evidence
-
-[How to check progress — commands, test suites, grep patterns. Pointers to the ground truth that iterations measure themselves against.]
-
-## Open Questions
-
-[Uncertainties the user should weigh in on. Iterations add to this; the user resolves between loops.]
diff --git a/claude/lightcone/skills/ralph-loops/scripts/ralph b/claude/lightcone/skills/ralph-loops/scripts/ralph
deleted file mode 100755
index a7269366..00000000
--- a/claude/lightcone/skills/ralph-loops/scripts/ralph
+++ /dev/null
@@ -1,124 +0,0 @@
-#!/bin/bash
-# Run a ralph loop on a spec file
-# Loops while spec status is open/active, appending spec content to system prompt
-# Usage: ralph <spec.md> [--backend claude|codex] [-- extra-flags...]
-#
-# Supports both Claude Code and Codex backends.
-# Default: claude. Set RALPH_BACKEND=codex or pass --backend codex.
-
-set -e
-
-SPEC_FILE="${1:?Usage: ralph <spec.md> [--backend claude|codex] [-- extra-flags...]}"
-shift
-
-BACKEND="${RALPH_BACKEND:-claude}"
-if [[ "$1" == "--backend" ]]; then
-    BACKEND="$2"
-    shift 2
-fi
-
-EXTRA_FLAGS=""
-if [[ "$1" == "--" ]]; then
-    shift
-    EXTRA_FLAGS="$*"
-fi
-
-# Resolve to absolute path
-SPEC_FILE="$(cd "$(dirname "$SPEC_FILE")" && pwd)/$(basename "$SPEC_FILE")"
-
-if [[ ! -f "$SPEC_FILE" ]]; then
-    echo "Spec file not found: $SPEC_FILE"
-    exit 1
-fi
-
-SESSION="ralph-$(basename "$SPEC_FILE" .md)"
-WORK_DIR="$(pwd)"
-
-# Check if already running
-if tmux has-session -t "$SESSION" 2>/dev/null; then
-    echo "Ralph already running: $SESSION"
-    echo "  Attach: tmux attach -t $SESSION"
-    exit 0
-fi
-
-# Write loop script to temp file (avoids heredoc quoting hell)
-LOOP_SCRIPT=$(mktemp /tmp/ralph-loop-XXXXXX.sh)
-cat > "$LOOP_SCRIPT" << 'LOOP'
-#!/bin/bash
-SPEC_FILE="$1"
-WORK_DIR="$2"
-BACKEND="$3"
-EXTRA_FLAGS="$4"
-
-iteration=0
-
-# Check YAML frontmatter for status field
-check_status() {
-    head -50 "$SPEC_FILE" | sed -n '/^---$/,/^---$/p' | grep -qiE 'status:.*(open|active)'
-}
-
-while check_status; do
-    cd "$WORK_DIR"
-    iteration=$((iteration + 1))
-    echo ""
-    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
-    echo "Ralph iteration $iteration — $(date '+%H:%M:%S')"
-    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
-
-    SPEC_CONTENT=$(cat "$SPEC_FILE")
-
-    SYSPROMPT_FILE=$(mktemp /tmp/ralph-sys-XXXXXX.txt)
-    PROMPT_FILE=$(mktemp /tmp/ralph-prompt-XXXXXX.txt)
-
-    cat > "$SYSPROMPT_FILE" << SYSEOF
-Ralph iteration $iteration. Spec: $SPEC_FILE
-
-$SPEC_CONTENT
-SYSEOF
-
-    cat > "$PROMPT_FILE" << 'PROMPTEOF'
-You are inside a Ralph loop — a meditative iteration toward a desired state. Activate the ralph-loops skill and follow its instructions for iterating on the spec above.
-PROMPTEOF
-
-    PROMPT=$(cat "$PROMPT_FILE")
-
-    if [[ "$BACKEND" == "codex" ]]; then
-        codex --dangerously-bypass-approvals-and-sandbox \
-            --config "developer_instructions=$(cat "$SYSPROMPT_FILE")" \
-            $EXTRA_FLAGS \
-            "$PROMPT"
-    else
-        claude --dangerously-skip-permissions \
-            $EXTRA_FLAGS \
-            --append-system-prompt "$(cat "$SYSPROMPT_FILE")" \
-            <<< "$PROMPT"
-    fi
-
-    rm -f "$SYSPROMPT_FILE" "$PROMPT_FILE"
-
-    echo "--- Iteration complete ---"
-    sleep 2
-done
-
-echo ""
-echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
-echo "Ralph complete — $iteration iterations"
-echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
-echo ""
-echo "Session kept open for inspection. Type exit to close."
-exec bash -l
-LOOP
-
-chmod +x "$LOOP_SCRIPT"
-
-echo "Starting ralph on $SPEC_FILE"
-echo "  Backend: $BACKEND"
-echo "  Work dir: $WORK_DIR"
-[[ -n "$EXTRA_FLAGS" ]] && echo "  Flags:    $EXTRA_FLAGS"
-
-# Launch tmux with a login shell running the loop script
-tmux new-session -d -s "$SESSION" -c "$WORK_DIR" \
-    bash -l "$LOOP_SCRIPT" "$SPEC_FILE" "$WORK_DIR" "$BACKEND" "$EXTRA_FLAGS"
-
-echo "  Session:  $SESSION"
-echo "  Attach:   tmux attach -t $SESSION"
diff --git a/docs/user/agent-workflow.md b/docs/user/agent-workflow.md
index 01ff6444..8ab18c57 100644
--- a/docs/user/agent-workflow.md
+++ b/docs/user/agent-workflow.md
@@ -71,21 +71,28 @@ parameter plumbing.
 ## `/lc-from-paper` — reproduce a published paper
 
 **You have a DOI or arXiv ID. You end with a reproduction project
-driven by a multi-session loop.**
+driven by an orchestrator session and named per-phase sub-agents.**
 
 `/lc-from-paper` is the entry point of the paper-reproduction bundle.
 It opens with a short interactive interview — paper identity, scope
-(full vs targeted), runtime mode (interactive, bash-loop, or
-tmux-orchestrated), termination criterion (frugality vs rigor), and
-per-phase mode — then drafts a per-paper reproduction constitution and
-a per-paper `CLAUDE.md`. After approval, the loop drives nine phases
-(ACQUIRE → ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN →
-COMPARE → REVIEW), bookended by INTERVIEW and REVIEW as the
-always-interactive seams.
-
-The bundle composes sibling skills: `paper-extraction`, `constitution`,
-`ralph-loops`, `narrative`, `figure-comparison`, and
-`check-sentence-by-sentence`. See
+(full vs targeted), and any paper-specific conventions — then drafts
+a per-paper `CLAUDE.md` (the durable spec every sub-agent walks up to).
+After approval, the skill becomes a persistent **orchestrator session**
+that spawns named per-phase sub-agents (`acquire`, `architect`,
+`specify`, `literature`, `implement`, `run`, `compare`) you can drop
+into directly via the chat surface. The two bookends — INTERVIEW at
+start and REVIEW at close-out — run in the orchestrator session itself.
+
+Rigor is chosen per spawn from CLAUDE.md's Rigor section: cheap
+(skip self-review or one fresh-context pass) or heavy (iterate
+fresh-context review until two consecutive clean rounds). COMPARE
+returns a verdict plus an opportunity assessment — where the gaps are
+and how much they likely matter — so you and the orchestrator can
+decide whether to spend another IMPLEMENT round or land at the
+current rigor.
+
+The bundle composes sibling skills: `paper-extraction`, `narrative`,
+`figure-comparison`, and `check-sentence-by-sentence`. See
 [`claude/lightcone/skills/README.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md)
 for the full bundle map.
 

From a312e1759d3fff5a356763fc217baa384cd8ec4a Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Sun, 10 May 2026 23:58:49 +0200
Subject: [PATCH 036/124] paper-extraction: resolve bibliography DOIs in
 extract-paper-substrate.py
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Move bibliography resolution into the substrate authority (paper-extraction)
instead of doing it downstream in lc-from-paper's ACQUIRE phase. The script
now enriches `index.json`'s `citations:` block in place from
`{key -> [locations]}` to `{key -> {locations, citation, doi}}`, removing
the need for an intermediate `work/notes/cited_papers.yaml` artifact.
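
As a rough illustration of that reshaping (a sketch only: the function name
`enrich_citations` and the join-by-key approach are assumptions for this
example, not code copied from the script):

```python
def enrich_citations(locations_by_key, bib_entries):
    """Join {key -> [locations]} with [{key, citation, doi}] entries.

    Hypothetical helper: keys with no matching bibliography entry keep
    their locations and get None for `citation` and `doi`.
    """
    by_key = {entry["key"]: entry for entry in bib_entries}
    enriched = {}
    for key, locations in locations_by_key.items():
        entry = by_key.get(key, {})
        enriched[key] = {
            "locations": list(locations),
            "citation": entry.get("citation"),
            "doi": entry.get("doi"),
        }
    return enriched
```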

Resolution path per cited entry: `doi:` field → `eprint:` → Crossref
bibliographic query → ADS title search (a fallback attempted only when the
ADS_API_TOKEN env var or ~/.ads/dev_key is present). Lookups cache to
`work/reference/.doi-cache.json` so re-runs don't re-hit the network.
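
The ordered-fallback-plus-cache pattern described above could look roughly
like this (a minimal sketch under assumptions: `resolve_doi`, the resolver
callables, and the cache layout keyed by citation key are all hypothetical;
the real script may key and store its cache differently):

```python
import json
from pathlib import Path


def resolve_doi(entry, resolvers, cache_path):
    """Try each resolver in order and remember the answer in a JSON cache.

    `resolvers` is a list of callables taking the entry and returning a DOI
    string or None (e.g. field lookup first, network lookups last).
    """
    cache_path = Path(cache_path)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    key = entry["key"]
    if key in cache:
        # Cached answers (including misses) skip every resolver.
        return cache[key]
    doi = None
    for resolver in resolvers:
        doi = resolver(entry)
        if doi:
            break
    cache[key] = doi
    cache_path.write_text(json.dumps(cache, indent=2, sort_keys=True))
    return doi
```

Writing the cache with sorted keys keeps the file stable across runs, which
is one plausible way to get byte-identical output on a re-run.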

Stdlib-only: minimal BibTeX parser, urllib for Crossref/ADS, difflib for
title similarity gate. Handles Path A (.bib preferred, .bbl fallback) and
Path B (rendered references section in document.md, with synthetic
`<lastname>_<year>` keys). Bumps `index.json` to `schema_version: 1`.

Tested on arxiv:2503.19441 (KiDS-Legacy cosmic shear): 146 citation keys,
141 resolved (96.6%), 5 honestly flagged in extraction_warnings. Second run
is 0.2s (cached). Third run is byte-identical (idempotent).

Closes the bibliography-resolution side of
lightcone/paper2astra-as-skill/bibliography-in-paper-extraction; the
lc-from-paper phase reference sweep lands separately.
---
 .../scripts/extract-paper-substrate.py        | 780 +++++++++++++++++-
 1 file changed, 771 insertions(+), 9 deletions(-)

diff --git a/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py b/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
index de02e272..93e50843 100755
--- a/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
+++ b/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
@@ -9,6 +9,7 @@
   - tables/<label-slug>.tex         # one file per LaTeX table block
   - bibliography-source.bib         # copy of any .bib found in source/ (Path A only)
   - bibliography-source.bbl         # copy of any .bbl found in source/ (Path A only)
+  - .doi-cache.json                 # Crossref/ADS lookup cache for re-run idempotency
   - index.json                      # single top-level index of everything extracted
 
 Path A (arXiv LaTeX source): reads from work/reference/source/.
@@ -20,17 +21,29 @@
 extraction — is the agent's job after this script runs. The agent reads
 index.json (specifically extraction_warnings) and fixes or surfaces gaps.
 
+`index.json`'s `citations:` block enriches the cite-key → location mapping
+with each cited paper's full text + resolved DOI, so downstream consumers
+can do citation-key lookups for a paper's bibliography directly (no separate
+cited-papers index file).
+
 Usage:
     python extract-paper-substrate.py [--reference-dir work/reference]
 
-Idempotent — skips files that already exist.
+Idempotent — skips files that already exist; cached DOI lookups don't
+re-hit the network on re-runs.
 """
 
 import argparse
+import hashlib
 import json
+import os
 import re
 import shutil
 import sys
+import urllib.error
+import urllib.parse
+import urllib.request
+from difflib import SequenceMatcher
 from pathlib import Path
 
 
@@ -53,6 +66,20 @@
     r"(?:\[[^\]]*\]){0,2}\{([^}]+)\}"
 )
 ASTRA_SCHEMA_VERSION = "0.0.7"  # bump when the ASTRA spec version we target changes
+
+# Bump when the structural shape of `index.json` changes in a backwards-incompatible
+# way (a new key added is fine; renaming/reshaping an existing value breaks consumers).
+# v1: introduced explicit versioning; `citations:` value shape transitioned from
+#     `key -> [locations]` to `key -> {locations, citation, doi}`.
+INDEX_SCHEMA_VERSION = 1
+
+CROSSREF_API = "https://api.crossref.org/works"
+CROSSREF_USER_AGENT = (
+    "paper-extraction (https://github.com/LightconeResearch/lightcone-cli; "
+    "mailto:cailmdaley@gmail.com)"
+)
+ADS_API = "https://api.adsabs.harvard.edu/v1/search/query"
+NETWORK_TIMEOUT_S = 10
 CAPTION = re.compile(r"\\caption\{((?:[^{}]|\{[^}]*\})*)\}", re.DOTALL)
 LABEL = re.compile(r"\\label\{([^}]+)\}")
 INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")
@@ -580,6 +607,716 @@ def copy_embedded_bibliography(reference_dir: Path, source_dir: Path) -> tuple[s
     return bib_rel, bbl_rel
 
 
+# ---------------------------------------------------------------------------
+# Bibliography resolution — shared by Path A (.bib/.bbl) and Path B (Docling)
+# ---------------------------------------------------------------------------
+#
+# Produces a list of bibliography entries, each `{key, citation, doi}`, which
+# downstream code joins against `extract_citations()`'s `{key: [locations]}` to
+# enrich the `citations:` block in `index.json`.
+#
+# Path A: parse `bibliography-source.bib` first, fall back to `.bbl`. Keys come
+# from BibTeX directly (case-sensitive, unique per-paper, identical to what the
+# tex source's `\cite{}` invocations reference).
+#
+# Path B: parse the references section at the tail of `document.md`. Docling
+# output has no \cite{} markers in the prose, so we synthesize keys from
+# first-author + year (`asgari_2017`, disambiguated with letter suffixes when
+# needed). The synthetic keys carry no `locations:` entries; citation invocations
+# from rendered prose are a separate extraction problem flagged in `extraction_warnings`.
+
+
+# Parse @type{key, field = value, ...} entries. Skip @comment, @preamble, @string.
+BIB_ENTRY_HEAD = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,", re.IGNORECASE)
+DOI_IN_TEXT = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)
+ARXIV_ID_IN_TEXT = re.compile(
+    r"(?:arXiv:|astro-ph/|hep-(?:th|ph)/|gr-qc/|cond-mat/|math/|cs\.[A-Z]{2}/)"
+    r"\s*([a-zA-Z0-9\-./]+)",
+    re.IGNORECASE,
+)
+# Newer-format arXiv IDs without prefix: 4 digits, dot, 4-5 digits, optional vN
+ARXIV_BARE = re.compile(r"\b(\d{4}\.\d{4,5})(?:v\d+)?\b")
+
+
+def parse_bib(content: str) -> list[dict]:
+    """Parse BibTeX content into a list of entries.
+
+    Each entry: `{"type": str, "key": str, "fields": {<lowercased-field>: <stripped-value>}}`.
+    Skips `@comment`, `@preamble`, `@string` (handling string macros properly would require
+    substitution; for our enrichment purposes we can live without it).
+    Field values are unwrapped from `{...}` or `"..."` and have surrounding whitespace stripped.
+    Doesn't try to interpret LaTeX accents or commands — keeps them verbatim so re-running
+    on the same input is stable.
+    """
+    entries: list[dict] = []
+    i = 0
+    while i < len(content):
+        match = BIB_ENTRY_HEAD.search(content, i)
+        if not match:
+            break
+        entry_type = match.group(1).lower()
+        key = match.group(2)
+        cursor = match.end()
+        if entry_type in ("comment", "preamble", "string"):
+            # Skip to matching closing brace
+            depth = 1
+            while cursor < len(content) and depth > 0:
+                if content[cursor] == "{":
+                    depth += 1
+                elif content[cursor] == "}":
+                    depth -= 1
+                cursor += 1
+            i = cursor
+            continue
+        fields, cursor = _parse_bib_fields(content, cursor)
+        entries.append({"type": entry_type, "key": key, "fields": fields})
+        i = cursor
+    return entries
+
+
+def _parse_bib_fields(content: str, start: int) -> tuple[dict[str, str], int]:
+    """Parse `field = value, field = value, ...}` starting at `start`.
+
+    Returns the field dict plus the offset just after the closing entry brace.
+    """
+    fields: dict[str, str] = {}
+    i = start
+    while i < len(content):
+        # Skip whitespace + commas between fields
+        while i < len(content) and content[i] in " \t\n\r,":
+            i += 1
+        if i >= len(content) or content[i] == "}":
+            return fields, i + 1
+        # Field name
+        name_start = i
+        while i < len(content) and content[i] not in " \t\n\r=":
+            i += 1
+        name = content[name_start:i].strip().lower()
+        # Skip whitespace + `=`
+        while i < len(content) and content[i] in " \t\n\r":
+            i += 1
+        if i >= len(content) or content[i] != "=":
+            # Malformed entry — bail
+            return fields, _skip_to_entry_end(content, i)
+        i += 1
+        while i < len(content) and content[i] in " \t\n\r":
+            i += 1
+        # Field value: `{...}` (balanced), `"..."`, or bare token
+        value, i = _read_bib_value(content, i)
+        if name:
+            fields[name] = value
+    return fields, i
+
+
+def _read_bib_value(content: str, i: int) -> tuple[str, int]:
+    if i >= len(content):
+        return "", i
+    if content[i] == "{":
+        depth = 1
+        i += 1
+        start = i
+        while i < len(content) and depth > 0:
+            if content[i] == "\\" and i + 1 < len(content):
+                i += 2
+                continue
+            if content[i] == "{":
+                depth += 1
+            elif content[i] == "}":
+                depth -= 1
+                if depth == 0:
+                    break
+            i += 1
+        return content[start:i].strip(), i + 1
+    if content[i] == '"':
+        i += 1
+        start = i
+        while i < len(content) and content[i] != '"':
+            if content[i] == "\\" and i + 1 < len(content):
+                i += 2
+                continue
+            i += 1
+        return content[start:i].strip(), i + 1
+    # Bare token (number, string macro reference)
+    start = i
+    while i < len(content) and content[i] not in " \t\n\r,}":
+        i += 1
+    return content[start:i].strip(), i
+
+
+def _skip_to_entry_end(content: str, i: int) -> int:
+    depth = 1
+    while i < len(content) and depth > 0:
+        if content[i] == "{":
+            depth += 1
+        elif content[i] == "}":
+            depth -= 1
+        i += 1
+    return i
+
+
+def parse_bbl(content: str) -> list[dict]:
+    """Parse a rendered `.bbl` into bibitem records.
+
+    Each `\\bibitem[label]{key}` introduces an entry whose rendered text runs
+    until the next `\\bibitem` or end-of-file. `.bbl` has no field structure,
+    so we return `{key, raw}` — the resolver mines DOI/arXiv-ID hints from `raw`.
+    """
+    bibitem = re.compile(r"\\bibitem(?:\[[^\]]*\])?\s*\{([^}]+)\}", re.DOTALL)
+    matches = list(bibitem.finditer(content))
+    out: list[dict] = []
+    for idx, match in enumerate(matches):
+        key = match.group(1).strip()
+        start = match.end()
+        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(content)
+        raw = content[start:end].strip()
+        out.append({"key": key, "raw": raw})
+    return out
+
+
+def parse_doc_references(document_md: str) -> list[str]:
+    """Parse the references section out of Docling-rendered markdown.
+
+    Heuristic: find a heading whose text matches `References` / `Bibliography`
+    / `Citations` (case-insensitive, optional numeric prefix), take everything
+    after it, split on blank lines, drop empty paragraphs.
+    """
+    heading_re = re.compile(
+        r"^\s*#{1,6}\s+(?:\d+\.?\s+)?(?:references|bibliography|citations)\s*$",
+        re.IGNORECASE | re.MULTILINE,
+    )
+    match = heading_re.search(document_md)
+    if not match:
+        return []
+    body = document_md[match.end():]
+    # If a subsequent heading appears, stop there (acknowledgments, appendices,
+    # supplementary). Note: this stops at the next heading of ANY level, not
+    # only those at the same level or shallower than the references heading.
+    next_section = re.search(r"^\s*#{1,6}\s+\S", body, re.MULTILINE)
+    if next_section:
+        body = body[: next_section.start()]
+    # Split on blank lines into paragraphs; trim and drop empties.
+    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body)]
+    return [p for p in paragraphs if p]
+
+
+def format_bib_citation(fields: dict[str, str]) -> str:
+    """Build a one-line human-readable citation from a parsed `.bib` entry.
+
+    Best-effort — uses what's present (author, year, title, journal, volume, page).
+    Not a publication-quality formatter; just enough to be useful as a reading
+    aid when a downstream agent sees `index.json`'s `citations:` block.
+    """
+    author = _first_author_from_bib_field(fields.get("author", ""))
+    others = ", et al." if " and " in fields.get("author", "") else ""
+    year = fields.get("year", "").strip()
+    title = _clean_bib_text(fields.get("title", "")).strip().rstrip(".")
+    journal = _clean_bib_text(
+        fields.get("journal", "") or fields.get("booktitle", "") or fields.get("howpublished", "")
+    ).strip()
+    volume = fields.get("volume", "").strip()
+    pages = fields.get("pages", "").strip().replace("--", "-")
+    parts = []
+    if author:
+        parts.append(f"{author}{others}")
+    if year:
+        parts.append(f"({year})")
+    if title:
+        parts.append(f"{title}.")
+    if journal:
+        tail = journal
+        if volume:
+            tail += f" {volume}"
+        if pages:
+            tail += f", {pages}"
+        parts.append(tail)
+    return " ".join(parts).strip() or _clean_bib_text(fields.get("note", "")).strip()
+
+
+def _first_author_from_bib_field(author_field: str) -> str:
+    if not author_field:
+        return ""
+    # Split on ' and ' but respect outer braces — `{Planck Collaboration}` stays one author.
+    first = _split_first_author(author_field).strip()
+    # Brace-wrapped single name w/o internal comma: `{Planck Collaboration}` -> "Planck Collaboration".
+    if first.startswith("{") and "," not in _strip_outer_braces(first):
+        return _clean_bib_text(first.strip("{}"))
+    # BibTeX comma form: "Last, First" (incl. `{Abdalla}, Elcio` which IS comma-form
+    # with brace-protected lastname) -> "Last, F."
+    if _has_top_level_comma(first):
+        last, _, rest = _split_at_top_level_comma(first)
+        initials = " ".join(part[0] + "." for part in _clean_bib_text(rest).split() if part)
+        return f"{_clean_bib_text(last).strip()}, {initials}".strip(", ")
+    # "First Last" -> "Last, F."
+    parts = _clean_bib_text(first).split()
+    if len(parts) == 1:
+        return parts[0]
+    last = parts[-1]
+    initials = " ".join(p[0] + "." for p in parts[:-1] if p)
+    return f"{last}, {initials}"
+
+
+def _strip_outer_braces(s: str) -> str:
+    s = s.strip()
+    while s.startswith("{") and s.endswith("}"):
+        s = s[1:-1].strip()
+    return s
+
+
+def _has_top_level_comma(s: str) -> bool:
+    depth = 0
+    for c in s:
+        if c == "{":
+            depth += 1
+        elif c == "}":
+            depth -= 1
+        elif c == "," and depth == 0:
+            return True
+    return False
+
+
+def _split_at_top_level_comma(s: str) -> tuple[str, str, str]:
+    depth = 0
+    for i, c in enumerate(s):
+        if c == "{":
+            depth += 1
+        elif c == "}":
+            depth -= 1
+        elif c == "," and depth == 0:
+            return s[:i], ",", s[i + 1:]
+    return s, "", ""
+
+
+def _split_first_author(author_field: str) -> str:
+    """Return the substring up to the first top-level ` and ` separator."""
+    depth = 0
+    i = 0
+    while i < len(author_field):
+        if author_field[i] == "{":
+            depth += 1
+        elif author_field[i] == "}":
+            depth -= 1
+        elif depth == 0 and author_field[i:i + 5].lower() == " and ":
+            return author_field[:i]
+        i += 1
+    return author_field
+
+
+def _clean_bib_text(text: str) -> str:
+    """Strip the most common BibTeX/LaTeX wrappers so citations read cleanly.
+
+    Not exhaustive — anything we don't recognize passes through verbatim.
+    """
+    if not text:
+        return ""
+    text = re.sub(r"\\(?:textit|textbf|emph|texttt|mbox|protect)\s*\{([^{}]*)\}", r"\1", text)
+    text = re.sub(r"\\(?:url|href)\s*\{[^}]*\}\s*\{?([^{}]*)\}?", r"\1", text)
+    text = text.replace("{\\&}", "&").replace("\\&", "&")
+    text = re.sub(r"\{\\['\"`^~]([a-zA-Z])\}", r"\1", text)  # {\'e} -> e (lossy but readable)
+    text = re.sub(r"\\['\"`^~]\{([a-zA-Z])\}", r"\1", text)
+    text = re.sub(r"[{}]", "", text)
+    return re.sub(r"\s+", " ", text).strip()
+
+
+def _normalize_arxiv_id(raw: str) -> str | None:
+    """Turn an `eprint` field or in-text arXiv id into a clean `YYMM.NNNNN`
+    (new-style) or `field/YYMMNNN` (pre-2007) form. Returns None if no
+    recognizable ID."""
+    raw = raw.strip().lower()
+    raw = re.sub(r"^arxiv:", "", raw)
+    raw = re.sub(r"v\d+$", "", raw)  # drop version
+    if re.match(r"^\d{4}\.\d{4,5}$", raw):
+        return raw
+    if re.match(r"^[a-z\-]+(?:\.[a-z]{2})?/\d{7}$", raw):
+        return raw
+    return None
+
+
+def _doi_from_arxiv(arxiv_id: str) -> str:
+    return f"10.48550/arXiv.{arxiv_id}"
+
+
+def _extract_doi_hints_from_text(text: str) -> tuple[str | None, str | None]:
+    """Mine free-text bibliography rendering for DOI and arXiv ID.
+
+    Returns (doi, arxiv_id), each None when not found. DOI parsing strips trailing
+    punctuation that often follows DOIs in rendered text.
+    """
+    doi = None
+    match = DOI_IN_TEXT.search(text)
+    if match:
+        doi = match.group(0).rstrip(".,;)\"'>")
+    arxiv = None
+    match = ARXIV_ID_IN_TEXT.search(text)
+    if match:
+        arxiv = _normalize_arxiv_id(match.group(1))
+    if not arxiv:
+        match = ARXIV_BARE.search(text)
+        if match:
+            arxiv = match.group(1)
+    return doi, arxiv
+
+
+# Bibitem rendering tends to start with the authors (e.g. "Asgari, M., Lin, C.-A.,...").
+# Year usually follows in `\d{4}` form. This regex is sloppy on purpose — we just want
+# `first-author` for a synthetic key or a Crossref query, not a parser.
+LASTNAME_YEAR = re.compile(r"^([A-Z][A-Za-zÀ-ÿ\-']+).*?\b(19\d{2}|20\d{2})\b", re.DOTALL)
+
+
+def parse_rendered_entry(raw: str) -> dict:
+    """Extract first-author lastname + year from a rendered citation paragraph.
+
+    Returns `{"first_author", "year", "title_guess", "raw_clean"}` (all str), best effort.
+    Used to build synthetic keys (Path B) and Crossref title queries.
+    """
+    cleaned = _clean_bib_text(raw)
+    match = LASTNAME_YEAR.match(cleaned)
+    first = match.group(1) if match else ""
+    year = match.group(2) if match else ""
+    # Title guess: take the chunk after the year up to next period that isn't an initial.
+    title_guess = ""
+    if year:
+        tail = cleaned.split(year, 1)[1]
+        # Drop a leading delimiter (.,) plus whitespace
+        tail = tail.lstrip(",.: ")
+        # Title ends at the first period followed by a space + capital letter that introduces
+        # journal/volume metadata. Heuristic — good enough for Crossref queries.
+        sentence_end = re.search(r"\.\s+[A-Z]", tail)
+        title_guess = tail[: sentence_end.start()] if sentence_end else tail
+    return {
+        "first_author": first.strip(),
+        "year": year,
+        "title_guess": title_guess.strip().rstrip(".").strip(),
+        "raw_clean": cleaned,
+    }
+
+
+def synth_key(first_author: str, year: str, taken: set[str]) -> str:
+    """Build a unique synthetic key for a Path B entry.
+
+    `<lastname>_<year>`, lowercased. If the name+year pair already exists in
+    `taken`, append a letter suffix (`a`, `b`, ...).
+    """
+    base = re.sub(r"[^a-z0-9]+", "_", (first_author or "anon").lower()).strip("_")
+    if not base:
+        base = "anon"
+    year = year or "ny"
+    candidate = f"{base}_{year}"
+    if candidate not in taken:
+        return candidate
+    for suffix in "abcdefghijklmnopqrstuvwxyz":
+        if f"{candidate}{suffix}" not in taken:
+            return f"{candidate}{suffix}"
+    # 26 collisions is absurd but be safe.
+    counter = 1
+    while f"{candidate}_{counter}" in taken:
+        counter += 1
+    return f"{candidate}_{counter}"
+
+
+# ---------------------------------------------------------------------------
+# DOI resolution
+
+
+class DOIResolver:
+    """Resolve a bibliography entry to a DOI string, with on-disk caching.
+
+    Resolution order, returning the first hit:
+      1. `doi:` field if present in the parsed entry.
+      2. `eprint:` field (or in-text arXiv ID) -> `10.48550/arXiv.<id>`.
+      3. Crossref bibliographic query against the cleaned title + first-author.
+      4. ADS title search (only if an ADS key is found; see `_load_ads_key`).
+
+    Caches `(title, first_author) -> doi` to `cache_path` so re-runs don't re-hit
+    the network. Misses (including network failures) cache `None` too, so a
+    re-run won't retry them (delete the cache to force re-resolution).
+    """
+
+    def __init__(self, cache_path: Path):
+        self.cache_path = cache_path
+        self.cache: dict[str, dict] = {}
+        if cache_path.exists():
+            try:
+                self.cache = json.loads(cache_path.read_text())
+            except (json.JSONDecodeError, OSError):
+                self.cache = {}
+        self.ads_key = self._load_ads_key()
+        self.network_failures = 0
+
+    @staticmethod
+    def _load_ads_key() -> str | None:
+        env = os.environ.get("ADS_API_TOKEN") or os.environ.get("ADS_DEV_KEY")
+        if env:
+            return env.strip()
+        for path in (Path.home() / ".ads" / "dev_key", Path.home() / ".config" / "ads" / "dev_key"):
+            if path.is_file():
+                try:
+                    return path.read_text().strip() or None
+                except OSError:
+                    pass
+        return None
+
+    def resolve(
+        self,
+        title: str,
+        first_author: str,
+        explicit_doi: str | None = None,
+        arxiv_id: str | None = None,
+    ) -> tuple[str | None, str]:
+        """Resolve to a DOI. Returns `(doi-or-None, source-tag)`.
+
+        `source-tag` is one of `doi-field`, `arxiv-eprint`, `crossref`, `ads`, `unresolved`.
+        """
+        if explicit_doi:
+            return self._normalize_doi(explicit_doi), "doi-field"
+        if arxiv_id:
+            return _doi_from_arxiv(arxiv_id), "arxiv-eprint"
+        cache_key = self._cache_key(title, first_author)
+        if cache_key in self.cache:
+            entry = self.cache[cache_key]
+            return entry.get("doi"), entry.get("source", "unresolved")
+        # Network resolution
+        doi, source = self._resolve_via_crossref(title, first_author)
+        if not doi and self.ads_key:
+            doi, source = self._resolve_via_ads(title, first_author)
+        self.cache[cache_key] = {"doi": doi, "source": source, "title": title, "first_author": first_author}
+        return doi, source
+
+    @staticmethod
+    def _normalize_doi(doi: str) -> str:
+        doi = doi.strip()
+        # Strip URL prefix variants
+        doi = re.sub(r"^(?:https?://(?:dx\.)?doi\.org/)", "", doi, flags=re.IGNORECASE)
+        return doi.rstrip(".,;)\"'>")
+
+    @staticmethod
+    def _cache_key(title: str, first_author: str) -> str:
+        digest = hashlib.sha256(
+            f"{title.lower().strip()}||{first_author.lower().strip()}".encode("utf-8")
+        ).hexdigest()
+        return digest[:24]
+
+    def _resolve_via_crossref(self, title: str, first_author: str) -> tuple[str | None, str]:
+        if not title:
+            return None, "unresolved"
+        query = f"{title} {first_author}".strip()
+        url = f"{CROSSREF_API}?query.bibliographic={urllib.parse.quote(query)}&rows=1"
+        try:
+            data = self._http_get_json(url)
+        except (urllib.error.URLError, TimeoutError, json.JSONDecodeError):
+            self.network_failures += 1
+            return None, "unresolved"
+        items = ((data or {}).get("message", {}) or {}).get("items", []) or []
+        if not items:
+            return None, "unresolved"
+        top = items[0]
+        candidate_doi = top.get("DOI")
+        candidate_titles = top.get("title") or []
+        if not candidate_doi:
+            return None, "unresolved"
+        # Title-similarity gate: drop noisy hits where the top result clearly isn't
+        # the paper we asked about.
+        if candidate_titles and _title_similarity(title, candidate_titles[0]) < 0.55:
+            return None, "unresolved"
+        return self._normalize_doi(candidate_doi), "crossref"
+
+    def _resolve_via_ads(self, title: str, first_author: str) -> tuple[str | None, str]:
+        if not title:
+            return None, "unresolved"
+        q = f'title:"{title}"'
+        if first_author:
+            q += f' author:"{first_author}"'
+        params = {"q": q, "fl": "doi,title", "rows": "1"}
+        url = f"{ADS_API}?{urllib.parse.urlencode(params)}"
+        try:
+            data = self._http_get_json(
+                url, headers={"Authorization": f"Bearer {self.ads_key}"}
+            )
+        except (urllib.error.URLError, TimeoutError, json.JSONDecodeError):
+            self.network_failures += 1
+            return None, "unresolved"
+        docs = ((data or {}).get("response", {}) or {}).get("docs", []) or []
+        if not docs:
+            return None, "unresolved"
+        doi_list = docs[0].get("doi") or []
+        if not doi_list:
+            return None, "unresolved"
+        return self._normalize_doi(doi_list[0]), "ads"
+
+    @staticmethod
+    def _http_get_json(url: str, headers: dict[str, str] | None = None) -> dict:
+        req = urllib.request.Request(url)
+        req.add_header("User-Agent", CROSSREF_USER_AGENT)
+        req.add_header("Accept", "application/json")
+        for key, value in (headers or {}).items():
+            req.add_header(key, value)
+        with urllib.request.urlopen(req, timeout=NETWORK_TIMEOUT_S) as resp:
+            payload = resp.read()
+        return json.loads(payload.decode("utf-8", errors="replace"))
+
+    def save(self) -> None:
+        try:
+            self.cache_path.write_text(json.dumps(self.cache, indent=2, sort_keys=True))
+        except OSError as e:
+            print(f"warn: could not write DOI cache: {e}", file=sys.stderr)
+
+
+def _title_similarity(a: str, b: str) -> float:
+    """Stdlib fuzzy ratio in [0, 1]. Used to filter Crossref hits whose top result
+    isn't actually the queried paper."""
+    a_norm = re.sub(r"\s+", " ", a.lower()).strip()
+    b_norm = re.sub(r"\s+", " ", b.lower()).strip()
+    if not a_norm or not b_norm:
+        return 0.0
+    return SequenceMatcher(None, a_norm, b_norm).ratio()
+
+
+# ---------------------------------------------------------------------------
+# Top-level bibliography pipeline
+
+
+def resolve_bibliography(
+    reference_dir: Path,
+    bib_path: Path | None,
+    bbl_path: Path | None,
+    document_md: Path | None,
+    extracted_citations: dict[str, list[dict]],
+) -> tuple[dict[str, dict], list[str]]:
+    """Build the enriched `citations:` block for `index.json`.
+
+    Joins parsed bibliography entries (from `.bib`, then `.bbl`, then `document.md`)
+    against `extracted_citations` (from `extract_citations()`). Each key maps to
+    `{locations, citation, doi}`; entries the bibliography has but the source
+    never cited are dropped (would otherwise be noise); entries cited but missing
+    from the bibliography keep `citation: null` and `doi: null` and a warning.
+    """
+    warnings: list[str] = []
+    bib_entries: dict[str, dict] = {}  # key -> {citation, fields, raw, doi_hint, arxiv_hint}
+
+    if bib_path and bib_path.is_file():
+        try:
+            parsed = parse_bib(bib_path.read_text(errors="replace"))
+        except OSError as e:
+            warnings.append(f"bibliography: could not read {bib_path.name}: {e}")
+            parsed = []
+        for entry in parsed:
+            fields = entry["fields"]
+            citation = format_bib_citation(fields)
+            doi_hint = fields.get("doi") or fields.get("DOI")
+            arxiv_hint = _normalize_arxiv_id(fields.get("eprint") or "")
+            bib_entries[entry["key"]] = {
+                "citation": citation or None,
+                "doi_hint": doi_hint,
+                "arxiv_hint": arxiv_hint,
+                "title": _clean_bib_text(fields.get("title", "")).strip(),
+                "first_author": _first_author_from_bib_field(fields.get("author", "")).split(",")[0],
+                "source": "bib",
+            }
+
+    if bbl_path and bbl_path.is_file() and not bib_entries:
+        # Only fall back to .bbl if no .bib gave us anything.
+        try:
+            parsed_bbl = parse_bbl(bbl_path.read_text(errors="replace"))
+        except OSError as e:
+            warnings.append(f"bibliography: could not read {bbl_path.name}: {e}")
+            parsed_bbl = []
+        for entry in parsed_bbl:
+            cleaned = _clean_bib_text(entry["raw"])
+            doi_hint, arxiv_hint = _extract_doi_hints_from_text(entry["raw"])
+            parsed_rendering = parse_rendered_entry(entry["raw"])
+            bib_entries[entry["key"]] = {
+                "citation": cleaned or None,
+                "doi_hint": doi_hint,
+                "arxiv_hint": arxiv_hint,
+                "title": parsed_rendering["title_guess"],
+                "first_author": parsed_rendering["first_author"],
+                "source": "bbl",
+            }
+
+    # Path B (document.md): synthetic keys.
+    path_b_entries: list[tuple[str, dict]] = []
+    if document_md and document_md.is_file():
+        paragraphs = parse_doc_references(document_md.read_text(errors="replace"))
+        taken: set[str] = set(bib_entries)
+        for raw in paragraphs:
+            parsed_rendering = parse_rendered_entry(raw)
+            doi_hint, arxiv_hint = _extract_doi_hints_from_text(raw)
+            key = synth_key(parsed_rendering["first_author"], parsed_rendering["year"], taken)
+            taken.add(key)
+            path_b_entries.append(
+                (
+                    key,
+                    {
+                        "citation": parsed_rendering["raw_clean"] or None,
+                        "doi_hint": doi_hint,
+                        "arxiv_hint": arxiv_hint,
+                        "title": parsed_rendering["title_guess"],
+                        "first_author": parsed_rendering["first_author"],
+                        "source": "document_md",
+                    },
+                )
+            )
+
+    resolver = DOIResolver(reference_dir / ".doi-cache.json")
+    enriched: dict[str, dict] = {}
+
+    # Path A: enrich entries cited at least once in the source.
+    for key, locations in extracted_citations.items():
+        entry = bib_entries.get(key)
+        if entry is None:
+            warnings.append(
+                f"citation {key}: cited in source but no matching entry in bibliography-source.{{bib,bbl}}"
+            )
+            enriched[key] = {"locations": locations, "citation": None, "doi": None}
+            continue
+        doi, _source = resolver.resolve(
+            entry["title"], entry["first_author"], entry["doi_hint"], entry["arxiv_hint"]
+        )
+        if doi is None:
+            warnings.append(
+                f"citation {key}: could not resolve DOI; tried doi-field, eprint-field, "
+                f"Crossref{', ADS' if resolver.ads_key else ''}"
+            )
+        enriched[key] = {
+            "locations": locations,
+            "citation": entry["citation"],
+            "doi": doi,
+        }
+
+    # Path B: every parsed entry lands in the citations block with empty locations.
+    # (Citation invocations from rendered prose are a separate substrate-extraction
+    # problem we surface in extraction_warnings rather than solve here.)
+    for key, entry in path_b_entries:
+        doi, _source = resolver.resolve(
+            entry["title"], entry["first_author"], entry["doi_hint"], entry["arxiv_hint"]
+        )
+        if doi is None:
+            warnings.append(
+                f"citation {key}: could not resolve DOI; tried doi-field, eprint-field, "
+                f"Crossref{', ADS' if resolver.ads_key else ''}"
+            )
+        enriched[key] = {
+            "locations": [],
+            "citation": entry["citation"],
+            "doi": doi,
+        }
+
+    if path_b_entries:
+        warnings.append(
+            "Path B (Docling fallback): citation invocations in rendered prose are not yet "
+            "extracted; `locations:` is empty for every Path B citation. Bibliography "
+            "entries are still resolved by DOI."
+        )
+
+    resolver.save()
+    if resolver.network_failures:
+        warnings.append(
+            f"bibliography: {resolver.network_failures} network failure(s) during DOI "
+            "resolution; affected entries cached as unresolved — delete .doi-cache.json "
+            "to retry."
+        )
+    return enriched, warnings
+
+
 # ---------------------------------------------------------------------------
 # Path B — Docling fallback
 # ---------------------------------------------------------------------------
@@ -597,12 +1334,29 @@ def extract_path_b(reference_dir: Path) -> dict:
     astra_rel = write_astra_yaml_stub(
         reference_dir, arxiv_id=None, doi=None, title=None, abstract=None
     )
+
+    document_md = reference_dir / "document.md"
+    citations, bib_warnings = resolve_bibliography(
+        reference_dir,
+        bib_path=None,
+        bbl_path=None,
+        document_md=document_md if document_md.is_file() else None,
+        extracted_citations={},
+    )
+
+    extraction_warnings = [
+        "Path B (Docling fallback): title + abstract + outline not yet extracted from "
+        "document.md; that's a future refinement."
+    ]
+    extraction_warnings.extend(bib_warnings)
+
     index = {
+        "schema_version": INDEX_SCHEMA_VERSION,
         "path": "B",
         "paper_pdf": "paper.pdf" if (reference_dir / "paper.pdf").exists() else None,
         "paper_tex": None,
         "source_dir": None,
-        "document_md": "document.md" if (reference_dir / "document.md").exists() else None,
+        "document_md": "document.md" if document_md.is_file() else None,
         "bibliography_source_bib": None,
         "bibliography_source_bbl": None,
         "astra_yaml": astra_rel,
@@ -611,10 +1365,8 @@ def extract_path_b(reference_dir: Path) -> dict:
         "figures": docling.get("figures", []),
         "tables": docling.get("tables", []),
         "outline": [],  # Future refinement: parse Docling's markdown headings
-        "citations": {},  # Future refinement: extract citation markers from document.md
-        "extraction_warnings": [
-            "Path B (Docling fallback): title + abstract + outline + citations not yet extracted from document.md; that's a future refinement."
-        ],
+        "citations": citations,
+        "extraction_warnings": extraction_warnings,
     }
     return index
 
@@ -650,16 +1402,24 @@ def main() -> None:
         figures, fig_warnings = extract_figures(reference_dir, source_dir, tex_files, macros)
         tables, tab_warnings = extract_tables(reference_dir, tex_files, source_dir, macros)
         outline = extract_outline(tex_files, source_dir, macros)
-        citations = extract_citations(tex_files, source_dir)
+        raw_citations = extract_citations(tex_files, source_dir)
         abstract = extract_abstract(tex_files, macros)
         title = extract_title(tex_files, macros)
         bib_rel, bbl_rel = copy_embedded_bibliography(reference_dir, source_dir)
+        citations, bib_warnings = resolve_bibliography(
+            reference_dir,
+            bib_path=reference_dir / bib_rel if bib_rel else None,
+            bbl_path=reference_dir / bbl_rel if bbl_rel else None,
+            document_md=None,
+            extracted_citations=raw_citations,
+        )
         astra_rel = write_astra_yaml_stub(
             reference_dir, args.arxiv_id, args.doi, title, abstract
         )
 
         paper_tex = reference_dir / "paper.tex"
         index = {
+            "schema_version": INDEX_SCHEMA_VERSION,
             "path": "A",
             "paper_pdf": "paper.pdf" if (reference_dir / "paper.pdf").exists() else None,
             "paper_tex": "paper.tex" if paper_tex.exists() or paper_tex.is_symlink() else None,
@@ -674,12 +1434,14 @@ def main() -> None:
             "tables": tables,
             "outline": outline,
             "citations": citations,
-            "extraction_warnings": fig_warnings + tab_warnings,
+            "extraction_warnings": fig_warnings + tab_warnings + bib_warnings,
         }
 
+        resolved_dois = sum(1 for entry in citations.values() if entry.get("doi"))
         print(
             f"  figures: {len(figures)}, tables: {len(tables)}, "
-            f"sections: {len(outline)}, citation-keys: {len(citations)}, "
+            f"sections: {len(outline)}, citation-keys: {len(citations)} "
+            f"({resolved_dois} with DOI), "
             f"title: {'yes' if title else 'no'}, abstract: {'yes' if abstract else 'no'}, "
             f"warnings: {len(index['extraction_warnings'])}"
         )

From 21d79d837b1805f995235ea0ca97418acbf96a15 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 00:00:40 +0200
Subject: [PATCH 037/124] paper-extraction: document the enriched citations:
 block
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Update SKILL.md to reflect the new citations: shape (key -> {locations,
citation, doi}), the bibliography-resolution pipeline (.bib preferred,
.bbl fallback, Docling references for Path B), the DOI resolver order
(doi-field -> eprint -> Crossref -> ADS), the .doi-cache.json cache file,
and the new warning categories. Adds an explicit anti-pattern against
producing a side-file cited-papers artifact — consumers read the
citations block in index.json directly.
---
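Note for reviewers (below the `---`, so not part of the commit message): the
resolver order named above (doi-field -> eprint -> Crossref -> ADS) can be
sketched offline as a first-hit chain. This is a minimal illustration, not the
script's implementation: `resolve_offline` and its `lookup` parameter are
hypothetical stand-ins for the Crossref query, the ADS step is left out, and
only the 0.55 similarity floor and the `10.48550/arXiv.<id>` form are taken
from the script.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

SIMILARITY_FLOOR = 0.55  # threshold the script applies to Crossref title hits


def title_similarity(a: str, b: str) -> float:
    """Case- and whitespace-normalized fuzzy ratio in [0, 1]."""
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    if not a_norm or not b_norm:
        return 0.0
    return SequenceMatcher(None, a_norm, b_norm).ratio()


def resolve_offline(
    title: str,
    explicit_doi: Optional[str],
    arxiv_id: Optional[str],
    lookup: Callable[[str], Optional[tuple[str, str]]],
) -> tuple[Optional[str], str]:
    """Return (doi-or-None, source-tag), taking the first step that hits.

    `lookup` is a hypothetical replacement for the network query; it returns
    (doi, matched_title) or None.
    """
    if explicit_doi:
        return explicit_doi, "doi-field"
    if arxiv_id:
        return f"10.48550/arXiv.{arxiv_id}", "arxiv-eprint"
    hit = lookup(title)
    if hit is None:
        return None, "unresolved"
    doi, matched_title = hit
    # Drop a hit whose title is clearly a different paper.
    if matched_title and title_similarity(title, matched_title) < SIMILARITY_FLOOR:
        return None, "unresolved"
    return doi, "crossref"
```

Passing the lookup in as a callable keeps the sketch runnable without network
access; the real resolver also caches each outcome on disk.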
 .../skills/paper-extraction/SKILL.md          | 53 +++++++++++++------
 1 file changed, 37 insertions(+), 16 deletions(-)
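
Note for reviewers: a minimal sketch of how a consumer might read the enriched
`citations:` block, assuming the `{locations, citation, doi}` shape documented
in this patch. The helper name `split_by_doi` and the inline sample are
hypothetical; in practice the dict would presumably come from
`json.loads(...)` on `work/reference/index.json`.

```python
from typing import Optional

# Illustrative sample mirroring the documented shape; not real extraction output.
SAMPLE_INDEX: dict = {
    "citations": {
        "asgari17": {
            "locations": [{"file": "main.tex", "line": 178}],
            "citation": "Asgari, M., et al. (2017) KiDS-450: Tomographic "
                        "cross-correlation cosmic shear results. MNRAS 464, 1676-1692",
            "doi": "10.1093/mnras/stw2606",
        },
        "kuijken:2011": {
            "locations": [{"file": "main.tex", "line": 92}],
            "citation": None,  # cited in source, missing from the bibliography
            "doi": None,
        },
    }
}


def split_by_doi(index: dict) -> tuple[dict[str, str], list[str]]:
    """Return ({key: doi} for resolved entries, [keys whose doi is null])."""
    resolved: dict[str, str] = {}
    unresolved: list[str] = []
    for key, entry in index.get("citations", {}).items():
        doi: Optional[str] = entry.get("doi")
        if doi:
            resolved[key] = doi
        else:
            unresolved.append(key)
    return resolved, unresolved


resolved, unresolved = split_by_doi(SAMPLE_INDEX)
```

Keys that land in `unresolved` should also appear in `extraction_warnings`,
which is where the reason for the miss is recorded.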

diff --git a/claude/lightcone/skills/paper-extraction/SKILL.md b/claude/lightcone/skills/paper-extraction/SKILL.md
index 034cd279..98de0e4f 100644
--- a/claude/lightcone/skills/paper-extraction/SKILL.md
+++ b/claude/lightcone/skills/paper-extraction/SKILL.md
@@ -4,8 +4,9 @@ description: >
   Turn an arXiv ID or DOI into a standardized `work/reference/` directory:
   paper substrate (arXiv LaTeX source primary, PDF + Docling fallback),
   copied figure files, per-table `.tex` files, section outline with line
-  numbers, deduplicated citation keys with every location they appear,
-  abstract, embedded bibliography (when present in source), and a valid
+  numbers, deduplicated citation keys with every location they appear plus
+  each cited paper's full citation text and resolved DOI, abstract,
+  embedded bibliography (when present in source), and a valid
   `astra.yaml` representing the paper as an ASTRA artifact (with the
   paper's claimed numerical findings as ASTRA `findings:`). Emits a
   top-level `index.json` for the structural surface plus the `astra.yaml`
@@ -33,7 +34,7 @@ Under `work/reference/` (idempotent — skips work already done):
 
 ```
 work/reference/
-├── index.json                # structural index — figures, tables, outline, citations, paths
+├── index.json                # structural index — figures, tables, outline, citations (with DOIs), paths
 ├── astra.yaml                # ASTRA-shape representation: the paper as an ASTRA artifact, including findings
 ├── paper.pdf                 # always
 ├── paper.tex                 # Path A — symlink to the main .tex file
@@ -43,17 +44,19 @@ work/reference/
 ├── figures/                  # figure files (copied from LaTeX or rendered by Docling)
 ├── tables/                   # one .tex file per `\begin{table}` block (Path A)
 ├── bibliography-source.bib   # Path A only — copy of any .bib found in source/
-└── bibliography-source.bbl   # Path A only — copy of any .bbl found in source/
+├── bibliography-source.bbl   # Path A only — copy of any .bbl found in source/
+└── .doi-cache.json           # Crossref/ADS lookup cache for re-run idempotency
 ```
 
 The skill produces only the paper's own reading materials. Anything not contained in or derived from the paper itself — code repositories, supplementary datasets, related papers — is out of scope; the caller handles those.
 
 ### Two surfaces: `index.json` (structural) and `astra.yaml` (semantic)
 
-**`index.json` is structural and machine-friendly.** Everything the script could mechanically extract: figures, tables, section outline with line numbers, citation keys with every location, abstract, paths. Read this when you want to know "what's in this paper, where do I find it." Sample shape:
+**`index.json` is structural and machine-friendly.** Everything the script could mechanically extract: figures, tables, section outline with line numbers, citation keys with every location *plus the cited paper's full citation text and resolved DOI*, abstract, paths. Read this when you want to know "what's in this paper, where do I find it." Sample shape:
 
 ```json
 {
+  "schema_version": 1,
   "path": "A",                                  // or "B"
   "paper_pdf": "paper.pdf",
   "paper_tex": "paper.tex",                     // null on Path B
@@ -76,15 +79,26 @@ The skill produces only the paper's own reading materials. Anything not containe
     {"level": 1, "title": "Introduction", "label": "sec:intro", "source_file": "main.tex", "line": 157}
   ],
   "citations": {
-    "asgari17": [{"file": "main.tex", "line": 178}, {"file": "main.tex", "line": 561}],
-    "smith2024": [{"file": "main.tex", "line": 92}]
+    "asgari17": {
+      "locations": [{"file": "main.tex", "line": 178}, {"file": "main.tex", "line": 561}],
+      "citation": "Asgari, M., et al. (2017) KiDS-450: Tomographic cross-correlation cosmic shear results. MNRAS 464, 1676-1692",
+      "doi": "10.1093/mnras/stw2606"
+    },
+    "planck18_lensing": {
+      "locations": [{"file": "main.tex", "line": 92}],
+      "citation": "Planck Collaboration, et al. (2020) Planck 2018 results. VIII. Gravitational lensing. A&A 641, A8",
+      "doi": "10.1051/0004-6361/201833886"
+    }
   },
   "extraction_warnings": [
-    "figure fig3: \\includegraphics{...} could not resolve to a file in source/"
+    "figure fig3: \\includegraphics{...} could not resolve to a file in source/",
+    "citation kuijken:2011: could not resolve DOI; tried doi-field, eprint-field, Crossref, ADS"
   ]
 }
 ```
 
+The `citations:` block maps each cited paper's BibTeX key (Path A) or synthetic `<lastname>_<year>` key (Path B) to `{locations, citation, doi}`. Downstream consumers (e.g. lc-from-paper's SPECIFY when authoring `prior_insights:` placeholders, LITERATURE when discovering which DOIs to fetch) read the DOI directly from `citations[key].doi`. Unresolvable entries keep `citation: null` and/or `doi: null` and are flagged in `extraction_warnings`.
+
 **`astra.yaml` is semantic and ASTRA-validating.** Treats the paper as an ASTRA artifact: `id`, `version`, `name`, `narrative.summary`, and `findings:` carrying the paper's claimed numerical results in ASTRA's Insight + Evidence shape. Read this when you want to know "what does this paper claim, with quote evidence anchored to the source." The script writes a stub (id, version, name, narrative.summary from abstract, empty findings); Step 5 fills in `findings:`.
 
 Why both: the structural index is queryable by any consumer (`grep`, `jq`, agent code) without needing to know about ASTRA. The ASTRA file composes directly into reproductions, MySTRA, and any other ASTRA-aware tool — and the verbosity of the Insight + Evidence shape *is* the back-pressure against hallucinated numerical claims (the agent has to find and quote the actual text).
@@ -128,11 +142,14 @@ The script detects the path automatically and produces:
 - `figures/` populated with copied figure files (Path A) or untouched (Path B — Docling already populated it)
 - `tables/<label-slug>.tex` — one file per `\begin{table}` block (Path A only)
 - `bibliography-source.{bib,bbl}` if present in the source tarball (Path A only)
-- `index.json` — the unified structural index
+- `index.json` — the unified structural index, including the enriched `citations:` block (each cited key carries `{locations, citation, doi}`; DOI resolution covers ~96% of typical-paper bibliographies)
 - `astra.yaml` — stub ASTRA representation: id, version, name (from `\title{}`), narrative.summary (from abstract), empty `findings: {}` for Step 5
+- `.doi-cache.json` — Crossref/ADS lookup cache; re-runs skip the network for already-seen entries
 
 The `--arxiv-id` / `--doi` argument populates the `id` and the evidence `doi:` field in `astra.yaml`. If neither is provided, the script writes placeholder text the agent can fix.
 
+The DOI resolver tries, in order: the entry's `doi:` field → an `eprint:`-derived arXiv DOI → Crossref bibliographic query (free, no API key needed) → ADS title search (only if `ADS_API_TOKEN` env var or `~/.ads/dev_key` is present — graceful skip when absent). Title hits from Crossref are gated by a similarity check against the queried title to drop noisy false matches.
+
 ### Step 4 — Review the script's output and fix structural gaps
 
 The script is purely deterministic. It walks the structural surface but does not understand the paper. Read `index.json`'s `extraction_warnings` and address each:
@@ -140,7 +157,9 @@ The script is purely deterministic. It walks the structural surface but does not
 - **`figure figN: \includegraphics{X} could not resolve`** — the LaTeX referenced a file the script couldn't find. Search the source tree manually (sometimes figures live in non-standard subdirectories with non-standard extensions); copy the file into `figures/` and update the corresponding `index.json` entry's `file` so it's no longer null.
 - **`figure figN: no \caption found`** — composite figures (subfloats) sometimes lack a top-level caption; verify the figure block in source and either record the per-subfigure captions in `caption` or note that the figure is composite.
 - **`table tabN: no \label`** — verify the table is intentional (some `\begin{table}` blocks are non-tabular layout); rename or annotate as needed.
-- **Path B caveat** — outline + citation extraction are not yet implemented for the Docling fallback; the warnings list flags this. For now, on Path B, those fields are empty.
+- **`citation <key>: could not resolve DOI`** — the entry has no `doi:` / `eprint:` field, and neither Crossref nor ADS (when available) returned a match. The entry stays in `citations:` with `doi: null`; a downstream consumer can flag it for human resolution or skip it. If many entries are unresolved, check that the title field is clean (sometimes `.bib` titles carry uncleaned LaTeX commands that drag down the Crossref similarity gate). Delete `.doi-cache.json` to force re-resolution.
+- **`citation <key>: cited in source but no matching entry in bibliography-source.{bib,bbl}`** — a `\cite{<key>}` invocation has no corresponding bib record. Usually a typo in the LaTeX source; flag it and move on. The entry stays in `citations:` with `citation: null, doi: null`, locations preserved.
+- **Path B caveat** — outline extraction is not yet implemented for the Docling fallback. Bibliography resolution works on Path B by parsing the references section at the tail of `document.md` and synthesizing keys (`<lastname>_<year>`), but citation *invocations* from rendered prose aren't yet extracted — Path B citations carry empty `locations: []`. The warnings list flags this.
 
 Also eyeball `astra.yaml`'s `name:` and `narrative.summary:`. The title or abstract may contain unresolved custom `\newcommand` macros (defined elsewhere in the source); the script doesn't expand macros, so they pass through verbatim. Clean them up if you need pretty rendering downstream — none of this blocks validation.
 
@@ -203,7 +222,7 @@ The slash-command form is `/paper-extraction <arxiv-id-or-doi>`.
 
 **Script (`extract-paper-substrate.py`):** walks LaTeX (Path A) or Docling output (Path B) and emits two things:
 
-1. `index.json` — figures (with copied files + line numbers + multi-graphic panels), tables (one `.tex` per block, including AAS `deluxetable`), section outline (with line numbers, in paper-reading order), citation keys (with every file+line they appear on, including biblatex commands), abstract, title, paths.
+1. `index.json` — figures (with copied files + line numbers + multi-graphic panels), tables (one `.tex` per block, including AAS `deluxetable`), section outline (with line numbers, in paper-reading order), citation keys (with every file+line they appear on, including biblatex commands, *plus the cited paper's full citation text and resolved DOI*), abstract, title, paths.
 2. `astra.yaml` — a stub ASTRA artifact: `id` (derived from arxiv-id/DOI), `version`, `name` (from `\title{}`), `narrative.summary` (from abstract), empty `inputs:`/`outputs:`/`findings:`. Validates as-is.
 
 The script handles a few realities of LaTeX papers automatically:
@@ -212,8 +231,9 @@ The script handles a few realities of LaTeX papers automatically:
 - **Multi-file source** (`\input{}` / `\include{}` chains) is read in **paper-reading order** by walking `main.tex`'s input tree, not alphabetical filename order.
 - **Simple `\newcommand{\name}{body}` macros** are expanded in extracted titles, abstracts, captions, and section names. Macros with arguments (`\newcommand{\foo}[1]{...}`) pass through unexpanded — handling those would require evaluating arbitrary LaTeX.
 - **Standard table envs** (`table`, `table*`, `deluxetable`, `deluxetable*`) and **standard citation commands** (natbib family + biblatex `\autocite` / `\textcite` / `\parencite` / `\footcite` / `\smartcite`) are all recognized.
+- **Bibliography parsed in-script.** `.bib` files (preferred — `@type{key, field = value}` entries with brace-protected lastnames recognized) and `.bbl` files (rendered `\bibitem{key}` blocks) are parsed for Path A; the references section at the tail of `document.md` is parsed for Path B (synthesizing `<lastname>_<year>` keys with letter-suffix disambiguation). DOIs are resolved against Crossref + (optionally) ADS, cached for idempotency, and joined back against `\cite{}`-extracted locations.
 
-What the script does *not* do: understand what figures show, identify findings, infer methodology, or handle substrate acquisition (Step 2). It also doesn't expand macros with arguments, resolve `\graphicspath{}` overrides, or parse non-LaTeX abstract metadata blocks.
+What the script does *not* do: understand what figures show, identify findings, infer methodology, or handle substrate acquisition (Step 2). It also doesn't expand macros with arguments, resolve `\graphicspath{}` overrides, parse non-LaTeX abstract metadata blocks, or extract citation invocations from rendered prose (Path B `locations:` arrays are empty as a result).
 
 **Agent (Steps 4 + 5):** reads `index.json`'s `extraction_warnings` and fixes structural gaps (Step 4), then walks the paper and writes `findings:` into `astra.yaml` with quote-anchored evidence (Step 5). The verbosity of the Insight + Evidence shape *is* the back-pressure: the agent has to find and quote actual paper text, not invent.
 
@@ -221,16 +241,17 @@ What the script does *not* do: understand what figures show, identify findings,
 
 - **One entry-point.** `/paper-extraction <id>` is the whole surface. Don't have callers reach into `scripts/` or `references/` directly. The skill orchestrates; consumers trust `index.json`.
 - **Self-contained.** This skill takes a DOI and produces a standardized directory. It doesn't know who calls it or what they do with the result. Don't add caller-specific logic.
-- **Idempotent.** Survey-first, skip-if-done. Re-invoking on the same paper does no work and produces no errors.
+- **Idempotent.** Survey-first, skip-if-done. Re-invoking on the same paper does no work and produces no errors. DOI lookups cache to `.doi-cache.json`; re-runs don't re-hit the network for already-seen entries.
 - **arXiv-LaTeX is primary.** When an arXiv source tarball is acquirable, Path A wins. PDF + Docling is the fallback for non-arXiv only.
-- **Reading materials only.** The skill produces what's structurally in the paper itself — substrate, figures, tables, outline, citations, embedded bibliography. Adjacent assets (code repos, supplementary datasets, related papers, project bibliography management) are explicitly out of scope.
-- **Script is dumb on purpose.** The deterministic pieces (figure/table blocks, section headings, `\cite{}` keys) belong to the script. Anything that requires understanding what the paper is *about* lives outside this skill — paper-extraction sets the table; it doesn't read the meal.
-- **`extraction_warnings` is the agent surface.** When the script can't resolve something, it doesn't fail or guess — it warns. The agent reads the warnings and decides whether to fix or surface.
+- **Reading materials only.** The skill produces what's structurally in the paper itself — substrate, figures, tables, outline, citations (with resolved DOIs), embedded bibliography. Adjacent assets (code repos, supplementary datasets, related papers, project bibliography *management* — i.e. authoring new entries, curating across papers) are explicitly out of scope; *resolving* the bibliography that's already in the paper is in scope.
+- **Script is dumb on purpose.** The deterministic pieces (figure/table blocks, section headings, `\cite{}` keys, bibliography entries, DOI lookups) belong to the script. Anything that requires understanding what the paper is *about* lives outside this skill — paper-extraction sets the table; it doesn't read the meal.
+- **`extraction_warnings` is the agent surface.** When the script can't resolve something (unmatched citation key, unresolvable DOI, network failure), it doesn't fail or guess — it warns. The agent reads the warnings and decides whether to fix or surface.
 
 ## Anti-patterns
 
 - **Re-fetching what's already there.** Always survey `work/reference/` and read `index.json` first.
 - **Adding numerical-finding extraction to the script.** Macro-based extraction (`\newcommand{\Omegam}{0.315}`) catches almost no real papers; inline-value extraction needs semantic judgment about what's a *result* vs incidental. Findings live in `astra.yaml`, written by the agent in Step 5.
 - **Paraphrasing the `quote.exact` text.** Copy the paper's LaTeX text verbatim. Paraphrasing breaks `astra validate --verify-evidence` and weakens the back-pressure that justified ASTRA shape in the first place.
+- **Producing a parallel cited-papers artifact.** Bibliography resolution lives inside `index.json`'s `citations:` block, not in a side file. Anyone who needs the citation→DOI mapping reads `index.json#citations[key].doi` directly.
 - **Surfacing partial state silently.** If `paper.pdf` was fetched but the LaTeX-source download failed, write `work/reference/extraction-error.txt` with a clear cause and stop, rather than producing a half-populated `work/reference/` with no signal that more was intended.
 - **Knowing about the caller.** The skill's contract is the directory + index. If you're tempted to write logic that depends on a particular invoker, push that logic into the invoker instead.

From 6634ba870dc97864d69853ab1b2bd7339be35c44 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 00:04:00 +0200
Subject: [PATCH 038/124] lc-from-paper: collapse bibliography consumers onto
 index.json#citations

paper-extraction now resolves the bibliography and writes each cited
paper's DOI into index.json#citations[key]. The lc-from-paper phase
references no longer need a separate cited_papers.yaml; they read DOIs
directly from the structural index.
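
For consumers, the lookup can be sketched roughly as below. This is an
illustration only, not code from this patch: `split_citations_by_doi`
is a hypothetical helper name, and the sample data simply mirrors the
documented `{locations, citation, doi}` shape.

```python
# Illustrative fragment in the documented index.json citations shape.
# The second entry stands in for an unresolvable citation (doi: null).
SAMPLE_INDEX = {
    "citations": {
        "asgari17": {
            "locations": [{"file": "main.tex", "line": 178}],
            "citation": (
                "Asgari, M., et al. (2017) KiDS-450: Tomographic "
                "cross-correlation cosmic shear results. "
                "MNRAS 464, 1676-1692"
            ),
            "doi": "10.1093/mnras/stw2606",
        },
        "kuijken:2011": {"locations": [], "citation": None, "doi": None},
    }
}


def split_citations_by_doi(index):
    """Hypothetical helper: return (key -> doi map, keys with no DOI)."""
    resolved = {}
    unresolved = []
    for key, entry in index.get("citations", {}).items():
        doi = entry.get("doi")
        if doi:
            resolved[key] = doi
        else:
            # doi is null: leave it for a human or LITERATURE to surface
            unresolved.append(key)
    return resolved, sorted(unresolved)


resolved, unresolved = split_citations_by_doi(SAMPLE_INDEX)
print("resolved:", resolved)
print("unresolved:", unresolved)
```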

Changes:

- acquire.md: drop Step 3 (build cited-papers index) entirely; the
  survey signals now cover two steps (paper materials + code clone).
  The note about paper-extraction owning substrate gains an explicit
  sentence about bibliography resolution living there too.
- architect.md: drop the paper-side Explore's relevance-augmentation
  step (it was operating on the deleted artifact); paper-index.md no
  longer has that line. Rules now flag that bibliography is already
  resolved upstream, so the Explore agent doesn't try to rebuild it.
- specify.md: prior_insights doi: lookup now reads from
  index.json#citations[<cite-key>].doi; add handling for unresolved-DOI
  cases (set doi: null, and the claim notes it; LITERATURE surfaces
  it).
- literature.md: same source swap; reviewer prompt updated to read
  from index.json#citations.
- SKILL.md: outputs column and resume-table no longer mention
  cited_papers.yaml; ACQUIRE's listed outputs now include index.json
  with the enriched citations block.

Closes the lc-from-paper consumer side of
lightcone/paper2astra-as-skill/bibliography-in-paper-extraction.
---
 .../lightcone/skills/lc-from-paper/SKILL.md   |  3 +-
 .../lc-from-paper/references/acquire.md       | 48 ++++---------------
 .../lc-from-paper/references/architect.md     | 13 ++---
 .../lc-from-paper/references/literature.md    |  4 +-
 .../lc-from-paper/references/specify.md       | 12 ++---
 5 files changed, 24 insertions(+), 56 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index a892e8bc..f6913879 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -31,7 +31,7 @@ The reproduction runs through nine phases (zero-indexed). Phase 0 (INTERVIEW) an
 | # | Phase | Where it runs | Reference | Primary outputs |
 |---|---|---|---|---|
 | 0 | INTERVIEW | orchestrator session | [`references/interview.md`](references/interview.md) | per-paper `CLAUDE.md` |
-| 1 | ACQUIRE | sub-agent | [`references/acquire.md`](references/acquire.md) | `work/reference/{source/, paper.pdf, figures/, tables/, metadata.json, code/, code-status.yaml}`; `work/notes/cited_papers.yaml` |
+| 1 | ACQUIRE | sub-agent | [`references/acquire.md`](references/acquire.md) | `work/reference/{source/, paper.pdf, figures/, tables/, metadata.json, code/, code-status.yaml, index.json}` (index.json's `citations:` block carries each cited paper's `{locations, citation, doi}`) |
 | 2 | ARCHITECT | sub-agent | [`references/architect.md`](references/architect.md) | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative); `work/notes/architect/{paper-index.md, code-index.md}` |
 | 3 | SPECIFY | sub-agent | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
 | 4 | LITERATURE | sub-agent | [`references/literature.md`](references/literature.md) | `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
@@ -116,7 +116,6 @@ Workdir signals — file existence implies the phase has been done:
 | `work/reference/code/` | ACQUIRE (code clone) |
 | `work/notes/architect/{paper-index.md,code-index.md}` | ARCHITECT (Explore pass) |
 | `astra.yaml` validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
-| `work/notes/cited_papers.yaml` | ARCHITECT (citation extraction) |
 | `astra.yaml` non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` placeholders + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
 | `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; `work/notes/literature/<doi-slug>.yaml` files present | LITERATURE |
 | recipes present in `astra.yaml` | IMPLEMENT |
diff --git a/claude/lightcone/skills/lc-from-paper/references/acquire.md b/claude/lightcone/skills/lc-from-paper/references/acquire.md
index adf0d0cb..4083200e 100644
--- a/claude/lightcone/skills/lc-from-paper/references/acquire.md
+++ b/claude/lightcone/skills/lc-from-paper/references/acquire.md
@@ -1,6 +1,6 @@
 # ACQUIRE — fetch the paper, structure it, clone the code
 
-Acquire the paper's reading materials and (when available) clone the reference code repository. The substrate work — LaTeX-source download, Docling fallback, figures, tables, outline, citations, embedded bibliography, paper-as-ASTRA-artifact — is delegated to **`/paper-extraction`**, which lc-from-paper trusts blindly. ACQUIRE adds **Step 2: code-clone** and **Step 3: cited-papers index** on top.
+Acquire the paper's reading materials and (when available) clone the reference code repository. The substrate work — LaTeX-source download, Docling fallback, figures, tables, outline, citations *with resolved DOIs per cited paper*, embedded bibliography, paper-as-ASTRA-artifact — is delegated to **`/paper-extraction`**, which lc-from-paper trusts blindly. ACQUIRE adds **Step 2: code-clone** on top.
 
 This phase runs as the orchestrator-spawned `acquire` sub-agent. The orchestrator launches it, the user can drop into its chat for any failures (download issues, missing code repo), and it commits each artifact as it lands.
 
@@ -13,7 +13,7 @@ This phase runs as the orchestrator-spawned `acquire` sub-agent. The orchestrato
 
 After Step 1 (`/paper-extraction`):
 
-- `work/reference/index.json` — structural index (figures, tables, outline, citations with line numbers, paths)
+- `work/reference/index.json` — structural index. Includes the enriched `citations:` block mapping each cited paper's BibTeX key (Path A) or synthetic `<lastname>_<year>` key (Path B) to `{locations, citation, doi}`. SPECIFY consumes this when authoring `prior_insights:` placeholders (`doi:` lookup); LITERATURE consumes it when discovering which DOIs need fetching.
 - `work/reference/astra.yaml` — ASTRA-shape representation of the paper, including the paper's claimed numerical findings as ASTRA `findings:` (when paper-extraction's optional Step 5 is run)
 - `work/reference/paper.pdf` — always
 - `work/reference/paper.tex` + `work/reference/source/` — Path A (arXiv LaTeX)
@@ -27,10 +27,6 @@ After Step 2 (this phase):
 - `work/reference/code/` — cloned reference repo (or absent if not found)
 - `work/reference/code-status.yaml` — record of where the code came from
 
-After Step 3 (this phase):
-
-- `work/notes/cited_papers.yaml` — citation marker → DOI mapping for every paper cited by the target paper. SPECIFY consumes this when authoring `prior_insights:` placeholders; LITERATURE consumes it when fetching cited papers to resolve those placeholders.
-
 ## Step 1 — Stand up the paper's reading materials
 
 Invoke `/paper-extraction <arxiv-id-or-doi>`. The skill is idempotent — it surveys `work/reference/` first and skips work that's already done.
@@ -39,9 +35,9 @@ Invoke `/paper-extraction <arxiv-id-or-doi>`. The skill is idempotent — it sur
 /paper-extraction <arxiv-id-or-doi>
 ```
 
-This produces everything under `work/reference/` *except* the code clone. lc-from-paper ACQUIRE does not re-implement the substrate logic; if something is wrong with the substrate, fix it in `/paper-extraction`, not here.
+This produces everything under `work/reference/` *except* the code clone. lc-from-paper ACQUIRE does not re-implement the substrate logic; if something is wrong with the substrate — including a substrate need that surfaces mid-reproduction — fix it in `/paper-extraction`, not here.
 
-Two starting surfaces: `work/reference/index.json` (structural — figures, tables, outline, citations with line numbers) and `work/reference/astra.yaml` (semantic — the paper as an ASTRA artifact, with `findings:` carrying the paper's central numerical claims as quote-anchored evidence). ARCHITECT reads index.json when its Explore sub-agents fan out across the paper; SPECIFY reads astra.yaml when authoring `prior_insights:` against the paper's claims (the paper's `findings:` map directly to a reproduction's `prior_insights:`).
+Two starting surfaces: `work/reference/index.json` (structural — figures, tables, outline, *citations with locations + cited-paper text + resolved DOIs*) and `work/reference/astra.yaml` (semantic — the paper as an ASTRA artifact, with `findings:` carrying the paper's central numerical claims as quote-anchored evidence). ARCHITECT reads index.json when its Explore sub-agents fan out across the paper; SPECIFY reads index.json's `citations:` block when authoring `prior_insights:` placeholders (citation key → DOI lookup) and reads astra.yaml when authoring `prior_insights:` against the paper's claims.
 
 ## Step 2 — Clone the reference code repository
 
@@ -65,44 +61,20 @@ Spend no more than a few searches before recording failure and moving on. **Do N
 
 Skip Step 2 if `work/reference/code/` already exists.
 
-## Step 3 — Build the cited-papers index
-
-Walk the paper's bibliography and produce `work/notes/cited_papers.yaml`: one entry per paper the target paper cites, keyed by citation marker form (the same form the paper invokes — `Smith+24`, `(Doe & Lee 2023)`, numeric `[12]`), each entry carrying the cited paper's DOI when resolvable.
-
-Sources for the bibliography:
-
-- **Path A (arXiv LaTeX):** `work/reference/bibliography-source.{bib,bbl}` carries the bibliography. Each `\bibitem{}` (or `.bib` entry) yields one record. DOIs come from the `doi` field where present; for entries without one, search for the title via Crossref / arXiv / ADS to resolve.
-- **Path B (PDF + Docling):** the references section is at the end of `work/reference/document.md`. Each citation block yields one record; DOIs are resolved by title search.
-
-The `relevance:` field is **not authored here** — it's filled in downstream. SPECIFY adds per-citation relevance notes when it links a citation to a decision (the `prior_insights:` placeholder authoring step), and LITERATURE deepens those notes as it fetches each cited paper. ACQUIRE just lays the index down.
-
-Output shape:
-
-```yaml
-papers:
-  - marker: "Smith+24"          # or "[12]" or whatever form the target paper uses
-    citation: "Smith et al. (2024), J. Cosmology"
-    doi: "10.xxxx/yyyy"          # null if unresolvable
-    # relevance: filled in by SPECIFY / LITERATURE
-```
-
-Skip Step 3 if `work/notes/cited_papers.yaml` already exists.
-
 ## Survey signals (entry into ACQUIRE)
 
-Run `ls work/reference/ work/notes/` first.
+Run `ls work/reference/` first.
 
-- If `paper.pdf` is present, **and** the path indicator (`source/` for Path A or `document.md` for Path B) is present, **and** `index.json` is present → Step 1 is done.
+- If `paper.pdf` is present, **and** the path indicator (`source/` for Path A or `document.md` for Path B) is present, **and** `index.json` is present (with the enriched `citations:` block — `key -> {locations, citation, doi}`) → Step 1 is done.
 - If `work/reference/code/` is present (or `code-status.yaml` records `found: false`) → Step 2 is done.
-- If `work/notes/cited_papers.yaml` is present → Step 3 is done.
-- When all three are done, ACQUIRE is complete; the orchestrator proceeds to ARCHITECT.
+- When both are done, ACQUIRE is complete; the orchestrator proceeds to ARCHITECT.
 - Otherwise, run whichever step is missing. `/paper-extraction` handles its own idempotency for Step 1.
 
 ## Notes
 
-- **paper-extraction is the substrate authority.** Don't re-fetch the LaTeX source, don't re-run Docling, don't re-parse the paper from inside ACQUIRE. If a substrate need surfaces that paper-extraction doesn't cover, file it as paper-extraction work — not as ACQUIRE work.
+- **paper-extraction is the substrate authority.** Don't re-fetch the LaTeX source, don't re-run Docling, don't re-parse the paper from inside ACQUIRE. If a substrate need surfaces that paper-extraction doesn't cover, file it as paper-extraction work — not as ACQUIRE work. Bibliography resolution is paper-extraction's: cited-paper text and DOIs live inside `index.json#citations[key]`, not in a side file.
 - **arXiv DOI form is `10.48550/arXiv.<id>`.** Useful when downstream tools want a DOI rather than an arXiv ID.
 - **Equation numbers and section numbers must match the rendered paper.** When citing "eq. N" or "§N" in any downstream phase, find the equation or heading by content, not by a naïve count of TeX blocks or markdown headings. Path A: source preserves printed numbers in `\label{}`s. Path B: Docling preserves printed numbers in its markdown.
-- **This phase is acquisition + code-clone + bibliography, not understanding.** Do not start indexing or comparing the paper here — that's ARCHITECT. The cited-papers index is mechanical: marker, citation, DOI. Relevance per citation lands later, where it's actually being used.
+- **This phase is acquisition + code-clone, not understanding.** Do not start indexing or comparing the paper here — that's ARCHITECT.
 - **Code-as-canonical** is loaded by every subsequent sub-agent. The per-paper `CLAUDE.md` restates the rule; ACQUIRE just makes sure `work/reference/code/` exists when possible.
-- **Commit each step as it lands.** ACQUIRE runs as a sub-agent; the orchestrator reads `git log` to see how far it got. One commit per artifact (paper materials, code clone, cited-papers index) keeps the trail readable.
+- **Commit each step as it lands.** ACQUIRE runs as a sub-agent; the orchestrator reads `git log` to see how far it got. One commit per artifact (paper materials, code clone) keeps the trail readable.
diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index d40f9f12..ed66c855 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -7,8 +7,8 @@ This phase runs as the orchestrator-spawned `architect` sub-agent. Internally it
 ## Inputs
 
 - `work/reference/source/` (Path A — arXiv LaTeX) **or** `work/reference/document.md` + `work/reference/figures/` + `work/reference/tables/` + `work/reference/metadata.json` (Path B — Docling)
+- `work/reference/index.json` — structural index emitted by paper-extraction (consumed by the paper-side Explore; its `citations:` block already carries each cited paper's `{locations, citation, doi}`)
 - `work/reference/code/` — the reference code repo (when present)
-- `work/notes/cited_papers.yaml` — citation marker → DOI mapping from ACQUIRE (consumed by the paper-side Explore for cross-referencing citations against decision clusters)
 - CLAUDE.md — the per-paper artifact at the workdir root; its **Goal** section names the user's intended replication targets
 - `work/notes/notes.md` — user-supplied prior notes, if any (read by every sub-agent if present)
 
@@ -38,8 +38,7 @@ From inside the architect sub-agent's session, spawn two Task-tool Explore sub-a
 > 2. **Sub-analysis boundary candidates.** Where does the paper's pipeline have natural seams — places one stage's output flows as the next stage's input? Look for: a reconstruction stage producing a catalog consumed by a clustering stage; an MCMC producing a chain consumed by a parameter-estimation stage; a fit producing posteriors consumed by a comparison stage. Name each candidate with a noun phrase (`reconstruction`, `clustering`, `bao_fit`) and one-line description.
 > 3. **Decision clusters per sub-analysis.** Group the paper's choices by where they sit in the pipeline. Don't enumerate every choice — name the *clusters* (e.g. "fitting prior choices", "selection criteria for the catalog"). SPECIFY drills back into the paper to author each `decisions:` entry; you're indicating where to look.
 > 4. **Result loci.** Which figures / tables / in-text metrics report the paper's primary and secondary results? Use `path:line` for the `\includegraphics{}` or table source (Path A); use `metadata.json` indexes for Path B. Tag each as primary / secondary based on the paper's own emphasis.
-> 5. **Augment `work/notes/cited_papers.yaml` with relevance notes.** Read the file (already populated by ACQUIRE with marker → citation → DOI for every citation in the bibliography). For each citation that justifies a method, parameter, or value used by the analysis (not general background), add a `relevance:` field with a one-line note on why the citation matters for replication. Skip citations that are pure background. Edit the file in place; preserve every entry (do not delete, even if you don't add relevance to most of them).
-> 6. **Data-flow shape.** A short prose paragraph: "Inputs flow from <source datasets> through <stage 1> producing <intermediate>, into <stage 2> producing <intermediate>, into <stage 3> producing <primary result>." This becomes the seed for the root narrative's data-flow paragraph.
+> 5. **Data-flow shape.** A short prose paragraph: "Inputs flow from <source datasets> through <stage 1> producing <intermediate>, into <stage 2> producing <intermediate>, into <stage 3> producing <primary result>." This becomes the seed for the root narrative's data-flow paragraph.
 >
 > ### Output format — `work/notes/architect/paper-index.md`
 >
@@ -63,13 +62,12 @@ From inside the architect sub-agent's session, spawn two Task-tool Explore sub-a
 > <one-paragraph prose: how inputs flow through the pipeline to the primary result>.
 > ```
 >
-> Augmented relevance notes go directly into `work/notes/cited_papers.yaml`, not the index file.
->
 > ### Rules
 >
 > - **Bounded read.** Do not read the code repo. Your job is paper-side only.
-> - **Index, do not author.** No `decisions:`, no `prior_insights:`, no `findings:`. Those are SPECIFY's. Your primary output is markdown (the index); the only YAML you touch is `work/notes/cited_papers.yaml`, and there only the `relevance:` field per existing entry.
+> - **Index, do not author.** No `decisions:`, no `prior_insights:`, no `findings:`. Those are SPECIFY's. Your output is markdown (the index); you do not write any YAML.
 > - **Quote sparingly.** Brief paper quotes are OK to disambiguate a result locus or a sub-analysis boundary; verbatim claim quotes are SPECIFY's substrate, not yours.
+> - **Bibliography is already resolved.** `work/reference/index.json#citations` carries each cited paper's text + DOI from paper-extraction. You don't need to re-derive that mapping; SPECIFY and LITERATURE will read from it directly.
 
 ### Code-side Explore — system prompt
 
@@ -260,7 +258,6 @@ After the self-review terminates, the architect sub-agent updates CLAUDE.md's **
 ## Survey signals (entry into ARCHITECT)
 
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) exists ⇒ ready to architect
-- `work/notes/cited_papers.yaml` exists from ACQUIRE ⇒ paper-side Explore can augment it with `relevance:` notes
 - `work/notes/architect/paper-index.md` and `work/notes/architect/code-index.md` (if code present) exist ⇒ Explore pass done
 - `astra.yaml` exists; `astra validate astra.yaml` returns clean; sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks are present-and-empty ⇒ stub written
 - For cheap: `work/notes/architect/review-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ ARCHITECT done
@@ -269,7 +266,7 @@ After the self-review terminates, the architect sub-agent updates CLAUDE.md's **
 ## Notes
 
 - **Run the Explore reads in parallel.** They're fully independent (one reads paper-only, one reads code-only). Synthesis runs once, after both index files exist.
-- **The Explore reads do not write `astra.yaml`.** They write index markdown (and the paper-side adds `relevance:` notes to `work/notes/cited_papers.yaml`). Only the synthesis step writes the stub. This separation keeps each Explore read's context bounded — it doesn't have to think about ASTRA's schema, only the read.
+- **The Explore reads do not write `astra.yaml`.** They write index markdown. Only the synthesis step writes the stub. This separation keeps each Explore read's context bounded — it doesn't have to think about ASTRA's schema, only the read.
 - **The stub's empty blocks are intentional.** `decisions: {}`, `prior_insights: {}`, `findings: {}` make it clear at a glance that ARCHITECT's job is structural, and that SPECIFY is what fills them. Don't try to half-author content — empty is honest.
 - **Code-as-canonical for structure, paper-as-canonical for narrative voice.** The code reveals where the real stage boundaries are; the paper provides the words to describe them. The stub uses both.
 - **Resume is automatic.** If `astra.yaml` already validates and has the structural fields populated, on re-spawn the architect sub-agent skips Step 1 and Step 2 and runs Step 3 (review) only.
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index ebda5c23..4b98b266 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -9,7 +9,7 @@ This phase runs as the orchestrator-spawned `literature` sub-agent. Internally i
 ## Inputs
 
 - `astra.yaml` — filled by SPECIFY's paper (and code) passes; each sub-analysis has `prior_insights:` entries with `claim:` + `doi:` + `decision_links:` but no `evidence:` selector. These are the placeholders LITERATURE resolves.
-- `work/notes/cited_papers.yaml` — citation marker → DOI → relevance mapping (built by ACQUIRE, augmented with relevance notes by ARCHITECT's paper-side Explore). Used to discover which DOIs need fetching, complementing the per-placeholder `doi:` lookup.
+- `work/reference/index.json#citations` — paper-extraction's cite-key → `{locations, citation, doi}` mapping for every entry in the target paper's bibliography. Used as the canonical cite-key → DOI lookup when cross-checking placeholder DOIs and when surfacing unresolved-DOI cases.
 - `work/notes/architect/paper-index.md` — has the decision clusters per sub-analysis; per-paper sub-sub-agents get it as context.
 - `work/reference/source/` (Path A — arXiv LaTeX) or `work/reference/document.md` (Path B — Docling) — the target paper (for context on how the cited paper is invoked).
 - CLAUDE.md — **Rigor** for this spawn's chosen rigor level.
@@ -151,7 +151,7 @@ The discipline matches ARCHITECT's and SPECIFY's self-review shape: each round r
 >
 > - `astra.yaml` — focus on every `analyses.<sub-analysis-id>.prior_insights:` entry. Each should have a resolved `evidence:` block.
 > - The cited papers (cached PDFs).
-> - `work/notes/cited_papers.yaml` — DOI lookups.
+> - `work/reference/index.json#citations` — cite-key → `{locations, citation, doi}` mapping from paper-extraction.
 > - `open-questions.md` — to see which placeholders the resolution sub-sub-agents flagged unresolved.
 > - `work/reference/source/` (or `document.md`) — the target paper, for context on how the cited paper is invoked.
 >
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 878a5ab1..14e2a470 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -13,7 +13,7 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 - `astra.yaml` — the stub from ARCHITECT (sub-analyses, inputs, outputs, narrative; empty `decisions:` / `prior_insights:` / `findings:` blocks)
 - `work/notes/architect/paper-index.md` — paper-side decision clusters, result loci, citations
 - `work/notes/architect/code-index.md` (when code present) — module map, natural decomposition, entry-points, gotchas
-- `work/notes/cited_papers.yaml` — citation marker → DOI mapping (from ACQUIRE, with `relevance:` notes added by ARCHITECT's paper-side Explore); SPECIFY uses it to write each `prior_insights:` placeholder's `doi:` so LITERATURE knows which paper to fetch
+- `work/reference/index.json` — paper-extraction's structural index; its `citations:` block maps each cited paper's BibTeX key (Path A) or synthetic `<lastname>_<year>` key (Path B) to `{locations, citation, doi}`. SPECIFY uses this to write each `prior_insights:` placeholder's `doi:` so LITERATURE knows which paper to fetch.
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
 - `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (Path B only)
 - `work/reference/code/` (if present) — original code, canonical reference for numerics + method
@@ -22,7 +22,7 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 
 ## Outputs
 
-- `astra.yaml` — **filled form**: each sub-analysis's `decisions:` and `findings:` populated with paper-anchored `evidence:` selectors; `prior_insights:` populated as citation-only **placeholders** (id, claim, decision_links, `doi:` lookup from `cited_papers.yaml` — but no `evidence:` selector yet, LITERATURE fills those next); `narrative:` keys updated to weave `astra-anchor:` references into prose as entries land. `astra validate astra.yaml` returns clean; `astra validate astra.yaml --verify-evidence` runs after LITERATURE has resolved the placeholders.
+- `astra.yaml` — **filled form**: each sub-analysis's `decisions:` and `findings:` populated with paper-anchored `evidence:` selectors; `prior_insights:` populated as citation-only **placeholders** (id, claim, decision_links, `doi:` looked up from `work/reference/index.json#citations[<cite-key>].doi` — but no `evidence:` selector yet, LITERATURE fills those next); `narrative:` keys updated to weave `astra-anchor:` references into prose as entries land. `astra validate astra.yaml` returns clean; `astra validate astra.yaml --verify-evidence` runs after LITERATURE has resolved the placeholders.
 - `universes/baseline.yaml` — selects the paper's choices (where paper and code disagree per the canonical-resolution rule, see "Material conflicts" below)
 - `implementation-notes.md` — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
 - `targets/targets.md` — small target ledger COMPARE consumes: per output (already declared by ARCHITECT), a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure / table / metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
@@ -52,20 +52,20 @@ Read the paper's section(s) covering this sub-analysis. Author:
 
    Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for the full definition of what counts. **Only exclude pure tooling choices** (language, library, file format) and fixed constraints. A typical sub-analysis has 2–6 decisions; if a sub-analysis has fewer than 2, revisit `work/notes/architect/paper-index.md` and reconsider.
 
-2. **`prior_insights:`** — for every citation marker the paper invokes that bears on a decision in this sub-analysis (`[12]`, `Smith+24`, `(Doe & Lee 2023)`), record a **placeholder**: an `id:`, a `claim:` describing what the cited paper supports about the decision (the target paper's framing of why it cites that paper here), a `doi:` looked up from `work/notes/cited_papers.yaml`, and `decision_links:` mapping the placeholder to the relevant decision option(s). **Do not author the `evidence:` selector** — that's LITERATURE's job. Leave `evidence:` absent or empty; LITERATURE fetches the cited paper, finds the supporting quote, and authors the resolved selector back into this placeholder. The placeholder shape:
+2. **`prior_insights:`** — for every `\cite{<key>}` (Path A) or rendered citation invocation (Path B) the paper invokes that bears on a decision in this sub-analysis, record a **placeholder**: an `id:`, a `claim:` describing what the cited paper supports about the decision (the target paper's framing of why it cites that paper here), a `doi:` looked up from `work/reference/index.json#citations[<cite-key>].doi`, and `decision_links:` mapping the placeholder to the relevant decision option(s). **Do not author the `evidence:` selector** — that's LITERATURE's job. Leave `evidence:` absent or empty; LITERATURE fetches the cited paper, finds the supporting quote, and authors the resolved selector back into this placeholder. The placeholder shape:
 
    ```yaml
    prior_insights:
      <insight_id>:
        id: <insight_id>
        claim: "<what the cited paper supports about the decision>"
-       doi: "<DOI from cited_papers.yaml>"
+       doi: "<DOI from index.json#citations[<cite-key>].doi>"
        # evidence: omitted — LITERATURE fills this in
        decision_links:
          <decision_id>: [<option_id>, ...]
    ```
 
-   Don't pre-emptively fetch the cited paper or guess its content; LITERATURE does that with fresh context per paper.
+   When the citation's DOI is unresolved (`citations[<key>].doi: null` — flagged in `extraction_warnings`), record the placeholder with `doi: null` and a note in the `claim:`; LITERATURE will surface it as an unresolved entry rather than fabricate evidence. Don't pre-emptively fetch the cited paper or guess its content; LITERATURE does that with fresh context per paper.
 
 3. **`findings:`** — paper-level claims and quantitative results scoped to this sub-analysis, each with source-anchored `evidence:` (verbatim quote against the paper). Pull the verbatim claims for each output's expected value from the paper text + the result loci in `paper-index.md`.
 
@@ -115,7 +115,7 @@ Self-review depth follows the rigor level the orchestrator picked for this spawn
 > - `work/notes/architect/code-index.md` (when code present)
 > - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
 > - `work/reference/code/` (when present) — canonical reference for numerics + method
-> - `work/notes/cited_papers.yaml` — citation marker → DOI mapping (use to confirm each `prior_insights:` placeholder's `doi:` matches what the paper cites)
+> - `work/reference/index.json#citations` — cite-key → `{locations, citation, doi}` mapping from paper-extraction (use to confirm each `prior_insights:` placeholder's `doi:` matches what the paper cites)
 >
 > ### What to check
 >

From 7c5d5463f2776ceefe95369284a4c24b3cf1ab60 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 00:07:58 +0200
Subject: [PATCH 039/124] lc-from-paper: capture fidelity intent as prose in
 INTERVIEW + Goal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

INTERVIEW gains a fourth job — fidelity intent — captured in the user's
own words into CLAUDE.md's Goal section. Pivot questions sit alongside
the existing scope pivots in "when the interview gets stuck"; concrete
examples show what good prose intent looks like. The orchestrator reads
the intent on every spawn decision and COMPARE grades opportunities
against it.

templates/CLAUDE.md's Goal section invites the intent prose. Rigor's
*Current state* is reframed as orchestrator-internal trajectory tracking
(not a user-facing dial), and *Open opportunities* gains a relative-to-
intent grade in its format so future sessions see which gaps matter.

No new vocabulary — sketch/baseline/tightened/canonical and cheap/heavy
stay where they were; intent is the prose anchor they're trajectories
toward.
---
 .../lc-from-paper/references/interview.md     | 34 +++++++++++++++----
 .../skills/lc-from-paper/templates/CLAUDE.md  |  6 ++--
 2 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/interview.md
index 768701d5..f14ee1b1 100644
--- a/claude/lightcone/skills/lc-from-paper/references/interview.md
+++ b/claude/lightcone/skills/lc-from-paper/references/interview.md
@@ -11,18 +11,18 @@ The interview is short. Three to six `AskUserQuestion` rounds, total. The user d
 A single `<paper-slug>/CLAUDE.md`, drafted from the template at [`../templates/CLAUDE.md`](../templates/CLAUDE.md). It carries:
 
 - **Paper identity** — DOI, arXiv ID, title, authors, one-line subject; where the original code lives.
-- **Goal** — what "done" looks like for this reproduction; in-scope and out-of-scope targets.
+- **Goal** — what "done" looks like for this reproduction: in-scope and out-of-scope targets, plus the user's fidelity intent in prose.
 - **Pointers** — any paper-specific conventions or warnings the user surfaced.
 
 The Rigor and Disagreements sections start empty — sub-agents fill them in as they work. The Rules section is standing discipline (universal across reproductions); leave it as the template provides.
 
-There is no separate constitution, no runtime-mode choice, no global termination criterion. The architecture is fixed (orchestrator + named per-phase sub-agents) and rigor is chosen per spawn — see SKILL.md's *Rigor is continuous, chosen per spawn* discipline.
+There is no separate constitution, no runtime-mode choice, no global termination criterion. The architecture is fixed (orchestrator + named per-phase sub-agents) and rigor is a trajectory toward the user's Goal-section intent — see SKILL.md's *Rigor is a trajectory toward the user's intent* discipline.
 
 After the user approves the draft, save it, ensure the workdir is a git repo (`git init` if needed) and commit `CLAUDE.md` as the first commit, then launch the ACQUIRE sub-agent.
 
 ---
 
-## The three jobs
+## The four jobs
 
 ### 1. Identify the paper
 
@@ -30,7 +30,7 @@ Use `AskUserQuestion` for whatever the user did not supply on `/lc-from-paper` i
 
 - **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).
 - **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) When code is available, every implementing sub-agent reads from `work/reference/code/` and treats code as canonical for numerics + method. This is recorded in CLAUDE.md's Rules.
-- **User's prior familiarity.** Has the user reproduced this paper before? Read it recently? Worked with the original authors? Affects how much of ARCHITECT / SPECIFY benefits from heavier rigor settings on first spawn.
+- **User's prior familiarity.** Has the user reproduced this paper before? Read it recently? Worked with the original authors? Affects how much you'd lean toward heavy self-review on first spawns.
 - **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; ARCHITECT will read it.
 
 ### 2. Scope the reproduction
@@ -45,7 +45,27 @@ Ask:
 
 These answers go into CLAUDE.md's **Goal** section as "in scope" / "out of scope". There is no separate target-extraction phase — what the user names here becomes explicit `outputs:` declared in the stub `astra.yaml` during ARCHITECT, then filled with paper-anchored `findings:` / `decisions:` during SPECIFY.
 
-### 3. Paper-specific conventions or warnings
+### 3. Fidelity intent
+
+A reproduction can land anywhere from a quick "does this even run" sanity check to a full match across every primary and secondary target. The user owns where they want this one to land. The job here is to **elicit prose intent** — their own words for what "good enough" looks like, captured into CLAUDE.md's Goal section alongside scope.
+
+Reach for whichever pivot fits the conversation; you usually only need one or two:
+
+- *"What's the moment you'd call this reproduction useful — when any number comes out at all, when a specific figure matches in shape, when the headline number matches within stated uncertainty, or when every primary and secondary target lines up?"*
+- *"Is there a specific result you care about more than the rest, where you'd want full fidelity even if the others stay rough?"*
+- *"If this took several sessions of iteration to reach high fidelity everywhere, is that the right investment, or would you rather get a working version in a couple of sessions and decide later whether to push further?"*
+- *"Are you trying to verify the paper, build on it, or critique it? That shifts where the fidelity bar wants to sit."*
+
+Record the answer verbatim or in close paraphrase under **Fidelity intent** in CLAUDE.md's Goal section. Concrete examples of what good prose intent looks like:
+
+- *"Just checking if the analysis is tractable — quick sanity that some headline number comes out close."*
+- *"I care about Figure 3 being right. The rest can stay rough."*
+- *"Full fidelity on the BAO fit specifically; the rest can stay rough."*
+- *"Every primary and secondary target lining up within stated tolerance, every paper-vs-code conflict adjudicated."*
+
+The orchestrator reads this on every spawn decision and COMPARE grades opportunities against it. If the user genuinely doesn't know yet, write that — *"Not sure yet; let's get something running and revisit"* is itself useful intent, and they can sharpen it at any future REVIEW.
+
+### 4. Paper-specific conventions or warnings
 
 Light touch. Ask the user if there's anything they want every sub-agent to know about this paper up front — a known pitfall, a non-obvious convention, a thing the authors did unusually. These go into CLAUDE.md's **Pointers** section as one-line notes. Skip cleanly if nothing comes to mind; sub-agents surface their own as they work.
 
@@ -57,7 +77,7 @@ Open the template at [`../templates/CLAUDE.md`](../templates/CLAUDE.md) and fill
 
 - The header (`<paper-slug>`, paper title, arXiv ID, DOI).
 - **Paper** — authors, one-line subject, code repo URL.
-- **Goal** — what "done" looks like; in-scope and out-of-scope.
+- **Goal** — what "done" looks like; in-scope and out-of-scope; fidelity intent in the user's words.
 - **Pointers** — any paper-specific conventions the user surfaced.
 
 Leave the **Rigor**, **Paper-vs-code disagreements**, and **Rules** sections in their template state. Rigor and Disagreements grow as sub-agents work; Rules are universal.
@@ -83,6 +103,8 @@ Most failure modes resolve into "the user has not yet decided what 'reproduce' m
 
 - *"If we ran this and it produced figure 3 plus the headline number in Table 2, would you be done?"* — pins targeted vs full.
 - *"Is there a specific decision in the paper you want to vary, or are we trying to match the paper exactly?"* — pins whether universes need to span alternatives.
+- *"What's the moment you'd call this useful — any number coming out, a specific figure matching in shape, the headline matching within stated uncertainty, or every target lining up?"* — pins fidelity intent.
+- *"Are you trying to verify the paper, build on it, or critique it?"* — shifts where the fidelity bar naturally sits.
 - *"Is there anything weird about this paper you want every sub-agent to know up front?"* — pins paper-specific conventions.
 
 When these answer cleanly, CLAUDE.md writes itself.
diff --git a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
index 0117c21c..7f92ca51 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
@@ -16,13 +16,15 @@ Reproduction of <paper title> (<arXiv ID>). DOI: <doi>.
 
 **Out of scope:** <explicit exclusions, fenced from drift.>
 
+**Fidelity intent:** <the user's prose answer from INTERVIEW to "when is this good enough" — captured verbatim or in close paraphrase. E.g. "just checking if the analysis is tractable — quick sanity on a headline number", "Figure 3 must be right; the rest can stay rough", "full fidelity on the BAO fit, baseline elsewhere", "every primary and secondary target lining up within stated tolerance". The orchestrator translates this into per-spawn cheap/heavy decisions and COMPARE grades opportunities against it. Static once approved; the user can sharpen it at any REVIEW.>
+
 ## Rigor
 
-*Current state* — populated by sub-agents as they produce artifacts. Coarse adjectives per output or per phase: *sketch / baseline / tightened / canonical*. Empty until the first phase produces something:
+*Current state* — orchestrator-internal trajectory tracking, updated by sub-agents as they produce artifacts. Coarse adjectives per output or per phase: *sketch / baseline / tightened / canonical*. The orchestrator reads this alongside the Goal's fidelity intent to decide cheap vs heavy on the next spawn. Empty until the first phase produces something:
 
 - (none yet)
 
-*Open opportunities* — what could benefit from more attention if the user comes back, with a sense of leverage. Format: `<area> — <what could be tightened> — <leverage>`. Empty until a sub-agent surfaces a gap:
+*Open opportunities* — gaps that could be tightened if the user comes back, each carrying a sense of leverage and where it sits relative to the Goal's fidelity intent. Format: `<area> — <what could be tightened> — <leverage> — <above|at|below intent>`. Empty until a sub-agent surfaces one:
 
 - (none yet)
 

From 4847912b4792028f1f89d1c8f5bbf9c63e01f123 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 00:09:01 +0200
Subject: [PATCH 040/124] lc-from-paper: reframe rigor as trajectory toward
 user's intent

The "Rigor is continuous, chosen per spawn" discipline is renamed and
rewritten as "Rigor is a trajectory toward the user's intent." The
cheap/heavy + sketch/baseline/tightened/canonical vocabularies stay
exactly where they were, but their framing shifts: they are the
orchestrator's internal scaffolding for sizing each spawn, derived from
the gap between Current state and the user's Goal-section intent prose.

The Goal description in "Per-paper artifact: CLAUDE.md" is updated to
name fidelity intent as a first-class part of the section. The Interview
bookend summary lists fidelity intent as the third of four interview
jobs. The COMPARE intro flags that opportunities are graded against
intent (the schema change lands in compare.md in the next commit).

No new vocabulary; the user's surface remains prose intent.
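
For orientation, a minimal sketch of the *heavy* setting this commit
keeps in place: review + fix until two consecutive rounds find no
fixes, capped at 5 rounds. `review_round` is a hypothetical callable
standing in for one fresh-context review-and-fix pass; it is an
assumption for illustration, not something the skill ships.

```python
from typing import Callable


def heavy_self_review(
    review_round: Callable[[int], int],
    max_rounds: int = 5,
    clean_streak_needed: int = 2,
) -> dict:
    """Run review rounds until `clean_streak_needed` consecutive rounds
    report zero fixes, or until `max_rounds` is reached.

    `review_round(n)` is assumed to run round n with fresh context and
    return the number of fixes it made (hypothetical interface).
    """
    clean_streak = 0
    rounds_run = 0
    for n in range(1, max_rounds + 1):
        fixes = review_round(n)
        rounds_run = n
        # A round with fixes resets the streak; a clean round extends it.
        clean_streak = clean_streak + 1 if fixes == 0 else 0
        if clean_streak >= clean_streak_needed:
            break
    return {
        "rounds": rounds_run,
        "converged": clean_streak >= clean_streak_needed,
    }
```

The *cheap* setting would correspond to zero or one round with no
loop; how the orchestrator picks between the two stays a judgment
against the intent prose and is not modelled here.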
---
 claude/lightcone/skills/lc-from-paper/SKILL.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index f6913879..1862395c 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -40,7 +40,7 @@ The reproduction runs through nine phases (zero-indexed). Phase 0 (INTERVIEW) an
 | 7 | COMPARE | sub-agent | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
 | 8 | REVIEW | orchestrator session | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
 
-COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are and how much they likely matter. You and the user decide together whether to spend another IMPLEMENT round now (close a high-leverage gap) or land the reproduction at its current rigor level and log the gap as an open opportunity in CLAUDE.md's Rigor section. Either way, control eventually passes to REVIEW.
+COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are, how much they likely matter, and how they sit relative to the user's fidelity intent. You and the user decide together whether to spend another IMPLEMENT round now (close a gap that sits below intent) or land the reproduction at its current trajectory and log the gap as an open opportunity in CLAUDE.md's Rigor section. Either way, control eventually passes to REVIEW.
 
 ## Spawning a phase sub-agent
 
@@ -59,8 +59,8 @@ When the sub-agent's turn closes you receive a notification with its full respon
 The reproduction's directory holds a single `CLAUDE.md` that sub-agents and future orchestrator sessions walk up to automatically. It is the durable spec for the reproduction, drafted during INTERVIEW and evolving over time as iterations learn paper-specific gotchas. The starting shape is in [`templates/CLAUDE.md`](templates/CLAUDE.md). Sections:
 
 - **Paper identity** — DOI, arXiv ID, title, authors, one-line subject; where the original code lives (`work/reference/code/`).
-- **Goal** — what the reproduction is aiming for. Desired state, scope (in / out). Stays static once approved at INTERVIEW.
-- **Rigor** — where the reproduction currently stands and what's worth tightening if attention returns. *Current state* per output or per phase (e.g. *sketch / baseline / tightened / canonical*). *Open opportunities* — what could benefit from more attention, with a sense of leverage ("Figure 3's systematics treatment is sketch-level; tightening it would change the headline number by ~10%"). Updated by sub-agents as they work; mined during REVIEW for what's worth coming back for.
+- **Goal** — what the reproduction is aiming for. Desired state, scope (in / out), and the user's **fidelity intent** as prose — their own answer to "when is this good enough." The orchestrator reads the intent on every spawn decision and COMPARE grades opportunities against it. Stays static once approved at INTERVIEW; the user can sharpen the intent at any REVIEW.
+- **Rigor** — the reproduction's trajectory toward that intent. *Current state* per output or per phase (e.g. *sketch / baseline / tightened / canonical*); read alongside the Goal's intent to decide cheap vs heavy on the next spawn. *Open opportunities* — what could benefit from more attention, with a sense of leverage and how it sits relative to intent ("Figure 3's systematics treatment is sketch-level; tightening it would change the headline number by ~10% — below intent"). Updated by sub-agents as they work; mined during REVIEW for what's worth coming back for.
 - **Disagreements** — paper-vs-code material disagreements logged by sub-agents as they find them. Code is canonical for numerics; both options are preserved as decision options in `astra.yaml`. CLAUDE.md just summarizes them so every walk-up sees them at a glance. Surfaced to the user when they're around.
 - **Rules** — the code-as-canonical discipline, the never-block-on-`AskUserQuestion`-mid-sub-agent rule (with `open-questions.md` as the autonomous-mode fallback), arxiv-LaTeX-first acquisition, `astra validate --verify-evidence` as the fidelity gate.
 - **Pointers** — to `open-questions.md`, and any paper-specific conventions or warnings the user surfaced during the interview.
@@ -71,7 +71,7 @@ Keep it short. Pointers, not snapshots.
 
 ### Interview (Phase 0)
 
-The opening interactive phase. Read [`references/interview.md`](references/interview.md) in full before starting. The interview gathers: (1) the paper (DOI / arXiv ID / code repo URL / prior context), (2) scope (full vs targeted, sub-analysis structure), (3) any paper-specific conventions or warnings.
+The opening interactive phase. Read [`references/interview.md`](references/interview.md) in full before starting. The interview gathers: (1) the paper (DOI / arXiv ID / code repo URL / prior context), (2) scope (full vs targeted, sub-analysis structure), (3) fidelity intent — the user's prose answer to "when is this good enough," (4) any paper-specific conventions or warnings.
 
 These get drafted into the per-paper `CLAUDE.md` — paper identity, Goal section, Rules, Conventions. The Rigor section starts empty; sub-agents fill it in as they work. Show the user the draft, take corrections, refine, then save.
 
@@ -89,7 +89,11 @@ REVIEW runs in the orchestrator session because both `/figure-comparison` and `/
 
 **Code-as-canonical, with disagreements recorded.** When the original codebase is at `work/reference/code/`, every implementing sub-agent reads relevant code on entry. Where paper and code disagree on something material (a different choice would plausibly change a numeric result the paper reports), **code is canonical** for numerics, plotting, and method — but the disagreement is recorded: as a decision option in `astra.yaml` with both alternatives preserved, and as an entry in CLAUDE.md's *Disagreements* section so it's visible to every sub-agent and to the user. Surface it to the user the next time they're around. Stylistic / cosmetic / pure-tooling differences aren't material — note them in `implementation-notes.md` and move on. Without this discipline, iterations drift to "looks right" rather than "matches" and material disagreements get silently absorbed.
 
-**Rigor is continuous, chosen per spawn.** A reproduction isn't one-shot — it reaches a baseline, then accumulates rigor as the user comes back. When you spawn an artifact-producing sub-agent (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT), choose how much fresh-context self-review to ask of it based on where the artifact currently stands (CLAUDE.md's Rigor section) and what the user wants to invest now. *Cheap:* skip self-review or run one fresh-context pass. *Heavy:* iterate fresh-context review + fix until two consecutive rounds find no fixes (capped at 5 rounds). The reviewing sub-agent never sees prior rounds' fixes — fresh context each round, with the prompt "check the artifact is consistent with the paper and the code." Each spawn that produces an artifact updates CLAUDE.md's Rigor section so the picture stays honest across context windows.
+**Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates rigor as the user comes back. The anchor for the whole trajectory is the user's **fidelity intent**, captured in CLAUDE.md's Goal section at INTERVIEW as prose — their own words for what "good enough" looks like (e.g. *"just checking the analysis is tractable"*, *"Figure 3 must be right; the rest can stay rough"*, *"every primary and secondary target lining up within stated tolerance"*). Your job as orchestrator is to hold that intent and translate it into per-spawn tactical decisions.
+
+When you spawn an artifact-producing sub-agent (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT), derive how much fresh-context self-review to ask of it from the **gap** between where the artifact currently stands (CLAUDE.md's Rigor *Current state* — *sketch / baseline / tightened / canonical*) and what the Goal's intent says the user cares about. *Cheap:* skip self-review or run one fresh-context pass. *Heavy:* iterate fresh-context review + fix until two consecutive rounds find no fixes (capped at 5 rounds). The reviewing sub-agent never sees prior rounds' fixes — fresh context each round, with the prompt "check the artifact is consistent with the paper and the code." Each spawn that produces an artifact updates CLAUDE.md's Rigor *Current state* so the trajectory stays honest across context windows.
+
+The *sketch / baseline / tightened / canonical* and *cheap / heavy* vocabularies are the orchestrator's internal scaffolding for sizing each spawn. The user's surface is the intent prose; the scaffolding only shows through when they ask how a spawn was sized.
 
 **arxiv-LaTeX-first acquisition.** When the paper is on arxiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arxiv only.
 

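The *heavy* setting described in the patch above (iterate fresh-context review + fix until two consecutive rounds find no fixes, capped at 5 rounds) can be sketched as a small loop. This is only an illustration under stated assumptions: `heavy_review` and the `review_round` callback are hypothetical names, and the callback stands in for whatever actually spawns the reviewing sub-agent and reports how many fixes it applied.

```python
from typing import Callable, Tuple

MAX_ROUNDS = 5           # cap stated in the skill text
CLEAN_STREAK_TARGET = 2  # "two consecutive rounds find no fixes"


def heavy_review(review_round: Callable[[int], int]) -> Tuple[int, bool]:
    """Run review rounds until two consecutive clean rounds, or the cap.

    `review_round(n)` is a hypothetical callback: it runs one fresh-context
    review + fix pass for round `n` and returns the number of fixes applied.
    Returns (rounds_run, converged).
    """
    clean_streak = 0
    for round_number in range(1, MAX_ROUNDS + 1):
        fixes = review_round(round_number)
        # A round with fixes resets the streak; a clean round extends it.
        clean_streak = clean_streak + 1 if fixes == 0 else 0
        if clean_streak >= CLEAN_STREAK_TARGET:
            return round_number, True
    return MAX_ROUNDS, False


if __name__ == "__main__":
    scripted = iter([2, 0, 0])  # fixes found in rounds 1, 2, 3
    print(heavy_review(lambda _n: next(scripted)))  # (3, True)
```

The callback receives only the round number, which mirrors the rule that the reviewer never sees prior rounds' fixes; nothing from earlier rounds is passed in.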
From eb2ecafd5fdfe437227c5f7d549c09e1a00d007e Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 00:10:30 +0200
Subject: [PATCH 041/124] lc-from-paper: grade COMPARE opportunities against
 user's fidelity intent

The opportunities[] block gains a relative_to_intent field
(above|at|below) placing each gap against the user's prose intent in
CLAUDE.md's Goal section. Below-intent items lead the orchestrator's
surfacing to the user and drive the autonomous-mode keep-iterating
decision; at-intent items close the trajectory; above-intent items get
logged but don't pull attention.

Default rules when intent is silent on a target: at for primaries, above
for secondaries. Empty opportunities[] remains a strong signal.

Intro and "Verdict + opportunity surfacing" both retuned: the
ratification ask is "spend another round on the below gaps?" rather than
"on a high-leverage gap"; the autonomous branch acts against intent
instead of standing rigor settings; the closing line names the
intent-grading as what turns the binary verdict into a navigable
picture.
---
 .../lc-from-paper/references/compare.md       | 25 +++++++++++++------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/compare.md b/claude/lightcone/skills/lc-from-paper/references/compare.md
index dd55e4cd..a90741f3 100644
--- a/claude/lightcone/skills/lc-from-paper/references/compare.md
+++ b/claude/lightcone/skills/lc-from-paper/references/compare.md
@@ -1,8 +1,8 @@
 # COMPARE — judge the match, name the opportunities
 
-Compare reproduced results against the paper's replication targets. COMPARE returns two things: a **verdict** (pass / partial / fail) and an **opportunity assessment** — where the gaps are and how much they likely matter. The verdict drives whether the orchestrator re-spawns IMPLEMENT for another retry attempt; the opportunity assessment tells the orchestrator (and the user) which gaps would be high-leverage to close, even on `pass`. Together they replace the old yes/no framing.
+Compare reproduced results against the paper's replication targets. COMPARE returns two things: a **verdict** (pass / partial / fail) and an **opportunity assessment** — where the gaps are, how much they likely matter, and how they sit relative to the user's fidelity intent in CLAUDE.md's Goal section. The verdict drives whether the orchestrator re-spawns IMPLEMENT for another retry; the opportunity assessment tells the orchestrator (and the user) which gaps fall below intent and would be high-leverage to close, even on `pass`. Together they replace the old yes/no framing.
 
-This phase runs as the orchestrator-spawned `compare` sub-agent. The orchestrator and the user together decide what to do with COMPARE's output — spend another IMPLEMENT round now (close a high-leverage gap), accept the current verdict and proceed to REVIEW, or land at the current rigor level and log the gap as an open opportunity in CLAUDE.md's **Rigor** section. The user can drop into the compare sub-agent's chat for the verdict ratification conversation, or wait until REVIEW close-out.
+This phase runs as the orchestrator-spawned `compare` sub-agent. The orchestrator and the user together decide what to do with COMPARE's output — spend another IMPLEMENT round now (close a below-intent gap), accept the current verdict and proceed to REVIEW, or land at the current trajectory and log the gap as an open opportunity in CLAUDE.md's **Rigor** section. The user can drop into the compare sub-agent's chat for the verdict ratification conversation, or wait until REVIEW close-out.
 
 ## Inputs
 
@@ -62,6 +62,7 @@ opportunities:
     gap: "<what could be tightened — even if the target matched>"
     leverage: "<rough sense of impact: 'changes headline number by ~10%' / 'cosmetic only' / 'unknown'>"
     fix_pointer: "<where the fix would land — script:line, decision id, or implementation-notes section>"
+    relative_to_intent: above|at|below
 ```
 
 ## Verdict rules
@@ -81,18 +82,26 @@ The `opportunities:` block surfaces **gaps that didn't necessarily fail the verd
 - A decision SPECIFY recorded with code-as-canonical that has an unresolved disagreement still in `open-questions.md` and could move the result.
 - A sub-analysis whose evidence quotes are paraphrased rather than verbatim (would fail `--verify-evidence` if pushed harder).
 
-Each opportunity gets a leverage one-liner so the orchestrator and user can decide where to spend attention. Empty `opportunities:` is a strong signal — say "the reproduction is at canonical rigor across the targets" rather than padding.
+Each opportunity gets two grades: a **leverage** one-liner (impact if closed) and a **relative_to_intent** placement against the user's fidelity intent in CLAUDE.md's Goal section:
 
-Also write `comparison-report.md` with a human-readable summary. For figure / table comparisons, describe what you see in both and explain your match judgment. Include the opportunity assessment as its own section.
+- `below` — the user's intent calls for tighter than this; closing the gap moves the reproduction toward what they actually want.
+- `at` — closing the gap reaches the intent; further tightening would be gravy.
+- `above` — already past the intent; log it but it doesn't pull on attention.
+
+Read the Goal's fidelity-intent prose to make the call. If the intent is "Figure 3 must be right" and the Figure 3 systematics are still sketch-level, grade that gap `below`. If the intent is "just checking the analysis is tractable" and the reproduction already has a canonical-grade outputs block alongside a sketchy sub-analysis, grade the gaps `above` everywhere except the headline. When intent is silent on a target, default to `at` for primary targets and `above` for secondaries.
+
+Empty `opportunities:` is a strong signal — say "the reproduction is at canonical rigor across the targets" rather than padding.
+
+Also write `comparison-report.md` with a human-readable summary. For figure / table comparisons, describe what you see in both and explain your match judgment. Include the opportunity assessment as its own section — group by `relative_to_intent` so the `below` items lead.
 
 ## Verdict + opportunity surfacing
 
-After writing the report, the compare sub-agent reports back to the orchestrator with the verdict, the failing-output count (if any), and the headline opportunities. The orchestrator either:
+After writing the report, the compare sub-agent reports back to the orchestrator with the verdict, the failing-output count (if any), and the headline opportunities — `below`-intent items first. The orchestrator either:
 
-- **Carries the report to the user** (if the user is reachable in the orchestrator session or the compare sub-agent's chat) for ratification: present verdict, the failing outputs (if `partial` / `fail`), and the top opportunities; ask whether to spend another IMPLEMENT round on a high-leverage gap, accept and proceed to REVIEW, or land at this rigor level and log the gaps as open opportunities in CLAUDE.md.
-- **Acts on standing rigor settings** (if the user is unreachable): if attempt < budget AND verdict is `partial` / `fail`, re-spawn `implement` for a retry; if verdict is `pass` OR attempt >= budget, log opportunities in CLAUDE.md's **Rigor** section as open opportunities and proceed to REVIEW.
+- **Carries the report to the user** (if the user is reachable in the orchestrator session or the compare sub-agent's chat) for ratification: present verdict, the failing outputs (if `partial` / `fail`), and the top `below`-intent opportunities; ask whether to spend another IMPLEMENT round on those gaps, accept and proceed to REVIEW, or land at the current trajectory and log the gaps as open opportunities in CLAUDE.md.
+- **Acts against intent** (if the user is unreachable): if attempt < budget AND (verdict is `partial` / `fail` OR any opportunity is `below` intent), re-spawn `implement` targeting the `below` gaps first; if verdict is `pass` AND no opportunities are `below`, OR attempt >= budget, log remaining opportunities in CLAUDE.md's **Rigor** section and proceed to REVIEW.
 
-The verdict is the compare sub-agent's judgment; the **decision to keep iterating or move on** is the orchestrator's (in dialogue with the user). The opportunity assessment is the bridge — it turns a binary verdict into a graded picture the user can navigate.
+The verdict is the compare sub-agent's judgment; the **decision to keep iterating or move on** is the orchestrator's (in dialogue with the user). The opportunity assessment — graded against the user's fidelity intent — is the bridge that turns a binary verdict into a picture both parties can navigate.
 
 ## Survey signals (entry into COMPARE)
 

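The "Acts against intent" branch in the patch above reduces to a small decision rule, sketched here in Python. The names `Opportunity`, `next_step`, `surfacing_order`, and the returned labels are hypothetical, chosen only for illustration. The sketch assumes the verdict is one of `pass` / `partial` / `fail` and that each opportunity carries the `relative_to_intent` grade from the `opportunities:` block.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Opportunity:
    gap: str
    relative_to_intent: str  # "above" | "at" | "below"


def next_step(verdict: str, attempt: int, budget: int,
              opportunities: List[Opportunity]) -> str:
    """Autonomous-mode choice when the user is unreachable (hypothetical labels)."""
    has_below = any(o.relative_to_intent == "below" for o in opportunities)
    needs_work = verdict in ("partial", "fail") or has_below
    if attempt < budget and needs_work:
        return "respawn_implement"  # target the `below` gaps first
    # verdict is `pass` with nothing below intent, or the budget is spent
    return "log_and_review"


def surfacing_order(opportunities: List[Opportunity]) -> List[Opportunity]:
    """Order opportunities so `below` items lead, then `at`, then `above`."""
    rank = {"below": 0, "at": 1, "above": 2}
    return sorted(opportunities, key=lambda o: rank[o.relative_to_intent])
```

Under this reading, a `pass` verdict with one `below` opportunity and budget remaining still triggers another IMPLEMENT round, which is the behavioural change the commit message describes.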
From f3d81bafc0b0b146e12d6e3e47584ffb6f7023cc Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 01:18:43 +0200
Subject: [PATCH 042/124] narrative: substrate/mode separation, operational
 rebuild
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Strip the philosophical scaffolding (three-phase ontology, asymmetric
load, honest/performative framing) and match the voice of sibling
skills like lc-new and paper-extraction. The substrate now lives only
in SKILL.md (5 keys, length, anchors, data flow, validation, craft,
anti-patterns); mode-specific drafting moves live in the references.

Modes reframed around the second source paired with the spec:
- Paper reproduction: an authoritative text source
- Retrofit: project artifacts
- Co-drafting: dialogue with the user

The spec is the constant source across all three; what differs is what
gets paired with it. Mode-picking is by source, not entry point — drop
the lc-from-paper-as-framing language, point at /paper-extraction
directly when work/reference/ is missing.

Restore paper-reproduction.md as the deeper guide for the production
mode, minus paper acquisition (paper-extraction owns that surface).
Reproduction moves split into clearer bullets: tell the author's story
by default, paraphrase don't lift, adapt and flag when reproduction
results diverge from the paper. Fidelity audit acknowledges the
divergent-results exception explicitly.

Data flow section reframed around prose-navigability — the prose
itself carries the trail via anchors; reader follows the flow inline.
Drop the "validator does not enforce these" caveat.

Add a Real-subjects-real-verbs craft section borrowed from the writing
skill (agency rules, valid subject categories, the can-you-picture-it
test) so the skill is self-contained on scientific-prose craft.

Two spec drifts fixed: from_ref → from (astra-spec#17), dead
astra-spec#16 link removed (issue was never actually filed).

Total skill: 977 → 477 lines across SKILL.md + three references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/narrative/SKILL.md    | 447 +++++-------------
 .../narrative/references/co-drafting.md       |   4 +-
 .../narrative/references/existing-analysis.md | 218 ++-------
 .../references/paper-reproduction.md          | 237 +++-------
 4 files changed, 235 insertions(+), 671 deletions(-)

diff --git a/claude/lightcone/skills/narrative/SKILL.md b/claude/lightcone/skills/narrative/SKILL.md
index 456d8923..4cb3c5d7 100644
--- a/claude/lightcone/skills/narrative/SKILL.md
+++ b/claude/lightcone/skills/narrative/SKILL.md
@@ -1,312 +1,167 @@
 ---
 name: narrative
 description: >
-  Author or revise the `narrative:` prose inside an ASTRA analysis
-  (`astra.yaml` and its sub-analyses) plus decision `rationale:` fields.
-  Five fixed keys at each scale (`summary`, `findings`, `methods`,
-  `inputs`, `outputs`). Three working modes — paper reproduction
-  (ready), existing-analysis retrofit (under development), and
-  interactive in-flight authoring (under development). Use when the
-  `narrative:` block is empty or stub, when a decision needs a
-  `rationale:`, when a sub-analysis needs its own narrative, or when
-  revising existing prose. Triggers on "narrative", "draft the
-  narrative", "narrate this analysis", "narrate this sub-analysis",
-  "rationale for this decision", "write the summary", or any request
-  for reader-facing prose keyed off an astra.yaml.
+  Authors prose throughout an `astra.yaml` — analysis-level
+  `narrative:` blocks (five fixed keys: `summary`, `findings`,
+  `methods`, `inputs`, `outputs`), decision `rationale:` fields, and
+  shorter `description:` / `notes:` prose on individual entities. The
+  five-key narrative is the most substantive case; the same
+  architectural and syntactical frame applies wherever prose appears
+  in the spec.
+  Always written against an existing `astra.yaml`; what differs
+  between modes is the second source paired with the spec — an
+  authoritative text (paper reproduction), project artifacts
+  (retrofit), or dialogue with the user (co-drafting). Triggers on
+  "narrative", "draft the narrative", "narrate this analysis",
+  "rationale for this decision", "write the summary", "describe this
+  input", or any request for reader-facing prose keyed off an
+  `astra.yaml`.
 ---
 
 # narrative
 
-## What this skill writes
-
-One field: `narrative:` on an analysis or sub-analysis, or `rationale:` on a decision.
-Per-element prose (what each `Input`, `Output`, `Decision`, `Option`, or `Insight` is and why it matters) lives on those elements' own `description` / `rationale` / `notes` fields.
-`narrative` is the analysis-level story that weaves the pieces together.
-
-This skill is also part of the lightcone-cli paper-reproduction bundle: the
-`/lc-from-paper` orchestrator invokes it during the SPECIFY phase to author the
-narrative for the spec it has just crafted. Sibling skills in the bundle —
-`constitution`, `ralph-loops`, `paper-extraction`,
-`check-sentence-by-sentence`, `figure-comparison` — solve adjacent pieces of
-the reproduction story; this skill stands alone and does not need to know
-about them.
-
-## What a narrative is
-
-Science, from a single decision to a review paper, is a practice of
-engaging with previous work and telling the story of what was tried
-and what it means. Any honest account does three things.
-
-**Grounding.** Where the work sits — state of the field, open
-questions, prior work it responds to, upstream decisions that shape
-its choices. Tells the reader why before the work shows its own
-value. May foreshadow findings.
-
-**Movement of learning.** Not the tidied retrospective ("we did X,
-obtained Y") but traces of the process: what was tried, what failed,
-what forced a step back. The best papers convey this; most compress
-it away under length pressure. ASTRA's telescoping makes it cheap —
-a sentence at the top about global-vs-per-object PSF leakage, one
-level down where the nerd gets the two pages on how the team got
-there. Papers don't have this affordance and so compress iteration
-away; ASTRA does, and authors should spend it.
-
-**Implications.** What the results mean and where they point.
-Results are facts; what they do to the field is the argument.
-Forward-look matters even when unformed — that is where science
-passes the baton.
-
-A narrative that does all three at the appropriate scale is honest.
-One that presents only results and methods elides the meaning-making.
-
-The three phases repeat at every scale. A top-level analysis
-narrates them across five keys (`summary`, `methods`, `findings`,
-`inputs`, `outputs`); a sub-analysis does the same; a decision
-narrates in one paragraph of `rationale:`. The telescope gives the
-reader a short view at their current depth and the option to drill
-in — without exploding the parent.
-
-## Length as forcing function
-
-1–3 paragraphs per key, at any level.
-
-Length is the mechanism that keeps analyses modular, not a style
-preference. If the references don't fit in three paragraphs, the
-analysis is too big — split it. The narrative is a compressor; if
-it won't compress, split the thing being compressed.
-
-## What this prose is for
-
-ASTRA preserves the decision structure that papers compress into
-linear argument; the narrative keeps that structure legible. Three
-consequences:
-
-- **Not wiki, not paper.** A wiki page summarizes ("BAO is the
-  baryon acoustic oscillation feature"); a paper compresses ("we
-  chose the Gaussian prior"). An ASTRA narrative **points into
-  reasoning** — it names the load-bearing decision, anchors to the
-  structured node that records it, and lets the reader follow. The
-  prose does not re-explain the field or re-list the spec.
-- **Read and queried.** The narrative is consumed by human readers
-  *and* by agent retrievers. Anchor coverage and clarity are
-  substrate, not style — an uncited decision is invisible to both
-  readings.
-- **Asymmetric load.** The three phases don't map onto ASTRA's
-  structure evenly. Movement-of-learning has strong structural
-  support — `decisions`, `options`, `prior_insights`, the
-  sub-analysis DAG — and `methods` condenses what structure already
-  carries. Grounding has partial support at the decision site;
-  implications have none. On those two phases, the narrative is the
-  reader's only access — carry just enough, and err toward brevity
-  and certainty.
-
-## Pick a mode first
-
-**Paper reproduction is production-ready. Retrofit and interactive
-are under active development — their references are working drafts.**
-
-Three modes. Read the matching reference file in full before drafting.
-
-| Mode | Reference | Status | When |
-|---|---|---|---|
-| **Paper reproduction** | [`references/paper-reproduction.md`](references/paper-reproduction.md) | **Ready.** | A published paper exists and the analysis mirrors it. Primarily in-house Lightcone work (DESI BAO and similar) plus end users bringing a paper to reproduce. Covers paper sourcing (arXiv LaTeX preferred), paper→ASTRA mapping, voice seams, fidelity rules. |
-| **Existing-analysis retrofit** | [`references/existing-analysis.md`](references/existing-analysis.md) | Under development. | Code, results, or an in-flight project being imported into ASTRA with no source paper. Archaeological work: triage, reconstruction of intent, gaps where the record is silent. |
-| **Interactive (in-flight research)** | [`references/interactive.md`](references/interactive.md) | Under development. | New research being done now; the narrative drafted alongside the work. Provisional voice, ask-first discipline. |
+This skill covers prose authoring across an `astra.yaml`. The prose surfaces are:
 
-If unsure which applies, confirm with the user via `AskUserQuestion`.
+- **Analysis `narrative:` blocks** — five keys (`summary`, `inputs`, `methods`, `findings`, `outputs`) on each analysis and sub-analysis.
+- **Decision `rationale:` fields** — one paragraph per decision.
+- **Per-entity prose** — shorter `description:` / `notes:` on individual inputs, outputs, options, insights.
 
-The rest of this file is the **mode-independent substrate** every
-reference relies on.
+ASTRA's structural content surfaces alongside the prose in renderers like lightcone-ui. **Prose does not duplicate the structure** — it cites into it. An anchor is a citation; a sentence pointing to a decision is a small argument; prose is the layer where decisions, sub-analyses, findings, and outputs become a connected story.
 
----
+## Modes
 
-## Narrate what you declare
+Prose cites the spec's structure (decisions, findings, outputs, sub-analyses) by anchor, so the structure must exist when the prose lands: write the spec first, write both concurrently, or revise narrative after spec changes settle.
 
-The five keys are schema-optional, but `astra validate` applies a
-**conditional requirement** — a section must hold non-empty prose
-when the corresponding structured data exists on the Analysis node.
+There are three modes, distinguished by what's available beyond the spec itself. Every mode draws on the under-construction `astra.yaml`; what differs is the **second source** paired with it.
 
-| Key | Required when |
-|---|---|
-| `findings` | `Analysis.findings` has entries |
-| `methods` | `Analysis.decisions` or `Analysis.analyses` has entries |
-| `inputs` | `Analysis.inputs` has entries |
-| `outputs` | `Analysis.outputs` has entries |
-| `summary` | always optional (no structured counterpart) |
-
-Three consequences worth internalizing:
-
-- **A stub analysis with only `summary` is valid.** Use that for
-  stage-zero scoping.
-- **Don't write a `findings` key before findings are declared.** If the
-  spec's `findings:` list is empty, the narrative's `findings` key
-  should not appear — adding prose about findings that don't exist is
-  fiction.
-- **`summary` is the one key without a structural peer.** It's the
-  "question, scope, orientation" key — the only place prose stands
-  alone, not framing something structural.
+| Mode | Second source | Status | Reference |
+|---|---|---|---|
+| **Paper reproduction** | An authoritative text source (paper, thesis, technical report, …) | Ready | [`references/paper-reproduction.md`](references/paper-reproduction.md) |
+| **Retrofit** | Project artifacts — code, notebooks, fibers, commit history | Stub | [`references/existing-analysis.md`](references/existing-analysis.md) |
+| **Co-drafting** | The user, in conversation | Stub | [`references/co-drafting.md`](references/co-drafting.md) |
 
----
+If the second source isn't obvious from context, ask: is there an authoritative text (paper, thesis, technical report) to draw from? If not, are we harvesting from existing artifacts, or working from the user's own framing? Hybrid is allowed — a reproduction with co-drafted extensions, a retrofit with co-drafted gap-filling.
 
-## The spec renders alongside the narrative
+The rest of this file is the mode-independent substrate every reference relies on. Read it through, then open the matching reference.
 
-ASTRA's structural content — decisions, findings, inputs, outputs,
-sub-analyses, options — surfaces alongside the narrative. Structural
-peers will be presented; **prose does not duplicate them.** An
-abstract does not list every methods subsection; a methods section
-does not re-state every appendix equation. Prose assumes its
-structural peers exist and focuses on argument.
+---
 
-Applied to the five keys:
+## The five keys
 
-- `summary` **orients** — question, scope, headline shape.
-- `methods` **walks the pipeline**, citing each decision and
-  sub-analysis by anchor where they appear. Movement-of-learning
-  lives here.
-- `findings` **synthesizes** — each finding cited by anchor as part of
-  the argument, not an enumeration.
-- `inputs` **names provenance**.
-- `outputs` **names what was promoted and why**, citing each by anchor —
-  **and names its downstream consumers** when they exist (see "Data flow" below).
-- Decision `rationale:` **names why the default won**.
+| Key | What it carries | Required when |
+|---|---|---|
+| `summary` | Question, scope, headline shape — the only key without a structural peer. | optional in the schema, but should always exist |
+| `inputs` | Provenance — the data the analysis rests on. | `Analysis.inputs` is non-empty |
+| `methods` | Pipeline walk; cite each decision and sub-analysis by anchor. | `Analysis.decisions` or `Analysis.analyses` is non-empty |
+| `findings` | Synthesis of declared findings; each cited by anchor. | `Analysis.findings` is non-empty |
+| `outputs` | Which artifacts were promoted, and where they go downstream. | `Analysis.outputs` is non-empty |
 
----
+`astra validate` enforces the right column. **Narrate what you declare:** if `findings:` is empty, `narrative.findings` should not appear. A stub analysis with only `summary` is valid.
 
-## Data flow — name where each output goes
+A decision's `rationale:` is its own one-paragraph slot — what was decided, the insight that motivated it (cite by anchor), and what the load-bearing alternative was and why it lost. The alternatives themselves live in the options structure.
 
-Recipe `inputs:` wires the DAG; the narrative makes the wiring legible. The
-schema already encodes who consumes what — readers should not have to grep
-49 `inputs:` lists to learn what an intermediate output is *for*.
+## Length
 
-Two rules — both load-bearing for projects with sub-analyses:
+1–3 paragraphs per key, at any level (root, sub-analysis, decision).
 
-1. **`narrative.outputs` names downstream consumers.** When authoring
-   `outputs` prose on a sub-analysis or the root, name where each output
-   gets consumed using the `<analysis>.<output>` form that recipe `inputs:`
-   already uses. *"`xi_post_recon_lrg1` feeds
-   [`bao_fit_post_iso_ap_lrg1`](#analyses.bao_fit.outputs.bao_fit_post_iso_ap_lrg1)
-   and [`bao_detection_chi2_lrg1`](#findings.bao_detection_chi2_lrg1)."*
-   Anchor where you can; bare `<analysis>.<output>` text is acceptable when
-   no anchor is reachable from the current scope.
+Length is the mechanism that keeps analyses modular, not a style preference. **If references don't fit in three paragraphs, the analysis is too big — split it.** The narrative is a compressor; if it won't compress, split the thing being compressed.
 
-2. **Root narrative includes a top-down data-flow paragraph.** When the
-   project has sub-analyses, the root analysis's `methods` (or `summary`)
-   must include one paragraph that traces the pipeline end-to-end:
-   *"raw catalogs → [reconstruction.post_recon_catalog_*](#analyses.reconstruction)
-   → [clustering.xi_*_recon_*](#analyses.clustering) → root [bao_fit_*](#outputs.bao_fit_post_iso_ap_lrg1)."*
-   This is the one place a reader can land cold and get the shape of the
-   pipeline without reading every recipe declaration.
+## Anchors
 
-Closes [lightcone-cli#108](https://github.com/LightconeResearch/lightcone-cli/issues/108).
-The validator does not (yet) enforce this; treat both rules as authorial
-discipline. The information is already in the spec — surface it.
+Markdown link syntax with `#`-target, **tree-path-first** — same grammar as decision `from:` references.
 
----
+| Target | Anchor |
+|---|---|
+| Input | `#inputs.<id>` |
+| Output | `#outputs.<id>` |
+| Decision | `#decisions.<id>` |
+| Option within a decision | `#decisions.<id>.options.<opt>` |
+| Finding | `#findings.<id>` |
+| Prior insight | `#prior_insights.<id>` |
+| Sub-analysis (whole node) | `#analyses.<sub>` |
+| Element inside sub-analysis | `#<sub>.<category>.<id>` |
+| Parent scope (from a sub-analysis) | `#../decisions.<id>` |
 
-## Anchor coverage
+The sub-analysis form is **sub-analysis first, then category**: `#reconstruction.decisions.algorithm`, not `#decisions.reconstruction.algorithm`. References resolve relative to the hosting analysis; use `../` to escape to parent scope.
 
-`astra validate` checks:
+Rules:
 
-- **Broken references** → error. Anchor doesn't resolve to a real id.
-- **Uncited declared elements** → warning. Every declared finding,
-  decision, output, and sub-analysis must be cited somewhere in the
-  narrative tree.
+- Anchor text is **authored prose**, not the raw id.
+- Inline references do the work of a citation; don't footnote or parenthesize.
+- One reference per idea. Stacking three on a sentence means the sentence carries too much.
+- Prior insights motivate decision options via `decisions.<id>.options.<opt>.insights:`. Findings cannot appear there (validator-enforced); if a finding motivates a decision, cite it from the decision's `rationale:` prose.
 
-If a declared element is genuinely not worth a prose mention, consider
-whether it should be declared at all.
+### Reserved IDs
 
----
+These names cannot be used as entity IDs (they collide with the anchor grammar): `inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`. The validator rejects them.
 
-## User presence
+## Data flow
 
-Multi-turn back-and-forth → user present; use `AskUserQuestion` to
-clarify mode, scale, and reproduction-vs-extension before drafting.
-Single-shot or pipeline invocation → autonomous; make the reasonable
-default inference and note it inline on the narrative. Ambiguous →
-err on present and ask.
+Make the data-flow linkage navigable in the prose itself. Anchors are the trail — a reader follows the flow inline, without leaving the narrative.
 
----
+1. **`narrative.outputs` says where each output goes next.** A sub-analysis's outputs are usually consumed by other sub-analyses or roll up into root findings. When you write the `outputs` prose, name those downstream destinations by anchor. Example, in the `reconstruction` sub-analysis's `outputs` key:
 
-## Phase → key mapping
+   > *"`xi_post_recon_lrg1` feeds [the post-reconstruction BAO fit](#analyses.bao_fit.outputs.bao_fit_post_iso_ap_lrg1) and supports the [headline detection finding](#findings.bao_detection_chi2_lrg1)."*
 
-The three phases (see top) map onto the five keys unevenly:
+   Anchor downstream consumers where you can. When no anchor is reachable from the current scope (typically a sibling sub-analysis), bare `<analysis>.<output>` text is acceptable.
 
-| Key | Dominant phase |
-|---|---|
-| `summary` | all three, telescoped |
-| `findings` | implications |
-| `inputs` | grounding |
-| `methods` | movement of learning |
-| `outputs` | structural; phase-thin |
+2. **The root narrative is the end-to-end view.** When the project has sub-analyses, the root analysis's `methods` (or `summary`) traces the pipeline from raw inputs to final outputs — as much overview as fits in a few paragraphs. The root is the place a reader can land cold and get the shape of the work; details telescope into the sub-analyses. A condensed example:
 
-There is no `discussion` key. Implications distribute into `summary`
-and `findings`.
+   > *"raw catalogs → [reconstruction](#analyses.reconstruction) → [clustering](#analyses.clustering) → root [BAO fit](#outputs.bao_fit_post_iso_ap_lrg1)."*
 
----
+## Validation
 
-## Anchor syntax
+```sh
+astra validate astra.yaml
+```
 
-Markdown link syntax, `#`-target, **tree-path-first**.
+- **Broken references** → error. Anchor doesn't resolve to a real id.
+- **Uncited declared elements** → warning. Every declared finding, decision, output, and sub-analysis must be cited somewhere in the narrative tree. If an element genuinely isn't worth a prose mention, consider whether it should be declared at all.
+- **Conditional coverage** → error. The required-when rule above.
 
-| Target | Anchor |
-|---|---|
-| Input | `#inputs.<id>` |
-| Output | `#outputs.<id>` |
-| Decision | `#decisions.<id>` |
-| Option within a decision | `#decisions.<id>.options.<opt>` |
-| Finding | `#findings.<id>` |
-| Prior insight | `#prior_insights.<id>` |
-| Sub-analysis (whole node) | `#analyses.<sub>` |
-| Element inside sub-analysis | `#<sub>.<category>.<id>` (e.g. `#reconstruction.decisions.algorithm`) |
-| Parent scope (from a sub-analysis) | `#../decisions.<id>` |
+## User presence
 
-Note the sub-analysis form: **sub-analysis first, then category**.
-`#reconstruction.decisions.algorithm`, not `#decisions.reconstruction.algorithm`.
-References are interpreted **relative to the hosting analysis**; use
-`../` to escape to parent scope (matches decision `from_ref` syntax).
+Multi-turn back-and-forth → user present; use `AskUserQuestion` to clarify mode, scale, and any mode-specific framing before drafting. Single-shot or pipeline invocation → autonomous; make the reasonable default inference and note it inline on the narrative. Ambiguous → err on present and ask.
 
-Rules:
+---
 
-- Anchor text is authored prose, **not** the raw id.
-- Inline refs do the work of a citation; don't footnote or parenthesize.
-- One ref per idea. Stacking three on a sentence means the sentence
-  carries too much.
-- Findings cannot currently appear in `decisions.options.insights`
-  (see [astra-spec#16](https://github.com/LightconeResearch/astra-spec/issues/16)).
-  When a finding motivates a decision, cite it from the decision's
-  `rationale:` prose.
+## Craft
 
----
+- **Economy.** Every sentence introduces a new idea or sharpens an existing one. Release real verbs: `conducted cross-correlation` → `cross-correlated`.
+- **Anchor text is prose, not an id.** `[the post-reconstruction catalogs](#analyses.reconstruction)`, not `[reconstruction](#analyses.reconstruction)`.
+- **One reference per idea.** Three anchors on one sentence means the sentence carries too much; split it or drop one.
+- **Specificity.** Names, numbers, references over generic claims.
+- **Arrive through content.** No "in this analysis we will describe…"; the content is the opening.
 
-## Reserved entity names
+### Real subjects, real verbs
 
-These names cannot be used as entity IDs (they collide with the
-anchor grammar): `inputs`, `outputs`, `decisions`, `findings`,
-`prior_insights`, `analyses`, `options`, `content`, `narrative`.
+"We measure the BAO peak with the LRG sample" reads as agency. "The measurements of the BAO peak reveal a 7σ detection" reads as zombie-noun abstraction. The test: can you picture someone or something physically doing the verb? If not, rewrite.
 
-If you find an entity using one (legacy spec), flag it; the authoring
-tooling and validator will reject it.
+Valid subjects:
 
----
+- **We** — for decisions and actions ("we chose the Gaussian damping prior")
+- **The thing itself** — for states and properties ("the covariance is dominated by shot noise")
+- **Passive voice** — when the actor is obvious ("a redshift cut is applied")
+- **Results / data as epistemic subjects** — for what the data shows ("the measurement shows a 7σ peak"; "Figure 2 reveals…")
+- **Physics doing physics** — for physical processes ("lensing distorts shapes"; "higher-order effects produce B-modes")
 
-## Linking relationships — structural vs narrative
+Anthropomorphized abstractions fail the test: "the methodology validates," "this analysis demonstrates," "the catalogue evolution follows." Rewrite to a real subject doing a real verb.
 
-| Relationship | Structural | Narrative |
-|---|---|---|
-| Prior insight → decision option | `decisions.<id>.options.<opt>.insights: [ids]` | inline in `methods` when the decision is discussed |
-| Finding → output | `findings.<id>.evidence` → `outputs.<id>` | inline in `findings` |
-| Finding → decision | *no structural link yet* (#16) | inline in decision's `rationale:` |
-| Decision → decision | `decisions.<id>.from: <ref>` or `from: ../decisions.<id>` | inline in the inheriting decision's `rationale:` |
+## Anti-patterns (mode-independent)
 
-If a relationship is structural, don't duplicate it in prose — cite
-it by anchor.
+- **Wiki-style what-is framing.** "BAO is the baryon acoustic oscillation feature." A wiki summarizes; an ASTRA narrative points into reasoning. Replace with the load-bearing statement and an anchor: "we chose the Gaussian BAO damping prior over flat because flat admitted spurious minima — see [the prior comparison](#decisions.bao_damping_prior)."
+- **Decision-list paragraph.** "We made the following decisions: A, B, C." Cite each decision where it shapes the pipeline, not as recitation. Too many to weave coherently → the spec wants more sub-analyses.
+- **`summary` as primer.** Teaching what the field is. Readers arrive with context.
+- **Drafting `findings` on a sub-analysis with no declared findings.** Skip the key.
+- **Narrative-per-element.** Writing `narrative:` on findings, inputs, outputs, or insights. The five-key analysis narrative is the only home; per-element prose is `description` / `rationale` / `notes`.
+
+Mode-specific anti-patterns live in each mode's reference.
 
 ---
 
 ## Self-contained example
 
-A minimal (not necessarily valid) sketch showing how the blocks fit
-together. The point is the *shape*.
+A minimal (not necessarily valid) sketch showing how the blocks fit together. The point is the *shape*.
 
 ```yaml
 id: example_analysis
@@ -326,21 +181,18 @@ narrative:
     uses [<mocks>](#inputs.validation_mocks).
 
   methods: |
-    The pipeline runs in two stages.
-    [Preparation](#analyses.preparation) ingests the raw catalog and
-    produces [cleaned two-point statistics
-    ](#preparation.outputs.clean_stats).  [Fitting
-    ](#analyses.fitting) consumes those statistics and fits model
-    parameters.  Both stages inherit the parent's
-    [fiducial cosmology](#decisions.fiducial_cosmology) so the
-    distance-redshift relation is used end-to-end.
+    The pipeline runs in two stages.  [Preparation](#analyses.preparation)
+    ingests the raw catalog and produces [cleaned two-point statistics
+    ](#preparation.outputs.clean_stats).  [Fitting](#analyses.fitting)
+    consumes those statistics and fits model parameters.  Both stages
+    inherit the parent's [fiducial cosmology](#decisions.fiducial_cosmology)
+    so the distance-redshift relation is used end-to-end.
 
   findings: |
     Three findings constitute the result: a
     [headline detection](#findings.headline_detection), a
-    [precision comparison with prior work
-    ](#findings.precision_improvement), and
-    [an anomalous feature](#findings.anomaly).  The anomaly is the
+    [precision comparison with prior work](#findings.precision_improvement),
+    and [an anomalous feature](#findings.anomaly).  The anomaly is the
     most-discussed qualitative feature.
 
   outputs: |
@@ -355,8 +207,8 @@ decisions:
     rationale: |
       Planck 2018-ΛCDM is the community reference; distance-redshift
       conversion is downstream of this choice, and fixing it lets
-      results be compared directly to prior measurements.  Inherited
-      by [fitting](#analyses.fitting) so the end-to-end chain uses one
+      results be compared directly to prior measurements.  Inherited by
+      [fitting](#analyses.fitting) so the end-to-end chain uses one
       distance scale.
     default: planck2018
     options:
@@ -367,73 +219,10 @@ decisions:
         excluded_reason: "Superseded; no longer the community reference."
 ```
 
-What to notice:
-
-- Anchor text is prose, not an id.
-- `methods` uses the sub-analysis-first form
-  (`#preparation.outputs.clean_stats`) for cross-scope refs.
-- `findings` synthesizes how three findings relate; each cited by
-  anchor, not recited.
-- `outputs` is thin — two sentences.
-- Decision rationale cites a sub-analysis by anchor when the choice
-  propagates, and says why the default won without enumerating options.
-
-For a canonical reproduction narrative in context, see
-`Reproductions/DESI/desi-dr1-bao/astra.yaml` in
-`LightconeResearch/Reproductions`.
-
----
-
-## Craft
-
-- **Economy.** Every sentence introduces a new idea or sharpens an
-  existing one. Release real verbs: `conducted cross-correlation` →
-  `cross-correlated`.
-- **Epistemic honesty.** Hedges carry information about certainty.
-  "This suggests" reflects real uncertainty; "may perhaps indicate" is
-  decorative.
-- **Show, don't label.** Describe the tension; don't announce it. Cut
-  signposting: "the key insight is," "importantly," "it is worth
-  noting."
-- **Specificity.** Names, numbers, references over generic claims.
-- **Arrive through content.** No "in this analysis we will describe…";
-  the content is the opening.
-
----
-
-## Anti-patterns (mode-independent)
-
-- **Narrative-per-element.** Writing `narrative:` on findings, inputs,
-  outputs, or insights. The five-key analysis narrative is the only
-  home; per-element prose is `description` / `rationale` / `notes`.
-- **Results-only narrative.** Methods without movement-of-learning
-  elides the meaning-making. At minimum, name one pivot or abandoned
-  option per scale.
-- **Decision-list paragraph.** "We made the following decisions: A,
-  B, C." Cite each decision where it shapes the pipeline, not as
-  recitation. Too many to weave coherently → the spec wants more
-  sub-analyses.
-- **Wiki-style what-is framing.** "BAO is the baryon acoustic
-  oscillation feature." A wiki summarizes; an ASTRA narrative points
-  into reasoning. Replace with "we chose the Gaussian BAO damping
-  prior over flat because flat admitted spurious minima" — with the
-  anchor. Applies to every key.
-- **`summary` as primer.** Teaching what the field is. Readers arrive
-  with context.
-
----
-
-## Lint
-
-1. `astra validate <path>` — catches broken anchors, schema
-   violations, uncited declared elements.
-2. Paragraph count per key — flag anything over three.
-3. Only conditionally-required keys present — if `findings:` is
-   empty, `narrative.findings` is absent.
+For a canonical reproduction narrative in context, see `Reproductions/DESI/desi-dr1-bao/astra.yaml` in the [LightconeResearch/Reproductions](https://github.com/LightconeResearch/Reproductions) repo.
 
 ---
 
 ## Now read the mode reference
 
-Before drafting, open the reference file that matches the user's
-situation.
+Open the reference file that matches the user's situation. Each carries the mode's draft order, mode-specific moves, critique pass, and mode-specific anti-patterns.
diff --git a/claude/lightcone/skills/narrative/references/co-drafting.md b/claude/lightcone/skills/narrative/references/co-drafting.md
index 007353fb..1deee5b6 100644
--- a/claude/lightcone/skills/narrative/references/co-drafting.md
+++ b/claude/lightcone/skills/narrative/references/co-drafting.md
@@ -14,13 +14,13 @@ Pure greenfield (no `astra.yaml` at all) isn't a coherent narrative-skill task 
 
 ## What's distinct from paper reproduction
 
-- **Source is conversation, not prose.** The paper-reproduction harvest move (paraphrase from a written source) doesn't apply. Draft moves come from `AskUserQuestion`-batched dialogue, not from extracting prose.
+- **Source is conversation, not prose.** The paper-reproduction harvest move (paraphrase from a written source) doesn't apply. Draft moves come from dialogue with the user — `AskUserQuestion` when several framing questions land together, prose follow-ups when one question opens the next.
 - **Voice depends on stage.** Reproduction is always declarative ("The pipeline runs in…"). Co-drafting voice tracks where the work is: present-tense for live work, past tense for completed steps, provisional markers when content is volatile.
 - **Spec and narrative move together.** In reproduction the spec is fixed (or close to it) and the narrative reconstructs the paper. In co-drafting the spec may shift between drafts; expect to revisit narrative when a decision lands or a sub-analysis splits.
 
 ## The ask-first discipline
 
-Co-drafting is the one mode where authoring without asking produces fiction. The user is available; ask. Use `AskUserQuestion` to batch up the load-bearing reads before drafting:
+Co-drafting is the one mode where authoring without asking produces fiction. The user is available; ask. Surface the load-bearing reads before drafting — `AskUserQuestion` when several land together, single questions or prose follow-ups when the conversation wants its own rhythm:
 
 - **Research question.** What are you trying to learn? One sentence.
 - **Current headline finding** (if any). What's been established so far? One sentence; a gesture is fine.
diff --git a/claude/lightcone/skills/narrative/references/existing-analysis.md b/claude/lightcone/skills/narrative/references/existing-analysis.md
index 765f525c..a28cba29 100644
--- a/claude/lightcone/skills/narrative/references/existing-analysis.md
+++ b/claude/lightcone/skills/narrative/references/existing-analysis.md
@@ -1,172 +1,50 @@
-# Existing-analysis retrofit mode
-
-> **Status: under development.** This mode is scaffolded but not yet
-> production-ready. The workflow below is a working draft — treat it
-> as a starting point, not a locked spec. For the production-ready
-> path, use paper reproduction mode if applicable. Report friction
-> back so this reference can firm up.
-
-A project has been running — with code, results, a working directory,
-possibly a partial spec — and is being imported into ASTRA. There is
-no published paper; the narrative is being built from artifacts, not
-reconstructed from prose.
-
-Read the main SKILL.md first. This file adds what's specific to
-retrofit.
-
-Retrofit is distinct from paper reproduction (there is no source
-narrative to reconstruct) and from interactive authoring (the work is
-already done, or at least substantially done, rather than in flight).
-The core move is **archaeology**: classifying what's live, harvesting
-intent from whatever artifacts carry it, marking gaps where the record
-is silent.
-
-## Workflow
-
-### 1 · Triage
-
-Before writing a single sentence, classify the project's contents.
-
-Go through `astra.yaml` and each sub-analysis and mark:
-
-- **live** — current, active, still used downstream
-- **superseded** — kept in the spec for record, but no longer what's
-  actually run
-- **abandoned** — tried and dropped; may or may not belong in the
-  narrative as movement-of-learning
-- **unclear** — decision or finding with no documentation; the
-  original rationale is not recoverable from the spec alone
-
-Produce this as a short summary and surface via `AskUserQuestion`.
-Confirm with the user:
-
-- What stays, what is explicitly deprecated, what is abandoned.
-- Whether abandoned options should appear as movement-of-learning
-  (sometimes yes: "we initially tried X, which gave Y; switched to Z"
-  is honest). Sometimes no: trivial or confidential choices don't
-  belong.
-- Which `unclear` items the user can reconstruct, vs. which are
-  genuinely lost.
-
-The narrative only speaks for live content unless the user explicitly
-wants a history section.
-
-### 2 · Harvest
-
-The project's substrate substitutes for a paper's narrative. Mine
-these, in roughly decreasing order of value:
-
-- **`README.md`, `CLAUDE.md`, `NOTES.md`, `TODO.md`** at project root.
-  Often contain the clearest statement of intent.
-- **`.felt/`** or a fibers directory. The author's active thinking,
-  decisions with rationale, meeting notes, open questions.
-- **Notebook markdown cells.** Often the narrative the author wrote
-  for themselves.
-- **Code comments** at function-level decision points. "We drop
-  rows where X < 0.1 because …" is a rationale waiting to be lifted.
-- **Commit messages** at milestone commits. `git log --grep` for
-  keywords like "decided," "switched," "abandoned," "fix" can surface
-  turning points.
-- **Meeting notes, old proposals, grant text.** Grant paragraphs are
-  often where motivation lives in its cleanest form.
-- **Open issues and closed PRs.** Rejected options often have a PR
-  describing what was tried.
-
-Make a list of candidate motivation, methodology, and findings text
-before starting to draft. Where possible, anchor each harvested piece
-to its source so rationales can be traced.
-
-### 3 · Fill the gaps
-
-For each `unclear` decision, try in order:
-
-1. **Ask the user.** `AskUserQuestion` with the decision and its
-   options, asking for a one-sentence rationale.
-2. **If the user doesn't know**, write a fair description of what was
-   chosen and mark it as reconstructed. Example:
-   ```yaml
-   rationale: >-
-     _(Reconstructed 2026-04: original rationale not recorded.  Current
-     reading is that option X was chosen because Y, based on the
-     downstream code's assumptions about Z.)_
-     ...
-   ```
-3. **If the rationale is actually lost**, name that. A narrative that
-   admits "the reasoning for this cut was not recorded and cannot be
-   reconstructed" is honest; one that fabricates a plausible-sounding
-   justification is not.
-
-Do the same for findings without evidence, inputs without provenance,
-and outputs without a clear source sub-analysis.
-
-### 4 · Draft order
-
-Same as reproduction: inputs → methods → findings → outputs →
-summary. Retrofit is stable enough for compression-last to
-work. Unlike interactive authoring, you're narrating after the fact.
-
-### 5 · Voice
-
-- **Past tense for what happened**; present tense only for the living
+# Existing-analysis retrofit mode (stub)
+
+> **Status: under development.** Use paper reproduction (the default flow when a paper exists) when applicable. This file names what's distinct about retrofit and the open questions; it isn't yet production guidance.
+
+A project has been running — code, results, partial spec, no published paper — and is being imported into ASTRA. The `astra.yaml` has been built (or is being built); the narrative is reconstructed from the artifacts that produced the work, not from a written source. Retrofit is **harvest from artifacts**; co-drafting is **harvest from conversation**; reproduction is **harvest from a paper**.
+
+## What's distinct from paper reproduction
+
+- **No source narrative.** The five-key shape has to be assembled from
+  what the artifacts carry: README, CLAUDE.md, fibers, notebook cells,
+  code comments, commit messages, meeting notes, old proposals, issues,
+  closed PRs.
+- **Triage comes first.** Sub-analyses and decisions classify as live /
+  superseded / abandoned / unclear. The narrative speaks for live content
+  by default; abandoned and superseded only appear if the user wants
+  history surfaced.
+- **Gaps are explicit.** When a decision's original rationale isn't
+  recoverable from artifacts and the user can't reconstruct it, the
+  honest move is to say so — `_(Reconstructed YYYY-MM: original rationale
+  not recorded.)_` — not to fabricate a plausible justification.
+- **Past tense for what happened.** Present tense only for living
   structure ("the pipeline runs three stages").
-- **Don't impose a narrative of inevitability.** If the project tried
-  Option A for six months, abandoned it, and switched to B, say so.
-  The iteration is the substance of movement-of-learning — retrofit is
-  where that content has to come from the archaeology, not from a
-  researcher narrating live.
-- **Mark reconstructions.** `_(Reconstructed)_` or a brief prose note
-  when the authoring draws on harvested material whose original author
-  is absent.
-
-### 6 · Critique
-
-In addition to SKILL.md's three-phase and craft audits:
-
-**Triage audit.**
-
-- Does the narrative speak only for live content, unless a deliberate
-  history section is included?
-- Are deprecated / abandoned elements explicitly named as such, or do
-  they appear as if current?
-
-**Harvest audit.**
-
-- Does every load-bearing claim in the narrative trace to a project
-  artifact (commit, notebook cell, fiber, code comment, meeting note)
-  — or to the user's confirmation?
-- Are gaps named rather than fabricated?
-
-## Anti-patterns (retrofit-specific)
-
-- **Fabricated rationales.** Writing a plausible-sounding justification
-  for a decision whose actual rationale was "someone chose this and
-  nobody remembers." Mark the reconstruction, or say the reasoning is
-  lost.
-- **Smoothing over abandoned work.** If the project pivoted mid-way,
-  retrofit is exactly the place where that iteration belongs. Don't
-  write a narrative of smooth progress that contradicts the git log.
-- **Narrating around gaps.** A sub-analysis with no findings doesn't
-  need filler prose explaining what it didn't find; the narrative
-  should say the finding work is not yet done (or was never done).
-- **Missing the archaeology step.** Jumping straight to drafting
-  without triage and harvest produces a narrative in the author's
-  voice about work they didn't do. The result sounds invented because
-  it is.
-- **Treating CLAUDE.md like a paper.** Harvest from it; don't import
-  its style. `CLAUDE.md` is agent-facing; the narrative is
-  reader-facing.
-
-## When retrofit becomes reproduction
-
-If, during retrofit, it becomes clear that the project is actually
-reproducing an unacknowledged paper (code based on a published
-analysis, derived from another group's method), switch to paper
-reproduction mode for the parts that map. Hybrid is fine: reproduce
-what's published; retrofit what's novel or local.
-
-## When retrofit becomes interactive
 
-If the retrofit surfaces that core decisions are still open and the
-user wants to revisit them now, the narrative isn't yet stable. Flag
-to the user and switch to interactive mode for those sections —
-provisional voice, revisit after decisions land.
+## Open questions before this is production-ready
+
+- **What's the canonical artifact harvest?** README, fibers, notebooks,
+  commits, PR threads — order, depth, when to stop. Real retrofit cases
+  will vary widely; the skill needs a default ordering and the criteria
+  for going deeper.
+- **How aggressively to use `AskUserQuestion`?** A retrofit on a year-old
+  project may have a researcher who remembers some decisions but not
+  others. Where's the line between asking and reconstructing?
+- **History sections.** When abandoned options are load-bearing
+  ("we tried X for six months, switched to Y"), they belong in
+  movement-of-learning. Routing: new sub-analysis with `excluded:` /
+  `lifecycle: abandoned`? Inline marker in `methods`? No firm answer.
+- **Voice for reconstructed content.** `_(Reconstructed)_` works
+  inline. Whether reconstructed-vs-original needs structural distinction
+  in the spec, or stays a prose convention, is open.
+
+## When retrofit shifts modes
+
+- **Becomes reproduction.** If the project turns out to be reproducing an unacknowledged paper, switch to paper reproduction (the default flow) for the parts that map. Hybrid is fine.
+- **Becomes co-drafting.** If retrofit surfaces that core decisions are still open and the user wants to revisit them now, switch to co-drafting mode for those sections (provisional voice, revisit after decisions land).
+
+## Report friction
+
+If you hit retrofit cases this stub doesn't cover, file a fiber or
+GitHub issue against `lightcone-cli` with `narrative` in the title so
+the next pass can firm this up.
diff --git a/claude/lightcone/skills/narrative/references/paper-reproduction.md b/claude/lightcone/skills/narrative/references/paper-reproduction.md
index a9fd42d2..a5fa9d1f 100644
--- a/claude/lightcone/skills/narrative/references/paper-reproduction.md
+++ b/claude/lightcone/skills/narrative/references/paper-reproduction.md
@@ -1,51 +1,21 @@
 # Paper reproduction mode
 
-A published paper exists. Reconstruct its narrative into ASTRA's
-five-key shape — against an `astra.yaml` that's already built, or
-alongside one being built concurrently — preserving the paper's
-confidence level and sequence.
+An authoritative text source exists — most often a published paper, but also a thesis, technical report, posted preprint, or other canonical account of the work. Reconstruct its narrative into ASTRA's five-key shape, drawing on **the text and the under-construction `astra.yaml` as paired sources**: the text carries the claims and the confidence register; the spec carries the structural decomposition (which decisions are nodes, which findings are nodes, where sub-analyses sit). Neither is sufficient alone.
 
-## Where the paper lives
+The spec may be stable, in flux, or both — paper-reproduction often runs concurrently with spec refinement. The narrative tracks both: when a decision is added, write its `rationale:`; when a sub-analysis splits, draft its five keys; when a finding is declared, fold it into the parent's `findings` synthesis.
 
-Prefer arXiv LaTeX source. It's the most natural form to work with:
-sections are delimited, captions are inline, citations resolve to a
-`.bib`, equations are parseable.
+Read the main SKILL.md first. This file adds what's specific to reproduction.
 
-### 1 · arXiv LaTeX source (default)
+## Where the source text lives
 
-If the paper is on arXiv, fetch the source:
+The skill expects `work/reference/` to exist — the standardized output of [`/paper-extraction`](../../paper-extraction/SKILL.md). If it doesn't, run `/paper-extraction` first. The predictable shape:
 
-```sh
-arxiv_id=<id>        # e.g. 2404.03000
-mkdir -p paper
-cd paper
-curl -L "https://arxiv.org/e-print/${arxiv_id}" -o "${arxiv_id}.tar.gz"
-tar -xzf "${arxiv_id}.tar.gz"
-```
+- `work/reference/paper.tex` (Path A — symlink to main `.tex`) **or** `work/reference/document.md` (Path B — Docling output)
+- `work/reference/index.json` — section outline with line numbers, figures, tables, citation locations
+- `work/reference/astra.yaml` — the paper as an ASTRA artifact (claimed findings as ASTRA findings)
+- `work/reference/figures/`, `work/reference/tables/`, `work/reference/source/` (Path A only)
 
-The archive unpacks to the paper's working tree — typically a main
-`.tex` file, section includes, figures, a `.bib`. Identify the main
-file with `grep -l '\\documentclass' *.tex`. Read sections in order;
-resolve citation keys against the bundled `.bib`.
-
-### 2 · Existing parsed paper in the project
-
-Some reproductions ship the paper already parsed. Check for:
-
-- `desi_dr1_paper/` or `paper/` at the project root.
-- Single `.md` file (Docling output or manual conversion),
-  `.pdf`, or the arXiv tarball unpacked.
-
-If a markdown parse exists, use it as the primary source; fall back
-to the PDF or the arXiv source to resolve ambiguities.
-
-### 3 · User-provided
-
-Ask the user where the paper is if nothing lands automatically.
-
-If no paper is accessible, this is not a reproduction task — fall
-back to `references/existing-analysis.md` (currently under
-development).
+If no authoritative text is accessible at all, this isn't reproduction — fall back to `references/existing-analysis.md` or `references/co-drafting.md`.
 
 ## Paper-to-ASTRA mapping
 
@@ -55,167 +25,94 @@ Write this down before drafting a sentence.
 |---|---|
 | Abstract | `summary` |
 | Introduction (motivation, related work) | `summary` + `findings` intro |
-| Methods section N | corresponding sub-analysis's `narrative.methods` |
+| Methods section N | the matching sub-analysis's `narrative.methods` |
 | Results | structural `findings.<id>` claims; narrative intro in `findings` |
 | Discussion | `findings` narrative + `summary` implications |
 | Conclusions | reinforces `summary` |
 | Figures / tables | `outputs.<id>` — referenced in `findings` via anchors |
-| "We chose X because Y" sentences | decision `rationale:` |
+| "We chose X because Y" sentences | the relevant decision's `rationale:` |
 
-Not every paper maps cleanly section-to-sub-analysis. When it
-doesn't, the sub-analysis DAG in `astra.yaml` is authoritative.
-Narrate according to the DAG, harvesting the paper's prose for
-content. If the spec has deliberately reorganized relative to the
-paper, say so briefly in `methods`.
+Not every text maps cleanly section-to-sub-analysis. When it doesn't, the sub-analysis DAG in `astra.yaml` is authoritative: narrate according to the DAG, harvesting the source text's prose for content. If the spec deliberately reorganized relative to the text, say so briefly in `methods`.
 
 ## Workflow
 
 ### 1 · Orient
 
-The spec may be stable, in flux, or both — narrative drafting often
-runs concurrently with spec refinement. Read what's there; expect to
-revisit as the spec moves.
-
-1. `astra.yaml` at the project root. Whole file. Note `inputs`,
-   `outputs`, `decisions`, `findings`, `analyses`, existing
-   `narrative:`. Notice which of the five keys are present vs. empty.
-2. Each sub-analysis `astra.yaml`. Skim decisions (inherited vs.
-   local), findings, outputs, existing narrative. A sub-analysis may
-   use `description:` (legacy) instead of the five-key `narrative:`
-   block — promoting it may be part of the job.
-3. The paper — abstract, intro open/close, methods section headers,
-   discussion, conclusions. Read full sections when drafting the
-   corresponding ASTRA piece.
-4. Any project `CLAUDE.md` or working notes.
-
-Infer authoring state (from-scratch, extending, revising) from what
-is already on disk. If the user is present, confirm via
-`AskUserQuestion`:
-
-- Scale: top-level, a specific sub-analysis, or a decision's
-  `rationale:`?
-- Pure reproduction, or with reproducer extensions (e.g., the
-  reproduction's covariance differs from the posted table)?
-
-If the spec is iterating, draft narrative concurrently — rationale
-when a decision is added, five-key narrative when a sub-analysis
-splits, findings synthesis updated when a finding is added. Narrative
-and spec quality rise together when they share context.
+Read both sources before drafting. The spec carries the structural decomposition; the text carries the claims.
+
+1. **`astra.yaml` at the project root** — whole file. Note `inputs`, `outputs`, `decisions`, `findings`, `analyses`, existing `narrative:`. Notice which of the five keys are present vs. empty.
+2. **Each sub-analysis `astra.yaml`** — skim decisions (inherited vs. local), findings, outputs, existing narrative.
+3. **The source text** — abstract, intro open/close, methods section headers, discussion, conclusions. Read full sections when drafting the corresponding ASTRA piece. Use `work/reference/index.json` to navigate; the parsed `paper.tex` (Path A) or `document.md` (Path B) is the primary source.
+4. **Project `CLAUDE.md` and any working notes** — paper-specific conventions, gotchas, scope decisions.
+
+If the user is present, surface the orienting questions — `AskUserQuestion` is useful when several land together; one question at a time is fine when only one is open:
+
+- **Scale:** top-level, a specific sub-analysis, or a decision's `rationale:`?
+- **Pure reproduction, or with reproducer extensions** (e.g., the reproduction's covariance differs from the posted table)?
+- **Approach:** start with a specific question first — a methods subsection, a particular figure's choices, a discussion claim worth tracing into the decisions — or one-shot the whole narrative? Sets the session shape.
 
 ### 2 · Draft order
 
 Not `summary` first. `summary` compresses the rest; draft it last.
 
-1. **`inputs`** — shortest. Name the data and its provenance. One
-   short paragraph. Let the inputs structure carry the dataset
-   detail.
-2. **`methods`** — walk the pipeline in DAG order. Cite each
-   sub-analysis and decision by anchor as part of the argument, not
-   as an enumeration. If there are too many to weave coherently, the
-   analysis wants more sub-analyses. Inheritance that propagates
-   across sub-analyses gets called out because it's load-bearing
-   end-to-end. Movement-of-learning lives here — a pivot the paper
-   narrates ("we initially tried X, but…") is cheap because of
-   telescoping.
-3. **`findings`** — **only if findings are declared structurally.**
-   If `findings:` is empty, skip this key (per narrate-what-you-
-   declare). If findings exist, synthesize how they fit together —
-   each cited by anchor, not an enumeration.
-4. **`outputs`** — thin. Which artifacts were promoted and why;
-   point to the sub-analysis that produced them.
-5. **`summary`** — last. Two paragraphs. Open with the question and
-   the headline finding; thread motivation, method, and implications.
-   No primer material.
-
-For sub-analyses, same order, same length target (1–3 paragraphs per
-key). For a decision's `rationale:`, one paragraph: what was decided,
-the insight(s) that motivated it (by anchor), what the load-bearing
-alternative was and why it lost. The alternatives themselves are in
-the options structure.
-
-**Conditional keys on sub-analyses.** Only include keys whose
-structural counterpart is non-empty. A reconstruction sub-analysis
-with no findings gets `summary`, `methods`, `inputs`, `outputs` — no
-`findings`.
+1. **`inputs`** — shortest. Name the data and its provenance. One short paragraph. Let the inputs structure carry the dataset detail.
+2. **`methods`** — walk the pipeline in DAG order. Cite each sub-analysis and decision by anchor as part of the argument, not as an enumeration. If there are too many to weave coherently, the analysis wants more sub-analyses. Inheritance that propagates across sub-analyses gets called out explicitly because it's load-bearing end-to-end. A pivot the paper narrates ("we initially tried X, but…") is cheap to preserve because of telescoping.
+3. **`findings`** — only if findings are declared structurally. Synthesize how they relate; each cited by anchor, not enumerated.
+4. **`outputs`** — thin. Which artifacts were promoted and why; cite the sub-analysis that produced them; name downstream consumers (see Data flow in SKILL.md).
+5. **`summary`** — last. 1–2 paragraphs. Open with the question and the headline finding; thread motivation, method, and implications. No primer material.
+
+For each decision, write a one-paragraph `rationale:`: what was decided, the prior insight that motivated it (cite by anchor), what the load-bearing alternative was and why it lost.
+
+For sub-analyses, same order, same length target.
+
+**Conditional keys.** Only include keys whose structural counterpart is non-empty. A reconstruction sub-analysis with no findings gets `summary`, `methods`, `inputs`, `outputs` — no `findings`.
 
 ### 3 · Reproduction-specific moves
 
-- **Fidelity to source confidence.** Don't sharpen or soften. If the
-  paper says "we detect," don't write "we strongly detect." If it
-  hedges, preserve the hedge.
-- **Harvest, don't invent.** The paper's prose is the first source.
-  Paraphrase — don't lift verbatim — but preserve meaning and
-  confidence register.
-- **Voice seams.** If reproducer-specific content enters ("during
-  reproduction we found the published covariance differs from the
-  posted table"), mark the transition. A sentence mixing paper
-  claims and reproduction claims without a seam confuses both.
-- **Paper sequence is usually load-bearing.** DAG order should match
-  the paper's section order unless the spec deliberately
-  reorganized.
-- **No primer material.** `summary` is not a field-introduction.
-  Don't teach what BAO or weak lensing is. Readers arrive with
-  context.
-- **Rationales come from the paper.** "We chose reconstruction
-  convention X because Y" becomes the backbone of a decision's
-  `rationale`. Keep Y; cite the supporting prior insight by anchor
-  if one exists.
-- **Published = done.** Reproduction narrative is declarative,
-  present-tense matching the paper's voice ("The analysis is
-  organised as…", "The pipeline runs in…"). Not "we are measuring."
-- **Scope-limited reproductions.** Real-world reproductions often
-  cover a subset of the paper (e.g., DESI BAO reproducing only
-  LRG1+LRG2). Name the scope in `summary` so a reader knows what's
-  in and out.
-
-## Critique pass
-
-Run these reproduction-specific checks alongside the three-phase and
-craft audits from SKILL.md.
+- **Tell the author's story by default.** The narrative reproduces what the paper says, restated within the ASTRA structure — anchored to what's referable in the spec (decisions, findings, prior insights). Decision rationales come from the paper's "we chose X because Y" sentences, not invented post-hoc.
+- **Paraphrase, don't lift.** Restate the paper's claims in your own structuring rather than copying sentences verbatim — verbatim quotation calls authorship into question. Preserve meaning and confidence register; don't sharpen or soften (if the paper says "we detect," don't write "we strongly detect"; if it hedges, preserve the hedge).
+- **Two sources, paired.** The authoritative text carries claims, confidence register, and sequence. The under-construction `astra.yaml` carries the structural decomposition. Draft against both; let the spec's structure shape what each key covers, and let the text shape what's said.
+- **When the reproduction's results differ, adapt — and flag.** Where the reproduction landed on different findings (a covariance that diverges from the posted table, a coefficient with different precision, a null where the paper claimed detection), the narrative needs to report what was actually found, not what was claimed. This wants human input on phrasing; surface the divergence to the user rather than papering over it.
+- **Voice seams.** When reproducer-specific content enters the narrative, mark the transition. *"During reproduction we found the published covariance differs from the posted table"* is a seam; the sentences before it speak in the paper's voice, and the sentences after it speak in the reproducer's. A sentence that silently mixes the two voices muddles both.
+- **Walk the paper's sequence in `methods`.** Traverse sub-analyses in DAG order — and the DAG order should match the paper's section order. If the spec deliberately reorganized (split one section into two sub-analyses, or merged two sections into one), name the deviation briefly in `methods`. Don't reorder silently.
+- **Published = done.** Reproduction narrative is declarative, present-tense matching the paper's voice ("The analysis is organised as…", "The pipeline runs in…"). Not "we are measuring."
+- **Scope-limited reproductions.** Real-world reproductions often cover a subset of the paper (e.g., DESI BAO reproducing only LRG1+LRG2). Name the scope in `summary` so a reader knows what's in and out.
+
+### 4 · Critique pass
+
+Run all four audits before declaring the narrative done.
 
 **Fidelity audit.**
 
-- No sharpened or softened claims relative to the paper.
+- Claims match the paper, **except where reproduction results actually differ.** If the reproduction landed on different findings, the narrative reports what was found — and the divergence has been surfaced to the user for phrasing input, not silently softened or sharpened.
 - Voice seams marked where reproducer content enters.
-- Rationales traceable to the paper's justifications or to a prior
-  insight in the spec.
+- Rationales traceable to the paper's justifications or to a prior insight in the spec.
 - No invented citations. Every anchor resolves to a real spec id.
-- Scope (what's reproduced, what isn't) stated in `summary` if
-  narrower than the paper.
+- Scope (what's reproduced, what isn't) stated in `summary` if narrower than the paper.
 
 **Sequence audit.**
 
-- `methods` walks sub-analyses in DAG order; DAG order matches the
-  paper's narrative sequence (or the deviation is named in prose).
+- `methods` walks sub-analyses in DAG order; DAG order matches the paper's narrative sequence (or the deviation is named in prose).
 - `summary` opens with the question, not a field primer.
 
-**Structural-peer-redundancy audit.**
+**Anchor coverage audit.**
 
-- Every declared decision, finding, output, and sub-analysis cited
-  somewhere in the narrative (validator enforces). Citations woven
-  into argument, not recited as a list.
-- `findings` narrative synthesizes relationships between findings;
-  `inputs` narrative names provenance. Neither catalogs fields.
+- `astra validate` warns on any declared finding / decision / output / sub-analysis not cited in the narrative. Review the warnings; either cite the element or consider whether it should be declared at all.
 
-**Anchor coverage audit.**
+**Structural-peer-redundancy audit.**
 
-- `astra validate` warns on any declared finding / decision / output
-  / sub-analysis not cited in the narrative. Review the warnings;
-  either cite the element or consider whether it should be declared.
+- Citations woven into argument, not recited as a list.
+- `findings` narrative synthesizes relationships between findings; `inputs` narrative names provenance. Neither catalogs fields.
 
 ## Anti-patterns (reproduction-specific)
 
-- **Lifting verbatim.** Copy-pasting abstract sentences into
-  `summary`. Paraphrase — otherwise the narrative reads as a citation
-  of itself.
-- **Adding implications the paper didn't make.** Fidelity cuts both
-  ways.
-- **Eliding the reproducer's voice entirely.** If the reproduction
-  caught something the paper missed, name it with the seam.
-- **Treating paper sections as sub-analyses.** A paper's Section 3.2
-  isn't automatically a sub-analysis; the DAG is the authority.
-- **Listing instead of weaving.** Narrate each decision where it
-  shapes the pipeline. Too many to weave coherently → the spec wants
-  more sub-analyses.
-- **Drafting `findings` on a sub-analysis that has no declared
-  findings.** Skip the key.
+- **Lifting verbatim.** Copy-pasting abstract sentences into `summary`. Paraphrase — otherwise the narrative reads as a citation of itself.
+- **Adding implications the paper didn't make.** Fidelity cuts both ways.
+- **Eliding the reproducer's voice entirely.** If the reproduction caught something the paper missed, name it with the seam.
+- **Treating paper sections as sub-analyses.** A paper's Section 3.2 isn't automatically a sub-analysis; the DAG is the authority.
+- **Listing instead of weaving.** Narrate each decision where it shapes the pipeline. Too many to weave coherently → the spec wants more sub-analyses.
+
+## When reproduction shifts modes
+
+- **Hybrid with co-drafting.** If the reproduction adds a sub-analysis the paper didn't have (a reproducer-specific extension), that sub-analysis's narrative is co-drafted, not reproduced. Use the seams.
+- **Hybrid with retrofit.** If the reproduction inherits code or fibers from a prior iteration, those carry rationale that didn't make it into the paper — harvest from artifacts as in retrofit mode for those sections.

From 0c084f2f3e9cb1f0c669bfc9357d959ea069be77 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 01:42:11 +0200
Subject: [PATCH 043/124] paper-extraction: refresh Path B note for
 bibliography-resolution state

The launch-script note said outline AND citation extraction don't run
on Path B. After the bibliography work, that's misleading: bibliography
resolution does run (references-section parser, synthesized keys, DOI
lookup); only the citation invocations in the prose go unlocated.
Update the note to reflect what actually does and doesn't run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../skills/paper-extraction/references/pdf-fallback.md          | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/paper-extraction/references/pdf-fallback.md b/claude/lightcone/skills/paper-extraction/references/pdf-fallback.md
index c80d776e..b2b8207b 100644
--- a/claude/lightcone/skills/paper-extraction/references/pdf-fallback.md
+++ b/claude/lightcone/skills/paper-extraction/references/pdf-fallback.md
@@ -62,5 +62,5 @@ No `paper.tex` and no `source/` on Path B. Consumers detect the path by reading
 
 ## Notes
 
-- **Outline + citation extraction don't run on Path B in the launch script.** No LaTeX source means no `\section{}` or `\cite{}` markers to walk. `index.json` includes an `extraction_warnings` entry flagging this; a future LLM pass over `document.md` would fill the gap.
+- **Outline extraction and citation-invocation extraction don't run on Path B.** No LaTeX source means no `\section{}` or `\cite{}` markers to walk in the paper body. Bibliography resolution *does* run — the script parses the references section at the tail of `document.md`, synthesizes `<lastname>_<year>` keys (with letter-suffix disambiguation for collisions), and resolves DOIs the same way as Path A. So the `citations:` block is populated with citation text + DOI, but each entry's `locations:` array is empty (the paper-side `\cite`-style invocations weren't extracted from prose). `extraction_warnings` flags both gaps.
 - **Journal DOIs that 403 on Unpaywall** sometimes have an arXiv preprint twin. When that's available, treat the paper as Path A using the arXiv ID — the LaTeX-source surface is far cleaner than any PDF extraction.

From bc34267d6c44019068af7e1cc99bd2c71e369e0f Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 01:42:24 +0200
Subject: [PATCH 044/124] paper-extraction: tighten Step 5 (one example, fold
 the rambly bits)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Step 5 was the heaviest section in the workflow despite being marked
optional — every invocation loaded ~45 lines of findings mechanics into
context, most of them redundant with the example file or just unmeasurable
advice ("typically 3-8 findings").

Compressed in place:
- one example finding instead of two (the second was a near-duplicate)
- folded "what counts as a finding" into the opening sentence
- promoted the narrative.findings cross-link rule into prose under the
  example, where it sits in the flow of the shape
- dropped "How many findings?" (unmeasurable; the example file shows)
- four discipline bullets instead of six numbered points (created_at
  format is visible in the example; cross-link is in the prose above)
- pointer to examples/unions-bmodes-astra.yaml as the canonical
  fully-populated template

Net: ~25 lines instead of ~45. Same contract, less ramble.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../skills/paper-extraction/SKILL.md          | 32 ++++++-------------
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/claude/lightcone/skills/paper-extraction/SKILL.md b/claude/lightcone/skills/paper-extraction/SKILL.md
index 98de0e4f..44cef06b 100644
--- a/claude/lightcone/skills/paper-extraction/SKILL.md
+++ b/claude/lightcone/skills/paper-extraction/SKILL.md
@@ -165,11 +165,9 @@ Also eyeball `astra.yaml`'s `name:` and `narrative.summary:`. The title or abstr
 
 ### Step 5 — *(Optional)* Walk the paper for findings, append to `astra.yaml`
 
-**Skip this step unless a downstream consumer needs it.** Steps 1–4 produce a complete `work/reference/` plus a valid (empty-findings) `astra.yaml` on their own. Step 5 fills in the paper's claimed numerical findings — useful when the next thing you'll do is reproduce the paper (the findings become reproduction targets) or compare against it (the findings become diff anchors). Skip when you just want to read the paper or have the structural index for browsing.
+**Skip unless a downstream consumer needs `findings:` populated.** Steps 1–4 produce a complete `work/reference/` and a valid (empty-findings) `astra.yaml` on their own. Reproductions and diff workflows need findings; reading and browsing don't.
 
-When you do run Step 5: this is the agent's central interpretive step and the one piece the script can't do.
-
-For each **central numerical claim the paper makes about its results**, append a finding to `astra.yaml`'s `findings:` map. The shape (per ASTRA's [Insight + Evidence](https://w3id.org/ASTRA/insight) classes):
+When you do run Step 5: for each **central numerical claim the paper makes about its results** — headline measurements, structural conclusions ("we detect X at Y σ"), validated null-test outcomes — append a finding to `astra.yaml`'s `findings:` map. *Not* methodology choices or dataset descriptions; those live elsewhere. Shape (per ASTRA's [Insight + Evidence](https://w3id.org/ASTRA/insight) classes):
 
 ```yaml
 findings:
@@ -183,30 +181,18 @@ findings:
         version: 1
         quote:
           exact: "we find $S_8 = 0.795 \\pm 0.014$"
-  bmode_pte_fiducial:
-    id: bmode_pte_fiducial
-    claim: "Minimum B-mode PTE = 0.18 across configuration-space, COSEBI, and harmonic-space statistics at fiducial scale cuts"
-    created_at: "2026-04-04T00:00:00Z"
-    evidence:
-      - id: abstract_pte
-        doi: "10.48550/arXiv.2604.03227"
-        version: 1
-        quote:
-          exact: "all three statistics pass the null test (minimum PTE $= \\configPteSixThreeCombined$)"
 ```
 
-**What counts as a finding:** a numerical or specific qualitative result the paper claims, of the kind a reproduction would have to match (or document divergence from). Headline results (S_8, PTEs, χ²), structural conclusions ("we detect X at Y σ"), validated null-test outcomes. *Not* methodology choices, *not* dataset descriptions — those live elsewhere.
+When `findings:` is non-empty, `narrative.findings:` must reference at least one finding — e.g. `narrative: { findings: "The fiducial analysis yields the [S_8 constraint](#findings.s8_constraint)." }`.
 
-**Discipline:**
+See `examples/unions-bmodes-astra.yaml` for a fully populated `astra.yaml` (six findings, narrative, evidence anchored to the published version).
 
-1. **Read the abstract and conclusions first.** The paper's own framing of its results lives there. Most central findings can be quoted from one of those two surfaces.
-2. **Use `quote.exact` literally.** Copy the LaTeX text as it appears in `paper.tex` — don't paraphrase, don't expand macros, don't normalize math. The `exact` is what `astra validate --verify-evidence` will look for in the source PDF; if you paraphrase, evidence verification fails. If the quote is hard to make unique, add `prefix:` and `suffix:` (~20–100 chars before/after) per the W3C TextQuoteSelector spec.
-3. **Anchor to the source.** Every finding's evidence carries a `doi:` (the paper's own DOI, e.g. `10.48550/arXiv.2604.03227`) and `version:` (paper version — `1` for v1, `2` for v2 of an arXiv preprint).
-4. **`created_at`** is the timestamp of the finding's creation in this file (i.e., when the agent wrote it). ISO 8601.
-5. **Add the `narrative.findings:` cross-link.** ASTRA requires that when `findings:` is non-empty, `narrative.findings:` exists and references at least one finding. Shape: `narrative: { findings: "The fiducial analysis yields the [S_8 constraint](#findings.s8_constraint); B-mode null tests pass with [minimum PTE = 0.18](#findings.bmode_pte_fiducial)." }`
-6. **Validate.** Run `astra validate work/reference/astra.yaml`. If it passes, the file is a valid ASTRA artifact. Add `--verify-evidence` to confirm each `quote.exact` is actually findable in the cached PDF.
+**Discipline:**
 
-**How many findings?** Aim for the central results, not exhaustive coverage. A paper with one headline measurement (e.g. an S_8 constraint) plus a few supporting null-test outcomes typically has 3–8 findings. A paper covering multiple separate analyses may have more.
+- **Read the abstract and conclusions first.** Most central findings can be quoted from one of those two surfaces.
+- **`quote.exact` is verbatim.** Copy LaTeX as it appears in `paper.tex` — don't paraphrase, don't expand macros, don't normalize math. `astra validate --verify-evidence` searches for this string in the cached PDF; paraphrasing breaks the gate. If the quote isn't unique, add `prefix:` / `suffix:` (~20–100 chars) per W3C TextQuoteSelector.
+- **Every evidence entry carries `doi:`** (the paper's own DOI, e.g. `10.48550/arXiv.2604.03227`) and `version:` (the arXiv version: `1` for v1, `2` for v2).
+- **Validate.** `astra validate work/reference/astra.yaml` confirms shape; `--verify-evidence` confirms each `quote.exact` is actually findable in the cached PDF.
 
 
 ## Inputs

From c1f7563300a7bb6711ada09270e0eb0262ff7503 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 02:24:01 +0200
Subject: [PATCH 045/124] lc-from-paper: persistent paper-expert + code-expert
 architecture
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ACQUIRE now runs in the orchestrator session directly, spawning two named,
persistent sub-agents in parallel — paper-expert (/paper-extraction) and
code-expert (/lc-from-code with stop-at-scan prompt-context). Their
transcripts persist for the lifetime of the reproduction; downstream phase
sub-agents receive their agent IDs and consult them via SendMessage rather
than re-ingesting paper / code materials from scratch.

- acquire.md: full rewrite. Orchestrator-direct, two parallel spawns,
  expert ID capture, handoff to ARCHITECT. Stop-at-scan instructions live
  as literal prose in code-expert's spawn prompt; no flag plumbing in
  lc-from-code.
- architect.md: full rewrite (~ -100 lines net). Drops the two embedded
  paper-side / code-side Explore prompts; reads work/reference/index.json
  + work/reference/astra.yaml + work/reference/code-index.md, queries
  paper-expert / code-expert for anything not in the indices.
- literature.md: restructured. Stage 1 is mechanical fetch via paper-
  extraction (batched-parallel by shell, no agent fan-out). Stage 2 is
  quote-finding done by literature itself for <=10 placeholders or fanned
  out to a few Haiku sub-agents for larger sets. astra paper add still
  runs after fetch so --verify-evidence finds cached PDFs.
- specify.md / implement.md / compare.md / review.md: experts added to
  Inputs as SendMessage-reachable resources; stale work/notes/architect/
  paper-index.md and code-index.md refs rerouted to work/reference/.
  implement.md: added explicit "without a code reference" branch — write
  fresh from spec when work/reference/code/ is absent or unusable.
- SKILL.md: phase table reflects ACQUIRE-in-orchestrator (no acquire
  sub-agent); intro names the persistent-experts pattern; spawning
  section instructs orchestrator to hand expert IDs to phase sub-agents;
  anti-pattern adjusted; workdir signals split paper-side / code-side.
- templates/CLAUDE.md: paper materials and code-index in the Paper
  section; persistent-experts rule + no-code clause in Rules; experts
  and new indices in Pointers.
- lc-from-code/SKILL.md: new Invocation contexts section names the
  scan-only invocation pattern used by lc-from-paper's ACQUIRE.

Phases beyond ARCHITECT keep their existing shape; they consult experts
when convenient but still do their own work. paper-extraction Step 6
(decisions extraction) and a fully-developed no-code architect path stay
parked for future passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-code/SKILL.md |   6 +
 .../lightcone/skills/lc-from-paper/SKILL.md   |  20 +-
 .../lc-from-paper/references/acquire.md       | 141 ++++---
 .../lc-from-paper/references/architect.md     | 386 +++++++-----------
 .../lc-from-paper/references/compare.md       |   2 +
 .../lc-from-paper/references/implement.md     |  24 +-
 .../lc-from-paper/references/literature.md    | 361 ++++++++--------
 .../skills/lc-from-paper/references/review.md |   3 +-
 .../lc-from-paper/references/specify.md       |  21 +-
 .../skills/lc-from-paper/templates/CLAUDE.md  |   9 +-
 10 files changed, 477 insertions(+), 496 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-code/SKILL.md b/claude/lightcone/skills/lc-from-code/SKILL.md
index 4dae304c..c6882d91 100644
--- a/claude/lightcone/skills/lc-from-code/SKILL.md
+++ b/claude/lightcone/skills/lc-from-code/SKILL.md
@@ -8,6 +8,12 @@ allowed-tools: Read, Write, Edit, Glob, Grep, Bash(astra:*), Bash(lc:*), Bash(py
 
 End-to-end migration: scan existing code, draft or add to `astra.yaml`, parameterize decisions in the code, and run until everything materializes. This works both as a fresh start from code and as an augmenting pass inside an existing ASTRA project. The user's existing logic stays intact — changes should be minimal.
 
+## Invocation contexts
+
+This skill has two invocation contexts. The first is the user-driven default described in the phases below: do the full scan → spec → parameterize → run flow.
+
+The second is **scan-only**, used when `/lc-from-paper`'s ACQUIRE spawns this skill as `code-expert`. The orchestrator's prompt will tell you explicitly to *do only Phase 1's scan*, write the inventory to a path it specifies (typically `work/reference/code-index.md`), and **stop** — do not touch `astra.yaml` at the project root, do not parameterize any code, do not run anything, do not modify the cloned repo. After scanning, stay alive: ARCHITECT and later phases will `SendMessage` you with questions about the code as they write the spec. Trust the spawn prompt's instructions over the defaults below; if the prompt says scan-only, the scan-only contract holds.
+
 ## References
 
 - [ASTRA Reference](../../guides/astra-reference.md) -- spec structure, decision identification, recipes, universes
diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 1862395c..57f19770 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -26,13 +26,15 @@ The reproduction's directory should be a git repo — if not already, `git init`
 
 ## The phases
 
-The reproduction runs through nine phases (zero-indexed). Phase 0 (INTERVIEW) and Phase 8 (REVIEW) are the bookends — they happen in your own session because they're short, interactive, and depend on the through-line context only you hold. Phases 1–7 are sub-agent dispatches: you spawn each as a named sub-agent, point it at the matching reference file in `references/`, and let it work in its own context with the per-paper `CLAUDE.md` auto-loading from the workdir.
+The reproduction runs through nine phases (zero-indexed). Phase 0 (INTERVIEW), Phase 1 (ACQUIRE), and Phase 8 (REVIEW) run in your own session — INTERVIEW and REVIEW because they're interactive bookends, ACQUIRE because its work is two parallel sub-skill invocations (`/paper-extraction` and `/lc-from-code` in scan-only mode) plus capturing the resulting persistent sub-agents as `paper-expert` and `code-expert`. Phases 2–7 are sub-agent dispatches: you spawn each as a named sub-agent, point it at the matching reference file in `references/`, and let it work in its own context with the per-paper `CLAUDE.md` auto-loading from the workdir.
+
+ARCHITECT (Phase 2) is the first sub-agent dispatch. It receives the `paper-expert` and `code-expert` agent IDs in its spawn prompt and consults them via `SendMessage` as it writes the stub `astra.yaml`. Later phases inherit the same pattern — the experts stay alive for the duration of the reproduction and are addressable by any sub-agent that's given their IDs.
 
 | # | Phase | Where it runs | Reference | Primary outputs |
 |---|---|---|---|---|
 | 0 | INTERVIEW | orchestrator session | [`references/interview.md`](references/interview.md) | per-paper `CLAUDE.md` |
-| 1 | ACQUIRE | sub-agent | [`references/acquire.md`](references/acquire.md) | `work/reference/{source/, paper.pdf, figures/, tables/, metadata.json, code/, code-status.yaml, index.json}` (index.json's `citations:` block carries each cited paper's `{locations, citation, doi}`) |
-| 2 | ARCHITECT | sub-agent | [`references/architect.md`](references/architect.md) | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative); `work/notes/architect/{paper-index.md, code-index.md}` |
+| 1 | ACQUIRE | orchestrator session | [`references/acquire.md`](references/acquire.md) | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-status.yaml, code-index.md}`; two persistent sub-agents — `paper-expert` and `code-expert` — reachable by agent ID via `SendMessage` |
+| 2 | ARCHITECT | sub-agent | [`references/architect.md`](references/architect.md) | stub `astra.yaml` at project root (sub-analyses, inputs, outputs, narrative); `work/notes/architect/review-round-<N>.md` |
 | 3 | SPECIFY | sub-agent | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
 | 4 | LITERATURE | sub-agent | [`references/literature.md`](references/literature.md) | `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
 | 5 | IMPLEMENT | sub-agent | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
@@ -51,6 +53,7 @@ When you launch a phase, spawn a named sub-agent in the background with the phas
 - **Run in background** so the user can switch into the sub-agent's chat without you blocking on it.
 - **Announce the spawn to the user** before it starts: *"I'm launching the &lt;phase&gt; sub-agent now — switch to its chat now if you want to interact, otherwise it'll work autonomously and report back."*
 - **Note the agent ID** when you spawn it. Names are user-facing — if the user dismisses a sub-agent's surface (escape), the name binding goes away and `SendMessage` by name fails. The agent ID + on-disk transcript persist regardless; `SendMessage` by ID resumes the sub-agent from full context and reopens the surface for the user.
+- **Hand in the expert agent IDs** from ACQUIRE — `paper-expert` and `code-expert` — so the phase sub-agent can `SendMessage` them for paper/code questions instead of re-ingesting materials. The experts have already read their materials in depth; querying them is cheaper and richer than another fresh Explore pass.
 
 When the sub-agent's turn closes you receive a notification with its full response in the `result` field. Read that, then decide: spawn the next phase, ask the user a clarifying question, or revisit a previous phase.
 
@@ -75,7 +78,7 @@ The opening interactive phase. Read [`references/interview.md`](references/inter
 
 These get drafted into the per-paper `CLAUDE.md` — paper identity, Goal section, Rules, Conventions. The Rigor section starts empty; sub-agents fill it in as they work. Show the user the draft, take corrections, refine, then save.
 
-After the user approves, launch the first sub-agent (typically ACQUIRE).
+After the user approves, run ACQUIRE in your own session (it spawns `paper-expert` and `code-expert` as parallel sub-agents and waits for both). When ACQUIRE returns, launch the architect sub-agent with the expert agent IDs handed in.
 
 ### Review (Phase 8, close-out)
 
@@ -116,10 +119,9 @@ Workdir signals — file existence implies the phase has been done:
 
 | Signal | Phase done |
 |---|---|
-| `work/reference/source/` (arxiv tarball) **or** `work/reference/document.md` (Docling fallback) | ACQUIRE |
-| `work/reference/code/` | ACQUIRE (code clone) |
-| `work/notes/architect/{paper-index.md,code-index.md}` | ARCHITECT (Explore pass) |
-| `astra.yaml` validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
+| `work/reference/source/` (arxiv tarball) **or** `work/reference/document.md` (Docling fallback) + `work/reference/index.json` + `work/reference/astra.yaml` | ACQUIRE paper substrate (paper-expert ran) |
+| `work/reference/code/` (or `code-status.yaml` with `found: false`) + `work/reference/code-index.md` | ACQUIRE code work (code-expert ran) |
+| `astra.yaml` at project root validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
 | `astra.yaml` non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` placeholders + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
 | `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; `work/notes/literature/<doi-slug>.yaml` files present | LITERATURE |
 | recipes present in `astra.yaml` | IMPLEMENT |
@@ -132,7 +134,7 @@ Workdir signals — file existence implies the phase has been done:
 ## Anti-patterns
 
 - **Reading content the orchestrator doesn't need.** If the answer fits in a sub-agent's return, don't re-read the source yourself. Dispatch Explore for open-ended search.
-- **Doing phase work in the orchestrator session.** The orchestrator spawns and routes; phase work happens in sub-agents. Exception: INTERVIEW and REVIEW (the bookends).
+- **Doing phase work in the orchestrator session.** The orchestrator spawns and routes; phase work happens in sub-agents. Exceptions: INTERVIEW and REVIEW (the interactive bookends), and ACQUIRE (which is two parallel sub-skill invocations + capturing their persistent transcripts — no separate `acquire` sub-agent needed because the work IS the spawns).
 - **Asking a sub-agent to use `AskUserQuestion`.** Sub-agents don't have it. They ask in prose, or surface the question to you so you call `AskUserQuestion` from the orchestrator session.
 - **Re-implementing what `astra` already does.** If `astra validate` returns clean, don't write a separate validator. If `astra paper add` caches the PDF, don't write a separate cache.
 - **Bundling phases into one sub-agent.** Each sub-agent runs one phase. The granularity is what keeps each context window manageable; conflating phases re-creates the failure mode this architecture exists to avoid.
diff --git a/claude/lightcone/skills/lc-from-paper/references/acquire.md b/claude/lightcone/skills/lc-from-paper/references/acquire.md
index 4083200e..1eac80e7 100644
--- a/claude/lightcone/skills/lc-from-paper/references/acquire.md
+++ b/claude/lightcone/skills/lc-from-paper/references/acquire.md
@@ -1,8 +1,10 @@
-# ACQUIRE — fetch the paper, structure it, clone the code
+# ACQUIRE — spawn paper-expert and code-expert
 
-Acquire the paper's reading materials and (when available) clone the reference code repository. The substrate work — LaTeX-source download, Docling fallback, figures, tables, outline, citations *with resolved DOIs per cited paper*, embedded bibliography, paper-as-ASTRA-artifact — is delegated to **`/paper-extraction`**, which lc-from-paper trusts blindly. ACQUIRE adds **Step 2: code-clone** on top.
+The orchestrator dispatches two named, persistent sub-agents in parallel: **paper-expert** (which runs `/paper-extraction` to stand up the paper's reading materials) and **code-expert** (which locates and clones the reference code repo, then runs `/lc-from-code` in scan-only mode against it). Their transcripts persist and become the experts ARCHITECT consults via `SendMessage` as it writes the `astra.yaml` stub.
 
-This phase runs as the orchestrator-spawned `acquire` sub-agent. The orchestrator launches it, the user can drop into its chat for any failures (download issues, missing code repo), and it commits each artifact as it lands.
+## Where this runs
+
+The orchestrator session, directly. There is no `acquire` sub-agent — ACQUIRE's work is two parallel spawns and a wait. The orchestrator captures both agent IDs on return; those IDs are how ARCHITECT reaches the experts.
 
 ## Inputs
 
@@ -11,70 +13,115 @@ This phase runs as the orchestrator-spawned `acquire` sub-agent. The orchestrato
 
 ## Outputs
 
-After Step 1 (`/paper-extraction`):
+Two persistent named sub-agents (paper-expert, code-expert), each reachable via `SendMessage` by ID. On disk:
 
-- `work/reference/index.json` — structural index. Includes the enriched `citations:` block mapping each cited paper's BibTeX key (Path A) or synthetic `<lastname>_<year>` key (Path B) to `{locations, citation, doi}`. SPECIFY consumes this when authoring `prior_insights:` placeholders (`doi:` lookup); LITERATURE consumes it when discovering which DOIs need fetching.
-- `work/reference/astra.yaml` — ASTRA-shape representation of the paper, including the paper's claimed numerical findings as ASTRA `findings:` (when paper-extraction's optional Step 5 is run)
-- `work/reference/paper.pdf` — always
-- `work/reference/paper.tex` + `work/reference/source/` — Path A (arXiv LaTeX)
-- `work/reference/document.md` — Path B (PDF + Docling)
-- `work/reference/figures/` — figure files
-- `work/reference/tables/` — one .tex file per `\begin{table}` block
-- `work/reference/bibliography-source.{bib,bbl}` — Path A only, copied from source tarball when present
+- `work/reference/index.json` — paper-side structural index (figures, tables, outline, citations with resolved DOIs)
+- `work/reference/astra.yaml` — paper-extraction's ASTRA-shape stub of the paper (id, name, narrative.summary, optionally findings)
+- `work/reference/paper.pdf` and either `work/reference/paper.tex` + `source/` (Path A) or `work/reference/document.md` (Path B)
+- `work/reference/figures/`, `work/reference/tables/`, `work/reference/bibliography-source.{bib,bbl}`
+- `work/reference/code/` — cloned reference repo (absent if not found)
+- `work/reference/code-status.yaml` — record of where the code came from
+- `work/reference/code-index.md` — code-expert's scan output: script inventory, candidate decisions, dependencies, container hints
 
-After Step 2 (this phase):
+## Step 1 — Spawn paper-expert
 
-- `work/reference/code/` — cloned reference repo (or absent if not found)
-- `work/reference/code-status.yaml` — record of where the code came from
+```
+Agent(
+  name="paper-expert",
+  prompt="/paper-extraction <doi-or-arxiv-id>",
+  run_in_background=True,
+)
+```
 
-## Step 1 — Stand up the paper's reading materials
+paper-expert runs the full `/paper-extraction` workflow and stays alive after it finishes; its transcript holds the deep paper context that ARCHITECT and later phases consult. The skill is idempotent, so re-invoking it on a partially populated `work/reference/` is safe.
 
-Invoke `/paper-extraction <arxiv-id-or-doi>`. The skill is idempotent — it surveys `work/reference/` first and skips work that's already done.
+Capture the returned agent ID.
+
+## Step 2 — Spawn code-expert (in parallel)
+
+code-expert is a single sub-agent that does *all* the code-side work for ACQUIRE: locate the repo URL, clone it, then run `/lc-from-code` in scan-only mode against the clone. The orchestrator spawns it with explicit instructions to stop at the scan — `/lc-from-code` normally continues into parameterization and execution; here we only want the inventory.
 
 ```
-/paper-extraction <arxiv-id-or-doi>
+Agent(
+  name="code-expert",
+  prompt="""
+    You are the code-expert for an lc-from-paper reproduction.
+
+    Repo URL (from INTERVIEW): <url or 'unknown — find it'>
+    Workdir: this directory.
+
+    Your tasks for ACQUIRE:
+
+    1. Locate the reference code repository.
+       - If a URL was provided above, use it.
+       - Otherwise, grep the paper materials in work/reference/ for repo URLs (abstract,
+         intro, conclusion, footnotes, "Code Availability" / "Data Availability" sections).
+         Path A: grep across work/reference/source/*.tex. Path B: grep work/reference/document.md.
+         If still nothing, web-search: paper title + "github", Papers With Code, or the first
+         author's GitHub profile. Stop after a few searches; if nothing turns up, record failure and move on.
+
+    2. Clone if found:
+         git clone --depth 1 <url> work/reference/code
+
+    3. Write work/reference/code-status.yaml:
+         found: true        # or false
+         url: "https://..."  # null if not found
+         cloned: true       # false if found but clone failed
+         notes: "..."
+
+    4. If work/reference/code/ exists, run /lc-from-code in SCAN-ONLY mode against it:
+       - Invoke /lc-from-code with the working directory at work/reference/code/.
+       - Do ONLY Phase 1's scan (the Explore-subagent inventory pass).
+       - Write the inventory to work/reference/code-index.md.
+       - DO NOT touch astra.yaml at the project root.
+       - DO NOT parameterize any code.
+       - DO NOT run anything.
+       - DO NOT modify the cloned repo.
+
+    5. Stay alive after returning. ARCHITECT will SendMessage you with questions
+       about the code as it writes the stub astra.yaml.
+
+    Report back: paths produced, anything surprising, any structural caveats
+    (no code found, broken clone, gnarly scan, etc.).
+  """,
+  run_in_background=True,
+)
 ```
 
-This produces everything under `work/reference/` *except* the code clone. lc-from-paper ACQUIRE does not re-implement the substrate logic; if something is wrong with the substrate — including a substrate need that surfaces mid-reproduction — fix it in `/paper-extraction`, not here.
-
-Two starting surfaces: `work/reference/index.json` (structural — figures, tables, outline, *citations with locations + cited-paper text + resolved DOIs*) and `work/reference/astra.yaml` (semantic — the paper as an ASTRA artifact, with `findings:` carrying the paper's central numerical claims as quote-anchored evidence). ARCHITECT reads index.json when its Explore sub-agents fan out across the paper; SPECIFY reads index.json's `citations:` block when authoring `prior_insights:` placeholders (citation key → DOI lookup) and reads astra.yaml when authoring `prior_insights:` against the paper's claims.
+Capture the returned agent ID.
 
-## Step 2 — Clone the reference code repository
+If paper-expert hasn't finished writing the paper materials when code-expert needs to grep them for a URL, code-expert can wait briefly or report that it needs the paper materials first. With a URL from INTERVIEW, code-expert is fully independent of paper-expert and the two run in parallel from the start.
 
-This step matters more than its size suggests. When `work/reference/code/` exists, every implementing sub-agent treats it as canonical for numerics + method (the canonical-resolution rule, recorded in CLAUDE.md's Rules). Without it, sub-agents have only the paper to anchor to and drift toward "looks right" rather than "matches."
+## Step 3 — Hand off to ARCHITECT
 
-1. Search the paper text for repository URLs — abstract, intro, conclusion, footnotes, "Code Availability" or "Data Availability" sections. (Path A: grep across `work/reference/source/*.tex`. Path B: grep `work/reference/document.md`.)
-2. If none found, web search: paper title + "github", Papers With Code, or the first author's GitHub profile.
-3. Clone if found:
-   ```bash
-   git clone --depth 1 <url> work/reference/code
-   ```
-4. Write `work/reference/code-status.yaml`:
-   ```yaml
-   found: true        # or false
-   url: "https://..."  # null if not found
-   cloned: true       # false if found but clone failed
-   notes: "..."
-   ```
+When both sub-agents have returned, spawn the architect with both indices in its reading list and both expert agent IDs reachable. The architect's reference is [`architect.md`](architect.md); the spawn pattern lives there.
 
-Spend no more than a few searches before recording failure and moving on. **Do NOT modify cloned code** — it's the reference, not the workdir.
+The handoff payload to architect's prompt:
 
-Skip Step 2 if `work/reference/code/` already exists.
+```
+- Paper-expert agent ID: <id>
+- Code-expert agent ID:  <id>
+- Read: work/reference/index.json, work/reference/astra.yaml, work/reference/code-index.md
+- Ask the experts (via SendMessage by ID) anything that isn't in the indices.
+```
 
 ## Survey signals (entry into ACQUIRE)
 
 Run `ls work/reference/` first.
 
-- If `paper.pdf` is present, **and** the path indicator (`source/` for Path A or `document.md` for Path B) is present, **and** `index.json` is present (with the enriched `citations:` block — `key -> {locations, citation, doi}`) → Step 1 is done.
-- If `work/reference/code/` is present (or `code-status.yaml` records `found: false`) → Step 2 is done.
-- When both are done, ACQUIRE is complete; the orchestrator proceeds to ARCHITECT.
-- Otherwise, run whichever step is missing. `/paper-extraction` handles its own idempotency for Step 1.
+- `paper.pdf` + path indicator (`source/` for Path A, `document.md` for Path B) + `index.json` present → paper-expert's work is done. Check whether the paper-expert agent is still addressable; if not, re-spawn it against the existing materials (`/paper-extraction` is idempotent and will skip done work).
+- `code-index.md` present, **and** either `work/reference/code/` present or `code-status.yaml` records `found: false` → code-expert's work is done.
+- When both indices are present and both expert agent IDs are recorded, ACQUIRE is complete; proceed to ARCHITECT.
+- Otherwise, re-spawn whichever expert is missing. Both skills are survey-first and skip already-done work.
 
 ## Notes
 
-- **paper-extraction is the substrate authority.** Don't re-fetch the LaTeX source, don't re-run Docling, don't re-parse the paper from inside ACQUIRE. If a substrate need surfaces that paper-extraction doesn't cover, file it as paper-extraction work — not as ACQUIRE work. Bibliography resolution is paper-extraction's: cited-paper text and DOIs live inside `index.json#citations[key]`, not in a side file.
+- **paper-extraction is the substrate authority.** Don't re-fetch the LaTeX source, don't re-run Docling, don't re-parse the paper from inside ACQUIRE. If a substrate need surfaces — including mid-reproduction — fix it in `/paper-extraction`, not here. Bibliography resolution is paper-extraction's: cited-paper text and DOIs live inside `index.json#citations[key]`, not in a side file.
+- **lc-from-code is the code-inventory authority** for the scan portion. ACQUIRE's code-expert prompt constrains it to scan-only; the parameterization and run portions of `/lc-from-code` are not invoked at this phase.
 - **arXiv DOI form is `10.48550/arXiv.<id>`.** Useful when downstream tools want a DOI rather than an arXiv ID.
-- **Equation numbers and section numbers must match the rendered paper.** When citing "eq. N" or "§N" in any downstream phase, find the equation or heading by content, not by a naïve count of TeX blocks or markdown headings. Path A: source preserves printed numbers in `\label{}`s. Path B: Docling preserves printed numbers in its markdown.
-- **This phase is acquisition + code-clone, not understanding.** Do not start indexing or comparing the paper here — that's ARCHITECT.
-- **Code-as-canonical** is loaded by every subsequent sub-agent. The per-paper `CLAUDE.md` restates the rule; ACQUIRE just makes sure `work/reference/code/` exists when possible.
-- **Commit each step as it lands.** ACQUIRE runs as a sub-agent; the orchestrator reads `git log` to see how far it got. One commit per artifact (paper materials, code clone) keeps the trail readable.
+- **Equation numbers and section numbers must match the rendered paper.** When citing "eq. N" or "§N" downstream, find by content, not by a naïve count of TeX blocks or markdown headings. Path A: source preserves printed numbers in `\label{}`s. Path B: Docling preserves printed numbers.
+- **This phase is acquisition + on-hand expertise, not understanding.** ACQUIRE doesn't write `astra.yaml` at the project root and doesn't compare paper to code. ARCHITECT does that work, with the experts on hand.
+- **Code-as-canonical** is loaded by every subsequent sub-agent. The per-paper `CLAUDE.md` carries the rule; ACQUIRE just stands up the reference so the rule has something to point at.
+- **The cloned code is read-only reference for the agents.** code-expert's scan reads it; ARCHITECT and later phases may have their experts re-read parts of it; nothing modifies `work/reference/code/`. (When the reproduction's implementation needs to happen later, that's an IMPLEMENT-phase decision, not an ACQUIRE one.)
+- **Commit each artifact as it lands.** The orchestrator can commit paper materials when paper-expert returns, and the code clone + scan when code-expert returns — small, descriptive commits that make `git log` legible.
+- **Surface failures the experts flag.** If code-expert reports that the clone failed or the repo is clearly dead, or paper-expert reports that the paper substrate is broken, surface it to the user immediately rather than handing a half-acquired workdir to ARCHITECT.
diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index ed66c855..7d241bde 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -1,274 +1,170 @@
 # ARCHITECT — write the stub `astra.yaml`
 
-ARCHITECT is the structural seam: decide the sub-analysis decomposition, wire the inputs and outputs at the sub-analysis level, and author high-level narrative prose for each analysis — all in one stub `astra.yaml`. SPECIFY then fills the stub in with `decisions:`, `prior_insights:`, `findings:`, and `astra-anchor:` references. Splitting **structure** from **content** keeps the cognitive load on each sub-agent manageable: ARCHITECT decides *what the analyses are*; SPECIFY decides *what's inside each one*.
+ARCHITECT is the structural seam: decide the sub-analysis decomposition, wire the inputs and outputs at the sub-analysis level, and author high-level narrative prose for each analysis — all in one stub `astra.yaml`. SPECIFY then fills the stub with `decisions:`, `prior_insights:`, `findings:`, and `astra-anchor:` references. Splitting **structure** from **content** keeps each phase's cognitive load manageable: ARCHITECT decides *what the analyses are*; SPECIFY decides *what's inside each one*.
 
-This phase runs as the orchestrator-spawned `architect` sub-agent. Internally it does its work in three steps: paper-side index (Explore), code-side index (Explore), synthesis (write the stub). The two Explore reads run in parallel; synthesis runs once both indexes exist. After the stub lands, a self-review pass cross-checks it against paper + code; how heavy that review is, the orchestrator picks per spawn from CLAUDE.md's Rigor section.
+This phase runs as the orchestrator-spawned `architect` sub-agent. The heavy work of *understanding* the paper and code already happened in ACQUIRE: paper-expert and code-expert are alive with deep context. ARCHITECT reads their indices, queries them via `SendMessage` for anything the indices don't cover, writes the stub, and self-reviews. No re-ingestion.
 
 ## Inputs
 
-- `work/reference/source/` (Path A — arXiv LaTeX) **or** `work/reference/document.md` + `work/reference/figures/` + `work/reference/tables/` + `work/reference/metadata.json` (Path B — Docling)
-- `work/reference/index.json` — structural index emitted by paper-extraction (consumed by the paper-side Explore; its `citations:` block already carries each cited paper's `{locations, citation, doi}`)
-- `work/reference/code/` — the reference code repo (when present)
-- CLAUDE.md — the per-paper artifact at the workdir root; its **Goal** section names the user's intended replication targets
-- `work/notes/notes.md` — user-supplied prior notes, if any (read by every sub-agent if present)
+- `work/reference/index.json` — paper-side structural index from `/paper-extraction` (figures, tables, section outline with line numbers, citations with resolved DOIs)
+- `work/reference/astra.yaml` — paper-extraction's ASTRA-shape stub of the paper itself: id, name, `narrative.summary` (from abstract), optionally `findings:` (paper's claimed numerical results)
+- `work/reference/code-index.md` — code-side inventory from code-expert's scan: script inventory, candidate decisions with file:line refs, module map, entry-points, external data dependencies, container hints
+- **paper-expert** (agent ID handed in by the orchestrator) — reachable via `SendMessage`. Ask anything the indices don't cover: "what does the paper say about the apodization choice", "which figures are primary vs secondary", "where does the paper define the fiducial cosmology", etc.
+- **code-expert** (agent ID handed in by the orchestrator) — reachable via `SendMessage`. Ask: "which module produces the BAO fit posteriors", "where is the magnitude cut applied", "is there a config file we should treat as the canonical baseline", etc.
+- CLAUDE.md — the per-paper artifact at the workdir root; its **Goal** section names the user's intended replication targets and fidelity intent.
+- `work/notes/notes.md` — user-supplied prior notes, if any.
 
 ## Outputs
 
 - `astra.yaml` — **stub form**: sub-analyses named, architecture wired (inputs / outputs declared at the sub-analysis level), high-level `narrative:` prose blocks per analysis. **No `decisions:`, `prior_insights:`, `findings:`, or `astra-anchor:` references yet** — those entries don't exist for the narrative to reference.
-- `work/notes/architect/paper-index.md` — paper-side Explore output: section list, sub-analysis boundary candidates, decision clusters, result loci (figures / tables / quoted numerics)
-- `work/notes/architect/code-index.md` — code-side Explore output: top-level module map, natural decomposition, entry-points, where the analysis stages live
-- `work/notes/architect/review-round-<N>.md` — each self-review round's findings (one file per round; how many rounds depends on the rigor setting the orchestrator chose for this spawn)
-
-## Step 1: Two parallel Explore reads
-
-From inside the architect sub-agent's session, spawn two Task-tool Explore sub-agents in parallel. Each is bounded — neither tries to compare paper to code, and neither writes `astra.yaml`. Their job is to give the synthesis step (next) enough indexed context to draft the stub.
-
-### Paper-side Explore — system prompt
-
-> You are a paper-indexing agent. Read the paper and produce an index that the architecture-synthesis agent will use to decide the `astra.yaml` sub-analysis decomposition. **Do NOT read code; do NOT write `astra.yaml`.**
->
-> ### Inputs
->
-> - Paper text: `work/reference/source/*.tex` (Path A) or `work/reference/document.md` (Path B). Read the methods, results, and analysis-bearing intro / discussion sections in full. Skip front-matter (abstract, acknowledgments, author list) and back-matter (references, supplementary).
-> - User-supplied notes: `work/notes/notes.md` if present.
->
-> ### What to extract
->
-> 1. **Section list** with anchors (`\label{}` for Path A; markdown heading for Path B).
-> 2. **Sub-analysis boundary candidates.** Where does the paper's pipeline have natural seams — places one stage's output flows as the next stage's input? Look for: a reconstruction stage producing a catalog consumed by a clustering stage; an MCMC producing a chain consumed by a parameter-estimation stage; a fit producing posteriors consumed by a comparison stage. Name each candidate with a noun phrase (`reconstruction`, `clustering`, `bao_fit`) and one-line description.
-> 3. **Decision clusters per sub-analysis.** Group the paper's choices by where they sit in the pipeline. Don't enumerate every choice — name the *clusters* (e.g. "fitting prior choices", "selection criteria for the catalog"). SPECIFY drills back into the paper to author each `decisions:` entry; you're indicating where to look.
-> 4. **Result loci.** Which figures / tables / in-text metrics report the paper's primary and secondary results? Use `path:line` for the `\includegraphics{}` or table source (Path A); use `metadata.json` indexes for Path B. Tag each as primary / secondary based on the paper's own emphasis.
-> 5. **Data-flow shape.** A short prose paragraph: "Inputs flow from <source datasets> through <stage 1> producing <intermediate>, into <stage 2> producing <intermediate>, into <stage 3> producing <primary result>." This becomes the seed for the root narrative's data-flow paragraph.
->
-> ### Output format — `work/notes/architect/paper-index.md`
->
-> ```markdown
-> # Paper index
->
-> ## Sections
-> - <NN. Section title> — anchor `<label>` in `<path>`. Phase: methods | results | discussion | other.
->
-> ## Sub-analysis candidates
-> - **<noun phrase id>** — <one-line role>; spans sections <list>; produces <output(s)>; consumes <input(s)>.
->
-> ## Decision clusters (per candidate sub-analysis)
-> ### <sub-analysis id>
-> - **<cluster name>** — <where in the paper>; <one-line shape of the choices>.
->
-> ## Result loci (primary + secondary)
-> - **<figure / table / metric>** — `<source-path:line>` or `metadata.json#<id>`; reported in §<X>; primary | secondary.
->
-> ## Data-flow shape
-> <one-paragraph prose: how inputs flow through the pipeline to the primary result>.
-> ```
->
-> ### Rules
->
-> - **Bounded read.** Do not read the code repo. Your job is paper-side only.
-> - **Index, do not author.** No `decisions:`, no `prior_insights:`, no `findings:`. Those are SPECIFY's. Your output is markdown (the index); you do not write any YAML.
-> - **Quote sparingly.** Brief paper quotes are OK to disambiguate a result locus or a sub-analysis boundary; verbatim claim quotes are SPECIFY's substrate, not yours.
-> - **Bibliography is already resolved.** `work/reference/index.json#citations` carries each cited paper's text + DOI from paper-extraction. You don't need to re-derive that mapping; SPECIFY and LITERATURE will read from it directly.
-
-### Code-side Explore — system prompt
-
-> You are a code-indexing agent. Read the code repo and produce an index that the architecture-synthesis agent will use to decide the `astra.yaml` sub-analysis decomposition. **Do NOT read the paper; do NOT write `astra.yaml`.**
->
-> ### Inputs
->
-> - Code repo at `work/reference/code/`. Read the README, the entry-points, and follow imports to map the analysis pipeline. **Do NOT modify any code.**
-> - User-supplied notes: `work/notes/notes.md` if present.
->
-> ### What to extract
->
-> 1. **Top-level module map.** What lives where: each top-level directory or module file with a one-line role.
-> 2. **Natural decomposition.** Where does the code's pipeline split into independent stages? Most analysis pipelines have stage seams visible from imports — a `reconstruction/` module fed by `data/`, a `bao_fit/` module fed by `reconstruction/`. Name each stage with the same noun-phrase shape the paper-side index uses (the synthesis agent will reconcile names).
-> 3. **Entry-points.** Top-level scripts the user runs to produce primary results: `scripts/run_reconstruction.py`, `nbs/figure_4.ipynb`, etc. For each: which stage / output it produces, with a `path:line` to the main function.
-> 4. **External data dependencies.** What datasets the code expects to find at runtime — environment variables, config files, paths to catalogs. SPECIFY uses these for `inputs:`; this is the place to surface them.
-> 5. **Code-specific gotchas surfaced from the README or top-level docs.** Things the paper doesn't say but the code's own docs flag (a calibration version, a runtime requirement, a data preprocessing step). One bullet each, with `path:line`.
->
-> ### Output format — `work/notes/architect/code-index.md`
->
-> ```markdown
-> # Code index
->
-> ## Module map
-> - `<path>` — <one-line role>.
->
-> ## Natural decomposition
-> - **<noun phrase id>** — <one-line role>; entry-point `<path:line>`; consumes <input modules / data>; produces <output artifact paths or in-memory shapes>.
->
-> ## Entry-points (top-level runnable scripts)
-> - **<script path>** — produces <output id>; main: `<path:line>`.
->
-> ## External data dependencies
-> - **<dataset / env var / config path>** — read at `<path:line>`; <one-line on what's expected>.
->
-> ## Code-specific gotchas
-> - **<gotcha>** — surfaced at `<path:line>`; <one-line on why it matters>.
-> ```
->
-> ### Rules
->
-> - **Bounded read.** Do not read the paper. Your job is code-side only.
-> - **Index, do not author.** No `decisions:`, no `prior_insights:`, no `findings:`, no recipes. Your output is markdown, not YAML.
-> - **Trust the imports.** Module dependencies tell the natural decomposition story more reliably than the README's prose summary.
-
-## Step 2: Synthesis — write the stub `astra.yaml`
-
-Once both index files exist, the architect sub-agent does the synthesis directly (no further fan-out). This is where the structural decisions actually get made: reconcile paper-side vs code-side sub-analysis decompositions, pick the unified set of sub-analysis IDs, wire inputs and outputs at the sub-analysis level, and author the high-level `narrative:` prose blocks. If the architect sub-agent prefers to delegate the synthesis to a fresh Task-tool sub-agent for a clean context window, that is fine — the prompt below covers either case.
-
-> You are an ASTRA architecture-synthesis agent. You read paper-side and code-side indexes and produce the stub `astra.yaml` that SPECIFY will fill in.
->
-> ### Inputs
->
-> - `work/notes/architect/paper-index.md` — paper-side Explore output
-> - `work/notes/architect/code-index.md` — code-side Explore output (when present)
-> - `work/notes/notes.md` — user-supplied notes (if present)
-> - CLAUDE.md at the workdir root — its **Goal** section names the user's intended replication targets
->
-> ### What to do
->
-> 1. **Reconcile sub-analysis decompositions.** Read both index files' sub-analysis candidates. Where paper and code agree on a stage, use that name (noun-phrase, e.g. `reconstruction`). Where they disagree, the code's structure is canonical for stage boundaries — the paper compresses; the code reveals the actual decomposition. Where the code is absent, follow the paper alone.
-> 2. **Choose: one analysis or sub-analyses?** If the paper has only one stage end-to-end (no clean intermediate handoffs), write a single analysis. If the paper has genuinely independent stages (each one's output flows as the next one's input), write sub-analyses. Sub-analysis IDs must be noun phrases (not verb phrases): `reconstruction`, `clustering`, `bao_fit`. Avoid reserved names (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
-> 3. **Wire inputs and outputs at the sub-analysis level.** For each sub-analysis:
->    - Declare `inputs:` from the data-dependency list in the code-side index plus any paper-named external datasets. The depth (acquisition path, selection criteria) is SPECIFY's; ARCHITECT names the input and gives it a stable id.
->    - Declare `outputs:` matching the result loci from the paper-side index plus any intermediate artifacts a downstream sub-analysis consumes. Tag each output's `priority:` from the paper's emphasis (primary / secondary). The reproduction's targeted scope from CLAUDE.md's **Goal** takes precedence — if the user only wants Figure 3 and Table 2, only those land as `outputs:` (the rest are out-of-scope and noted as such).
-> 4. **Author the root and per-analysis narrative.** Use `/narrative` for prose authoring (it carries the discipline on reserved names, voice, the data-flow paragraph requirement). High-level prose only — *no `astra-anchor:` references yet, because the entries those would point at don't exist*. SPECIFY will weave in anchors as it authors `decisions:` / `prior_insights:` / `findings:` per sub-analysis. The root `narrative:` MUST include a top-down end-to-end data-flow paragraph (per the narrative skill's data-flow rules — closes lightcone-cli#108) when sub-analyses exist.
-> 5. **Validate** with `astra validate astra.yaml`. The stub MUST validate as written — even with empty `decisions:` / `prior_insights:` / `findings:` blocks, the structural fields and the narrative prose must pass schema checks.
->
-> ### Stub shape — what `astra.yaml` looks like after ARCHITECT
->
-> ```yaml
-> # Stub: structure + narrative; SPECIFY fills decisions, findings, prior_insights, evidence, anchors.
-> id: <paper-slug>
-> title: "<paper title>"
-> doi: <doi>
->
-> narrative:
->   summary: |
->     <high-level paragraph for the root analysis>
->   methods: |
->     <data-flow paragraph; required when sub-analyses exist>
->
-> analyses:
->   <sub-analysis-id-1>:
->     narrative:
->       summary: |
->         <prose for this sub-analysis>
->     inputs:
->       <input-id>:
->         <stable name; depth lives in SPECIFY>
->     outputs:
->       <output-id>:
->         type: figure | table | metric | data-product
->         priority: primary | secondary
->         description: |
->           <one-line on what this output is>
->     decisions: {}      # SPECIFY fills
->     prior_insights: {} # SPECIFY records placeholders (citation only), LITERATURE resolves evidence
->     findings: {}       # SPECIFY fills
->
->   <sub-analysis-id-2>:
->     ...
-> ```
->
-> ### Rules
->
-> - **Stub, not snapshot.** Don't try to author content for `decisions:`, `prior_insights:`, `findings:`. Those go in SPECIFY. Your job is the structural skeleton.
-> - **Reserved names.** Sub-analysis IDs are noun phrases; avoid the reserved set listed above. Each ID must be unique across the spec.
-> - **Code-as-canonical for structure.** Where paper and code disagree on the decomposition, the code's structure is canonical (the paper compresses for narrative; the code reveals real seams).
-> - **Targeted scope wins.** CLAUDE.md's **Goal** scopes the reproduction. If the user only wants Figures 3 and 4 plus Table 2, only those land as `outputs:` in the stub.
-> - **Narrative prose, no anchors.** Author `narrative:` prose at the root and per-sub-analysis level. Do NOT add `astra-anchor:` references — the entries those would point at don't exist yet.
-> - **Validate before exit.** `astra validate astra.yaml` must return clean.
-
-## Step 3: Self-review (rigor chosen per spawn)
+- `work/notes/architect/review-round-<N>.md` — each self-review round's findings (one file per round; how many rounds depends on the rigor setting the orchestrator chose for this spawn).
+
+The architect sub-agent's transcript persists alongside paper-expert and code-expert — later phases can `SendMessage` it with "you wrote this stub; why this decomposition?" if a downstream question needs the writing-time reasoning.
+
+## Step 1 — Write the stub `astra.yaml`
+
+Read the three indices first. Then query the experts as you write — paper-expert for paper-specific facts, code-expert for code-specific facts. Don't try to absorb the paper or code yourself; the experts already have that context built up.
+
+### What to do
+
+1. **Reconcile sub-analysis decompositions.** Read `code-index.md`'s natural-decomposition section and `index.json`'s section outline. Where paper and code agree on a stage, use that name (noun-phrase, e.g. `reconstruction`). Where they disagree, **code's structure is canonical for stage boundaries** — the paper compresses; the code reveals the actual decomposition. Where code is absent or thin, follow the paper alone. Ask code-expert to clarify any module-boundary ambiguity; ask paper-expert how the paper itself frames stage boundaries.
+2. **Choose: one analysis or sub-analyses?** If the paper has only one stage end-to-end (no clean intermediate handoffs), write a single analysis. If it has genuinely separable stages (each stage's output feeds the next stage's input), write sub-analyses. Sub-analysis IDs must be noun phrases: `reconstruction`, `clustering`, `bao_fit`. Avoid reserved names: `inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`.
+3. **Wire inputs and outputs at the sub-analysis level.** For each sub-analysis:
+   - Declare `inputs:` from `code-index.md`'s External-data-dependencies plus any paper-named external datasets. The depth (acquisition path, selection criteria) is SPECIFY's; ARCHITECT names the input and gives it a stable id.
+   - Declare `outputs:` matching the result loci from `index.json` (figures + tables) plus any intermediate artifacts a downstream sub-analysis consumes. Tag each output's `priority:` from the paper's emphasis (primary / secondary). **The reproduction's targeted scope from CLAUDE.md's Goal takes precedence** — if the user only wants Figure 3 and Table 2, only those land as `outputs:`; the rest are out-of-scope and noted as such.
+   - Ask paper-expert which results the paper itself emphasizes if priority is unclear.
+4. **Author the root and per-analysis narrative.** Invoke `/narrative` for prose authoring (it carries the discipline on reserved names, voice, the data-flow paragraph requirement). High-level prose only — **no `astra-anchor:` references yet**, because the entries those would point at don't exist. SPECIFY will weave in anchors as it authors `decisions:` / `prior_insights:` / `findings:` per sub-analysis. The root `narrative:` MUST include a top-down end-to-end data-flow paragraph (per the narrative skill's data-flow rules) when sub-analyses exist.
+5. **Validate.** `astra validate astra.yaml` must return clean — even with empty `decisions:` / `prior_insights:` / `findings:` blocks, the structural fields and narrative prose must pass schema checks.
+
+### Stub shape — what `astra.yaml` looks like after ARCHITECT
+
+```yaml
+# Stub: structure + narrative; SPECIFY fills decisions, findings, prior_insights, evidence, anchors.
+id: <paper-slug>
+title: "<paper title>"
+doi: <doi>
+
+narrative:
+  summary: |
+    <high-level paragraph for the root analysis>
+  methods: |
+    <data-flow paragraph; required when sub-analyses exist>
+
+analyses:
+  <sub-analysis-id-1>:
+    narrative:
+      summary: |
+        <prose for this sub-analysis>
+    inputs:
+      <input-id>:
+        <stable name; depth lives in SPECIFY>
+    outputs:
+      <output-id>:
+        type: figure | table | metric | data-product
+        priority: primary | secondary
+        description: |
+          <one-line on what this output is>
+    decisions: {}      # SPECIFY fills
+    prior_insights: {} # SPECIFY records placeholders (citation only), LITERATURE resolves evidence
+    findings: {}       # SPECIFY fills
+
+  <sub-analysis-id-2>:
+    ...
+```
+
+### Rules for Step 1
+
+- **Stub, not snapshot.** Don't try to author content for `decisions:`, `prior_insights:`, `findings:`. Those go in SPECIFY. Your job is the structural skeleton.
+- **Reserved names.** Sub-analysis IDs are noun phrases; avoid the reserved set listed in item 2 of "What to do". Each ID must be unique across the spec.
+- **Code-as-canonical for structure.** Where paper and code disagree on the decomposition, the code's structure is canonical (the paper compresses for narrative; the code reveals real seams).
+- **Targeted scope wins.** CLAUDE.md's **Goal** scopes the reproduction. If the user only wants Figures 3–4 plus Table 2, only those land as `outputs:`.
+- **Narrative prose, no anchors.** Author `narrative:` prose at root and per-sub-analysis levels. Do NOT add `astra-anchor:` references — the entries those would point at don't exist yet.
+- **Validate before exit.** `astra validate astra.yaml` must return clean.
+- **Don't re-ingest.** The experts have already read the paper and code in depth. Query them; don't try to absorb the materials yourself. Your context window is for synthesis, not absorption.
+
+## Step 2 — Self-review (rigor chosen per spawn)
 
 After the stub lands, a fresh-context sub-agent cross-checks it against paper + code: are the sub-analyses the right decomposition? Are the inputs and outputs declared at the sub-analysis level wired correctly? Does the narrative prose accurately describe what each sub-analysis does?
 
 The depth of self-review is set by the rigor level the orchestrator picked when it spawned this `architect` sub-agent — read CLAUDE.md's **Rigor** section for the current state and what the orchestrator flagged as the chosen rigor for this spawn:
 
-- **Cheap:** skip review entirely, or run a single fresh-context Task-tool sub-agent pass and incorporate its fixes once.
-- **Heavy:** N rounds — each round spawns a fresh Task-tool reviewer against `astra.yaml` + paper + code; the architect sub-agent incorporates fixes (regenerate the stub or edit it directly for trivial cases); the next round spawns another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes, or a 5-round system cap.
-
-The discipline: each round spawns a brand-new Task-tool sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; the architect sub-agent edits the stub between rounds (or for trivial mechanical fixes, the orchestrator can do the edit directly).
-
-After the self-review terminates, the architect sub-agent updates CLAUDE.md's **Rigor** section with the post-spawn state of `astra.yaml` (e.g. *stub: baseline* after a cheap pass, *stub: tightened* after heavy review). That keeps the picture honest across sub-agents.
-
-### Per-round fresh sub-agent — system prompt
-
-> You are an ARCHITECT-stub reviewer. Read `astra.yaml` (the stub), the paper, and the code (when present), and report any structural inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
->
-> ### Inputs
->
-> - `astra.yaml` — the stub under review (sub-analyses, inputs, outputs, narrative; `decisions:` / `prior_insights:` / `findings:` are intentionally empty at this stage, do NOT flag those as missing)
-> - `work/notes/architect/paper-index.md` — paper-side Explore output
-> - `work/notes/architect/code-index.md` — code-side Explore output (when present)
-> - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
-> - `work/reference/code/` (when present) — canonical reference for stage boundaries + entry-points
-> - CLAUDE.md — for the **Goal** section's scope fence
->
-> ### What to check
->
-> 1. **Sub-analysis decomposition.** Are the sub-analyses the right cuts? Where the code structure shows a clean stage boundary, is the stub's split consistent with it? Where the paper compresses across stages, is the stub's decomposition still defensible against the code? Where there is no code, does the stub's decomposition match the paper's natural seams?
-> 2. **Sub-analysis IDs.** Noun phrases, not verb phrases. No reserved-name collisions (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
-> 3. **Inputs at sub-analysis level.** Each declared input has a stable id; the data dependency is real (cross-check against `work/notes/architect/code-index.md`'s External-data-dependencies list and the paper's data section). No phantom inputs invented to round out the structure.
-> 4. **Outputs at sub-analysis level.** Each declared output corresponds to a result locus from the paper-side index OR an intermediate artifact a downstream sub-analysis consumes. The targeted scope from CLAUDE.md's **Goal** is honored — no out-of-scope outputs sneaking in, no in-scope targets missed.
-> 5. **Narrative coverage.** The root narrative includes a data-flow paragraph (when sub-analyses exist). Each sub-analysis's `narrative:` accurately describes its role. No `astra-anchor:` references at this stage (those land in SPECIFY); flag any that snuck in.
-> 6. **Validates.** `astra validate astra.yaml` returns clean.
->
-> ### What NOT to do
->
-> - **Do not flag empty `decisions:` / `prior_insights:` / `findings:`.** That's SPECIFY's territory. Your job is structural correctness of the stub.
-> - **Do not edit any file.** Your output is a findings file; an ARCHITECT-fix pass responds to the findings.
-> - **Do not re-read the entire paper.** Use Grep + the index files.
-> - **Do not assume a prior reviewer has been here.** You are fresh. First-principles read only.
->
-> ### Output format — `work/notes/architect/review-round-<N>.md`
->
-> ```markdown
-> # Architect-review round <N>
->
-> Reviewer ran fresh against astra.yaml (stub), paper, and code.
->
-> ## Findings
->
-> ### <category — e.g. "Sub-analysis decomposition" / "Outputs" / "Narrative">
->
-> - **<one-line finding>**
->   - **What's wrong**: <quote or location of the structural problem>
->   - **Where to fix**: <`astra.yaml#path/to/key` or `work/notes/architect/paper-index.md` row>
->   - **Suggested fix**: <one-line concrete change>
->   - **Source**: <paper §X.Y "quote" + index row, or code `path:line`>
->
-> ## Verdict
->
-> - **fixes_needed**: <count>
-> - **clean** | **needs-fixes**
-> ```
+- **Cheap:** skip review entirely, or run a single fresh-context reviewer pass and incorporate its fixes once.
+- **Heavy:** N rounds. Each round spawns a fresh reviewer against `astra.yaml` + the ACQUIRE indices + the experts; the architect sub-agent incorporates fixes; the next round spawns another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes or the 5-round system cap is reached.
+
+Each round spawns a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; the architect sub-agent edits the stub between rounds (or for trivial mechanical fixes, the orchestrator can do the edit directly).
+
+After self-review terminates, the architect sub-agent updates CLAUDE.md's **Rigor** section with the post-spawn state of `astra.yaml` (e.g. *stub: baseline* after a cheap pass, *stub: tightened* after heavy review).
+
+### Per-round fresh sub-agent — prompt shape
+
+```
+You are an ARCHITECT-stub reviewer. Read astra.yaml (the stub) and report
+structural inconsistencies. You are one of several independent reviewers;
+do not assume anything has already been fixed.
+
+Inputs:
+  - astra.yaml — the stub under review. decisions: / prior_insights: /
+    findings: are intentionally empty; do NOT flag those as missing.
+  - work/reference/index.json — paper structural index
+  - work/reference/astra.yaml — paper-extraction's paper-as-ASTRA stub
+  - work/reference/code-index.md — code inventory
+  - paper-expert agent ID: <id> — SendMessage for paper-side questions
+  - code-expert agent ID:  <id> — SendMessage for code-side questions
+  - CLAUDE.md — for the Goal section's scope fence
+
+What to check:
+  1. Sub-analysis decomposition. Right cuts? Consistent with code-index?
+     Defensible against the paper where the paper compresses?
+  2. Sub-analysis IDs. Noun phrases. No reserved-name collisions
+     (inputs, outputs, decisions, findings, prior_insights, analyses,
+      options, content, narrative).
+  3. Inputs at sub-analysis level. Each input has a stable id; the data
+     dependency is real (cross-check against code-index.md's
+     External-data-dependencies and the paper's data section).
+  4. Outputs at sub-analysis level. Each output corresponds to a result
+     locus from index.json OR an intermediate artifact a downstream
+     sub-analysis consumes. Targeted scope from CLAUDE.md's Goal is
+     honored — no out-of-scope outputs sneaking in, no in-scope targets
+     missed.
+  5. Narrative coverage. Root narrative includes a data-flow paragraph
+     (when sub-analyses exist). Each sub-analysis's narrative accurately
+     describes its role. No astra-anchor: references at this stage; flag
+     any that snuck in.
+  6. Validates. astra validate astra.yaml returns clean.
+
+What NOT to do:
+  - Do not flag empty decisions: / prior_insights: / findings:. That's
+    SPECIFY's territory.
+  - Do not edit any file. Output findings only.
+  - Do not re-read the entire paper or code. Use the indices and ask the
+    experts.
+  - Do not assume a prior reviewer has been here. You are fresh.
+
+Output: work/notes/architect/review-round-<N>.md (findings + verdict).
+```
 
 ### Termination
 
 - **Cheap:** one pass. Done after fixes (or immediately, if `fixes_needed` was 0).
 - **Heavy:**
-  - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
-  - If round N is the first round (N=1), spawn round 2 unconditionally so we can compare.
-  - If round N produced fixes, spawn round (N+1) as a fresh sub-agent that does not see round N's findings or the fixes.
-  - If N hits the 5-round system cap without two consecutive clean rounds, the architect sub-agent stops and reports back to the orchestrator. If the user is reachable in the architect sub-agent's chat or the orchestrator session, ask in prose: "ARCHITECT review reached round cap with N fixes still landing; continue, accept the current stub, or revise scope?" If the user is unreachable, accept the current stub, log the unfinished tail in `open-questions.md` at the workdir root, and let the orchestrator decide whether to proceed to SPECIFY or re-spawn ARCHITECT later.
+  - Round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
+  - First round (N=1): spawn round 2 unconditionally so we can compare.
+  - Round N produced fixes: spawn round (N+1) as a fresh sub-agent that does not see round N's findings or fixes.
+  - 5-round cap reached without two consecutive clean rounds: stop and report back to the orchestrator. If the user is reachable, ask in prose: "ARCHITECT review reached the round cap with <count> fixes still landing; continue, accept the current stub, or revise scope?" If the user is unreachable, accept the current stub, log the unfinished tail in `open-questions.md`, and let the orchestrator decide whether to proceed to SPECIFY or re-spawn ARCHITECT later.
 
 ## Survey signals (entry into ARCHITECT)
 
-- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) exists ⇒ ready to architect
-- `work/notes/architect/paper-index.md` and `work/notes/architect/code-index.md` (if code present) exist ⇒ Explore pass done
-- `astra.yaml` exists; `astra validate astra.yaml` returns clean; sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks are present-and-empty ⇒ stub written
+- `work/reference/index.json` + `work/reference/astra.yaml` + `work/reference/code-index.md` (when code present) exist ⇒ ACQUIRE indices are ready
+- paper-expert and code-expert agent IDs received from the orchestrator ⇒ experts are reachable
+- `astra.yaml` exists at project root; `astra validate astra.yaml` returns clean; sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty ⇒ stub written
 - For cheap: `work/notes/architect/review-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ ARCHITECT done
-- For heavy: two consecutive `work/notes/architect/review-round-<N>.md` files both have verdict `clean` ⇒ ARCHITECT done; orchestrator proceeds to SPECIFY
+- For heavy: two consecutive `work/notes/architect/review-round-<N>.md` files both with verdict `clean` ⇒ ARCHITECT done; orchestrator proceeds to SPECIFY
 
 ## Notes
 
-- **Run the Explore reads in parallel.** They're fully independent (one reads paper-only, one reads code-only). Synthesis runs once, after both index files exist.
-- **The Explore reads do not write `astra.yaml`.** They write index markdown. Only the synthesis step writes the stub. This separation keeps each Explore read's context bounded — it doesn't have to think about ASTRA's schema, only the read.
-- **The stub's empty blocks are intentional.** `decisions: {}`, `prior_insights: {}`, `findings: {}` make it clear at a glance that ARCHITECT's job is structural, and that SPECIFY is what fills them. Don't try to half-author content — empty is honest.
+- **Experts replace re-ingestion.** ACQUIRE's paper-expert and code-expert are alive with deep context. ARCHITECT does not spawn its own Explore sub-agents; it queries the experts. This keeps the architect sub-agent's context lean.
+- **The stub's empty blocks are intentional.** `decisions: {}`, `prior_insights: {}`, `findings: {}` make it clear at a glance that ARCHITECT's job is structural and SPECIFY fills them. Don't try to half-author content — empty is honest.
 - **Code-as-canonical for structure, paper-as-canonical for narrative voice.** The code reveals where the real stage boundaries are; the paper provides the words to describe them. The stub uses both.
-- **Resume is automatic.** If `astra.yaml` already validates and has the structural fields populated, on re-spawn the architect sub-agent skips Step 1 and Step 2 and runs Step 3 (review) only.
+- **Resume is automatic.** If `astra.yaml` already validates and has the structural fields populated, on re-spawn the architect sub-agent skips Step 1 and runs Step 2 (review) only.
 - **The narrative skill is the prose author, not the structure author.** Invoke `/narrative` for the prose blocks; ARCHITECT's job is the structural skeleton plus invoking `/narrative` to fill the `narrative:` keys cleanly.
-- **Commit each artifact as it lands.** The orchestrator reads `git log` to see how far the architect sub-agent got. Indexes commit before the stub; the stub commits before any review-round files; review-round files commit one per round. Small, descriptive commits keep the trail readable.
+- **Commit each artifact as it lands.** The orchestrator reads `git log` to see how far the architect sub-agent got. The stub commits before any review-round files; review-round files commit one per round. Small, descriptive commits keep the trail readable.
diff --git a/claude/lightcone/skills/lc-from-paper/references/compare.md b/claude/lightcone/skills/lc-from-paper/references/compare.md
index a90741f3..dca3cc8d 100644
--- a/claude/lightcone/skills/lc-from-paper/references/compare.md
+++ b/claude/lightcone/skills/lc-from-paper/references/compare.md
@@ -10,6 +10,8 @@ This phase runs as the orchestrator-spawned `compare` sub-agent. The orchestrato
 - `astra.yaml` — output definitions (each target maps to an output)
 - `targets/` — reference figures / tables for comparison
 - `results/<universe>/<output_id>/` — reproduced results
+- **paper-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Useful for "what does the paper actually claim for this number" or "how does the paper describe what Figure 3 should show" when grading the comparison.
+- **code-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Useful for diagnosing divergence: "what does the reference code compute here that ours might miss".
 
 ## Outputs
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
index 5bab21d7..15d5f40a 100644
--- a/claude/lightcone/skills/lc-from-paper/references/implement.md
+++ b/claude/lightcone/skills/lc-from-paper/references/implement.md
@@ -8,9 +8,11 @@ This phase runs as the orchestrator-spawned `implement` sub-agent. Most implemen
 
 - `astra.yaml` — the filled spec (sub-analyses, decisions, prior_insights, findings, narrative — all populated by SPECIFY)
 - `implementation-notes.md` — tricky algorithms, numerical gotchas, data-format quirks
-- `work/notes/architect/paper-index.md` — for context when the spec compresses (sub-analysis decomposition, result loci, decision clusters)
-- `work/notes/architect/code-index.md` (when code present) — natural decomposition + entry-points + data dependencies + gotchas (the canonical map of where each sub-analysis's logic lives in `work/reference/code/`)
+- `work/reference/index.json` — paper-side structural index (figures, tables, outline, citations); useful when the spec compresses or you need to find where in the paper a behavior is described.
+- `work/reference/code-index.md` (when code present) — code inventory: module map, candidate decisions with file:line, entry-points, data dependencies, gotchas (the canonical map of where each sub-analysis's logic lives in `work/reference/code/`).
 - `work/reference/code/` (if present) — **canonical reference. Read it when implementing each output.** Where paper and code disagree, code wins for numerics, plotting, and method.
+- **code-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. The first stop for "where does X live in the code", "what's the canonical entry-point for Y", "what's the default parameter the code uses for Z". Cheaper than re-reading the code yourself.
+- **paper-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Useful when implementing an output and the spec doesn't fully capture what the paper says it should produce (e.g. "what's the expected axis range for Figure 4").
 - CLAUDE.md — **Rigor** for this spawn's chosen rigor level; **Paper-vs-code disagreements** for prior conflicts already logged.
 
 ## Outputs
@@ -25,7 +27,9 @@ This phase runs as the orchestrator-spawned `implement` sub-agent. Most implemen
 
 Read `astra.yaml` and `implementation-notes.md`. For each output, write a script in `scripts/` that produces it, and add a `recipe:` block to the output's entry in `astra.yaml` with `command:` and `inputs:`.
 
-If `work/reference/code/` exists, **read the relevant code when implementing each output** — not just to resolve ambiguities but as the canonical source of truth for numerics + method. Write clean scripts following ASTRA conventions (not verbatim copies), but treat the code's behavior as authoritative when it disagrees with the paper. When you encounter a paper-vs-code disagreement that SPECIFY's code pass missed:
+### With a code reference (`work/reference/code/` exists)
+
+**Read the relevant code when implementing each output** — not just to resolve ambiguities but as the canonical source of truth for numerics + method. Write clean scripts following ASTRA conventions (not verbatim copies), but treat the code's behavior as authoritative when it disagrees with the paper. When you encounter a paper-vs-code disagreement that SPECIFY's code pass missed:
 
 - **User reachable** (in the implement sub-agent's chat): ask in prose — paper method + code method + plausible impact + which one to take.
 - **User unreachable**: continue with the code's behavior, append the disagreement to CLAUDE.md's **Paper-vs-code disagreements** AND `open-questions.md`, and note it in `implementation-notes.md` so REVIEW close-out can ratify or override.
@@ -34,13 +38,19 @@ Without this discipline, the implementation drifts to "looks right" rather than
 
 When the reference code is substantial enough that implementation is really a migration of an existing codebase, follow `/lc-from-code`'s migration workflow in **augment existing ASTRA** mode. Use its code scan, minimal parameter-plumbing, dependency/container, and baseline-preservation strategies, but apply them to this reproduction's existing `astra.yaml`. Do not create a second ASTRA project or duplicate the spec; add recipes, code-backed options, implementation notes, and missing structure to the current reproduction artifact.
 
+### Without a code reference (`work/reference/code/` is absent)
+
+When `code-status.yaml` records `found: false` or the cloned repo turned out to be unusable, there is no canonical code substrate to anchor against. **Write the implementation fresh from the spec** — `astra.yaml`'s decisions, findings, and prior_insights are now the only source of method-level truth, and the paper's prose (consulted via paper-expert) is the source of numerics-level truth. Don't pretend a code reference exists; don't try to find a similar paper's code as a stand-in. Implement what the spec describes, ask paper-expert when the spec compresses something you need clarified, and rely on COMPARE to surface anywhere the implementation has drifted from the paper's claims.
+
+The code-as-canonical rule does not apply here: there is no code to be canonical. The paper is the only anchor. This is the harder path; reproductions on it converge more slowly and leave more open questions for REVIEW close-out. Surface that honestly to the user as you go; don't dress up paper-only implementations as if they had a code anchor.
+
 ### Parallelize where feasible
 
 When outputs are produced by independent scripts (no shared expensive computation), the implement sub-agent spawns one Task-tool sub-sub-agent per output. Each sub-sub-agent gets:
 
 - The output's spec entry from `astra.yaml` (including its sub-analysis's `decisions:` / `findings:` for context)
 - The relevant section of `implementation-notes.md`
-- The matching entry in `work/notes/architect/code-index.md`'s natural-decomposition / entry-points block — that's the pointer back to the canonical code location for the sub-analysis the output lives in
+- The matching entry in `work/reference/code-index.md`'s natural-decomposition / entry-points block — that's the pointer back to the canonical code location for the sub-analysis the output lives in
 - The relevant code path(s) under `work/reference/code/`
 
 The implement sub-agent merges scripts and recipes after the per-output sub-sub-agents finish. Tightly-coupled outputs (e.g. an MCMC producing both a chain and a summary statistic) stay in one sub-sub-agent and one script.
@@ -72,8 +82,8 @@ The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: each ro
 > - `scripts/` — first-pass implementation
 > - `astra.yaml` — the spec (recipes are part of the implementation; structural + content fields are ARCHITECT's and SPECIFY's)
 > - `implementation-notes.md`
-> - `work/notes/architect/paper-index.md` — Grep into; do not re-read whole
-> - `work/notes/architect/code-index.md` (when present) — natural decomposition + entry-points + gotchas
+> - `work/reference/index.json` — Grep into; do not re-read whole
+> - `work/reference/code-index.md` (when present) — natural decomposition + entry-points + gotchas
 > - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep)
 > - `work/reference/code/` (when present) — canonical reference for numerics + method
 >
@@ -89,7 +99,7 @@ The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: each ro
 > ### What NOT to do
 >
 > - **Do not edit any file.** Your output is a findings file; an IMPLEMENT-fix pass responds to the findings.
-> - **Do not re-read the entire paper.** Grep into `work/notes/architect/` and `work/reference/source/` (or `document.md`) for the specific claims you want to verify; the filled `astra.yaml` is your primary source for what each sub-analysis is supposed to do.
+> - **Do not re-read the entire paper.** Ask paper-expert / code-expert via `SendMessage` for claim verification, or Grep into `work/reference/index.json`, `work/reference/code-index.md`, and `work/reference/source/` (or `document.md`) for specific items. The filled `astra.yaml` is your primary source for what each sub-analysis is supposed to do.
 > - **Do not invent problems.** If the implementation matches paper + code, say so briefly.
 > - **Do not assume a prior reviewer has been here.** You are fresh. First-principles read only.
 >
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index 4b98b266..1622ed51 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -1,132 +1,164 @@
 # LITERATURE — resolve `prior_insights:` placeholders against the cited papers
 
-After SPECIFY's paper pass records each citation marker as a `prior_insights:` *placeholder* (id, claim, doi, decision_links — no `evidence:` selector), LITERATURE fetches each cited paper, finds the verbatim quote that justifies the placeholder's claim, and authors the resolved `evidence:` selector back into `astra.yaml`'s `prior_insights[<id>].evidence[]`. After LITERATURE, every `prior_insights:` entry is a verified citation; `astra validate astra.yaml --verify-evidence` should pass.
+After SPECIFY records each citation marker as a `prior_insights:` *placeholder* (`id`, `claim`, `doi`, `decision_links` — no `evidence:` selector), LITERATURE stands up each cited paper's reading materials, finds the verbatim quote in the cited paper that justifies the placeholder's claim, and authors the resolved `evidence:` selector back into `astra.yaml`. After LITERATURE, every `prior_insights:` entry is a verified citation; `astra validate astra.yaml --verify-evidence` returns clean.
+
+The quote-finding direction is: **target paper's claim → quote inside the cited paper**. The target paper says "we follow Smith+20's magnitude cut of i<24"; LITERATURE goes to Smith+20 and finds the verbatim quote there that justifies that statement ("we adopt a magnitude cut of i<24 as our fiducial selection"). The point is to verify the target paper's claims about its predecessors are real, not paraphrased or misremembered.
 
 LITERATURE runs **after SPECIFY**, not before — relevant `prior_insights:` are defined by the decisions and findings they justify. Fetching cited papers speculatively before SPECIFY would do work for citations that may never end up needed.
 
-This phase runs as the orchestrator-spawned `literature` sub-agent. Internally it fans out: one Task-tool sub-sub-agent per cited paper for parallel resolution — they edit disjoint subsets of `astra.yaml`'s `prior_insights:` entries (only the placeholders whose `doi:` matches the sub-sub-agent's paper). A merge step (the literature sub-agent itself) writes the per-paper resolutions back into `astra.yaml` after all sub-sub-agents complete; a final fresh-context Task-tool sub-agent runs the self-review at the rigor level the orchestrator picked for this spawn.
+This phase runs as the orchestrator-spawned `literature` sub-agent. Its internal architecture is **two simple stages** followed by a single-writer merge: mechanical fetch (paper-extraction's deterministic script, batched-parallel via shell, with no agent fan-out), then quote-finding (literature does it itself for small placeholder counts and spawns a small number of Haiku sub-agents for large counts), then the merge back into `astra.yaml`. The agentic work is the quote-matching; the fetch is plumbing.
 
 ## Inputs
 
 - `astra.yaml` — filled by SPECIFY's paper (and code) passes; each sub-analysis has `prior_insights:` entries with `claim:` + `doi:` + `decision_links:` but no `evidence:` selector. These are the placeholders LITERATURE resolves.
-- `work/reference/index.json#citations` — paper-extraction's cite-key → `{locations, citation, doi}` mapping for every entry in the target paper's bibliography. Used as the canonical cite-key → DOI lookup when cross-checking placeholder DOIs and when surfacing unresolved-DOI cases.
-- `work/notes/architect/paper-index.md` — has the decision clusters per sub-analysis; per-paper sub-sub-agents get it as context.
-- `work/reference/source/` (Path A — arXiv LaTeX) or `work/reference/document.md` (Path B — Docling) — the target paper (for context on how the cited paper is invoked).
+- `work/reference/index.json#citations` — paper-extraction's cite-key → `{locations, citation, doi}` mapping for every entry in the target paper's bibliography. Used as the canonical cite-key → DOI lookup when cross-checking placeholder DOIs and surfacing unresolved-DOI cases.
+- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — the target paper; useful for context on how the cited paper is invoked.
+- **paper-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Useful when a placeholder's claim is ambiguous and you need to know what the target paper actually says around the citation site.
 - CLAUDE.md — **Rigor** for this spawn's chosen rigor level.
 
 ## Outputs
 
 - `astra.yaml` — `prior_insights:` placeholders **resolved**: each placeholder now has at least one `evidence:` entry with `TextQuoteSelector` (`exact:`, `prefix:`, `suffix:`) plus `FragmentSelector` (`page:`) pointing at the cited paper. `astra validate astra.yaml --verify-evidence` returns clean.
-- `work/notes/literature/<doi-slug>.yaml` — one file per cited paper carrying that paper's per-placeholder evidence resolutions (intermediate artifact; resume-by-existence — re-running LITERATURE skips a paper whose YAML already exists).
-- Cached PDFs registered with `astra paper add` so `astra validate --verify-evidence` and downstream auditors can find them.
+- `work/cited/<doi-slug>/` — one directory per cited paper, holding that paper's substrate from paper-extraction (`paper.pdf`, `source/` or `document.md`, `index.json`, `astra.yaml` stub, figures, tables). Resume-by-existence: re-running LITERATURE skips fetching any DOI whose `work/cited/<doi-slug>/work/reference/index.json` already exists.
+- `work/notes/literature/resolutions.yaml` — consolidated per-placeholder evidence resolutions before merge (when Haiku fan-out is used, sub-Haiku outputs land in `work/notes/literature/haiku-<N>.yaml` and are merged into this single file). Intermediate; survives for audit.
 
 ## How it runs
 
-1. **Discovery.** Read `astra.yaml` and collect every `prior_insights:` entry whose `evidence:` is missing or empty. Group by `doi:`. Each group becomes a per-paper sub-sub-agent invocation.
-2. **Per-paper resolution (parallel).** Spawn one Task-tool sub-sub-agent per DOI group. Each one: caches the PDF via `astra paper add`, reads the cited paper, finds verbatim quote(s) supporting each placeholder claim in its group, and writes the per-placeholder `evidence:` resolutions to `work/notes/literature/<doi-slug>.yaml`. Sub-sub-agents do not edit `astra.yaml` directly — they write their per-paper YAML and exit.
-3. **Merge.** The literature sub-agent itself reads each `work/notes/literature/<doi-slug>.yaml` and writes the resolved `evidence:` entries back into `astra.yaml`'s `prior_insights[<insight_id>].evidence[]`. Single writer, no merge conflicts.
-4. **Self-review (rigor chosen per spawn).** A fresh-context Task-tool sub-agent reads each `prior_insights:` entry against its cited paper and asks "does this evidence actually justify the claim it's attached to?" Iterate per the rigor level the orchestrator chose — cheap: one pass; heavy: N rounds until two consecutive rounds find no fixes (or a 5-round system cap).
-
-## Per-paper resolution sub-sub-agent — system prompt
-
-> You are an ASTRA evidence-resolution agent. Your task is to find the verbatim quotes in a single cited paper that justify a set of `prior_insights:` placeholders authored by SPECIFY.
->
-> ### Inputs
->
-> You are given:
->
-> - The path to the cited paper's PDF (cached via `astra paper add`).
-> - A list of placeholder claims to resolve, each carrying:
->   - `id:` — the placeholder's unique id within `astra.yaml`.
->   - `claim:` — what the cited paper supports about a decision in the target paper (the target paper's framing, written by SPECIFY).
->   - `decision_links:` — which decision option(s) in `astra.yaml` this placeholder backs (for context — helps you find the right passage).
-> - The path to the target paper (`work/reference/source/` or `work/reference/document.md`) for context on how the cited paper is invoked.
-> - `work/notes/architect/paper-index.md` — the decision clusters from ARCHITECT.
->
-> ### Instructions
->
-> 1. Read the cited PDF using the Read tool.
-> 2. For each placeholder claim, locate verbatim passage(s) in the cited paper that support it. Focus on:
->    - Empirical comparisons between approaches the placeholder's `decision_links` reference.
->    - Performance benchmarks or validation results relevant to the choices.
->    - Recommendations or caveats about specific methods / parameters.
-> 3. For each supporting passage, build a `TextQuoteSelector` (`exact:` + `prefix:` + `suffix:`) and `FragmentSelector` (`page:`).
-> 4. If a placeholder's claim has no supporting evidence in the paper (the citation was loose or the claim was paraphrased beyond what the paper actually says), record it under `unresolved:` with a brief note rather than fabricating evidence. The self-review pass surfaces these to `open-questions.md` for the user to resolve at REVIEW close-out.
-> 5. Write the per-placeholder resolutions to the specified output file.
->
-> ### Caching the source PDF
->
-> Before resolution, register the paper with the validator's PDF cache:
->
-> ```bash
-> astra paper add "<DOI>"
-> ```
->
-> For arXiv DOIs (`10.48550/arXiv.<id>`) this fetches directly. Journal DOIs that 403 on Unpaywall can be aliased to a locally-downloaded arXiv preprint:
->
-> ```bash
-> astra paper add "<JOURNAL_DOI>" --pdf <path-to-arxiv-pdf>
-> ```
->
-> ### Quote fidelity rules
->
-> Quotes are verified at the spec level (`astra validate astra.yaml --verify-evidence`). Your job here is to extract quotes that pass that verification cleanly. The checks are:
->
-> - Each `exact` quote must be present on the cited page, fuzzy-matched at RapidFuzz `partial_ratio` ≥ 70. Copy verbatim from the PDF; do not paraphrase, normalize whitespace, or strip mathematical typesetting.
-> - The validator concatenates `prefix + quote + suffix` and matches that against the page text at a context score ≥ 80. Choose `prefix` / `suffix` as REAL surrounding page text (W3C TextQuoteSelector convention), not editorial commentary. Wording like "(Section 3.1 of Foo+19)" or "(see Figure 4)" silently lowers the context score below threshold even when the quote itself is in the PDF.
-> - Avoid YAML `|` block-literal style for `exact`, `prefix`, and `suffix` values: embedded newlines from block-literal folding can mishandle the context-score concatenation. Single-line strings or `>` folded-block style are safer.
-> - Math-formula quotes (with superscripts, subscripts, inline footnote markers) are likely to fail because the PDF text extractor collapses these. Quote the surrounding English narrative instead, or skip that piece of evidence if a sibling quote already establishes the claim.
->
-> The verification cache is keyed by `(doi, version, sha256(quote_text))` plus `pdf_sha256`, so any edit to a quote in the eventual YAML automatically invalidates that entry — there is no need to delete the cache between runs.
->
-> ### Quote granularity rules
->
-> - **Quotes carry the claim on their own.** A four-word fragment satisfies fuzzy-match but fails the reader: lift the quote out of context and the claim it supports must still stand. Default to full sentences with TeX-anchored prefix/suffix; split a long passage into two evidence rows rather than truncate a quote into a fragment that depends on context. Fragments creep in at exactly the spots where inline math forces shrinking, which is also where claims hide.
-> - **Cross-section methodology gets separate evidence rows.** When a paper's relevant methodology is split across multiple sections — a methods chapter defining a tool, a results chapter setting a threshold, an application chapter running it — file one evidence row per piece, each citing the section where that piece is *defined*. Do not collapse all the borrowed pieces into the application section's number.
->
-> ### Output format
->
-> Write ONLY this YAML structure to the output file. No other text.
->
-> ```yaml
-> resolutions:
->   <insight_id>:
->     id: <insight_id>
->     evidence:
->       - id: ev1
->         doi: "<DOI>"
->         quote:
->           type: TextQuoteSelector
->           exact: "<exact quote from paper, verbatim>"
->           prefix: "<~20-100 chars of REAL surrounding text BEFORE the quote>"
->           suffix: "<~20-100 chars of REAL surrounding text AFTER the quote>"
->         location:
->           type: FragmentSelector
->           page: <page number>
->
-> unresolved:
->   <insight_id>:
->     reason: "<one-line: why no supporting evidence was found>"
-> ```
->
-> ### Rules
->
-> - The keys under `resolutions:` and `unresolved:` are the placeholder `id:` values from `astra.yaml`'s `prior_insights:` — preserve them exactly. The merge step uses these as the join key.
-> - One placeholder lands in either `resolutions:` or `unresolved:`, never both. If two passages support the same claim, list both as siblings under one placeholder's `evidence:`.
-> - Quotes must be EXACT — copy verbatim from the PDF, no paraphrasing or whitespace normalization.
-> - Prefix and suffix must be real surrounding page text, not editorial parentheticals.
-> - `prefix:` and `suffix:` are REQUIRED for every `TextQuoteSelector`.
-> - Do NOT edit `astra.yaml`. The merge step does that.
-
-## Merge step
-
-After all per-paper sub-sub-agents complete, the literature sub-agent reads each `work/notes/literature/<doi-slug>.yaml` and writes the resolutions back into `astra.yaml`:
-
-- For each entry in `resolutions:`, locate `prior_insights[<insight_id>]` in `astra.yaml` (sub-analysis ownership is implicit in the id; the placeholder already lives there) and set its `evidence:` field to the resolved selectors.
-- For each entry in `unresolved:`, append a line to `open-questions.md` describing the unresolved placeholder and the reason — the user resolves at REVIEW close-out by either supplying a different citation, weakening the placeholder's `claim:`, or removing the placeholder entirely.
-- Re-run `astra validate astra.yaml` after each per-paper merge to catch any structural breakage early.
-
-A single writer (the merge step) avoids YAML round-trip conflicts that parallel writes would produce.
+### Stage 1 — Mechanical fetch (batched, no agent fan-out)
+
+Collect every `prior_insights:` entry whose `evidence:` is missing or empty. Group by DOI. Each unique DOI becomes one fetch.
+
+Run paper-extraction's substrate script for each unique DOI **in batches of 5** via shell parallelism. paper-extraction's `extract-paper-substrate.py` is deterministic — no agent involvement needed. Each invocation writes to `work/cited/<doi-slug>/work/reference/`:
+
+```bash
+# Pseudocode for the batched fetch loop the literature sub-agent runs.
+# For each unique DOI in the placeholder set:
+mkdir -p work/cited/<doi-slug>
+cd work/cited/<doi-slug>
+python3 /path/to/paper-extraction/scripts/extract-paper-substrate.py \
+    --arxiv-id <id-or-doi>
+# Run up to 5 in parallel with `&` and `wait`; throttle to bound disk + network.
+```
+
+Skip Step 5 (findings): LITERATURE only needs substrate, not the cited paper's claimed findings. Skip the agent's Step 4 (fix structural gaps) too, since cited papers don't need warning-resolution to be quote-grep-able. Cited-paper bibliographies don't need DOI resolution either (we don't care about their citations' DOIs). If paper-extraction supports suppressing that step, use it; if not, the cache amortizes across cited papers and the extra cost is tolerable.
+
+Expected wall time: on the order of tens of seconds for 20 cited papers (four batches of 5), bounded by the slowest single fetch in each batch.
+
+After each fetch lands, **register the PDF with the validator's cache** so `astra validate --verify-evidence` can find it later:
+
+```bash
+astra paper add "<DOI>" --pdf work/cited/<doi-slug>/work/reference/paper.pdf
+```
+
+For arXiv DOIs (`10.48550/arXiv.<id>`) the `--pdf` argument is optional (`astra paper add` can fetch directly), but pointing at the already-fetched PDF avoids a redundant network hit. For journal DOIs that return a 403 on Unpaywall, `--pdf` is required.
+
+Resume: if `work/cited/<doi-slug>/work/reference/index.json` already exists, skip that DOI's fetch. If `astra paper get <DOI>` returns a cached entry, skip the registration too.
+
+### Stage 2 — Quote-finding (literature does it, or Haiku fan-out)
+
+Once all substrate is in place, count placeholders:
+
+- **≤10 placeholders:** the literature sub-agent does the quote-finding itself. It walks the placeholders one at a time, greps into the relevant cited paper's substrate for terms from the claim, identifies the verbatim quote, and writes `{exact, prefix, suffix, page}` to `work/notes/literature/resolutions.yaml`. Single agent, low context overhead per placeholder (grep + targeted read, not whole-paper-absorption).
+
+- **>10 placeholders:** the literature sub-agent partitions placeholders across **a small number of Haiku sub-agents** (rough rule: aim for 5–8 placeholders per Haiku, so 11–15 placeholders → 2 Haikus, 30 placeholders → 4 Haikus). Each Haiku gets its subset of placeholders + the substrate paths for the cited papers those placeholders reference. Haikus are cheap and fast and the work is well-bounded (grep + format YAML), so this is the right model. Each Haiku writes to `work/notes/literature/haiku-<N>.yaml`; literature reads them all, merges into `resolutions.yaml`, then writes back to `astra.yaml`.
+
+The exact Haiku threshold and partition size are heuristic — they trade off context-budget per Haiku vs. orchestration overhead. The literature sub-agent has discretion; the rule of thumb is "few enough to track easily, each one small enough to finish in a single fast turn."
+
+### Stage 3 — Merge into astra.yaml
+
+The literature sub-agent reads `work/notes/literature/resolutions.yaml` and writes the resolutions back into `astra.yaml`:
+
+- For each resolved placeholder, locate `prior_insights[<id>]` in `astra.yaml` (the placeholder already lives in its sub-analysis; the merge just sets its `evidence:` field).
+- For each unresolved placeholder, append a line to `open-questions.md` describing it — the user resolves at REVIEW close-out by either supplying a different citation, weakening the claim, or removing the placeholder entirely.
+- Run `astra validate astra.yaml --verify-evidence` after the merge to catch structural breakage early.
+
+Single writer (the literature sub-agent), no merge conflicts even when Haikus produced the inputs in parallel.
+
+## Quote-finding contract (used by both the literature sub-agent and Haiku sub-agents)
+
+The agent doing the quote-finding (literature itself, or each Haiku) follows the same contract. The Haiku prompt is just this contract with concrete placeholders + paths spliced in.
+
+```
+You are an ASTRA evidence-resolution agent. Your task is to find the
+verbatim quotes in cited papers that justify a set of prior_insights:
+placeholders authored by SPECIFY.
+
+Inputs:
+  - A list of placeholders. Each carries:
+      id:             the placeholder's unique id within astra.yaml
+      claim:          what the cited paper supports about a decision
+                      in the target paper (target paper's framing)
+      doi:            DOI of the cited paper
+      decision_links: which decision option(s) this placeholder backs
+  - Substrate path per cited paper at work/cited/<doi-slug>/work/reference/:
+      paper.pdf, source/*.tex (Path A) or document.md (Path B),
+      index.json (structural index for that cited paper).
+  - Target paper at work/reference/source/ or work/reference/document.md
+    (for context on how the cited paper is invoked, if you need it).
+
+For each placeholder:
+
+  1. Grep into the cited paper's substrate for terms from the claim.
+     Path A: grep across work/cited/<doi-slug>/work/reference/source/*.tex.
+     Path B: grep work/cited/<doi-slug>/work/reference/document.md.
+
+  2. Read targeted spans (offset/limit) around the matches. Find a
+     verbatim passage that supports the claim. Focus on:
+       - Empirical comparisons between approaches the claim's
+         decision_links reference.
+       - Performance benchmarks or validation results relevant to the
+         choices.
+       - Recommendations or caveats about specific methods/parameters.
+
+  3. Build a TextQuoteSelector (exact + prefix + suffix) and
+     FragmentSelector (page).
+       - exact: copied VERBATIM from the source. Don't paraphrase or
+         normalize whitespace. Don't quote math-heavy passages (the PDF
+         text extractor collapses them); quote the surrounding English
+         narrative instead.
+       - prefix / suffix: 20–100 chars of REAL surrounding text, NOT
+         editorial parentheticals. The validator concatenates them with
+         the quote and matches against the PDF page at score ≥ 80.
+       - page: page number from the rendered PDF where the quote
+         appears.
+
+  4. If no quote in the cited paper supports the claim, record the
+     placeholder under unresolved: with a brief reason (the citation
+     was loose, the target paper paraphrased beyond what the source
+     says, or the wrong paper was cited). Don't fabricate evidence.
+
+Output (YAML, written to the path you were assigned):
+
+resolutions:
+  <insight_id>:
+    id: <insight_id>
+    evidence:
+      - id: ev1
+        doi: "<DOI>"
+        quote:
+          type: TextQuoteSelector
+          exact: "<verbatim quote>"
+          prefix: "<~20-100 chars REAL surrounding text BEFORE>"
+          suffix: "<~20-100 chars REAL surrounding text AFTER>"
+        location:
+          type: FragmentSelector
+          page: <int>
+
+unresolved:
+  <insight_id>:
+    reason: "<one-line>"
+
+Rules:
+  - Keys under resolutions: / unresolved: are placeholder ids from
+    astra.yaml; preserve them exactly. Merge uses these as the join key.
+  - One placeholder lands in either resolutions: or unresolved:, never both.
+  - Quotes are EXACT — verbatim, no paraphrasing, no whitespace normalization.
+  - prefix: and suffix: are REQUIRED.
+  - Avoid YAML | block-literal style for these strings; use single-line or > folded style.
+  - Do NOT edit astra.yaml. The merge step does that.
+```
+
+When the literature sub-agent fans out to Haikus, each Haiku is spawned with `model="haiku"` and gets this contract plus its assigned subset of placeholders and substrate paths.
 
 ## Self-review (rigor chosen per spawn)
 
@@ -138,79 +170,56 @@ After the merge lands, a fresh-context Task-tool sub-agent cross-checks each res
 
 The depth of self-review follows the rigor level the orchestrator picked for this spawn (read CLAUDE.md's **Rigor** section):
 
-- **Cheap:** skip review entirely, or run a single fresh-context Task-tool sub-agent pass and incorporate its fixes once.
-- **Heavy:** N rounds — each round spawns a fresh Task-tool reviewer against the resolved `prior_insights:` + the cited papers + the target paper; the literature sub-agent incorporates fixes (re-spawn the per-paper sub-sub-agent for entries that need a different quote, or adjust unresolved entries); the next round spawns another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes, or a 5-round system cap.
-
-The discipline matches ARCHITECT's and SPECIFY's self-review shape: each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; the literature sub-agent edits `astra.yaml` between rounds for trivial mechanical fixes, or re-spawns the relevant per-paper sub-sub-agent for substantive changes.
-
-### Per-round fresh sub-agent — system prompt
-
-> You are a LITERATURE reviewer. Read `astra.yaml`'s `prior_insights:` entries, the cited papers (cached via `astra paper add`), and the target paper, and report any inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
->
-> ### Inputs
->
-> - `astra.yaml` — focus on every `analyses.<sub-analysis-id>.prior_insights:` entry. Each should have a resolved `evidence:` block.
-> - The cited papers (cached PDFs).
-> - `work/reference/index.json#citations` — cite-key → `{locations, citation, doi}` mapping from paper-extraction.
-> - `open-questions.md` — to see which placeholders the resolution sub-sub-agents flagged unresolved.
-> - `work/reference/source/` (or `document.md`) — the target paper, for context on how the cited paper is invoked.
->
-> ### What to check
->
-> 1. **Evidence integrity.** `astra validate astra.yaml --verify-evidence` returns clean. (Do not run it yourself — your job is the semantic check beyond what `--verify-evidence` does.)
-> 2. **Evidence justifies claim.** For each `prior_insights:` entry, does the quote actually support the `claim:`? Or is it tangential / weaker than the claim asserts?
-> 3. **Claim supports the decision.** For each placeholder's `decision_links:`, does the placeholder's claim actually justify the linked decision option(s)? Or is the link a leap?
-> 4. **Cited paper is the right paper.** Does the target paper actually invoke this DOI for this claim? (Sometimes a citation marker is misread; the wrong paper gets cached.)
-> 5. **Unresolved entries are honest.** For entries in `open-questions.md` flagged unresolved, does a closer read of the cited paper actually find supporting evidence? (If yes, the resolution sub-sub-agent missed it; flag for re-resolution.)
->
-> ### Output
->
-> Write your findings to `work/notes/literature-review/round-<N>.md`:
->
-> ```markdown
-> # LITERATURE review — round <N>
->
-> ## verdict: clean | <count> fixes
->
-> ## findings (one per fix needed)
->
-> ### F-1 — <one-line summary>
->
-> - placeholder: `prior_insights.<id>` (sub-analysis: `<sub-analysis-id>`)
-> - issue: <evidence integrity | evidence-claim mismatch | claim-decision mismatch | wrong paper | unresolved-but-resolvable>
-> - paper: `<DOI>` (page <N>)
-> - what's wrong: <2–3 sentences>
-> - suggested fix: <re-resolve with a different quote | adjust the claim | re-link decision | flag for human review>
-> ```
->
-> ### Rules
->
-> - **Output findings only — do not edit `astra.yaml`.** A separate fix pass responds to your findings. Editing here defeats the multi-round-fresh-context discipline.
-> - **Verdict is `clean` or a count.** "clean" means no fixes; otherwise enumerate.
-> - **One fix per `F-N`.** Do not bundle.
-> - **Cite specifically.** Always reference the placeholder by id, the cited paper by DOI + page, and the target paper's invocation site by section / page.
-
-### LITERATURE-fix pass between rounds
-
-After each round's findings file lands, the literature sub-agent responds to the findings — re-resolving placeholders with different quotes, adjusting claims, re-linking decisions, or surfacing unresolvable entries to `open-questions.md`. After any change to `astra.yaml`, re-run `astra validate astra.yaml --verify-evidence` to confirm the structural and quote-fidelity checks still pass.
-
-If N hits the 5-round system cap without two consecutive clean rounds, the literature sub-agent stops and reports back to the orchestrator. If the user is reachable, ask in prose: "LITERATURE review reached round cap with N fixes still landing; continue, accept the current resolutions, or revise scope?" If the user is unreachable, accept current state, log the unfinished tail in `open-questions.md`, and let the orchestrator decide whether to proceed or re-spawn.
+- **Cheap:** skip review entirely, or run a single fresh-context reviewer pass and incorporate its fixes once.
+- **Heavy:** N rounds — each round spawns a fresh reviewer; literature incorporates fixes between rounds; the next round spawns another fresh reviewer that does not see the prior round's fixes. Iterate until two consecutive rounds find no fixes, or a 5-round system cap.
+
+Each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; the literature sub-agent edits `astra.yaml` between rounds (or re-spawns Haiku quote-finding for entries that need a different quote).
+
+### Per-round fresh reviewer — prompt shape
+
+```
+You are a LITERATURE reviewer. Read astra.yaml's prior_insights:
+entries, the cited papers (substrate at work/cited/<doi-slug>/), and
+the target paper. Report inconsistencies. You are one of several
+independent reviewers; assume nothing has been fixed.
+
+Check:
+  1. Evidence integrity. (astra validate --verify-evidence handles the
+     deterministic check; you do the semantic check.)
+  2. Evidence justifies claim. Does the quote actually support the
+     claim, or is it tangential?
+  3. Claim supports the decision. Does the placeholder's claim justify
+     the linked decision option?
+  4. Cited paper is the right paper. Does the target paper actually
+     invoke this DOI for this claim?
+  5. Unresolved entries are honest. For entries in open-questions.md
+     flagged unresolved, does a closer read of the cited paper find
+     supporting evidence the resolver missed?
+
+Output findings to work/notes/literature-review/round-<N>.md, one fix
+per F-N entry. Verdict is `clean` or a count. Do NOT edit astra.yaml.
+```
+
+If the review hits the 5-round system cap without two consecutive clean rounds, the literature sub-agent stops and reports back to the orchestrator. If the user is reachable, ask in prose: "LITERATURE review reached the round cap with <count> fixes still landing; continue, accept the current resolutions, or revise scope?" If the user is unreachable, accept the current state, log the unfinished tail in `open-questions.md`, and let the orchestrator decide whether to proceed or re-spawn.
 
 ## Survey signals (entry into LITERATURE)
 
 - `astra.yaml` has `prior_insights:` placeholders — entries with `claim:` + `doi:` but no `evidence:` ⇒ ready to resolve
-- `work/notes/literature/<doi-slug>.yaml` files exist (one per cited DOI) ⇒ per-paper resolution done
+- `work/cited/<doi-slug>/work/reference/index.json` exists for each unique cited DOI ⇒ fetches done
+- `work/notes/literature/resolutions.yaml` exists with non-empty `resolutions:` / `unresolved:` sections ⇒ quote-finding done
 - `astra.yaml`'s `prior_insights:` entries each have a resolved `evidence:` selector ⇒ merge done
 - `astra validate astra.yaml --verify-evidence` returns clean ⇒ structural validation done
-- For cheap: at least a `work/notes/literature-review/round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ LITERATURE review done
+- For cheap: at least one `work/notes/literature-review/round-<N>.md` with verdict `clean` (or no fixes were incorporated) ⇒ LITERATURE review done
 - For heavy: two consecutive `round-<N>.md` files with verdict `clean` ⇒ LITERATURE review done
 
 When all of the above hold ⇒ LITERATURE complete; orchestrator proceeds to IMPLEMENT.
 
 ## Notes
 
-- **Run per-paper resolutions in parallel.** One Task-tool sub-sub-agent per cited DOI; they edit disjoint subsets of `prior_insights:` so write conflicts don't arise — but the merge step still serializes the writes back to `astra.yaml` to keep YAML round-trip safe.
-- **Resume is automatic.** If `work/notes/literature/<doi-slug>.yaml` already exists, skip the per-paper resolution for that DOI. The merge re-runs whenever new per-paper files appear.
-- **Unresolved is not failure.** A placeholder that no quote in the cited paper supports is a real signal — the target paper cited loosely, or paraphrased beyond what the source actually says. Surface to `open-questions.md`; don't fabricate evidence to make it green.
-- **`astra validate --verify-evidence` runs after the merge, not after each per-paper sub-sub-agent.** Sub-sub-agents write to per-paper YAMLs; the deterministic check happens once `astra.yaml` is updated.
-- **Commit each per-paper resolution as it lands.** Plus the merge as one commit, plus each review-round file as it lands. The orchestrator reads `git log` to see how far the literature sub-agent got.
+- **Mechanical fetch is the substrate; quote-finding is the agentic work.** Don't conflate them. paper-extraction's deterministic script handles the fetch — batched-parallel via shell, no agent fan-out. Quote-finding is the semantic match between target-paper-claim and cited-paper-quote; that's the agent's job.
+- **paper-extraction is the canonical fetch mechanism.** Using `astra paper add` would give only the cached PDF; paper-extraction gives substrate (LaTeX source where available, structural index, figures, citations) which is much better material for verbatim quote-finding. The cost is small and parallelizable.
+- **Haiku is the right model for fan-out quote-finding.** Cheap, fast, well-suited to bounded grep-and-format work. Use Sonnet/Opus only when the placeholder count is small enough that the literature sub-agent does it itself anyway.
+- **Resume is automatic.** If `work/cited/<doi-slug>/work/reference/index.json` exists, skip that DOI's fetch. If `work/notes/literature/resolutions.yaml` has an entry for a placeholder, skip that placeholder's quote-finding.
+- **Unresolved is not failure.** A placeholder that no quote in the cited paper supports is a real signal — the target paper cited loosely or paraphrased beyond what the source actually says. Surface to `open-questions.md`; don't fabricate evidence.
+- **`astra validate --verify-evidence` runs after the merge**, not after each Haiku's per-placeholder output. Haikus write to disjoint files; the deterministic check happens once `astra.yaml` is updated.
+- **Commit per stage.** Fetches commit together once Stage 1 completes (one commit for all cited-paper substrates). Quote-finding commits together once Stage 2 completes (`resolutions.yaml` + Haiku files). The merge into `astra.yaml` is its own commit. Each review round file commits as it lands. The orchestrator reads `git log` to see progress.
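
For orientation, the Stage 2 fan-out rule in the LITERATURE reference above (work alone at 10 placeholders or fewer, otherwise aim for 5–8 per Haiku) can be sketched in Python. This is a minimal illustration and not code from the patch: `plan_quote_finding`, `SELF_SERVE_MAX`, and `MAX_PER_HAIKU` are hypothetical names, and dividing by the upper bound of 8 is one assumed reading of the heuristic that happens to reproduce the document's examples (11–15 placeholders give 2 batches, 30 give 4).

```python
import math

# Assumed constants, taken from the rule of thumb in the text.
SELF_SERVE_MAX = 10  # at or below this count, the literature sub-agent works alone
MAX_PER_HAIKU = 8    # upper end of the "5-8 placeholders per Haiku" range


def plan_quote_finding(placeholder_ids):
    """Split placeholder ids into batches; a single batch means no fan-out.

    Hypothetical helper: the real sub-agent has discretion over the split.
    """
    ids = list(placeholder_ids)
    if not ids:
        return []
    if len(ids) <= SELF_SERVE_MAX:
        return [ids]

    n_agents = math.ceil(len(ids) / MAX_PER_HAIKU)
    base, extra = divmod(len(ids), n_agents)

    # Contiguous, balanced chunks: the first `extra` batches get one more id.
    batches, start = [], 0
    for i in range(n_agents):
        size = base + (1 if i < extra else 0)
        batches.append(ids[start:start + size])
        start += size
    return batches


if __name__ == "__main__":
    for n in (7, 11, 15, 30):
        sizes = [len(b) for b in plan_quote_finding(range(n))]
        print(n, "placeholders ->", sizes)
```

Because the chunks are contiguous, passing ids already sorted by `doi:` keeps placeholders for the same cited paper in the same batch, which limits how many substrate directories each Haiku has to open.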
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index 34302997..c1c92fb1 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -13,7 +13,8 @@ The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding
 - `targets/targets.md` — what was being matched against; reference figures / tables in `targets/`
 - `results/<universe>/<output_id>/` — reproduced figures / tables / metrics
 - `open-questions.md` at the workdir root — running report from sub-agent phases (paper-vs-code conflicts, ambiguities, anything sub-agents flagged for user resolution)
-- `work/notes/architect/paper-index.md` and `work/notes/architect/code-index.md` — for context
+- `work/reference/index.json` and `work/reference/code-index.md` — for context
+- **paper-expert** and **code-expert** — still reachable via `SendMessage` if the user asks a follow-up question during REVIEW that the report and CLAUDE.md don't answer. The experts persist for the lifetime of the reproduction; they're useful here for "remind me what the paper says about X" or "did the original code do Y" without leaving the orchestrator session.
 - `CLAUDE.md` at the workdir root — paper identity, Goal, Rigor, Paper-vs-code disagreements (the at-a-glance summary that's accumulated across all sub-agent spawns)
 
 ## Outputs
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 14e2a470..ba7ddf1d 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -8,12 +8,15 @@ The new structure runs **two passes per sub-analysis** (paper, then code, when c
 
 Per-sub-analysis work is parallelizable when sub-analyses are independent. Each sub-analysis's two passes (paper, then code) run sequentially within that sub-analysis; across sub-analyses the work fans out via Task-tool sub-sub-agents from inside the specify session.
 
+When the specify sub-agent (or its per-sub-analysis sub-sub-agents) needs paper- or code-side context, prefer **querying paper-expert / code-expert via `SendMessage`** over re-reading materials directly. The experts already have deep context built up from ACQUIRE; SendMessage queries are cheaper and richer than fresh Explore passes. Falling back to direct reads (Grep on `work/reference/source/` / `document.md` / `code/`) is still fine for specific verbatim quote-hunting, but the experts should be the first stop for understanding.
+
 ## Inputs
 
 - `astra.yaml` — the stub from ARCHITECT (sub-analyses, inputs, outputs, narrative; empty `decisions:` / `prior_insights:` / `findings:` blocks)
-- `work/notes/architect/paper-index.md` — paper-side decision clusters, result loci, citations
-- `work/notes/architect/code-index.md` (when code present) — module map, natural decomposition, entry-points, gotchas
-- `work/reference/index.json` — paper-extraction's structural index; its `citations:` block maps each cited paper's BibTeX key (Path A) or synthetic `<lastname>_<year>` key (Path B) to `{locations, citation, doi}`. SPECIFY uses this to write each `prior_insights:` placeholder's `doi:` so LITERATURE knows which paper to fetch.
+- `work/reference/index.json` — paper-extraction's structural index: figures, tables, section outline, citations. The `citations:` block maps each cited paper's BibTeX key (Path A) or synthetic `<lastname>_<year>` key (Path B) to `{locations, citation, doi}`. SPECIFY uses this to write each `prior_insights:` placeholder's `doi:` so LITERATURE knows which paper to fetch.
+- `work/reference/code-index.md` (when code present) — code inventory: module map, candidate decisions with file:line, entry-points, data dependencies, gotchas.
+- **paper-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Ask the deeper paper-side questions the structural index doesn't answer: "what decisions does the paper describe for the apodization choice", "where does the paper define the fiducial cosmology", "what does §4.2 conclude about the null tests". paper-expert has the paper's full context built up.
+- **code-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Ask: "which module implements the BAO fit", "what's the default magnitude cut hardcoded in this script", "how does the code split data into bins". code-expert has the code's full context built up.
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
 - `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (Path B only)
 - `work/reference/code/` (if present) — original code, canonical reference for numerics + method
@@ -50,7 +53,7 @@ Read the paper's section(s) covering this sub-analysis. Author:
    - Sibling alternatives mentioned in the paper, each as a separate option.
    - `evidence:` for the chosen option using `TextQuoteSelector` against the paper text — verbatim quote + `prefix` / `suffix` from real surrounding text + page or section anchor.
 
-   Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for the full definition of what counts. **Only exclude pure tooling choices** (language, library, file format) and fixed constraints. A typical sub-analysis has 2–6 decisions; if a sub-analysis has fewer than 2, revisit `work/notes/architect/paper-index.md` and reconsider.
+   Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for the full definition of what counts. **Only exclude pure tooling choices** (language, library, file format) and fixed constraints. A typical sub-analysis has 2–6 decisions; if a sub-analysis has fewer than 2, revisit `work/reference/index.json` and reconsider.
 
 2. **`prior_insights:`** — for every `\cite{<key>}` (Path A) or rendered citation invocation (Path B) the paper invokes that bears on a decision in this sub-analysis, record a **placeholder**: an `id:`, a `claim:` describing what the cited paper supports about the decision (the target paper's framing of why it cites that paper here), a `doi:` looked up from `work/reference/index.json#citations[<cite-key>].doi`, and `decision_links:` mapping the placeholder to the relevant decision option(s). **Do not author the `evidence:` selector** — that's LITERATURE's job. Leave `evidence:` absent or empty; LITERATURE fetches the cited paper, finds the supporting quote, and authors the resolved selector back into this placeholder. The placeholder shape:
 
@@ -75,7 +78,7 @@ Read the paper's section(s) covering this sub-analysis. Author:
 
 ### Pass B — code pass (when `work/reference/code/` exists)
 
-Read the code that implements this sub-analysis (`work/notes/architect/code-index.md`'s natural-decomposition rows point at the relevant modules / scripts). Augment / amend:
+Read the code that implements this sub-analysis (`work/reference/code-index.md`'s natural-decomposition rows point at the relevant modules / scripts). Augment / amend:
 
 1. **Code-as-canonical material disagreements.** For each decision authored in the paper pass, locate its implementation in the code. Where paper and code disagree:
    - **Material** = a different choice would plausibly change a numeric result the paper reports.
@@ -111,8 +114,8 @@ Self-review depth follows the rigor level the orchestrator picked for this spawn
 > - `astra.yaml` — focus on `analyses.<sub-analysis-id>` (`decisions:`, `prior_insights:`, `findings:`, `narrative:`, `inputs:`, `outputs:`)
 > - `universes/baseline.yaml`
 > - `implementation-notes.md`
-> - `work/notes/architect/paper-index.md` — the decision clusters and result loci that scoped the work
-> - `work/notes/architect/code-index.md` (when code present)
+> - `work/reference/index.json` — paper-extraction's structural index (figures, tables, section outline, citations) used to scope the work
+> - `work/reference/code-index.md` (when code present)
 > - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
 > - `work/reference/code/` (when present) — canonical reference for numerics + method
 > - `work/reference/index.json#citations` — cite-key → `{locations, citation, doi}` mapping from paper-extraction (use to confirm each `prior_insights:` placeholder's `doi:` matches what the paper cites)
@@ -132,7 +135,7 @@ Self-review depth follows the rigor level the orchestrator picked for this spawn
 >
 > - **Do not edit `astra.yaml`** or any other file. Your output is a findings file; a SPECIFY-fix pass responds to the findings. Editing here defeats the multi-round-fresh-context discipline.
 > - **Do not flag missing `recipes:`.** Recipes are IMPLEMENT's, not SPECIFY's.
-> - **Do not re-read the entire paper.** Use Grep on `work/reference/source/` (or `document.md`) for the specific claims you want to verify; lean on `work/notes/architect/paper-index.md`.
+> - **Do not re-read the entire paper.** Use Grep on `work/reference/source/` (or `document.md`) for the specific claims you want to verify; lean on `work/reference/index.json`.
 > - **Do not invent problems.** If the sub-analysis is consistent with paper + code, say so briefly.
 > - **Do not assume a prior reviewer has been here.** You are fresh. First-principles read only.
 >
@@ -200,7 +203,7 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 - **Do NOT add executable implementation code or invented run commands.** Do add concise provenance / recipe descriptions where ASTRA fields support them, especially for paper-derived calculations, figure generation, imported constants, and values that IMPLEMENT will need to regenerate.
 - **Equation and section numbers must match the rendered paper / PDF**, not a naïve count of TeX blocks or markdown headings. When citing "eq. N" or "§N", find the equation or heading by content in the rendered paper and use the printed number.
 - **Validate** with `astra validate astra.yaml` after each pass.
-- **Work primarily from `work/notes/architect/`** — the index files distilled the relevant scope per sub-analysis. Use `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) only to look up specific details (Grep for terms, or read targeted sections with offset/limit). Do not re-read the whole paper.
+- **Work primarily through paper-expert and code-expert** via `SendMessage` — they have the deep context built up. Use `work/reference/index.json` and `work/reference/code-index.md` for structural lookups, and `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) only to verify specific verbatim quotes (Grep for terms, or read targeted sections with offset/limit). Do not re-read the whole paper.
 - **The narrative skill is the prose author, not the structure author.** SPECIFY weaves anchors into the prose ARCHITECT wrote — the structural surface is fixed, the anchored references are SPECIFY's contribution.
 
 ## Survey signals (entry into SPECIFY)
diff --git a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
index 7f92ca51..5f2a86e6 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
@@ -6,7 +6,8 @@ Reproduction of <paper title> (<arXiv ID>). DOI: <doi>.
 
 - Authors: <list>
 - One-line subject: <e.g. "BAO scale measurement from DESI DR1">
-- Code repo: <url> (cloned to `work/reference/code/` during ACQUIRE)
+- Code repo: <url> (cloned to `work/reference/code/` during ACQUIRE; scan inventory at `work/reference/code-index.md`)
+- Paper materials: `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml}` (from `/paper-extraction` during ACQUIRE)
 
 ## Goal
 
@@ -36,7 +37,8 @@ Material disagreements between paper and code, logged here as sub-agents find th
 
 ## Rules
 
-- **Code-as-canonical when `work/reference/code/` exists.** Every implementing sub-agent reads relevant code on entry. Where paper and code disagree, code is canonical for numerics, plotting, and method.
+- **Code-as-canonical when `work/reference/code/` exists.** Every implementing sub-agent reads relevant code on entry. Where paper and code disagree, code is canonical for numerics, plotting, and method. When `work/reference/code/` is absent, paper is the only anchor — implement fresh from the spec, expect slower convergence, and surface gaps honestly to the user rather than dressing them up.
+- **Persistent experts during a session.** ACQUIRE spawns `paper-expert` (knows the paper) and `code-expert` (knows the cloned code, when present) as named sub-agents that stay alive for the reproduction. Downstream sub-agents receive their agent IDs at spawn and consult them via `SendMessage` instead of re-ingesting paper / code materials from scratch. The expert IDs are session-scoped — they don't persist across orchestrator sessions, so if the orchestrator session restarts, ACQUIRE re-spawns them against the existing on-disk substrate.
 - **Never block on `AskUserQuestion` mid-sub-agent.** Sub-agents don't have `AskUserQuestion`. Ask in prose if the user is reachable; otherwise append the question to `open-questions.md` and continue with the best-judgment default. The user resolves accumulated questions in REVIEW.
 - **arxiv-LaTeX-first acquisition.** PDF + Docling is a fallback for non-arxiv only.
 - **`astra validate --verify-evidence`** is the fidelity gate; evidence quotes must match source PDFs.
@@ -45,4 +47,7 @@ Material disagreements between paper and code, logged here as sub-agents find th
 ## Pointers
 
 - `open-questions.md` — accumulated questions from autonomous-mode runs, resolved in REVIEW.
+- `work/reference/index.json` — paper structural index (figures, tables, outline, citations with DOIs); the starting surface for any "where in the paper does X happen" lookup. Or just ask `paper-expert` via `SendMessage`.
+- `work/reference/code-index.md` — code inventory (when code present): module map, candidate decisions with file:line, entry-points, gotchas. Or just ask `code-expert` via `SendMessage`.
+- `work/cited/<doi-slug>/` — per-cited-paper substrate produced by LITERATURE for `prior_insights:` resolution.
 - <any paper-specific conventions or warnings the user surfaced during the interview>

From 5499bf9758f1b154231d021afeb076f7df7935f0 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 03:04:48 +0200
Subject: [PATCH 046/124] docs: catch skills docs up to the current bundle
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add dedicated pages for lc-from-paper plus the four sibling skills in
the paper-reproduction bundle (paper-extraction, narrative,
figure-comparison, check-sentence-by-sentence), wire them into the
zensical nav, and expand the skills index to show the bundle structure.

Also clean up stale bits:

- claude-workflow.md → agent-workflow.md (file was renamed)
- lc-feedback.md "Notes for the maintainer" pointed at a Dagster
  reference that has since been removed from the SKILL.md
- authoring.md said `lc eval` wasn't wired into the top-level CLI; it
  is, gated on the optional eval extra

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/skills/authoring.md                  |   6 +-
 docs/skills/check-sentence-by-sentence.md | 116 +++++++++++++++++++
 docs/skills/figure-comparison.md          |  87 ++++++++++++++
 docs/skills/index.md                      |  34 +++++-
 docs/skills/lc-feedback.md                |   7 --
 docs/skills/lc-from-paper.md              | 118 +++++++++++++++++++
 docs/skills/narrative.md                  | 108 ++++++++++++++++++
 docs/skills/paper-extraction.md           | 132 ++++++++++++++++++++++
 docs/user/getting-started.md              |   4 +-
 docs/user/index.md                        |   2 +-
 docs/user/install.md                      |   2 +-
 zensical.toml                             |   5 +
 12 files changed, 601 insertions(+), 20 deletions(-)
 create mode 100644 docs/skills/check-sentence-by-sentence.md
 create mode 100644 docs/skills/figure-comparison.md
 create mode 100644 docs/skills/lc-from-paper.md
 create mode 100644 docs/skills/narrative.md
 create mode 100644 docs/skills/paper-extraction.md

diff --git a/docs/skills/authoring.md b/docs/skills/authoring.md
index 00c6e6a2..d8b929fe 100644
--- a/docs/skills/authoring.md
+++ b/docs/skills/authoring.md
@@ -79,9 +79,9 @@ Spawn agents in parallel by issuing them in a single tool-use block.
 
 The `evals/` tree has fixtures (currently `evals/tasks/snae/`) and the
 runner lives at `lightcone.eval.harness`. Eval CLI commands are defined
-in `lightcone.eval.cli` (`lc eval run|report|compare`), but **note that
-this group is currently not wired into the top-level `lc` CLI** — see
-the [maintainer summary](../index.md) for status. To run evals
+in `lightcone.eval.cli` and registered as `lc eval run|report|compare`
+when the optional `eval` extra is installed (the registration is
+gated on `ImportError` in `lightcone.cli.commands`). To run evals
 programmatically:
 
 ```python
diff --git a/docs/skills/check-sentence-by-sentence.md b/docs/skills/check-sentence-by-sentence.md
new file mode 100644
index 00000000..9117faaf
--- /dev/null
+++ b/docs/skills/check-sentence-by-sentence.md
@@ -0,0 +1,116 @@
+# /check-sentence-by-sentence
+
+Sentence-by-sentence audit of a paper against an ASTRA project's code.
+For every claim about implementation or results in the methodology,
+results, discussion, and appendices, locate the corresponding code
+(`file:line`) or mark `NOT FOUND`. The agent does **not** run any
+code — this is a static reading audit.
+
+Source: [`claude/lightcone/skills/check-sentence-by-sentence/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md).
+
+Argument hint: `[path to paper source, e.g. work/reference/source/main.tex or work/reference/document.md]`.
+
+## Allowed tools
+
+```
+Read, Glob, Grep,
+Bash(ls:*), Bash(wc:*), Bash(grep:*), Bash(find:*),
+AskUserQuestion, Agent
+```
+
+Read-only over both the paper source and the project code. No
+execution.
+
+## Setup
+
+1. **Confirm project root** — `astra.yaml` in cwd, or ask the user to
+   `cd` to the ASTRA project.
+2. **Confirm paper source.** Resolve in order:
+   - A `.tex` argument → `tex` mode.
+   - A directory argument → look for `<dir>/source/` (TeX), then
+     `<dir>/document.md` (markdown).
+   - No argument → prefer the lc-from-paper layout:
+     `work/reference/source/<main>.tex` (Path A) or
+     `work/reference/document.md` (Path B, Docling/Pandoc fallback).
+   - Legacy `.tex` locations in cwd as a last resort.
+
+Don't audit PDFs directly — if only `work/reference/paper.pdf` exists,
+ask the user to run paper extraction first.
+
+## Section enumeration
+
+The main agent walks the source carefully to enumerate sections.
+
+- **`tex` mode** — build an ordered audit source list by following
+  local `\input{...}` / `\include{...}` from the main TeX file (one
+  level deep). For each file, `grep -n` for `^\\section`,
+  `^\\subsection`, and `^\\appendix`. Many arXiv papers keep prose
+  outside the main wrapper, so the included files carry most audit
+  units.
+- **`markdown` mode** — `grep -n` for `^#`, `^##`, etc. in
+  `document.md`. Heading depth maps to TeX section/subsection.
+
+Audit-relevant sections: methodology, results, discussion,
+appendices. Skip abstract, introduction, acknowledgements,
+references, author lists.
+
+Each leaf (sub)section becomes one parallel sub-agent dispatch — a
+section with subsections spawns one sub-agent per subsection plus
+optionally one for any pre-subsection prose span. Spawn them in a
+single message so they run in parallel.
+
+## Per-sub-agent output
+
+Each sub-agent reads its assigned line range, splits into sentences,
+keeps the claim-bearing ones, and returns:
+
+```
+[
+  {"quote": "...", "location": "scripts/foo.py:142", "note": "..."},
+  {"quote": "...", "location": "NOT FOUND", "note": "..."},
+  ...
+]
+```
+
+`note` is optional, under 10 words, used for nuance like "approximate
+match", "different constant", "value computed at runtime".
+
+## Aggregation: two filtering passes
+
+Sub-agents are deliberately generous about what they keep. The main
+agent then:
+
+1. **Drops non-computational sentences** — framing / motivation
+   ("the first step is..."), pure prose that doesn't correspond to
+   anything you'd expect in code.
+2. **Merges duplicates** — when the same claim is asserted in multiple
+   places, collapse to a single entry pointing at the canonical
+   location.
+
+The final report is paper-order: methodology → results → discussion →
+appendices, with each entry's `quote`, `location`, and `note`.
+
+## Hard rules
+
+- **No execution.** Numerical results can be located at the line that
+  computes them, but agreement isn't verifiable here. Use a note like
+  "value computed at runtime".
+- **Quote verbatim**, trimmed to one sentence. Long sentences may keep
+  just the claim-bearing clause.
+- **`file:line` is specific** — the function call, parameter
+  assignment, or computed value, not just a file.
+- **Read only the assigned line range** in each sub-agent.
+
+## When to invoke
+
+- From `/lc-from-paper`'s REVIEW close-out (opt-in).
+- Standalone, any time, to spot-check fidelity claim by claim.
+
+## Related
+
+- [`/lc-from-paper`](lc-from-paper.md) — invokes
+  `/check-sentence-by-sentence` during REVIEW (opt-in).
+- [`/figure-comparison`](figure-comparison.md) — the other REVIEW
+  close-out, artifact-vs-artifact rather than paper-vs-code.
+- [`/paper-extraction`](paper-extraction.md) — produces the paper
+  substrate this skill reads.
diff --git a/docs/skills/figure-comparison.md b/docs/skills/figure-comparison.md
new file mode 100644
index 00000000..a54254af
--- /dev/null
+++ b/docs/skills/figure-comparison.md
@@ -0,0 +1,87 @@
+# /figure-comparison
+
+Build a self-contained HTML report (`.lightcone/comparison.html`) that
+places paper reference artifacts on the left and reproduced artifacts
+on the right, with red flags wherever a counterpart is missing. Images
+are base64-embedded so the HTML is portable. Run from a project folder
+containing `astra.yaml`.
+
+Source: [`claude/lightcone/skills/figure-comparison/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/figure-comparison/SKILL.md).
+
+Argument hint: `[path to paper reference dir, e.g. work/reference/]`.
+
+## Allowed tools
+
+```
+Read, Write, Glob, Grep,
+Bash(ls:*), Bash(wc:*), Bash(grep:*), Bash(find:*), Bash(file:*),
+Bash(python3:*), Bash(python:*), Bash(base64:*),
+AskUserQuestion, Agent
+```
+
+Read-only over the build artifacts. The skill never invokes the
+pipeline itself — if `results/<universe>/` is empty, it tells the user
+to run `lc run` first and stops.
+
+## Setup
+
+1. **Confirm project root.** Reads `astra.yaml` in the cwd. If missing,
+   asks the user to `cd` to the ASTRA project.
+2. **Confirm results exist.** Default universe is `baseline`, unless
+   `comparison-report.yaml` names another universe or the user
+   supplied one. Checks `ls results/<universe>/`.
+3. **Locate the paper reference substrate.** In order: a path passed as
+   an argument, then `work/reference/` from lc-from-paper's layout
+   (`source/` for arXiv TeX, `document.md` for the Docling fallback,
+   plus extracted `figures/` and `tables/`). Legacy locations are
+   tried only after lc-from-paper paths fail.
+
+## Scope resolution
+
+The skill picks its target set in priority order:
+
+1. **`comparison-report.yaml`** — the highest-priority scope when
+   lc-from-paper has run COMPARE. Records exactly what to compare,
+   including `type`, `priority`, paper/reproduced values, file paths,
+   and match status.
+2. **`targets/targets.md`** — the SPECIFY-phase scope ledger, used
+   when COMPARE hasn't run yet.
+3. **Default paper-driven flow** — when neither scope file exists,
+   builds a best-effort report from `astra.yaml`'s narrative and
+   findings plus `work/reference/`.
+
+## Output
+
+A single `.lightcone/comparison.html` with paper artifacts on the left
+and reproduced artifacts on the right. Helper scripts and intermediate
+manifests also live under `.lightcone/` so they don't pollute the
+baseline results.
+
+The HTML embeds figure images as base64 — portable to email, shared
+drives, or Slack without breaking links.
+
+## When to invoke
+
+- From `/lc-from-paper`'s REVIEW close-out (mandatory).
+- Standalone, any time after `lc run` succeeds, to see how the
+  reproduction stacks up against the paper.
+
+## Hard rules
+
+- **Read-only over build artifacts.** Never run the pipeline; if
+  outputs are missing, stop and ask the user to build first.
+- **Don't compare directly against a whole PDF.** When only
+  `work/reference/paper.pdf` exists, ask the user to run paper
+  extraction first.
+- **Preserve scope ordering.** `comparison-report.yaml` wins over
+  `targets/targets.md` wins over the default flow.
+
+## Related
+
+- [`/lc-from-paper`](lc-from-paper.md) — invokes `/figure-comparison`
+  during REVIEW (mandatory).
+- [`/paper-extraction`](paper-extraction.md) — produces the
+  `work/reference/` substrate this skill reads.
+- [`/check-sentence-by-sentence`](check-sentence-by-sentence.md) —
+  the other REVIEW close-out, paper-vs-code rather than
+  artifact-vs-artifact.
diff --git a/docs/skills/index.md b/docs/skills/index.md
index f900428a..e3b25da3 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -11,15 +11,34 @@ guide is the friendly version. This page is for maintainers.
 ## Available skills
 
 The `/lc-from-*` family is parallel by what you start from: a question,
-code, or a paper.
+code, or a paper. `/lc-from-paper` is the entry point of a five-skill
+paper-reproduction bundle; the four bundle siblings stand alone and are
+user-invokable directly.
+
+### Project lifecycle
 
 | Skill | Command | Purpose |
 |-------|---------|---------|
 | [lc-new](lc-new.md) | `/lc-new` | Scope a research question into an `astra.yaml`, with optional literature extraction. |
 | [lc-from-code](lc-from-code.md) | `/lc-from-code` | Wrap an existing codebase in ASTRA: scan, generate spec, parameterize, run. |
-| lc-from-paper | `/lc-from-paper` | Reproduce a published paper in ASTRA — interview-first orchestrator, multi-session loop. (See the paper-reproduction bundle in [`claude/lightcone/skills/README.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md) for the full bundle map.) |
+| [lc-from-paper](lc-from-paper.md) | `/lc-from-paper` | Reproduce a published paper in ASTRA — interview-first orchestrator that spawns named per-phase sub-agents. |
 | [lc-feedback](lc-feedback.md) | `/lc-feedback` | File a GitHub issue against the right Lightcone repo with auto-collected context. |
 
+### Paper-reproduction bundle (sibling skills)
+
+Co-located with `lc-from-paper` so a single `lc init` brings the full
+toolkit. Each stands alone and is user-invokable; `lc-from-paper`
+dispatches them by role during the reproduction.
+
+| Skill | Command | Purpose |
+|-------|---------|---------|
+| [paper-extraction](paper-extraction.md) | `/paper-extraction` | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: substrate, figures, tables, citations (with resolved DOIs), and a stub `astra.yaml`. |
+| [narrative](narrative.md) | `/narrative` | Author the `narrative:` prose and decision `rationale:` against an existing `astra.yaml`, in paper-reproduction, retrofit, or co-drafting mode. |
+| [figure-comparison](figure-comparison.md) | `/figure-comparison` | Build a self-contained HTML side-by-side: paper figures, tables, and numerics vs reproduced artifacts. |
+| [check-sentence-by-sentence](check-sentence-by-sentence.md) | `/check-sentence-by-sentence` | Static audit of paper claims against code locations (`file:line` or `NOT FOUND`). |
+
+See the [bundle README](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md) for the rationale behind co-location vs plugin install.
+
 ## How a skill is wired
 
 Each skill is a `claude/lightcone/skills/<name>/SKILL.md` file with
@@ -46,12 +65,15 @@ files, anti-patterns. The skill bundles its own helper scripts under
 ```
 claude/lightcone/
 ├── skills/
-│   ├── lc-new/SKILL.md
+│   ├── lc-new/{SKILL.md, references/*.md}
 │   ├── lc-from-code/SKILL.md
-│   ├── lc-from-paper/{SKILL.md, references/*.md}
+│   ├── lc-from-paper/{SKILL.md, references/*.md, templates/CLAUDE.md}
 │   ├── lc-feedback/SKILL.md
-│   └── …                              # paper-reproduction bundle siblings
-├── agents/lc-extractor.md             # subagent definition
+│   ├── paper-extraction/{SKILL.md, scripts/*.py}
+│   ├── narrative/{SKILL.md, references/*.md}
+│   ├── figure-comparison/{SKILL.md, scripts/*.py}
+│   └── check-sentence-by-sentence/SKILL.md
+├── agents/lc-extractor.md             # literature subagent for /lc-new
 ├── guides/                            # reference docs loaded by skills
 ├── templates/CLAUDE.md                # the project CLAUDE.md template
 └── scripts/*.sh                       # session lifecycle hooks
diff --git a/docs/skills/lc-feedback.md b/docs/skills/lc-feedback.md
index 3db3e3de..b0b55de5 100644
--- a/docs/skills/lc-feedback.md
+++ b/docs/skills/lc-feedback.md
@@ -74,10 +74,3 @@ Sections that don't apply are dropped.
 - Trim aggressively — only the relevant portion of errors.
 - No sensitive data — strip absolute paths, credentials, tokens.
 - Don't editorialize — report what happened.
-
-## Notes for the maintainer who's looking
-
-The triage hint in the prompt currently says "lightcone-cli — `lc` CLI,
-**Dagster execution**, recipes, container builds, scaffolding, skills."
-That's stale — the Dagster mention should be replaced with
-"Snakemake/Dask execution." See the `SKILL.md` source.
diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
new file mode 100644
index 00000000..f9031361
--- /dev/null
+++ b/docs/skills/lc-from-paper.md
@@ -0,0 +1,118 @@
+# /lc-from-paper
+
+Reproduce a published scientific paper as a complete ASTRA project. The
+skill is an **orchestrator**: it opens with an interactive interview,
+drafts a per-paper `CLAUDE.md`, then runs as a persistent session that
+spawns named per-phase sub-agents the user can drop into directly.
+
+`/lc-from-paper` is the entry point of the paper-reproduction bundle.
+The four sibling skills ([`paper-extraction`](paper-extraction.md),
+[`narrative`](narrative.md), [`figure-comparison`](figure-comparison.md),
+[`check-sentence-by-sentence`](check-sentence-by-sentence.md)) are
+co-located in the same plugin and invoked by role across the phases.
+
+Source: [`claude/lightcone/skills/lc-from-paper/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-from-paper/SKILL.md).
+
+## Architecture
+
+The orchestrator never absorbs paper or code content directly — it
+spawns sub-agents and reads what they return. Each sub-agent gets its
+own context window, runs one phase, commits its work to git, and exits.
+The orchestrator holds the through-line: user intent, what's been done,
+what's next, how rigorously to spawn the next phase.
+
+Two persistent sub-agents — `paper-expert` and `code-expert` — are
+spawned during ACQUIRE and stay alive for the rest of the reproduction.
+Later phases query them via `SendMessage` instead of re-ingesting
+materials.
+
+**The user can interact with any sub-agent directly.** When the
+orchestrator spawns one, it appears as a chat surface (typically at the
+bottom of the screen). The user switches in for turn-by-turn dialogue,
+switches back out, and the sub-agent stays addressable.
+
+## Phases
+
+Nine phases, zero-indexed. Phases 0, 1, and 8 run in the orchestrator
+session; phases 2–7 are sub-agent dispatches.
+
+| # | Phase | Where | Primary outputs |
+|---|-------|-------|------------------|
+| 0 | INTERVIEW | orchestrator | per-paper `CLAUDE.md` |
+| 1 | ACQUIRE | orchestrator | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-index.md}`; `paper-expert` and `code-expert` sub-agents |
+| 2 | ARCHITECT | sub-agent | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative) |
+| 3 | SPECIFY | sub-agent | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `universes/baseline.yaml` |
+| 4 | LITERATURE | sub-agent | `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
+| 5 | IMPLEMENT | sub-agent | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
+| 6 | RUN | sub-agent | `results/<universe>/<output>/` |
+| 7 | COMPARE | sub-agent | `comparison-report.{yaml,md}` plus an opportunity assessment graded against the user's fidelity intent |
+| 8 | REVIEW | orchestrator | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
+
+ACQUIRE runs in the orchestrator session because its work is two
+parallel sub-skill invocations (`/paper-extraction` and `/lc-from-code`
+in scan-only mode) plus capturing the resulting persistent sub-agents.
+INTERVIEW and REVIEW run there because both are interactive bookends.
+
+## Per-paper `CLAUDE.md`
+
+Drafted during INTERVIEW. The reproduction workdir holds a single
+`CLAUDE.md` that sub-agents and future orchestrator sessions walk up to
+automatically. Sections:
+
+- **Paper identity** — DOI, arXiv ID, title, authors, one-line subject;
+  where the original code lives.
+- **Goal** — the user's **fidelity intent** as prose: their own answer
+  to "when is this good enough." Read on every spawn decision.
+- **Rigor** — *Current state* per output or phase (*sketch / baseline /
+  tightened / canonical*) plus *open opportunities*. Updated by
+  sub-agents as they work.
+- **Disagreements** — paper-vs-code disagreements logged as found.
+  Code is canonical for numerics; both options are preserved as
+  decision options in `astra.yaml`.
+- **Rules** — code-as-canonical,
+  never-block-on-`AskUserQuestion`-mid-sub-agent, arxiv-LaTeX-first
+  acquisition, `astra validate --verify-evidence` as the fidelity gate.
+
+Pointers, not snapshots.
+
+## Disciplines
+
+- **Workdir is the state.** File existence + `git log` + `astra
+  validate` answer "what phase am I on" deterministically. No separate
+  state machine.
+- **Code-as-canonical, with disagreements recorded.** Where paper and
+  code disagree on something material, code wins for numerics but the
+  disagreement is preserved as a decision option and noted in
+  CLAUDE.md.
+- **Rigor is a trajectory toward the user's intent.** Sub-agent
+  fresh-context self-review is sized per spawn from the gap between
+  *Current state* and the Goal's fidelity intent — cheap (skip or one
+  pass) vs heavy (iterate until two consecutive clean rounds, cap 5).
+- **arxiv-LaTeX-first acquisition.** PDF + Docling is only the
+  fallback for papers that are not on arXiv.
+- **No synthetic data.** Unless the paper itself uses synthetic data,
+  every input must be real.
+
+## Anti-patterns
+
+- Reading content the orchestrator doesn't need. If the answer fits in
+  a sub-agent's return, don't re-read the source.
+- Doing phase work in the orchestrator session. Exceptions are
+  INTERVIEW, ACQUIRE, and REVIEW.
+- Asking a sub-agent to use `AskUserQuestion` — they don't have it.
+- Re-implementing what `astra` already does (`astra validate`, `astra
+  paper add`).
+- Forgetting to announce the spawn — the user needs to know a sub-agent
+  has launched and that they can switch into its chat.
+
+## Related
+
+- [Bundle README](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md)
+  — why the bundle is co-located rather than a separate plugin install.
+- [`/paper-extraction`](paper-extraction.md) — ACQUIRE's primary
+  acquisition path.
+- [`/narrative`](narrative.md) — SPECIFY's prose authoring.
+- [`/figure-comparison`](figure-comparison.md) — REVIEW (mandatory) and
+  also user-invokable.
+- [`/check-sentence-by-sentence`](check-sentence-by-sentence.md) —
+  REVIEW (opt-in) and also user-invokable.
diff --git a/docs/skills/narrative.md b/docs/skills/narrative.md
new file mode 100644
index 00000000..4d659ab6
--- /dev/null
+++ b/docs/skills/narrative.md
@@ -0,0 +1,108 @@
+# /narrative
+
+Author the reader-facing prose in an `astra.yaml`: analysis-level
+`narrative:` blocks (`summary`, `inputs`, `methods`, `findings`,
+`outputs`), decision `rationale:` fields, and shorter `description:` /
+`notes:` on individual entities. Always written against an existing
+spec — the structure must exist when the prose lands.
+
+Source: [`claude/lightcone/skills/narrative/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/narrative/SKILL.md).
+
+## Modes
+
+The skill draws on the spec plus a **second source**. Three modes,
+distinguished by what that second source is:
+
+| Mode | Second source | Status |
+|---|---|---|
+| **Paper reproduction** | An authoritative text (paper, thesis, technical report) | Ready |
+| **Retrofit** | Project artifacts — code, notebooks, fibers, commit history | Stub |
+| **Co-drafting** | The user, in conversation | Stub |
+
+If the second source isn't obvious, the skill asks. Hybrid is allowed
+(reproduction with co-drafted extensions; retrofit with co-drafted
+gap-filling).
+
+`/narrative` is invoked by `/lc-from-paper` during SPECIFY
+(paper-reproduction mode), and is also user-invokable directly in any mode.
+
+## Allowed surfaces
+
+The five-key analysis narrative:
+
+| Key | What it carries | Required when |
+|---|---|---|
+| `summary` | Question, scope, headline shape | optional, but should always exist |
+| `inputs` | Provenance — the data the analysis rests on | `Analysis.inputs` non-empty |
+| `methods` | Pipeline walk; cite each decision and sub-analysis by anchor | `Analysis.decisions` or `Analysis.analyses` non-empty |
+| `findings` | Synthesis of declared findings, each cited by anchor | `Analysis.findings` non-empty |
+| `outputs` | Which artifacts were promoted, and where they go downstream | `Analysis.outputs` non-empty |
+
+A decision's `rationale:` is its own one-paragraph slot: what was
+decided, the insight that motivated it (cite by anchor), and the
+load-bearing alternative and why it lost. Per-entity prose
+(`description`, `notes`) is shorter and lives on individual entries.
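+
+As a sketch of that slot, assuming a decision is a map entry with an
+`options:` map (suggested by the `#decisions.<id>.options.<opt>` anchor
+form, but not spelled out on this page) and using hypothetical ids
+throughout:
+
+```yaml
+# Illustrative only: ids, option names, and wording are hypothetical.
+decisions:
+  scale_cut:
+    options:
+      conservative: {}   # option bodies omitted
+      aggressive: {}
+    rationale: >
+      We adopt the [conservative cut](#decisions.scale_cut.options.conservative)
+      because [baryonic feedback biases small scales](#prior_insights.baryon_bias).
+      The [aggressive cut](#decisions.scale_cut.options.aggressive) would
+      use more data, but it lost because it leans on scales the model
+      does not describe.
+```
+
+The paragraph names the choice, the motivating insight, and the one
+alternative that mattered, in that order.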
+
+## Anchors
+
+Markdown link syntax with `#`-target, **tree-path-first** — same
+grammar as decision `from:` references.
+
+| Target | Anchor |
+|---|---|
+| Input | `#inputs.<id>` |
+| Output | `#outputs.<id>` |
+| Decision | `#decisions.<id>` |
+| Option | `#decisions.<id>.options.<opt>` |
+| Finding | `#findings.<id>` |
+| Prior insight | `#prior_insights.<id>` |
+| Sub-analysis | `#analyses.<sub>` |
+| Element inside a sub-analysis | `#<sub>.<category>.<id>` |
+| Parent scope from a sub-analysis | `#../decisions.<id>` |
+
+Anchor text is **authored prose**, never the raw id. One reference per
+idea — stacking three on a sentence means the sentence carries too
+much.
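+
+A minimal sketch of anchored narrative prose, assuming the five keys
+sit directly under `narrative:` as block scalars; every id below is
+hypothetical:
+
+```yaml
+# Illustrative only: ids are hypothetical, not taken from a real analysis.
+narrative:
+  methods: >
+    Shapes come from the [public shear catalogue](#inputs.shear_catalog).
+    A [scale cut](#decisions.scale_cut) removes the smallest separations
+    before the [two-point estimator](#analyses.two_point) runs.
+  findings: >
+    The headline result is a [constraint on S_8](#findings.s8_constraint).
+```
+
+Each link text is prose a reader can follow on its own; none repeats
+the raw id, and no sentence carries more than one reference.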
+
+## Length and modularity
+
+1–3 paragraphs per key, at any level. Length is the mechanism that
+keeps analyses modular: **if references don't fit in three paragraphs,
+the analysis is too big — split it.** The narrative is a compressor;
+if it won't compress, split the thing being compressed.
+
+## Validation
+
+```sh
+astra validate astra.yaml
+```
+
+- **Broken references** → error.
+- **Uncited declared elements** → warning. Every declared finding,
+  decision, output, and sub-analysis must be cited somewhere in the
+  narrative tree.
+- **Conditional coverage** (required-when rules above) → error.
+
+## Anti-patterns
+
+- **Wiki-style what-is framing.** A wiki summarizes; an ASTRA narrative
+  points into reasoning.
+- **Decision-list paragraph.** "We made the following decisions: A, B,
+  C." Cite each where it shapes the pipeline.
+- **`summary` as primer.** Teaching what the field is. Readers arrive
+  with context.
+- **Drafting `findings` on a sub-analysis with no declared findings.**
+  Skip the key.
+- **Narrative-per-element.** The five-key analysis narrative is the
+  only home; per-element prose is `description` / `rationale` /
+  `notes`.
+
+Mode-specific anti-patterns live in each mode's reference under
+`claude/lightcone/skills/narrative/references/`.
+
+## Related
+
+- [`/lc-from-paper`](lc-from-paper.md) — invokes `/narrative` during
+  SPECIFY in paper-reproduction mode.
+- [`claude/lightcone/guides/astra-reference.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/astra-reference.md)
+  — full schema reference.
diff --git a/docs/skills/paper-extraction.md b/docs/skills/paper-extraction.md
new file mode 100644
index 00000000..c1b68542
--- /dev/null
+++ b/docs/skills/paper-extraction.md
@@ -0,0 +1,132 @@
+# /paper-extraction
+
+Turn an arXiv ID or DOI into a standardized, indexed `work/reference/`
+directory: substrate (arXiv LaTeX source preferred, PDF + Docling
+fallback), copied figures, per-table `.tex` files, a section outline
+with line numbers, deduplicated citation keys with resolved DOIs, the
+abstract, and a stub `astra.yaml` treating the paper as an ASTRA
+artifact.
+
+Source: [`claude/lightcone/skills/paper-extraction/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/paper-extraction/SKILL.md).
+
+Argument hint: `<arxiv-id-or-doi>` — invoked as `/paper-extraction
+2503.19441` or `/paper-extraction 10.48550/arXiv.2503.19441`.
+
+## Allowed tools
+
+```
+Read, Write, Edit, Bash, Grep, Glob, WebFetch, WebSearch
+```
+
+The deterministic structural work is done by
+`scripts/extract-paper-substrate.py`; the agent runs it, then works
+through its warnings and (optionally) fills `findings:`.
+
+## Outputs
+
+Under `work/reference/` (idempotent — re-runs skip what's already done):
+
+```
+work/reference/
+├── index.json                # structural index — figures, tables, outline, citations (with DOIs), paths
+├── astra.yaml                # semantic — the paper as an ASTRA artifact (findings populated in Step 5)
+├── paper.pdf                 # always
+├── paper.tex                 # Path A — symlink to the main .tex file
+│   (or)
+├── document.md               # Path B — Docling-extracted markdown
+├── source/                   # Path A — extracted arXiv tarball
+├── figures/                  # copied figure files
+├── tables/                   # one .tex file per `\begin{table}` block (Path A)
+├── bibliography-source.bib   # Path A — copied from source
+├── bibliography-source.bbl   # Path A — copied from source
+└── .doi-cache.json           # Crossref/ADS lookup cache for idempotency
+```
+
+The skill produces only the paper's own reading materials. Code
+repositories and supplementary datasets are out of scope; the caller
+handles those.
+
+## Two surfaces
+
+**`index.json` is structural and machine-friendly.** Everything the
+script mechanically extracts: figures, tables, section outline with
+line numbers, citation keys (with every location *plus* the cited
+paper's full citation text and resolved DOI), abstract, paths. Read
+this when you want "what's in this paper, where do I find it." DOI
+resolution covers ~96% of a typical paper's bibliography.
+
+**`astra.yaml` is semantic and ASTRA-validating.** Treats the paper as
+an ASTRA artifact: `id`, `name`, `narrative.summary`, and `findings:`
+carrying the paper's claimed numerical results in the Insight +
+Evidence shape. The verbosity of the shape *is* the back-pressure
+against hallucinated claims — the agent has to find and quote actual
+text.
+
+## Workflow
+
+1. **Survey** — `ls work/reference/`; read `index.json` if present.
+   Skip work already done.
+2. **Acquire substrate** — Path A (arXiv → LaTeX source) or Path B
+   (journal-only DOI → PDF + Docling).
+3. **Run the extraction script** — `extract-paper-substrate.py` does
+   the deterministic structural pass: figure copying, per-table
+   `.tex` extraction, outline, citation resolution, `astra.yaml`
+   stub.
+4. **Review warnings and fix structural gaps** — unresolved figures,
+   missing captions, unresolved citation DOIs, Path B caveats.
+5. **(Optional) Walk the paper for findings** — append the paper's
+   central numerical claims to `astra.yaml`'s `findings:` map with
+   verbatim `quote.exact` evidence. Skip unless a downstream consumer
+   needs it.
+
+Path A is preferred whenever the paper is on arXiv — equations,
+ligatures, captions, and tables come through clean. Path B is for
+papers that are not on arXiv.
+
+## Citation DOI resolution
+
+The resolver tries, in order: the entry's `doi:` field → an
+`eprint:`-derived arXiv DOI → Crossref bibliographic query (free, no
+API key) → ADS title search (only if `ADS_API_TOKEN` env var or
+`~/.ads/dev_key` is present — graceful skip otherwise). Title hits
+from Crossref are gated by a similarity check against the queried
+title.
+
+## Findings as Insight + Evidence
+
+When Step 5 runs, each finding carries `claim:` plus verbatim
+`quote.exact` anchored to the paper's DOI:
+
+```yaml
+findings:
+  s8_constraint:
+    claim: "S_8 = sigma_8 (Omega_m / 0.3)^0.5 = 0.795 ± 0.014 ..."
+    created_at: "2026-04-04T00:00:00Z"
+    evidence:
+      - doi: "10.48550/arXiv.2604.03227"
+        version: 1
+        quote:
+          exact: "we find $S_8 = 0.795 \\pm 0.014$"
+```
+
+`astra validate --verify-evidence` searches for `quote.exact` in the
+cached PDF — paraphrasing breaks the gate.
+
+## Discipline
+
+- **Quote verbatim.** Copy LaTeX as it appears in `paper.tex`. Don't
+  paraphrase, expand macros, or normalize math.
+- **Every evidence carries `doi:` and `version:`** (the arXiv version,
+  e.g. `1`, `2`).
+- **Read abstract and conclusions first.** Most central findings sit
+  in one of those two surfaces.
+- **Re-runs are safe.** The script preserves agent edits to
+  `astra.yaml` once the stub exists.
+
+## Related
+
+- [`/lc-from-paper`](lc-from-paper.md) — invokes `/paper-extraction`
+  during ACQUIRE; the resulting `paper-expert` sub-agent reads
+  `index.json` and the substrate.
+- [`claude/lightcone/guides/astra-reference.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/astra-reference.md)
+  — Insight + Evidence shape, `quote.exact` rules.
diff --git a/docs/user/getting-started.md b/docs/user/getting-started.md
index b4c5cfa1..63032827 100644
--- a/docs/user/getting-started.md
+++ b/docs/user/getting-started.md
@@ -70,7 +70,7 @@ bug reports without leaving the session.
 | `/lc-from-paper` | You have a published paper (DOI / arXiv ID) you want to reproduce. |
 | `/lc-feedback` | Something broke and you want to file a GitHub issue without leaving the session. |
 
-The next page, [The Claude Code Workflow](claude-workflow.md),
+The next page, [The Agent Workflow](agent-workflow.md),
 explains each of these in more detail.
 
 ## 5. The four CLI commands you'll actually type
@@ -91,7 +91,7 @@ exact flags.
 
 ## 6. Read on
 
-- [The Claude Code Workflow](claude-workflow.md) — how each slash
+- [The Agent Workflow](agent-workflow.md) — how each slash
   command actually flows.
 - [Tutorial: Your First Analysis](tutorial.md) — end-to-end, with the
   agent doing most of the typing.
diff --git a/docs/user/index.md b/docs/user/index.md
index c1c73705..4988c051 100644
--- a/docs/user/index.md
+++ b/docs/user/index.md
@@ -15,7 +15,7 @@ implementation; you stay in charge of the scientific choices.
   machine.
 - [Getting Started](getting-started.md) — your first `lc init` and
   what every directory means.
-- [The Claude Code Workflow](claude-workflow.md) — `/lc-new`,
+- [The Agent Workflow](agent-workflow.md) — `/lc-new`,
   `/lc-from-code`, `/lc-from-paper`, and `/lc-feedback` — what each
   one does and when to reach for it.
 - [Tutorial: Your First Analysis](tutorial.md) — an end-to-end worked
diff --git a/docs/user/install.md b/docs/user/install.md
index 7b06f33d..93dedb9a 100644
--- a/docs/user/install.md
+++ b/docs/user/install.md
@@ -80,7 +80,7 @@ claude
 
 Inside Claude Code you'll type slash commands like `/lc-new`,
 `/lc-from-code`, and `/lc-from-paper` — see
-[The Claude Code Workflow](claude-workflow.md).
+[The Agent Workflow](agent-workflow.md).
 
 ## 5. (Optional) Docker or Podman
 
diff --git a/zensical.toml b/zensical.toml
index c99589c1..f09d0639 100644
--- a/zensical.toml
+++ b/zensical.toml
@@ -48,7 +48,12 @@ nav = [
       {"Overview" = "skills/index.md"},
       {"lc-new" = "skills/lc-new.md"},
       {"lc-from-code" = "skills/lc-from-code.md"},
+      {"lc-from-paper" = "skills/lc-from-paper.md"},
       {"lc-feedback" = "skills/lc-feedback.md"},
+      {"paper-extraction" = "skills/paper-extraction.md"},
+      {"narrative" = "skills/narrative.md"},
+      {"figure-comparison" = "skills/figure-comparison.md"},
+      {"check-sentence-by-sentence" = "skills/check-sentence-by-sentence.md"},
       {"Authoring Skills" = "skills/authoring.md"},
     ]},
     {"Contributing" = [

From e1e6fec99c20c1949517df85884a93cd96694a0e Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 03:09:01 +0200
Subject: [PATCH 047/124] README: fix two stale claims; mirror the same fix in
 architecture.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- The `extraction_model:` config option doesn't exist. The lc-extractor
  agent hard-codes `model: sonnet` in its frontmatter; the global
  config only carries `container.runtime`. Drop the misleading
  paragraph from the README and the parenthetical from architecture.md
  (which had already softened it to "historically").

- The /lc-from-paper phase sub-agents listed `acquire`, but ACQUIRE
  runs in the orchestrator session itself — its work is two parallel
  sub-skill invocations that produce the persistent `paper-expert`
  and `code-expert` experts. The actual dispatchable phase sub-agents
  are architect, specify, literature, implement, run, compare. Update
  the description to match the SKILL.md and surface the two persistent
  experts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 README.md            | 4 +---
 docs/architecture.md | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 6c05aafd..489a1e2a 100644
--- a/README.md
+++ b/README.md
@@ -42,7 +42,7 @@ Scans an existing codebase, drafts an `astra.yaml` that captures its inputs, out
 
 ### `/lc-from-paper` — Reproduce a published paper
 
-Interview-first orchestrator for reproducing a published paper in ASTRA. Drafts a per-paper `CLAUDE.md`, then runs as a persistent orchestrator session that spawns named per-phase sub-agents (acquire, architect, specify, literature, implement, run, compare) the user can drop into directly. The two bookends — INTERVIEW and REVIEW — run in the orchestrator session itself. Composes a bundle of sibling skills (paper-extraction, narrative, figure-comparison, check-sentence-by-sentence). See [`claude/lightcone/skills/README.md`](claude/lightcone/skills/README.md) for the full bundle map.
+Interview-first orchestrator for reproducing a published paper in ASTRA. Drafts a per-paper `CLAUDE.md`, then runs as a persistent orchestrator session that spawns named per-phase sub-agents (architect, specify, literature, implement, run, compare) the user can drop into directly. The bookends — INTERVIEW, ACQUIRE, and REVIEW — run in the orchestrator session itself; ACQUIRE spawns two persistent expert sub-agents (`paper-expert`, `code-expert`) that downstream phases consult via `SendMessage` instead of re-ingesting materials. Composes a bundle of sibling skills (paper-extraction, narrative, figure-comparison, check-sentence-by-sentence). See [`claude/lightcone/skills/README.md`](claude/lightcone/skills/README.md) for the full bundle map.
 
 ### `/lc-feedback` — Report a bug
 
@@ -58,8 +58,6 @@ Once `astra.yaml` exists, the agent reads `.claude/guides/lightcone-cli-referenc
 
 The first `lc` invocation auto-creates `~/.lightcone/config.yaml` with `container.runtime: auto`. To pin a runtime or change other settings, edit the file directly.
 
-**Extraction model:** Literature extraction subagents default to Sonnet. To change this, set `extraction_model:` in `~/.lightcone/config.yaml` (options: `sonnet`, `haiku`, or omit for inherit).
-
 ### Project scaffolding
 
 ```bash
diff --git a/docs/architecture.md b/docs/architecture.md
index 6122f3d1..d1858130 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -272,7 +272,7 @@ warnings.
 | `.lightcone/Snakefile` | Project (generated) | Auto-generated by `lc run`. Don't edit. |
 | `.lightcone/snakefile-config.json` | Project (generated) | Per-`(rule, universe)` config. |
 | `.lightcone/lightcone.yaml` | Project | Tiny scratchpad — currently writes only `target: local`. Not consumed by today's code. |
-| `~/.lightcone/config.yaml` | User | `container.runtime` (and historically `extraction_model`). |
+| `~/.lightcone/config.yaml` | User | `container.runtime`. |
 | `.claude/settings.json` | Project | Claude Code permissions. |
 
 The `dagster.yaml` and `~/.lightcone/targets/*.yaml` files referenced in

From 78bd863d6f35dffb50d1895006981dc4ff0077b5 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 04:07:54 +0200
Subject: [PATCH 048/124] ralph-loops + constitution: re-add the iteration
 substrate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Round 7 of the lc-from-paper rewrite reverses Round 6's "named per-phase
sub-agent" architecture back to a ralph loop, since sub-agents can't
spawn sub-agents — which broke the per-phase fresh-context review pattern
that's load-bearing for fidelity. Re-adding the two skills that the
reversal needs:

- ralph-loops/scripts/ralph: spec-file-driven loop runner. tmux-detached;
  each iteration starts a fresh claude/codex session with the
  constitution as system prompt; termination by an iteration flipping
  YAML frontmatter `status:` to `closed`.

- ralph-loops/SKILL.md: lean iteration discipline (Survey → Work → Update
  → Exit), absorbing improvements from the cailmdaley/felt original —
  "earn the vantage point" (exit before half-full context), Monitor for
  long-running jobs (no sleep loops), "if you made changes you may not
  close this iteration" (closing is a separate decision).

- ralph-loops/assets/spec.md: generic constitution template (the
  constitution skill points at this; lc-from-paper has its own
  paper-shaped one).

- constitution/SKILL.md + references: the authoring discipline
  (pointers-not-snapshots, desired-state framing, reshape-don't-accrete,
  two-diamonds + six-stances). Trimmed felt-only mechanics; both skills
  now stand alone inside a project's .claude/skills/ with no external
  dependency.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/constitution/SKILL.md | 119 ++++++++++++
 .../constitution/references/constitution.md   | 139 ++++++++++++++
 .../constitution/references/crafting.md       | 181 ++++++++++++++++++
 claude/lightcone/skills/ralph-loops/SKILL.md  | 106 ++++++++++
 .../skills/ralph-loops/assets/spec.md         |  29 +++
 .../skills/ralph-loops/scripts/ralph          | 145 ++++++++++++++
 6 files changed, 719 insertions(+)
 create mode 100644 claude/lightcone/skills/constitution/SKILL.md
 create mode 100644 claude/lightcone/skills/constitution/references/constitution.md
 create mode 100644 claude/lightcone/skills/constitution/references/crafting.md
 create mode 100644 claude/lightcone/skills/ralph-loops/SKILL.md
 create mode 100644 claude/lightcone/skills/ralph-loops/assets/spec.md
 create mode 100755 claude/lightcone/skills/ralph-loops/scripts/ralph

diff --git a/claude/lightcone/skills/constitution/SKILL.md b/claude/lightcone/skills/constitution/SKILL.md
new file mode 100644
index 00000000..e76eb422
--- /dev/null
+++ b/claude/lightcone/skills/constitution/SKILL.md
@@ -0,0 +1,119 @@
+---
+name: constitution
+description: >
+  Draft a constitution — a markdown document describing a desired state
+  for autonomous iteration. Study the problem space, shape the
+  constitution interactively (two-diamonds rhythm; six stances on
+  demand), then hand it to a runner — `/ralph-loops` for a tmux loop,
+  or any other iteration-runner. Use for any work where adaptation
+  matters more than a fixed plan: science, refactoring, exploration,
+  creative work, research narratives.
+  Triggers: "constitution", "constitute", "draft a constitution",
+  "ralph spec", "set up a ralph", "write a spec for autonomous iteration".
+---
+
+# Constitution
+
+A constitution is a design document with trust built in. Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It's designed to outlast any single agent or iteration and remain valid as the world changes around it. A good constitution never says "50 files remain" because that's a snapshot that goes stale; it says "check `grep -r 'old_pattern'`" because that's a principle that stays true until the work is done.
+
+Constitutions don't prescribe steps. They describe what the system looks like when it's right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what's highest value. Each iteration of the work does this with fresh context.
+
+This matters most in science and exploratory work, where each decision is informed by the result just before it. A plan assumes you know the path; a constitution trusts the agent to find it — with taste, judgment, and fresh eyes each time.
+
+**Separation of context: if you craft, you never do the work yourself.**
+
+## Workflow
+
+1. **Study** — Read relevant files, understand existing patterns. This informs the *constitution*, not implementation. The goal is pointers that iterations will follow.
+
+2. **Draft** — Create a markdown file for the constitution. The bundled template lives in the sibling `ralph-loops` skill:
+   ```bash
+   cp .claude/skills/ralph-loops/assets/spec.md my-constitution.md
+   ```
+   Some workflows author the constitution at a specific path so a runner picks it up (e.g. `/lc-from-paper` writes `constitution.md` at the reproduction workdir root). Fill in what you can; don't wait until it's perfect.
+
+3. **Refine** — Show the draft, get feedback, revise. Use AskUserQuestion for structured choices. The two-diamonds rhythm and six stances in [`references/crafting.md`](references/crafting.md) help most when the user is deciding something non-trivial. Apply the qualitative ambiguity self-check before launching.
+
+4. **Launch** — When approved, hand the constitution to whichever runner is appropriate. Common options:
+
+   - **`/ralph-loops`** — bundled tmux loop runner. Re-spawns iterations against the constitution until an iteration flips `status:` to `closed` after a cold survey.
+     ```bash
+     .claude/skills/ralph-loops/scripts/ralph my-constitution.md [--backend claude|codex] [-- extra-flags...]
+     ```
+     Add `-- --chrome` for visual / frontend work. Attach: `tmux attach -t ralph-<dir>-<basename>`.
+   - **Other dispatchers** — anything that reads a markdown spec or fiber and spawns iterations. Their configuration is owned outside this skill.
+
+   The constitution stays editable while iteration runs; successive iterations re-read it each cycle, so refinements between iterations are normal.
+
+## What goes in a constitution
+
+A constitution needs enough structure that an iteration landing cold can orient itself, and enough freedom that it can adapt. Common sections — use what fits, skip what doesn't, add what's missing:
+
+```markdown
+## Desired State
+What the system looks like when it's done. Invariants, quality bar,
+done-conditions. Fence the scope — what to aim for AND what to leave alone.
+
+## Context
+File paths, existing patterns, architectural constraints. Things iterations
+need to *find* but not *achieve*.
+
+## Skills
+Which skills to activate before working.
+
+## Evidence
+How to check progress — commands, test suites, grep patterns. Pointers to
+the ground truth that iterations measure themselves against.
+
+## Open Questions
+Uncertainties the user should weigh in on. Iterations add to this; the user
+resolves between loops.
+```
+
+For deeper reference on each section's voice and the discipline that keeps a constitution from drifting into a plan, see [`references/constitution.md`](references/constitution.md).
+
+## Principles
+
+**Constitution, not plan.** Say what the system looks like when it's right. Never describe the current state — anything that becomes false or irrelevant as work progresses doesn't belong. If a section would be outdated after one iteration, it's a snapshot — replace it with a pointer.
+
+**Pointers, not snapshots.** "Check `grep -r 'old_pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done.
+
+**Reshape, don't accrete.** When the desired state evolves — testing surfaces a gap, a meeting changes the priority, a sibling decision lands — rewrite the affected sections so the body still reads as today's desired state. Don't tack on a "Round 2" section; don't add an "Amendments" appendix; don't keep the old framing alongside the new one as a sediment. A green-field constitution will change a lot as it matures. The chronology lives in the runner's history surface (commits, sibling notes); the body lives in *now*.
+
+**Prefer existing systems.** Before designing anything new: can what's there handle this?
+
+**Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies.
+
+**Scope is a gift.** A clear fence — "only rename, don't refactor" — saves iterations from well-intentioned drift. Explicit scope frees the agent to work confidently within it.
+
+## Constitutions that shape artifacts
+
+Some constitutions don't build code — they shape artifacts like documentation, dashboards, or research narratives. These have different rhythms:
+
+- **The desired state is comprehension, not correctness.** "A reviewer can follow the narrative cold" is harder to test than "all tests pass" — but it's the right bar. Evidence for progress: fewer redundant plots, clearer prose, more natural flow.
+- **The artifact continues to grow.** Unlike a refactoring (which finishes), a research narrative keeps acquiring nodes. The constitution shapes how growth presents itself, not when growth stops.
+
+## Anti-patterns
+
+**Checklists.** "1. Add X, 2. Add Y" — iterations race through without judgment.
+
+**Vague done.** "Make it better" — when does iteration stop?
+
+**Over-specification.** Prescribing *how* instead of *what*. Trust the agent's taste.
+
+**Snapshot language.** "Currently 50 files" — will be wrong after one iteration.
+
+**Decision logs in the body.** "Resolved choices" / "Process notes" sections turn the constitution into a process journal. When a question gets answered, fold the answer into the narrative where it's contextually relevant — into Invariants, Desired State, Context — and let the runner's history surface (commits, sibling notes) carry the chronology.
+
+**Amendment scaffolding.** "Round 2", "v2 deltas", "Updates 2026-05-04 →". The same failure as a decision log, played out across edits: the body becomes a sediment of layered framings instead of the current desired state. When the desired state shifts, *reshape* the affected sections — rewrite headings, update prose, drop what no longer applies — so the document still reads as one coherent description of now.
+
+---
+
+## References
+
+- [`references/constitution.md`](references/constitution.md) — depth on drafting voice, sections, and the crafting workflow.
+- [`references/crafting.md`](references/crafting.md) — two-diamonds
+  rhythm, six stances, the funnel ledger, and the qualitative ambiguity
+  self-check. Use this when the conversation has careful-thinking
+  character — not every constitution drafting needs it, but the ones that
+  do are the ones that benefit most.
diff --git a/claude/lightcone/skills/constitution/references/constitution.md b/claude/lightcone/skills/constitution/references/constitution.md
new file mode 100644
index 00000000..89f7542d
--- /dev/null
+++ b/claude/lightcone/skills/constitution/references/constitution.md
@@ -0,0 +1,139 @@
+# Constitution — depth reference
+
+Drafting a constitution. The SKILL body covers the procedural backbone (Study → Draft → Refine → Launch). This reference goes deeper on voice, sections, and the discipline that keeps a constitution from sliding into a plan.
+
+The constitution itself is just a markdown file with YAML frontmatter that a runner reads on each iteration. The bundled runner is `ralph-loops` (`scripts/ralph`); other dispatchers can read the same markdown shape. The runner is interchangeable; the constitution is what matters.
+
+---
+
+## What a constitution is
+
+A constitution is a design document with trust built in. Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It is designed to outlast any single iteration and remain valid as the world changes around it.
+
+**A good constitution never says "50 files remain"** — that is a snapshot that goes stale. It says `check "grep -r 'old_pattern'"` — that is a principle that stays true until the work is done.
+
+Constitutions do not prescribe steps. They describe what the system looks like when it is right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what is highest value. Each iteration of the work does this with fresh context.
+
+**Constitution, not plan.** Plans assume you know the path; constitutions trust the agent to find it — with taste, judgment, and fresh eyes each time. This matters most in science and exploratory work, where each decision is informed by the result just before it.
+
+**Separation of context: if you craft, you never do the work yourself.** The constitution is designed by one role; iterations are run by another.
+
+---
+
+## When to write a constitution
+
+- Work where adaptation matters more than a fixed plan: scientific investigation, exploratory refactoring, creative writing.
+- The desired state is clear (or can be made clear) but the path is not.
+- Iterations need to re-read with fresh context and make judgment calls.
+- A checklist would either be wrong after one step or race through without judgment.
+
+Don't write a constitution for: clearly-scoped atomic tasks, anything where a checklist or a plan is genuinely the right shape.
+
+---
+
+## Workflow (deeper)
+
+### 1. Study
+
+Read relevant files, understand existing patterns. This informs the **constitution**, not implementation — the goal is pointers that iterations will follow, not a head start on the work.
+
+### 2. Draft
+
+Create the constitution file from the bundled template:
+
+```bash
+cp .claude/skills/ralph-loops/assets/spec.md my-constitution.md
+```
+
+Or write the constitution at a path the runner expects (e.g. `/lc-from-paper` writes `constitution.md` at the reproduction workdir root).
+
+Use the crafting process from [`crafting.md`](crafting.md):
+
+- **Wonder → Ontology:** what IS the desired state? Name it precisely.
+- **Design → Delivery:** what sections does this constitution need? Which are pointers vs snapshots?
+
+Stances that help most during constitution drafting:
+
+- **Ontologist** for naming the desired state ("what IS 'done' here?")
+- **Simplifier** for fencing scope ("what are we explicitly leaving alone?")
+- **Contrarian** for pressure-testing whether the whole framing is right
+- **Architect** when the constitution is about refactoring structure
+
+### 3. Refine
+
+Show the draft, get feedback, revise. Use AskUserQuestion for structured choices. Apply the qualitative ambiguity self-check from `crafting.md` — goal, constraints, success — before launching.
+
+Repeat until it feels solid. It does not have to be complete; open questions belong in the Open Questions section.
+
+### 4. Launch
+
+When approved, hand to a runner. Bundled option: `.claude/skills/ralph-loops/scripts/ralph my-constitution.md`. The runner re-reads the constitution each iteration, so refinements between iterations are normal.
+
+---
+
+## Constitutional sections
+
+A constitution needs enough structure that an iteration landing cold can orient itself, and enough freedom that it can adapt. Common sections — use what fits, skip what does not, add what is missing:
+
+```markdown
+## Desired State
+What the system looks like when it is done. Invariants, quality bar,
+done-conditions. Fence the scope — what to aim for AND what to leave alone.
+
+## Context
+File paths, existing patterns, architectural constraints. Things iterations
+need to *find* but not *achieve*.
+
+## Skills
+Which skills to activate before working.
+
+## Evidence
+How to check progress — commands, test suites, grep patterns. Pointers to
+ground truth that iterations measure themselves against.
+
+## Open Questions
+Uncertainties the user should weigh in on. Iterations add to this; the user
+resolves between loops.
+```
+
+---
+
+## Principles (deeper)
+
+**Pointers, not snapshots.** `check "grep -r 'old_pattern'"` not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done.
+
+**Reshape, don't accrete.** When the desired state evolves — testing surfaces a gap, a meeting changes the priority, a sibling decision lands — rewrite the affected sections so the body still reads as today's desired state. Don't tack on a "Round 2" section; don't add an "Amendments" appendix; don't keep the old framing alongside the new one as a sediment. A green-field constitution will change a lot as it matures, and a mature one will keep changing as reality does. The chronology lives in the runner's history surface (commits, sibling notes); the body lives in *now*.
+
+**Prefer existing systems.** Before designing anything new: can what is there handle this?
+
+**Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies.
+
+**Scope is a gift.** A clear fence — "only rename, don't refactor" — saves iterations from well-intentioned drift. Explicit scope frees the agent to work confidently within it.
+
+---
+
+## Constitutions that shape artifacts
+
+Some constitutions do not build code — they shape artifacts like documentation or research narratives. These have different rhythms:
+
+- **The desired state is comprehension, not correctness.** "A reviewer can follow the narrative cold" is harder to test than "all tests pass" — but it is the right bar. Evidence for progress: fewer redundant plots, clearer prose, more natural flow.
+- **The artifact continues to grow.** Unlike a refactoring (which finishes), a research narrative keeps acquiring nodes. The constitution shapes how growth presents itself, not when growth stops.
+
+---
+
+## Anti-patterns
+
+- **Checklists.** "1. Add X, 2. Add Y" — iterations race through without judgment.
+- **Vague done.** "Make it better" — when does iteration stop? What would a reader see?
+- **Over-specification.** Prescribing *how* instead of *what*. Trust the agent's taste.
+- **Snapshot language.** "Currently 50 files" — will be wrong after one iteration.
+- **Immutable seed.** Not our shape. The constitution is meant to be edited between iterations; do not treat it as frozen.
+- **Numerical convergence.** "Iteration stops when similarity ≥ 0.95" — wrong shape for science. Stop when the Evidence section says the desired state has been reached.
+- **Decision logs in the body.** "Resolved choices" / "Decisions made" / "Process notes" sections turn the constitution into a process journal. When a question gets answered (in conversation, via `AskUserQuestion`, in a review), fold the answer into the narrative where it is contextually relevant — into Invariants, Desired State, Context — and let the runner's history surface (commits, sibling notes) carry the chronology. The constitution describes *what is*, not *how we got here*; an "Open Questions" section that has been fully resolved should be deleted, not left as a victory log.
+- **Amendment scaffolding.** "Round 2", "v2 deltas", "Updates 2026-05-04 →", "Second round amendments". The same failure as a decision log, played out across edits: the body becomes a sediment of layered framings instead of the current desired state. When the desired state shifts, *reshape* the affected sections — rewrite headings, update prose, drop what no longer applies — so the document still reads as one coherent description of now. The story of how it got here is what commits and the outcome blurb are for.
+
+---
+
+## When crafting lands here
+
+The crafting rhythm in [`crafting.md`](crafting.md) applies to all careful interactive thinking; this reference kicks in when the target artifact is specifically a constitution. The diamonds do most of the work — the funnel mechanic used for open-ended exploration is not the primary move here, because there is already one specific artifact being produced. See the Workflow section above for which stances help most at each drafting phase.
diff --git a/claude/lightcone/skills/constitution/references/crafting.md b/claude/lightcone/skills/constitution/references/crafting.md
new file mode 100644
index 00000000..9bc44cc0
--- /dev/null
+++ b/claude/lightcone/skills/constitution/references/crafting.md
@@ -0,0 +1,181 @@
+# Crafting
+
+How to help the user think through something that hasn't crystallized, and turn the result into structured commitments — fields on an `astra.yaml` if you're inside an analysis (decisions with excluded options, evidence pointers, scoped findings), or inline structure in the constitution itself.
+
+Use it when the user is deciding something non-trivial, scoping a sub-analysis, drafting a living spec, or talking through an open question — any time careful interactive thinking is happening and the output can land in structured form.
+
+The rhythm is two diamonds: first understand what the thing IS, then decide what to DO about it. Each diamond diverges to explore and converges to commit. The ontological question — *what IS this, really?* — is the convergence point of the first diamond, and it is the most practical question you can ask.
+
+```
+    ◇ Wonder              ◇ Design
+   ╱  (diverge)          ╱  (diverge)
+  ╱    surface          ╱    alternatives
+ ╱     questions       ╱     trade-offs
+●─────────────────────●─────────────────────●
+ ╲                     ╲
+  ╲    crystallize      ╲    commit
+   ╲   the name          ╲   with reasons
+    ◇  (converge)         ◇  (converge)
+    Ontology              Delivery
+```
+
+Diamond 1 diverges into questions and converges on a name (*"this IS a decision about covariance estimation"*). Diamond 2 diverges into alternatives and converges on a commit (a default with `excluded_reason` for each rejection). The second diamond inherits the ontological commit from the first.
+
+---
+
+## The two diamonds
+
+### Diamond 1: Wonder → Ontology
+
+**Wonder (diverge).** What are we actually trying to figure out? Surface questions, assumptions, ambiguities. Do not propose answers yet. If the user is already pitching solutions, back them up to the question.
+
+**Ontology (converge).** What IS this, really? Crystallize into a claim, decision, or question specific enough to act on. The convergence is complete when you can **name** the thing precisely — "this is a decision about covariance estimation" or "this is a question about whether leakage matters below ℓ=100." A good name is often the entire output of Diamond 1.
+
+**Output of Diamond 1:** a stub with a real name and at least one structural placeholder — a decision label, an insight claim, or input/output IDs. Not a full block — just the hook that identifies what kind of thing this is.
+
+### Diamond 2: Design → Delivery
+
+**Design (diverge).** What are the real alternatives? For each, what would make it right or wrong? Trade-offs, excluded options, edge cases. This is where the Contrarian and Simplifier stances are most useful.
+
+**Delivery (converge).** Commit to a default, write the `excluded_reason` for each rejected option, identify inputs and outputs, stage the evidence. The structure is now formalizable.
+
+**Output of Diamond 2:** structured fields populated — `decisions` with options and default, `inputs`/`outputs` with IDs and types, `insights` with claim and evidence (in `astra.yaml` or in the constitution itself).
+
+The two diamonds are sequential but the boundary is soft. If you find yourself naming alternatives before the thing is clear, back up to the ontology convergence point. If you converge too early on "this is a decision" when it is actually a question, the Design phase will feel forced — that is the cue to re-enter Wonder.
+
+---
+
+## Stances
+
+Six lightweight lenses for when the conversation needs pressure. **Default is no stance** — straight conversation. Invoke a stance when pressure would help, announce it in one sentence, drop it when it has done its work. Do not stack or pipeline them.
+
+### Socratic — *"What are you assuming?"*
+
+Question-only. Never proposes answers. Surfaces the assumptions under the user's framing.
+
+- What are you assuming is true that might not be?
+- What would make option A right vs option B? What is the actual fork?
+- If you had to write the `excluded_reason` for the option you are about to reject, what would it say?
+
+**Use in Wonder and early Design.** When the user is about to commit to a path and you want the reasons made explicit.
+
+### Ontologist — *"What IS this, really?"*
+
+Pushes on definition before mechanism. Four questions:
+
+1. **Essence** — what is the true nature, stripping away accidental properties?
+2. **Root cause or symptom** — is this the fundamental issue or a surface effect?
+3. **Prerequisites** — what must exist first for this even to make sense?
+4. **Hidden assumptions** — what implicit beliefs is the framing resting on?
+
+**Use at the Ontology convergence point.** When a word is doing heavy lifting and may mean different things in different sentences.
+
+### Contrarian — *"What if the opposite were true?"*
+
+Challenges premises, not details.
+
+- What if the choice does not actually matter for your signal?
+- What if the constraint you are designing around is not real?
+- What if the simplest version is already good enough?
+
+**Use in Design.** When the conversation is burning effort on a distinction that may not matter, or a third option (do nothing, use the default) is being ignored.
+
+### Simplifier — *"Is this complexity earning its keep?"*
+
+YAGNI, concrete first, data over code.
+
+- What can we remove without losing the core value?
+- What is the simplest version that would work?
+- Can a data structure replace this logic?
+
+**Use in Design and early Delivery.** When the design is drifting toward over-engineering or a feature list is growing without anchoring reasons.
+
+### Researcher — *"What do we actually know?"*
+
+Evidence before interpretation. Especially useful for scientific work where a claim needs to be defensible.
+
+- What does the actual source say, not what we remember?
+- What would count as evidence here? What would falsify the claim?
+- What is the most specific claim we can make with the data in hand?
+
+**Use in Delivery.** When an insight needs a defensible claim, or when the user is about to write an outcome that is stronger than the evidence supports.
+
+### Architect — *"If we started over, would we build it this way?"*
+
+Structural root cause. The question behind the question when friction keeps recurring.
+
+- Is the same problem showing up in different forms?
+- Which abstraction does not match reality?
+- What assumption was wrong from the start?
+
+**Use when a debate keeps returning.** The user is circling a decision they have already made three times and cannot stick to — the real question is probably structural, not tactical.
+
+---
+
+## The funnel
+
+When the conversation is exploratory — no single topic, things are accumulating — keep a private running ledger of what is falling out, classified by destination:
+
+| Item kind | What it looks like | Destination |
+|-----------|--------------------|-------------|
+| **Decision** | A choice between real alternatives | `decisions` block in `astra.yaml` / spec |
+| **Finding** | A claim with at least the start of evidence | `findings` block in `astra.yaml` / spec |
+| **Sub-analysis** | "Compute X from Y" with identifiable inputs/outputs | New `astra.yaml` sub-analysis |
+| **Question** | An open thread worth tracking, not yet answered | "Open Questions" section of the constitution |
+| **CLAUDE.md change** | A pattern or gotcha that belongs in project memory | Edit CLAUDE.md |
+
+The ledger is your own working memory. **Do not surface it mid-conversation** unless the user asks or a flush cue fires.
+
+**Flush cues:**
+
+- User says "OK we should write this down" or similar
+- Three or more items have accumulated and the topic is about to shift
+- A natural pause after a decision or finding lands
+
+On flush, present the ledger grouped by destination, then file with the user's assent. If the user declines an item, discard it without argument.
+
+---
+
+## Qualitative ambiguity self-check
+
+Before committing to a path — filing a decision, launching an iteration loop, sealing an outcome — check three things qualitatively. **No scoring, no thresholds.** If any feels fuzzy, resolve it with AskUserQuestion.
+
+1. **Goal.** Is what the user wants specific enough that two competent people would build the same thing from it? If not, what would pin it down?
+2. **Constraints.** Are the limits named? What cannot change, what must be preserved, what would break everything? Missing constraints tend to show up as "oh wait, we also need…" after the commit.
+3. **Success.** How will we know it is done or right? What is the evidence condition? Qualitative is fine ("a reviewer can follow the narrative cold"), but it has to be checkable.
+
+When one is fuzzy, use AskUserQuestion with concrete options rather than open prose questions. Iterate until the answer is "yeah, that's it." **Stop when the fuzziness resolves, not when a score crosses a threshold.** Scores on qualitative priors add false precision; the honest signal is whether the user knows what they want.
+
+This is a mirror, not a gate. If the user wants to file anyway with one dimension still fuzzy, file it — the fuzziness itself can live in an Open Questions section, and future iterations can refine it.
+
+---
+
+## Mapping outputs to structure
+
+What comes out of the diamonds maps onto wherever you keep structured commitments:
+
+| Diamond output | Destination |
+|----------------|-------------|
+| Wonder questions left open | "Open Questions" section in the constitution |
+| Ontology convergence — "this IS a decision about X" | A `decisions.<key>.label` entry — in `astra.yaml` or in the constitution body |
+| Design alternatives with trade-offs | `decisions.<key>.options`; rejected options get `excluded_reason` |
+| Delivery — the commit | `decisions.<key>.default` |
+| Finding at end of Delivery | `findings.<key>` with `claim` + `evidence`, in `astra.yaml` or in the constitution body |
+| Sub-analysis scope | New sub-analysis in `astra.yaml` |
+| Process-level lesson that generalizes | Edit to project CLAUDE.md |
+
+The same shapes apply directly inline in `astra.yaml` or the constitution itself; no separate substrate is required.
+
+---
+
+## Anti-patterns
+
+- **Ambiguity gates.** Do not withhold help until the user clarifies N dimensions. The self-check is a mirror, not a door.
+- **Numerical scoring.** Do not introduce 0–1 clarity scores with thresholds. The underlying signal is qualitative and the number adds false precision.
+- **Stance pipelines.** Do not run Socratic → Ontologist → Contrarian in sequence. Pick one when it helps; drop it when it has.
+- **Mandatory interview.** No prepared question list. Stances are responsive to the actual conversation.
+- **Surfacing the ledger too early.** A single item is not a flush. Wait for accumulation or a pause.
+- **Immutable outputs.** Nothing filed here is locked. Everything is editable; reversals are normal.
+- **Nine-minds overload.** Six stances is already generous. Add more only when a specific gap shows up, never preemptively.
+- **Interrogation without a ceiling.** Three questions is usually enough. If the user is getting irritated, stop asking and file what you have.
+- **Converging before the name is clear.** If Diamond 2 feels forced, Diamond 1 has not finished. Back up.
diff --git a/claude/lightcone/skills/ralph-loops/SKILL.md b/claude/lightcone/skills/ralph-loops/SKILL.md
new file mode 100644
index 00000000..f59f7e1a
--- /dev/null
+++ b/claude/lightcone/skills/ralph-loops/SKILL.md
@@ -0,0 +1,106 @@
+---
+name: ralph-loops
+description: >
+  Autonomous loop iteration toward a desired state. You are inside a ralph
+  loop — your constitution is in the system prompt. Survey, contribute,
+  update state discoverably, exit. Activated automatically inside ralph
+  loops, or when launching one against an existing constitution via
+  scripts/ralph; for drafting the constitution itself, use /constitution.
+  Triggers: "ralph-loops", "launch ralph", "run ralph", "ralph loop on <constitution>".
+---
+
+# Ralph Loops
+
+The autonomous iteration loop a constitution dispatches against. The skill has two entry points, and only one applies at a time:
+
+- **Launching a loop** — outside any active loop, invoke the bundled launcher script to start an iteration sequence on a constitution file. See **Launching** below.
+- **Inside a loop** — the constitution is in the system prompt above; follow the **Loop** protocol. Ignore the Launching section; a loop is already running.
+
+## Launching
+
+The launcher is a shell script bundled with this skill. Its runtime path inside a project (after `lc init` copies the bundle) is:
+
+```
+.claude/skills/ralph-loops/scripts/ralph
+```
+
+Usage:
+
+```
+.claude/skills/ralph-loops/scripts/ralph <constitution.md> [--backend claude|codex] [-- extra-flags...]
+```
+
+- `<constitution.md>` is the constitution file. Its YAML frontmatter must carry `status: open` or `status: active`; the launcher refuses to start otherwise. The loop terminates automatically when an iteration flips `status:` to `closed` after a cold survey.
+- The launcher starts a detached tmux session and returns immediately. Attach with `tmux attach -t <session>`; the printed session name is `ralph-<dirname>-<basename>`.
+- A second launch with the same constitution detects the existing tmux session and prints the attach command instead of double-starting.
+
+### Backends
+
+- `claude` (default) — each iteration runs `claude --dangerously-skip-permissions --append-system-prompt <constitution>` with the constitution injected as the system prompt.
+- `codex` — runs `codex --dangerously-bypass-approvals-and-sandbox --config developer_instructions=<constitution>`.
+
+Set via `--backend codex` or `RALPH_BACKEND=codex`.
+
+### Extra flags
+
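+Anything after a literal `--` separator is forwarded to the backend. The launcher re-splits these arguments on whitespace, so a flag value that contains spaces will not arrive intact. Common flags for the Claude backend: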
+
+- `--chrome` — enable the Claude-in-Chrome integration for iterations that need live browser access.
+- `--model <id>` — override the backend model.
+
+### Examples
+
+```bash
+# Launch on a per-paper reproduction constitution
+.claude/skills/ralph-loops/scripts/ralph constitution.md
+
+# Codex backend
+.claude/skills/ralph-loops/scripts/ralph constitution.md --backend codex
+
+# Claude backend with Chrome integration and a model override
+.claude/skills/ralph-loops/scripts/ralph constitution.md -- --chrome --model claude-opus-4-6
+```
+
+## Loop
+
+1. **Survey** — Fresh eyes. Read the constitution and the workdir's `CLAUDE.md`. Check `git log`, glance at sub-fibers or notes the prior iteration left, look at what's actually in the workdir. You decide what to check.
+2. **Work** — Stay and work from the vantage point the survey built. Make 1–3 substantial contributions; don't try to clear the whole queue in one iteration.
+3. **Update** — Before exiting: commit your work, update `CLAUDE.md`'s accumulators (Rigor *Current state*, Paper-vs-code disagreements, open opportunities) if anything sharpened, and sharpen the constitution body if a fact stable enough to belong in *Context* or *Desired State* has landed.
+4. **Exit** — `kill $PPID`.
+
+### Earn the vantage point
+
+The survey is a fixed cost; exploit the warm world-model rather than rebuilding it next iteration. Exit when the next valuable move needs a different mental workspace — not when one task ends. If changes so far have been small and runway is plentiful, expand the workspace rather than exit.
+
+**Exit before context is half-full.** Don't wait for "filling" to feel pressing — the right moment is the next sub-task boundary after you cross half. Write the handoff (commits, `CLAUDE.md` accumulators, constitution sharpening) from full attention and exit; don't try to cram one more thing in. The marginal step you'd squeeze in costs the next iteration more than it saves you, because it pays for the degraded handoff.
+
+## Rules
+
+**State, not checklist.** The constitution describes what "done" looks like. Survey reality, decide what's highest value, work on that.
+
+**Discoverable updates.** Commits, files in the workdir, `CLAUDE.md` accumulators — not progress notes scattered in the body. The next iteration finds what changed by inspecting the system.
+
+**Pointers, not snapshots.** If you learn something stable, update the constitution's *Context* or *Desired State*. Don't leave drive-by notes in the body.
+
+**You have authority.** Trust the constitution. Don't ask permission. Make substantial contributions. Don't avoid ambitious solutions just because they span multiple iterations — the loop continues, and tweaks on the next iteration are cheap.
+
+**File uncertain decisions** somewhere the user will see them. The convention varies by project: an `open-questions.md` file the constitution points at, an `Open Questions` section in the constitution itself, a `-t question` felt fiber when felt is in use. Don't sediment them in invisible places.
+
+### Long-running jobs
+
+If an iteration kicks off computation (snakemake, cluster jobs, container builds, dev servers), use the `Monitor` tool to stream events from the background process — each stdout line surfaces as a notification, so you'll get pinged when something happens without polling-with-sleep. For one-shot "wait until done," use Bash with `run_in_background` and you'll be notified on completion. Either way, shepherd computation to completion before exiting. Don't fire-and-forget.
+
+## Exit
+
+Closing the constitution (`status: closed` in frontmatter) stops the loop — no further iterations will run. So the closing decision is reserved for a cold survey that finds nothing left to do.
+
+**If you made any changes this iteration, you may not close the constitution.** Commit, update the workdir, `kill $PPID` — let the next iteration survey with fresh eyes and decide whether to close. This is the only hard rule on exit.
+
+Making changes does NOT mean you should exit early. Keep working while the context is warm — make as many changes as belong in this iteration. The rule only constrains *closing the constitution*, not the length of the iteration. See **Earn the vantage point** above for when to actually exit.
+
+- **Made changes this iteration** → `kill $PPID` when the warm context is spent. Do not close the constitution.
+- **Survey found zero remaining work AND you made zero changes** → flip the constitution's frontmatter `status:` to `closed`, append a closing line to the body or to a sibling summary file recording what landed, then `kill $PPID`. The launcher's next check fails and the loop terminates.
+
+---
+
+Pattern adapted from [Ralph Wiggum](https://ghuntley.com/ralph/).
diff --git a/claude/lightcone/skills/ralph-loops/assets/spec.md b/claude/lightcone/skills/ralph-loops/assets/spec.md
new file mode 100644
index 00000000..e9294db3
--- /dev/null
+++ b/claude/lightcone/skills/ralph-loops/assets/spec.md
@@ -0,0 +1,29 @@
+---
+status: open
+---
+
+This is your constitution for an autonomous iteration loop — a meditative iteration toward a desired state.
+
+## Desired State
+
+[Describe what you're building and why. Someone unfamiliar with the project should understand the goal from this section alone.
+
+Be detailed about "done": the architecture, behavior, constraints, quality bar. You'll check reality against this and work to close the gap.
+
+Use pointers, not snapshots. Say "check `grep -r 'pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid.]
+
+## Context
+
+[Point to relevant files and existing patterns. When you see real implementations, you build coherently on them rather than introducing alien patterns.]
+
+## Skills
+
+[Skills to activate before working. Use `/skill-name`.]
+
+## Evidence
+
+[How to check progress — commands, test suites, grep patterns. Pointers to the ground truth that iterations measure themselves against.]
+
+## Open Questions
+
+[Uncertainties the user should weigh in on. Iterations add to this; the user resolves between loops.]
diff --git a/claude/lightcone/skills/ralph-loops/scripts/ralph b/claude/lightcone/skills/ralph-loops/scripts/ralph
new file mode 100755
index 00000000..993fb586
--- /dev/null
+++ b/claude/lightcone/skills/ralph-loops/scripts/ralph
@@ -0,0 +1,145 @@
+#!/bin/bash
+# Run a ralph loop on a constitution file.
+#
+# Loops while the constitution's YAML frontmatter `status:` is `open` or
+# `active`. Each iteration starts a fresh Claude (or Codex) session with
+# the constitution injected as the system prompt; the worker surveys,
+# works, commits, and exits via `kill $PPID`. Termination is by an
+# iteration flipping `status:` to `closed` on a cold survey.
+#
+# Usage:
+#   ralph <constitution.md> [--backend claude|codex] [-- extra-flags...]
+#
+# Default backend: claude. Override with --backend codex or RALPH_BACKEND=codex.
+
+set -e
+
+SPEC_FILE="${1:?Usage: ralph <constitution.md> [--backend claude|codex] [-- extra-flags...]}"
+shift
+
+BACKEND="${RALPH_BACKEND:-claude}"
+if [[ "$1" == "--backend" ]]; then
+    BACKEND="$2"
+    shift 2
+fi
+
+EXTRA_FLAGS=""
+if [[ "$1" == "--" ]]; then
+    shift
+    EXTRA_FLAGS="$*"
+fi
+
+# Resolve to absolute path
+SPEC_FILE="$(cd "$(dirname "$SPEC_FILE")" && pwd)/$(basename "$SPEC_FILE")"
+
+if [[ ! -f "$SPEC_FILE" ]]; then
+    echo "Constitution file not found: $SPEC_FILE"
+    exit 1
+fi
+
+SESSION="ralph-$(basename "$(dirname "$SPEC_FILE")")-$(basename "$SPEC_FILE" .md)"
+WORK_DIR="$(dirname "$SPEC_FILE")"
+
+# Anchor status check to the YAML frontmatter so body prose describing
+# this very check ("status: open|active") can't self-match.
+check_status() {
+    awk 'NR==1 && $0!="---"{exit} NR>1 && ($0=="---" || NR>50){exit} NR>1' "$SPEC_FILE" | grep -qiE '^status:[[:space:]]*(open|active)'
+}
+
+if ! check_status; then
+    echo "Constitution $SPEC_FILE must have YAML frontmatter status: open or active."
+    echo "  Fix: add"
+    echo "         ---"
+    echo "         status: active"
+    echo "         ---"
+    echo "       at the top of the file."
+    exit 1
+fi
+
+# Refuse to double-launch
+if tmux has-session -t "$SESSION" 2>/dev/null; then
+    echo "Ralph already running: $SESSION"
+    echo "  Attach: tmux attach -t $SESSION"
+    exit 0
+fi
+
+# Write loop script to temp file (avoids heredoc quoting hell)
+LOOP_SCRIPT=$(mktemp "${TMPDIR:-/tmp}/ralph-loop.XXXXXX")
+cat > "$LOOP_SCRIPT" << 'LOOP'
+#!/bin/bash
+SPEC_FILE="$1"
+WORK_DIR="$2"
+BACKEND="$3"
+EXTRA_FLAGS="$4"
+
+iteration=0
+
+check_status() {
+    head -50 "$SPEC_FILE" | awk '/^---$/ { if (++n == 2) exit; next } n == 1' | grep -qiE '^status:[[:space:]]*(open|active)'
+}
+
+while check_status; do
+    cd "$WORK_DIR"
+    iteration=$((iteration + 1))
+    echo ""
+    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+    echo "Ralph iteration $iteration — $(date '+%H:%M:%S')"
+    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+
+    SPEC_CONTENT=$(cat "$SPEC_FILE")
+
+    SYSPROMPT_FILE=$(mktemp "${TMPDIR:-/tmp}/ralph-sys.XXXXXX")
+    PROMPT_FILE=$(mktemp "${TMPDIR:-/tmp}/ralph-prompt.XXXXXX")
+
+    cat > "$SYSPROMPT_FILE" << SYSEOF
+Ralph iteration $iteration. Constitution: $SPEC_FILE
+
+$SPEC_CONTENT
+SYSEOF
+
+    cat > "$PROMPT_FILE" << 'PROMPTEOF'
+You are inside a ralph loop — meditative iteration toward a desired state. Activate the ralph-loops skill and follow its iteration protocol against the constitution above. The workdir's CLAUDE.md auto-loads; read it on entry.
+PROMPTEOF
+
+    PROMPT=$(cat "$PROMPT_FILE")
+
+    if [[ "$BACKEND" == "codex" ]]; then
+        codex --dangerously-bypass-approvals-and-sandbox \
+            --config "developer_instructions=$(cat "$SYSPROMPT_FILE")" \
+            $EXTRA_FLAGS \
+            "$PROMPT"
+    else
+        claude --dangerously-skip-permissions \
+            $EXTRA_FLAGS \
+            --append-system-prompt "$(cat "$SYSPROMPT_FILE")" \
+            <<< "$PROMPT"
+    fi
+
+    rm -f "$SYSPROMPT_FILE" "$PROMPT_FILE"
+
+    echo "--- Iteration complete ---"
+    sleep 2
+done
+
+echo ""
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "Ralph complete — $iteration iterations"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo ""
+echo "Session kept open for inspection. Type exit to close."
+exec bash -l
+LOOP
+
+chmod +x "$LOOP_SCRIPT"
+
+echo "Starting ralph on $SPEC_FILE"
+echo "  Backend:  $BACKEND"
+echo "  Work dir: $WORK_DIR"
+[[ -n "$EXTRA_FLAGS" ]] && echo "  Flags:    $EXTRA_FLAGS"
+
+# Launch tmux with a login shell running the loop script
+tmux new-session -d -s "$SESSION" -c "$WORK_DIR" \
+    bash -l "$LOOP_SCRIPT" "$SPEC_FILE" "$WORK_DIR" "$BACKEND" "$EXTRA_FLAGS"
+
+echo "  Session:  $SESSION"
+echo "  Attach:   tmux attach -t $SESSION"

From 102ca0cb61c77cf4633edd6a56b22ee77f1edef8 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 04:11:51 +0200
Subject: [PATCH 049/124] ralph: collapse constitution + ralph-loops into one
 skill
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cail's call mid-session: authoring a constitution and iterating against
it are two modes of the same activity, not two skills. felt's bundle
does it this way (one `ralph`, no separate `/constitution`); the same
shape fits here. The previous split forced a cross-skill dependency
(constitution → ralph-loops/assets/spec.md) and bifurcated the surface
for what's really one practice.

Result: one `ralph/` skill with three modes — Authoring (Study → Draft →
Refine → Launch), Launching (the script), and Loop (Survey → Work →
Update → Exit). Reference depth (`constitution.md`, `crafting.md`) lives
under `ralph/references/`. The standalone constitution skill is gone;
the assets/spec.md template is gone (the SKILL body's "What goes in a
constitution" section is the starting shape, no separate file to copy).

`/constitution`, `/constitute`, "draft a constitution" all trigger
`/ralph` via the description's keywords so existing muscle memory still
reaches it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/constitution/SKILL.md | 119 -----------
 claude/lightcone/skills/ralph-loops/SKILL.md  | 106 ----------
 .../skills/ralph-loops/assets/spec.md         |  29 ---
 claude/lightcone/skills/ralph/SKILL.md        | 195 ++++++++++++++++++
 .../references/constitution.md                |  14 +-
 .../references/crafting.md                    |   0
 .../{ralph-loops => ralph}/scripts/ralph      |   0
 7 files changed, 199 insertions(+), 264 deletions(-)
 delete mode 100644 claude/lightcone/skills/constitution/SKILL.md
 delete mode 100644 claude/lightcone/skills/ralph-loops/SKILL.md
 delete mode 100644 claude/lightcone/skills/ralph-loops/assets/spec.md
 create mode 100644 claude/lightcone/skills/ralph/SKILL.md
 rename claude/lightcone/skills/{constitution => ralph}/references/constitution.md (89%)
 rename claude/lightcone/skills/{constitution => ralph}/references/crafting.md (100%)
 rename claude/lightcone/skills/{ralph-loops => ralph}/scripts/ralph (100%)

diff --git a/claude/lightcone/skills/constitution/SKILL.md b/claude/lightcone/skills/constitution/SKILL.md
deleted file mode 100644
index e76eb422..00000000
--- a/claude/lightcone/skills/constitution/SKILL.md
+++ /dev/null
@@ -1,119 +0,0 @@
----
-name: constitution
-description: >
-  Draft a constitution — a markdown document describing a desired state
-  for autonomous iteration. Study the problem space, shape the
-  constitution interactively (two-diamonds rhythm; six stances on
-  demand), then hand it to a runner — `/ralph-loops` for a tmux loop,
-  or any other iteration-runner. Use for any work where adaptation
-  matters more than a fixed plan: science, refactoring, exploration,
-  creative work, research narratives.
-  Triggers: "constitution", "constitute", "draft a constitution",
-  "ralph spec", "set up a ralph", "write a spec for autonomous iteration".
----
-
-# Constitution
-
-A constitution is a design document with trust built in. Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It's designed to outlast any single agent or iteration and remain valid as the world changes around it. A good constitution never says "50 files remain" because that's a snapshot that goes stale; it says "check `grep -r 'old_pattern'`" because that's a principle that stays true until the work is done.
-
-Constitutions don't prescribe steps. They describe what the system looks like when it's right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what's highest value. Each iteration of the work does this with fresh context.
-
-This matters most in science and exploratory work, where each decision is informed by the result just before it. A plan assumes you know the path; a constitution trusts the agent to find it — with taste, judgment, and fresh eyes each time.
-
-**Separation of context: if you craft, you never do the work yourself.**
-
-## Workflow
-
-1. **Study** — Read relevant files, understand existing patterns. This informs the *constitution*, not implementation. The goal is pointers that iterations will follow.
-
-2. **Draft** — Create a markdown file for the constitution. The bundled template lives in the sibling `ralph-loops` skill:
-   ```bash
-   cp .claude/skills/ralph-loops/assets/spec.md my-constitution.md
-   ```
-   Some workflows author the constitution at a specific path so a runner picks it up (e.g. `/lc-from-paper` writes `constitution.md` at the reproduction workdir root). Fill in what you can; don't wait until it's perfect.
-
-3. **Refine** — Show the draft, get feedback, revise. Use AskUserQuestion for structured choices. The two-diamonds rhythm and six stances in [`references/crafting.md`](references/crafting.md) help most when the user is deciding something non-trivial. Apply the qualitative ambiguity self-check before launching.
-
-4. **Launch** — When approved, hand the constitution to whichever runner is appropriate. Common options:
-
-   - **`/ralph-loops`** — bundled tmux loop runner. Re-spawns iterations against the constitution until an iteration flips `status:` to `closed` after a cold survey.
-     ```bash
-     .claude/skills/ralph-loops/scripts/ralph my-constitution.md [--backend claude|codex] [-- extra-flags...]
-     ```
-     Add `-- --chrome` for visual / frontend work. Attach: `tmux attach -t ralph-<dir>-<basename>`.
-   - **Other dispatchers** — anything that reads a markdown spec or fiber and spawns iterations. Their configuration is owned outside this skill.
-
-   The constitution stays editable while iteration runs; successive iterations re-read it each cycle, so refinements between iterations are normal.
-
-## What goes in a constitution
-
-A constitution needs enough structure that an iteration landing cold can orient itself, and enough freedom that it can adapt. Common sections — use what fits, skip what doesn't, add what's missing:
-
-```markdown
-## Desired State
-What the system looks like when it's done. Invariants, quality bar,
-done-conditions. Fence the scope — what to aim for AND what to leave alone.
-
-## Context
-File paths, existing patterns, architectural constraints. Things iterations
-need to *find* but not *achieve*.
-
-## Skills
-Which skills to activate before working.
-
-## Evidence
-How to check progress — commands, test suites, grep patterns. Pointers to
-the ground truth that iterations measure themselves against.
-
-## Open Questions
-Uncertainties the user should weigh in on. Iterations add to this; the user
-resolves between loops.
-```
-
-For deeper reference on each section's voice and the discipline that keeps a constitution from drifting into a plan, see [`references/constitution.md`](references/constitution.md).
-
-## Principles
-
-**Constitution, not plan.** Say what the system looks like when it's right. Never describe the current state — anything that becomes false or irrelevant as work progresses doesn't belong. If a section would be outdated after one iteration, it's a snapshot — replace it with a pointer.
-
-**Pointers, not snapshots.** "Check `grep -r 'old_pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done.
-
-**Reshape, don't accrete.** When the desired state evolves — testing surfaces a gap, a meeting changes the priority, a sibling decision lands — rewrite the affected sections so the body still reads as today's desired state. Don't tack on a "Round 2" section; don't add an "Amendments" appendix; don't keep the old framing alongside the new one as a sediment. A green-field constitution will change a lot as it matures. The chronology lives in the runner's history surface (commits, sibling notes); the body lives in *now*.
-
-**Prefer existing systems.** Before designing anything new: can what's there handle this?
-
-**Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies.
-
-**Scope is a gift.** A clear fence — "only rename, don't refactor" — saves iterations from well-intentioned drift. Explicit scope frees the agent to work confidently within it.
-
-## Constitutions that shape artifacts
-
-Some constitutions don't build code — they shape artifacts like documentation, dashboards, or research narratives. These have different rhythms:
-
-- **The desired state is comprehension, not correctness.** "A reviewer can follow the narrative cold" is harder to test than "all tests pass" — but it's the right bar. Evidence for progress: fewer redundant plots, clearer prose, more natural flow.
-- **The artifact continues to grow.** Unlike a refactoring (which finishes), a research narrative keeps acquiring nodes. The constitution shapes how growth presents itself, not when growth stops.
-
-## Anti-patterns
-
-**Checklists.** "1. Add X, 2. Add Y" — iterations race through without judgment.
-
-**Vague done.** "Make it better" — when does iteration stop?
-
-**Over-specification.** Prescribing *how* instead of *what*. Trust the agent's taste.
-
-**Snapshot language.** "Currently 50 files" — will be wrong after one iteration.
-
-**Decision logs in the body.** "Resolved choices" / "Process notes" sections turn the constitution into a process journal. When a question gets answered, fold the answer into the narrative where it's contextually relevant — into Invariants, Desired State, Context — and let the runner's history surface (commits, sibling notes) carry the chronology.
-
-**Amendment scaffolding.** "Round 2", "v2 deltas", "Updates 2026-05-04 →". The same failure as a decision log, played out across edits: the body becomes a sediment of layered framings instead of the current desired state. When the desired state shifts, *reshape* the affected sections — rewrite headings, update prose, drop what no longer applies — so the document still reads as one coherent description of now.
-
----
-
-## References
-
-- [`references/constitution.md`](references/constitution.md) — depth on drafting voice, sections, and the crafting workflow.
-- [`references/crafting.md`](references/crafting.md) — two-diamonds
-  rhythm, six stances, the funnel ledger, and the qualitative ambiguity
-  self-check. Use this when the conversation has careful-thinking
-  character — not every constitution drafting needs it, but the ones that
-  do are the ones that benefit most.
diff --git a/claude/lightcone/skills/ralph-loops/SKILL.md b/claude/lightcone/skills/ralph-loops/SKILL.md
deleted file mode 100644
index f59f7e1a..00000000
--- a/claude/lightcone/skills/ralph-loops/SKILL.md
+++ /dev/null
@@ -1,106 +0,0 @@
----
-name: ralph-loops
-description: >
-  Autonomous loop iteration toward a desired state. You are inside a ralph
-  loop — your constitution is in the system prompt. Survey, contribute,
-  update state discoverably, exit. Activated automatically inside ralph
-  loops, or when launching one against an existing constitution via
-  scripts/ralph; for drafting the constitution itself, use /constitution.
-  Triggers: "ralph-loops", "launch ralph", "run ralph", "ralph loop on <constitution>".
----
-
-# Ralph Loops
-
-The autonomous iteration loop a constitution dispatches against. The skill has two entry points, and only one applies at a time:
-
-- **Launching a loop** — outside any active loop, invoke the bundled launcher script to start an iteration sequence on a constitution file. See **Launching** below.
-- **Inside a loop** — the constitution is in the system prompt above; follow the **Loop** protocol. Ignore the Launching section; a loop is already running.
-
-## Launching
-
-The launcher is a shell script bundled with this skill. Its runtime path inside a project (after `lc init` copies the bundle) is:
-
-```
-.claude/skills/ralph-loops/scripts/ralph
-```
-
-Usage:
-
-```
-.claude/skills/ralph-loops/scripts/ralph <constitution.md> [--backend claude|codex] [-- extra-flags...]
-```
-
-- `<constitution.md>` is the constitution file. Its YAML frontmatter must carry `status: open` or `status: active`; the launcher refuses to start otherwise. The loop terminates automatically when an iteration flips `status:` to `closed` after a cold survey.
-- The launcher starts a detached tmux session and returns immediately. Attach with `tmux attach -t <session>`; the printed session name is `ralph-<dirname>-<basename>`.
-- A second launch with the same constitution detects the existing tmux session and prints the attach command instead of double-starting.
-
-### Backends
-
-- `claude` (default) — each iteration runs `claude --dangerously-skip-permissions --append-system-prompt <constitution>` with the constitution injected as the system prompt.
-- `codex` — runs `codex --dangerously-bypass-approvals-and-sandbox --config developer_instructions=<constitution>`.
-
-Set via `--backend codex` or `RALPH_BACKEND=codex`.
-
-### Extra flags
-
-Anything after a literal `--` separator forwards to the backend unchanged. Common flags for the Claude backend:
-
-- `--chrome` — enable the Claude-in-Chrome integration for iterations that need live browser access.
-- `--model <id>` — override the backend model.
-
-### Examples
-
-```bash
-# Launch on a per-paper reproduction constitution
-.claude/skills/ralph-loops/scripts/ralph constitution.md
-
-# Codex backend
-.claude/skills/ralph-loops/scripts/ralph constitution.md --backend codex
-
-# Claude backend with Chrome integration and a model override
-.claude/skills/ralph-loops/scripts/ralph constitution.md -- --chrome --model claude-opus-4-6
-```
-
-## Loop
-
-1. **Survey** — Fresh eyes. Read the constitution and the workdir's `CLAUDE.md`. Check `git log`, glance at sub-fibers or notes the prior iteration left, look at what's actually in the workdir. You decide what to check.
-2. **Work** — Stay and work from the vantage point the survey built. Make 1–3 substantial contributions; don't try to clear the whole queue in one iteration.
-3. **Update** — Before exiting: commit your work, update `CLAUDE.md`'s accumulators (Rigor *Current state*, Paper-vs-code disagreements, open opportunities) if anything sharpened, sharpen the constitution body if a fact stable enough to belong in *Context* or *Desired State* landed.
-4. **Exit** — `kill $PPID`.
-
-### Earn the vantage point
-
-The survey is a fixed cost; exploit the warm world-model rather than rebuilding it next iteration. Exit when the next valuable move needs a different mental workspace — not when one task ends. If changes so far have been small and runway is plentiful, expand the workspace rather than exit.
-
-**Exit before context is half-full.** Don't wait for "filling" to feel pressing — the right moment is the next sub-task boundary after you cross half. Write the handoff (commits, `CLAUDE.md` accumulators, constitution sharpening) from full attention and exit; don't try to cram one more thing in. The marginal step you'd squeeze in costs the next iteration more than it saves you, because it pays for the degraded handoff.
-
-## Rules
-
-**State, not checklist.** The constitution describes what "done" looks like. Survey reality, decide what's highest value, work on that.
-
-**Discoverable updates.** Commits, files in the workdir, `CLAUDE.md` accumulators — not progress notes scattered in the body. The next iteration finds what changed by inspecting the system.
-
-**Pointers, not snapshots.** If you learn something stable, update the constitution's *Context* or *Desired State*. Don't leave drive-by notes in the body.
-
-**You have authority.** Trust the constitution. Don't ask permission. Make substantial contributions. Don't avoid ambitious solutions just because they span multiple iterations — the loop continues, tweaks on the next iter are cheap.
-
-**File uncertain decisions** somewhere the user will see them. The convention varies by project: an `open-questions.md` file the constitution points at, an `Open Questions` section in the constitution itself, a `-t question` felt fiber when felt is in use. Don't sediment them in invisible places.
-
-### Long-running jobs
-
-If an iteration kicks off computation (snakemake, cluster jobs, container builds, dev servers), use the `Monitor` tool to stream events from the background process — each stdout line surfaces as a notification, so you'll get pinged when something happens without polling-with-sleep. For one-shot "wait until done," use Bash with `run_in_background` and you'll be notified on completion. Either way, shepherd computation to completion before exiting. Don't fire-and-forget.
-
-## Exit
-
-Closing the constitution (`status: closed` in frontmatter) stops the loop — no further iterations will run. So the closing decision is reserved for a cold survey that finds nothing left to do.
-
-**If you made any changes this iteration, you may not close the constitution.** Commit, update the workdir, `kill $PPID` — let the next iteration survey with fresh eyes and decide whether to close. This is the only hard rule on exit.
-
-Making changes does NOT mean you should exit early. Keep working while the context is warm — make as many changes as belong in this iteration. The rule only constrains *closing the constitution*, not the length of the iteration. See **Earn the vantage point** above for when to actually exit.
-
-- **Made changes this iteration** → `kill $PPID` when the warm context is spent. Do not close the constitution.
-- **Survey found zero remaining work AND you made zero changes** → flip the constitution's frontmatter `status:` to `closed`, append a closing line to the body or to a sibling summary file recording what landed, then `kill $PPID`. The launcher's next check fails and the loop terminates.
-
----
-
-Pattern adapted from [Ralph Wiggum](https://ghuntley.com/ralph/).
diff --git a/claude/lightcone/skills/ralph-loops/assets/spec.md b/claude/lightcone/skills/ralph-loops/assets/spec.md
deleted file mode 100644
index e9294db3..00000000
--- a/claude/lightcone/skills/ralph-loops/assets/spec.md
+++ /dev/null
@@ -1,29 +0,0 @@
----
-status: open
----
-
-This is your constitution for an autonomous iteration loop — a meditative iteration toward a desired state.
-
-## Desired State
-
-[Describe what you're building and why. Someone unfamiliar with the project should understand the goal from this section alone.
-
-Be detailed about "done": the architecture, behavior, constraints, quality bar. You'll check reality against this and work to close the gap.
-
-Use pointers, not snapshots. Say "check `grep -r 'pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid.]
-
-## Context
-
-[Point to relevant files and existing patterns. When you see real implementations, you build coherently on them rather than introducing alien patterns.]
-
-## Skills
-
-[Skills to activate before working. Use `/skill-name`.]
-
-## Evidence
-
-[How to check progress — commands, test suites, grep patterns. Pointers to the ground truth that iterations measure themselves against.]
-
-## Open Questions
-
-[Uncertainties the user should weigh in on. Iterations add to this; the user resolves between loops.]
diff --git a/claude/lightcone/skills/ralph/SKILL.md b/claude/lightcone/skills/ralph/SKILL.md
new file mode 100644
index 00000000..5155c2f0
--- /dev/null
+++ b/claude/lightcone/skills/ralph/SKILL.md
@@ -0,0 +1,195 @@
+---
+name: ralph
+description: >
+  Author a constitution — a markdown document describing a desired state for
+  autonomous iteration — and run a ralph loop against it. The skill covers
+  three modes: drafting a constitution (Study → Draft → Refine → Launch),
+  launching a loop via the bundled tmux runner, and executing a single
+  iteration from inside an active loop (survey → work → update → exit).
+  Use for any work where adaptation matters more than a fixed plan: science,
+  refactoring, exploration, long-running reproductions.
+  Triggers: "ralph", "ralph loop", "constitution", "constitute", "draft a
+  constitution", "launch ralph", "run ralph on <constitution>", "set up a
+  ralph loop".
+---
+
+# Ralph
+
+Long-running iteration toward a desired state. The substrate is a **constitution** — a markdown file describing what "done" looks like. The runner is a **ralph loop** — a tmux session that spawns a fresh worker per iteration with the constitution as system prompt.
+
+Three modes; one applies at a time:
+
+- **Authoring** — drafting a constitution from scratch. See **Authoring** below.
+- **Launching** — outside any active loop, invoking the bundled script to start one on an existing constitution. See **Launching**.
+- **Inside a loop** — the constitution is in the system prompt above; follow the **Loop** protocol. Ignore the other sections; a loop is already running.
+
+**Separation of context: if you author, you do not iterate. If you iterate, you do not author.** Authoring designs the desired state from outside; iterations close the gap from inside. The constitution stays editable across iterations, but the role is set per session.
+
+---
+
+## What a constitution is
+
+A design document with trust built in. Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It is designed to outlast any single iteration and remain valid as the world changes around it. **A good constitution never says "50 files remain"** — that's a snapshot that goes stale. It says "check `grep -r 'old_pattern'`" — that's a principle that stays true until the work is done.
+
+Constitutions don't prescribe steps. They describe what the system looks like when it's right — the desired state, in both senses. Whoever works from it surveys reality, reasons about the gap, and decides what's highest value. Each iteration does this with fresh context.
+
+For deeper voice / section guidance and the discipline that keeps a constitution from sliding into a plan, see [`references/constitution.md`](references/constitution.md). For the careful-thinking rhythm that authoring usually wants (two diamonds, six stances, the funnel, the qualitative ambiguity self-check), see [`references/crafting.md`](references/crafting.md).
+
+---
+
+## Authoring
+
+1. **Study** — Read relevant files, understand existing patterns. This informs the *constitution*, not the implementation. The goal is pointers iterations will follow.
+
+2. **Draft** — Create the constitution as a markdown file. Some workflows expect it at a specific path so a runner picks it up (e.g. `/lc-from-paper` writes `constitution.md` at the reproduction workdir root); otherwise put it wherever the work lives. Give the file this YAML frontmatter:
+
+   ```yaml
+   ---
+   status: active
+   ---
+   ```
+
+   The launcher checks this field and refuses to start unless `status:` is `open` or `active`.
+
+3. **Refine** — Show the draft, get feedback, revise. Use `AskUserQuestion` for structured choices. Apply the qualitative ambiguity self-check from [`references/crafting.md`](references/crafting.md) — goal, constraints, success — before launching. Reach for the crafting rhythm and stances when the conversation has careful-thinking character; skip when it doesn't.
+
+4. **Launch** — Hand the constitution to the runner (see **Launching** below). The constitution stays editable while iterations run; each cycle re-reads it, so refinements between iterations are normal.
+
+### What goes in a constitution
+
+A constitution needs enough structure that an iteration landing cold can orient itself, and enough freedom that it can adapt. Common sections — use what fits, skip what doesn't, add what's missing:
+
+```markdown
+## Desired State
+What the system looks like when it's done. Invariants, quality bar,
+done-conditions. Fence the scope — what to aim for AND what to leave alone.
+
+## Context
+File paths, existing patterns, architectural constraints. Things iterations
+need to *find* but not *achieve*.
+
+## Skills
+Which skills to activate before working.
+
+## Evidence
+How to check progress — commands, test suites, grep patterns. Pointers to
+the ground truth that iterations measure themselves against.
+
+## Open Questions
+Uncertainties the user should weigh in on. Iterations add to this; the user
+resolves between loops.
+```
+
+### Authoring principles
+
+- **Constitution, not plan.** Say what the system looks like when it's right. Never describe the current state — anything that becomes false or irrelevant as work progresses doesn't belong. If a section would be outdated after one iteration, it's a snapshot — replace it with a pointer.
+- **Pointers, not snapshots.** "Check `grep -r 'old_pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid across iterations.
+- **Reshape, don't accrete.** When the desired state evolves, rewrite the affected sections so the body still reads as today's desired state. Don't tack on "Round 2" or an "Amendments" appendix. The chronology lives in commits and sibling notes; the body lives in *now*.
+- **Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies.
+- **Scope is a gift.** A clear fence — "only rename, don't refactor" — saves iterations from well-intentioned drift.
+
+### Authoring anti-patterns
+
+- **Checklists.** "1. Add X, 2. Add Y" — iterations race through without judgment.
+- **Vague done.** "Make it better" — when does iteration stop?
+- **Over-specification.** Prescribing *how* instead of *what*. Trust the agent's taste.
+- **Decision logs / amendment scaffolding.** "Resolved choices", "Round 2", "v2 deltas". Turns the constitution into a process journal. Fold answers into the narrative; let commits carry the chronology.
+
+---
+
+## Launching
+
+The launcher is a shell script bundled with this skill. Inside a project (after `lc init` copies the bundle), its path is:
+
+```
+.claude/skills/ralph/scripts/ralph
+```
+
+Usage:
+
+```
+.claude/skills/ralph/scripts/ralph <constitution.md> [--backend claude|codex] [-- extra-flags...]
+```
+
+- `<constitution.md>` is the constitution file. Its YAML frontmatter must carry `status: open` or `status: active`; the launcher refuses to start otherwise. The loop terminates automatically when an iteration flips `status:` to `closed`.
+- The launcher detaches into a tmux session named `ralph-<dirname>-<basename>` and returns immediately. Attach with `tmux attach -t <session>`. A second launch with the same constitution detects the existing session and prints the attach command instead of double-starting.
+
+### Backends
+
+- `claude` (default) — each iteration runs `claude --dangerously-skip-permissions --append-system-prompt <constitution>` with the constitution appended to the system prompt.
+- `codex` — runs `codex --dangerously-bypass-approvals-and-sandbox --config developer_instructions=<constitution>`.
+
+Set with `--backend codex` or `RALPH_BACKEND=codex`.
+
+### Extra flags
+
+Anything after a literal `--` separator forwards to the backend unchanged. Common Claude-backend flags:
+
+- `--chrome` — Claude-in-Chrome integration for iterations that need live browser access.
+- `--model <id>` — override the backend model.
+
+### Examples
+
+```bash
+# Launch on a per-paper reproduction constitution
+.claude/skills/ralph/scripts/ralph constitution.md
+
+# Codex backend
+.claude/skills/ralph/scripts/ralph constitution.md --backend codex
+
+# Claude backend with Chrome integration and a model override
+.claude/skills/ralph/scripts/ralph constitution.md -- --chrome --model claude-opus-4-6
+```
+
+---
+
+## Loop
+
+1. **Survey** — Fresh eyes. Read the constitution and the workdir's `CLAUDE.md`. Check `git log`, glance at sub-fibers or notes the prior iteration left, look at what's actually in the workdir.
+2. **Work** — Stay and work from the vantage point the survey built. Make 1–3 substantial contributions; don't try to clear the queue in one iteration.
+3. **Update** — Before exiting: commit your work; update `CLAUDE.md`'s accumulators (Rigor *Current state*, Paper-vs-code disagreements, open opportunities — whichever the project carries) if anything sharpened; sharpen the constitution body itself if a fact stable enough to belong in *Context* or *Desired State* landed.
+4. **Exit** — `kill $PPID`.
+
+### Earn the vantage point
+
+The survey is a fixed cost; exploit the warm world-model rather than rebuilding it next iteration. Exit when the next valuable move needs a different mental workspace — not when one task ends. If changes so far have been small and runway is plentiful, expand the workspace rather than exit.
+
+**Exit before context is half-full.** Don't wait for "filling" to feel pressing — the right moment is the next sub-task boundary after you cross half. Write the handoff (commits, accumulator updates, constitution sharpening) from full attention and exit; don't try to cram one more thing in. The marginal step you'd squeeze in costs the next iteration more than it saves you, because it pays for the degraded handoff.
+
+### Iteration rules
+
+**State, not checklist.** The constitution describes what "done" looks like. Survey reality, decide what's highest value, work on that.
+
+**Discoverable updates.** Commits, files in the workdir, `CLAUDE.md` accumulators — not progress notes scattered in the body. The next iteration finds what changed by inspecting the system.
+
+**Pointers, not snapshots.** If you learn something stable, update the constitution's *Context* or *Desired State*. Don't leave drive-by notes in the body.
+
+**You have authority.** Trust the constitution. Don't ask permission. Make substantial contributions. Don't avoid ambitious solutions just because they span multiple iterations — the loop continues; tweaks on the next iteration are cheap.
+
+**File uncertain decisions** somewhere the user will see them. The convention varies by project: an `open-questions.md` file the constitution points at, an `Open Questions` section in the constitution itself, a `-t question` felt fiber when felt is in use. Don't sediment them in invisible places.
+
+### Long-running jobs
+
+If an iteration kicks off computation (snakemake, cluster jobs, container builds, dev servers), use the `Monitor` tool to stream events from the background process — each stdout line surfaces as a notification, so you'll get pinged when something happens without polling-with-sleep. For one-shot "wait until done," use Bash with `run_in_background` and you'll be notified on completion. Either way, shepherd computation to completion before exiting. Don't fire-and-forget.
+
+### Exit
+
+Closing the constitution (`status: closed` in frontmatter) stops the loop — no further iterations will run. So the closing decision is reserved for a cold survey that finds nothing left to do.
+
+**If you made any changes this iteration, you may not close the constitution.** Commit, update the workdir, `kill $PPID` — let the next iteration survey with fresh eyes and decide whether to close. This is the only hard rule on exit.
+
+Making changes does NOT mean you should exit early. Keep working while the context is warm — make as many changes as belong in this iteration. The rule only constrains *closing the constitution*, not the length of the iteration. See **Earn the vantage point** above.
+
+- **Made changes this iteration** → `kill $PPID` when the warm context is spent. Do not close the constitution.
+- **Survey found zero remaining work AND you made zero changes** → flip the constitution's frontmatter `status:` to `closed`, append a closing summary to the body or a sibling notes file recording what landed, then `kill $PPID`. The launcher's next check fails and the loop terminates.
+
+---
+
+## References
+
+- [`references/constitution.md`](references/constitution.md) — depth on drafting voice, sections, and the discipline that keeps a constitution from drifting into a plan.
+- [`references/crafting.md`](references/crafting.md) — two-diamonds rhythm, six stances, the funnel ledger, and the qualitative ambiguity self-check. Use this when the conversation has careful-thinking character — not every authoring session needs it, but the ones that do are the ones that benefit most.
+
+---
+
+Loop pattern adapted from [Ralph Wiggum](https://ghuntley.com/ralph/).
diff --git a/claude/lightcone/skills/constitution/references/constitution.md b/claude/lightcone/skills/ralph/references/constitution.md
similarity index 89%
rename from claude/lightcone/skills/constitution/references/constitution.md
rename to claude/lightcone/skills/ralph/references/constitution.md
index 89f7542d..39eb28b5 100644
--- a/claude/lightcone/skills/constitution/references/constitution.md
+++ b/claude/lightcone/skills/ralph/references/constitution.md
@@ -1,8 +1,8 @@
 # Constitution — depth reference
 
-Drafting a constitution. The SKILL body covers the procedural backbone (Study → Draft → Refine → Launch). This reference goes deeper on voice, sections, and the discipline that keeps a constitution from sliding into a plan.
+Drafting a constitution. The SKILL body's **Authoring** section covers the procedural backbone (Study → Draft → Refine → Launch). This reference goes deeper on voice, sections, and the discipline that keeps a constitution from sliding into a plan.
 
-The constitution itself is just a markdown file with YAML frontmatter that a runner reads on each iteration. The bundled runner is `ralph-loops` (`scripts/ralph`); other dispatchers can read the same markdown shape. The runner is interchangeable; the constitution is what matters.
+The constitution itself is just a markdown file with YAML frontmatter that a runner reads on each iteration. The bundled runner is `scripts/ralph` (next to this skill); other dispatchers can read the same markdown shape. The runner is interchangeable; the constitution is what matters.
 
 ---
 
@@ -39,13 +39,7 @@ Read relevant files, understand existing patterns. This informs the **constituti
 
 ### 2. Draft
 
-Create the constitution file from the bundled template:
-
-```bash
-cp .claude/skills/ralph-loops/assets/spec.md my-constitution.md
-```
-
-Or write the constitution at a path the runner expects (e.g. `/lc-from-paper` writes `constitution.md` at the reproduction workdir root).
+Create the constitution as a markdown file with `status: active` in YAML frontmatter (that's what the launcher checks). Some workflows expect a specific path so a runner picks it up — e.g. `/lc-from-paper` writes `constitution.md` at the reproduction workdir root. Otherwise put it wherever the work lives. The section block in the SKILL's "What goes in a constitution" is your starting shape; fill what fits, drop what doesn't.
 
 Use the crafting process from [`crafting.md`](crafting.md):
 
@@ -67,7 +61,7 @@ Repeat until it feels solid. It does not have to be complete; open questions bel
 
 ### 4. Launch
 
-When approved, hand to a runner. Bundled option: `.claude/skills/ralph-loops/scripts/ralph my-constitution.md`. The runner re-reads the constitution each iteration, so refinements between iterations are normal.
+When approved, hand to a runner. Bundled option: `.claude/skills/ralph/scripts/ralph my-constitution.md`. The runner re-reads the constitution each iteration, so refinements between iterations are normal.
 
 ---
 
diff --git a/claude/lightcone/skills/constitution/references/crafting.md b/claude/lightcone/skills/ralph/references/crafting.md
similarity index 100%
rename from claude/lightcone/skills/constitution/references/crafting.md
rename to claude/lightcone/skills/ralph/references/crafting.md
diff --git a/claude/lightcone/skills/ralph-loops/scripts/ralph b/claude/lightcone/skills/ralph/scripts/ralph
similarity index 100%
rename from claude/lightcone/skills/ralph-loops/scripts/ralph
rename to claude/lightcone/skills/ralph/scripts/ralph

From d743bd17d910a954ce2264e9ead9f3b46d1b2e42 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 04:15:07 +0200
Subject: [PATCH 050/124] lc-from-paper: split per-paper template into
 constitution.md + CLAUDE.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The single CLAUDE.md template conflated two surfaces with different
update cadences: a goal substrate (Goal, Fidelity intent, Scope, Quality
bar, Evidence — sharpens slowly, only on real changes) and an
auto-loading walk-up (Rules, Rigor accumulator, Disagreements log,
Pointers — accretes every iteration). Mixing them made it harder to see
what's stable vs what's churning, and obscured which file is the ralph
loop's driving document.

Split into two templates that ship side-by-side. INTERVIEW will author
both:

- templates/constitution.md — YAML frontmatter `status: active` (the
  ralph launcher's gate), then Goal / Fidelity intent / Scope / Quality
  bar / Evidence / Open dimensions. Every ralph iteration reads this on
  entry. Mirrors the felt constitution-as-fiber-body practice.

- templates/CLAUDE.md — leaner walk-up: paper identity at the top,
  Rules, Rigor accumulator, Disagreements log, Pointers. Points at
  constitution.md as the driving document. Updated by iterations as they
  work.

Rules cleaned up alongside the split: dropped the persistent-experts
rule (gone in the new architecture; ACQUIRE produces the substrate
on-disk, not a long-lived sub-agent), no-synthetic-data lifted in from
the implement reference where it belonged everywhere, "discoverable
updates not progress notes" pulled forward as a rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../skills/lc-from-paper/templates/CLAUDE.md  | 51 ++++++++-----------
 .../lc-from-paper/templates/constitution.md   | 45 ++++++++++++++++
 2 files changed, 65 insertions(+), 31 deletions(-)
 create mode 100644 claude/lightcone/skills/lc-from-paper/templates/constitution.md

diff --git a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
index 5f2a86e6..4e8fdf1d 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
@@ -1,53 +1,42 @@
 # <paper-slug>
 
-Reproduction of <paper title> (<arXiv ID>). DOI: <doi>.
+Reproduction of **<paper title>** (<arXiv ID>). DOI: <doi>. One-line subject: <e.g. "BAO scale measurement from DESI DR1">.
 
-## Paper
+The driving document for this reproduction is [`constitution.md`](constitution.md) — Goal, Fidelity intent, Scope, Quality bar, Evidence. Every ralph iteration reads it on entry. This file (`CLAUDE.md`) is the auto-loading walk-up: rules + running accumulators.
 
-- Authors: <list>
-- One-line subject: <e.g. "BAO scale measurement from DESI DR1">
-- Code repo: <url> (cloned to `work/reference/code/` during ACQUIRE; scan inventory at `work/reference/code-index.md`)
-- Paper materials: `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml}` (from `/paper-extraction` during ACQUIRE)
-
-## Goal
-
-<What "done" looks like for this reproduction. Concrete: which targets, what verdict against them, what validation passes. E.g.: "A complete `astra.yaml` with recipes that produce reproduced versions of <list of targets>, validated by `astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml` verdict `pass` against the targets in `targets/targets.md`.">
-
-**In scope:** <targeted figures / tables / numbers, methodological span being reproduced.>
-
-**Out of scope:** <explicit exclusions, fenced from drift.>
+## Rules
 
-**Fidelity intent:** <the user's prose answer from INTERVIEW to "when is this good enough" — captured verbatim or in close paraphrase. E.g. "just checking if the analysis is tractable — quick sanity on a headline number", "Figure 3 must be right; the rest can stay rough", "full fidelity on the BAO fit, baseline elsewhere", "every primary and secondary target lining up within stated tolerance". The orchestrator translates this into per-spawn cheap/heavy decisions and COMPARE grades opportunities against it. Static once approved; the user can sharpen it at any REVIEW.>
+- **Code-as-canonical when `work/reference/code/` exists.** Every iteration that touches a sub-analysis reads the relevant code first. Where paper and code disagree, code is canonical for numerics, plotting, and method. When `work/reference/code/` is absent, paper is the only anchor — implement fresh from the spec, expect slower convergence, and surface gaps honestly to the user rather than dressing them up.
+- **Never block on `AskUserQuestion` mid-iteration.** Each ralph iteration runs in a fresh detached session; the user isn't reachable interactively. Append questions to `open-questions.md` and continue with the best-judgment default. The user resolves accumulated questions at REVIEW close-out (which runs in the user's main session).
+- **arxiv-LaTeX-first acquisition.** PDF + Docling is a fallback for non-arxiv only.
+- **`astra validate --verify-evidence`** is the fidelity gate; evidence quotes must match source PDFs.
+- **No synthetic data.** Unless the paper itself uses synthetic data as input, every input dataset must be downloaded or queried from its real source.
+- **Commit as you go.** Small, descriptive commits per significant change. The git log is the chronological trail of the reproduction; the next iteration reads it to know what landed.
+- **Updates go in code, files, and the accumulators below — not progress notes scattered in the body.** Discoverable updates; the next iteration finds what changed by inspecting the system.
 
-## Rigor
+## Rigor — current state
 
-*Current state* — orchestrator-internal trajectory tracking, updated by sub-agents as they produce artifacts. Coarse adjectives per output or per phase: *sketch / baseline / tightened / canonical*. The orchestrator reads this alongside the Goal's fidelity intent to decide cheap vs heavy on the next spawn. Empty until the first phase produces something:
+Per-output trajectory tracking, updated by iterations as they produce artifacts. Coarse adjectives per output or per phase: *sketch / baseline / tightened / canonical*. Read alongside [`constitution.md`](constitution.md)'s Fidelity intent to decide how much to push on the next iteration. Empty until the first iteration produces something:
 
 - (none yet)
 
-*Open opportunities* — gaps that could be tightened if the user comes back, each carrying a sense of leverage and where it sits relative to the Goal's fidelity intent. Format: `<area> — <what could be tightened> — <leverage> — <above|at|below intent>`. Empty until a sub-agent surfaces one:
+### Open opportunities
+
+Gaps that could be tightened if the user returns to the reproduction. Each carries a sense of leverage and where it sits relative to the constitution's Fidelity intent. Format: `<area> — <what could be tightened> — <leverage> — <above|at|below intent>`. Empty until a COMPARE iteration surfaces one:
 
 - (none yet)
 
 ## Paper-vs-code disagreements
 
-Material disagreements between paper and code, logged here as sub-agents find them. Code is canonical for numerics, plotting, and method (per the discipline below); both options are preserved in `astra.yaml` as decision alternatives. Each entry summarizes the disagreement and points to the corresponding decision so any sub-agent or future orchestrator session can see them at a glance. Surfaced to the user the next time they're around.
+Material disagreements between paper and code, logged here as iterations find them. Code is canonical for numerics, plotting, and method (per the rule above); both options are preserved in `astra.yaml` as decision alternatives. Each entry summarizes the disagreement and points to the corresponding decision so any iteration can see them at a glance. Surfaced to the user at REVIEW close-out (or earlier if they're around).
 
 - (none yet)
 
-## Rules
-
-- **Code-as-canonical when `work/reference/code/` exists.** Every implementing sub-agent reads relevant code on entry. Where paper and code disagree, code is canonical for numerics, plotting, and method. When `work/reference/code/` is absent, paper is the only anchor — implement fresh from the spec, expect slower convergence, and surface gaps honestly to the user rather than dressing them up.
-- **Persistent experts during a session.** ACQUIRE spawns `paper-expert` (knows the paper) and `code-expert` (knows the cloned code, when present) as named sub-agents that stay alive for the reproduction. Downstream sub-agents receive their agent IDs at spawn and consult them via `SendMessage` instead of re-ingesting paper / code materials from scratch. The expert IDs are session-scoped — they don't persist across orchestrator sessions, so if the orchestrator session restarts, ACQUIRE re-spawns them against the existing on-disk substrate.
-- **Never block on `AskUserQuestion` mid-sub-agent.** Sub-agents don't have `AskUserQuestion`. Ask in prose if the user is reachable; otherwise append the question to `open-questions.md` and continue with the best-judgment default. The user resolves accumulated questions in REVIEW.
-- **arxiv-LaTeX-first acquisition.** PDF + Docling is a fallback for non-arxiv only.
-- **`astra validate --verify-evidence`** is the fidelity gate; evidence quotes must match source PDFs.
-- **Commit as you go.** Small, descriptive commits per significant change. The git log is the chronological trail of the reproduction.
-
 ## Pointers
 
-- `open-questions.md` — accumulated questions from autonomous-mode runs, resolved in REVIEW.
-- `work/reference/index.json` — paper structural index (figures, tables, outline, citations with DOIs); the starting surface for any "where in the paper does X happen" lookup. Or just ask `paper-expert` via `SendMessage`.
-- `work/reference/code-index.md` — code inventory (when code present): module map, candidate decisions with file:line, entry-points, gotchas. Or just ask `code-expert` via `SendMessage`.
+- [`constitution.md`](constitution.md) — Goal, Fidelity intent, Scope, Quality bar, Evidence, Open dimensions. The ralph loop's driving document.
+- `open-questions.md` — accumulated questions from iterations, resolved in REVIEW.
+- `work/reference/index.json` — paper structural index (figures, tables, outline, citations with DOIs); the starting surface for any "where in the paper does X happen" lookup.
+- `work/reference/code-index.md` — code inventory (when code present): module map, candidate decisions with file:line, entry-points, gotchas.
 - `work/cited/<doi-slug>/` — per-cited-paper substrate produced by LITERATURE for `prior_insights:` resolution.
 - <any paper-specific conventions or warnings the user surfaced during the interview>
diff --git a/claude/lightcone/skills/lc-from-paper/templates/constitution.md b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
new file mode 100644
index 00000000..b8e249ba
--- /dev/null
+++ b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
@@ -0,0 +1,45 @@
+---
+status: active
+---
+
+# <paper-slug> — reproduction constitution
+
+The driving document for the ralph loop reproducing <paper title> (<arXiv ID>, DOI <doi>). Every iteration reads this on entry to know what "done" looks like and how to size its next move. **Sharpened slowly** — only when something fundamental shifts (target moves, scope opens or fences, a material disagreement makes us re-think a sub-analysis). Running accumulators (per-output rigor state, the disagreements log, opportunities) live in `CLAUDE.md`, not here.
+
+## Goal
+
+<What "done" looks like for this reproduction. Concrete: which targets, what verdict against them, what validation passes. E.g.: "A complete `astra.yaml` with recipes that produce reproduced versions of <list of targets>, validated by `astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml` verdict `pass` against the targets in `targets/targets.md`.">
+
+**Fidelity intent.** <The user's prose answer from INTERVIEW to "when is this good enough" — captured verbatim or in close paraphrase. E.g. "just checking if the analysis is tractable — quick sanity on a headline number", "Figure 3 must be right; the rest can stay rough", "full fidelity on the BAO fit, baseline elsewhere", "every primary and secondary target lining up within stated tolerance". Each iteration reads this when deciding cheap vs heavy next moves; COMPARE grades opportunities against it. Static once approved at INTERVIEW; the user can sharpen at any REVIEW.>
+
+## Scope
+
+**In scope:** <targeted figures / tables / numbers, methodological span being reproduced.>
+
+**Out of scope:** <explicit exclusions, fenced from drift.>
+
+## Quality bar
+
+What "canonical" rigor looks like for *this* paper. The bar that primary-target outputs aim for when the fidelity intent calls for it:
+
+- <e.g. "BAO fit posteriors match the paper's Figure 4 within 1σ across the full damping prior range">
+- <e.g. "magnitude cuts and selection match the code's defaults exactly; any deviation is recorded as a paper-vs-code disagreement with both options preserved">
+- <e.g. "every prior insight cites a real verbatim quote from the cited paper">
+
+This is the ceiling; the fidelity intent determines which outputs need to actually reach it. CLAUDE.md's *Rigor — current state* section tracks where each output currently sits relative to this bar.
+
+## Evidence
+
+The substrate this reproduction is built against — the canonical sources iterations consult:
+
+- **Paper:** `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml}` (from `/paper-extraction` during ACQUIRE). The `index.json#citations` block carries each cited paper's resolved DOI for LITERATURE.
+- **Code:** `work/reference/code/` (cloned during ACQUIRE; scan inventory at `work/reference/code-index.md`). Code is canonical for numerics, plotting, and method where it disagrees with the paper.
+- **Paper DOI:** <doi>
+- **arXiv ID:** <id> (if applicable)
+- **Code repo URL:** <url>
+
+## Open dimensions
+
+Decisions worth surfacing to the user — places the reproduction could go differently and the call benefits from human ratification. Iterations append here when something material comes up that isn't itself a paper-vs-code disagreement (those go to `CLAUDE.md`'s disagreements log instead). The user resolves these at REVIEW close-out, or earlier if they're around.
+
+- (none yet)

From a91b767f261b316ada03bf71384b3e02f4aaa4a0 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 04:17:43 +0200
Subject: [PATCH 051/124] lc-from-paper: SKILL.md for the ralph dispatch shape
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Round 7's architectural reversal — orchestrator + named per-phase
sub-agents → ralph loop whose iterations carry the long middle. The
phase decomposition (INTERVIEW → ACQUIRE → ARCHITECT → SPECIFY →
LITERATURE → IMPLEMENT → RUN → COMPARE → REVIEW) stays as-is; what
changes is dispatch.

Architecture is now two-piece:
- INTERVIEW + ACQUIRE + REVIEW in the user's main session
- ARCHITECT through COMPARE inside a ralph loop launched against the
  per-paper constitution.md

The fresh-context-no-bias property that drove the old per-phase
fresh-reviewer-sub-agent pattern (broken by the sub-agents-can't-spawn-
sub-agents constraint) collapses into iteration boundaries: iteration
N writes, iteration N+1 reads fresh and reviews. Parallel fan-out
(LITERATURE Haiku quote-finders, SPECIFY per-sub-analysis, IMPLEMENT
per-output) still happens, but at the iteration level (one level deep
from the user's main session through the ralph harness), never nested.

The per-iteration discipline section names what a single iteration does
on entry, including the closing rule: "an iteration that contributed
cannot close the constitution." Closing is reserved for a cold survey
that found nothing left to improve. Mirrored in the constitution
template's body so every per-paper constitution carries the rule
explicitly. Adds at least one fresh-eyes review pass on every closing
decision.

INTERVIEW now drafts two files (constitution.md + CLAUDE.md per the
prior commit). Launching the loop is an explicit step: invoke
.claude/skills/ralph/scripts/ralph constitution.md after ACQUIRE, tell
the user the tmux session name, come back for REVIEW close-out when the
loop terminates.

Workdir-as-state table updated to reflect the constitution.md +
CLAUDE.md split as INTERVIEW's signal.

Persistent paper-expert / code-expert sub-agents and SendMessage gone
throughout; the structural index from /paper-extraction + the
code-index.md from scan-only /lc-from-code carry the orientation
iterations need, and each iteration's targeted re-reads on entry replace
the long-lived experts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../lightcone/skills/lc-from-paper/SKILL.md   | 188 ++++++++++--------
 .../lc-from-paper/templates/constitution.md   |   2 +
 2 files changed, 111 insertions(+), 79 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 57f19770..697f9b3d 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -5,98 +5,143 @@ description: >
   scientific paper in ASTRA — has a DOI, arXiv ID, or PDF — or asks to
   "reproduce <paper>", "set up reproduction", or "import a paper". Also
   use when continuing or resuming an existing reproduction workdir. The
-  skill instructs Claude to act as an orchestrator that drives the
-  reproduction across phases by spawning named sub-agents per phase, with
-  the user able to drop into any sub-agent's chat directly to steer.
+  skill instructs Claude to run INTERVIEW + ACQUIRE in the user's main
+  session, then hand the reproduction off to a ralph loop whose
+  iterations carry the remaining phases (ARCHITECT → SPECIFY → LITERATURE
+  → IMPLEMENT → RUN → COMPARE) until the constitution closes, at which
+  point REVIEW close-out runs back in the user's main session.
 ---
 
 # lc-from-paper
 
-You are helping the user reproduce a published scientific paper as a complete ASTRA project. This is a long, complex task that won't fit in a single context window — it spans discrete phases: acquire the paper and its code, architect the spec, specify decisions and findings, resolve cited literature, implement, run, compare, review. The complexity is exactly why your role matters. As **orchestrator**, you hold the whole shape for the user — guiding them through the workflow, explaining what's happening, tracking what's been done and what's next, deciding how to delegate. Each sub-agent only ever sees its own slice; you keep the through-line.
+You are helping the user reproduce a published scientific paper as a complete ASTRA project. This is a long, complex task that won't fit in a single context window — it spans discrete phases: acquire the paper and its code, architect the spec, specify decisions and findings, resolve cited literature, implement, run, compare, review.
 
-The heavy lifting of any phase is done by a sub-agent: you spawn it pointed at the workdir (where its `CLAUDE.md` auto-loads), let it work in its own context window, and read what it returns when it's done. Your own context stays light — you carry user intent forward, watch the workdir, and choose what to spawn next.
+The architecture is two-piece:
 
-**The user can interact with any sub-agent directly.** When you spawn one, it appears as a chat surface the user can switch into (typically at the bottom of the screen). Tell them explicitly: *"I'm launching the X sub-agent now — if you want to interact with it, switch to its chat before its first turn finishes."* While the user stays in that chat, the sub-agent stays active — natural turn-by-turn dialogue, prose questions, the user steering directly. When they switch back to you and the sub-agent goes idle, the surface goes away from their view; the sub-agent stays addressable from your side, and addressing it via SendMessage reopens the surface for the user too. **Sub-agents can be resumed at any time, with full context preserved** — if the user wants to drop into any earlier phase, you pull that phase's sub-agent back and it shows up in their chat exactly where it left off.
+1. **Interactive bookends in the user's main session.** INTERVIEW and REVIEW are conversations with the user. ACQUIRE is two parallel sub-skill invocations (`/paper-extraction` and `/lc-from-code` in scan-only mode) that produce the on-disk substrate everything downstream consults.
 
-**As orchestrator, keep your context lean.** Your job is to coordinate, not to absorb sub-agent outputs or the codebase in detail. The paper itself is the exception worth making — it's among the highest-value text in the workflow, the canonical source the spec is being built against, and worth reading carefully at the start. Your other regular reads are short and load-bearing: the paper-extraction index, `CLAUDE.md`, and what sub-agents return. For everything else, delegate: a quick `grep` or single-file lookup is fine to do directly, but anything more open-ended — cross-cutting search, repeated reads of large content — goes to an Explore sub-agent that reads on your behalf and returns a summary. The failure mode to avoid is the orchestrator quietly turning into "just another iteration" by reading everything itself.
+2. **A ralph loop for the long middle.** Once the per-paper `constitution.md` is drafted (INTERVIEW) and the substrate is on disk (ACQUIRE), you launch a ralph loop against the constitution. Each iteration starts a fresh session with the constitution as system prompt, surveys the workdir, picks the next valuable move (typically one phase's worth of work), does it, commits, exits. The fresh-context property is automatic — iteration N+1 reads N's work without bias, which makes per-phase review collapse into "the next iteration is the review."
+
+The whole thing is driven by **the per-paper `constitution.md`** at the reproduction workdir root, plus the auto-loading `CLAUDE.md` walk-up. The constitution describes the goal (what "done" looks like, the user's fidelity intent, scope, quality bar); CLAUDE.md carries the running accumulators (rigor state per output, paper-vs-code disagreements log, rules). Every iteration walks up to both.
 
 ## Setup: git-tracked workdir
 
-The reproduction's directory should be a git repo — if not already, `git init` it locally before spawning the first sub-agent. Every sub-agent commits its work as it goes — small, descriptive commits per significant change. The git log is the chronological trail of the reproduction; `git diff` makes each sub-agent's work auditable from your side without you having to read source files directly. Don't push to a remote unless the user has set one up; local-only is the default.
+The reproduction's directory should be a git repo — if not already, `git init` it before launching the ralph loop. Every iteration commits its work as it goes — small, descriptive commits per significant change. The git log is the chronological trail of the reproduction; `git diff` is how the next iteration reads what landed.
 
 ## The phases
 
-The reproduction runs through nine phases (zero-indexed). Phase 0 (INTERVIEW), Phase 1 (ACQUIRE), and Phase 8 (REVIEW) run in your own session — INTERVIEW and REVIEW because they're interactive bookends, ACQUIRE because its work is two parallel sub-skill invocations (`/paper-extraction` and `/lc-from-code` in scan-only mode) plus capturing the resulting persistent sub-agents as `paper-expert` and `code-expert`. Phases 2–7 are sub-agent dispatches: you spawn each as a named sub-agent, point it at the matching reference file in `references/`, and let it work in its own context with the per-paper `CLAUDE.md` auto-loading from the workdir.
-
-ARCHITECT (Phase 2) is the first sub-agent dispatch. It receives the `paper-expert` and `code-expert` agent IDs in its spawn prompt and consults them via `SendMessage` as it writes the stub `astra.yaml`. Later phases inherit the same pattern — the experts stay alive for the duration of the reproduction and are addressable by any sub-agent that's given their IDs.
+Nine phases (zero-indexed). INTERVIEW and ACQUIRE run before the loop, in the user's main session; the loop's iterations carry phases 2–7; REVIEW runs after the loop closes, back in the user's main session.
 
 | # | Phase | Where it runs | Reference | Primary outputs |
 |---|---|---|---|---|
-| 0 | INTERVIEW | orchestrator session | [`references/interview.md`](references/interview.md) | per-paper `CLAUDE.md` |
-| 1 | ACQUIRE | orchestrator session | [`references/acquire.md`](references/acquire.md) | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-status.yaml, code-index.md}`; two persistent sub-agents — `paper-expert` and `code-expert` — reachable by agent ID via `SendMessage` |
-| 2 | ARCHITECT | sub-agent | [`references/architect.md`](references/architect.md) | stub `astra.yaml` at project root (sub-analyses, inputs, outputs, narrative); `work/notes/architect/review-round-<N>.md` |
-| 3 | SPECIFY | sub-agent | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
-| 4 | LITERATURE | sub-agent | [`references/literature.md`](references/literature.md) | `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
-| 5 | IMPLEMENT | sub-agent | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
-| 6 | RUN | sub-agent | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
-| 7 | COMPARE | sub-agent | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
-| 8 | REVIEW | orchestrator session | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
+| 0 | INTERVIEW | user's main session | [`references/interview.md`](references/interview.md) | per-paper `constitution.md` + `CLAUDE.md` |
+| 1 | ACQUIRE | user's main session | [`references/acquire.md`](references/acquire.md) | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-status.yaml, code-index.md}` |
+| 2 | ARCHITECT | ralph iteration | [`references/architect.md`](references/architect.md) | stub `astra.yaml` at project root (sub-analyses, inputs, outputs, narrative) |
+| 3 | SPECIFY | ralph iteration | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
+| 4 | LITERATURE | ralph iteration | [`references/literature.md`](references/literature.md) | `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
+| 5 | IMPLEMENT | ralph iteration | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
+| 6 | RUN | ralph iteration | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
+| 7 | COMPARE | ralph iteration | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
+| 8 | REVIEW | user's main session | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
+
+COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are, how much they likely matter, and how they sit relative to the constitution's fidelity intent. A subsequent iteration decides whether to spend another IMPLEMENT round (close a gap that sits below intent) or land the reproduction at its current trajectory and log the gap as an open opportunity in CLAUDE.md's Rigor section. When the constitution's `status:` flips to `closed` (typically by an iteration after COMPARE returns `pass` or after the iteration logs accepted opportunities), the loop terminates and REVIEW runs in the user's main session.
+
+## The two pre-loop bookends
+
+### INTERVIEW (Phase 0)
+
+The opening interactive phase. Run it from the user's main session. Read [`references/interview.md`](references/interview.md) in full before starting.
+
+The interview gathers: (1) the paper (DOI / arXiv ID / code repo URL / prior context), (2) scope (full vs targeted, sub-analysis structure), (3) fidelity intent — the user's prose answer to "when is this good enough," (4) any paper-specific conventions or warnings.
+
+These get drafted into **two files** in the reproduction workdir:
+
+- **`constitution.md`** — the ralph loop's driving document. Goal, Fidelity intent, Scope, Quality bar, Evidence (paper DOI, arXiv ID, code repo URL), Open dimensions. Starts with YAML frontmatter `status: active` so the ralph launcher accepts it. Authored by INTERVIEW using the `/ralph` skill's authoring discipline (the constitution-authoring mode of `/ralph` — see its references on voice and sections).
+- **`CLAUDE.md`** — the auto-loading walk-up. Paper identity at the top, Rules (universal across reproductions; leave the template's defaults), Rigor accumulator (starts empty), Disagreements log (starts empty), Pointers (to `constitution.md`, `work/reference/`, etc.).
+
+Templates ship in [`templates/constitution.md`](templates/constitution.md) and [`templates/CLAUDE.md`](templates/CLAUDE.md). Show the user both drafts, take corrections, refine, save.
+
+After approval, `git init` the workdir if it isn't one already and commit both files. Then run ACQUIRE in the same session.
+
+### ACQUIRE (Phase 1)
+
+Two parallel sub-skill invocations:
+
+- **`/paper-extraction <doi-or-arxiv-id>`** — produces the paper substrate at `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml, figures/, tables/, bibliography-source.{bib,bbl}}`.
+- **`/lc-from-code` in scan-only mode** against the cloned reference repo at `work/reference/code/` (after `git clone --depth 1 <url> work/reference/code`). Produces `work/reference/code-status.yaml` + `work/reference/code-index.md`.
+
+See [`references/acquire.md`](references/acquire.md) for the full step-by-step. Both happen in your main session — no orchestration overhead, just two skill invocations that produce on-disk artifacts.
+
+When ACQUIRE returns, commit the new substrate and launch the ralph loop (see **Launching the loop** below).
 
-COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are, how much they likely matter, and how they sit relative to the user's fidelity intent. You and the user decide together whether to spend another IMPLEMENT round now (close a gap that sits below intent) or land the reproduction at its current trajectory and log the gap as an open opportunity in CLAUDE.md's Rigor section. Either way, control eventually passes to REVIEW.
+## Launching the loop
 
-## Spawning a phase sub-agent
+After INTERVIEW + ACQUIRE land, hand the rest of the reproduction off to a ralph loop. From the reproduction workdir:
 
-When you launch a phase, spawn a named sub-agent in the background with the phase reference as its working spec:
+```bash
+.claude/skills/ralph/scripts/ralph constitution.md
+```
 
-- **Name** the sub-agent after the phase: `architect`, `specify`, `implement`, etc. The name is what the user sees in their chat list. If you re-spawn under the same name, the previous instance becomes addressable only by ID.
-- **Prompt** the sub-agent to read its phase reference file (`references/<phase>.md`). The reproduction's `CLAUDE.md` auto-loads from the workdir, so it doesn't need to be passed explicitly. Trust the sub-agent to read what else it needs.
-- **Run in background** so the user can switch into the sub-agent's chat without you blocking on it.
-- **Announce the spawn to the user** before it starts: *"I'm launching the &lt;phase&gt; sub-agent now — switch to its chat now if you want to interact, otherwise it'll work autonomously and report back."*
-- **Note the agent ID** when you spawn it. Names are user-facing — if the user dismisses a sub-agent's surface (escape), the name binding goes away and `SendMessage` by name fails. The agent ID + on-disk transcript persist regardless; `SendMessage` by ID resumes the sub-agent from full context and reopens the surface for the user.
-- **Hand in the expert agent IDs** from ACQUIRE — `paper-expert` and `code-expert` — so the phase sub-agent can `SendMessage` them for paper/code questions instead of re-ingesting materials. The experts have already read their materials in depth; querying them is cheaper and richer than another fresh Explore pass.
+(Pass `--backend codex` to run on Codex instead, or pass `-- --model <id>` to select a specific model. See `/ralph`'s **Launching** section for the full surface.)
 
-When the sub-agent's turn closes you receive a notification with its full response in the `result` field. Read that, then decide: spawn the next phase, ask the user a clarifying question, or revisit a previous phase.
+The launcher detaches a tmux session named `ralph-<workdir>-constitution`. The user attaches with `tmux attach -t <session>`. Iterations start firing immediately; each runs in a fresh Claude (or Codex) session with `constitution.md` injected as the system prompt and the workdir's `CLAUDE.md` auto-loading.
 
-## Per-paper artifact: CLAUDE.md
+The loop runs until an iteration flips `constitution.md`'s frontmatter `status:` to `closed`. This typically happens once COMPARE has returned `pass` (or a user-accepted `partial`) and a subsequent iteration's survey finds nothing left to do.
 
-The reproduction's directory holds a single `CLAUDE.md` that sub-agents and future orchestrator sessions walk up to automatically. It is the durable spec for the reproduction, drafted during INTERVIEW and evolving over time as iterations learn paper-specific gotchas. The starting shape is in [`templates/CLAUDE.md`](templates/CLAUDE.md). Sections:
+Tell the user explicitly: "Launching the ralph loop in tmux session `<name>`. Attach with `tmux attach -t <name>`. Detach with the usual tmux prefix + `d`. The loop will run until the constitution closes (typically after COMPARE returns `pass`); at that point come back here and I'll run REVIEW close-out."
 
-- **Paper identity** — DOI, arXiv ID, title, authors, one-line subject; where the original code lives (`work/reference/code/`).
-- **Goal** — what the reproduction is aiming for. Desired state, scope (in / out), and the user's **fidelity intent** as prose — their own answer to "when is this good enough." The orchestrator reads the intent on every spawn decision and COMPARE grades opportunities against it. Stays static once approved at INTERVIEW; the user can sharpen the intent at any REVIEW.
-- **Rigor** — the reproduction's trajectory toward that intent. *Current state* per output or per phase (e.g. *sketch / baseline / tightened / canonical*); read alongside the Goal's intent to decide cheap vs heavy on the next spawn. *Open opportunities* — what could benefit from more attention, with a sense of leverage and how it sits relative to intent ("Figure 3's systematics treatment is sketch-level; tightening it would change the headline number by ~10% — below intent"). Updated by sub-agents as they work; mined during REVIEW for what's worth coming back for.
-- **Disagreements** — paper-vs-code material disagreements logged by sub-agents as they find them. Code is canonical for numerics; both options are preserved as decision options in `astra.yaml`. CLAUDE.md just summarizes them so every walk-up sees them at a glance. Surfaced to the user when they're around.
-- **Rules** — the code-as-canonical discipline, the never-block-on-`AskUserQuestion`-mid-sub-agent rule (with `open-questions.md` as the autonomous-mode fallback), arxiv-LaTeX-first acquisition, `astra validate --verify-evidence` as the fidelity gate.
-- **Pointers** — to `open-questions.md`, and any paper-specific conventions or warnings the user surfaced during the interview.
+## Per-iteration discipline
 
-Keep it short. Pointers, not snapshots.
+Iterations follow the `/ralph` skill's Loop protocol — Survey → Work → Update → Exit. The per-paper specifics layered on top:
 
-## The two bookends
+- **Survey starts with the constitution + CLAUDE.md, then the workdir.** Read the constitution to remember the goal and the fidelity intent. Read CLAUDE.md's Rigor accumulator to know where each output currently sits relative to the quality bar. Then survey the workdir against the **Workdir-as-state** table below to identify the next phase that needs work.
+- **One phase per iteration is the typical shape.** Don't try to do ARCHITECT *and* SPECIFY in one iteration; the fresh-context property of the next iteration is what makes review work, and conflating phases collapses the seam. (Exceptions: small targeted fixes after COMPARE may touch multiple phases in one iteration if they're tightly coupled.)
+- **Phase reference is your working spec for the iteration.** Whichever phase is next, read its `references/<phase>.md` on entry. That file carries the discipline for that phase's work (what to produce, code-as-canonical, rigor adjustment, etc.).
+- **Self-review is the next iteration.** ARCHITECT/SPECIFY/LITERATURE/IMPLEMENT used to spawn fresh-context reviewer sub-agents per round, which was broken because sub-agents can't spawn sub-agents. The discipline now collapses into iteration boundaries: iteration N writes the artifact, iteration N+1 reads it fresh and reviews, and iteration N+2 applies fixes if needed. The cycle repeats until two consecutive review iterations find no fixes or a 5-iteration cap is reached. Each iteration is fresh by construction; the no-bias property is free.
+- **Parallel fan-out lives inside an iteration.** LITERATURE Haiku quote-finders, SPECIFY per-sub-analysis work, IMPLEMENT per-output work — these fan out as one-level-deep `Agent(...)` spawns inside the iteration's main session. Sub-agents can't spawn sub-agents, but an iteration *is* the main session, so it can spawn freely.
+- **`AskUserQuestion` is not available inside an iteration.** Each iteration runs in a detached tmux session; the user isn't reachable interactively. Iterations append questions to `open-questions.md` with their best-judgment default applied, and the user resolves them at REVIEW close-out (back in their main session).
+- **Update the accumulators in CLAUDE.md** before exit: Rigor *Current state* per output that the iteration changed; *Paper-vs-code disagreements* for any material conflict the iteration surfaced; *Open opportunities* for COMPARE-surfaced gaps.
+- **Sharpen the constitution body itself** if something fundamental shifted — the user's fidelity intent reframed, a sub-analysis decomposition rethought, a quality-bar item that's now more concrete. Don't accrete amendment sections; rewrite the affected prose.
+- **An iteration that contributed cannot close the constitution.** Closing the loop (flipping `constitution.md`'s frontmatter `status:` to `closed`) is reserved for an iteration whose cold survey found *nothing left to improve or contribute* — verdict `pass` (or user-accepted `partial`) is on disk, accumulators are caught up, no open opportunity sits below the fidelity intent. If you wrote anything this iteration — even small fixes, even just an accumulator update — commit, exit, let the next fresh-eyes iteration decide. This adds a round of review on every closing decision: at least one cold pass has to confirm there's genuinely nothing left.
 
-### Interview (Phase 0)
+## Workdir-as-state
 
-The opening interactive phase. Read [`references/interview.md`](references/interview.md) in full before starting. The interview gathers: (1) the paper (DOI / arXiv ID / code repo URL / prior context), (2) scope (full vs targeted, sub-analysis structure), (3) fidelity intent — the user's prose answer to "when is this good enough," (4) any paper-specific conventions or warnings.
+Each iteration's survey reads the workdir to determine what phase is next. File existence implies the phase has been done:
 
-These get drafted into the per-paper `CLAUDE.md` — paper identity, Goal section, Rules, Conventions. The Rigor section starts empty; sub-agents fill it in as they work. Show the user the draft, take corrections, refine, then save.
+| Signal | Phase done |
+|---|---|
+| `constitution.md` + `CLAUDE.md` at workdir root, both committed | INTERVIEW |
+| `work/reference/source/` (arxiv tarball) **or** `work/reference/document.md` (Docling fallback) + `work/reference/index.json` + `work/reference/astra.yaml` | ACQUIRE paper substrate |
+| `work/reference/code/` (or `code-status.yaml` with `found: false`) + `work/reference/code-index.md` | ACQUIRE code substrate |
+| `astra.yaml` at project root validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
+| `astra.yaml` non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` placeholders + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
+| `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; `work/cited/<doi-slug>/` populated per cited paper | LITERATURE |
+| recipes present in `astra.yaml` + `scripts/` + `requirements.txt` | IMPLEMENT |
+| `results/<universe>/<output>/` for every output | RUN |
+| `comparison-report.yaml` | COMPARE |
+| `REPRODUCTION-SUMMARY.md` + `.lightcone/comparison.html` + resolved `open-questions.md` | REVIEW |
 
-After the user approves, run ACQUIRE in your own session (it spawns `paper-expert` and `code-expert` as parallel sub-agents and waits for both). When ACQUIRE returns, launch the architect sub-agent with the expert agent IDs handed in.
+`git log --oneline` complements this — phase commits are the chronological view of what landed when, and iteration boundaries are visible in the log.
 
-### Review (Phase 8, close-out)
+## REVIEW close-out (after the loop)
 
-The closing interactive phase. Drafts `REPRODUCTION-SUMMARY.md`, invokes [`/figure-comparison`](../figure-comparison/SKILL.md) (mandatory) and optionally [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md), walks `open-questions.md` with the user, and finalizes the reproduction outcome.
+When the loop closes (the user reports back that the tmux session has exited, or `constitution.md`'s `status:` is `closed`), run REVIEW from the user's main session. See [`references/review.md`](references/review.md) for the full close-out: invoke `/figure-comparison` (mandatory) and optionally `/check-sentence-by-sentence`, walk `open-questions.md` with the user, draft `REPRODUCTION-SUMMARY.md`, propagate un-acted opportunities into CLAUDE.md, commit.
 
-REVIEW runs in the orchestrator session because both `/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`, which isn't available to sub-agents.
+REVIEW runs in your main session because `/figure-comparison` and `/check-sentence-by-sentence` both use `AskUserQuestion`, which isn't available inside ralph iterations.
 
 ## Disciplines
 
-**Workdir is the state.** No state machine, no resume mechanic — file existence + `git log` + `astra validate` answer "what phase am I on" deterministically. Each phase sub-agent's first move is to survey the workdir on entry; you (orchestrator) survey at startup and after each completion notification.
+**Workdir is the state.** No state machine, no resume mechanic — file existence + `git log` + `astra validate` answer "what phase am I on" deterministically. Each iteration's first move is to survey the workdir on entry against the table above.
 
-**Code-as-canonical, with disagreements recorded.** When the original codebase is at `work/reference/code/`, every implementing sub-agent reads relevant code on entry. Where paper and code disagree on something material (a different choice would plausibly change a numeric result the paper reports), **code is canonical** for numerics, plotting, and method — but the disagreement is recorded: as a decision option in `astra.yaml` with both alternatives preserved, and as an entry in CLAUDE.md's *Disagreements* section so it's visible to every sub-agent and to the user. Surface it to the user the next time they're around. Stylistic / cosmetic / pure-tooling differences aren't material — note them in `implementation-notes.md` and move on. Without this discipline, iterations drift to "looks right" rather than "matches" and material disagreements get silently absorbed.
+**Code-as-canonical, with disagreements recorded.** When the original codebase is at `work/reference/code/`, every iteration that touches a sub-analysis reads relevant code on entry. Where paper and code disagree on something material (a different choice would plausibly change a numeric result the paper reports), **code is canonical** for numerics, plotting, and method — but the disagreement is recorded: as a decision option in `astra.yaml` with both alternatives preserved, and as an entry in CLAUDE.md's *Paper-vs-code disagreements* section so it's visible to every iteration and to the user at REVIEW. Stylistic / cosmetic / pure-tooling differences aren't material — note them in `implementation-notes.md` and move on. Without this discipline, iterations drift to "looks right" rather than "matches" and material disagreements get silently absorbed.
 
-**Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates as the user comes back. The anchor for the whole trajectory is the user's **fidelity intent**, captured in CLAUDE.md's Goal section at INTERVIEW as prose — their own words for what "good enough" looks like (e.g. *"just checking the analysis is tractable"*, *"Figure 3 must be right; the rest can stay rough"*, *"every primary and secondary target lining up within stated tolerance"*). Your job as orchestrator is to hold that intent and translate it into per-spawn tactical decisions.
+**Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at INTERVIEW as prose — their own words for what "good enough" looks like (e.g. *"just checking the analysis is tractable"*, *"Figure 3 must be right; the rest can stay rough"*, *"every primary and secondary target lining up within stated tolerance"*).
 
-When you spawn an artifact-producing sub-agent (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT), derive how much fresh-context self-review to ask of it from the **gap** between where the artifact currently stands (CLAUDE.md's Rigor *Current state* — *sketch / baseline / tightened / canonical*) and what the Goal's intent says the user cares about. *Cheap:* skip self-review or run one fresh-context pass. *Heavy:* iterate fresh-context review + fix until two consecutive rounds find no fixes (capped at 5 rounds). The reviewing sub-agent never sees prior rounds' fixes — fresh context each round, with the prompt "check the artifact is consistent with the paper and the code." Each spawn that produces an artifact updates CLAUDE.md's Rigor *Current state* so the trajectory stays honest across context windows.
+Each iteration working on an artifact-producing phase (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT) translates the fidelity intent into a tactical decision about how much review to run inside the iteration. Size that review from the gap between where the artifact currently stands (CLAUDE.md's Rigor *Current state*: *sketch / baseline / tightened / canonical*) and what the Goal's intent says the user cares about. *Cheap:* write the artifact and exit; let the next iteration's fresh-context survey serve as the review. *Heavy:* fan out parallel reviewers as one-level-deep sub-agents inside the iteration, merge findings, apply fixes, exit. Either way, update CLAUDE.md's Rigor *Current state* so the trajectory stays honest across iterations.
 
-The *sketch / baseline / tightened / canonical* and *cheap / heavy* vocabularies are the orchestrator's internal scaffolding for sizing each spawn. The user's surface is the intent prose; the scaffolding only shows through when they ask how a spawn was sized.
+The default is **sequential review via iteration boundaries** — cheaper, no fan-out, and the fresh-context property is automatic. Reach for in-iteration fan-out when the parallelism actually pays (LITERATURE with many cited papers, SPECIFY with many independent sub-analyses, IMPLEMENT with many outputs).
+
+The *sketch / baseline / tightened / canonical* and *cheap / heavy* vocabularies are the iteration's internal scaffolding for sizing its work. The user's surface is the intent prose; the scaffolding only shows through when they ask how an iteration sized itself.
 
 **arxiv-LaTeX-first acquisition.** When the paper is on arxiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arxiv only.
 
@@ -104,38 +149,23 @@ The *sketch / baseline / tightened / canonical* and *cheap / heavy* vocabularies
 
 **No synthetic data.** Unless the paper itself uses synthetic data as input, every input dataset must be real (downloaded, queried, or fetched from a real archive). The implement reference repeats this; treat it as load-bearing.
 
-**Open-questions for autonomous mode only.** When the user is reachable (in the sub-agent's chat or in your orchestrator session), questions are asked directly in prose. The `<paper-slug>/open-questions.md` accumulator is for autonomous mode — when the user has explicitly stepped away. The user resolves accumulated questions in REVIEW before the reproduction closes.
+**Open-questions accumulator.** Iterations run detached and can't reach the user interactively, so questions go to `<workdir>/open-questions.md` with the iteration's best-judgment default applied. The user resolves the accumulated questions at REVIEW close-out before the reproduction closes.
 
 ## Resuming an in-flight reproduction
 
-When you walk into a workdir that already has artifacts:
-
-1. **Skip INTERVIEW** unless the user explicitly wants to revise scope.
-2. CLAUDE.md auto-loads from the workdir — that's the spec.
-3. Survey the workdir to determine the current phase (table below).
-4. Spawn the appropriate next sub-agent.
-
-Workdir signals — file existence implies the phase has been done:
-
-| Signal | Phase done |
-|---|---|
-| `work/reference/source/` (arxiv tarball) **or** `work/reference/document.md` (Docling fallback) + `work/reference/index.json` + `work/reference/astra.yaml` | ACQUIRE paper substrate (paper-expert ran) |
-| `work/reference/code/` (or `code-status.yaml` with `found: false`) + `work/reference/code-index.md` | ACQUIRE code work (code-expert ran) |
-| `astra.yaml` at project root validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
-| `astra.yaml` non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` placeholders + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
-| `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; `work/notes/literature/<doi-slug>.yaml` files present | LITERATURE |
-| recipes present in `astra.yaml` | IMPLEMENT |
-| `results/<universe>/<output>/` | RUN |
-| `comparison-report.yaml` | COMPARE |
-| `REPRODUCTION-SUMMARY.md` + `.lightcone/comparison.html` + resolved `open-questions.md` | REVIEW |
+When the user walks back into a workdir that already has artifacts:
 
-`git log --oneline` complements this — phase commits are the chronological view.
+1. **Skip INTERVIEW** unless the user explicitly wants to revise scope (in which case edit `constitution.md` together rather than re-drafting it from scratch).
+2. **If `constitution.md`'s `status:` is `active` and the tmux session isn't running**, re-launch the ralph loop: `.claude/skills/ralph/scripts/ralph constitution.md`. The next iteration surveys the workdir and picks up wherever the prior loop left off.
+3. **If `constitution.md`'s `status:` is `closed`**, the reproduction is at REVIEW. Run REVIEW close-out in your main session.
+4. **If the ACQUIRE substrate is incomplete**, finish ACQUIRE in your main session before launching the loop: re-run `/paper-extraction` and/or `/lc-from-code` against the existing partial state (both are survey-first and skip work that is already done).
 
 ## Anti-patterns
 
-- **Reading content the orchestrator doesn't need.** If the answer fits in a sub-agent's return, don't re-read the source yourself. Dispatch Explore for open-ended search.
-- **Doing phase work in the orchestrator session.** The orchestrator spawns and routes; phase work happens in sub-agents. Exceptions: INTERVIEW and REVIEW (the interactive bookends), and ACQUIRE (which is two parallel sub-skill invocations + capturing their persistent transcripts — no separate `acquire` sub-agent needed because the work IS the spawns).
-- **Asking a sub-agent to use `AskUserQuestion`.** Sub-agents don't have it. They ask in prose, or surface the question to you so you call `AskUserQuestion` from the orchestrator session.
+- **Spawning a "loop manager" sub-agent inside your main session.** The whole point of the ralph loop is fresh per-iteration context; you launch the loop, the loop runs detached, you come back when it's done. No nested orchestrator.
+- **Doing the long middle in your main session instead of launching the loop.** INTERVIEW and ACQUIRE belong in your session; ARCHITECT through COMPARE belong in the loop. Doing phase work in your main session burns context that doesn't get reset; the loop exists precisely to give each phase fresh context.
+- **Asking an iteration to use `AskUserQuestion`.** Iterations run detached. Surface questions to `open-questions.md` with a default applied; the user resolves at REVIEW.
 - **Re-implementing what `astra` already does.** If `astra validate` returns clean, don't write a separate validator. If `astra paper add` caches the PDF, don't write a separate cache.
-- **Bundling phases into one sub-agent.** Each sub-agent runs one phase. The granularity is what keeps each context window manageable; conflating phases re-creates the failure mode this architecture exists to avoid.
-- **Forgetting to announce the spawn to the user.** They need to know a sub-agent has launched and that they can switch into its chat before it finishes its first turn. Without the announcement, the surface comes and goes invisibly.
+- **Bundling phases into one iteration.** Each iteration does one phase's worth of work. Conflating phases re-creates the failure mode the loop exists to avoid: no fresh-context review between phases.
+- **Spawning a sub-agent from inside another sub-agent.** The Agent tool is one level deep. An iteration's main session can spawn sub-agents (Haiku fan-out, per-sub-analysis fan-out, per-output fan-out); sub-agents cannot spawn sub-agents. If a piece of work needs sub-agents, it has to happen at the iteration level, not nested.
+- **Accreting amendment sections in `constitution.md`.** When something fundamental shifts, *reshape* the affected prose. The chronology lives in commits; the body lives in *now*.
diff --git a/claude/lightcone/skills/lc-from-paper/templates/constitution.md b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
index b8e249ba..f2f47218 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/constitution.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
@@ -6,6 +6,8 @@ status: active
 
 The driving document for the ralph loop reproducing <paper title> (<arXiv ID>, DOI <doi>). Every iteration reads this on entry to know what "done" looks like and how to size its next move. **Sharpened slowly** — only when something fundamental shifts (target moves, scope opens or fences, a material disagreement makes us re-think a sub-analysis). Running accumulators (per-output rigor state, the disagreements log, opportunities) live in `CLAUDE.md`, not here.
 
+**Closing rule.** An iteration that contributed anything this run cannot flip this constitution's `status:` to `closed`. Closing is reserved for an iteration whose cold survey found nothing left to improve — verdict on disk, accumulators caught up, no open opportunity below the fidelity intent. This adds at least one fresh-eyes review pass on every closing decision.
+
 ## Goal
 
 <What "done" looks like for this reproduction. Concrete: which targets, what verdict against them, what validation passes. E.g.: "A complete `astra.yaml` with recipes that produce reproduced versions of <list of targets>, validated by `astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml` verdict `pass` against the targets in `targets/targets.md`.">

From 24f1e0d0bc17faaffccabdc5971512d3b42f9aef Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 04:20:11 +0200
Subject: [PATCH 052/124] lc-from-paper: interview retune + drop misplaced
 edits

interview.md: now drafts both `constitution.md` and `CLAUDE.md` (not
just one), no expert-sub-agent spawning, no "launch the architect
sub-agent" framing. After both files are committed and ACQUIRE has run,
INTERVIEW hands off to the ralph loop (per SKILL.md's *Launching the
loop* section). The /ralph skill's Authoring mode supplies the
authoring discipline; this reference says which two files INTERVIEW
produces and what goes in each.

Also walking back two edits from the prior commits:

- Dropped the "iteration that contributed cannot close the constitution"
  rule from both the lc-from-paper SKILL.md per-iteration discipline
  and the per-paper constitution template. The generic /ralph Loop >
  Exit section already covers it; the explicit duplication wasn't
  earning its keep where it landed (Cail's emphasis on that rule was
  for the shuttle constitution dispatching this work, not for per-paper
  reproductions).

- Dropped the "spawning a sub-agent from inside another sub-agent"
  anti-pattern. The constraint is real but it's an implementation
  detail of the harness, not user-facing discipline worth preaching in
  the SKILL.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../lightcone/skills/lc-from-paper/SKILL.md   |  2 -
 .../lc-from-paper/references/interview.md     | 57 ++++++++-----------
 .../lc-from-paper/templates/constitution.md   |  2 -
 3 files changed, 25 insertions(+), 36 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 697f9b3d..e90b617d 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -102,7 +102,6 @@ Iterations follow the `/ralph` skill's Loop protocol — Survey → Work → Upd
 - **`AskUserQuestion` is not available inside an iteration.** Each iteration runs in a detached tmux session; the user isn't reachable interactively. Iterations append questions to `open-questions.md` with their best-judgment default applied, and the user resolves them at REVIEW close-out (back in their main session).
 - **Update the accumulators in CLAUDE.md** before exit: Rigor *Current state* per output that the iteration changed; *Paper-vs-code disagreements* for any material conflict the iteration surfaced; *Open opportunities* for COMPARE-surfaced gaps.
 - **Sharpen the constitution body itself** if something fundamental shifted — the user's fidelity intent reframed, a sub-analysis decomposition rethought, a quality-bar item that's now more concrete. Don't accrete amendment sections; rewrite the affected prose.
-- **An iteration that contributed cannot close the constitution.** Closing the loop (flipping `constitution.md`'s frontmatter `status:` to `closed`) is reserved for an iteration whose cold survey found *nothing left to improve or contribute* — verdict `pass` (or user-accepted `partial`) is on disk, accumulators are caught up, no open opportunity sits below the fidelity intent. If you wrote anything this iteration — even small fixes, even just an accumulator update — commit, exit, let the next fresh-eyes iteration decide. This adds a round of review on every closing decision: at least one cold pass has to confirm there's genuinely nothing left.
 
 ## Workdir-as-state
 
@@ -167,5 +166,4 @@ When the user walks back into a workdir that already has artifacts:
 - **Asking an iteration to use `AskUserQuestion`.** Iterations run detached. Surface questions to `open-questions.md` with a default applied; the user resolves at REVIEW.
 - **Re-implementing what `astra` already does.** If `astra validate` returns clean, don't write a separate validator. If `astra paper add` caches the PDF, don't write a separate cache.
 - **Bundling phases into one iteration.** Each iteration does one phase's worth of work. Conflating phases re-creates the failure mode the loop exists to avoid: no fresh-context review between phases.
-- **Spawning a sub-agent from inside another sub-agent.** The Agent tool is one level deep. An iteration's main session can spawn sub-agents (Haiku fan-out, per-sub-analysis fan-out, per-output fan-out); sub-agents cannot spawn sub-agents. If a piece of work needs sub-agents, it has to happen at the iteration level, not nested.
 - **Accreting amendment sections in `constitution.md`.** When something fundamental shifts, *reshape* the affected prose. The chronology lives in commits; the body lives in *now*.
diff --git a/claude/lightcone/skills/lc-from-paper/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/interview.md
index f14ee1b1..86b4d4ee 100644
--- a/claude/lightcone/skills/lc-from-paper/references/interview.md
+++ b/claude/lightcone/skills/lc-from-paper/references/interview.md
@@ -1,6 +1,6 @@
-# Interview — Phase 0
+# INTERVIEW — Phase 0
 
-The opening interactive phase. Run from the orchestrator session, before any sub-agent is spawned. Its job is to crystallize what the user actually wants — which paper, what scope, any paper-specific gotchas — and bake that into the per-paper `CLAUDE.md` every sub-agent walks up to.
+The opening interactive phase. Runs from the user's main session, before the ralph loop launches. Its job is to crystallize what the user actually wants (which paper, what scope, any paper-specific gotchas) and bake that into the two per-paper files the loop's iterations will read: `constitution.md` (the ralph loop's driving document) and `CLAUDE.md` (the auto-loading walk-up with rules and accumulators).
 
 The interview is short. Three to six `AskUserQuestion` rounds, total. The user does not need to teach you the paper; they need to tell you what they want reproduced.
 
@@ -8,17 +8,14 @@ The interview is short. Three to six `AskUserQuestion` rounds, total. The user d
 
 ## What the interview produces
 
-A single `<paper-slug>/CLAUDE.md`, drafted from the template at [`../templates/CLAUDE.md`](../templates/CLAUDE.md). It carries:
+Two files at the reproduction workdir root:
 
-- **Paper identity** — DOI, arXiv ID, title, authors, one-line subject; where the original code lives.
-- **Goal** — what "done" looks like for this reproduction: in-scope and out-of-scope targets, plus the user's fidelity intent in prose.
-- **Pointers** — any paper-specific conventions or warnings the user surfaced.
+- **`constitution.md`** — drafted from [`../templates/constitution.md`](../templates/constitution.md). YAML frontmatter `status: active`, then Goal, Fidelity intent, Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL, where the substrate lives), Open dimensions. The ralph loop's driving document; each iteration reads it on entry. Sharpens slowly; the user can revise it at any point (including mid-loop — successive iterations re-read it).
+- **`CLAUDE.md`** — drafted from [`../templates/CLAUDE.md`](../templates/CLAUDE.md). Paper identity at the top (DOI, title, one-line subject), Rules (universal across reproductions; leave the template's defaults), Rigor accumulator (starts empty; iterations append), Disagreements log (starts empty; iterations append), Pointers (to `constitution.md`, `work/reference/`, etc.). The auto-loading walk-up; every Claude Code session in the workdir picks it up.
 
-The Rigor and Disagreements sections start empty — sub-agents fill them in as they work. The Rules section is standing discipline (universal across reproductions); leave it as the template provides.
+There is no separate "constitution skill" invocation: you are following `/ralph`'s Authoring mode (Study → Draft → Refine → Launch), and the constitution-authoring discipline and reference materials live in that skill. Apply that discipline as you draft; the deliverable is these two markdown files.
 
-There is no separate constitution, no runtime-mode choice, no global termination criterion. The architecture is fixed (orchestrator + named per-phase sub-agents) and rigor is a trajectory toward the user's Goal-section intent — see SKILL.md's *Rigor is a trajectory toward the user's intent* discipline.
-
-After the user approves the draft, save it, ensure the workdir is a git repo (`git init` if needed) and commit `CLAUDE.md` as the first commit, then launch the ACQUIRE sub-agent.
+After the user approves both drafts, save them, `git init` the workdir if it isn't one already, commit both files as the first commit, then proceed to ACQUIRE in the same session.
 
 ---
 
@@ -29,9 +26,9 @@ After the user approves the draft, save it, ensure the workdir is a git repo (`g
 Use `AskUserQuestion` for whatever the user did not supply on `/lc-from-paper` invocation:
 
 - **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).
-- **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) When code is available, every implementing sub-agent reads from `work/reference/code/` and treats code as canonical for numerics + method. This is recorded in CLAUDE.md's Rules.
-- **User's prior familiarity.** Has the user reproduced this paper before? Read it recently? Worked with the original authors? Affects how much you'd lean toward heavy self-review on first spawns.
-- **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; ARCHITECT will read it.
+- **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) When code is available, every iteration that touches a sub-analysis reads from `work/reference/code/` and treats code as canonical for numerics + method. This is recorded in CLAUDE.md's Rules.
+- **User's prior familiarity.** Has the user reproduced this paper before? Read it recently? Worked with the original authors? Affects how much you'd lean toward heavy in-iteration review on first iterations.
+- **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; iterations will read it during ARCHITECT.
 
 ### 2. Scope the reproduction
 
@@ -43,11 +40,11 @@ Ask:
 - **Specific decisions of interest.** A paper makes many choices. The user may care most about a few — e.g. "I want the BAO fit to use a different damping prior than the paper." These become first-class decisions in the spec, with the alternative preserved as a sibling option.
 - **Sub-analysis structure.** Does the paper have genuinely independent stages (e.g. reconstruction → clustering → BAO fit)? If so, the spec wants sub-analyses; ARCHITECT will mirror that structure as the stub's decomposition. If the paper is monolithic, one analysis suffices.
 
-These answers go into CLAUDE.md's **Goal** section as "in scope" / "out of scope". There is no separate target-extraction phase — what the user names here becomes explicit `outputs:` declared in the stub `astra.yaml` during ARCHITECT, then filled with paper-anchored `findings:` / `decisions:` during SPECIFY.
+These answers go into `constitution.md`'s **Scope** section (in / out) and inform ARCHITECT's structural decomposition.
 
 ### 3. Fidelity intent
 
-A reproduction can land anywhere from a quick "does this even run" sanity check to a full match across every primary and secondary target. The user owns where they want this one to land. The job here is to **elicit prose intent** — their own words for what "good enough" looks like, captured into CLAUDE.md's Goal section alongside scope.
+A reproduction can land anywhere from a quick "does this even run" sanity check to a full match across every primary and secondary target. The user owns where they want this one to land. The job here is to **elicit prose intent** — their own words for what "good enough" looks like, captured into `constitution.md`'s Goal section.
 
 Reach for whichever pivot fits the conversation; you usually only need one or two:
 
@@ -56,44 +53,40 @@ Reach for whichever pivot fits the conversation; you usually only need one or tw
 - *"If this took several sessions of iteration to reach high fidelity everywhere, is that the right investment, or would you rather get a working version in a couple of sessions and decide later whether to push further?"*
 - *"Are you trying to verify the paper, build on it, or critique it? That shifts where the fidelity bar wants to sit."*
 
-Record the answer verbatim or in close paraphrase under **Fidelity intent** in CLAUDE.md's Goal section. Concrete examples of what good prose intent looks like:
+Record the answer verbatim or in close paraphrase under **Fidelity intent** in `constitution.md`'s Goal section. Concrete examples of what good prose intent looks like:
 
 - *"Just checking if the analysis is tractable — quick sanity that some headline number comes out close."*
 - *"I care about Figure 3 being right. The rest can stay rough."*
 - *"Full fidelity on the BAO fit specifically; the rest can stay rough."*
 - *"Every primary and secondary target lining up within stated tolerance, every paper-vs-code conflict adjudicated."*
 
-The orchestrator reads this on every spawn decision and COMPARE grades opportunities against it. If the user genuinely doesn't know yet, write that — *"Not sure yet; let's get something running and revisit"* is itself useful intent, and they can sharpen it at any future REVIEW.
+Each iteration reads this when deciding cheap vs heavy on the next move; COMPARE grades opportunities against it. If the user genuinely doesn't know yet, write that — *"Not sure yet; let's get something running and revisit"* is itself useful intent, and they can sharpen it at any future REVIEW.
 
 ### 4. Paper-specific conventions or warnings
 
-Light touch. Ask the user if there's anything they want every sub-agent to know about this paper up front — a known pitfall, a non-obvious convention, a thing the authors did unusually. These go into CLAUDE.md's **Pointers** section as one-line notes. Skip cleanly if nothing comes to mind; sub-agents surface their own as they work.
+Light touch. Ask the user if there's anything they want every iteration to know about this paper up front — a known pitfall, a non-obvious convention, a thing the authors did unusually. These go into `CLAUDE.md`'s **Pointers** section as one-line notes. Skip cleanly if nothing comes to mind; iterations surface their own as they work.
 
 ---
 
-## Drafting CLAUDE.md
-
-Open the template at [`../templates/CLAUDE.md`](../templates/CLAUDE.md) and fill in:
+## Drafting the two files
 
-- The header (`<paper-slug>`, paper title, arXiv ID, DOI).
-- **Paper** — authors, one-line subject, code repo URL.
-- **Goal** — what "done" looks like; in-scope and out-of-scope; fidelity intent in the user's words.
-- **Pointers** — any paper-specific conventions the user surfaced.
+Open both templates side-by-side:
 
-Leave the **Rigor**, **Paper-vs-code disagreements**, and **Rules** sections in their template state. Rigor and Disagreements grow as sub-agents work; Rules are universal.
+- [`../templates/constitution.md`](../templates/constitution.md) — fill in the header, Goal (with fidelity intent), Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL — these are pointers to substrate, not to the workdir paths, which CLAUDE.md handles), Open dimensions. Leave the YAML frontmatter `status: active` intact.
+- [`../templates/CLAUDE.md`](../templates/CLAUDE.md) — fill in the header (paper title + arXiv ID + DOI + one-line subject), any paper-specific Pointers. Leave Rules in the template state (universal across reproductions). Leave Rigor and Disagreements sections empty — iterations populate them.
 
-Show the draft to the user, take corrections, refine, save to `<paper-slug>/CLAUDE.md`. Then `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline) and commit `CLAUDE.md` as the first commit.
+Show both drafts to the user, take corrections, refine, save. Then `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline) and commit both as the first commit.
 
-After the user approves and the workdir is initialized, launch the ACQUIRE sub-agent. Follow SKILL.md's *Spawning a phase sub-agent* for the announcement pattern — the user needs to know the sub-agent has launched and that they can switch into its chat before its first turn finishes.
+After the user approves and the workdir is initialized, run ACQUIRE in the same main session (see [`acquire.md`](acquire.md)). When ACQUIRE completes, commit the substrate and launch the ralph loop (per SKILL.md's *Launching the loop* section). Tell the user the tmux session name and the attach command, and that you'll be ready for REVIEW close-out when the loop terminates.
 
 ---
 
 ## Discipline
 
 - **The interview is short.** Three to six `AskUserQuestion` rounds, total. If the user is grinding through detail, gently steer back to scope.
-- **CLAUDE.md is the only artifact.** No separate scope document, no interview notes, no constitution. Everything goes in CLAUDE.md.
+- **Two files, both drafted at INTERVIEW.** No deferring — both `constitution.md` and `CLAUDE.md` are committed before ACQUIRE runs and before the loop launches.
 - **Defaults are the path.** When the user says "you choose," take the defaults — full reproduction, the paper's natural sub-analysis structure if any. The defaults reflect what the architecture has learned about which seams matter.
-- **One paper at a time.** A single CLAUDE.md covers one paper. If the user wants two, run the interview twice — two reproduction directories, two CLAUDE.mds.
+- **One paper at a time.** A single `constitution.md` + `CLAUDE.md` pair covers one paper. If the user wants two, run the interview twice — two reproduction directories, two pairs.
 
 ---
 
@@ -105,6 +98,6 @@ Most failure modes resolve into "the user has not yet decided what 'reproduce' m
 - *"Is there a specific decision in the paper you want to vary, or are we trying to match the paper exactly?"* — pins whether universes need to span alternatives.
 - *"What's the moment you'd call this useful — any number coming out, a specific figure matching in shape, the headline matching within stated uncertainty, or every target lining up?"* — pins fidelity intent.
 - *"Are you trying to verify the paper, build on it, or critique it?"* — shifts where the fidelity bar naturally sits.
-- *"Is there anything weird about this paper you want every sub-agent to know up front?"* — pins paper-specific conventions.
+- *"Is there anything weird about this paper you want every iteration to know up front?"* — pins paper-specific conventions.
 
-When these answer cleanly, CLAUDE.md writes itself.
+When these answer cleanly, both files draft themselves.
diff --git a/claude/lightcone/skills/lc-from-paper/templates/constitution.md b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
index f2f47218..b8e249ba 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/constitution.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
@@ -6,8 +6,6 @@ status: active
 
 The driving document for the ralph loop reproducing <paper title> (<arXiv ID>, DOI <doi>). Every iteration reads this on entry to know what "done" looks like and how to size its next move. **Sharpened slowly** — only when something fundamental shifts (target moves, scope opens or fences, a material disagreement makes us re-think a sub-analysis). Running accumulators (per-output rigor state, the disagreements log, opportunities) live in `CLAUDE.md`, not here.
 
-**Closing rule.** An iteration that contributed anything this run cannot flip this constitution's `status:` to `closed`. Closing is reserved for an iteration whose cold survey found nothing left to improve — verdict on disk, accumulators caught up, no open opportunity below the fidelity intent. This adds at least one fresh-eyes review pass on every closing decision.
-
 ## Goal
 
 <What "done" looks like for this reproduction. Concrete: which targets, what verdict against them, what validation passes. E.g.: "A complete `astra.yaml` with recipes that produce reproduced versions of <list of targets>, validated by `astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml` verdict `pass` against the targets in `targets/targets.md`.">

From 5a6d02c84f03b543992863276ee5126ffdd95756 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 04:22:09 +0200
Subject: [PATCH 053/124] lc-from-paper: acquire + architect references for the
 ralph dispatch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

acquire.md: drops the persistent paper-expert / code-expert sub-agent
machinery entirely. ACQUIRE is now what it always actually was — two
parallel sub-skill invocations (`/paper-extraction`, `/lc-from-code`
scan-only) that produce the on-disk substrate. No agent IDs, no
SendMessage, no long-lived expert transcripts. Iterations read the
substrate (`index.json`, `code-index.md`, paper-extraction's astra.yaml
stub) on entry; that orientation is what the experts used to provide.
After both substrate sides land, commit and launch the ralph loop.

architect.md: structural-skeleton work unchanged. Self-review-via-fresh-
context-sub-agent (broken in the new architecture) collapses into review
by iteration boundary: iteration N writes the stub, iteration N+1 reads
fresh and writes review-N.md, iteration N+2 applies fixes, terminating
on two consecutive clean rounds or a 5-iteration cap. The check-list and
findings-file shape from the prior reviewer prompt is preserved; only
the dispatch mechanism changes. Cheap fidelity intent lets the iteration
short-circuit at one clean review pass.
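
The termination rule above can be sketched as a small predicate. This is
only an illustration under stated assumptions, not code from the skill:
the name `review_loop_done`, its arguments, and the choice to count the
cap in review rounds are all hypothetical.

```python
def review_loop_done(rounds_clean, cheap_intent=False, cap=5):
    """Decide whether the review-by-iteration-boundary loop may stop.

    rounds_clean: one bool per completed review round, True when that
                  round's review found nothing to fix (hypothetical shape).
    cheap_intent: True when the fidelity intent is cheap, so a single
                  clean review pass is enough.
    cap:          assumed hard limit on the number of rounds.
    """
    needed = 1 if cheap_intent else 2

    # Hard cap: stop regardless of whether the latest rounds were clean.
    if len(rounds_clean) >= cap:
        return True

    # Count how many of the most recent rounds were clean in a row.
    streak = 0
    for clean in reversed(rounds_clean):
        if not clean:
            break
        streak += 1
    return streak >= needed
```

Under these assumptions, a clean round followed by a dirty one resets
the streak, so `[True, False, True]` keeps the loop running while
`[False, True, True]` ends it.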

Same pattern (review-by-iteration-boundary) will land for specify,
literature, implement next.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../lc-from-paper/references/acquire.md       | 145 +++++++-----------
 .../lc-from-paper/references/architect.md     | 140 +++++++----------
 2 files changed, 112 insertions(+), 173 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/acquire.md b/claude/lightcone/skills/lc-from-paper/references/acquire.md
index 1eac80e7..38ca55e3 100644
--- a/claude/lightcone/skills/lc-from-paper/references/acquire.md
+++ b/claude/lightcone/skills/lc-from-paper/references/acquire.md
@@ -1,127 +1,98 @@
-# ACQUIRE — spawn paper-expert and code-expert
+# ACQUIRE — stand up the on-disk substrate
 
-The orchestrator dispatches two named, persistent sub-agents in parallel: **paper-expert** (which runs `/paper-extraction` to stand up the paper's reading materials) and **code-expert** (which locates and clones the reference code repo, then runs `/lc-from-code` in scan-only mode against it). Their transcripts persist and become the experts ARCHITECT consults via `SendMessage` as it writes the `astra.yaml` stub.
+The pre-loop substrate phase. Runs in the user's main session, right after INTERVIEW has committed `constitution.md` and `CLAUDE.md`. Two parallel sub-skill invocations produce the on-disk material every subsequent ralph iteration consults: `/paper-extraction` for the paper side, `/lc-from-code` in scan-only mode for the code side. Both write to `work/reference/`; both are survey-first and skip already-done work, so re-invoking on a partial state is safe.
+
+There is no `acquire` sub-agent. ACQUIRE's work *is* the two sub-skill invocations. Once they return, commit the substrate and launch the ralph loop (per SKILL.md's *Launching the loop* section).
 
 ## Where this runs
 
-The orchestrator session, directly. There is no `acquire` sub-agent — ACQUIRE's work is two parallel spawns and a wait. The orchestrator captures both agent IDs on return; those IDs are how ARCHITECT reaches the experts.
+User's main session, directly. Sub-skills are invoked as `/paper-extraction <id>` and `/lc-from-code` against the cloned reference repo.
 
 ## Inputs
 
-- The paper's DOI or arXiv ID (from CLAUDE.md's Paper section)
-- An optional code repo URL (from the interview, if the user knew it; recorded in CLAUDE.md)
+- The paper's DOI or arXiv ID (from `constitution.md`'s Evidence section)
+- An optional code repo URL (from the interview, if the user knew it; recorded in `constitution.md`'s Evidence section)
 
 ## Outputs
 
-Two persistent named sub-agents (paper-expert, code-expert), each reachable via `SendMessage` by ID. On disk:
+All on-disk; no persistent agents:
 
-- `work/reference/index.json` — paper-side structural index (figures, tables, outline, citations with resolved DOIs)
+- `work/reference/paper.pdf`
+- `work/reference/source/` (Path A: arXiv LaTeX) **or** `work/reference/document.md` (Path B: Docling fallback)
+- `work/reference/index.json` — paper-side structural index (figures, tables, outline with line numbers, citations with resolved DOIs)
 - `work/reference/astra.yaml` — paper-extraction's ASTRA-shape stub of the paper (id, name, narrative.summary, optionally findings)
-- `work/reference/paper.pdf` and either `work/reference/paper.tex` + `source/` (Path A) or `work/reference/document.md` (Path B)
 - `work/reference/figures/`, `work/reference/tables/`, `work/reference/bibliography-source.{bib,bbl}`
 - `work/reference/code/` — cloned reference repo (absent if not found)
 - `work/reference/code-status.yaml` — record of where the code came from
-- `work/reference/code-index.md` — code-expert's scan output: script inventory, candidate decisions, dependencies, container hints
+- `work/reference/code-index.md` — script inventory, candidate decisions, dependencies, container hints
 
-## Step 1 — Spawn paper-expert
+## Step 1 — Invoke `/paper-extraction`
 
 ```
-Agent(
-  name="paper-expert",
-  prompt="/paper-extraction <doi-or-arxiv-id>",
-  run_in_background=True,
-)
+/paper-extraction <doi-or-arxiv-id>
 ```
 
-paper-expert runs the full `/paper-extraction` workflow and stays alive after it finishes — its transcript holds the deep paper context that ARCHITECT and later phases consult. The skill is idempotent; re-invoking on a partially-populated `work/reference/` is safe.
+This runs the full paper-extraction workflow against the workdir. It writes everything under `work/reference/` listed above. The skill is idempotent; re-invoking on a partially-populated `work/reference/` is safe.
 
-Capture the returned agent ID.
+## Step 2 — Locate, clone, and scan the reference code (parallel with Step 1)
 
-## Step 2 — Spawn code-expert (in parallel)
+In a separate flow inside the same session:
 
-code-expert is a single sub-agent that does *all* the code-side work for ACQUIRE: locate the repo URL, clone it, then run `/lc-from-code` in scan-only mode against the clone. The orchestrator spawns it with explicit instructions to stop at the scan — `/lc-from-code` normally continues into parameterization and execution; here we only want the inventory.
+1. **Locate the reference code repository.**
+   - If a URL was provided at INTERVIEW (in `constitution.md`'s Evidence section), use it.
+   - Otherwise, grep the paper materials in `work/reference/` for repo URLs (abstract, intro, conclusion, footnotes, "Code Availability" / "Data Availability" sections). Path A: grep across `work/reference/source/*.tex`. Path B: grep `work/reference/document.md`. If `/paper-extraction` hasn't finished yet when you need to grep, wait briefly or skip ahead and come back.
+   - If still nothing, web-search: paper title + "github", Papers With Code, or the first author's GitHub profile. A few searches max — record failure and move on.
 
-```
-Agent(
-  name="code-expert",
-  prompt="""
-    You are the code-expert for an lc-from-paper reproduction.
-
-    Repo URL (from INTERVIEW): <url or 'unknown — find it'>
-    Workdir: this directory.
-
-    Your tasks for ACQUIRE:
-
-    1. Locate the reference code repository.
-       - If a URL was provided above, use it.
-       - Otherwise, grep the paper materials in work/reference/ for repo URLs (abstract,
-         intro, conclusion, footnotes, "Code Availability" / "Data Availability" sections).
-         Path A: grep across work/reference/source/*.tex. Path B: grep work/reference/document.md.
-         If still nothing, web-search: paper title + "github", Papers With Code, or the first
-         author's GitHub profile. A few searches max — record failure and move on.
-
-    2. Clone if found:
-         git clone --depth 1 <url> work/reference/code
-
-    3. Write work/reference/code-status.yaml:
-         found: true        # or false
-         url: "https://..."  # null if not found
-         cloned: true       # false if found but clone failed
-         notes: "..."
-
-    4. If work/reference/code/ exists, run /lc-from-code in SCAN-ONLY mode against it:
-       - Invoke /lc-from-code with the working directory at work/reference/code/.
-       - Do ONLY Phase 1's scan (the Explore-subagent inventory pass).
-       - Write the inventory to work/reference/code-index.md.
-       - DO NOT touch astra.yaml at the project root.
-       - DO NOT parameterize any code.
-       - DO NOT run anything.
-       - DO NOT modify the cloned repo.
-
-    5. Stay alive after returning. ARCHITECT will SendMessage you with questions
-       about the code as it writes the stub astra.yaml.
-
-    Report back: paths produced, anything surprising, any structural caveats
-    (no code found, broken clone, gnarly scan, etc.).
-  """,
-  run_in_background=True,
-)
-```
+2. **Clone if found:**
+   ```bash
+   git clone --depth 1 <url> work/reference/code
+   ```
 
-Capture the returned agent ID.
+3. **Write `work/reference/code-status.yaml`:**
+   ```yaml
+   found: true        # or false
+   url: "https://..."  # null if not found
+   cloned: true       # false if found but clone failed
+   notes: "..."
+   ```
 
-If paper-expert hasn't finished writing paper materials yet when code-expert needs to grep for a URL, code-expert can wait briefly or surface that it needs paper materials first. With a URL from INTERVIEW, code-expert is fully independent of paper-expert and runs truly in parallel.
+4. **If `work/reference/code/` exists, run `/lc-from-code` in scan-only mode against it:**
+   - Invoke `/lc-from-code` pointing at the cloned repo.
+   - The scan-only branch of `/lc-from-code` does the inventory pass inline (no Explore sub-agent spawn); it writes to `work/reference/code-index.md`.
+   - Do not touch `astra.yaml` at the project root, do not parameterize any code, do not run anything, do not modify the cloned repo.
 
-## Step 3 — Hand off to ARCHITECT
+`/lc-from-code`'s scan-only branch is the canonical code-inventory mechanism. Its prompt-context surface is what carries the "stop at scan" contract.
 
-When both sub-agents have returned, spawn the architect with both indices in its reading list and both expert agent IDs reachable. The architect's reference is [`architect.md`](architect.md); the spawn pattern lives there.
+## Step 3 — Commit and launch the ralph loop
 
-The handoff payload to architect's prompt:
+When both Step 1 and Step 2 have landed:
 
-```
-- Paper-expert agent ID: <id>
-- Code-expert agent ID:  <id>
-- Read: work/reference/index.json, work/reference/astra.yaml, work/reference/code-index.md
-- Ask the experts (via SendMessage by ID) anything that isn't in the indices.
-```
+1. **Commit the substrate.** Stage `work/reference/` and commit with a short, descriptive message (e.g. "acquire: paper-extraction substrate"). For the code side, commit `code-status.yaml` and `code-index.md`. The `work/reference/code/` clone itself can be `.gitignore`d or committed depending on the project's preference; the inventory file (`code-index.md`) is what downstream iterations actually consult.
+
+2. **Tell the user** the ralph loop is about to launch. Surface anything notable from Step 2 — if `code-status.yaml` records `found: false` or the cloned repo is gnarly, mention it now so the user can adjust scope before iterations start working against the substrate.
+
+3. **Launch the loop** (per SKILL.md's *Launching the loop* section):
+   ```bash
+   .claude/skills/ralph/scripts/ralph constitution.md
+   ```
+   Tell the user the tmux session name and the attach command. Iterations start firing immediately.
 
 ## Survey signals (entry into ACQUIRE)
 
 Run `ls work/reference/` first.
 
-- `paper.pdf` + path indicator (`source/` for Path A, `document.md` for Path B) + `index.json` present → paper-expert's work is done (or paper-expert is still resumable; check whether the agent is still addressable, otherwise re-spawn against the existing materials — `/paper-extraction` is idempotent and will skip done work).
-- `work/reference/code/` present, or `code-status.yaml` records `found: false`, **and** `code-index.md` is present → code-expert's work is done.
-- When both indices are present and both expert agent IDs are recorded, ACQUIRE is complete; proceed to ARCHITECT.
-- Otherwise, re-spawn whichever expert is missing. Both skills are survey-first and skip already-done work.
+- `paper.pdf` + path indicator (`source/` for Path A, `document.md` for Path B) + `index.json` + paper-side `astra.yaml` present → `/paper-extraction` has done its work (or is mid-run; re-invoking is idempotent and will skip done work).
+- `work/reference/code/` present, **or** `code-status.yaml` records `found: false`, **and** `code-index.md` is present → code-side work is done.
+- When both sides are present → ACQUIRE is complete; commit any uncommitted changes and launch the loop.
+- Otherwise, re-invoke whichever side is missing. Both skills are survey-first and skip already-done work.
 
 ## Notes
 
-- **paper-extraction is the substrate authority.** Don't re-fetch the LaTeX source, don't re-run Docling, don't re-parse the paper from inside ACQUIRE. If a substrate need surfaces — including mid-reproduction — fix it in `/paper-extraction`, not here. Bibliography resolution is paper-extraction's: cited-paper text and DOIs live inside `index.json#citations[key]`, not in a side file.
-- **lc-from-code is the code-inventory authority** for the scan portion. ACQUIRE's code-expert prompt constrains it to scan-only; the parameterization and run portions of `/lc-from-code` are not invoked at this phase.
+- **paper-extraction is the substrate authority.** Don't re-fetch the LaTeX source, don't re-run Docling, don't re-parse the paper from inside ACQUIRE. If a substrate need surfaces — including mid-reproduction, raised by an iteration — fix it in `/paper-extraction`, not here. Bibliography resolution is paper-extraction's: cited-paper text and DOIs live inside `index.json#citations[key]`, not in a side file.
+- **lc-from-code is the code-inventory authority** for the scan portion. ACQUIRE's invocation constrains it to scan-only via the prompt; the parameterization and run portions of `/lc-from-code` are not invoked at this phase.
 - **arXiv DOI form is `10.48550/arXiv.<id>`.** Useful when downstream tools want a DOI rather than an arXiv ID.
 - **Equation numbers and section numbers must match the rendered paper.** When citing "eq. N" or "§N" downstream, find by content, not by a naïve count of TeX blocks or markdown headings. Path A: source preserves printed numbers in `\label{}`s. Path B: Docling preserves printed numbers.
-- **This phase is acquisition + on-hand expertise, not understanding.** ACQUIRE doesn't write `astra.yaml` at the project root and doesn't compare paper to code. ARCHITECT does that work, with the experts on hand.
-- **Code-as-canonical** is loaded by every subsequent sub-agent. The per-paper `CLAUDE.md` carries the rule; ACQUIRE just stands up the reference so the rule has something to point at.
-- **The cloned code is read-only reference for the agents.** code-expert's scan reads it; ARCHITECT and later phases may have their experts re-read parts of it; nothing modifies `work/reference/code/`. (When the reproduction's implementation needs to happen later, that's an IMPLEMENT-phase decision, not an ACQUIRE one.)
-- **Commit each artifact as it lands.** The orchestrator can commit paper materials when paper-expert returns, and the code clone + scan when code-expert returns — small, descriptive commits that make `git log` legible.
-- **Surface anti-patterns the experts flag.** If code-expert reports the clone failed or the repo is clearly dead, or paper-expert reports the paper substrate is broken, surface to the user immediately rather than handing a half-acquired workdir to ARCHITECT.
+- **This phase is acquisition, not understanding.** ACQUIRE doesn't write `astra.yaml` at the project root and doesn't compare paper to code. ARCHITECT does that, in the first ralph iteration after the loop launches.
+- **Code-as-canonical** is loaded by every iteration via `CLAUDE.md`'s Rules. ACQUIRE just stands up the reference so the rule has something to point at.
+- **The cloned code is read-only reference.** Iterations may re-read it; nothing modifies `work/reference/code/`. (When the reproduction's implementation needs to happen, that's an IMPLEMENT-phase decision, not an ACQUIRE one.)
+- **Surface anti-patterns from the scan.** If `code-status.yaml` reports the clone failed or the repo is clearly dead, or if `/paper-extraction` reports the paper substrate is broken, surface to the user immediately rather than launching a loop against half-acquired substrate.
diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 7d241bde..797b23dc 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -1,38 +1,36 @@
 # ARCHITECT — write the stub `astra.yaml`
 
-ARCHITECT is the structural seam: decide the sub-analysis decomposition, wire the inputs and outputs at the sub-analysis level, and author high-level narrative prose for each analysis — all in one stub `astra.yaml`. SPECIFY then fills the stub with `decisions:`, `prior_insights:`, `findings:`, and `astra-anchor:` references. Splitting **structure** from **content** keeps each phase's cognitive load manageable: ARCHITECT decides *what the analyses are*; SPECIFY decides *what's inside each one*.
+ARCHITECT is the structural seam: decide the sub-analysis decomposition, wire the inputs and outputs at the sub-analysis level, and author high-level narrative prose for each analysis — all in one stub `astra.yaml`. SPECIFY then fills the stub with `decisions:`, `prior_insights:`, `findings:`, and `astra-anchor:` references. Splitting **structure** from **content** keeps each iteration's cognitive load manageable: ARCHITECT decides *what the analyses are*; SPECIFY decides *what's inside each one*.
 
-This phase runs as the orchestrator-spawned `architect` sub-agent. The heavy work of *understanding* the paper and code already happened in ACQUIRE: paper-expert and code-expert are alive with deep context. ARCHITECT reads their indices, queries them via `SendMessage` for anything the indices don't cover, writes the stub, and self-reviews. No re-ingestion.
+ARCHITECT is what a ralph iteration does when the workdir signals "ACQUIRE substrate present + project-root `astra.yaml` absent (or empty stub)." The heavy work of *understanding* the paper and code happened in `/paper-extraction` and `/lc-from-code`'s scan-only branch; their on-disk substrate (the structural `index.json`, the paper-extraction `astra.yaml`, the `code-index.md`) is what you read on entry. No persistent expert sub-agents; targeted reads against the substrate carry the orientation.
 
 ## Inputs
 
-- `work/reference/index.json` — paper-side structural index from `/paper-extraction` (figures, tables, section outline with line numbers, citations with resolved DOIs)
-- `work/reference/astra.yaml` — paper-extraction's ASTRA-shape stub of the paper itself: id, name, `narrative.summary` (from abstract), optionally `findings:` (paper's claimed numerical results)
-- `work/reference/code-index.md` — code-side inventory from code-expert's scan: script inventory, candidate decisions with file:line refs, module map, entry-points, external data dependencies, container hints
-- **paper-expert** (agent ID handed in by the orchestrator) — reachable via `SendMessage`. Ask anything the indices don't cover: "what does the paper say about the apodization choice", "which figures are primary vs secondary", "where does the paper define the fiducial cosmology", etc.
-- **code-expert** (agent ID handed in by the orchestrator) — reachable via `SendMessage`. Ask: "which module produces the BAO fit posteriors", "where is the magnitude cut applied", "is there a config file we should treat as the canonical baseline", etc.
-- CLAUDE.md — the per-paper artifact at the workdir root; its **Goal** section names the user's intended replication targets and fidelity intent.
+- `constitution.md` — Goal, Fidelity intent, Scope, Quality bar. Read first; the Goal's intended replication targets fence what `outputs:` belong in the stub.
+- `CLAUDE.md` — auto-loaded; Rules + accumulators (still empty at this point).
+- `work/reference/index.json` — paper-side structural index from `/paper-extraction` (figures, tables, section outline with line numbers, citations with resolved DOIs).
+- `work/reference/astra.yaml` — paper-extraction's ASTRA-shape stub of the paper itself: id, name, `narrative.summary` (from abstract), optionally `findings:` (paper's claimed numerical results).
+- `work/reference/code-index.md` — code-side inventory from `/lc-from-code`'s scan: script inventory, candidate decisions with `file:line` refs, module map, entry-points, external data dependencies, container hints.
+- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text. Grep into it for specific facts; do not re-read it whole.
+- `work/reference/code/` (when present) — the cloned reference code. Read targeted modules when `code-index.md` doesn't answer a structural question.
 - `work/notes/notes.md` — user-supplied prior notes, if any.
 
 ## Outputs
 
-- `astra.yaml` — **stub form**: sub-analyses named, architecture wired (inputs / outputs declared at the sub-analysis level), high-level `narrative:` prose blocks per analysis. **No `decisions:`, `prior_insights:`, `findings:`, or `astra-anchor:` references yet** — those entries don't exist for the narrative to reference.
-- `work/notes/architect/review-round-<N>.md` — each self-review round's findings (one file per round; how many rounds depends on the rigor setting the orchestrator chose for this spawn).
+- `astra.yaml` at the project root — **stub form**: sub-analyses named, architecture wired (inputs / outputs declared at the sub-analysis level), high-level `narrative:` prose blocks per analysis. **No `decisions:`, `prior_insights:`, `findings:`, or `astra-anchor:` references yet** — those entries don't exist for the narrative to reference.
+- `CLAUDE.md` updates: Rigor *Current state* appended with the stub's state (e.g. *stub: baseline* after a single-iteration write, *stub: tightened* if this iteration was a review pass that incorporated fixes).
 
-The architect sub-agent's transcript persists alongside paper-expert and code-expert — later phases can `SendMessage` it with "you wrote this stub; why this decomposition?" if a downstream question needs the writing-time reasoning.
+## Step 1 — Read the substrate, then write the stub
 
-## Step 1 — Write the stub `astra.yaml`
-
-Read the three indices first. Then query the experts as you write — paper-expert for paper-specific facts, code-expert for code-specific facts. Don't try to absorb the paper or code yourself; the experts already have that context built up.
+Read `constitution.md`, `CLAUDE.md`, `work/reference/index.json`, `work/reference/code-index.md`, and the paper-extraction `astra.yaml` first. Then for anything the indices don't answer, Grep into `work/reference/source/` (Path A) or `document.md` (Path B), or read targeted modules in `work/reference/code/`. Don't try to absorb the paper or code whole; the indices give you the orientation, and targeted reads fill in specifics.
 
 ### What to do
 
-1. **Reconcile sub-analysis decompositions.** Read `code-index.md`'s natural-decomposition section and `index.json`'s section outline. Where paper and code agree on a stage, use that name (noun-phrase, e.g. `reconstruction`). Where they disagree, **code's structure is canonical for stage boundaries** — the paper compresses; the code reveals the actual decomposition. Where code is absent or thin, follow the paper alone. Ask code-expert to clarify any module-boundary ambiguity; ask paper-expert how the paper itself frames stage boundaries.
+1. **Reconcile sub-analysis decompositions.** Read `code-index.md`'s natural-decomposition section and `index.json`'s section outline. Where paper and code agree on a stage, use that name (noun-phrase, e.g. `reconstruction`). Where they disagree, **code's structure is canonical for stage boundaries** — the paper compresses; the code reveals the actual decomposition. Where code is absent or thin, follow the paper alone. Where module boundaries are genuinely ambiguous, read the relevant modules under `work/reference/code/` to settle it.
 2. **Choose: one analysis or sub-analyses?** If the paper has only one stage end-to-end (no clean intermediate handoffs), write a single analysis. If it has genuinely independent stages (each stage's output flows as the next's input), write sub-analyses. Sub-analysis IDs must be noun phrases: `reconstruction`, `clustering`, `bao_fit`. Avoid reserved names: `inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`.
 3. **Wire inputs and outputs at the sub-analysis level.** For each sub-analysis:
    - Declare `inputs:` from `code-index.md`'s External-data-dependencies plus any paper-named external datasets. The depth (acquisition path, selection criteria) is SPECIFY's; ARCHITECT names the input and gives it a stable id.
-   - Declare `outputs:` matching the result loci from `index.json` (figures + tables) plus any intermediate artifacts a downstream sub-analysis consumes. Tag each output's `priority:` from the paper's emphasis (primary / secondary). **The reproduction's targeted scope from CLAUDE.md's Goal takes precedence** — if the user only wants Figure 3 and Table 2, only those land as `outputs:`; the rest are out-of-scope and noted as such.
-   - Ask paper-expert which results the paper itself emphasizes if priority is unclear.
+   - Declare `outputs:` matching the result loci from `index.json` (figures + tables) plus any intermediate artifacts a downstream sub-analysis consumes. Tag each output's `priority:` from the paper's emphasis (primary / secondary). **The reproduction's targeted scope from `constitution.md`'s Scope takes precedence** — if the user only wants Figure 3 and Table 2, only those land as `outputs:`; the rest are out-of-scope and noted as such.
 4. **Author the root and per-analysis narrative.** Invoke `/narrative` for prose authoring (it carries the discipline on reserved names, voice, the data-flow paragraph requirement). High-level prose only — **no `astra-anchor:` references yet**, because the entries those would point at don't exist. SPECIFY will weave in anchors as it authors `decisions:` / `prior_insights:` / `findings:` per sub-analysis. The root `narrative:` MUST include a top-down end-to-end data-flow paragraph (per the narrative skill's data-flow rules) when sub-analyses exist.
 5. **Validate.** `astra validate astra.yaml` must return clean — even with empty `decisions:` / `prior_insights:` / `findings:` blocks, the structural fields and narrative prose must pass schema checks.
 
@@ -77,94 +75,64 @@ analyses:
 - **Stub, not snapshot.** Don't try to author content for `decisions:`, `prior_insights:`, `findings:`. Those go in SPECIFY. Your job is the structural skeleton.
 - **Reserved names.** Sub-analysis IDs are noun phrases; avoid the reserved set. Each ID must be unique across the spec.
 - **Code-as-canonical for structure.** Where paper and code disagree on the decomposition, the code's structure is canonical (the paper compresses for narrative; the code reveals real seams).
-- **Targeted scope wins.** CLAUDE.md's **Goal** scopes the reproduction. If the user only wants Figures 3–4 plus Table 2, only those land as `outputs:`.
+- **Targeted scope wins.** `constitution.md`'s Scope fences the reproduction. If the user only wants Figures 3–4 plus Table 2, only those land as `outputs:`.
 - **Narrative prose, no anchors.** Author `narrative:` prose at root and per-sub-analysis levels. Do NOT add `astra-anchor:` references — the entries those would point at don't exist yet.
 - **Validate before exit.** `astra validate astra.yaml` must return clean.
-- **Don't re-ingest.** The experts have already read the paper and code in depth. Query them; don't try to absorb the materials yourself. Your context window is for synthesis, not absorption.
+- **Targeted reads, not whole-paper absorption.** The indices give you most of what you need; reach into the source / document / code for specific items, not as a default.
 
-## Step 2 — Self-review (rigor chosen per spawn)
+After the stub is written and validates, commit it (`architect: stub astra.yaml`) and update `CLAUDE.md`'s Rigor with the stub's state (e.g. *stub: baseline*).
 
-After the stub lands, a fresh-context sub-agent cross-checks it against paper + code: are the sub-analyses the right decomposition? Are the inputs and outputs declared at the sub-analysis level wired correctly? Does the narrative prose accurately describe what each sub-analysis does?
+## Review — the next iteration
 
-The depth of self-review is set by the rigor level the orchestrator picked when it spawned this `architect` sub-agent — read CLAUDE.md's **Rigor** section for the current state and what the orchestrator flagged as the chosen rigor for this spawn:
+There is no in-iteration review-round mechanism. The ralph loop's iteration boundary *is* the fresh-context review: iteration N writes the stub; iteration N+1 reads it fresh and reviews; iteration N+2 applies fixes if any; the cycle terminates when two consecutive review iterations find nothing to fix, or after a cap of 5 review iterations on this artifact. The fresh-context, no-bias property comes automatically at iteration boundaries.
 
-- **Cheap:** skip review entirely, or run a single fresh-context reviewer pass and incorporate its fixes once.
-- **Heavy:** N rounds — each round spawns a fresh reviewer against `astra.yaml` + the ACQUIRE indices + the experts; the architect sub-agent incorporates fixes; the next round spawns another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes, or a 5-round system cap.
+When a subsequent iteration surveys the workdir and finds that the stub `astra.yaml` exists but no `work/notes/architect/review-<N>.md` does (or that the prior review iteration left findings to apply), its work looks like this:
 
-Each round spawns a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; the architect sub-agent edits the stub between rounds (or for trivial mechanical fixes, the orchestrator can do the edit directly).
+### When entering as a review iteration
 
-After self-review terminates, the architect sub-agent updates CLAUDE.md's **Rigor** section with the post-spawn state of `astra.yaml` (e.g. *stub: baseline* after a cheap pass, *stub: tightened* after heavy review).
+Don't edit `astra.yaml` on the first review pass — read it fresh and write findings. Apply fixes in a follow-up iteration so the next fresh iteration can review the fixes too.
 
-### Per-round fresh sub-agent — prompt shape
+Write findings to `work/notes/architect/review-<N>.md` (incrementing `<N>` based on existing files). For the first review iteration after the stub lands, `<N> = 1`; for the next, `<N> = 2`; and so on.
 
-```
-You are an ARCHITECT-stub reviewer. Read astra.yaml (the stub) and report
-structural inconsistencies. You are one of several independent reviewers;
-do not assume anything has already been fixed.
-
-Inputs:
-  - astra.yaml — the stub under review. decisions: / prior_insights: /
-    findings: are intentionally empty; do NOT flag those as missing.
-  - work/reference/index.json — paper structural index
-  - work/reference/astra.yaml — paper-extraction's paper-as-ASTRA stub
-  - work/reference/code-index.md — code inventory
-  - paper-expert agent ID: <id> — SendMessage for paper-side questions
-  - code-expert agent ID:  <id> — SendMessage for code-side questions
-  - CLAUDE.md — for the Goal section's scope fence
-
-What to check:
-  1. Sub-analysis decomposition. Right cuts? Consistent with code-index?
-     Defensible against the paper where the paper compresses?
-  2. Sub-analysis IDs. Noun phrases. No reserved-name collisions
-     (inputs, outputs, decisions, findings, prior_insights, analyses,
-      options, content, narrative).
-  3. Inputs at sub-analysis level. Each input has a stable id; the data
-     dependency is real (cross-check against code-index.md's
-     External-data-dependencies and the paper's data section).
-  4. Outputs at sub-analysis level. Each output corresponds to a result
-     locus from index.json OR an intermediate artifact a downstream
-     sub-analysis consumes. Targeted scope from CLAUDE.md's Goal is
-     honored — no out-of-scope outputs sneaking in, no in-scope targets
-     missed.
-  5. Narrative coverage. Root narrative includes a data-flow paragraph
-     (when sub-analyses exist). Each sub-analysis's narrative accurately
-     describes its role. No astra-anchor: references at this stage; flag
-     any that snuck in.
-  6. Validates. astra validate astra.yaml returns clean.
-
-What NOT to do:
-  - Do not flag empty decisions: / prior_insights: / findings:. That's
-    SPECIFY's territory.
-  - Do not edit any file. Output findings only.
-  - Do not re-read the entire paper or code. Use the indices and ask the
-    experts.
-  - Do not assume a prior reviewer has been here. You are fresh.
-
-Output: work/notes/architect/review-round-<N>.md (findings + verdict).
-```
+### What to check
+
+1. **Sub-analysis decomposition.** Right cuts? Consistent with `code-index.md`? Defensible against the paper where the paper compresses?
+2. **Sub-analysis IDs.** Noun phrases. No reserved-name collisions (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
+3. **Inputs at sub-analysis level.** Each input has a stable id; the data dependency is real (cross-check against `code-index.md`'s External-data-dependencies and the paper's data section).
+4. **Outputs at sub-analysis level.** Each output corresponds to a result locus from `index.json` OR an intermediate artifact a downstream sub-analysis consumes. Targeted scope from `constitution.md`'s Scope is honored — no out-of-scope outputs sneaking in, no in-scope targets missed.
+5. **Narrative coverage.** Root narrative includes a data-flow paragraph (when sub-analyses exist). Each sub-analysis's narrative accurately describes its role. No `astra-anchor:` references at this stage; flag any that snuck in.
+6. **Validates.** `astra validate astra.yaml` returns clean.
+
+### What NOT to do during review
+
+- Don't flag empty `decisions:` / `prior_insights:` / `findings:`. That's SPECIFY's territory.
+- Don't edit `astra.yaml` on the review iteration itself — write findings, exit, let the next iteration apply fixes (and the iteration after that re-review the fixes).
+- Don't re-read the entire paper or code. Use the indices and targeted reads.
+
+### Review-fix pass
+
+The iteration after the review-iteration reads `work/notes/architect/review-<N>.md`, applies the fixes to `astra.yaml`, commits (`architect: apply review-N fixes`), updates `CLAUDE.md`'s Rigor (e.g. *stub: tightened* after review-N fixes land), and exits. The iteration after *that* is the next review-iteration — fresh context, no memory of the prior round's fixes.
 
 ### Termination
 
-- **Cheap:** one pass. Done after fixes (or immediately, if `fixes_needed` was 0).
-- **Heavy:**
-  - Round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
-  - First round (N=1): spawn round 2 unconditionally so we can compare.
-  - Round N produced fixes: spawn round (N+1) as a fresh sub-agent that does not see round N's findings or fixes.
-  - 5-round cap without two consecutive clean rounds: stop, report back to orchestrator. If user is reachable, ask in prose: "ARCHITECT review reached round cap with N fixes still landing; continue, accept the current stub, or revise scope?" If unreachable, accept the current stub, log the unfinished tail in `open-questions.md`, and let the orchestrator decide whether to proceed to SPECIFY or re-spawn ARCHITECT later.
+- If two consecutive `work/notes/architect/review-<N>.md` files both have verdict `clean`, ARCHITECT is done; the next iteration's survey advances to SPECIFY.
+- If 5 review iterations have happened without two consecutive clean rounds, log the unfinished tail to `open-questions.md` ("ARCHITECT review reached round cap with N fixes still landing; user should review during REVIEW close-out") and let the next iteration advance to SPECIFY anyway. Don't loop forever on stub-level review.
+- If the iteration's fidelity-intent assessment calls for *cheap*, a `clean` verdict on the first review-iteration is enough; skip the second-clean-round requirement and move on. The Rigor accumulator stays *stub: baseline*.
+
+This "review by iteration boundary" pattern is the default. For phases where parallelism actually pays (LITERATURE with many cited papers, SPECIFY with many independent sub-analyses, IMPLEMENT with many outputs), the relevant reference describes in-iteration fan-out as an alternative. ARCHITECT is small and serial enough that sequential-via-iteration is always the right call.
 
 ## Survey signals (entry into ARCHITECT)
 
-- `work/reference/index.json` + `work/reference/astra.yaml` + `work/reference/code-index.md` (when code present) exist ⇒ ACQUIRE indices are ready
-- paper-expert and code-expert agent IDs received from the orchestrator ⇒ experts are reachable
-- `astra.yaml` exists at project root; `astra validate astra.yaml` returns clean; sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty ⇒ stub written
-- For cheap: `work/notes/architect/review-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ ARCHITECT done
-- For heavy: two consecutive `work/notes/architect/review-round-<N>.md` files both with verdict `clean` ⇒ ARCHITECT done; orchestrator proceeds to SPECIFY
+- `work/reference/index.json` + `work/reference/astra.yaml` + `work/reference/code-index.md` (when code present) exist ⇒ ACQUIRE substrate is ready
+- `astra.yaml` at project root absent (or present-but-empty) ⇒ this iteration writes the stub
+- `astra.yaml` exists, validates clean, sub-analyses + inputs + outputs + narrative populated, `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty, but no `work/notes/architect/review-1.md` ⇒ this iteration writes review-1
+- `review-N.md` exists with `needs-fixes` verdict, fixes not yet applied ⇒ this iteration applies the fixes
+- Two consecutive `review-<N>.md` files both `clean` ⇒ ARCHITECT done; next iteration surveys for SPECIFY
 
 ## Notes
 
-- **Experts replace re-ingestion.** ACQUIRE's paper-expert and code-expert are alive with deep context. ARCHITECT does not spawn its own Explore sub-agents; it queries the experts. This keeps the architect sub-agent's context lean.
+- **No persistent expert sub-agents.** The on-disk substrate (`index.json`, `code-index.md`, the paper-extraction `astra.yaml`) carries the orientation iterations need. Targeted reads on entry replace what used to be SendMessage queries to long-lived experts.
 - **The stub's empty blocks are intentional.** `decisions: {}`, `prior_insights: {}`, `findings: {}` make it clear at a glance that ARCHITECT's job is structural and SPECIFY fills them. Don't try to half-author content — empty is honest.
 - **Code-as-canonical for structure, paper-as-canonical for narrative voice.** The code reveals where the real stage boundaries are; the paper provides the words to describe them. The stub uses both.
-- **Resume is automatic.** If `astra.yaml` already validates and has the structural fields populated, on re-spawn the architect sub-agent skips Step 1 and runs Step 2 (review) only.
 - **The narrative skill is the prose author, not the structure author.** Invoke `/narrative` for the prose blocks; ARCHITECT's job is the structural skeleton plus invoking `/narrative` to fill the `narrative:` keys cleanly.
-- **Commit each artifact as it lands.** The orchestrator reads `git log` to see how far the architect sub-agent got. Stub commits before any review-round files; review-round files commit one per round. Small, descriptive commits keep the trail readable.
+- **Commit each artifact as it lands.** Stub commits before any review-N file; review-N files commit one per iteration; each fix pass commits separately. Small, descriptive commits keep `git log` legible to the next iteration.

From bf55fce1001f0e8404c46b167779ed8c14c37b32 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 04:27:00 +0200
Subject: [PATCH 054/124] docs + bundle index: catch up with the ralph dispatch
 shape
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

User-facing surfaces now describe the new architecture rather than the
orchestrator-and-sub-agents framing:

- README.md: /lc-from-paper paragraph rewritten — INTERVIEW + ACQUIRE in
  user's main session, ralph loop for the long middle, REVIEW back in
  main session after the loop closes.
- claude/lightcone/skills/README.md: bundle index gains a /ralph row
  alongside the project-lifecycle skills and reframes /lc-from-paper as
  the reproduction driver (not orchestrator); names /ralph as the loop
  substrate /lc-from-paper invokes.
- docs/skills/lc-from-paper.md: page-level rewrite. Two-piece
  architecture; nine-phase table with the right "where it runs" labels;
  per-paper substrate split (constitution.md + CLAUDE.md); related-skills
  list adds /ralph.
- docs/skills/index.md: ralph row in the project-lifecycle table; plugin
  layout shows ralph/ + the templates/ split.
- docs/skills/ralph.md: new page covering authoring + launching + loop
  discipline (mirrors the SKILL.md three-modes shape).
- zensical.toml: ralph nav entry.
- lc-from-code SKILL.md: scan-only branch tightened — do inventory
  inline (Read/Glob/Grep) rather than dispatching the Explore sub-agent.
  Fresh-migration mode keeps the Explore dispatch (runs in user's main
  session, no nesting risk). Restructures Phase 1 prose so both
  invocation contexts share the discipline but split on the dispatch
  mechanism.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 README.md                                     |   2 +-
 claude/lightcone/skills/README.md             |  16 +-
 claude/lightcone/skills/lc-from-code/SKILL.md |  15 +-
 docs/skills/index.md                          |   6 +-
 docs/skills/lc-from-paper.md                  | 145 ++++++++++--------
 docs/skills/ralph.md                          | 103 +++++++++++++
 zensical.toml                                 |   1 +
 7 files changed, 209 insertions(+), 79 deletions(-)
 create mode 100644 docs/skills/ralph.md

diff --git a/README.md b/README.md
index 489a1e2a..78ec33eb 100644
--- a/README.md
+++ b/README.md
@@ -42,7 +42,7 @@ Scans an existing codebase, drafts an `astra.yaml` that captures its inputs, out
 
 ### `/lc-from-paper` — Reproduce a published paper
 
-Interview-first orchestrator for reproducing a published paper in ASTRA. Drafts a per-paper `CLAUDE.md`, then runs as a persistent orchestrator session that spawns named per-phase sub-agents (architect, specify, literature, implement, run, compare) the user can drop into directly. The bookends — INTERVIEW, ACQUIRE, and REVIEW — run in the orchestrator session itself; ACQUIRE spawns two persistent expert sub-agents (`paper-expert`, `code-expert`) that downstream phases consult via `SendMessage` instead of re-ingesting materials. Composes a bundle of sibling skills (paper-extraction, narrative, figure-comparison, check-sentence-by-sentence). See [`claude/lightcone/skills/README.md`](claude/lightcone/skills/README.md) for the full bundle map.
+Interview-first driver for reproducing a published paper in ASTRA. INTERVIEW + ACQUIRE run in the user's main session — drafting a per-paper `constitution.md` (the ralph loop's driving document) plus a `CLAUDE.md` (auto-loading rules + accumulators), then standing up the on-disk substrate (`/paper-extraction` for the paper, `/lc-from-code` in scan-only mode for the code). Then the rest of the reproduction hands off to a **ralph loop** whose iterations carry the long middle: ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE. Each iteration runs in a fresh tmux session against the constitution; the fresh-context property between iterations is what makes per-phase review work. When the loop closes (constitution `status: closed` after COMPARE returns `pass`), REVIEW runs back in the user's main session. Composes a bundle of sibling skills (`ralph`, `paper-extraction`, `narrative`, `figure-comparison`, `check-sentence-by-sentence`). See [`claude/lightcone/skills/README.md`](claude/lightcone/skills/README.md) for the full bundle map.
 
 ### `/lc-feedback` — Report a bug
 
diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 9e223ff4..2dec2528 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -10,6 +10,7 @@ Each subdirectory is one Claude Code skill: `SKILL.md` plus optional `references
 | `lc-from-code` | Bring an existing codebase into ASTRA — scan, spec, parameterize. |
 | `lc-from-paper` | Reproduce a published paper in ASTRA (paper-reproduction bundle entry point — see below). |
 | `lc-feedback` | Report bugs and feature requests upstream. |
+| `ralph` | Author a constitution and run a ralph loop against it (authoring + launching + iterating in one skill). `lc-from-paper` uses this for the long middle of a reproduction; standalone for any other long-running work. |
 
 ## Paper-reproduction bundle
 
@@ -17,16 +18,17 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 
 | Skill | Role |
 |---|---|
-| [`lc-from-paper`](lc-from-paper/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper `CLAUDE.md`, then runs as a persistent orchestrator session that spawns named per-phase sub-agents the user can drop into directly. Nine phases — INTERVIEW → ACQUIRE → ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE → REVIEW — bookended by INTERVIEW and REVIEW running in the orchestrator session itself; the seven phases between are sub-agent dispatches. Rigor is chosen per spawn from CLAUDE.md's Rigor section, not as a global dial. |
-| [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by lc-from-paper during SPECIFY. |
-| [`paper-extraction`](paper-extraction/SKILL.md) | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: structural index (figures, tables, outline, citations) plus a stub `astra.yaml` for the paper. Primary acquisition path for lc-from-paper's ACQUIRE phase. |
-| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from lc-from-paper's REVIEW close-out (opt-in); also user-invokable directly. |
-| [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from lc-from-paper's REVIEW close-out (mandatory); also user-invokable directly. |
+| [`lc-from-paper`](lc-from-paper/SKILL.md) | **Reproduction driver.** Interview-first; INTERVIEW + ACQUIRE run in the user's main session (drafts a per-paper `constitution.md` + `CLAUDE.md`, stands up the substrate via `/paper-extraction` and `/lc-from-code` scan-only). Then hands off to a ralph loop whose iterations carry the long middle: ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE. When the loop closes (constitution `status: closed` after COMPARE returns `pass`), REVIEW runs back in the user's main session. Fidelity intent — captured as prose at INTERVIEW — is what every iteration translates into per-move cheap/heavy decisions, and what COMPARE grades opportunities against. |
+| [`ralph`](ralph/SKILL.md) | The loop substrate. `lc-from-paper`'s INTERVIEW invokes `/ralph`'s Authoring mode to draft the per-paper constitution; ACQUIRE's hand-off invokes the launcher. Each iteration runs `/ralph`'s Loop protocol against the constitution. |
+| [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by `lc-from-paper`'s ARCHITECT (for the structural narrative) and SPECIFY (for anchored content narrative). |
+| [`paper-extraction`](paper-extraction/SKILL.md) | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: structural index (figures, tables, outline, citations with resolved DOIs) plus a stub `astra.yaml` for the paper. Primary acquisition path for `lc-from-paper`'s ACQUIRE; also invoked per cited paper by LITERATURE. |
+| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from `lc-from-paper`'s REVIEW close-out (opt-in); also user-invokable directly. |
+| [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from `lc-from-paper`'s REVIEW close-out (mandatory); also user-invokable directly. |
 
-The full reproduction story spans these five skills. lc-from-paper's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about lc-from-paper.
+The full reproduction story spans these skills. `lc-from-paper`'s `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about `lc-from-paper`.
 
 ### Why bundle (not depend on plugin install)
 
-- **Testability.** We want to verify lc-from-paper invokes its sibling skills correctly. That only works when all are in the same checkout.
+- **Testability.** We want to verify `lc-from-paper` invokes its sibling skills correctly. That only works when all are in the same checkout.
 - **Single install path.** `lc init` brings the full toolkit. Adding a separate plugin-marketplace step is friction we don't need.
 - **Future consolidation is open.** The long-run shape may be `astra` ships skills in `astra`, `lc` ships skills in `lightcone-cli`, plus a centralized external-skills list. Today: bundle it all. See [[lightcone/skills-location-policy]].
diff --git a/claude/lightcone/skills/lc-from-code/SKILL.md b/claude/lightcone/skills/lc-from-code/SKILL.md
index c6882d91..f3a0a158 100644
--- a/claude/lightcone/skills/lc-from-code/SKILL.md
+++ b/claude/lightcone/skills/lc-from-code/SKILL.md
@@ -12,7 +12,7 @@ End-to-end migration: scan existing code, draft or add to `astra.yaml`, paramete
 
 This skill has two invocation contexts. The first is the user-driven default described in the phases below: do the full scan → spec → parameterize → run flow.
 
-The second is **scan-only**, used when `/lc-from-paper`'s ACQUIRE spawns this skill as `code-expert`. The orchestrator's prompt will tell you explicitly to *do only Phase 1's scan*, write the inventory to a path it specifies (typically `work/reference/code-index.md`), and **stop** — do not touch `astra.yaml` at the project root, do not parameterize any code, do not run anything, do not modify the cloned repo. After scanning, stay alive: ARCHITECT and later phases will `SendMessage` you with questions about the code as they write the spec. Trust the spawn prompt's instructions over the defaults below; if the prompt says scan-only, the scan-only contract holds.
+The second is **scan-only**, used when `/lc-from-paper`'s ACQUIRE invokes this skill against a cloned reference repo at `work/reference/code/`. The invocation prompt will tell you explicitly to *do only Phase 1's scan*, write the inventory to a path it specifies (typically `work/reference/code-index.md`), and **stop** — do not touch `astra.yaml` at the project root, do not parameterize any code, do not run anything, do not modify the cloned repo. In scan-only mode, **do the inventory work inline** (using Read, Glob, Grep directly) rather than dispatching the Explore sub-agent that fresh-migration mode uses below. The scan-only branch can run nested inside another agent context, where dispatching a further sub-agent is not safe, and the inventory is bounded enough to do inline. Trust the invocation prompt's instructions over the fresh-migration defaults below; if the prompt says scan-only, the scan-only contract holds.
 
 ## References
 
@@ -25,7 +25,9 @@ First, read the Decisions section of [ASTRA Reference](../../guides/astra-refere
 - **Fresh migration:** no meaningful `astra.yaml` exists yet. Use the code scan to draft `astra.yaml` and `universes/baseline.yaml`.
 - **Augment existing ASTRA:** `astra.yaml` already exists from a paper, user interview, or prior ASTRA work. Use the code scan to add to the current spec — recipes, dependencies, containers, code-backed decision options, baseline selections, implementation notes, and missing inputs / outputs where they naturally belong. Do not create a second `astra.yaml`, do not replace the existing structure wholesale, and surface major structure conflicts to the user before reshaping the spec.
 
-Then spawn an Explore subagent to scan the project. Include the decision criteria in the prompt so the subagent can classify candidates:
+### Scanning the project
+
+In **fresh migration** mode (user's main session, full migration flow), spawn an Explore subagent to scan the project. Include the decision criteria in the prompt so the subagent can classify candidates:
 
 ```
 Agent(subagent_type="Explore", prompt="""
@@ -60,7 +62,14 @@ For reference, here are the decision criteria for classifying candidates:
 """)
 ```
 
-Write the scan results to `CLAUDE.md` under `## Project Notes` as a script inventory, then draft or add to `astra.yaml` from the scan results following the spec structure documented in `.claude/guides/astra-reference.md`. Use the decision criteria from [ASTRA Reference](../../guides/astra-reference.md) to filter the subagent's candidate decisions down to only true analytical choices — most hardcoded values are implementation details, not decisions. Use current hardcoded values as defaults.
+In **scan-only** mode (invoked by `/lc-from-paper` ACQUIRE), do the same inventory work inline using Read / Glob / Grep:
+
+- `Glob` for `**/*.py`, `**/*.ipynb`, `**/Dockerfile`, `**/Containerfile`, `**/requirements*.txt`, `**/environment*.yml`, `**/pyproject.toml`, and any other relevant dependency / container manifests. Inventory the matches.
+- For each script and notebook, `Read` it (paginating with offset / limit for large files) to identify what it does, what it reads / writes, and any hardcoded analytical choices with `file:line` references.
+- `Grep` for repeated patterns when surveying for candidate decisions across the tree (magic numbers, common method-selector patterns, config-dict keys).
+- Apply the same decision criteria from the Decisions section of ASTRA Reference to classify candidates; the criteria are the filter regardless of whether the inventory came from an Explore sub-agent or inline reads.
+
+Either way, write the scan results as a script inventory: to `CLAUDE.md` under `## Project Notes` in fresh migration mode, or to the path the invocation prompt specifies in scan-only mode (typically `work/reference/code-index.md`). In scan-only mode, stop once the inventory file lands; do not touch `astra.yaml`. In fresh migration mode, then draft or add to `astra.yaml` from the scan results following the spec structure documented in `.claude/guides/astra-reference.md`. In both modes, use the decision criteria from [ASTRA Reference](../../guides/astra-reference.md) to filter candidate decisions down to only true analytical choices — most hardcoded values are implementation details, not decisions. Use current hardcoded values as defaults.
 
 In augment mode, preserve the existing paper-derived or user-derived `inputs`, `outputs`, `decisions`, `findings`, and `narrative` unless the code scan shows a real conflict. Attach code evidence to the nearest existing home first. Create new ASTRA structure only when the code reveals a real analysis object that has no suitable home in the current spec.
 
diff --git a/docs/skills/index.md b/docs/skills/index.md
index e3b25da3..23a77491 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -21,8 +21,9 @@ user-invokable directly.
 |-------|---------|---------|
 | [lc-new](lc-new.md) | `/lc-new` | Scope a research question into an `astra.yaml`, with optional literature extraction. |
 | [lc-from-code](lc-from-code.md) | `/lc-from-code` | Wrap an existing codebase in ASTRA: scan, generate spec, parameterize, run. |
-| [lc-from-paper](lc-from-paper.md) | `/lc-from-paper` | Reproduce a published paper in ASTRA — interview-first orchestrator that spawns named per-phase sub-agents. |
+| [lc-from-paper](lc-from-paper.md) | `/lc-from-paper` | Reproduce a published paper in ASTRA — interview-first driver that hands off to a ralph loop for the long middle. |
 | [lc-feedback](lc-feedback.md) | `/lc-feedback` | File a GitHub issue against the right Lightcone repo with auto-collected context. |
+| [ralph](ralph.md) | `/ralph` | Author a constitution and run a ralph loop against it. Used by `lc-from-paper` for the long middle; standalone for any other long-running work. |
 
 ### Paper-reproduction bundle (sibling skills)
 
@@ -67,8 +68,9 @@ claude/lightcone/
 ├── skills/
 │   ├── lc-new/{SKILL.md, references/*.md}
 │   ├── lc-from-code/SKILL.md
-│   ├── lc-from-paper/{SKILL.md, references/*.md, templates/CLAUDE.md}
+│   ├── lc-from-paper/{SKILL.md, references/*.md, templates/{constitution.md, CLAUDE.md}}
 │   ├── lc-feedback/SKILL.md
+│   ├── ralph/{SKILL.md, references/*.md, scripts/ralph}
 │   ├── paper-extraction/{SKILL.md, scripts/*.py}
 │   ├── narrative/{SKILL.md, references/*.md}
 │   ├── figure-comparison/{SKILL.md, scripts/*.py}
diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index f9031361..01f30e56 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -1,12 +1,16 @@
 # /lc-from-paper
 
 Reproduce a published scientific paper as a complete ASTRA project. The
-skill is an **orchestrator**: it opens with an interactive interview,
-drafts a per-paper `CLAUDE.md`, then runs as a persistent session that
-spawns named per-phase sub-agents the user can drop into directly.
+skill is **interview-first** and **ralph-driven**: INTERVIEW + ACQUIRE
+run in the user's main session to set up the per-paper substrate, then
+a ralph loop carries the long middle (ARCHITECT → SPECIFY → LITERATURE
+→ IMPLEMENT → RUN → COMPARE) across many iterations against the same
+constitution, with REVIEW returning to the user's main session after
+the loop closes.
 
 `/lc-from-paper` is the entry point of the paper-reproduction bundle.
-The four sibling skills ([`paper-extraction`](paper-extraction.md),
+The sibling skills ([`ralph`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/ralph/SKILL.md)
+for the loop, [`paper-extraction`](paper-extraction.md),
 [`narrative`](narrative.md), [`figure-comparison`](figure-comparison.md),
 [`check-sentence-by-sentence`](check-sentence-by-sentence.md)) are
 co-located in the same plugin and invoked by role across the phases.
@@ -15,63 +19,62 @@ Source: [`claude/lightcone/skills/lc-from-paper/SKILL.md`](https://github.com/Li
 
 ## Architecture
 
-The orchestrator never absorbs paper or code content directly — it
-spawns sub-agents and reads what they return. Each sub-agent gets its
-own context window, runs one phase, commits its work to git, and exits.
-The orchestrator holds the through-line: user intent, what's been done,
-what's next, how rigorously to spawn the next phase.
+Two pieces:
 
-Two persistent sub-agents — `paper-expert` and `code-expert` — are
-spawned during ACQUIRE and stay alive for the rest of the reproduction.
-Later phases query them via `SendMessage` instead of re-ingesting
-materials.
+1. **Interactive bookends in the user's main session.** INTERVIEW and
+   REVIEW are conversations with the user. ACQUIRE is two parallel
+   sub-skill invocations (`/paper-extraction` and `/lc-from-code` in
+   scan-only mode) that produce the on-disk substrate everything
+   downstream consults.
 
-**The user can interact with any sub-agent directly.** When the
-orchestrator spawns one, it appears as a chat surface (typically at the
-bottom of the screen). The user switches in for turn-by-turn dialogue,
-switches back out, and the sub-agent stays addressable.
+2. **A ralph loop for the long middle.** Once `constitution.md` is
+   drafted (INTERVIEW) and the substrate is on disk (ACQUIRE),
+   `/lc-from-paper` launches a ralph loop against the constitution.
+   Each iteration starts a fresh tmux-detached Claude session with the
+   constitution as system prompt, surveys the workdir, picks the next
+   valuable move (typically one phase's worth of work), does it,
+   commits, exits. The fresh-context property is automatic — iteration
+   N+1 reads N's work without bias, which makes per-phase review
+   collapse into "the next iteration is the review."
+
+Parallel fan-out (LITERATURE Haiku quote-finders, SPECIFY
+per-sub-analysis work, IMPLEMENT per-output work) happens *inside* an
+iteration, one level deep from the iteration's main session.
 
 ## Phases
 
-Nine phases, zero-indexed. Phases 0, 1, and 8 run in the orchestrator
-session; phases 2–7 are sub-agent dispatches.
+Nine phases, zero-indexed. INTERVIEW + ACQUIRE + REVIEW run in the
+user's main session; phases 2–7 run as ralph iterations.
 
 | # | Phase | Where | Primary outputs |
 |---|-------|-------|------------------|
-| 0 | INTERVIEW | orchestrator | per-paper `CLAUDE.md` |
-| 1 | ACQUIRE | orchestrator | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-index.md}`; `paper-expert` and `code-expert` sub-agents |
-| 2 | ARCHITECT | sub-agent | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative) |
-| 3 | SPECIFY | sub-agent | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `universes/baseline.yaml` |
-| 4 | LITERATURE | sub-agent | `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
-| 5 | IMPLEMENT | sub-agent | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
-| 6 | RUN | sub-agent | `results/<universe>/<output>/` |
-| 7 | COMPARE | sub-agent | `comparison-report.{yaml,md}` plus an opportunity assessment graded against the user's fidelity intent |
-| 8 | REVIEW | orchestrator | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
-
-ACQUIRE runs in the orchestrator session because its work is two
-parallel sub-skill invocations (`/paper-extraction` and `/lc-from-code`
-in scan-only mode) plus capturing the resulting persistent sub-agents.
-INTERVIEW and REVIEW run there because both are interactive bookends.
-
-## Per-paper `CLAUDE.md`
-
-Drafted during INTERVIEW. The reproduction workdir holds a single
-`CLAUDE.md` that sub-agents and future orchestrator sessions walk up to
-automatically. Sections:
-
-- **Paper identity** — DOI, arXiv ID, title, authors, one-line subject;
-  where the original code lives.
-- **Goal** — the user's **fidelity intent** as prose: their own answer
-  to "when is this good enough." Read on every spawn decision.
-- **Rigor** — *Current state* per output or phase (*sketch / baseline /
-  tightened / canonical*) plus *open opportunities*. Updated by
-  sub-agents as they work.
-- **Disagreements** — paper-vs-code disagreements logged as found.
-  Code is canonical for numerics; both options are preserved as
-  decision options in `astra.yaml`.
-- **Rules** — code-as-canonical, never-block-on-`AskUserQuestion`-
-  mid-sub-agent, arxiv-LaTeX-first acquisition, `astra validate
-  --verify-evidence` as the fidelity gate.
+| 0 | INTERVIEW | user's main session | per-paper `constitution.md` + `CLAUDE.md` |
+| 1 | ACQUIRE | user's main session | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-status.yaml, code-index.md}` |
+| 2 | ARCHITECT | ralph iteration | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative) |
+| 3 | SPECIFY | ralph iteration | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `universes/baseline.yaml` |
+| 4 | LITERATURE | ralph iteration | `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
+| 5 | IMPLEMENT | ralph iteration | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
+| 6 | RUN | ralph iteration | `results/<universe>/<output>/` |
+| 7 | COMPARE | ralph iteration | `comparison-report.{yaml,md}` plus an opportunity assessment graded against the user's fidelity intent |
+| 8 | REVIEW | user's main session | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
+
+## Per-paper substrate: constitution + CLAUDE.md
+
+Drafted during INTERVIEW. The reproduction workdir holds **two files**
+that iterations walk up to automatically:
+
+- **`constitution.md`** — the ralph loop's driving document. YAML
+  frontmatter `status: active`; sections: Goal (with **fidelity
+  intent** prose — the user's own answer to "when is this good
+  enough"), Scope (in / out), Quality bar, Evidence (paper DOI, arXiv
+  ID, code repo URL), Open dimensions. Sharpens slowly — only when
+  something fundamental shifts.
+- **`CLAUDE.md`** — auto-loading walk-up. Paper identity at the top, Rules
+  (code-as-canonical, never-block-on-`AskUserQuestion`-mid-iteration,
+  arxiv-LaTeX-first, `astra validate --verify-evidence` as the fidelity
+  gate), Rigor accumulator (*Current state* per output + *Open
+  opportunities*, updated by iterations), Disagreements log (running,
+  updated by iterations), Pointers.
 
 Pointers, not snapshots.
 
@@ -84,34 +87,44 @@ Pointers, not snapshots.
   code disagree on something material, code wins for numerics but the
   disagreement is preserved as a decision option and noted in
   CLAUDE.md.
-- **Rigor is a trajectory toward the user's intent.** Sub-agent
-  fresh-context self-review is sized per spawn from the gap between
-  *Current state* and the Goal's fidelity intent — cheap (skip or one
-  pass) vs heavy (iterate until two consecutive clean rounds, cap 5).
+- **Rigor is a trajectory toward the user's intent.** Each iteration
+  sizes its work from the gap between *Current state* and the Goal's
+  fidelity intent — cheap (write and exit; let the next iteration's
+  fresh-context survey serve as the review) vs heavy (in-iteration
+  fan-out for parallel review). Default is sequential review via
+  iteration boundaries.
 - **arxiv-LaTeX-first acquisition.** PDF + Docling is the non-arxiv
   fallback only.
 - **No synthetic data.** Unless the paper itself uses synthetic data,
   every input must be real.
+- **Open-questions for autonomous iteration.** Iterations run detached
+  in tmux; `AskUserQuestion` isn't available. Questions go to
+  `open-questions.md` with the iteration's best-judgment default
+  applied; the user resolves at REVIEW close-out.
 
 ## Anti-patterns
 
-- Reading content the orchestrator doesn't need. If the answer fits in
-  a sub-agent's return, don't re-read the source.
-- Doing phase work in the orchestrator session. Exceptions are
-  INTERVIEW, ACQUIRE, and REVIEW.
-- Asking a sub-agent to use `AskUserQuestion` — they don't have it.
+- Doing the long middle in the user's main session instead of launching
+  the loop. INTERVIEW + ACQUIRE + REVIEW belong in the main session;
+  ARCHITECT through COMPARE belong in iterations.
+- Asking an iteration to use `AskUserQuestion` — iterations are
+  detached.
 - Re-implementing what `astra` already does (`astra validate`, `astra
   paper add`).
-- Forgetting to announce the spawn — the user needs to know a sub-agent
-  has launched and that they can switch into its chat.
+- Bundling phases into one iteration — defeats fresh-context review.
+- Accreting amendment sections in `constitution.md` — reshape, don't
+  append.
 
 ## Related
 
 - [Bundle README](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md)
   — why the bundle is co-located rather than a separate plugin install.
+- [`/ralph`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/ralph/SKILL.md)
+  — the loop substrate (authoring + launching + iterating).
 - [`/paper-extraction`](paper-extraction.md) — ACQUIRE's primary
-  acquisition path.
-- [`/narrative`](narrative.md) — SPECIFY's prose authoring.
+  acquisition path; also invoked per cited paper by LITERATURE.
+- [`/narrative`](narrative.md) — ARCHITECT's structural narrative and
+  SPECIFY's anchored content narrative.
 - [`/figure-comparison`](figure-comparison.md) — REVIEW (mandatory) and
   also user-invokable.
 - [`/check-sentence-by-sentence`](check-sentence-by-sentence.md) —
diff --git a/docs/skills/ralph.md b/docs/skills/ralph.md
new file mode 100644
index 00000000..7c257715
--- /dev/null
+++ b/docs/skills/ralph.md
@@ -0,0 +1,103 @@
+# /ralph
+
+Author a constitution — a markdown document describing a desired state
+for autonomous iteration — and run a ralph loop against it. The loop is
+a detached tmux session that respawns a fresh worker per iteration,
+with the constitution injected as the system prompt; iterations
+terminate when one flips the constitution's frontmatter `status:` to
+`closed` after a cold survey.
+
+Used by [`/lc-from-paper`](lc-from-paper.md) for the long middle of a
+reproduction (ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN →
+COMPARE). Standalone for any other long-running work where adaptation
+matters more than a fixed plan: refactors, exploratory analyses,
+research narratives that keep growing.
+
+Source: [`claude/lightcone/skills/ralph/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/ralph/SKILL.md).
+
+## Three modes
+
+One applies at a time:
+
+- **Authoring** — drafting a constitution from scratch (Study → Draft
+  → Refine → Launch). Reference depth in
+  [`references/constitution.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/ralph/references/constitution.md)
+  and the careful-thinking rhythm in
+  [`references/crafting.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/ralph/references/crafting.md).
+- **Launching** — outside any active loop, invoking the bundled script
+  to start one on an existing constitution.
+- **Inside a loop** — the constitution is in the system prompt; the
+  worker follows the Loop protocol (Survey → Work → Update → Exit).
+
+## Launching
+
+After `lc init` copies the bundle into a project, the launcher lives at
+`.claude/skills/ralph/scripts/ralph`:
+
+```bash
+.claude/skills/ralph/scripts/ralph <constitution.md> [--backend claude|codex] [-- extra-flags...]
+```
+
+The constitution must have `status: open` or `status: active` in YAML
+frontmatter; the launcher refuses to start otherwise. Termination is
+automatic when an iteration flips `status:` to `closed`.
+
+The session detaches as `ralph-<dirname>-<basename>`. Attach with
+`tmux attach -t <session>`. A second launch with the same constitution
+detects the existing session and prints the attach command instead of
+double-starting.
+
+## What goes in a constitution
+
+A constitution describes what the system looks like when it's right —
+the desired state. It outlasts any single iteration. Nothing in it
+becomes confusing or unnecessary as the desired state is reached. The
+constitutional principle: write what remains true until the work is
+done.
+
+Common sections — use what fits, skip what doesn't:
+
+- **Desired State** — what "done" looks like. Invariants, quality bar,
+  done-conditions. Fence the scope.
+- **Context** — file paths, existing patterns, architectural constraints.
+- **Skills** — which skills to activate before working.
+- **Evidence** — how to check progress (commands, test suites, grep
+  patterns).
+- **Open Questions** — uncertainties the user weighs in on between
+  loops.
+
+See the *What goes in a constitution* section of `SKILL.md` and
+[`references/constitution.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/ralph/references/constitution.md)
+for the discipline that keeps a constitution from sliding into a plan.
+
+## Authoring principles
+
+- **Constitution, not plan.** Say what the system looks like when it's
+  right. Never describe the current state.
+- **Pointers, not snapshots.** "Check `grep -r 'old_pattern'`", not
+  "50 files remain." Snapshots go stale; pointers stay valid.
+- **Reshape, don't accrete.** When the desired state evolves, rewrite
+  the affected sections — don't tack on "Round 2" or "Amendments."
+- **Constraints need reasons.** Bare constraints get circumvented.
+- **Scope is a gift.** A clear fence frees iterations to work
+  confidently inside it.
+
+## Loop discipline
+
+Each iteration: Survey → Work → Update → Exit (`kill $PPID`). The
+survey is a fixed cost; exit when the next valuable move needs a
+different mental workspace, not when one task ends. Exit before context
+is half-full — the handoff matters more than the marginal step you'd
+squeeze in.
+
+**Closing the constitution is reserved for cold surveys that find
+nothing left to do.** If an iteration made any changes, it may not flip
+`status:` to `closed`; that decision waits for the next fresh-eyes
+iteration. This adds at least one cold review pass on every closing
+decision.
+
+## Related
+
+- [`/lc-from-paper`](lc-from-paper.md) — uses `/ralph` for the long
+  middle of a reproduction.
+- [Bundle README](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md).
diff --git a/zensical.toml b/zensical.toml
index f09d0639..8ad4f707 100644
--- a/zensical.toml
+++ b/zensical.toml
@@ -50,6 +50,7 @@ nav = [
       {"lc-from-code" = "skills/lc-from-code.md"},
       {"lc-from-paper" = "skills/lc-from-paper.md"},
       {"lc-feedback" = "skills/lc-feedback.md"},
+      {"ralph" = "skills/ralph.md"},
       {"paper-extraction" = "skills/paper-extraction.md"},
       {"narrative" = "skills/narrative.md"},
       {"figure-comparison" = "skills/figure-comparison.md"},

From ba8af15304bbef25fdfaeeca05ffbfa2b66f1db6 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 04:37:52 +0200
Subject: [PATCH 055/124] lc-from-paper: retune the remaining phase references
 for the ralph shape
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Retune specify, literature, implement, run, compare, and review: all the
references that still framed phases as "the orchestrator-spawned <phase>
sub-agent" or called persistent experts via SendMessage. Convert to:

- "What a ralph iteration does when the workdir signals X is next"
  framing throughout.
- Drop paper-expert / code-expert / SendMessage entirely; iterations
  Grep into work/reference/source/ or work/reference/document.md for
  paper text and read targeted modules under work/reference/code/. The
  structural index at work/reference/index.json and the inventory at
  work/reference/code-index.md give the orientation to know where to
  look.
- Self-review-via-fresh-context-sub-agent (broken — sub-agents can't
  spawn sub-agents) collapses into review by iteration boundary: one
  iteration writes, the next reads fresh and reviews, terminating on
  two consecutive clean iterations or a 5-iteration cap. The per-round
  reviewer check-lists and findings-file shapes are preserved verbatim;
  only the dispatch mechanism changes.
- Optional in-iteration fan-out for phases where parallelism actually
  pays (LITERATURE with many cited papers, SPECIFY with many
  independent sub-analyses, IMPLEMENT with many independent outputs) —
  one-level-deep sub-agents inside the iteration's main session.
- COMPARE: verdict-and-decision used to be split (compare sub-agent's
  judgment vs orchestrator's keep-iterating call in dialogue with user);
  now both happen at iteration boundary. One iteration writes the
  report and the take; the next surveys and decides retry vs accept.
  The user's voice enters at REVIEW, not mid-loop.
- REVIEW: framed as "close-out in the user's main session" rather than
  "orchestrator-session close-out". Runs after the ralph loop's tmux
  session has exited (constitution status: closed).

Anti-pattern noted in architect.md: persistent expert sub-agents are
explicitly out. The note preserves the historical reasoning so the next
reader understands why on-disk substrate + targeted reads replaced them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../lc-from-paper/references/compare.md       | 20 ++---
 .../lc-from-paper/references/implement.md     | 57 +++++++------
 .../lc-from-paper/references/literature.md    | 47 +++++------
 .../skills/lc-from-paper/references/review.md | 20 ++---
 .../skills/lc-from-paper/references/run.md    | 10 +--
 .../lc-from-paper/references/specify.md       | 79 +++++++++----------
 6 files changed, 110 insertions(+), 123 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/compare.md b/claude/lightcone/skills/lc-from-paper/references/compare.md
index dca3cc8d..d4d2da8d 100644
--- a/claude/lightcone/skills/lc-from-paper/references/compare.md
+++ b/claude/lightcone/skills/lc-from-paper/references/compare.md
@@ -1,8 +1,8 @@
 # COMPARE — judge the match, name the opportunities
 
-Compare reproduced results against the paper's replication targets. COMPARE returns two things: a **verdict** (pass / partial / fail) and an **opportunity assessment** — where the gaps are, how much they likely matter, and how they sit relative to the user's fidelity intent in CLAUDE.md's Goal section. The verdict drives whether the orchestrator re-spawns IMPLEMENT for another retry; the opportunity assessment tells the orchestrator (and the user) which gaps fall below intent and would be high-leverage to close, even on `pass`. Together they replace the old yes/no framing.
+Compare reproduced results against the paper's replication targets. COMPARE returns two things: a **verdict** (pass / partial / fail) and an **opportunity assessment** — where the gaps are, how much they likely matter, and how they sit relative to the user's fidelity intent in `constitution.md`'s Goal section. The verdict drives whether a subsequent iteration retries IMPLEMENT; the opportunity assessment tells the next iteration (and the user at REVIEW) which gaps fall below intent and would be high-leverage to close, even on `pass`. Together they replace the old yes/no framing.
 
-This phase runs as the orchestrator-spawned `compare` sub-agent. The orchestrator and the user together decide what to do with COMPARE's output — spend another IMPLEMENT round now (close a below-intent gap), accept the current verdict and proceed to REVIEW, or land at the current trajectory and log the gap as an open opportunity in CLAUDE.md's **Rigor** section. The user can drop into the compare sub-agent's chat for the verdict ratification conversation, or wait until REVIEW close-out.
+COMPARE is what a ralph iteration does when the workdir signals "RUN done (`results/` materialized) + `comparison-report.yaml` absent or stale relative to latest RUN." The iteration writes the report; what happens next depends on the verdict and the iteration's read of the constitution's Fidelity intent. If attempt budget remains AND (verdict is `partial` / `fail` OR any opportunity is below intent), the next iteration takes a retry attempt at IMPLEMENT, targeting the below-intent gaps first. If verdict is `pass` AND no opportunities are below intent, OR the attempt budget is exhausted, the iteration logs un-acted opportunities into CLAUDE.md's **Rigor** *Open opportunities*; a subsequent cold-survey iteration with no contributions closes the constitution and REVIEW runs in the user's main session.
 
 ## Inputs
 
@@ -10,8 +10,8 @@ This phase runs as the orchestrator-spawned `compare` sub-agent. The orchestrato
 - `astra.yaml` — output definitions (each target maps to an output)
 - `targets/` — reference figures / tables for comparison
 - `results/<universe>/<output_id>/` — reproduced results
-- **paper-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Useful for "what does the paper actually claim for this number" or "how does the paper describe what Figure 3 should show" when grading the comparison.
-- **code-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Useful for diagnosing divergence: "what does the reference code compute here that ours might miss".
+- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — target paper text. Grep into it for "what does the paper actually claim for this number" or "how does the paper describe what Figure 3 should show" when grading the comparison.
+- `work/reference/code/` (when present) — read targeted modules pointed at by `code-index.md` for diagnosing divergence: "what does the reference code compute here that ours might miss".
 
 ## Outputs
 
@@ -98,21 +98,21 @@ Also write `comparison-report.md` with a human-readable summary. For figure / ta
 
 ## Verdict + opportunity surfacing
 
-After writing the report, the compare sub-agent reports back to the orchestrator with the verdict, the failing-output count (if any), and the headline opportunities — `below`-intent items first. The orchestrator either:
+After writing the report, the iteration acts against the fidelity intent (iterations run detached; the user isn't reachable interactively):
 
-- **Carries the report to the user** (if the user is reachable in the orchestrator session or the compare sub-agent's chat) for ratification: present verdict, the failing outputs (if `partial` / `fail`), and the top `below`-intent opportunities; ask whether to spend another IMPLEMENT round on those gaps, accept and proceed to REVIEW, or land at the current trajectory and log the gaps as open opportunities in CLAUDE.md.
-- **Acts against intent** (if the user is unreachable): if attempt < budget AND (verdict is `partial` / `fail` OR any opportunity is `below` intent), re-spawn `implement` targeting the `below` gaps first; if verdict is `pass` AND no opportunities are `below`, OR attempt >= budget, log remaining opportunities in CLAUDE.md's **Rigor** section and proceed to REVIEW.
+- If attempt < budget AND (verdict is `partial` / `fail` OR any opportunity is `below` intent), commit the report and exit. The next iteration surveys, sees the report's `below`-intent opportunities, and takes a retry attempt at IMPLEMENT targeting those gaps first.
+- If verdict is `pass` AND no opportunities are `below` intent, OR attempt budget is exhausted, log un-acted opportunities into CLAUDE.md's **Rigor** *Open opportunities* list and commit. A subsequent cold-survey iteration (no contributions) closes the constitution by flipping `status:` to `closed`, and REVIEW close-out runs in the user's main session.
 
-The verdict is the compare sub-agent's judgment; the **decision to keep iterating or move on** is the orchestrator's (in dialogue with the user). The opportunity assessment — graded against the user's fidelity intent — is the bridge that turns a binary verdict into a picture both parties can navigate.
+The verdict is the iteration's judgment from the data; the **decision to keep iterating or close** happens by iteration boundary — one iteration writes the report and the take, the next surveys and decides whether to retry or accept. The opportunity assessment — graded against the user's fidelity intent — is the bridge that turns a binary verdict into a picture the next iteration (and REVIEW) can navigate.
 
 ## Survey signals (entry into COMPARE)
 
 - All outputs in `lc status --universe baseline` are `ok` ⇒ ready to compare
 - `comparison-report.yaml` exists with current `attempt` ⇒ COMPARE done for this attempt
-- `comparison-report.yaml` verdict is `pass` (or `partial` accepted) ⇒ COMPARE → IMPLEMENT loop terminated; orchestrator proceeds to REVIEW close-out
+- `comparison-report.yaml` verdict is `pass` (or `partial` with un-acted opportunities logged to CLAUDE.md as Open opportunities) ⇒ COMPARE → IMPLEMENT loop terminated; the next cold-survey iteration closes the constitution and REVIEW runs in the user's main session
 
 ## Notes
 
 - **One COMPARE per IMPLEMENT.** Each IMPLEMENT retry produces a fresh COMPARE; the report's `attempt` field increments. Do not overwrite prior reports — keep them at `comparison-report-attempt-<N>.yaml` if useful, or commit each between attempts so `git log` carries the history.
-- **The verdict is the compare sub-agent's; the keep-iterating decision is the orchestrator's** (in dialogue with the user, when reachable). Treat them as separate.
+- **The verdict is the iteration's judgment from the data; the keep-iterating decision happens at iteration boundary.** One iteration writes the report and the take on what should happen next; the next iteration surveys, reads the take, and either retries or accepts. The user's voice enters at REVIEW close-out, not mid-loop.
 - **The opportunity assessment is part of the durable record.** When the user accepts the current verdict, propagate the un-acted-on opportunities into CLAUDE.md's **Rigor** section's *Open opportunities* list. Future sessions and future-Cail returning to this reproduction see them; tightening any becomes a re-spawn of IMPLEMENT against a clearer target.
diff --git a/claude/lightcone/skills/lc-from-paper/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
index 15d5f40a..4addfffe 100644
--- a/claude/lightcone/skills/lc-from-paper/references/implement.md
+++ b/claude/lightcone/skills/lc-from-paper/references/implement.md
@@ -1,8 +1,8 @@
-# IMPLEMENT — write scripts and recipes; per-spawn self-review
+# IMPLEMENT — write scripts and recipes; review by iteration boundary
 
-Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, a self-review pass cross-checks the implementation against paper + code — same fresh-context-no-bias shape ARCHITECT, SPECIFY, and LITERATURE use. Fixes feed back inside the same implement sub-agent for the next iteration.
+Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, review (by iteration boundary, or in-iteration fan-out for parallelism) cross-checks the implementation against paper + code — same fresh-context-no-bias shape ARCHITECT, SPECIFY, and LITERATURE use, with the fresh-context property given for free by iteration boundaries.
 
-This phase runs as the orchestrator-spawned `implement` sub-agent. Most implementation is mechanical (translate spec → script), but algorithm choices on tricky steps may want user ratification — the user can drop into the implement sub-agent's chat for that. Where parallelization is feasible (multiple independent outputs from different scripts), the implement sub-agent fans out to one Task-tool sub-sub-agent per output and merges.
+IMPLEMENT is what a ralph iteration does when the workdir signals "SPECIFY done + `scripts/` absent (first pass) or `comparison-report.yaml` shows `partial` / `fail` (retry pass)." Most implementation is mechanical (translate spec → script). Where parallelization is feasible (multiple independent outputs from different scripts), the iteration fans out to one-level-deep sub-agents per output (inside its own main session) and merges.
 
 ## Inputs
 
@@ -11,17 +11,16 @@ This phase runs as the orchestrator-spawned `implement` sub-agent. Most implemen
 - `work/reference/index.json` — paper-side structural index (figures, tables, outline, citations); useful when the spec compresses or you need to find where in the paper a behavior is described.
 - `work/reference/code-index.md` (when code present) — code inventory: module map, candidate decisions with file:line, entry-points, data dependencies, gotchas (the canonical map of where each sub-analysis's logic lives in `work/reference/code/`).
 - `work/reference/code/` (if present) — **canonical reference. Read it when implementing each output.** Where paper and code disagree, code wins for numerics, plotting, and method.
-- **code-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. The first stop for "where does X live in the code", "what's the canonical entry-point for Y", "what's the default parameter the code uses for Z". Cheaper than re-reading the code yourself.
-- **paper-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Useful when implementing an output and the spec doesn't fully capture what the paper says it should produce (e.g. "what's the expected axis range for Figure 4").
-- CLAUDE.md — **Rigor** for this spawn's chosen rigor level; **Paper-vs-code disagreements** for prior conflicts already logged.
+- `constitution.md` — Fidelity intent (used to size cheap vs heavy on this iteration's review).
+- CLAUDE.md — Rigor *Current state* per output; **Paper-vs-code disagreements** for prior conflicts already logged.
 
 ## Outputs
 
 - `scripts/<output>.py` (or `.sh`, or whatever fits) — one script per output (or shared scripts for tightly-coupled outputs)
 - `requirements.txt` — Python dependencies
 - Recipes in `astra.yaml` — each output gets a `recipe:` block with `command:` and `inputs:`
-- `work/notes/implement-review/round-<N>.md` — each review round's findings (one file per round; how many rounds depends on the rigor level)
-- CLAUDE.md updates — append to **Paper-vs-code disagreements** for any new conflict surfaced during implementation; update **Rigor** with the post-spawn state per output (e.g. *baseline* after a cheap pass, *tightened* after heavy review).
+- `work/notes/implement-review/round-<N>.md` — each review iteration's findings (one file per review-iteration; how many depends on the fidelity-intent calculus)
+- CLAUDE.md updates — append to **Paper-vs-code disagreements** for any new conflict surfaced during implementation; update **Rigor** *Current state* with the post-iteration state per output (e.g. *baseline* after a one-iteration pass, *tightened* after a review-iteration applied fixes).
 
 ## Step 1: write recipes + scripts
 
@@ -29,10 +28,7 @@ Read `astra.yaml` and `implementation-notes.md`. For each output, write a script
 
 ### With a code reference (`work/reference/code/` exists)
 
-**Read the relevant code when implementing each output** — not just to resolve ambiguities but as the canonical source of truth for numerics + method. Write clean scripts following ASTRA conventions (not verbatim copies), but treat the code's behavior as authoritative when it disagrees with the paper. When you encounter a paper-vs-code disagreement that SPECIFY's code pass missed:
-
-- **User reachable** (in the implement sub-agent's chat): ask in prose — paper method + code method + plausible impact + which one to take.
-- **User unreachable**: continue with the code's behavior, append the disagreement to CLAUDE.md's **Paper-vs-code disagreements** AND `open-questions.md`, and note it in `implementation-notes.md` so REVIEW close-out can ratify or override.
+**Read the relevant code when implementing each output** — not just to resolve ambiguities but as the canonical source of truth for numerics + method. Write clean scripts following ASTRA conventions (not verbatim copies), but treat the code's behavior as authoritative when it disagrees with the paper. When you encounter a paper-vs-code disagreement that SPECIFY's code pass missed: continue with the code's behavior (per the canonical-resolution default; the iteration runs detached, no interactive ratification), append the disagreement to CLAUDE.md's **Paper-vs-code disagreements** AND `open-questions.md`, and note it in `implementation-notes.md` so REVIEW close-out can ratify or override.
 
 Without this discipline, the implementation drifts to "looks right" rather than "matches" — the failure mode the first-paper test surfaced.
 
@@ -40,20 +36,20 @@ When the reference code is substantial enough that implementation is really a mi
 
 ### Without a code reference (`work/reference/code/` is absent)
 
-When `code-status.yaml` records `found: false` or the cloned repo turned out to be unusable, there is no canonical code substrate to anchor against. **Write the implementation fresh from the spec** — `astra.yaml`'s decisions, findings, and prior_insights are now the only source of method-level truth, and the paper's prose (consulted via paper-expert) is the source of numerics-level truth. Don't pretend a code reference exists; don't try to find a similar paper's code as a stand-in. Implement what the spec describes, ask paper-expert when the spec compresses something you need clarified, and rely on COMPARE to surface anywhere the implementation has drifted from the paper's claims.
+When `code-status.yaml` records `found: false` or the cloned repo turned out to be unusable, there is no canonical code substrate to anchor against. **Write the implementation fresh from the spec** — `astra.yaml`'s decisions, findings, and prior_insights are now the only source of method-level truth, and the paper's prose (Grep into `work/reference/source/` or `document.md` for specific facts) is the source of numerics-level truth. Don't pretend a code reference exists; don't try to find a similar paper's code as a stand-in. Implement what the spec describes, read targeted paper sections when the spec compresses something you need clarified, and rely on COMPARE to surface anywhere the implementation has drifted from the paper's claims.
 
 The code-as-canonical rule does not apply here — there is no code to be canonical. The paper is the only anchor. This is the harder path; reproductions on it converge slower and have more open questions for REVIEW close-out. Surface that honestly to the user as you go; don't dress up paper-only implementations as if they had a code anchor.
 
 ### Parallelize where feasible
 
-When outputs are produced by independent scripts (no shared expensive computation), the implement sub-agent spawns one Task-tool sub-sub-agent per output. Each sub-sub-agent gets:
+When outputs are produced by independent scripts (no shared expensive computation), the iteration spawns one-level-deep sub-agents per output (inside its own main session). Each sub-agent gets:
 
 - The output's spec entry from `astra.yaml` (including its sub-analysis's `decisions:` / `findings:` for context)
 - The relevant section of `implementation-notes.md`
 - The matching entry in `work/reference/code-index.md`'s natural-decomposition / entry-points block — that's the pointer back to the canonical code location for the sub-analysis the output lives in
 - The relevant code path(s) under `work/reference/code/`
 
-The implement sub-agent merges scripts and recipes after the per-output sub-sub-agents finish. Tightly-coupled outputs (e.g. an MCMC producing both a chain and a summary statistic) stay in one sub-sub-agent and one script.
+The iteration merges scripts and recipes after the per-output sub-agents finish. Tightly-coupled outputs (e.g. an MCMC producing both a chain and a summary statistic) stay in one sub-agent and one script.
 
 ### Rules for the first pass
 
@@ -64,14 +60,15 @@ The implement sub-agent merges scripts and recipes after the per-output sub-sub-
 5. **Do not execute scripts** — the RUN phase handles execution via `lc run`.
 6. **Validate** with `astra validate astra.yaml` after adding recipes.
 
-## Step 2: self-review (rigor chosen per spawn)
+## Step 2: review — by iteration boundary (default) or in-iteration fan-out (optional)
+
+After the first-pass implementation lands, the cross-check question is: is the implementation consistent with the paper and the code? The depth is sized from the gap between CLAUDE.md's Rigor *Current state* and `constitution.md`'s Fidelity intent:
 
-After the first-pass implementation lands, the rigor level the orchestrator picked for this spawn (read CLAUDE.md's **Rigor** section) decides what happens next:
+**Default: review by iteration boundary.** The iteration that wrote the first pass exits when `scripts/`, recipes, and `requirements.txt` are committed; the next iteration enters fresh, surveys, finds the implementation present but no `work/notes/implement-review/round-1.md`, reads `scripts/` + `astra.yaml`'s recipes + the paper, and writes findings to `round-1.md`. The iteration after that applies the fixes. How the review cycle terminates depends on the sizing: *cheap* accepts after one clean review-iteration; *heavy* requires two consecutive review-iterations with verdict `clean`. Once the cycle terminates, the next iteration advances to RUN.
 
-- **Cheap:** one minimal review pass — a single fresh Task-tool sub-agent reads `scripts/`, `astra.yaml`'s recipes, and the paper, and reports any obvious paper-vs-implementation inconsistencies. Fixes are applied once; no further iteration. If no fixes are needed, IMPLEMENT proceeds to RUN.
-- **Heavy:** N rounds of fresh-context Task-tool sub-agent review + fix. Each round spawns a fresh reviewer that does not see the prior round's findings or fixes. Stop when **two consecutive rounds find no fixes**, or after 5 rounds (system cap), whichever comes first.
+**Optional: in-iteration fan-out.** When the implementation is large (many outputs, many scripts) and the fidelity intent calls for *heavy*, the iteration holding the review can fan out parallel reviewers as one-level-deep sub-agents inside its own session (partitioned by output or sub-analysis), merge their findings, and apply fixes in the same iteration. The next iteration's survey acts as the consolidating review.
 
-The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: each round's reviewer is fresh, prompted to check "is the implementation consistent with the paper and the code?", and outputs findings only — not edits. Fixes are applied between rounds by the implement sub-agent itself (or the orchestrator inline for trivial mechanical fixes). Pattern-matching on prior fixes defeats the cross-check; the no-bias rule is load-bearing.
+The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: review is fresh-context (whether across iterations or across fan-out spawns), is prompted to check "is the implementation consistent with the paper and the code?", and outputs findings only, not edits. Fixes are applied between iterations by the next iteration (or merged in the same iteration for fan-out). Pattern-matching on prior fixes defeats the cross-check; the no-bias rule is load-bearing.
 
 ### Per-round fresh sub-agent — system prompt
 
@@ -99,7 +96,7 @@ The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: each ro
 > ### What NOT to do
 >
 > - **Do not edit any file.** Your output is a findings file; an IMPLEMENT-fix pass responds to the findings.
-> - **Do not re-read the entire paper.** Ask paper-expert / code-expert via `SendMessage` for claim verification, or Grep into `work/reference/index.json`, `work/reference/code-index.md`, and `work/reference/source/` (or `document.md`) for specific items. The filled `astra.yaml` is your primary source for what each sub-analysis is supposed to do.
+> - **Do not re-read the entire paper.** Grep into `work/reference/index.json`, `work/reference/code-index.md`, and `work/reference/source/` (or `document.md`) for specific items. The filled `astra.yaml` is your primary source for what each sub-analysis is supposed to do.
 > - **Do not invent problems.** If the implementation matches paper + code, say so briefly.
 > - **Do not assume a prior reviewer has been here.** You are fresh. First-principles read only.
 >
@@ -128,14 +125,14 @@ The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: each ro
 
 ### Step 3: IMPLEMENT-fix pass between rounds
 
-After each round's findings file lands, the implement sub-agent (or the orchestrator inline for trivial fixes) edits `scripts/`, `astra.yaml` recipes, `requirements.txt`, and `implementation-notes.md` per the suggested fixes. After any change to `astra.yaml`, run `astra validate astra.yaml`.
+After each round's findings file lands, the iteration edits `scripts/`, `astra.yaml` recipes, `requirements.txt`, and `implementation-notes.md` per the suggested fixes. After any change to `astra.yaml`, run `astra validate astra.yaml`.
 
 ### Step 4: termination check
 
 - **Cheap:** one pass. Done after fixes (or immediately, if `fixes_needed` was 0).
 - **Heavy:**
   - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
-  - If N hits the 5-round system cap without two consecutive clean rounds, the implement sub-agent stops and reports back to the orchestrator. If the user is reachable, ask in prose: "implement-review reached round cap with N fixes still landing; continue, accept the current implementation, or revise scope?" If the user is unreachable, accept current implementation, log the unfinished tail in `open-questions.md`, and let the orchestrator decide whether to proceed or re-spawn.
+  - If N hits the 5-round system cap without two consecutive clean rounds, an iteration logs the unfinished tail in `open-questions.md` ('IMPLEMENT review reached round cap with N fixes still landing; user should review during REVIEW close-out') and the next iteration advances to RUN anyway.
 
 The IMPLEMENT-review iterations are independent of the COMPARE → IMPLEMENT retry loop — review iterations run before RUN, on the spec/implementation alignment side; COMPARE retries run after RUN, on the result-matching side.
 
@@ -149,23 +146,23 @@ If a dataset is behind a paywall, requires registration, or is "available upon r
 
 ## Retry attempts (post-COMPARE)
 
-If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, the orchestrator may re-spawn `implement` as a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. Default attempt budget is 5 (the orchestrator can override per spawn); the implement sub-agent's first move is to check whether `attempt` in the report has reached the budget. If it has, stop and report back — if the user is reachable, ask in prose ("verdict still failing after N attempts — continue, change scope, or accept partial?"); if not, accept partial, log the failure in CLAUDE.md's **Rigor** section as an open opportunity, and let the orchestrator decide.
+If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, a subsequent iteration may take on a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. Default attempt budget is 5; the iteration's first move is to check whether `attempt` in the report has reached the budget. If it has, accept partial, log the failure in CLAUDE.md's **Rigor** section as an open opportunity (so REVIEW close-out can decide whether to push further or accept the trajectory), and exit; subsequent iterations either accept the verdict via a cold close or pivot scope based on REVIEW's input.
 
-A retry attempt re-runs the IMPLEMENT-review iterations on the changed scripts before proceeding to RUN.
+A retry attempt re-runs IMPLEMENT review (by iteration boundary) on the changed scripts before the next iteration advances to RUN.
 
 ## Survey signals (entry into IMPLEMENT)
 
 - `astra.yaml` validates and `implementation-notes.md` exists ⇒ ready to implement first pass
 - `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ first-pass IMPLEMENT done
 - For cheap: `work/notes/implement-review/round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ IMPLEMENT done
-- For heavy: two consecutive `work/notes/implement-review/round-<N>.md` files both have verdict `clean` ⇒ IMPLEMENT done; orchestrator proceeds to RUN
-- `comparison-report.yaml` returns `pass` ⇒ COMPARE → IMPLEMENT loop terminated; orchestrator proceeds to REVIEW close-out
+- For heavy: two consecutive `work/notes/implement-review/round-<N>.md` files both have verdict `clean` ⇒ IMPLEMENT done; the next iteration surveys and advances to RUN
+- `comparison-report.yaml` returns `pass` ⇒ COMPARE → IMPLEMENT loop terminated; the constitution can close after a cold survey, and REVIEW close-out runs in the user's main session
 
 ## Notes
 
 - **`lc run` is the canonical execution surface.** Scripts assume they will be invoked via the lightcone-cli runner. Do not hard-code working directories or assume environment activation.
 - **Determinism where possible.** Set random seeds, fix library versions, prefer reproducible installations. The IMPLEMENT goal is not just "produces output once" but "reproducibly produces output across runs."
 - **Tight coupling earns shared scripts.** When two outputs come from the same expensive computation (e.g. an MCMC produces both a parameter chain and a summary statistic), one script with multiple output paths is cleaner than two scripts that each re-do the work.
-- **The fresh-context discipline is the same as ARCHITECT's, SPECIFY's, and LITERATURE's self-review.** A reviewer that sees the prior round's findings stops finding the next class of inconsistency. Each round must spawn a brand-new Task-tool sub-agent.
-- **Minimize churn in fixes.** Targeted edits, not restructures. Big restructures defeat the round-over-round comparison the implement sub-agent uses to decide termination.
-- **Commit per output as it lands.** One commit per script + recipe wiring; one commit per review-round file; one commit per fix pass. The orchestrator reads `git log` to track progress.
+- **The fresh-context discipline is the same as ARCHITECT's, SPECIFY's, and LITERATURE's review.** A reviewer that sees the prior round's findings stops finding the next class of inconsistency. Iteration boundaries give fresh context automatically; in-iteration fan-out reviewers each get fresh-from-merge state without prior-round contamination.
+- **Minimize churn in fixes.** Targeted edits, not restructures. Big restructures defeat the round-over-round comparison the iteration sequence uses to decide termination.
+- **Commit per output as it lands.** One commit per script + recipe wiring; one commit per review-round file; one commit per fix pass. The next iteration reads `git log` to track progress.
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index 1622ed51..d1c0fc5c 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -6,15 +6,15 @@ The quote-finding direction is: **target paper's claim → quote inside the cite
 
 LITERATURE runs **after SPECIFY**, not before — relevant `prior_insights:` are defined by the decisions and findings they justify. Fetching cited papers speculatively before SPECIFY would do work for citations that may never end up needed.
 
-This phase runs as the orchestrator-spawned `literature` sub-agent. Its internal architecture is **two simple stages**: mechanical fetch (paper-extraction's deterministic script, batched-parallel via shell — no agent fan-out), then quote-finding (literature does it itself for small placeholder counts; spawns a small number of Haiku sub-agents for large counts). The agentic work is the quote-matching; the fetch is plumbing.
+LITERATURE is what a ralph iteration does when the workdir signals "SPECIFY done + `prior_insights:` placeholders present without `evidence:` selectors." Its internal architecture is **two simple stages**: mechanical fetch (paper-extraction's deterministic script, batched-parallel via shell — no agent fan-out), then quote-finding (the iteration does it itself for small placeholder counts; spawns a small number of Haiku sub-agents inside its own main session for large counts). The agentic work is the quote-matching; the fetch is plumbing.
 
 ## Inputs
 
 - `astra.yaml` — filled by SPECIFY's paper (and code) passes; each sub-analysis has `prior_insights:` entries with `claim:` + `doi:` + `decision_links:` but no `evidence:` selector. These are the placeholders LITERATURE resolves.
 - `work/reference/index.json#citations` — paper-extraction's cite-key → `{locations, citation, doi}` mapping for every entry in the target paper's bibliography. Used as the canonical cite-key → DOI lookup when cross-checking placeholder DOIs and surfacing unresolved-DOI cases.
-- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — the target paper; useful for context on how the cited paper is invoked.
-- **paper-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Useful when a placeholder's claim is ambiguous and you need to know what the target paper actually says around the citation site.
-- CLAUDE.md — **Rigor** for this spawn's chosen rigor level.
+- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — target paper text. Grep into it for context on how the cited paper is invoked when a placeholder's claim is ambiguous.
+- `constitution.md` — Fidelity intent (used to size cheap vs heavy on this iteration's review).
+- CLAUDE.md — Rigor *Current state* per output (so this iteration knows where prior insights currently sit).
 
 ## Outputs
 
@@ -31,7 +31,7 @@ Collect every `prior_insights:` entry whose `evidence:` is missing or empty. Gro
 Run paper-extraction's substrate script for each unique DOI **in batches of 5** via shell parallelism. paper-extraction's `extract-paper-substrate.py` is deterministic — no agent involvement needed. Each invocation writes to `work/cited/<doi-slug>/work/reference/`:
 
 ```bash
-# Pseudocode for the batched fetch loop the literature sub-agent runs.
+# Pseudocode for the batched fetch loop an iteration runs.
 # For each unique DOI in the placeholder set:
 mkdir -p work/cited/<doi-slug>
 cd work/cited/<doi-slug>
@@ -58,23 +58,23 @@ Resume: if `work/cited/<doi-slug>/work/reference/index.json` already exists, ski
 
 Once all substrate is in place, count placeholders:
 
-- **≤10 placeholders:** the literature sub-agent does the quote-finding itself. It walks the placeholders one at a time, greps into the relevant cited paper's substrate for terms from the claim, identifies the verbatim quote, and writes `{exact, prefix, suffix, page}` to `work/notes/literature/resolutions.yaml`. Single agent, low context overhead per placeholder (grep + targeted read, not whole-paper-absorption).
+- **≤10 placeholders:** the iteration does the quote-finding itself. It walks the placeholders one at a time, greps into the relevant cited paper's substrate for terms from the claim, identifies the verbatim quote, and writes `{exact, prefix, suffix, page}` to `work/notes/literature/resolutions.yaml`. Single agent, low context overhead per placeholder (grep + targeted read, not whole-paper-absorption).
 
-- **>10 placeholders:** the literature sub-agent partitions placeholders across **a small number of Haiku sub-agents** (rough rule: aim for 5–8 placeholders per Haiku, so 11–15 placeholders → 2 Haikus, 30 placeholders → 4 Haikus). Each Haiku gets its subset of placeholders + the substrate paths for the cited papers those placeholders reference. Haikus are cheap and fast and the work is well-bounded (grep + format YAML), so this is the right model. Each Haiku writes to `work/notes/literature/haiku-<N>.yaml`; literature reads them all, merges into `resolutions.yaml`, then writes back to `astra.yaml`.
+- **>10 placeholders:** the iteration partitions placeholders across **a small number of Haiku sub-agents** (rough rule: aim for 5–8 placeholders per Haiku, so 11–15 placeholders → 2 Haikus, 30 placeholders → 4 Haikus). Each Haiku gets its subset of placeholders + the substrate paths for the cited papers those placeholders reference. Haikus are cheap and fast and the work is well-bounded (grep + format YAML), so this is the right model. Each Haiku writes to `work/notes/literature/haiku-<N>.yaml`; the iteration reads them all, merges into `resolutions.yaml`, then writes back to `astra.yaml`.
 
-The exact Haiku threshold and partition size are heuristic — they trade off context-budget per Haiku vs. orchestration overhead. The literature sub-agent has discretion; the rule of thumb is "few enough to track easily, each one small enough to finish in a single fast turn."
+The exact Haiku threshold and partition size are heuristic — they trade off context-budget per Haiku vs. orchestration overhead. The iteration has discretion; the rule of thumb is "few enough to track easily, each one small enough to finish in a single fast turn."
 
 ### Stage 3 — Merge into astra.yaml
 
-The literature sub-agent reads `work/notes/literature/resolutions.yaml` and writes the resolutions back into `astra.yaml`:
+The iteration reads `work/notes/literature/resolutions.yaml` and writes the resolutions back into `astra.yaml`:
 
 - For each resolved placeholder, locate `prior_insights[<id>]` in `astra.yaml` (the placeholder already lives in its sub-analysis; the merge just sets its `evidence:` field).
 - For each unresolved placeholder, append a line to `open-questions.md` describing it — the user resolves at REVIEW close-out by either supplying a different citation, weakening the claim, or removing the placeholder entirely.
 - Run `astra validate astra.yaml --verify-evidence` after the merge to catch structural breakage early.
 
-Single writer (the literature sub-agent), no merge conflicts even when Haikus produced the inputs in parallel.
+Single writer (the iteration), no merge conflicts even when Haikus produced the inputs in parallel.
 
-## Quote-finding contract (used by both the literature sub-agent and Haiku sub-agents)
+## Quote-finding contract (used by both the iteration itself and any Haiku sub-agents the iteration spawns)
 
 The agent doing the quote-finding (literature itself, or each Haiku) follows the same contract. The Haiku prompt is just this contract with concrete placeholders + paths spliced in.
 
@@ -158,22 +158,17 @@ Rules:
   - Do NOT edit astra.yaml. The merge step does that.
 ```
 
-When the literature sub-agent fans out to Haikus, each Haiku is spawned with `model="haiku"` and gets this contract plus its assigned subset of placeholders and substrate paths.
+When the iteration fans out to Haikus, each Haiku is spawned with `model="haiku"` and gets this contract plus its assigned subset of placeholders and substrate paths.
 
-## Self-review (rigor chosen per spawn)
+## Review — by iteration boundary (default) or in-iteration fan-out (optional)
 
-After the merge lands, a fresh-context Task-tool sub-agent cross-checks each resolved `prior_insights:` entry against its cited paper:
+After the merge lands, the cross-check question is: do the `evidence:` quotes belong to the cited paper at the cited page? Do the quotes actually justify the placeholders' claims, or are they technically present but tangential? Do the claims actually support the decision options they're linked to via `decision_links:`?
 
-- Does the `evidence:` quote belong to the cited paper at the cited page? (`astra validate --verify-evidence` does the deterministic check; the sub-agent does the semantic check.)
-- Does the quote actually justify the placeholder's `claim:`? Or is the quote technically present but tangential?
-- Does the placeholder's `claim:` actually support the decision option it's linked to via `decision_links:`?
+**Default: review by iteration boundary.** The iteration that did the merge exits; the next iteration enters fresh, surveys, finds `astra.yaml`'s `prior_insights:` populated with `evidence:` selectors but no `work/notes/literature-review/round-N.md`, runs `astra validate --verify-evidence` for the deterministic check + a semantic re-read of each resolved insight, and writes review findings. The iteration after that applies the fixes (which may include re-running Haiku quote-finding for entries that need a different quote). Two consecutive review-iterations with verdict `clean` terminate the review cycle.
 
-The depth of self-review follows the rigor level the orchestrator picked for this spawn (read CLAUDE.md's **Rigor** section):
+**Optional: in-iteration fan-out.** When the placeholder count is large and the fidelity intent calls for *heavy*, the merge iteration (or a subsequent review iteration) can fan out parallel reviewers as one-level-deep sub-agents inside its own session, partitioned by cited-paper subset. Each reviewer writes findings for its subset; the iteration merges and applies fixes in the same session.
 
-- **Cheap:** skip review entirely, or run a single fresh-context reviewer pass and incorporate its fixes once.
-- **Heavy:** N rounds — each round spawns a fresh reviewer; literature incorporates fixes between rounds; the next round spawns another fresh reviewer that does not see the prior round's fixes. Iterate until two consecutive rounds find no fixes, or a 5-round system cap.
-
-Each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check. Reviewers output findings only; the literature sub-agent edits `astra.yaml` between rounds (or re-spawns Haiku quote-finding for entries that need a different quote).
+Review depth is sized from the constitution's Fidelity intent: for *cheap*, one clean review-iteration is enough; for *heavy*, require two consecutive clean review-iterations.
 
 ### Per-round fresh reviewer — prompt shape
 
@@ -200,7 +195,7 @@ Output findings to work/notes/literature-review/round-<N>.md, one fix
 per F-N entry. Verdict is `clean` or a count. Do NOT edit astra.yaml.
 ```
 
-If N hits the 5-round system cap without two consecutive clean rounds, the literature sub-agent stops and reports back to the orchestrator. If the user is reachable, ask in prose: "LITERATURE review reached round cap with N fixes still landing; continue, accept the current resolutions, or revise scope?" If unreachable, accept current state, log the unfinished tail in `open-questions.md`, and let the orchestrator decide whether to proceed or re-spawn.
+If 5 review-iterations have happened without two consecutive clean rounds, log the unfinished tail in `open-questions.md` ("LITERATURE review reached round cap with N fixes still landing; user should review during REVIEW close-out") and let the next iteration advance to IMPLEMENT anyway. Don't loop forever on literature review.
 
 ## Survey signals (entry into LITERATURE)
 
@@ -212,14 +207,14 @@ If N hits the 5-round system cap without two consecutive clean rounds, the liter
 - For cheap: at least one `work/notes/literature-review/round-<N>.md` with verdict `clean` (or no fixes were incorporated) ⇒ LITERATURE review done
 - For heavy: two consecutive `round-<N>.md` files with verdict `clean` ⇒ LITERATURE review done
 
-When all of the above hold ⇒ LITERATURE complete; orchestrator proceeds to IMPLEMENT.
+When all of the above hold ⇒ LITERATURE complete; the next iteration surveys and advances to IMPLEMENT.
 
 ## Notes
 
 - **Mechanical fetch is the substrate; quote-finding is the agentic work.** Don't conflate them. paper-extraction's deterministic script handles the fetch — batched-parallel via shell, no agent fan-out. Quote-finding is the semantic match between target-paper-claim and cited-paper-quote; that's the agent's job.
 - **paper-extraction is the canonical fetch mechanism.** Using `astra paper add` would give only the cached PDF; paper-extraction gives substrate (LaTeX source where available, structural index, figures, citations) which is much better material for verbatim quote-finding. The cost is small and parallelizable.
-- **Haiku is the right model for fan-out quote-finding.** Cheap, fast, well-suited to bounded grep-and-format work. Use Sonnet/Opus only when the placeholder count is small enough that the literature sub-agent does it itself anyway.
+- **Haiku is the right model for fan-out quote-finding.** Cheap, fast, well-suited to bounded grep-and-format work. Use Sonnet/Opus only when the placeholder count is small enough that the iteration does the quote-finding itself anyway.
 - **Resume is automatic.** If `work/cited/<doi-slug>/work/reference/index.json` exists, skip that DOI's fetch. If `work/notes/literature/resolutions.yaml` has an entry for a placeholder, skip that placeholder's quote-finding.
 - **Unresolved is not failure.** A placeholder that no quote in the cited paper supports is a real signal — the target paper cited loosely or paraphrased beyond what the source actually says. Surface to `open-questions.md`; don't fabricate evidence.
 - **`astra validate --verify-evidence` runs after the merge**, not after each Haiku's per-placeholder output. Haikus write to disjoint files; the deterministic check happens once `astra.yaml` is updated.
-- **Commit per stage.** Fetches commit together once Stage 1 completes (one commit for all cited-paper substrates). Quote-finding commits together once Stage 2 completes (`resolutions.yaml` + Haiku files). The merge into `astra.yaml` is its own commit. Each review round file commits as it lands. The orchestrator reads `git log` to see progress.
+- **Commit per stage.** Fetches commit together once Stage 1 completes (one commit for all cited-paper substrates). Quote-finding commits together once Stage 2 completes (`resolutions.yaml` + Haiku files). The merge into `astra.yaml` is its own commit. Each review round file commits as it lands. The next iteration reads `git log` to see progress.
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index c1c92fb1..8c4d9949 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -1,6 +1,6 @@
-# REVIEW — orchestrator-session close-out
+# REVIEW — close-out in the user's main session
 
-The reproduction has converged (verdict `pass` or user-accepted `partial`). Control returns to the user. REVIEW is the second of two bookends that run in the orchestrator session itself, not as a named sub-agent (INTERVIEW being the first). It runs orchestrator-side because both `/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`, which sub-agents don't have.
+The reproduction has converged: the constitution's `status:` is `closed` (after COMPARE returned `pass` or a cold-survey iteration accepted `partial` and logged the opportunities). The ralph loop's tmux session has exited. REVIEW runs back in the user's main session — the second of two interactive bookends, the first being INTERVIEW. It runs in the user's main session (not as an iteration) because both `/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`, which isn't available inside detached ralph iterations.
 
 Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, draft the final report, and propagate any un-acted-on opportunities from the latest COMPARE into CLAUDE.md's **Rigor** section — in one interactive arc.
 
@@ -14,7 +14,7 @@ The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding
 - `results/<universe>/<output_id>/` — reproduced figures / tables / metrics
 - `open-questions.md` at the workdir root — running report from sub-agent phases (paper-vs-code conflicts, ambiguities, anything sub-agents flagged for user resolution)
 - `work/reference/index.json` and `work/reference/code-index.md` — for context
-- **paper-expert** and **code-expert** — still reachable via `SendMessage` if the user asks a follow-up question during REVIEW that the report and CLAUDE.md don't answer. The experts persist for the lifetime of the reproduction; they're useful here for "remind me what the paper says about X" or "did the original code do Y" without leaving the orchestrator session.
+- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) and `work/reference/code/` — directly available for follow-up questions the user asks during REVIEW that the report and CLAUDE.md don't answer ("remind me what the paper says about X", "did the original code do Y"). Grep into these for specifics; read targeted spans by offset/limit.
 - `CLAUDE.md` at the workdir root — paper identity, Goal, Rigor, Paper-vs-code disagreements (the at-a-glance summary that's accumulated across all sub-agent spawns)
 
 ## Outputs
@@ -31,17 +31,17 @@ The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding
 
 ### `/figure-comparison` (mandatory)
 
-Invoke the `/figure-comparison` skill from the orchestrator session. It builds a portable HTML side-by-side comparing paper artifacts (from `targets/`) to reproduced artifacts (from `results/<universe>/`). The skill uses `AskUserQuestion` for any inputs it can't infer from the workdir; that works because REVIEW runs orchestrator-side — the prompts land in this session, not in a sub-agent's chat.
+Invoke the `/figure-comparison` skill from the user's main session. It builds a portable HTML side-by-side comparing paper artifacts (from `targets/`) to reproduced artifacts (from `results/<universe>/`). The skill uses `AskUserQuestion` for any inputs it can't infer from the workdir; that works because REVIEW runs back in the user's main session — the prompts land here, not in a detached iteration.
 
 Output lands at `.lightcone/comparison.html`. Show the user the path and offer to open it (`open` on macOS, `xdg-open` on Linux, or just print the path so they click in their terminal).
 
-**Do not spawn `/figure-comparison` under the `Task` tool or as a named sub-agent.** It has `AskUserQuestion` in its `allowed-tools`; sub-agents have no user-reach, so the prompt fires into nothing.
+**Do not spawn `/figure-comparison` under the `Task` tool or inside a ralph iteration.** It has `AskUserQuestion` in its `allowed-tools`; sub-agents and detached iterations have no user-reach, so the prompt fires into nothing.
 
 ### `/check-sentence-by-sentence` (opt-in)
 
 Ask the user via `AskUserQuestion` whether they want the claim audit. It's optional because for many reproductions the figure-comparison already settles "did it match?"; the sentence-by-sentence audit earns its keep when the paper makes many specific quantitative claims and the user wants each one anchored to a code location.
 
-If yes, invoke `/check-sentence-by-sentence`. Same discipline as `/figure-comparison` — it can prompt the user; do not spawn under `Task` or as a named sub-agent.
+If yes, invoke `/check-sentence-by-sentence`. Same discipline as `/figure-comparison` — it can prompt the user; do not spawn under `Task` or inside a ralph iteration.
 
 Output lands at `.lightcone/check-sentence-by-sentence.md` (or wherever the skill writes it). Show the user the path.
 
@@ -100,10 +100,10 @@ This commit is the durable mark that the reproduction has reached close-out. Fut
 
 ## Notes
 
-- **This phase runs in the orchestrator session.** Do not spawn it as a named sub-agent. The whole point of REVIEW is that the user is reachable — every step uses `AskUserQuestion` (directly, or via the sibling skills it invokes).
-- **`/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`.** That's why REVIEW runs orchestrator-side and they live here, not in any sub-agent. Spawning either as a sub-agent fires prompts into nothing.
-- **The user owns the verdict-acceptance decision.** REVIEW's purpose is to let the user see what the sub-agents did and decide whether they accept it. The skill renders surfaces and asks; it does not unilaterally close.
-- **Don't confuse with the per-spawn self-reviews.** ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT each run their own internal fresh-context self-review passes during their work. Those are unrelated to this close-out — same word, different jobs. The phase boundary makes them unambiguous: per-spawn self-reviews live inside their host phase's reference; this one is the orchestrator-session close-out.
+- **This phase runs in the user's main session.** Do not invoke it from inside a ralph iteration. The whole point of REVIEW is that the user is reachable — every step uses `AskUserQuestion` (directly, or via the sibling skills it invokes), and iterations are detached.
+- **`/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`.** That's why REVIEW runs in the user's main session and they live here, not in any iteration. Invoking either inside an iteration fires prompts into nothing.
+- **The user owns the verdict-acceptance decision.** REVIEW's purpose is to let the user see what the loop's iterations did and decide whether they accept it. The skill renders surfaces and asks; it does not unilaterally close.
+- **Don't confuse with the per-phase reviews inside the loop.** ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT each have their own fresh-context review discipline that happens by iteration boundary (or in-iteration fan-out). Those are unrelated to this close-out — same word, different jobs. The phase boundary makes them unambiguous: per-phase reviews live inside their host phase's reference; this one is the post-loop close-out in the user's main session.
 - **Open-question resolutions are durable.** Append to `open-questions.md`'s `## Resolutions` section so the next re-run / future session sees what was decided. Do not delete the original questions.
 - **Keep the report short.** Long reports get skimmed; short reports get read. Two pages is generous.
 - **Do not invent further work.** If the user has accepted the verdict and the opportunities are propagated, the reproduction is done. The next session, the user, or a future revisit can decide whether tightening any open opportunity still serves them.
diff --git a/claude/lightcone/skills/lc-from-paper/references/run.md b/claude/lightcone/skills/lc-from-paper/references/run.md
index a4e7e456..d84c0dd0 100644
--- a/claude/lightcone/skills/lc-from-paper/references/run.md
+++ b/claude/lightcone/skills/lc-from-paper/references/run.md
@@ -2,7 +2,7 @@
 
 Materialize every output in `astra.yaml` for the requested universe. RUN is mostly mechanical — `lc run --universe <id>` does the heavy lifting. The phase exists as a discrete step so failures get diagnosed and re-run before COMPARE.
 
-This phase runs as the orchestrator-spawned `run` sub-agent. The user can drop into its chat if execution failures want diagnosis support; otherwise it logs failures, attempts targeted fixes within scope, and reports back. Universe defaults to `baseline` unless the orchestrator passes a different one when spawning.
+RUN is what a ralph iteration does when the workdir signals "recipes present in astra.yaml + scripts/ committed + results/<universe>/<output>/ absent for any output." The iteration runs the recipes, diagnoses failures, attempts targeted fixes, and exits. Universe defaults to `baseline`.
 
 ## Inputs
 
@@ -21,7 +21,7 @@ Execute all recipes:
 lc run --universe baseline
 ```
 
-(Use whatever universe the orchestrator passed when spawning; `baseline` is the default.)
+(Universe defaults to `baseline`; iterations override if the constitution scopes a different universe.)
 
 Check status:
 
@@ -48,10 +48,10 @@ If outputs fail:
 ## Survey signals (entry into RUN)
 
 - `astra.yaml` has recipes and validates ⇒ ready to run
-- `lc status --universe baseline` returns all `ok` ⇒ RUN done; orchestrator proceeds to COMPARE
+- `lc status --universe baseline` returns all `ok` ⇒ RUN done; the next iteration surveys and advances to COMPARE
 
 ## Notes
 
 - The runner backend (Docker / local / SLURM) comes from the project's target configuration — `~/.lightcone/config.yaml` and `.lightcone/lightcone.yaml`. RUN does not need to choose; the runner picks based on config.
-- For long-running computations, the script's stdout / stderr stream into the result directory's log file. The run sub-agent should `tail` the log file to monitor progress, not poll `lc status` repeatedly.
-- **Commit the materialized results' state when RUN settles.** The actual `results/` artifacts are gitignored heavy data, but the run-level outcome (which outputs reached `ok`, any failures logged) is worth a commit so the orchestrator can read `git log` to know RUN landed.
+- For long-running computations, the script's stdout / stderr stream into the result directory's log file. The iteration should use the Monitor tool on the log file to stream events (each stdout line surfaces as a notification), not poll `lc status` repeatedly. For one-shot waits, Bash with `run_in_background` notifies on completion.
+- **Commit the materialized results' state when RUN settles.** The actual `results/` artifacts are gitignored heavy data, but the run-level outcome (which outputs reached `ok`, any failures logged) is worth a commit so the next iteration can read `git log` to know RUN landed.
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index ba7ddf1d..e04c8a8b 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -1,27 +1,24 @@
 # SPECIFY — fill the stub `astra.yaml`, two passes per sub-analysis
 
-Read the stub `astra.yaml` from ARCHITECT and fill in `decisions:`, `prior_insights:`, `findings:` per sub-analysis, weaving the existing narrative with `astra-anchor:` references as entries land. SPECIFY is the **first user-ratification seam** — material paper-vs-code conflicts surface here, and they're often the highest-value moments for the user to weigh in on directly.
+Read the stub `astra.yaml` from ARCHITECT and fill in `decisions:`, `prior_insights:`, `findings:` per sub-analysis, weaving the existing narrative with `astra-anchor:` references as entries land. SPECIFY is the **first material-disagreement seam** — paper-vs-code conflicts surface here, and they're often the highest-value moments for the user to weigh in on at REVIEW.
 
-This phase runs as the orchestrator-spawned `specify` sub-agent. When the orchestrator launches it, it announces to the user: *"specify is the natural seam for paper-vs-code conflicts — drop into its chat if you want to ratify them as they come up; otherwise it'll take code as canonical and log disagreements to CLAUDE.md."* If the user is reachable in the sub-agent's chat, SPECIFY asks paper-vs-code conflicts in prose. If not, it takes the canonical-resolution default (code wins where paper and code disagree on a material choice) and logs the disagreement to CLAUDE.md's **Paper-vs-code disagreements** section plus `open-questions.md` for REVIEW close-out.
+SPECIFY is what a ralph iteration does when the workdir signals "stub `astra.yaml` present + sub-analyses' `decisions:` / `prior_insights:` / `findings:` blocks still empty." Iterations run detached in tmux; the user isn't reachable interactively, so the canonical-resolution default (code wins where paper and code disagree on a material choice) applies and disagreements are logged to CLAUDE.md's **Paper-vs-code disagreements** section plus `open-questions.md` for REVIEW close-out.
 
-The new structure runs **two passes per sub-analysis** (paper, then code, when code exists), then a self-review pass whose depth follows the rigor level the orchestrator picked for this spawn. The two passes are the cross-check: the paper pass authors what the paper says; the code pass surfaces where the code says something different; the difference is gold (it's where the reproduction has to make a decision).
+The structure runs **two passes per sub-analysis** (paper, then code, when code exists), then iteration-boundary review (or optional in-iteration fan-out). The two passes are the cross-check: the paper pass authors what the paper says; the code pass surfaces where the code says something different; the difference is gold (it's where the reproduction has to make a decision).
 
-Per-sub-analysis work is parallelizable when sub-analyses are independent. Each sub-analysis's two passes (paper, then code) run sequentially within that sub-analysis; across sub-analyses the work fans out via Task-tool sub-sub-agents from inside the specify session.
-
-When the specify sub-agent (or its per-sub-analysis sub-sub-agents) needs paper- or code-side context, prefer **querying paper-expert / code-expert via `SendMessage`** over re-reading materials directly. The experts already have deep context built up from ACQUIRE; SendMessage queries are cheaper and richer than fresh Explore passes. Falling back to direct reads (Grep on `work/reference/source/` / `document.md` / `code/`) is still fine for specific verbatim quote-hunting, but the experts should be the first stop for understanding.
+Per-sub-analysis work is parallelizable when sub-analyses are independent. Each sub-analysis's two passes (paper, then code) run sequentially within that sub-analysis; across sub-analyses the iteration can fan out parallel work as one-level-deep sub-agents from inside its main session. When SPECIFY needs paper- or code-side context, Grep into `work/reference/source/` / `document.md` for paper text or read targeted modules under `work/reference/code/`; the structural index at `work/reference/index.json` and the code inventory at `work/reference/code-index.md` give you the orientation to know where to look. Don't try to absorb the paper or code whole.
 
 ## Inputs
 
 - `astra.yaml` — the stub from ARCHITECT (sub-analyses, inputs, outputs, narrative; empty `decisions:` / `prior_insights:` / `findings:` blocks)
+- `constitution.md` — Goal (scope), Fidelity intent (used to size cheap-vs-heavy on this iteration's work), Quality bar
+- `CLAUDE.md` — Rules; Rigor *Current state* for the per-output trajectory tracking; **Paper-vs-code disagreements** for prior-iteration entries
 - `work/reference/index.json` — paper-extraction's structural index: figures, tables, section outline, citations. The `citations:` block maps each cited paper's BibTeX key (Path A) or synthetic `<lastname>_<year>` key (Path B) to `{locations, citation, doi}`. SPECIFY uses this to write each `prior_insights:` placeholder's `doi:` so LITERATURE knows which paper to fetch.
 - `work/reference/code-index.md` (when code present) — code inventory: module map, candidate decisions with file:line, entry-points, data dependencies, gotchas.
-- **paper-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Ask the deeper paper-side questions the structural index doesn't answer: "what decisions does the paper describe for the apodization choice", "where does the paper define the fiducial cosmology", "what does §4.2 conclude about the null tests". paper-expert has the paper's full context built up.
-- **code-expert** (agent ID passed in by the orchestrator) — reachable via `SendMessage`. Ask: "which module implements the BAO fit", "what's the default magnitude cut hardcoded in this script", "how does the code split data into bins". code-expert has the code's full context built up.
-- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
+- `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text. Grep into it for specific facts; read targeted spans by offset/limit when you need more context. Don't re-read it whole.
 - `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (Path B only)
-- `work/reference/code/` (if present) — original code, canonical reference for numerics + method
-- CLAUDE.md — **Goal** for scope, **Rigor** for the rigor level the orchestrator chose for this spawn, **Paper-vs-code disagreements** for prior-spawn entries
-- `work/notes/notes.md` — user-supplied context (read by every sub-agent if present)
+- `work/reference/code/` (if present) — original code, canonical reference for numerics + method. Read the modules that `code-index.md` points at for the sub-analysis you're filling.
+- `work/notes/notes.md` — user-supplied context (read by every iteration if present)
 
 ## Outputs
 
@@ -29,8 +26,8 @@ When the specify sub-agent (or its per-sub-analysis sub-sub-agents) needs paper-
 - `universes/baseline.yaml` — selects the paper's choices (where paper and code disagree per the canonical-resolution rule, see "Material conflicts" below)
 - `implementation-notes.md` — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
 - `targets/targets.md` — small target ledger COMPARE consumes: per output (already declared by ARCHITECT), a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure / table / metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
-- CLAUDE.md updates — append entries to **Paper-vs-code disagreements** for each material conflict surfaced; update **Rigor** with the post-spawn state of `astra.yaml` per sub-analysis (e.g. *baseline* after a cheap pass, *tightened* after heavy review)
-- `work/notes/specify-review/<sub-analysis>-round-<N>.md` — each review round's findings (one file per round per sub-analysis; how many rounds depends on the rigor level)
+- CLAUDE.md updates — append entries to **Paper-vs-code disagreements** for each material conflict surfaced; update **Rigor** *Current state* with the post-iteration state of `astra.yaml` per sub-analysis (e.g. *baseline* after a one-iteration write, *tightened* after a review-iteration applied fixes)
+- `work/notes/specify-review/<sub-analysis>-round-<N>.md` — each review iteration's findings (one file per review-iteration per sub-analysis; subsequent fix iterations apply them)
 
 ## Substrate skills to invoke
 
@@ -84,30 +81,25 @@ Read the code that implements this sub-analysis (`work/reference/code-index.md`'
    - **Material** = a different choice would plausibly change a numeric result the paper reports.
    - **Stylistic / cosmetic / pure-tooling** = not material; record in `implementation-notes.md` and move on.
 
-   For **material** disagreements, behavior depends on whether the user is reachable:
-   - **User reachable** (in the specify sub-agent's chat or in the orchestrator session): ask in prose — present the paper's stated method (with quote + section), the code's actual method (with `path:line`), the plausible impact ("changes the BAO peak amplitude by ~5%"), and offer three paths: paper, code, or something custom. The user can also defer ("just take code, I'll look in REVIEW"). **Default on user silence is code when `work/reference/code/` exists, otherwise paper.**
-   - **User unreachable** (sub-agent surface dismissed and orchestrator hasn't relayed): take **code as canonical** per the canonical-resolution rule, append the conflict to CLAUDE.md's **Paper-vs-code disagreements** section AND to `open-questions.md` so the user sees it at the next session boundary, and let `universes/baseline.yaml` select the code's method. The user can flip the baseline at REVIEW close-out.
-
-   Either way, the override is preserved in `astra.yaml` as a `decisions:` entry with both options preserved, plus the `universes/baseline.yaml` selecting whichever option won. A `findings:` entry (or an insight if the conflict matters for replication discipline broadly) records the conflict with quote + line evidence.
+   For **material** disagreements: take **code as canonical** per the canonical-resolution rule (the iteration runs detached; the user isn't reachable interactively). Append the conflict to CLAUDE.md's **Paper-vs-code disagreements** section AND to `open-questions.md` so the user sees it at REVIEW close-out, with the verbatim paper quote + the `path:line` code anchor + a plausible-impact one-liner ("changes the BAO peak amplitude by ~5%"). Let `universes/baseline.yaml` select the code's method. Preserve both options in the `astra.yaml` `decisions:` entry; the user can flip the baseline at REVIEW close-out.
 
 2. **Code-revealed insights and findings.** Things the code does that the paper doesn't describe (a calibration version, a cut stricter than stated, a hyperparameter the paper compressed). These earn `findings:` entries with `path:line` evidence anchors against the code (when an output corresponds), or `implementation-notes.md` bullets (when no formal output corresponds).
 
 3. **Decision-option augmentation.** Where the code reveals an option the paper didn't mention but is defensible (a sibling implementation alternative used in the codebase or referenced in a comment), add it as a sibling option to the relevant `decisions:` entry. Do not pre-emptively author every code variant; only the ones that bear on a real choice.
 
-4. **Surface paper-vs-code material disagreements** in prose (when user reachable) or to CLAUDE.md's **Paper-vs-code disagreements** section + `open-questions.md` (when user unreachable). The verbatim paper quote + the `path:line` code anchor + the plausible-impact one-liner should make it into both surfaces so the user sees enough to decide at REVIEW close-out.
+### Review — by iteration boundary (default) or in-iteration fan-out (optional)
 
-### Pass C — self-review (rigor chosen per spawn)
+After the paper + code passes land for a sub-analysis, the cross-check question is: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
 
-After the paper + code passes land for a sub-analysis, a fresh-context Task-tool sub-agent cross-checks: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
+**Default: review by iteration boundary.** The iteration that wrote a sub-analysis's passes exits when its passes are done; the next iteration enters fresh, surveys, finds the sub-analysis's passes present but no `work/notes/specify-review/<sub-analysis>-round-<N>.md`, reads the slice of `astra.yaml` + the paper + the code, and writes review findings. The iteration after that applies the fixes. The fresh-context, no-bias property is automatic at iteration boundaries: review iteration N sees no more of review iteration N-2's fixes than a freshly spawned sub-agent would. The depth is sized from the gap between CLAUDE.md's Rigor *Current state* for the sub-analysis and `constitution.md`'s Fidelity intent: under *cheap*, one clean review-iteration terminates the sub-analysis's review cycle; under *heavy*, two consecutive review-iterations with verdict `clean` are required.
 
-Self-review depth follows the rigor level the orchestrator picked for this spawn (read CLAUDE.md's **Rigor** section). Same shape as ARCHITECT's review pass and IMPLEMENT's:
+**Optional: in-iteration fan-out.** When there are many independent sub-analyses to review and the iteration's fidelity-intent calculus calls for *heavy*, the iteration that holds the review work can fan out one-level-deep sub-agents (one per sub-analysis) inside its own session, merge findings, and apply fixes in the same iteration. This trades the natural fresh-context property between iterations for parallelism within an iteration. Reach for this only where the parallelism actually pays — most reproductions have small enough sub-analysis counts that sequential review via iteration boundaries is cleaner.
 
-- **Cheap:** skip self-review, or run a single fresh Task-tool sub-agent pass and incorporate its fixes once.
-- **Heavy:** N rounds — each round spawns a fresh Task-tool reviewer; fixes are incorporated; the next round spawns another fresh reviewer that has not seen the fixes. Iterate until two consecutive rounds find no fixes, or a 5-round system cap. Each round runs a brand-new sub-agent that does NOT see prior rounds' findings or fixes — pattern-matching on prior fixes defeats the cross-check.
+#### Per-review-iteration prompt (whether sequential or fan-out)
 
-#### Per-round fresh sub-agent — system prompt
+The checklist and findings-file shape below apply whether the review work happens in a standalone iteration (default) or as a fan-out sub-agent inside an iteration (optional). Either way, the reviewer reads the slice fresh and writes findings only, never editing `astra.yaml` directly; that's the next iteration's (or the fan-out merge step's) job.
 
-> You are a SPECIFY reviewer for one sub-analysis. Read the relevant slice of `astra.yaml`, the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
+> You are a SPECIFY reviewer for one sub-analysis. Read the relevant slice of `astra.yaml`, the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers (whether across iterations or across fan-out spawns); do not assume anything has already been fixed.
 >
 > ### Inputs
 >
@@ -117,7 +109,7 @@ Self-review depth follows the rigor level the orchestrator picked for this spawn
 > - `work/reference/index.json` — the decision clusters and result loci that scoped the work
 > - `work/reference/code-index.md` (when code present)
 > - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
-> - `work/reference/code/` (when present) — canonical reference for numerics + method
+> - `work/reference/code/` (when present) — canonical reference for numerics + method (read targeted modules pointed at by code-index.md)
 > - `work/reference/index.json#citations` — cite-key → `{locations, citation, doi}` mapping from paper-extraction (use to confirm each `prior_insights:` placeholder's `doi:` matches what the paper cites)
 >
 > ### What to check
@@ -133,7 +125,7 @@ Self-review depth follows the rigor level the orchestrator picked for this spawn
 >
 > ### What NOT to do
 >
-> - **Do not edit `astra.yaml`** or any other file. Your output is a findings file; a SPECIFY-fix pass responds to the findings. Editing here defeats the multi-round-fresh-context discipline.
+> - **Do not edit `astra.yaml`** or any other file. Your output is a findings file; the next iteration (or the fan-out merge step) applies the fixes. Editing here defeats the fresh-context discipline that makes review work.
 > - **Do not flag missing `recipes:`.** Recipes are IMPLEMENT's, not SPECIFY's.
 > - **Do not re-read the entire paper.** Use Grep on `work/reference/source/` (or `document.md`) for the specific claims you want to verify; lean on `work/reference/index.json`.
 > - **Do not invent problems.** If the sub-analysis is consistent with paper + code, say so briefly.
@@ -162,23 +154,26 @@ Self-review depth follows the rigor level the orchestrator picked for this spawn
 > - **clean** | **needs-fixes**
 > ```
 
-#### SPECIFY-fix pass between rounds
+#### Applying fixes (the iteration after the review-iteration)
 
-After each round's findings file lands, a SPECIFY-fix pass (or the orchestrator inline for trivial mechanical fixes) edits `astra.yaml` for the sub-analysis, plus `universes/baseline.yaml` and `implementation-notes.md` per the suggested fixes. After any change to `astra.yaml`:
+The iteration after each review-iteration reads `work/notes/specify-review/<sub-analysis>-round-<N>.md`, applies the fixes to `astra.yaml` for the sub-analysis (plus `universes/baseline.yaml` and `implementation-notes.md` per the suggested fixes), commits, and exits. After any change to `astra.yaml`:
 
 ```bash
 astra validate astra.yaml
 astra validate astra.yaml --verify-evidence  # after LITERATURE has resolved the prior_insights placeholders
 ```
 
+For in-iteration fan-out, the iteration that spawned the parallel reviewers reads each reviewer's findings file and merges fixes back into `astra.yaml` itself in the same iteration; the next iteration's survey acts as the consolidating review.
+
 #### Termination
 
-- **Cheap:** one pass per sub-analysis. Done after fixes (or immediately, if `fixes_needed` was 0).
-- **Heavy:**
-  - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
-  - If round N is the first round (N=1), spawn round 2 unconditionally so we can compare.
-  - If round N produced fixes, spawn round (N+1) as a fresh sub-agent that does not see round N's findings or the fixes.
-  - If N hits the 5-round system cap without two consecutive clean rounds, the specify sub-agent stops on this sub-analysis and reports back to the orchestrator. If the user is reachable, ask in prose: "SPECIFY review for <sub-analysis-id> reached round cap with N fixes still landing; continue, accept the current spec, or revise scope?" If the user is unreachable, accept the current sub-analysis spec, log the unfinished tail in `open-questions.md`, and let the orchestrator decide whether to proceed or re-spawn.
+- **Cheap (sequential):** one review-iteration per sub-analysis after the passes land. Done after the next iteration applies the fixes (or immediately, if `fixes_needed` was 0).
+- **Heavy (sequential):**
+  - If review-iteration N's `fixes_needed` was 0 AND review-iteration (N-1)'s was also 0 → done.
+  - If review-iteration N is the first review (N=1), the next review-iteration runs unconditionally so we can compare across two fresh passes.
+  - If review-iteration N produced fixes, the next iteration applies them, and the iteration after that runs the next review fresh.
+  - If 5 review-iterations have happened without two consecutive clean rounds, log the unfinished tail in `open-questions.md` ("SPECIFY review for <sub-analysis-id> reached round cap; user should review during REVIEW close-out") and let the next iteration advance to LITERATURE.
+- **In-iteration fan-out (when chosen):** one iteration writes the passes; the next spawns N parallel reviewers and merges fixes in the same iteration. The iteration after that runs the next standalone review-iteration (or another fan-out, the iteration's call) for the second clean pass.
 
 When all sub-analyses' reviews terminate, SPECIFY produces the final outputs:
 
@@ -203,7 +198,7 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 - **Do NOT add executable implementation code or invented run commands.** Do add concise provenance / recipe descriptions where ASTRA fields support them, especially for paper-derived calculations, figure generation, imported constants, and values that IMPLEMENT will need to regenerate.
 - **Equation and section numbers must match the rendered paper / PDF**, not a naïve count of TeX blocks or markdown headings. When citing "eq. N" or "§N", find the equation or heading by content in the rendered paper and use the printed number.
 - **Validate** with `astra validate astra.yaml` after each pass.
-- **Work primarily through paper-expert and code-expert** via `SendMessage` — they have the deep context built up. Use `work/reference/index.json` and `work/reference/code-index.md` for structural lookups, and `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) only to verify specific verbatim quotes (Grep for terms, or read targeted sections with offset/limit). Do not re-read the whole paper.
+- **Targeted reads, not whole-paper absorption.** Use `work/reference/index.json` and `work/reference/code-index.md` for structural lookups; Grep into `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) for specific verbatim quotes; read targeted code modules under `work/reference/code/` for canonical method details. Don't re-read the whole paper or the whole codebase.
 - **The narrative skill is the prose author, not the structure author.** SPECIFY weaves anchors into the prose ARCHITECT wrote — the structural surface is fixed, the anchored references are SPECIFY's contribution.
 
 ## Survey signals (entry into SPECIFY)
@@ -220,9 +215,9 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 
 ## Notes
 
-- **Material conflicts that the user explicitly defers** are appended to CLAUDE.md's **Paper-vs-code disagreements** section AND `open-questions.md`. CLAUDE.md is the at-a-glance summary every future sub-agent or orchestrator session sees; `open-questions.md` is the autonomous-mode resolution accumulator. Both lead to the same place: the user resolves at REVIEW close-out.
+- **Material disagreements** are appended to CLAUDE.md's **Paper-vs-code disagreements** section AND `open-questions.md`. CLAUDE.md is the at-a-glance summary every iteration sees; `open-questions.md` is the user-resolution accumulator. Both lead to the same place: the user resolves at REVIEW close-out.
 - **The narrative skill is the prose author, not the structure author.** SPECIFY's job is content correctness; `/narrative` invocation comes during the paper pass when authoring or extending the narrative prose to weave in anchor references.
 - **The target ledger is a derivation, not a separate phase's output.** Treat `targets/targets.md` as a small index produced alongside the filled `astra.yaml`, not a heavyweight artifact. The depth lives in `astra.yaml`'s `outputs:` / `findings:` / `decisions:`.
-- **Two-pass discipline is the cross-check.** Skipping the code pass (when code exists) loses the canonical-resolution surface and lets paper-vs-code material disagreements slip through. The fresh-context self-review can recover *some* of these but not all — the disciplined sequence (paper → code → self-review) catches more.
-- **Per-sub-analysis parallelism is opt-in.** When sub-analyses are independent (no shared decision blocks, no cross-sub-analysis findings), spawn one Task-tool sub-sub-agent per sub-analysis from inside specify's session to run its passes in parallel. When they share material decisions or findings (rare), serialize.
-- **Commit per sub-analysis as it lands.** Each sub-analysis's filled-in `astra.yaml` slice + its targets/implementation-notes/baseline updates earn one commit; review-round files commit one per round. The orchestrator reads `git log` to track progress; small commits keep the trail readable.
+- **Two-pass discipline is the cross-check.** Skipping the code pass (when code exists) loses the canonical-resolution surface and lets paper-vs-code material disagreements slip through. The fresh-context review (iteration boundary or in-iteration fan-out) can recover *some* of these but not all — the disciplined sequence (paper → code → review) catches more.
+- **Per-sub-analysis parallelism is opt-in.** When sub-analyses are independent (no shared decision blocks, no cross-sub-analysis findings), the iteration can fan out one-level-deep sub-agents (one per sub-analysis) from inside its main session to run their passes in parallel. When they share material decisions or findings (rare), serialize across iterations.
+- **Commit per sub-analysis as it lands.** Each sub-analysis's filled-in `astra.yaml` slice + its targets/implementation-notes/baseline updates earn one commit; review files commit one per review-iteration. The next iteration reads `git log` to track progress; small commits keep the trail readable.

From 0977eba6e9f7ade24cdf4bae6fa1abaca8ea587b Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 04:38:12 +0200
Subject: [PATCH 056/124] CLAUDE.md: add ralph to the skills enumeration

Architecture-block listing of bundled skills picks up `ralph` (the loop
substrate `lc-from-paper` uses for the long middle of a reproduction).
Comment-style enumeration mirrors what skills/README.md describes in
detail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 CLAUDE.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index feb9b56c..1bf0a8c3 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -68,9 +68,10 @@ src/lightcone/              # namespace — NO __init__.py
 
 claude/lightcone/           # Claude plugin source — force-included into the wheel
 ├── skills/                 # lc-new, lc-from-code, lc-from-paper,
-│                            # lc-feedback;
+│                            # lc-feedback, ralph;
 │                            # paper-reproduction bundle: lc-from-paper (entry),
-│                            # narrative, paper-extraction, figure-comparison,
+│                            # ralph (loop substrate), narrative,
+│                            # paper-extraction, figure-comparison,
 │                            # check-sentence-by-sentence
 │                            # (see skills/README.md for the full bundle map)
 ├── agents/                 # lc-extractor

From d7b48f87b9f01e8dae6b6d41e989b365600479aa Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 09:54:15 +0200
Subject: [PATCH 057/124] ralph + lc-from-paper docs: cold-survey cleanup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Four stragglers from the Round 7 reversal the prior iteration missed,
caught on a fresh-eyes pass:

- claude/lightcone/skills/ralph/scripts/ralph: the per-iteration prompt
  template told the worker to activate "the ralph-loops skill" — the
  pre-merge name. After the constitution + ralph-loops collapse into
  a single `ralph` skill, that activation would silently fail. Each
  iteration would land without the iteration discipline loaded. Fix:
  "ralph skill" + "Loop protocol" (matching the merged SKILL.md's
  section name).

- docs/user/agent-workflow.md: the user-facing `/lc-from-paper`
  section still described "orchestrator session + named per-phase
  sub-agents" verbatim from Round 6. Rewritten to describe the
  interview-first agent + ralph-loop architecture, with both
  per-paper artifacts (constitution.md driving the loop, CLAUDE.md
  walk-up), the detached-tmux loop semantics, and REVIEW close-out
  back in the user's session.

- docs/skills/paper-extraction.md: "Related" section mentioned a
  `paper-expert` sub-agent reading index.json. Reframed to describe
  the iteration-level invocation: `/paper-extraction` is invoked
  during ACQUIRE for the target paper and again from inside a ralph
  iteration for each cited paper during LITERATURE.

- claude/lightcone/skills/lc-from-code/SKILL.md: the Explore-agent
  scan prompt referred to "The orchestrator will filter down later".
  Generalized to "The caller will filter down later" — fits both the
  fresh-migration mode (where the main session is the caller) and any
  future invocation contexts without the loaded "orchestrator" word
  the bundle just walked away from.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-code/SKILL.md |  2 +-
 claude/lightcone/skills/ralph/scripts/ralph   |  2 +-
 docs/skills/paper-extraction.md               |  5 +-
 docs/user/agent-workflow.md                   | 56 ++++++++++++-------
 4 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-code/SKILL.md b/claude/lightcone/skills/lc-from-code/SKILL.md
index f3a0a158..4480fffd 100644
--- a/claude/lightcone/skills/lc-from-code/SKILL.md
+++ b/claude/lightcone/skills/lc-from-code/SKILL.md
@@ -53,7 +53,7 @@ Return the results as a markdown table:
 
 And a separate list of ALL candidate decisions with file:line references.
 Err on the side of completeness — include anything that could plausibly
-be an analytical choice. The orchestrator will filter down later.
+be an analytical choice. The caller will filter down later.
 
 For reference, here are the decision criteria for classifying candidates:
 <decision-criteria>
diff --git a/claude/lightcone/skills/ralph/scripts/ralph b/claude/lightcone/skills/ralph/scripts/ralph
index 993fb586..9625e8d1 100755
--- a/claude/lightcone/skills/ralph/scripts/ralph
+++ b/claude/lightcone/skills/ralph/scripts/ralph
@@ -98,7 +98,7 @@ $SPEC_CONTENT
 SYSEOF
 
     cat > "$PROMPT_FILE" << 'PROMPTEOF'
-You are inside a ralph loop — meditative iteration toward a desired state. Activate the ralph-loops skill and follow its iteration protocol against the constitution above. The workdir's CLAUDE.md auto-loads; read it on entry.
+You are inside a ralph loop — meditative iteration toward a desired state. Activate the ralph skill and follow its Loop protocol against the constitution above. The workdir's CLAUDE.md auto-loads; read it on entry.
 PROMPTEOF
 
     PROMPT=$(cat "$PROMPT_FILE")
diff --git a/docs/skills/paper-extraction.md b/docs/skills/paper-extraction.md
index c1b68542..b0b601bd 100644
--- a/docs/skills/paper-extraction.md
+++ b/docs/skills/paper-extraction.md
@@ -126,7 +126,8 @@ cached PDF — paraphrasing breaks the gate.
 ## Related
 
 - [`/lc-from-paper`](lc-from-paper.md) — invokes `/paper-extraction`
-  during ACQUIRE; the resulting `paper-expert` sub-agent reads
-  `index.json` and the substrate.
+  during ACQUIRE for the target paper, and again from inside a ralph
+  iteration for each cited paper during LITERATURE; each iteration
+  reads `index.json` and the substrate directly.
 - [`claude/lightcone/guides/astra-reference.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/astra-reference.md)
   — Insight + Evidence shape, `quote.exact` rules.
diff --git a/docs/user/agent-workflow.md b/docs/user/agent-workflow.md
index 8ab18c57..a11ad451 100644
--- a/docs/user/agent-workflow.md
+++ b/docs/user/agent-workflow.md
@@ -71,28 +71,46 @@ parameter plumbing.
 ## `/lc-from-paper` — reproduce a published paper
 
 **You have a DOI or arXiv ID. You end with a reproduction project
-driven by an orchestrator session and named per-phase sub-agents.**
+driven by an interview-first agent that hands off to a long-running
+ralph loop for the heavy middle.**
 
 `/lc-from-paper` is the entry point of the paper-reproduction bundle.
 It opens with a short interactive interview — paper identity, scope
-(full vs targeted), and any paper-specific conventions — then drafts
-a per-paper `CLAUDE.md` (the durable spec every sub-agent walks up to).
-After approval, the skill becomes a persistent **orchestrator session**
-that spawns named per-phase sub-agents (`acquire`, `architect`,
-`specify`, `literature`, `implement`, `run`, `compare`) you can drop
-into directly via the chat surface. The two bookends — INTERVIEW at
-start and REVIEW at close-out — run in the orchestrator session itself.
-
-Rigor is chosen per spawn from CLAUDE.md's Rigor section: cheap
-(skip self-review or one fresh-context pass) or heavy (iterate
-fresh-context review until two consecutive clean rounds). COMPARE
-returns a verdict plus an opportunity assessment — where the gaps are
-and how much they likely matter — so you and the orchestrator can
-decide whether to spend another IMPLEMENT round or land at the
-current rigor.
-
-The bundle composes sibling skills: `paper-extraction`, `narrative`,
-`figure-comparison`, and `check-sentence-by-sentence`. See
+(full vs targeted), fidelity intent (your prose answer to "when is
+this good enough"), and any paper-specific conventions — then drafts
+**two files** at the reproduction workdir root: `constitution.md`
+(the ralph loop's driving document — Goal, fidelity intent, scope,
+quality bar, evidence) and `CLAUDE.md` (the auto-loading walk-up with
+rules, the Rigor accumulator, and the paper-vs-code disagreements log).
+ACQUIRE then runs in the same session, standing up the paper and code
+substrate via `/paper-extraction` and `/lc-from-code` in scan-only mode.
+
+After ACQUIRE lands, the skill launches a **ralph loop** in a detached
+tmux session against `constitution.md`. Each iteration starts a fresh
+worker that surveys the workdir, picks the next valuable move
+(typically one of ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN
+→ COMPARE), does it, commits, exits. The fresh-context property
+between iterations is what makes per-phase review work: iteration N
+writes, iteration N+1 reads N's work without bias. You attach to the
+loop with `tmux attach` to watch or steer; iterations are detached so
+they can't ask you questions interactively — they log open questions
+to `open-questions.md` with a best-judgment default and the loop
+keeps moving.
+
+When the loop closes (constitution `status: closed` after COMPARE
+returns `pass` and a cold-survey iteration finds nothing left to
+improve), come back and the agent runs **REVIEW close-out** in your
+session: `/figure-comparison` against the targets, optional
+`/check-sentence-by-sentence`, a walk through the accumulated open
+questions, and a `REPRODUCTION-SUMMARY.md`. COMPARE's opportunity
+assessment — where the gaps are, how much they likely matter, and how
+they sit relative to your fidelity intent — propagates into
+CLAUDE.md's Rigor section as the trajectory of what could be tightened
+on a return visit.
+
+The bundle composes sibling skills: `ralph` (the loop substrate),
+`paper-extraction`, `narrative`, `figure-comparison`, and
+`check-sentence-by-sentence`. See
 [`claude/lightcone/skills/README.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md)
 for the full bundle map.
 

From b7e0998d52e91acd79f498c1f8004018c1fb3553 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 09:59:28 +0200
Subject: [PATCH 058/124] lc-from-paper: two small clarifications in SKILL.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cold-eyes read caught two awkwardnesses the prior cold survey didn't:

- CLAUDE.md description listed "rules" alongside "running accumulators"
  — but rules are the standing discipline (template's defaults; rarely
  touched), not something iterations accrete. Split: "standing rules
  plus running accumulators (rigor state per output, paper-vs-code
  disagreements log)".

- Closing-conditions parenthetical was muddled ("by an iteration after
  COMPARE returns `pass` or after the iteration logs accepted
  opportunities") — conflated two iterations and the word "accepted"
  was unclear. Rewritten: COMPARE returns pass (or user-accepted
  partial), un-acted opportunities are logged, then a subsequent
  cold-survey iteration flips `status:` to `closed`. Matches what
  compare.md and the closing rule actually describe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-paper/SKILL.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index e90b617d..fb05569d 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -22,7 +22,7 @@ The architecture is two-piece:
 
 2. **A ralph loop for the long middle.** Once the per-paper `constitution.md` is drafted (INTERVIEW) and the substrate is on disk (ACQUIRE), you launch a ralph loop against the constitution. Each iteration starts a fresh session with the constitution as system prompt, surveys the workdir, picks the next valuable move (typically one phase's worth of work), does it, commits, exits. The fresh-context property is automatic — iteration N+1 reads N's work without bias, which makes per-phase review collapse into "the next iteration is the review."
 
-The whole thing is driven by **the per-paper `constitution.md`** at the reproduction workdir root, plus the auto-loading `CLAUDE.md` walk-up. The constitution describes the goal (what "done" looks like, the user's fidelity intent, scope, quality bar); CLAUDE.md carries the running accumulators (rigor state per output, paper-vs-code disagreements log, rules). Every iteration walks up to both.
+The whole thing is driven by **the per-paper `constitution.md`** at the reproduction workdir root, plus the auto-loading `CLAUDE.md` walk-up. The constitution describes the goal (what "done" looks like, the user's fidelity intent, scope, quality bar); CLAUDE.md carries the standing rules plus running accumulators (rigor state per output, paper-vs-code disagreements log). Every iteration walks up to both.
 
 ## Setup: git-tracked workdir
 
@@ -44,7 +44,7 @@ Nine phases (zero-indexed). INTERVIEW and ACQUIRE run before the loop, in the us
 | 7 | COMPARE | ralph iteration | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
 | 8 | REVIEW | user's main session | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
 
-COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are, how much they likely matter, and how they sit relative to the constitution's fidelity intent. A subsequent iteration decides whether to spend another IMPLEMENT round (close a gap that sits below intent) or land the reproduction at its current trajectory and log the gap as an open opportunity in CLAUDE.md's Rigor section. When the constitution's `status:` flips to `closed` (typically by an iteration after COMPARE returns `pass` or after the iteration logs accepted opportunities), the loop terminates and REVIEW runs in the user's main session.
+COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are, how much they likely matter, and how they sit relative to the constitution's fidelity intent. A subsequent iteration decides whether to spend another IMPLEMENT round (close a gap that sits below intent) or land the reproduction at its current trajectory and log the gap as an open opportunity in CLAUDE.md's Rigor section. Once COMPARE returns `pass` (or user-accepted `partial`) and the un-acted opportunities are logged, a subsequent cold-survey iteration finds nothing left to do and flips the constitution's `status:` to `closed`. The loop terminates; REVIEW runs in the user's main session.
 
 ## The two pre-loop bookends
 

From b1d532b6539771d46cef5999dc4058b78e9ffbe4 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 10:03:40 +0200
Subject: [PATCH 059/124] lc-from-paper: cold-survey cleanups in specify +
 compare references
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Third cold-eyes pass caught two more real issues the prior surveys
missed (both inside the reviewer prompts / verdict YAML where the
mass-edit work landed):

- specify.md reviewer prompt (check item 5) had stale Round-6 wording:
  "unless an interactive seam recorded a different user choice... where
  the spec picked the paper without an explicit user override". The
  new architecture has no mid-loop interactive seam; iterations run
  detached and the user resolves at REVIEW close-out. Rewritten so the
  reviewer's check matches what specify.md's Pass B actually does:
  spec records both options, baseline selects code's option per
  canonical-resolution, the conflict goes to CLAUDE.md's disagreements
  log + open-questions.md. Flag silent drops, missing disagreements-log
  entries, or baselines that picked the paper without the canonical-
  resolution rule applying.

- compare.md was internally inconsistent on priority vocabulary. The
  verdict-report YAML schema used `priority: high|medium|low` and the
  Verdict rules referred to "high-priority targets" / "medium-priority"
  — but the rest of the architecture (architect.md, SKILL.md table,
  interview.md, the per-paper constitution template) uses `primary |
  secondary` consistently. Even compare.md elsewhere (the opportunity-
  examples section, the relative_to_intent defaults) uses
  "primary"/"secondary". A reviewer following compare.md's YAML schema
  would flag the spec's `priority: primary` outputs as invalid. Aligned
  to `primary|secondary` throughout compare.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../lightcone/skills/lc-from-paper/references/compare.md  | 8 ++++----
 .../lightcone/skills/lc-from-paper/references/specify.md  | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/compare.md b/claude/lightcone/skills/lc-from-paper/references/compare.md
index d4d2da8d..ea439cc7 100644
--- a/claude/lightcone/skills/lc-from-paper/references/compare.md
+++ b/claude/lightcone/skills/lc-from-paper/references/compare.md
@@ -49,7 +49,7 @@ attempt: <attempt_number>
 outputs:
   <output_id>:
     type: metric|figure|table
-    priority: high|medium|low
+    priority: primary|secondary
     paper_value: "<from targets/targets.md>"
     reproduced_value: "<from results>"
     reference_file: "<path in targets/>"
@@ -69,9 +69,9 @@ opportunities:
 
 ## Verdict rules
 
-- **`pass`**: ALL high-priority targets match, no major issues with medium-priority.
-- **`partial`**: some high-priority match, or all high-priority match but medium has issues.
-- **`fail`**: most high-priority don't match, or fundamental methodological issue.
+- **`pass`**: ALL primary targets match, no major issues with secondary targets.
+- **`partial`**: some primary targets match, or all primary targets match but secondary targets have issues.
+- **`fail`**: most primary targets don't match, or fundamental methodological issue.
 
 If verdict is not `pass`, **`fix_suggestions` MUST reference specific scripts and line numbers**. "The result is wrong" is not actionable; "scripts/bao_fit.py:42 uses `damping_prior=flat`, paper specifies Gaussian; change to gaussian per Howlett+2017 §4.2" is.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index e04c8a8b..27a89c73 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -118,7 +118,7 @@ The check-list and findings-file shape below applies whether the review work hap
 > 2. **Decision options.** Each decision has the chosen option plus any sibling alternatives the paper discusses or the code reveals. The chosen option's `rationale:` is grounded in the paper's stated reasoning (or the code's, where canonical-resolution applied).
 > 3. **Evidence verification.** Every `evidence:` block uses `TextQuoteSelector` with a verbatim `exact:` quote, real surrounding-text `prefix:` / `suffix:`, and a real page or section anchor. Quotes that are paraphrased or whose prefix / suffix are editorial parentheticals will fail `--verify-evidence`. Note `prior_insights:` placeholders intentionally have no `evidence:` block at this stage — LITERATURE authors them — so do not flag missing `evidence:` on placeholder entries. After LITERATURE resolves the placeholders, run `astra validate astra.yaml --verify-evidence`.
 > 4. **Findings traceability.** Each `findings:` entry's `evidence:` resolves to a real paper claim (verbatim quote + source anchor) or a real code location (`path:line`).
-> 5. **Material-disagreement surfacing.** Where paper and code disagree on a material choice, the spec records both options under the relevant `decisions:` entry. `universes/baseline.yaml` selects the code's option (canonical-resolution default), unless an interactive seam recorded a different user choice. Flag any material disagreement that got silently dropped or where the spec picked the paper without an explicit user override.
+> 5. **Material-disagreement surfacing.** Where paper and code disagree on a material choice, the spec records both options under the relevant `decisions:` entry, `universes/baseline.yaml` selects the code's option (canonical-resolution default), and the conflict is appended to CLAUDE.md's *Paper-vs-code disagreements* section plus `open-questions.md` for the user to resolve at REVIEW close-out. Flag any material disagreement that got silently dropped, that didn't make it into the disagreements log, or where the baseline picked the paper without the canonical-resolution rule applying.
 > 6. **Narrative anchors.** The sub-analysis's `narrative:` weaves `astra-anchor:` references to the new `decisions:` / `prior_insights:` / `findings:` entries — the tree-path grammar must be valid, and entries actually exist at the referenced paths.
 > 7. **`narrative:` voice fidelity.** Hedges and qualifiers from the paper survive (per the narrative skill's discipline). Editorial commentary added beyond what the paper supports gets flagged.
 > 8. **No synthetic data.** Unless the paper itself uses synthetic data, every input has a real acquisition source — no mock / synthetic substitutes anywhere in the sub-analysis's inputs, decisions, or implementation-notes.

From 2ffbda45911b26cdbc37129a00957481ca36d9ae Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 10:11:11 +0200
Subject: [PATCH 060/124] lc-from-paper: rebase SPECIFY + LITERATURE onto
 astra-spec 0.0.10 grammar
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The skill bundle's specify.md and literature.md were written against a
future ASTRA grammar that the installed astra-spec 0.0.10 validator
rejects wholesale. Realign onto what 0.0.10 actually has:

- Decision-level `rationale:`, not per-option. Option carries
  `{label, description, insights}`; no per-option `rationale:` or
  `evidence:` block. The paper's chosen option is identified by
  `default:`.
- `Option.insights: [<insight_id>, ...]` is the back-reference that
  links options to prior_insights — replaces the misplaced
  `decision_links:` field that previously lived on the Insight.
- prior_insights placeholders are syntactically-complete Insights
  (`id`, `claim`, `created_at`, `evidence: [{id, doi}]`). The placeholder
  Evidence carries the cited paper's DOI but no `quote:` selector;
  LITERATURE fills `quote: {exact, prefix, suffix}` + `location: {page}`
  onto each Evidence later. Evidence with `doi:` and no `quote:` is
  structurally valid in 0.0.10.
- findings entries are full Insights with paper-anchored Evidence:
  `doi:` of the target paper + `quote: {exact, prefix, suffix}` +
  `location: {page: N}`. Code-revealed findings use
  `artifact: <output_id>` (+ optional `source_commit:`) instead of
  free-form `path:line` evidence anchors.
- LITERATURE input contract surfaces `backed_options` (derived from
  Option.insights back-refs) in place of the dropped `decision_links`
  forward-link.

Fixes LightconeResearch/lightcone-cli#126.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../lc-from-paper/references/literature.md    | 21 +++---
 .../lc-from-paper/references/specify.md       | 66 ++++++++++++++-----
 2 files changed, 62 insertions(+), 25 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index d1c0fc5c..90d8d8c1 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -1,6 +1,6 @@
 # LITERATURE — resolve `prior_insights:` placeholders against the cited papers
 
-After SPECIFY records each citation marker as a `prior_insights:` *placeholder* (`id`, `claim`, `doi`, `decision_links` — no `evidence:` selector), LITERATURE stands up each cited paper's reading materials, finds the verbatim quote in the cited paper that justifies the placeholder's claim, and authors the resolved `evidence:` selector back into `astra.yaml`. After LITERATURE, every `prior_insights:` entry is a verified citation; `astra validate astra.yaml --verify-evidence` returns clean.
+After SPECIFY records each citation marker as a `prior_insights:` *placeholder* — a syntactically-complete `Insight` (`id`, `claim`, `created_at`, `evidence: [{id, doi}]`) whose Evidence entry carries the cited paper's DOI but **no `quote:` selector yet** — LITERATURE stands up each cited paper's reading materials, finds the verbatim quote in the cited paper that justifies the placeholder's claim, and writes the resolved `quote: {exact, prefix, suffix}` (+ `location: {page: N}`) onto that Evidence entry. The decision↔insight linkage already lives on the option side (`Option.insights: [<insight_id>, ...]`); LITERATURE doesn't touch it — only the Evidence's `quote:` / `location:`. After LITERATURE, every `prior_insights:` Evidence entry has a verified quote; `astra validate astra.yaml --verify-evidence` returns clean.
 
 The quote-finding direction is: **target paper's claim → quote inside the cited paper**. The target paper says "we follow Smith+20's magnitude cut of i<24"; LITERATURE goes to Smith+20 and finds the verbatim quote there that justifies that statement ("we adopt a magnitude cut of i<24 as our fiducial selection"). The point is to verify the target paper's claims about its predecessors are real, not paraphrased or misremembered.
 
@@ -10,7 +10,7 @@ LITERATURE is what a ralph iteration does when the workdir signals "SPECIFY done
 
 ## Inputs
 
-- `astra.yaml` — filled by SPECIFY's paper (and code) passes; each sub-analysis has `prior_insights:` entries with `claim:` + `doi:` + `decision_links:` but no `evidence:` selector. These are the placeholders LITERATURE resolves.
+- `astra.yaml` — filled by SPECIFY's paper (and code) passes; each sub-analysis has `prior_insights:` entries shaped as syntactically-complete `Insight` blocks (`id`, `claim`, `created_at`, `evidence: [{id, doi}]`) where each Evidence carries a `doi:` but no `quote:` selector. These are the placeholders LITERATURE resolves by writing `quote: {exact, prefix, suffix}` and `location: {page}` onto each Evidence entry. The option↔insight linkage already lives on the option side (`Option.insights`); LITERATURE does not touch it.
 - `work/reference/index.json#citations` — paper-extraction's cite-key → `{locations, citation, doi}` mapping for every entry in the target paper's bibliography. Used as the canonical cite-key → DOI lookup when cross-checking placeholder DOIs and surfacing unresolved-DOI cases.
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — target paper text. Grep into for context on how the cited paper is invoked, when a placeholder's claim is ambiguous.
 - `constitution.md` — Fidelity intent (used to size cheap vs heavy on this iteration's review).
@@ -18,7 +18,7 @@ LITERATURE is what a ralph iteration does when the workdir signals "SPECIFY done
 
 ## Outputs
 
-- `astra.yaml` — `prior_insights:` placeholders **resolved**: each placeholder now has at least one `evidence:` entry with `TextQuoteSelector` (`exact:`, `prefix:`, `suffix:`) plus `FragmentSelector` (`page:`) pointing at the cited paper. `astra validate astra.yaml --verify-evidence` returns clean.
+- `astra.yaml` — `prior_insights:` placeholders **resolved**: each placeholder's Evidence entries now carry `quote: {exact, prefix, suffix}` (TextQuoteSelector) plus `location: {page: N}` (FragmentSelector, 1-indexed page) pointing at the cited paper. `astra validate astra.yaml --verify-evidence` returns clean.
 - `work/cited/<doi-slug>/` — one directory per cited paper, holding that paper's substrate from paper-extraction (`paper.pdf`, `source/` or `document.md`, `index.json`, `astra.yaml` stub, figures, tables). Resume-by-existence: re-running LITERATURE skips fetching any DOI whose `work/cited/<doi-slug>/` is already populated.
 - `work/notes/literature/resolutions.yaml` — consolidated per-placeholder evidence resolutions before merge (when Haiku fan-out is used, sub-Haiku outputs land in `work/notes/literature/haiku-<N>.yaml` and are merged into this single file). Intermediate; survives for audit.
 
@@ -88,8 +88,13 @@ Inputs:
       id:             the placeholder's unique id within astra.yaml
       claim:          what the cited paper supports about a decision
                       in the target paper (target paper's framing)
-      doi:            DOI of the cited paper
-      decision_links: which decision option(s) this placeholder backs
+      doi:            DOI of the cited paper (lives on the placeholder's
+                      Evidence entry; quote: needs to be filled in)
+      backed_options: a derived list of "<decision_id>.<option_id>" pairs
+                      that reference this placeholder via Option.insights
+                      — surface from astra.yaml when assembling the
+                      placeholder set so the resolver knows which
+                      decision-options this evidence has to support
   - Substrate path per cited paper at work/cited/<doi-slug>/work/reference/:
       paper.pdf, source/*.tex (Path A) or document.md (Path B),
       index.json (structural index for that cited paper).
@@ -104,8 +109,8 @@ For each placeholder:
 
   2. Read targeted spans (offset/limit) around the matches. Find a
      verbatim passage that supports the claim. Focus on:
-       - Empirical comparisons between approaches the claim's
-         decision_links reference.
+       - Empirical comparisons between the approaches the placeholder's
+         backed_options reference.
        - Performance benchmarks or validation results relevant to the
          choices.
        - Recommendations or caveats about specific methods/parameters.
@@ -162,7 +167,7 @@ When the iteration fans out to Haikus, each Haiku is spawned with `model="haiku"
 
 ## Review — by iteration boundary (default) or in-iteration fan-out (optional)
 
-After the merge lands, the cross-check question is: do the `evidence:` quotes belong to the cited paper at the cited page? Do the quotes actually justify the placeholders' claims, or are they technically present but tangential? Do the claims actually support the decision options they're linked to via `decision_links:`?
+After the merge lands, the cross-check question is: do the `evidence:` quotes belong to the cited paper at the cited page? Do the quotes actually justify the placeholders' claims, or are they technically present but tangential? Do the claims actually support the decision options that reference them via `Option.insights`?
 
 **Default: review by iteration boundary.** The iteration that did the merge exits; the next iteration enters fresh, surveys, finds `astra.yaml`'s `prior_insights:` populated with `evidence:` selectors but no `work/notes/literature-review/round-N.md`, runs `astra validate --verify-evidence` for the deterministic check + a semantic re-read of each resolved insight, and writes review findings. The iteration after that applies the fixes (which may include re-running Haiku quote-finding for entries that need a different quote). Two consecutive review-iterations with verdict `clean` terminates the review cycle.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 27a89c73..07787cb7 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -22,7 +22,7 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 
 ## Outputs
 
-- `astra.yaml` — **filled form**: each sub-analysis's `decisions:` and `findings:` populated with paper-anchored `evidence:` selectors; `prior_insights:` populated as citation-only **placeholders** (id, claim, decision_links, `doi:` looked up from `work/reference/index.json#citations[<cite-key>].doi` — but no `evidence:` selector yet, LITERATURE fills those next); `narrative:` keys updated to weave `astra-anchor:` references into prose as entries land. `astra validate astra.yaml` returns clean; `astra validate astra.yaml --verify-evidence` runs after LITERATURE has resolved the placeholders.
+- `astra.yaml` — **filled form**: each sub-analysis's `decisions:` populated with decision-level `rationale:` prose plus options (the paper's choice is identified by `default:`); `findings:` populated as full `Insight` blocks with paper-anchored `evidence:` (the target paper's DOI + `quote: {exact, prefix, suffix}` + `location: {page: N}`); `prior_insights:` populated as citation **placeholders** — each a syntactically-complete `Insight` (`id`, `claim`, `created_at`, `evidence: [{id, doi}]`) whose placeholder Evidence carries the cited paper's DOI looked up from `work/reference/index.json#citations[<cite-key>].doi` **but no `quote:` selector yet** — LITERATURE fills those in. Each option that draws on a placeholder cites it via `Option.insights: [<insight_id>, ...]` (the back-reference that links options to prior_insights in the ASTRA grammar). `narrative:` keys updated to weave `astra-anchor:` references into prose as entries land. `astra validate astra.yaml` returns clean (Evidence with `doi:` and no `quote:` is structurally valid at this stage); `astra validate astra.yaml --verify-evidence` runs after LITERATURE has authored the quotes.
 - `universes/baseline.yaml` — selects the paper's choices (where paper and code disagree per the canonical-resolution rule, see "Material conflicts" below)
 - `implementation-notes.md` — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
 - `targets/targets.md` — small target ledger COMPARE consumes: per output (already declared by ARCHITECT), a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure / table / metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
@@ -46,32 +46,64 @@ Read the paper's section(s) covering this sub-analysis. Author:
 1. **`decisions:`** — every choice in this sub-analysis where a different defensible option could plausibly shift a numerical result: algorithmic methods, thresholds, statistical approaches, data selection criteria, calibration choices. Use `when`, `incompatible_with`, and `requires` constraints for non-independent decisions.
 
    For each decision, the paper-pass authors:
-   - The chosen option with its name + a `rationale:` block (use `/narrative` for the prose).
-   - Sibling alternatives mentioned in the paper, each as a separate option.
-   - `evidence:` for the chosen option using `TextQuoteSelector` against the paper text — verbatim quote + `prefix` / `suffix` from real surrounding text + page or section anchor.
+   - **Decision-level fields:** `label:` (short human-readable name), `rationale:` (the paper's stated reasoning — use `/narrative` for the prose), `default:` (the option the paper actually selects), and `options:` (the map of option entries below).
+   - **Options:** the chosen option plus any sibling alternatives the paper discusses. Each option carries `label:` (required) and an optional `description:`. Per the 0.0.10 grammar, options do **not** carry their own `rationale:` or `evidence:` block — the decision's `rationale:` covers the reasoning; paper-text evidence flows through `findings:` (for the paper's own quantitative claims) or via `Option.insights` back-references into `prior_insights:` (for citation-backed support).
+   - **Option ↔ prior_insights linkage:** when the option's support derives from cited literature, list the relevant `prior_insights:` ids in `Option.insights: [<insight_id>, ...]`. The placeholder block under `prior_insights:` (authored in step 2 below) is the target of this link; LITERATURE fills in the verbatim cited-paper quote later.
 
    Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for the full definition of what counts. **Only exclude pure tooling choices** (language, library, file format) and fixed constraints. A typical sub-analysis has 2–6 decisions; if a sub-analysis has fewer than 2, revisit `work/reference/index.json` and reconsider.
 
-2. **`prior_insights:`** — for every `\cite{<key>}` (Path A) or rendered citation invocation (Path B) the paper invokes that bears on a decision in this sub-analysis, record a **placeholder**: an `id:`, a `claim:` describing what the cited paper supports about the decision (the target paper's framing of why it cites that paper here), a `doi:` looked up from `work/reference/index.json#citations[<cite-key>].doi`, and `decision_links:` mapping the placeholder to the relevant decision option(s). **Do not author the `evidence:` selector** — that's LITERATURE's job. Leave `evidence:` absent or empty; LITERATURE fetches the cited paper, finds the supporting quote, and authors the resolved selector back into this placeholder. The placeholder shape:
+   ```yaml
+   decisions:
+     <decision_id>:
+       label: "<short human-readable name>"
+       rationale: "<the paper's stated reasoning, weaving astra-anchors into prose>"
+       default: <chosen_option_id>
+       options:
+         <option_id>:
+           label: "<short name>"
+           description: "<optional longer description>"
+           insights: [<prior_insight_id>, ...]   # back-refs to prior_insights this option draws on
+   ```
+
+2. **`prior_insights:`** — for every `\cite{<key>}` (Path A) or rendered citation invocation (Path B) the paper invokes that bears on a decision in this sub-analysis, record a **placeholder**. The placeholder is a syntactically-complete `Insight` (`id`, `claim`, `created_at`, `evidence`) whose `evidence` array contains a single Evidence entry carrying the cited paper's `doi` but **no `quote:` selector** — LITERATURE fetches the cited paper, finds the supporting quote, and writes the resolved `quote: {exact, prefix, suffix}` (+ `location: {page: N}`) onto that Evidence entry. The decision↔insight linkage is the back-reference on the option (`Option.insights`, step 1 above), not a forward link on the insight. The placeholder shape:
 
    ```yaml
    prior_insights:
      <insight_id>:
        id: <insight_id>
        claim: "<what the cited paper supports about the decision>"
-       doi: "<DOI from index.json#citations[<cite-key>].doi>"
-       # evidence: omitted — LITERATURE fills this in
-       decision_links:
-         <decision_id>: [<option_id>, ...]
+       created_at: "<SPECIFY-iteration ISO-8601 timestamp, e.g. 2026-05-11T09:00:00Z>"
+       evidence:
+         - id: <evidence_id>
+           doi: "<DOI from work/reference/index.json#citations[<cite-key>].doi>"
+           # quote: omitted at SPECIFY time — LITERATURE fills the TextQuoteSelector in
    ```
 
-   When the citation's DOI is unresolved (`citations[<key>].doi: null` — flagged in `extraction_warnings`), record the placeholder with `doi: null` and a note in the `claim:`; LITERATURE will surface it as an unresolved entry rather than fabricate evidence. Don't pre-emptively fetch the cited paper or guess its content; LITERATURE does that with fresh context per paper.
+   Evidence with `doi:` and no `quote:` is structurally valid in 0.0.10 (`quote:` is optional on Evidence); the placeholder passes `astra validate` and waits for LITERATURE to fill the quote. `astra validate --verify-evidence` should only be run after LITERATURE has resolved every placeholder.
+
+   When the citation's DOI is unresolved (`citations[<key>].doi: null`, flagged in `extraction_warnings`), the placeholder's Evidence entry cannot carry `doi: null`, because Evidence requires exactly one of `doi` or `artifact`. In that case, either omit the Evidence entry entirely or fall back to an artifact reference if the gap will be resolved internally, and log the unresolved citation to `open-questions.md` so the user can supply the DOI at REVIEW close-out. Don't pre-emptively fetch the cited paper or guess its content; LITERATURE does that with fresh context per paper.
 
-3. **`findings:`** — paper-level claims and quantitative results scoped to this sub-analysis, each with source-anchored `evidence:` (verbatim quote against the paper). Pull the verbatim claims for each output's expected value from the paper text + the result loci in `paper-index.md`.
+3. **`findings:`** — paper-level claims and quantitative results scoped to this sub-analysis. Each is a full `Insight` (`id`, `claim`, `created_at`, `evidence`) with at least one paper-anchored Evidence entry: `doi:` of the target paper itself + a verbatim `quote: {exact, prefix, suffix}` (TextQuoteSelector) + a `location: {page: N}` (FragmentSelector, page from the rendered PDF). For findings tied to a specific declared output, an Evidence entry may use `artifact: <output_id>` in place of `doi:`; because each Evidence entry carries exactly one of `doi` or `artifact`, pairing an artifact reference with a DOI-based quote takes two separate Evidence entries. Pull the verbatim claims for each output's expected value from the paper text + the result loci in `work/reference/index.json`.
+
+   ```yaml
+   findings:
+     <finding_id>:
+       id: <finding_id>
+       claim: "<the paper's quantitative claim, 1–2 sentences>"
+       created_at: "<ISO-8601 timestamp>"
+       evidence:
+         - id: <evidence_id>
+           doi: "<target paper's DOI>"
+           quote:
+             exact: "<verbatim quote from the paper>"
+             prefix: "<~20–100 chars BEFORE the quote, real surrounding text>"
+             suffix: "<~20–100 chars AFTER the quote, real surrounding text>"
+           location: { page: <N> }
+   ```
 
 4. **Weave `astra-anchor:` references into the existing narrative.** ARCHITECT wrote `narrative:` prose without anchors because the entries didn't exist. Now they do — extend the narrative to point at the new `decisions:` / `prior_insights:` / `findings:` entries via the tree-path anchor grammar. Use `/narrative` for this pass; it carries the discipline.
 
-5. **Verify evidence quotes against the paper source by Grep** — `astra validate --verify-evidence` will verify `prior_insights` evidence after LITERATURE resolves the placeholders; for now, manually Grep the paper source to confirm each `decisions:` and `findings:` `evidence:` quote is verbatim. Artifact-anchored `findings` evidence still needs a manual quote check before the code pass.
+5. **Verify finding quotes against the paper source by Grep.** For each `findings:` Evidence entry with a `quote:`, Grep the paper source to confirm the `exact:` text is verbatim and the `prefix:` / `suffix:` are real surrounding text. `astra validate --verify-evidence` will run the deterministic check across every quote later (after LITERATURE resolves the `prior_insights:` placeholders); a manual Grep now catches typos and paraphrases before the code pass.
 
 ### Pass B — code pass (when `work/reference/code/` exists)
 
@@ -83,7 +115,7 @@ Read the code that implements this sub-analysis (`work/reference/code-index.md`'
 
    For **material** disagreements: take **code as canonical** per the canonical-resolution rule (the iteration runs detached; the user isn't reachable interactively). Append the conflict to CLAUDE.md's **Paper-vs-code disagreements** section AND to `open-questions.md` so the user sees it at REVIEW close-out, with the verbatim paper quote + the `path:line` code anchor + a plausible-impact one-liner ("changes the BAO peak amplitude by ~5%"). Let `universes/baseline.yaml` select the code's method. Preserve both options in the `astra.yaml` `decisions:` entry; the user can flip the baseline at REVIEW close-out.
 
-2. **Code-revealed insights and findings.** Things the code does that the paper doesn't describe (a calibration version, a cut stricter than stated, a hyperparameter the paper compressed). These earn `findings:` entries with `path:line` evidence anchors against the code (when an output corresponds), or `implementation-notes.md` bullets (when no formal output corresponds).
+2. **Code-revealed insights and findings.** Things the code does that the paper doesn't describe (a calibration version, a cut stricter than stated, a hyperparameter the paper compressed). These earn `findings:` entries with Evidence using `artifact: <output_id>` (referencing a declared output) plus an optional `source_commit:` (the git SHA that produced it). When the insight isn't tied to a formal output, drop it into `implementation-notes.md` as a bullet rather than synthesizing a degenerate finding.
 
 3. **Decision-option augmentation.** Where the code reveals an option the paper didn't mention but is defensible (a sibling implementation alternative used in the codebase or referenced in a comment), add it as a sibling option to the relevant `decisions:` entry. Do not pre-emptively author every code variant; only the ones that bear on a real choice.
 
@@ -115,9 +147,9 @@ The check-list and findings-file shape below applies whether the review work hap
 > ### What to check
 >
 > 1. **Decision coverage.** Does this sub-analysis's `decisions:` block cover every choice in the paper-side index's decision clusters? Cosmetic / pure-tooling choices should NOT be decisions; anything material that's missing should be added.
-> 2. **Decision options.** Each decision has the chosen option plus any sibling alternatives the paper discusses or the code reveals. The chosen option's `rationale:` is grounded in the paper's stated reasoning (or the code's, where canonical-resolution applied).
-> 3. **Evidence verification.** Every `evidence:` block uses `TextQuoteSelector` with a verbatim `exact:` quote, real surrounding-text `prefix:` / `suffix:`, and a real page or section anchor. Quotes that are paraphrased or whose prefix / suffix are editorial parentheticals will fail `--verify-evidence`. Note `prior_insights:` placeholders intentionally have no `evidence:` block at this stage — LITERATURE authors them — so do not flag missing `evidence:` on placeholder entries. After LITERATURE resolves the placeholders, run `astra validate astra.yaml --verify-evidence`.
-> 4. **Findings traceability.** Each `findings:` entry's `evidence:` resolves to a real paper claim (verbatim quote + source anchor) or a real code location (`path:line`).
+> 2. **Decision options.** Each decision has the option the paper selects (named in `default:`) plus any sibling alternatives the paper discusses or the code reveals. The decision-level `rationale:` is grounded in the paper's stated reasoning (or the code's, where canonical-resolution applied). Per the 0.0.10 grammar, options do not carry per-option `rationale:` or `evidence:`; cited support is back-referenced via `Option.insights` into a `prior_insights:` entry.
+> 3. **Evidence verification.** Every `findings:` Evidence entry uses `TextQuoteSelector` with a verbatim `exact:` quote, real surrounding-text `prefix:` / `suffix:`, and a `location: {page: N}` (1-indexed). Quotes that are paraphrased or whose `prefix:` / `suffix:` are editorial parentheticals will fail `--verify-evidence`. `prior_insights:` placeholders intentionally have `evidence: [{id, doi}]` without a `quote:` at this stage — LITERATURE authors the quotes — so do not flag a missing quote on placeholder entries. After LITERATURE resolves the placeholders, run `astra validate astra.yaml --verify-evidence`.
+> 4. **Findings traceability.** Each `findings:` Insight's `evidence:` resolves either to a real paper claim (target-paper DOI + verbatim `quote:` + page) or to a real declared output via `artifact: <output_id>` (with optional `source_commit:` and `snapshot:`).
 > 5. **Material-disagreement surfacing.** Where paper and code disagree on a material choice, the spec records both options under the relevant `decisions:` entry, `universes/baseline.yaml` selects the code's option (canonical-resolution default), and the conflict is appended to CLAUDE.md's *Paper-vs-code disagreements* section plus `open-questions.md` for the user to resolve at REVIEW close-out. Flag any material disagreement that got silently dropped, that didn't make it into the disagreements log, or where the baseline picked the paper without the canonical-resolution rule applying.
 > 6. **Narrative anchors.** The sub-analysis's `narrative:` weaves `astra-anchor:` references to the new `decisions:` / `prior_insights:` / `findings:` entries — the tree-path grammar must be valid, and entries actually exist at the referenced paths.
 > 7. **`narrative:` voice fidelity.** Hedges and qualifiers from the paper survive (per the narrative skill's discipline). Editorial commentary added beyond what the paper supports gets flagged.
@@ -204,7 +236,7 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 ## Survey signals (entry into SPECIFY)
 
 - `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative; empty decisions / prior_insights / findings) ⇒ ready to specify
-- For each sub-analysis: `decisions:` and `findings:` populated with paper-anchored `evidence:` selectors AND `prior_insights:` populated as citation-only placeholders (id, claim, doi, decision_links — no `evidence:` selector yet, LITERATURE fills those next) ⇒ paper pass done
+- For each sub-analysis: `decisions:` populated with decision-level `rationale:` + options (paper's choice at `default:`); `findings:` populated as full Insight blocks with paper-anchored Evidence (DOI + `quote: {exact, prefix, suffix}` + `location: {page}`); `prior_insights:` populated as citation placeholders (`id`, `claim`, `created_at`, `evidence: [{id, doi}]` with `quote:` omitted — LITERATURE fills the quotes next); `Option.insights` back-references wired up where options draw on placeholders ⇒ paper pass done
 - For each sub-analysis: when `work/reference/code/` exists, code-pass material-disagreement entries land in `decisions:` (with both options) and `universes/baseline.yaml` selects the canonical-resolution choice; `implementation-notes.md` carries non-material gotchas ⇒ code pass done
 - For cheap: each sub-analysis has at least a `work/notes/specify-review/<sub>-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ SPECIFY review done
 - For heavy: each sub-analysis has two consecutive `<sub>-round-<N>.md` files with verdict `clean` ⇒ SPECIFY review done

From 1f9ea4de4fe8a7e1bc87f851b25ee616c67bc39c Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 15:52:09 +0200
Subject: [PATCH 061/124] lc-from-paper: cold-survey cleanups in review +
 compare references
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

review.md and compare.md carried orchestrator-era phrasings that the
Round 7 retune missed. Five corrections:

- compare.md: "CLAUDE.md's Goal section" → "constitution.md's Goal
  section". Goal lives in constitution.md per the template split;
  CLAUDE.md has no Goal section. A worker reading this would grep for
  Goal in the wrong file.

- review.md Inputs: list both constitution.md and CLAUDE.md with the
  sections each actually owns. Previously claimed CLAUDE.md had Goal
  (wrong) and described the accumulator as accreting "across all
  sub-agent spawns" (orchestrator-era; iterations are the unit now).

- review.md open-questions framing: "running report from sub-agent
  phases" / "anything sub-agents flagged" → "from the iteration-phases"
  / "anything iterations flagged". The phases run as iterations, not as
  sub-agents under an orchestrator.

- review.md open-questions walkthrough: "which sub-agent flagged it" /
  "the default the sub-agent applied" → "which phase flagged it" / "the
  default the phase applied". Phase is the durable label; whether it
  ran inside a single iteration or fanned out is implementation detail.

- review.md re-running section: "future sub-agents auto-load it" →
  "future Claude Code sessions auto-load it on walk-up". CLAUDE.md
  auto-load is a session property of the Claude Code harness, not a
  sub-agent-specific behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../skills/lc-from-paper/references/compare.md        |  2 +-
 .../skills/lc-from-paper/references/review.md         | 11 ++++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/compare.md b/claude/lightcone/skills/lc-from-paper/references/compare.md
index ea439cc7..9c5e7897 100644
--- a/claude/lightcone/skills/lc-from-paper/references/compare.md
+++ b/claude/lightcone/skills/lc-from-paper/references/compare.md
@@ -84,7 +84,7 @@ The `opportunities:` block surfaces **gaps that didn't necessarily fail the verd
 - A decision SPECIFY recorded with code-as-canonical that has an unresolved disagreement still in `open-questions.md` and could move the result.
 - A sub-analysis whose evidence quotes are paraphrased rather than verbatim (would fail `--verify-evidence` if pushed harder).
 
-Each opportunity gets two grades: a **leverage** one-liner (impact if closed) and a **relative_to_intent** placement against the user's fidelity intent in CLAUDE.md's Goal section:
+Each opportunity gets two grades: a **leverage** one-liner (impact if closed) and a **relative_to_intent** placement against the user's fidelity intent in `constitution.md`'s Goal section:
 
 - `below` — the user's intent calls for tighter than this; closing the gap moves the reproduction toward what they actually want.
 - `at` — closing the gap reaches the intent; further tightening would be gravy.
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index 8c4d9949..bb143fb4 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -12,10 +12,11 @@ The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding
 - `comparison-report.yaml`, `comparison-report.md` — final verdict + opportunity assessment
 - `targets/targets.md` — what was being matched against; reference figures / tables in `targets/`
 - `results/<universe>/<output_id>/` — reproduced figures / tables / metrics
-- `open-questions.md` at the workdir root — running report from sub-agent phases (paper-vs-code conflicts, ambiguities, anything sub-agents flagged for user resolution)
+- `open-questions.md` at the workdir root — running report from the iteration-phases (paper-vs-code conflicts, ambiguities, anything iterations flagged for user resolution)
 - `work/reference/index.json` and `work/reference/code-index.md` — for context
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) and `work/reference/code/` — directly available for follow-up questions the user asks during REVIEW that the report and CLAUDE.md don't answer ("remind me what the paper says about X", "did the original code do Y"). Grep into for specifics; read targeted spans by offset/limit.
-- `CLAUDE.md` at the workdir root — paper identity, Goal, Rigor, Paper-vs-code disagreements (the at-a-glance summary that's accumulated across all sub-agent spawns)
+- `CLAUDE.md` at the workdir root — paper identity, Rules, Rigor, Paper-vs-code disagreements (the at-a-glance summary that's accumulated across iterations)
+- `constitution.md` at the workdir root — Goal, Fidelity intent, Scope, Quality bar, Evidence (the driving document the loop has been working against)
 
 ## Outputs
 
@@ -50,8 +51,8 @@ Output lands at `.lightcone/check-sentence-by-sentence.md` (or wherever the skil
 Read `open-questions.md` at the workdir root. For each unresolved entry, surface it via `AskUserQuestion` with:
 
 - **The question** (verbatim from the file)
-- **Origin** — which sub-agent flagged it
-- **The default the sub-agent applied** (if any — e.g. "code as canonical")
+- **Origin** — which phase flagged it
+- **The default the phase applied** (if any — e.g. "code as canonical")
 - **Three options**: ratify the default, override (user spells out their choice), or defer (leave as a known limitation in the final report)
 
 Append a `## Resolutions` section to `open-questions.md` capturing what the user said for each entry. This makes the resolution durable — re-runs and future sessions see it. Cross-reference with CLAUDE.md's **Paper-vs-code disagreements** section: every entry there should now have its resolution recorded, either inline (if the user picked the canonical default) or in `open-questions.md`.
@@ -69,7 +70,7 @@ A single markdown file at the project root, ~1–2 pages. The canonical record o
 5. **Open opportunities** — pull from `comparison-report.yaml`'s `opportunities:` block, plus anything in CLAUDE.md's **Rigor** section's *Open opportunities* list. One bullet each with the leverage assessment. This is what a future session (or a future-Cail revisiting) would tighten next.
 6. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut stricter than stated, etc.). The reproduction's value to the broader literature.
 7. **Resolved open questions** — pull from `open-questions.md`'s `## Resolutions` section. One bullet per question + its resolution.
-8. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the relevant `astra.yaml`, where CLAUDE.md lives so future sub-agents auto-load it).
+8. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the relevant `astra.yaml`, where CLAUDE.md lives so future Claude Code sessions auto-load it on walk-up).
 
 Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes; the summary is the door into them.
 

From 1171786a7f466968022aa853c517062a3c0ee0ac Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 15:57:37 +0200
Subject: [PATCH 062/124] lc-from-paper/specify: document Option.insights ../
 scope grammar
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

astra-tools v0.2.9 (EiffL's PR #88, merged today) ships stricter
semantics for Option.insights resolution than the 2ffbda4 grammar
rebase assumed: bare ids in [<id>] resolve node-locally only, and
cross-scope refs require an explicit ../ prefix. This is the same ../
grammar as Input.from / Decision.from.

The typical SPECIFY authoring shape (declare each cited paper's
prior_insight at the sub-analysis that uses it, reference with a
bare id from same-scope options) is already what specify.md
prescribes — that's the node-local case and stays correct under
the new semantics.

The change adds the scope rules to step 1's Option ↔ prior_insights
linkage paragraph so SPECIFY iterations doing the rare cross-scope
ref reach for ../ instead of failing astra validate with
INVALID_INSIGHT_REF.
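
For illustration, a minimal sketch of the two reference forms as they
might appear inside one sub-analysis. All ids (`local_cite`,
`parent_cite`, `some_choice`, `opt_a`, `opt_b`) are hypothetical, and
the fragment assumes `parent_cite` is declared under the parent scope's
`prior_insights:` rather than here:

```yaml
# Fragment of a single sub-analysis; every id below is a made-up example.
prior_insights:
  local_cite:
    id: local_cite
    claim: "<claim attributed to a paper this sub-analysis cites>"
    created_at: "<ISO-8601 timestamp>"
    evidence:
      - id: local_cite_ev
        doi: "<cited paper's DOI>"   # quote: omitted at the placeholder stage

decisions:
  some_choice:
    label: "<short human-readable name>"
    rationale: "<the paper's stated reasoning>"
    default: opt_a
    options:
      opt_a:
        label: "<option label>"
        insights: [local_cite]       # bare id: declared in this same scope
      opt_b:
        label: "<option label>"
        insights: [../parent_cite]   # upward ref: assumed declared at the parent
```

Under the rules described above, a bare `[parent_cite]` on `opt_b`
would be expected to fail validation, since bare ids do not search
ancestor scopes.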

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-paper/references/specify.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 07787cb7..99ecc73a 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -48,7 +48,7 @@ Read the paper's section(s) covering this sub-analysis. Author:
    For each decision, the paper-pass authors:
    - **Decision-level fields:** `label:` (short human-readable name), `rationale:` (the paper's stated reasoning — use `/narrative` for the prose), `default:` (the option the paper actually selects), and `options:` (the map of option entries below).
    - **Options:** the chosen option plus any sibling alternatives the paper discusses. Each option carries `label:` (required) and an optional `description:`. Per the 0.0.10 grammar, options do **not** carry their own `rationale:` or `evidence:` block — the decision's `rationale:` covers the reasoning; paper-text evidence flows through `findings:` (for the paper's own quantitative claims) or via `Option.insights` back-references into `prior_insights:` (for citation-backed support).
-   - **Option ↔ prior_insights linkage:** when the option's support derives from cited literature, list the relevant `prior_insights:` ids in `Option.insights: [<insight_id>, ...]`. The placeholder block under `prior_insights:` (authored in step 2 below) is the back-end of this link — LITERATURE fills in the verbatim cited-paper quote later.
+   - **Option ↔ prior_insights linkage:** when the option's support derives from cited literature, list the relevant `prior_insights:` ids in `Option.insights: [<insight_id>, ...]`. The placeholder block under `prior_insights:` (authored in step 2 below) is the back-end of this link — LITERATURE fills in the verbatim cited-paper quote later. **Scope rules** (astra-tools ≥ 0.2.9): bare ids resolve **node-locally only** — the prior_insight must be declared in the same sub-analysis as the option. For a citation declared at an ancestor scope, use explicit upward refs: `[../id]` for the parent, `[../../id]` for the grandparent, etc. (same `../` grammar as `Input.from` and `Decision.from`). The natural shape — declare each cited paper at the sub-analysis that uses it, reference with a bare id from same-scope options — keeps everything node-local and needs no `../`.
 
    Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for the full definition of what counts. **Only exclude pure tooling choices** (language, library, file format) and fixed constraints. A typical sub-analysis has 2–6 decisions; if a sub-analysis has fewer than 2, revisit `work/reference/index.json` and reconsider.
 

From d7179390ed4fa227e968b347e9ac59b411732c20 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 18:41:13 +0200
Subject: [PATCH 063/124] lc-from-paper: cold-survey cleanups in SKILL.md +
 compare/implement/review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fifth cold-survey iteration finds seven orchestrator-era ghosts that
the prior four passes missed:

- SKILL.md:139 — "per-spawn tactical decision" → "tactical sizing
  decision" (the iteration is the unit; sizing is per-iteration, not
  per-spawn)
- SKILL.md:160 — "re-spawn /paper-extraction" → "re-invoke" (slash
  commands are invoked, not spawned)
- review.md:7 — "per-spawn self-review passes" → "per-iteration
  self-review passes" (matches the SKILL.md framing of "self-review
  is the next iteration")
- review.md:60 — "re-spawn IMPLEMENT" → "re-open the loop for another
  IMPLEMENT pass" (REVIEW runs in main session; user re-opens the
  closed loop)
- review.md:79 — "future re-spawns" → "future loop re-launches"
- compare.md:118 — "re-spawn of IMPLEMENT" → "future IMPLEMENT pass"
- implement.md:73 — "Per-round fresh sub-agent" → "Per-round fresh
  reviewer" (the system prompt below is used by both iteration-
  boundary review and in-iteration fan-out — matches literature.md's
  "Per-round fresh reviewer — prompt shape" pattern)
---
 claude/lightcone/skills/lc-from-paper/SKILL.md              | 4 ++--
 claude/lightcone/skills/lc-from-paper/references/compare.md | 2 +-
 .../lightcone/skills/lc-from-paper/references/implement.md  | 2 +-
 claude/lightcone/skills/lc-from-paper/references/review.md  | 6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index fb05569d..958044a7 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -136,7 +136,7 @@ REVIEW runs in your main session because `/figure-comparison` and `/check-senten
 
 **Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at INTERVIEW as prose — their own words for what "good enough" looks like (e.g. *"just checking the analysis is tractable"*, *"Figure 3 must be right; the rest can stay rough"*, *"every primary and secondary target lining up within stated tolerance"*).
 
-Each iteration translates the fidelity intent into a per-spawn tactical decision when working on an artifact-producing phase (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT). Derive how much in-iteration self-review-via-fan-out to run from the gap between where the artifact currently stands (CLAUDE.md's Rigor *Current state* — *sketch / baseline / tightened / canonical*) and what the Goal's intent says the user cares about. *Cheap:* write the artifact and exit; let the next iteration's fresh-context survey serve as the review. *Heavy:* fan out parallel reviewers as one-level-deep sub-agents inside the iteration, merge findings, apply fixes, exit. Either way, update CLAUDE.md's Rigor *Current state* so the trajectory stays honest across iterations.
+Each iteration translates the fidelity intent into a tactical sizing decision when working on an artifact-producing phase (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT). Derive how much in-iteration self-review-via-fan-out to run from the gap between where the artifact currently stands (CLAUDE.md's Rigor *Current state* — *sketch / baseline / tightened / canonical*) and what the Goal's intent says the user cares about. *Cheap:* write the artifact and exit; let the next iteration's fresh-context survey serve as the review. *Heavy:* fan out parallel reviewers as one-level-deep sub-agents inside the iteration, merge findings, apply fixes, exit. Either way, update CLAUDE.md's Rigor *Current state* so the trajectory stays honest across iterations.
 
 The default is **sequential review via iteration boundaries** — cheaper, no fan-out, and the fresh-context property is automatic. Reach for in-iteration fan-out when the parallelism actually pays (LITERATURE with many cited papers, SPECIFY with many independent sub-analyses, IMPLEMENT with many outputs).
 
@@ -157,7 +157,7 @@ When the user walks back into a workdir that already has artifacts:
 1. **Skip INTERVIEW** unless the user explicitly wants to revise scope (in which case edit `constitution.md` together, no re-draft from scratch).
 2. **If `constitution.md`'s `status:` is `active` and the tmux session isn't running**, re-launch the ralph loop: `.claude/skills/ralph/scripts/ralph constitution.md`. The next iteration surveys the workdir and picks up wherever the prior loop left off.
 3. **If `constitution.md`'s `status:` is `closed`**, the reproduction is at REVIEW. Run REVIEW close-out in your main session.
-4. **If ACQUIRE substrate is incomplete**, finish ACQUIRE in your main session before launching the loop — re-spawn `/paper-extraction` and/or `/lc-from-code` against the existing partial state (both are survey-first and skip done work).
+4. **If ACQUIRE substrate is incomplete**, finish ACQUIRE in your main session before launching the loop — re-invoke `/paper-extraction` and/or `/lc-from-code` against the existing partial state (both are survey-first and skip done work).
 
 ## Anti-patterns
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/compare.md b/claude/lightcone/skills/lc-from-paper/references/compare.md
index 9c5e7897..462008e4 100644
--- a/claude/lightcone/skills/lc-from-paper/references/compare.md
+++ b/claude/lightcone/skills/lc-from-paper/references/compare.md
@@ -115,4 +115,4 @@ The verdict is the iteration's judgment from the data; the **decision to keep it
 
 - **One COMPARE per IMPLEMENT.** Each IMPLEMENT retry produces a fresh COMPARE; the report's `attempt` field increments. Do not overwrite prior reports — keep them at `comparison-report-attempt-<N>.yaml` if useful, or commit each between attempts so `git log` carries the history.
 - **The verdict is the iteration's judgment from the data; the keep-iterating decision happens at iteration boundary.** One iteration writes the report and the take on what should happen next; the next iteration surveys, reads the take, and either retries or accepts. The user's voice enters at REVIEW close-out, not mid-loop.
-- **The opportunity assessment is part of the durable record.** When the user accepts the current verdict, propagate the un-acted-on opportunities into CLAUDE.md's **Rigor** section's *Open opportunities* list. Future sessions and future-Cail returning to this reproduction see them; tightening any becomes a re-spawn of IMPLEMENT against a clearer target.
+- **The opportunity assessment is part of the durable record.** When the user accepts the current verdict, propagate the un-acted-on opportunities into CLAUDE.md's **Rigor** section's *Open opportunities* list. Future sessions and future-Cail returning to this reproduction see them; tightening any becomes a future IMPLEMENT pass against a clearer target.
diff --git a/claude/lightcone/skills/lc-from-paper/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
index 4addfffe..2cbb45a0 100644
--- a/claude/lightcone/skills/lc-from-paper/references/implement.md
+++ b/claude/lightcone/skills/lc-from-paper/references/implement.md
@@ -70,7 +70,7 @@ After the first-pass implementation lands, the cross-check question is: is the i
 
 The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: review is fresh-context (whether across iterations or across fan-out spawns), prompted to check "is the implementation consistent with the paper and the code?", outputs findings only — not edits. Fixes are applied between iterations by the next iteration (or merged in the same iteration for fan-out). Pattern-matching on prior fixes defeats the cross-check; the no-bias rule is load-bearing.
 
-### Per-round fresh sub-agent — system prompt
+### Per-round fresh reviewer — system prompt
 
 > You are a paper-vs-implementation review agent. Read the implementation (`scripts/`, `astra.yaml` recipes), the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
 >
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index bb143fb4..e8b68f9c 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -4,7 +4,7 @@ The reproduction has converged: the constitution's `status:` is `closed` (after
 
 Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, draft the final report, and propagate any un-acted-on opportunities from the latest COMPARE into CLAUDE.md's **Rigor** section — in one interactive arc.
 
-The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding into ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT as their per-spawn self-review passes. This close-out is what the previous shape called SUMMARIZE_RUN.
+The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding into ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT as their per-iteration self-review passes. This close-out is what the previous shape called SUMMARIZE_RUN.
 
 ## Inputs
 
@@ -57,7 +57,7 @@ Read `open-questions.md` at the workdir root. For each unresolved entry, surface
 
 Append a `## Resolutions` section to `open-questions.md` capturing what the user said for each entry. This makes the resolution durable — re-runs and future sessions see it. Cross-reference with CLAUDE.md's **Paper-vs-code disagreements** section: every entry there should now have its resolution recorded, either inline (if the user picked the canonical default) or in `open-questions.md`.
 
-If a resolution warrants a spec change (the user picks an override), edit `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` accordingly and re-run `astra validate astra.yaml`. If the change would invalidate the comparison report (e.g. flips the canonical method for a primary output), surface that to the user — in most cases the reproduction is "done" and the override is a known limitation, but the user may choose to re-spawn IMPLEMENT.
+If a resolution warrants a spec change (the user picks an override), edit `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` accordingly and re-run `astra validate astra.yaml`. If the change would invalidate the comparison report (e.g. flips the canonical method for a primary output), surface that to the user — in most cases the reproduction is "done" and the override is a known limitation, but the user may choose to re-open the loop for another IMPLEMENT pass.
 
 ## Step 3: write `REPRODUCTION-SUMMARY.md`
 
@@ -76,7 +76,7 @@ Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes;
 
 ## Step 4: propagate opportunities into CLAUDE.md
 
-For each opportunity in `comparison-report.yaml`'s `opportunities:` block that the user did NOT act on (i.e. they accepted the current verdict and chose to land here), append it to CLAUDE.md's **Rigor** section's *Open opportunities* list. Format: `<area> — <what could be tightened> — <leverage>`. This is what future sessions and future re-spawns walk up to; it's how the reproduction stays honest about what's at sketch / baseline / tightened / canonical rigor across its outputs.
+For each opportunity in `comparison-report.yaml`'s `opportunities:` block that the user did NOT act on (i.e. they accepted the current verdict and chose to land here), append it to CLAUDE.md's **Rigor** section's *Open opportunities* list. Format: `<area> — <what could be tightened> — <leverage>`. This is what future sessions and future loop re-launches walk up to; it's how the reproduction stays honest about what's at sketch / baseline / tightened / canonical rigor across its outputs.
 
 If the user acted on an opportunity (e.g. authorized one more IMPLEMENT round to close a gap), it doesn't go in the open list — but its closure is worth noting in *Current state* (e.g. *Figure 3: tightened* if the systematics treatment got a heavier pass).
 

From f971ae6a1b9bf825aae5bb7ae188a215bbf5f1c3 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 18:58:25 +0200
Subject: [PATCH 064/124] docs: catch bundle-count slips up with the ralph
 re-add
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`docs/skills/index.md` still framed the paper-reproduction bundle as
"five-skill" / "four siblings" — counting only the four content sibling
skills (paper-extraction, narrative, figure-comparison,
check-sentence-by-sentence). Every other surface (bundle README,
top-level README, PR #86 description, `docs/skills/lc-from-paper.md`,
`docs/user/agent-workflow.md`) treats the bundle as six skills with
ralph as the loop substrate sibling. The bundle README explicitly
dual-lists ralph in both the Project lifecycle table and the bundle
table; docs/skills/index.md had only the Project lifecycle row.

Also: docs/index.md's plugin tree enumerated "lc-new, lc-from-code,
lc-from-paper, lc-feedback (+ bundle siblings)" — omitting ralph from
the top-level skills list. Top-level CLAUDE.md was caught up in
0977eba; docs/index.md was missed by that sweep.

Both are residual debris from Round 7's ralph re-add: most surfaces
were updated, these two were missed.
---
 docs/index.md        | 2 +-
 docs/skills/index.md | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index ec18104e..8c954908 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -57,7 +57,7 @@ src/snakemake_executor_plugin_dask/   # Snakemake executor → dask.distributed
 
 claude/lightcone/               # Claude Code plugin (force-included into the wheel)
 ├── skills/                     # lc-new, lc-from-code, lc-from-paper,
-│                                # lc-feedback (+ bundle siblings)
+│                                # lc-feedback, ralph (+ bundle siblings)
 ├── agents/                     # lc-extractor (literature subagent)
 ├── guides/                     # astra-reference, lightcone-cli-reference, ui-brand
 ├── templates/                  # project CLAUDE.md template
diff --git a/docs/skills/index.md b/docs/skills/index.md
index 23a77491..151c0308 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -11,8 +11,8 @@ guide is the friendly version. This page is for maintainers.
 ## Available skills
 
 The `/lc-from-*` family is parallel by what you start from: a question,
-code, or a paper. `/lc-from-paper` is the entry point of a five-skill
-paper-reproduction bundle; the four bundle siblings stand alone and are
+code, or a paper. `/lc-from-paper` is the entry point of a six-skill
+paper-reproduction bundle; the five bundle siblings stand alone and are
 user-invokable directly.
 
 ### Project lifecycle
@@ -33,6 +33,7 @@ dispatches them by role during the reproduction.
 
 | Skill | Command | Purpose |
 |-------|---------|---------|
+| [ralph](ralph.md) | `/ralph` | Loop substrate. `lc-from-paper`'s INTERVIEW invokes ralph's Authoring mode to draft the per-paper constitution; ACQUIRE's hand-off invokes the launcher; each iteration runs ralph's Loop protocol. Also user-invokable standalone (see the Project lifecycle row above). |
 | [paper-extraction](paper-extraction.md) | `/paper-extraction` | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: substrate, figures, tables, citations (with resolved DOIs), and a stub `astra.yaml`. |
 | [narrative](narrative.md) | `/narrative` | Author the `narrative:` prose and decision `rationale:` against an existing `astra.yaml`, in paper-reproduction, retrofit, or co-drafting mode. |
 | [figure-comparison](figure-comparison.md) | `/figure-comparison` | Build a self-contained HTML side-by-side: paper figures, tables, and numerics vs reproduced artifacts. |

From ee7fb70b4acd7683cab456b0fb3e7f7e3c530637 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 19:03:39 +0200
Subject: [PATCH 065/124] lc-from-paper/run: align phase-intro with sibling
 references
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The five sibling references (architect, specify, literature, implement,
compare) open with "X is what a ralph iteration does when the workdir
signals ..." — run.md was the outlier with "This phase runs as what a
ralph iteration does ...", and also dropped backticks on `astra.yaml` /
`scripts/` / `results/<universe>/<output>/` where the siblings use them.
Cold-survey cleanup; same shape as the prior consistency passes.
---
 claude/lightcone/skills/lc-from-paper/references/run.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/run.md b/claude/lightcone/skills/lc-from-paper/references/run.md
index d84c0dd0..bbc1bba0 100644
--- a/claude/lightcone/skills/lc-from-paper/references/run.md
+++ b/claude/lightcone/skills/lc-from-paper/references/run.md
@@ -2,7 +2,7 @@
 
 Materialize every output in `astra.yaml` for the requested universe. RUN is mostly mechanical — `lc run --universe <id>` does the heavy lifting. The phase exists as a discrete step so failures get diagnosed and re-run before COMPARE.
 
-This phase runs as what a ralph iteration does when the workdir signals "recipes present in astra.yaml + scripts/ committed + results/<universe>/<output>/ absent for any output." The iteration runs the recipes, diagnoses failures, attempts targeted fixes, and exits. Universe defaults to `baseline`.
+RUN is what a ralph iteration does when the workdir signals "recipes present in `astra.yaml` + `scripts/` committed + `results/<universe>/<output>/` absent for any output." The iteration runs the recipes, diagnoses failures, attempts targeted fixes, and exits. Universe defaults to `baseline`.
 
 ## Inputs
 

From 754756b3b5b0187c0684c3e48fb53cbc0584f94f Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 19:09:40 +0200
Subject: [PATCH 066/124] lc-from-paper/interview: correct Evidence
 parenthetical
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

interview.md line 75 said "Evidence (... — these are pointers to
substrate, not to the workdir paths, which CLAUDE.md handles)". The
templates/constitution.md Evidence section *does* include workdir-
substrate path bullets (Paper:, Code:), so the parenthetical was
contradicting the template plus its own line 13 framing
("Evidence (paper DOI, arXiv ID, code repo URL, where the substrate
lives)").

Reframe so the parenthetical names what the agent fills in (the
user-supplied identifiers) and what stays as template boilerplate
(the substrate-path bullets), without making a false claim about
where workdir paths live.
---
 claude/lightcone/skills/lc-from-paper/references/interview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/interview.md
index 86b4d4ee..546233f7 100644
--- a/claude/lightcone/skills/lc-from-paper/references/interview.md
+++ b/claude/lightcone/skills/lc-from-paper/references/interview.md
@@ -72,7 +72,7 @@ Light touch. Ask the user if there's anything they want every iteration to know
 
 Open both templates side-by-side:
 
-- [`../templates/constitution.md`](../templates/constitution.md) — fill in the header, Goal (with fidelity intent), Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL — these are pointers to substrate, not to the workdir paths, which CLAUDE.md handles), Open dimensions. Leave the YAML frontmatter `status: active` intact.
+- [`../templates/constitution.md`](../templates/constitution.md) — fill in the header, Goal (with fidelity intent), Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL — these are the user-supplied identifiers; the substrate-path bullets in the template stay as boilerplate, naming where each substrate lives on disk), Open dimensions. Leave the YAML frontmatter `status: active` intact.
 - [`../templates/CLAUDE.md`](../templates/CLAUDE.md) — fill in the header (paper title + arXiv ID + DOI + one-line subject), any paper-specific Pointers. Leave Rules in the template state (universal across reproductions). Leave Rigor and Disagreements sections empty — iterations populate them.
 
 Show both drafts to the user, take corrections, refine, save. Then `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline) and commit both as the first commit.

From 0022bed14fe0b584a7fbb1156816d75826f507bf Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 19:15:58 +0200
Subject: [PATCH 067/124] docs: catch SPECIFY's implementation-notes.md output
 in lc-from-paper phase table
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The phase table in docs/skills/lc-from-paper.md listed SPECIFY's outputs
as astra.yaml + targets/targets.md + universes/baseline.yaml — but the
source-of-truth row in claude/lightcone/skills/lc-from-paper/SKILL.md
also names implementation-notes.md, and specify.md describes it as a
primary output ("concise practical guidance for the IMPLEMENT phase").
Align the docs row.
---
 docs/skills/lc-from-paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index 01f30e56..1a0ffeec 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -51,7 +51,7 @@ user's main session; phases 2–7 run as ralph iterations.
 | 0 | INTERVIEW | user's main session | per-paper `constitution.md` + `CLAUDE.md` |
 | 1 | ACQUIRE | user's main session | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-status.yaml, code-index.md}` |
 | 2 | ARCHITECT | ralph iteration | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative) |
-| 3 | SPECIFY | ralph iteration | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `universes/baseline.yaml` |
+| 3 | SPECIFY | ralph iteration | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
 | 4 | LITERATURE | ralph iteration | `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
 | 5 | IMPLEMENT | ralph iteration | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
 | 6 | RUN | ralph iteration | `results/<universe>/<output>/` |

From 987a0d55deb2340d4aca8f779d79ee9dbfe0e672 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 19:24:17 +0200
Subject: [PATCH 068/124] lc-from-paper: cold-survey cleanup of prior_insights
 placeholder language
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Survey signals and table rows said "placeholders without `evidence:`" or
"resolved with `evidence:` selectors" — but per the body of specify.md a
SPECIFY placeholder is a syntactically-complete Insight with
`evidence: [{id, doi}]` already present. What LITERATURE adds is the
`quote:` (+ `location:`) selectors inside each Evidence entry, not
"evidence selectors."
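
For illustration, a minimal sketch of the entry shape. The key names
(`claim:`, `evidence:`, `id`, `doi`, `quote:`, `location:`) are the ones
the reference bodies use; the nesting under an insight id and every
angle-bracket value are assumed placeholders, not copied from a real
`astra.yaml`:

```yaml
# Hypothetical layout, for illustration only.
prior_insights:
  <insight-id>:
    claim: <claim text>
    evidence:
      - id: <evidence-id>            # present after SPECIFY
        doi: <doi>                   # present after SPECIFY
        quote: <verbatim quote>      # added by LITERATURE
        location: <quote location>   # added by LITERATURE
```

Read this way, a placeholder counts as unresolved while its Evidence
carries `doi:` without `quote:`, which is the condition the reworded
survey signals test for.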

Same idiom-class slip across SKILL.md (phase-outputs + workdir-as-state
tables), literature.md (phase intro + ready-to-resolve signal + merge-
done signal + review-iteration entry condition), specify.md (validate-
clean signal), review.md (Inputs note), and docs/skills/lc-from-paper.md.

Body of specify.md and literature.md was already precise; the survey
signals and adjacent prose just slipped to shorthand that contradicts it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-paper/SKILL.md            | 4 ++--
 .../skills/lc-from-paper/references/literature.md         | 8 ++++----
 .../lightcone/skills/lc-from-paper/references/review.md   | 2 +-
 .../lightcone/skills/lc-from-paper/references/specify.md  | 2 +-
 docs/skills/lc-from-paper.md                              | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 958044a7..7738e7f2 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -38,7 +38,7 @@ Nine phases (zero-indexed). INTERVIEW and ACQUIRE run before the loop, in the us
 | 1 | ACQUIRE | user's main session | [`references/acquire.md`](references/acquire.md) | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-status.yaml, code-index.md}` |
 | 2 | ARCHITECT | ralph iteration | [`references/architect.md`](references/architect.md) | stub `astra.yaml` at project root (sub-analyses, inputs, outputs, narrative) |
 | 3 | SPECIFY | ralph iteration | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
-| 4 | LITERATURE | ralph iteration | [`references/literature.md`](references/literature.md) | `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
+| 4 | LITERATURE | ralph iteration | [`references/literature.md`](references/literature.md) | `astra.yaml`'s `prior_insights:` Evidence entries each carry resolved `quote:` + `location:` selectors; per-paper PDFs cached via `astra paper add` |
 | 5 | IMPLEMENT | ralph iteration | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
 | 6 | RUN | ralph iteration | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
 | 7 | COMPARE | ralph iteration | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
@@ -114,7 +114,7 @@ Each iteration's survey reads the workdir to determine what phase is next. File
 | `work/reference/code/` (or `code-status.yaml` with `found: false`) + `work/reference/code-index.md` | ACQUIRE code substrate |
 | `astra.yaml` at project root validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
 | `astra.yaml` non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` placeholders + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
-| `astra.yaml`'s `prior_insights:` resolved with `evidence:` selectors; `work/cited/<doi-slug>/` populated per cited paper | LITERATURE |
+| `astra.yaml`'s `prior_insights:` Evidence entries each carry resolved `quote:` + `location:` selectors; `work/cited/<doi-slug>/` populated per cited paper | LITERATURE |
 | recipes present in `astra.yaml` + `scripts/` + `requirements.txt` | IMPLEMENT |
 | `results/<universe>/<output>/` for every output | RUN |
 | `comparison-report.yaml` | COMPARE |
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index 90d8d8c1..aa4f152b 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -6,7 +6,7 @@ The quote-finding direction is: **target paper's claim → quote inside the cite
 
 LITERATURE runs **after SPECIFY**, not before — relevant `prior_insights:` are defined by the decisions and findings they justify. Fetching cited papers speculatively before SPECIFY would do work for citations that may never end up needed.
 
-LITERATURE is what a ralph iteration does when the workdir signals "SPECIFY done + `prior_insights:` placeholders present without `evidence:` selectors." Its internal architecture is **two simple stages**: mechanical fetch (paper-extraction's deterministic script, batched-parallel via shell — no agent fan-out), then quote-finding (the iteration does it itself for small placeholder counts; spawns a small number of Haiku sub-agents inside its own main session for large counts). The agentic work is the quote-matching; the fetch is plumbing.
+LITERATURE is what a ralph iteration does when the workdir signals "SPECIFY done + `prior_insights:` placeholders present whose Evidence entries carry `doi:` but no `quote:` selector yet." Its internal architecture is **two simple stages**: mechanical fetch (paper-extraction's deterministic script, batched-parallel via shell — no agent fan-out), then quote-finding (the iteration does it itself for small placeholder counts; spawns a small number of Haiku sub-agents inside its own main session for large counts). The agentic work is the quote-matching; the fetch is plumbing.
 
 ## Inputs
 
@@ -169,7 +169,7 @@ When the iteration fans out to Haikus, each Haiku is spawned with `model="haiku"
 
 After the merge lands, the cross-check question is: do the `evidence:` quotes belong to the cited paper at the cited page? Do the quotes actually justify the placeholders' claims, or are they technically present but tangential? Do the claims actually support the decision options that reference them via `Option.insights`?
 
-**Default: review by iteration boundary.** The iteration that did the merge exits; the next iteration enters fresh, surveys, finds `astra.yaml`'s `prior_insights:` populated with `evidence:` selectors but no `work/notes/literature-review/round-N.md`, runs `astra validate --verify-evidence` for the deterministic check + a semantic re-read of each resolved insight, and writes review findings. The iteration after that applies the fixes (which may include re-running Haiku quote-finding for entries that need a different quote). Two consecutive review-iterations with verdict `clean` terminates the review cycle.
+**Default: review by iteration boundary.** The iteration that did the merge exits; the next iteration enters fresh, surveys, finds `astra.yaml`'s `prior_insights:` Evidence entries populated with resolved `quote:` + `location:` selectors but no `work/notes/literature-review/round-N.md`, runs `astra validate --verify-evidence` for the deterministic check + a semantic re-read of each resolved insight, and writes review findings. The iteration after that applies the fixes (which may include re-running Haiku quote-finding for entries that need a different quote). Two consecutive review-iterations with verdict `clean` terminates the review cycle.
 
 **Optional: in-iteration fan-out.** When the placeholder count is large and the fidelity intent calls for *heavy*, the merge iteration (or a subsequent review iteration) can fan out parallel reviewers as one-level-deep sub-agents inside its own session, partitioned by cited-paper subset. Each reviewer writes findings for its subset; the iteration merges and applies fixes in the same session.
 
@@ -204,10 +204,10 @@ If 5 review-iterations have happened without two consecutive clean rounds, log t
 
 ## Survey signals (entry into LITERATURE)
 
-- `astra.yaml` has `prior_insights:` placeholders — entries with `claim:` + `doi:` but no `evidence:` ⇒ ready to resolve
+- `astra.yaml` has `prior_insights:` placeholders — entries with `claim:` plus Evidence carrying `doi:` but no `quote:` selector ⇒ ready to resolve
 - `work/cited/<doi-slug>/work/reference/index.json` exists for each unique cited DOI ⇒ fetches done
 - `work/notes/literature/resolutions.yaml` exists with non-empty resolutions / unresolved sections ⇒ quote-finding done
-- `astra.yaml`'s `prior_insights:` entries each have a resolved `evidence:` selector ⇒ merge done
+- `astra.yaml`'s `prior_insights:` entries each have a resolved `quote:` (+ `location:`) selector on their Evidence ⇒ merge done
 - `astra validate astra.yaml --verify-evidence` returns clean ⇒ structural validation done
 - For cheap: at least one `work/notes/literature-review/round-<N>.md` with verdict `clean` (or no fixes were incorporated) ⇒ LITERATURE review done
 - For heavy: two consecutive `round-<N>.md` files with verdict `clean` ⇒ LITERATURE review done
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index e8b68f9c..577adf6a 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -8,7 +8,7 @@ The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding
 
 ## Inputs
 
-- `astra.yaml` — final spec (validates with `--verify-evidence` once LITERATURE has resolved every `prior_insights:` placeholder's `evidence:` selector)
+- `astra.yaml` — final spec (validates with `--verify-evidence` once LITERATURE has resolved every `prior_insights:` placeholder's Evidence `quote:` selector)
 - `comparison-report.yaml`, `comparison-report.md` — final verdict + opportunity assessment
 - `targets/targets.md` — what was being matched against; reference figures / tables in `targets/`
 - `results/<universe>/<output_id>/` — reproduced figures / tables / metrics
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 99ecc73a..f67851dd 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -240,7 +240,7 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 - For each sub-analysis: when `work/reference/code/` exists, code-pass material-disagreement entries land in `decisions:` (with both options) and `universes/baseline.yaml` selects the canonical-resolution choice; `implementation-notes.md` carries non-material gotchas ⇒ code pass done
 - For cheap: each sub-analysis has at least a `work/notes/specify-review/<sub>-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ SPECIFY review done
 - For heavy: each sub-analysis has two consecutive `<sub>-round-<N>.md` files with verdict `clean` ⇒ SPECIFY review done
-- `astra validate astra.yaml` returns clean (placeholders without `evidence:` are valid at this stage) ⇒ structural side validated; `--verify-evidence` waits until LITERATURE has authored the resolved `evidence:` selectors
+- `astra validate astra.yaml` returns clean (placeholders whose Evidence carries `doi:` without `quote:` are valid at this stage) ⇒ structural side validated; `--verify-evidence` waits until LITERATURE has authored the `quote:` + `location:` selectors
 - `targets/targets.md` exists with each entry mapped to a spec home ⇒ target-ledger done
 - `implementation-notes.md` exists ⇒ practical-guidance side done
 - All of the above ⇒ SPECIFY complete; proceed to IMPLEMENT
diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index 1a0ffeec..e88474b6 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -52,7 +52,7 @@ user's main session; phases 2–7 run as ralph iterations.
 | 1 | ACQUIRE | user's main session | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-status.yaml, code-index.md}` |
 | 2 | ARCHITECT | ralph iteration | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative) |
 | 3 | SPECIFY | ralph iteration | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
-| 4 | LITERATURE | ralph iteration | `prior_insights:` resolved with `evidence:` selectors; per-paper PDFs cached via `astra paper add` |
+| 4 | LITERATURE | ralph iteration | `prior_insights:` Evidence entries each carry resolved `quote:` + `location:` selectors; per-paper PDFs cached via `astra paper add` |
 | 5 | IMPLEMENT | ralph iteration | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
 | 6 | RUN | ralph iteration | `results/<universe>/<output>/` |
 | 7 | COMPARE | ralph iteration | `comparison-report.{yaml,md}` plus an opportunity assessment graded against the user's fidelity intent |

From d20c680721a6702adf20e2c3c156dafff02ec59a Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 19:31:05 +0200
Subject: [PATCH 069/124] lc-from-paper: cold-survey cleanup of
 placeholder-evidence shorthand in literature + architect
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Same idiom-class slip the prior round swept — procedural prose drifting
to shorthand that contradicts the precise model in the body.

literature.md Stage 1 fetch criterion ("entry whose `evidence:` is
missing or empty") and Stage 3 merge description ("the merge just sets
its `evidence:` field") both implied the Evidence is unset at SPECIFY
exit. Per literature.md's own body + the workdir-as-state signals: every
placeholder has `evidence: [{id, doi}]` from SPECIFY; what's missing is
the `quote:` selector inside each Evidence entry, and the merge augments
each Evidence with `quote:` + `location:`.

architect.md's stub-shape yaml comment ("LITERATURE resolves evidence")
carried the same shorthand at a different surface; updated to "LITERATURE
fills the quote: selectors" so the comment matches what specify.md and
literature.md say in their bodies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-paper/references/architect.md | 2 +-
 .../lightcone/skills/lc-from-paper/references/literature.md   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 797b23dc..bfc68611 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -63,7 +63,7 @@ analyses:
         description: |
           <one-line on what this output is>
     decisions: {}      # SPECIFY fills
-    prior_insights: {} # SPECIFY records placeholders (citation only), LITERATURE resolves evidence
+    prior_insights: {} # SPECIFY records placeholders (Evidence with doi:, no quote: yet), LITERATURE fills the quote: selectors
     findings: {}       # SPECIFY fills
 
   <sub-analysis-id-2>:
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index aa4f152b..42cae3bf 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -26,7 +26,7 @@ LITERATURE is what a ralph iteration does when the workdir signals "SPECIFY done
 
 ### Stage 1 — Mechanical fetch (batched, no agent fan-out)
 
-Collect every `prior_insights:` entry whose `evidence:` is missing or empty. Group by DOI. Each unique DOI becomes one fetch.
+Collect every unresolved `prior_insights:` placeholder — its Evidence carries `doi:` but no `quote:` selector yet. Group those DOIs uniquely; each unique DOI becomes one fetch.
 
 Run paper-extraction's substrate script for each unique DOI **in batches of 5** via shell parallelism. paper-extraction's `extract-paper-substrate.py` is deterministic — no agent involvement needed. Each invocation writes to `work/cited/<doi-slug>/work/reference/`:
 
@@ -68,7 +68,7 @@ The exact Haiku threshold and partition size are heuristic — they trade off co
 
 The iteration reads `work/notes/literature/resolutions.yaml` and writes the resolutions back into `astra.yaml`:
 
-- For each resolved placeholder, locate `prior_insights[<id>]` in `astra.yaml` (the placeholder already lives in its sub-analysis; the merge just sets its `evidence:` field).
+- For each resolved placeholder, locate `prior_insights[<id>]` in `astra.yaml` (the placeholder already lives in its sub-analysis with `evidence: [{id, doi}]`; the merge augments each Evidence entry with the newly-authored `quote:` + `location:` selectors — `id` and `doi` were already there).
 - For each unresolved placeholder, append a line to `open-questions.md` describing it — the user resolves at REVIEW close-out by either supplying a different citation, weakening the claim, or removing the placeholder entirely.
 - Run `astra validate astra.yaml --verify-evidence` after the merge to catch structural breakage early.
 

From 6f25aa5c795b74c49e403b79d77e079bf45d32bd Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 19:37:27 +0200
Subject: [PATCH 070/124] lc-from-paper/architect: cold-survey cleanup of
 stub-yaml top comment
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Same idiom-class slip the prior rounds swept — procedural prose at a
sibling surface drifting to shorthand that contradicts the precise model
nearby.

The stub-yaml top comment said "SPECIFY fills decisions, findings,
prior_insights, evidence, anchors" — lumping three different shapes into
one peer enumeration:

- decisions / findings / prior_insights are the empty top-level blocks
  SPECIFY populates (matches the per-key comments two lines below).
- "evidence" isn't a top-level block at all — it's nested inside Insight
  entries under findings / prior_insights, and its fill is two-phase:
  SPECIFY records placeholders with Evidence carrying doi: (no quote:),
  LITERATURE fills the quote: + location: selectors. The line-66 per-key
  comment already states this precisely; the top comment was
  contradicting it.
- "anchors" (astra-anchor: references) aren't a yaml key — they're
  textual references woven into existing narrative prose, at a
  different layer. The body uses "weave" (lines 34, 79, 103, 104) for
  this; the top comment was treating them as empty containers to fill.

Rewrote to mirror the body's topic sentence at architect.md line 3:
SPECIFY fills decisions/findings/prior_insights and weaves astra-anchor
references into the narrative. The per-key comments below carry the
prior_insights two-phase detail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-paper/references/architect.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index bfc68611..0f6bd73a 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -37,7 +37,7 @@ Read `constitution.md`, `CLAUDE.md`, `work/reference/index.json`, `work/referenc
 ### Stub shape — what `astra.yaml` looks like after ARCHITECT
 
 ```yaml
-# Stub: structure + narrative; SPECIFY fills decisions, findings, prior_insights, evidence, anchors.
+# Stub: structure + narrative. SPECIFY fills decisions/findings/prior_insights and weaves astra-anchor references into the narrative.
 id: <paper-slug>
 title: "<paper title>"
 doi: <doi>

From 6786a45470f3796caa9aa1b0fcd21f9e2c683da5 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 19:43:14 +0200
Subject: [PATCH 071/124] lc-from-paper/specify: precise three-block evidence
 shape in topic sentence

---
 claude/lightcone/skills/lc-from-paper/references/specify.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index f67851dd..8b7195f9 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -33,7 +33,7 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 
 - **`/narrative`** — narrative authoring (any of the five `narrative.{summary,inputs,methods,findings,outputs}` keys, plus decision `rationale:` fields) is owned by the narrative skill. Invoke it during the **paper pass** when authoring or extending narrative prose. The narrative skill teaches reserved entity names, the tree-path anchor grammar, the conditional-narrative requirement (which keys are required when), the five-key authoring order, paper-reproduction fidelity discipline, and the new downstream-consumer discipline (lightcone-cli#108). Do not duplicate that content.
 
-Your responsibility in this phase is the **content**: build out the `decisions:` / `prior_insights:` / `findings:` for each sub-analysis with verbatim paper quotes anchored to the paper as evidence, and weave `astra-anchor:` references back into the narrative as entries land. ARCHITECT already settled the structure.
+Your responsibility in this phase is the **content**: build out the `decisions:` / `prior_insights:` / `findings:` for each sub-analysis (each with its own evidence shape — detailed below), and weave `astra-anchor:` references back into the narrative as entries land. ARCHITECT already settled the structure.
 
 ## The two-pass-per-sub-analysis structure
 

From febf2f306205853afec4be4acf33fee558bbfb9e Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 19:50:52 +0200
Subject: [PATCH 072/124] lc-from-paper/implement: drop imprecise parenthetical
 from method-fidelity bullet
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The implement-reviewer prompt's Method-fidelity bullet had a parenthetical
claiming that decisions: and findings: in astra.yaml '(carry the verbatim
paper quotes and code anchors)'. Under the 0.0.10 grammar this is wrong for
decisions:, which hold rationale prose plus options without per-option
evidence:; cited-paper quotes flow through Option.insights →
prior_insights:, and target-paper quotes appear only on findings:. It is
also loose on 'code anchors', which live in CLAUDE.md's Paper-vs-code
disagreements section and implementation-notes.md, not structurally in
decisions: or findings:.

Same slip-class as iter 13's specify.md:36 fix (a sibling-surface topic claim
that conflated three different evidence shapes into one). Iter 13 left this
one as 'borderline; defensible' and handed it forward explicitly; fresh-eyes
re-evaluation finds the same wrong shape claim and the cleanest fix is to
drop the parenthetical entirely. The reviewer's input list already names
astra.yaml as input, and the surrounding bullet already directs them to
check the script against the spec's method — the parenthetical was doing
duplicate work and getting the shape wrong.
---
 claude/lightcone/skills/lc-from-paper/references/implement.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
index 2cbb45a0..55855c24 100644
--- a/claude/lightcone/skills/lc-from-paper/references/implement.md
+++ b/claude/lightcone/skills/lc-from-paper/references/implement.md
@@ -87,7 +87,7 @@ The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: review
 > ### What to check
 >
 > 1. **Recipe coverage.** Every output in `astra.yaml` has a recipe; every recipe runs a script that exists in `scripts/`.
-> 2. **Method fidelity.** For each output, the script implements the method described by the relevant sub-analysis's `decisions:` and `findings:` in `astra.yaml` (which carry the verbatim paper quotes and code anchors). Where SPECIFY's code pass surfaced a material disagreement, the script follows the code's method (canonical-resolution rule), unless the spec recorded a different override in `decisions:` and `universes/baseline.yaml`.
+> 2. **Method fidelity.** For each output, the script implements the method described by the relevant sub-analysis's `decisions:` and `findings:` in `astra.yaml`. Where SPECIFY's code pass surfaced a material disagreement, the script follows the code's method (canonical-resolution rule), unless the spec recorded a different override in `decisions:` and `universes/baseline.yaml`.
 > 3. **Numerical correctness.** Constants, hyperparameters, threshold values match the paper (or the code, where the canonical-resolution rule applied). Flag mismatches with `path:line` of the script and the paper §/eq + the relevant `astra.yaml#analyses.<sub-id>.decisions.<key>` entry.
 > 4. **Data acquisition.** Scripts that fetch data use the real acquisition path from `astra.yaml`'s inputs — no synthetic / mock substitutes.
 > 5. **Determinism.** Scripts set random seeds where the paper's method is stochastic. Library versions in `requirements.txt` are pinned where reproducibility requires it.

From ffeb1a1a94e7e23dc67f450e88d9dde5b025cc7e Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 20:00:32 +0200
Subject: [PATCH 073/124] lc-from-paper: cold-survey cleanup of
 user-accepted-partial shorthand
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

SKILL.md:47 + SKILL.md:89 + review.md:95 all carried "(or user-accepted
`partial`)" as a parenthetical on the close pathway. compare.md is
precise: the close conditions are `pass` AND no `below`-intent
opportunities, OR retry budget exhausted — both iteration-level. There
is no mid-loop user-accept verb in the mechanism. The user's actual
acceptance of `partial` happens at REVIEW close-out (post-close,
AskUserQuestion-driven), or implicitly by editing the constitution
between iterations; neither is a "user-accepted" pre-condition for the
cold-survey close.

Same slip-class as iter 13's specify.md:36 and iter 14's
implement.md:90: a sibling-surface parenthetical drifted to looser
shorthand that doesn't match what the body describes. compare.md's own
compressed form (line 112) is "`partial` with un-acted opportunities
logged" — match that.

SKILL.md:47 also folds in the "the COMPARE → IMPLEMENT loop terminates"
phrasing from compare.md, so the sentence reads as the natural
post-condition of the retry loop ending described in the preceding
sentences. SKILL.md:89 and review.md:95 just swap the parenthetical.
---
 claude/lightcone/skills/lc-from-paper/SKILL.md             | 4 ++--
 claude/lightcone/skills/lc-from-paper/references/review.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 7738e7f2..09ba5acd 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -44,7 +44,7 @@ Nine phases (zero-indexed). INTERVIEW and ACQUIRE run before the loop, in the us
 | 7 | COMPARE | ralph iteration | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
 | 8 | REVIEW | user's main session | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
 
-COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are, how much they likely matter, and how they sit relative to the constitution's fidelity intent. A subsequent iteration decides whether to spend another IMPLEMENT round (close a gap that sits below intent) or land the reproduction at its current trajectory and log the gap as an open opportunity in CLAUDE.md's Rigor section. Once COMPARE returns `pass` (or user-accepted `partial`) and the un-acted opportunities are logged, a subsequent cold-survey iteration finds nothing left to do and flips the constitution's `status:` to `closed`. The loop terminates; REVIEW runs in the user's main session.
+COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are, how much they likely matter, and how they sit relative to the constitution's fidelity intent. A subsequent iteration decides whether to spend another IMPLEMENT round (close a gap that sits below intent) or land the reproduction at its current trajectory and log the gap as an open opportunity in CLAUDE.md's Rigor section. Once the COMPARE → IMPLEMENT loop terminates (verdict `pass`, or `partial` with the un-acted opportunities logged), a subsequent cold-survey iteration finds nothing left to do and flips the constitution's `status:` to `closed`. The loop terminates; REVIEW runs in the user's main session.
 
 ## The two pre-loop bookends
 
@@ -86,7 +86,7 @@ After INTERVIEW + ACQUIRE land, hand the rest of the reproduction off to a ralph
 
 The launcher detaches a tmux session named `ralph-<workdir>-constitution`. The user attaches with `tmux attach -t <session>`. Iterations start firing immediately; each runs in a fresh Claude (or Codex) session with `constitution.md` injected as the system prompt and the workdir's `CLAUDE.md` auto-loading.
 
-The loop runs until an iteration flips `constitution.md`'s frontmatter `status:` to `closed` — typically after COMPARE returns `pass` (or user-accepted `partial`) and the iteration that runs after that survey finds nothing left to do.
+The loop runs until an iteration flips `constitution.md`'s frontmatter `status:` to `closed` — typically after COMPARE returns `pass` (or `partial` with the un-acted opportunities logged) and the iteration that runs after that survey finds nothing left to do.
 
 Tell the user explicitly: "Launching the ralph loop in tmux session `<name>`. Attach with `tmux attach -t <name>`. Detach with the usual tmux prefix + `d`. The loop will run until the constitution closes (typically after COMPARE returns `pass`); at that point come back here and I'll run REVIEW close-out."
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index 577adf6a..382373a7 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -92,7 +92,7 @@ This commit is the durable mark that the reproduction has reached close-out. Fut
 
 ## Survey signals (entry into REVIEW)
 
-- `comparison-report.yaml` verdict is `pass` (or user has accepted `partial`) ⇒ ready to close out
+- `comparison-report.yaml` verdict is `pass` (or `partial` with un-acted opportunities logged) ⇒ ready to close out
 - `.lightcone/comparison.html` exists ⇒ `/figure-comparison` rendered
 - `open-questions.md` has a `## Resolutions` section covering every entry ⇒ open-questions walkthrough done
 - `REPRODUCTION-SUMMARY.md` exists ⇒ final report written

From 6406b0872f901115a32584f7b2339213c0249fb2 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 20:05:27 +0200
Subject: [PATCH 074/124] lc-from-paper/review: address iter 15's deferred
 review.md:3 flag
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

review.md:3 was the last surviving instance of the slip-class iter 15
was sweeping (siblings: SKILL.md:47, SKILL.md:89, review.md:95). Iter
15 explicitly handed it forward as borderline — "the parsing where
'a cold-survey iteration ... accepted partial' is talking about the
iteration that wrote the report" was defensible, but conflated two
distinct iterations: the COMPARE iteration that produced the verdict
and logged opportunities, and the subsequent cold-survey iteration
that found nothing and flipped status. The closing rule explicitly
separates them — an iteration that contributed cannot close.

Rewrote to the same parenthetical shape iter 15 applied at SKILL.md:89:

> after COMPARE returned `pass`, or `partial` with the un-acted
> opportunities logged, and the next cold-survey iteration found
> nothing left to do

`pass` and `partial`-with-logging are the COMPARE-side disjuncts;
the next-cold-survey-iteration is the close mechanism that follows
either disjunct. Matches compare.md:5/104/112's precise body.
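
A minimal sketch of that close rule, for illustration only. The
function and argument names are hypothetical and not part of the
skill; the real check is prose that the iteration follows, not code:

```python
def ready_to_close(verdict: str, opportunities_logged: bool,
                   survey_found_work: bool) -> bool:
    """Hypothetical predicate for the cold-survey close described above."""
    # COMPARE-side disjuncts: `pass`, or `partial` with opportunities logged.
    compare_settled = verdict == "pass" or (
        verdict == "partial" and opportunities_logged
    )
    # Close mechanism: the next cold-survey iteration must find nothing to do.
    return compare_settled and not survey_found_work
```

Keeping the two conditions as separate terms mirrors the separation the
closing rule draws between the COMPARE iteration and the later
cold-survey iteration.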
---
 claude/lightcone/skills/lc-from-paper/references/review.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index 382373a7..0a53fbf0 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -1,6 +1,6 @@
 # REVIEW — close-out in the user's main session
 
-The reproduction has converged: the constitution's `status:` is `closed` (after COMPARE returned `pass` or a cold-survey iteration accepted `partial` and logged the opportunities). The ralph loop's tmux session has exited. REVIEW runs back in the user's main session — the second of two interactive bookends, the first being INTERVIEW. It runs in the user's main session (not as an iteration) because both `/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`, which isn't available inside detached ralph iterations.
+The reproduction has converged: the constitution's `status:` is `closed` (after COMPARE returned `pass`, or `partial` with the un-acted opportunities logged, and the next cold-survey iteration found nothing left to do). The ralph loop's tmux session has exited. REVIEW runs back in the user's main session — the second of two interactive bookends, the first being INTERVIEW. It runs in the user's main session (not as an iteration) because both `/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`, which isn't available inside detached ralph iterations.
 
 Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, draft the final report, and propagate any un-acted-on opportunities from the latest COMPARE into CLAUDE.md's **Rigor** section — in one interactive arc.
 

From 475d3891f752ce5f0b31a7d1f9daa8d48caecd7a Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 20:15:40 +0200
Subject: [PATCH 075/124] lc-from-paper: surface cheap-mode review termination
 in SKILL.md:100

The per-iteration-discipline bullet's termination clause read heavy-only
("until two consecutive review iterations find no fixes or a 5-iteration
cap"), while all four artifact-producing references (architect.md:120,
specify.md:126, literature.md:176, implement.md:67) explicitly name both
modes (cheap: one clean review-iteration; heavy: two consecutive). Same
sibling-surface drift as the iter 13-16 slip-class: a one-line summary
at SKILL.md drifted from the precise body in the references.

Reword to name both termination modes inline, preserving the original
sentence shape and the 5-iteration cap.
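
The two termination modes can be sketched as below. This is an
illustration under stated assumptions, not code from the bundle: the
names are hypothetical, and the cap is assumed to count review
iterations, which the references do not spell out here:

```python
from typing import Sequence

# Clean review-iterations required per mode (cheap: one, heavy: two in a row).
REQUIRED_CLEAN = {"cheap": 1, "heavy": 2}


def review_terminated(clean_history: Sequence[bool], mode: str,
                      cap: int = 5) -> bool:
    """clean_history[i] is True when review iteration i found nothing to fix."""
    # Assumption: the 5-iteration cap counts review iterations.
    if len(clean_history) >= cap:
        return True
    needed = REQUIRED_CLEAN[mode]
    if len(clean_history) < needed:
        return False
    # The most recent `needed` reviews must all be clean (consecutive).
    return all(clean_history[-needed:])
```

For example, a single clean review ends a cheap review but not a heavy
one, and a fix between two clean reviews resets the heavy count.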
---
 claude/lightcone/skills/lc-from-paper/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 09ba5acd..7508110f 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -97,7 +97,7 @@ Iterations follow the `/ralph` skill's Loop protocol — Survey → Work → Upd
 - **Survey starts with the constitution + CLAUDE.md, then the workdir.** Read the constitution to remember the goal and the fidelity intent. Read CLAUDE.md's Rigor accumulator to know where each output currently sits relative to the quality bar. Then survey the workdir against the **Workdir-as-state** table below to identify the next phase that needs work.
 - **One phase per iteration is the typical shape.** Don't try to do ARCHITECT *and* SPECIFY in one iteration; the fresh-context property of the next iteration is what makes review work, and conflating phases collapses the seam. (Exceptions: small targeted fixes after COMPARE may touch multiple phases in one iteration if they're tightly coupled.)
 - **Phase reference is your working spec for the iteration.** Whichever phase is next, read its `references/<phase>.md` on entry. That file carries the discipline for that phase's work (what to produce, code-as-canonical, rigor adjustment, etc.).
-- **Self-review is the next iteration.** Where ARCHITECT/SPECIFY/LITERATURE/IMPLEMENT used to spawn fresh-context reviewer sub-agents per round (broken — sub-agents can't spawn sub-agents), the discipline now collapses into iteration boundaries: iteration N writes the artifact, iteration N+1 reads it fresh and reviews, iteration N+2 applies fixes if needed, until two consecutive review iterations find no fixes or a 5-iteration cap. Each iteration is fresh by construction; the no-bias property is free.
+- **Self-review is the next iteration.** Where ARCHITECT/SPECIFY/LITERATURE/IMPLEMENT used to spawn fresh-context reviewer sub-agents per round (broken — sub-agents can't spawn sub-agents), the discipline now collapses into iteration boundaries: iteration N writes the artifact, iteration N+1 reads it fresh and reviews, iteration N+2 applies fixes if needed, until the review reaches its termination — one clean review-iteration for cheap, two consecutive clean for heavy, or a 5-iteration cap. Each iteration is fresh by construction; the no-bias property is free.
 - **Parallel fan-out lives inside an iteration.** LITERATURE Haiku quote-finders, SPECIFY per-sub-analysis work, IMPLEMENT per-output work — these fan out as one-level-deep `Agent(...)` spawns inside the iteration's main session. Sub-agents can't spawn sub-agents, but an iteration *is* the main session, so it can spawn freely.
 - **`AskUserQuestion` is not available inside an iteration.** Each iteration runs in a detached tmux session; the user isn't reachable interactively. Iterations append questions to `open-questions.md` with their best-judgment default applied, and the user resolves them at REVIEW close-out (back in their main session).
 - **Update the accumulators in CLAUDE.md** before exit: Rigor *Current state* per output that the iteration changed; *Paper-vs-code disagreements* for any material conflict the iteration surfaced; *Open opportunities* for COMPARE-surfaced gaps.

From ba312399edac7c23f419bd2efb1f2859a59b1718 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 20:21:30 +0200
Subject: [PATCH 076/124] lc-from-paper: align cheap/heavy at SKILL.md:139 +
 docs mirror with the references
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

SKILL.md:139 and docs/skills/lc-from-paper.md:92-94 framed cheap/heavy
as the no-fan-out vs fan-out axis ("Cheap: write and exit, let the next
iter's survey serve as the review. Heavy: fan out parallel reviewers
inside the iter"). The four artifact-producing references
(architect.md:120, specify.md:126/202, literature.md:176,
implement.md:67) and SKILL.md:100 (iter 17's fix) define cheap/heavy as
the termination-depth axis instead — cheap = one clean review-iteration
is enough, heavy = two consecutive clean review-iterations required.
Sequential vs in-iteration fan-out is the orthogonal parallelism choice,
correctly handled at SKILL.md:141.

Within SKILL.md itself, :100 and :139 had been using cheap/heavy two
different ways. Realign :139 (and the docs mirror) with the references'
termination-depth meaning. The fan-out paragraph at :141 stays as the
parallelism axis, now cleanly orthogonal.

Same recurring slip-class as iters 13-17: a one-line summary at SKILL.md
drifting from precise bodies in the references.
---
 claude/lightcone/skills/lc-from-paper/SKILL.md | 2 +-
 docs/skills/lc-from-paper.md                   | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 7508110f..937ef654 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -136,7 +136,7 @@ REVIEW runs in your main session because `/figure-comparison` and `/check-senten
 
 **Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at INTERVIEW as prose — their own words for what "good enough" looks like (e.g. *"just checking the analysis is tractable"*, *"Figure 3 must be right; the rest can stay rough"*, *"every primary and secondary target lining up within stated tolerance"*).
 
-Each iteration translates the fidelity intent into a tactical sizing decision when working on an artifact-producing phase (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT). Derive how much in-iteration self-review-via-fan-out to run from the gap between where the artifact currently stands (CLAUDE.md's Rigor *Current state* — *sketch / baseline / tightened / canonical*) and what the Goal's intent says the user cares about. *Cheap:* write the artifact and exit; let the next iteration's fresh-context survey serve as the review. *Heavy:* fan out parallel reviewers as one-level-deep sub-agents inside the iteration, merge findings, apply fixes, exit. Either way, update CLAUDE.md's Rigor *Current state* so the trajectory stays honest across iterations.
+Each iteration translates the fidelity intent into a tactical sizing decision when working on an artifact-producing phase (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT). Derive how much review the artifact needs from the gap between where it currently stands (CLAUDE.md's Rigor *Current state* — *sketch / baseline / tightened / canonical*) and what the Goal's intent says the user cares about. *Cheap:* one clean review-iteration is enough — write, let the next iter read it fresh and review, accept after that single clean pass (with fixes applied in between if needed). *Heavy:* two consecutive clean review-iterations required — the review/fix cycle runs until two fresh-eyes passes both find nothing to fix. Either way, update CLAUDE.md's Rigor *Current state* so the trajectory stays honest across iterations.
 
 The default is **sequential review via iteration boundaries** — cheaper, no fan-out, and the fresh-context property is automatic. Reach for in-iteration fan-out when the parallelism actually pays (LITERATURE with many cited papers, SPECIFY with many independent sub-analyses, IMPLEMENT with many outputs).
 
diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index e88474b6..ac0fca2e 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -89,10 +89,10 @@ Pointers, not snapshots.
   CLAUDE.md.
 - **Rigor is a trajectory toward the user's intent.** Each iteration
   sizes its work from the gap between *Current state* and the Goal's
-  fidelity intent — cheap (write and exit; let the next iteration's
-  fresh-context survey serve as the review) vs heavy (in-iteration
-  fan-out for parallel review). Default is sequential review via
-  iteration boundaries.
+  fidelity intent — cheap (one clean review-iteration is enough) vs
+  heavy (two consecutive clean review-iterations required). Default is
+  sequential review via iteration boundaries; in-iteration fan-out is
+  the orthogonal parallelism option where it actually pays.
 - **arxiv-LaTeX-first acquisition.** PDF + Docling is the non-arxiv
   fallback only.
 - **No synthetic data.** Unless the paper itself uses synthetic data,

From 8f6f4aafd3890c3def563e2dfac3d84f337f1708 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 21:49:42 +0200
Subject: [PATCH 077/124] lc-from-paper/templates: drop redundant
 code-as-canonical from constitution Evidence

The constitution.md template's Evidence section duplicated the
code-as-canonical rule that already lives in CLAUDE.md's Rules section.
The Evidence section's purpose is to describe substrate (where the paper
/ code / etc. live); the rule (what to do when paper and code disagree)
accretes per iteration via CLAUDE.md and isn't constitution-shaped.

Per the Round 7 constitution's "no section duplicated across both"
quality bar, the rule belongs only in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-paper/templates/constitution.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/templates/constitution.md b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
index b8e249ba..3607f81e 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/constitution.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
@@ -33,7 +33,7 @@ This is the ceiling; the fidelity intent determines which outputs need to actual
 The substrate this reproduction is built against — the canonical sources iterations consult:
 
 - **Paper:** `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml}` (from `/paper-extraction` during ACQUIRE). The `index.json#citations` block carries each cited paper's resolved DOI for LITERATURE.
-- **Code:** `work/reference/code/` (cloned during ACQUIRE; scan inventory at `work/reference/code-index.md`). Code is canonical for numerics, plotting, and method where it disagrees with the paper.
+- **Code:** `work/reference/code/` (cloned during ACQUIRE; scan inventory at `work/reference/code-index.md`).
 - **Paper DOI:** <doi>
 - **arXiv ID:** <id> (if applicable)
 - **Code repo URL:** <url>

From 17d53291995e3e7a8a372fa6398660581f2d22ea Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 21:49:50 +0200
Subject: [PATCH 078/124] lc-from-paper/architect: trim vestigial Round-6
 SendMessage reference
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The "No persistent expert sub-agents" bullet ended with "Targeted reads
on entry replace what used to be SendMessage queries to long-lived
experts" — a vestigial reference to Round 6's persistent paper-expert /
code-expert pattern that doesn't exist in the bundle post-Round-7. A
fresh reader has no context for "what used to be."

The Round 6/7 lineage lives in the PR description's Provenance section
and in the orchestrator-vs-ralph fiber's closing addendum, which is
where it belongs.

Reframe to forward-facing: "re-read what you need on entry."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-paper/references/architect.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 0f6bd73a..57192b50 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -131,7 +131,7 @@ This "review by iteration boundary" pattern is the default. For phases where par
 
 ## Notes
 
-- **No persistent expert sub-agents.** The on-disk substrate (`index.json`, `code-index.md`, the paper-extraction `astra.yaml`) carries the orientation iterations need. Targeted reads on entry replace what used to be SendMessage queries to long-lived experts.
+- **No persistent expert sub-agents.** The on-disk substrate (`index.json`, `code-index.md`, the paper-extraction `astra.yaml`) carries the orientation iterations need; re-read what you need on entry.
 - **The stub's empty blocks are intentional.** `decisions: {}`, `prior_insights: {}`, `findings: {}` make it clear at a glance that ARCHITECT's job is structural and SPECIFY fills them. Don't try to half-author content — empty is honest.
 - **Code-as-canonical for structure, paper-as-canonical for narrative voice.** The code reveals where the real stage boundaries are; the paper provides the words to describe them. The stub uses both.
 - **The narrative skill is the prose author, not the structure author.** Invoke `/narrative` for the prose blocks; ARCHITECT's job is the structural skeleton plus invoking `/narrative` to fill the `narrative:` keys cleanly.

From 6367c6b8c13274f6ef3790b501328cd9f61187f1 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 21:50:00 +0200
Subject: [PATCH 079/124] cli/init: reframe Next steps for agent-first usage
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The `lc init` Next steps message hardcoded a specific implementation
flow ("Edit astra.yaml ... lc run to materialize outputs ... lc status
to check what's done"), which is at odds with what the docs say (the
agent handles implementation; you stay in charge of scientific choices).

Alexandre flagged this on first contact with the dev version: "Do we
want to update these steps? This is a bit at odds with what we say in
the docs."

Reframe to "launch your agent, tell it what you want to do; the lc CLI
keeps the substrate in sync along the way" — matches Liam's framing of
lightcone-cli as the tracking layer, not a prescribed workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 src/lightcone/cli/commands.py | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/lightcone/cli/commands.py b/src/lightcone/cli/commands.py
index e528c6fd..24465d36 100644
--- a/src/lightcone/cli/commands.py
+++ b/src/lightcone/cli/commands.py
@@ -261,9 +261,13 @@ def init(
             )
 
     console.print("\nNext steps:")
-    console.print("  • Edit [cyan]astra.yaml[/cyan] to declare outputs and recipes")
-    console.print("  • [cyan]lc run[/cyan] to materialize outputs")
-    console.print("  • [cyan]lc status[/cyan] to check what's done")
+    console.print(
+        "  • Launch [cyan]claude[/cyan] (or your agent of choice) and tell it what you want to do"
+    )
+    console.print(
+        "  • [cyan]lc run[/cyan] / [cyan]lc status[/cyan] / [cyan]lc verify[/cyan] "
+        "keep the substrate in sync along the way"
+    )
 
 
 _CONTAINERFILE = """\

From 6918267395da651ecd43c51dab43651c60b2b811 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 21:50:10 +0200
Subject: [PATCH 080/124] README + user docs: substrate tracks, not prescribes
 (Liam alignment)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three places in the user-facing surface that prescribed a specific
workflow rather than describing the substrate:

- README.md Quick Start hardcoded `/lc-new`, contradicting the next
  paragraph's "the `/lc-from-*` family is parallel by what you start
  from" framing. Reframe to mention all three entries.
- README.md "Building and verifying" prescribed a specific
  implementation flow ("the agent reads X, writes scripts under src/,
  runs lc run, watches lc status until ok ..."). After the spec exists,
  the user can work however suits them — agent-driven, ralph-looped, or
  hand-written. The lc CLI tracks regardless of process.
- docs/user/getting-started.md and docs/user/agent-workflow.md treated
  the slash commands as the only way to enter the agent surface. Add
  paragraphs making clear they're structured entry points, not
  requirements — vanilla Claude works too, the substrate keeps things
  tracked.

Per Liam's framing in #dev-agent-layer: "we aren't really prescribing
a way of doing an analysis, just a way of tracking it. So if you want
to use ralph loop / vanilla claude / write everything by hand you can
do so."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 README.md                    | 4 ++--
 docs/user/agent-workflow.md  | 5 +++++
 docs/user/getting-started.md | 6 ++++++
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 78ec33eb..56fcbdad 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ cd my-analysis
 claude
 ```
 
-Then tell the agent `/lc-new` to scope your research question. After the spec exists, just tell the agent to build it — implementation is a normal Claude Code workflow guided by `.claude/guides/`.
+Then tell the agent what you have to start from — a research question (`/lc-new`), existing code (`/lc-from-code`), or a paper to reproduce (`/lc-from-paper`). After the spec exists, work with the agent however suits you; the substrate (`astra.yaml`, `lc run`, `lc status`, `lc verify`) keeps things in sync.
 
 ## Skills
 
@@ -50,7 +50,7 @@ Files a GitHub issue against the right repo (ASTRA or lightcone-cli) with versio
 
 ### Building and verifying
 
-Once `astra.yaml` exists, the agent reads `.claude/guides/lightcone-cli-reference.md` (workflow, commands, status meanings) and `.claude/guides/astra-reference.md` (spec syntax), writes the analysis scripts under `src/`, runs `lc run`, watches `lc status` until every output is `ok`, then runs `astra validate astra.yaml` and `lc verify` to confirm the spec is valid and the provenance chain is intact.
+Once `astra.yaml` exists, you (or the agent) build it however suits you. The typical flow is `lc run` to materialize outputs, `lc status` to track progress, `astra validate astra.yaml` for spec validity, and `lc verify` for provenance integrity. Whether the work is agent-driven, ralph-looped, or hand-written, the `lc` substrate stays in sync.
 
 ## CLI Reference
 
diff --git a/docs/user/agent-workflow.md b/docs/user/agent-workflow.md
index a11ad451..acecd4e2 100644
--- a/docs/user/agent-workflow.md
+++ b/docs/user/agent-workflow.md
@@ -7,6 +7,11 @@ a structured prompt: the agent follows a specific phased flow, not
 free-form chat. This page walks through each of them in the order you'd
 naturally hit them.
 
+The skills are structured entry points; they aren't requirements. Once
+you're inside a project, you can also just describe what you're working
+on to Claude — `astra.yaml` and the `lc` CLI keep things tracked
+whether you go through a skill or not.
+
 > The bracketed `→ astra.yaml` etc. notes show what each phase actually
 > writes to disk. You stay in charge of approving everything; the agent
 > never publishes a paper for you.
diff --git a/docs/user/getting-started.md b/docs/user/getting-started.md
index 63032827..a4692694 100644
--- a/docs/user/getting-started.md
+++ b/docs/user/getting-started.md
@@ -70,6 +70,12 @@ bug reports without leaving the session.
 | `/lc-from-paper` | You have a published paper (DOI / arXiv ID) you want to reproduce. |
 | `/lc-feedback` | Something broke and you want to file a GitHub issue without leaving the session. |
 
+These are structured entry points for common starting situations. You
+don't have to use them — once you're inside a project, you can also
+just describe what you're trying to do to Claude. `astra.yaml`,
+`lc run`, and `lc verify` keep things tracked regardless of how you
+got there.
+
 The next page, [The Agent Workflow](agent-workflow.md),
 explains each of these in more detail.
 

From 00e346773babc6a63c6125f0f8db0d902d9ad05e Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 21:50:16 +0200
Subject: [PATCH 081/124] gitignore: .DS_Store

macOS auto-creates these in any browsed directory. Add a one-line rule
so they stop showing up in git status.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .gitignore | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.gitignore b/.gitignore
index b22c8944..1c1dd2a3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -219,3 +219,6 @@ uv.lock
 .dev.vars*
 !.dev.vars.example
 !.env.example
+
+# macOS
+.DS_Store

From f12817d1c5959c81077c8318aad5525299393e37 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 23:03:56 +0200
Subject: [PATCH 082/124] =?UTF-8?q?lc-from-paper:=20drop=20in-iteration=20?=
 =?UTF-8?q?review=20fan-out=20=E2=80=94=20review=20is=20just=20sequential?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per-phase review had two procedures: sequential via iteration boundaries
(default) and in-iteration fan-out (optional, parallel reviewers spawned
inside one iteration). Two procedures for one job is overcomplication;
the optional fan-out rarely paid off in practice, and the framing forced
every review section to spell out "default vs optional" plus a per-prompt
note covering both shapes.

Collapse to one: review always happens via iteration boundaries. The
fresh-context property is automatic; no second mode.

Work fan-out stays where it's load-bearing — LITERATURE Haiku
quote-finders, SPECIFY per-sub-analysis work, IMPLEMENT per-output
work. Those aren't reviews; they're parallelizable bulk work inside an
iteration's main session.

Cheap / heavy stays too — it's about termination depth (1 vs 2
consecutive clean review-iterations), orthogonal to the
parallelism axis that just collapsed.
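
For orientation, the termination depth that stays can be sketched as
below. This is an illustrative sketch only, not code from the repo:
`review_done`, `fixes_needed_history`, and `ROUND_CAP` are hypothetical
names, the list-of-counts representation is an assumption, and the cap
of 5 review-iterations is taken from the ARCHITECT and SPECIFY
references.

```python
# Hypothetical sketch of cheap / heavy termination; names are illustrative.
ROUND_CAP = 5  # assumed from the references: after 5 review-iterations, log and advance


def review_done(fixes_needed_history: list[int], heavy: bool) -> bool:
    """Return True when the review cycle for one artifact may stop.

    fixes_needed_history holds the `fixes_needed` count reported by each
    review-iteration so far, oldest first (an assumed representation).
    """
    if not fixes_needed_history:
        return False  # no review-iteration has run yet
    if len(fixes_needed_history) >= ROUND_CAP:
        return True  # round cap reached: unfinished tail goes to open-questions.md
    required = 2 if heavy else 1  # heavy: two consecutive clean; cheap: one clean
    tail = fixes_needed_history[-required:]
    return len(tail) == required and all(n == 0 for n in tail)
```

Under these assumptions, parallelism never enters the decision: the
only inputs are the per-round fix counts and the cheap / heavy sizing.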

Touches SKILL.md, architect.md, specify.md, literature.md,
implement.md, review.md, and the docs mirror.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../lightcone/skills/lc-from-paper/SKILL.md   |  2 +-
 .../lc-from-paper/references/architect.md     |  2 --
 .../lc-from-paper/references/implement.md     | 14 +++++------
 .../lc-from-paper/references/literature.md    |  6 ++---
 .../skills/lc-from-paper/references/review.md |  2 +-
 .../lc-from-paper/references/specify.md       | 25 ++++++++-----------
 docs/skills/lc-from-paper.md                  |  6 ++---
 7 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 937ef654..dfd7269d 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -138,7 +138,7 @@ REVIEW runs in your main session because `/figure-comparison` and `/check-senten
 
 Each iteration translates the fidelity intent into a tactical sizing decision when working on an artifact-producing phase (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT). Derive how much review the artifact needs from the gap between where it currently stands (CLAUDE.md's Rigor *Current state* — *sketch / baseline / tightened / canonical*) and what the Goal's intent says the user cares about. *Cheap:* one clean review-iteration is enough — write, let the next iter read it fresh and review, accept after that single clean pass (with fixes applied in between if needed). *Heavy:* two consecutive clean review-iterations required — the review/fix cycle runs until two fresh-eyes passes both find nothing to fix. Either way, update CLAUDE.md's Rigor *Current state* so the trajectory stays honest across iterations.
 
-The default is **sequential review via iteration boundaries** — cheaper, no fan-out, and the fresh-context property is automatic. Reach for in-iteration fan-out when the parallelism actually pays (LITERATURE with many cited papers, SPECIFY with many independent sub-analyses, IMPLEMENT with many outputs).
+Review always happens via iteration boundaries — the fresh-context property is automatic. (Work fan-out for the artifact-producing phases is separate; see "Parallel fan-out lives inside an iteration" above.)
 
 The *sketch / baseline / tightened / canonical* and *cheap / heavy* vocabularies are the iteration's internal scaffolding for sizing its work. The user's surface is the intent prose; the scaffolding only shows through when they ask how an iteration sized itself.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 57192b50..2af9aab7 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -119,8 +119,6 @@ The iteration after the review-iteration reads `work/notes/architect/review-<N>.
 - If 5 review iterations have happened without two consecutive clean rounds, log the unfinished tail to `open-questions.md` ("ARCHITECT review reached round cap with N fixes still landing; user should review during REVIEW close-out") and let the next iteration advance to SPECIFY anyway. Don't loop forever on stub-level review.
 - If the iteration's fidelity-intent assessment calls for *cheap* — verdict `pass` on the first review-iteration is enough; skip the second-clean-round requirement and move on. The Rigor accumulator stays *stub: baseline*.
 
-This "review by iteration boundary" pattern is the default. For phases where parallelism actually pays (LITERATURE with many cited papers, SPECIFY with many independent sub-analyses, IMPLEMENT with many outputs), the relevant reference describes in-iteration fan-out as an alternative. ARCHITECT is small and serial enough that sequential-via-iteration is always the right call.
-
 ## Survey signals (entry into ARCHITECT)
 
 - `work/reference/index.json` + `work/reference/astra.yaml` + `work/reference/code-index.md` (when code present) exist ⇒ ACQUIRE substrate is ready
diff --git a/claude/lightcone/skills/lc-from-paper/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
index 55855c24..6c5f16e1 100644
--- a/claude/lightcone/skills/lc-from-paper/references/implement.md
+++ b/claude/lightcone/skills/lc-from-paper/references/implement.md
@@ -1,6 +1,6 @@
 # IMPLEMENT — write scripts and recipes; review by iteration boundary
 
-Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, review (by iteration boundary, or in-iteration fan-out for parallelism) cross-checks the implementation against paper + code — same fresh-context-no-bias shape ARCHITECT, SPECIFY, and LITERATURE use, with the fresh-context property given for free by iteration boundaries.
+Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, review by iteration boundary cross-checks the implementation against paper + code — same fresh-context-no-bias shape ARCHITECT, SPECIFY, and LITERATURE use, with the fresh-context property given for free by iteration boundaries.
 
 IMPLEMENT is what a ralph iteration does when the workdir signals "SPECIFY done + scripts/ absent (first pass) or comparison-report.yaml shows partial/fail (retry pass)". Most implementation is mechanical (translate spec → script). Where parallelization is feasible (multiple independent outputs from different scripts), the iteration fans out to one-level-deep sub-agents per output (inside its own main session) and merges.
 
@@ -60,15 +60,13 @@ The iteration merges scripts and recipes after the per-output sub-agents finish.
 5. **Do not execute scripts** — the RUN phase handles execution via `lc run`.
 6. **Validate** with `astra validate astra.yaml` after adding recipes.
 
-## Step 2: review — by iteration boundary (default) or in-iteration fan-out (optional)
+## Step 2: review by iteration boundary
 
-After the first-pass implementation lands, the cross-check question is: is the implementation consistent with the paper and the code? The depth is sized from the gap between CLAUDE.md's Rigor *Current state* and `constitution.md`'s Fidelity intent:
+After the first-pass implementation lands, the cross-check question is: is the implementation consistent with the paper and the code? The depth is sized from the gap between CLAUDE.md's Rigor *Current state* and `constitution.md`'s Fidelity intent.
 
-**Default: review by iteration boundary.** The iteration that wrote the first pass exits when `scripts/`, recipes, and `requirements.txt` are committed; the next iteration enters fresh, surveys, finds the implementation present but no `work/notes/implement-review/round-1.md`, reads `scripts/` + `astra.yaml`'s recipes + the paper, and writes findings to `round-1.md`. The iteration after that applies the fixes. Two consecutive review-iterations with verdict `clean` terminates the review cycle; the next iteration advances to RUN. Sized: *cheap* — accept after one clean review-iteration; *heavy* — require two consecutive clean.
+The iteration that wrote the first pass exits when `scripts/`, recipes, and `requirements.txt` are committed; the next iteration enters fresh, surveys, finds the implementation present but no `work/notes/implement-review/round-1.md`, reads `scripts/` + `astra.yaml`'s recipes + the paper, and writes findings to `round-1.md`. The iteration after that applies the fixes. Two consecutive review-iterations with verdict `clean` terminates the review cycle; the next iteration advances to RUN. Sized: *cheap* — accept after one clean review-iteration; *heavy* — require two consecutive clean.
 
-**Optional: in-iteration fan-out.** When the implementation is large (many outputs, many scripts) and the fidelity intent calls for *heavy*, the iteration holding the review can fan out parallel reviewers as one-level-deep sub-agents inside its own session, partitioned by output or sub-analysis, merge findings, apply fixes in the same iteration. The next iteration's survey acts as the consolidating review.
-
-The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: review is fresh-context (whether across iterations or across fan-out spawns), prompted to check "is the implementation consistent with the paper and the code?", outputs findings only — not edits. Fixes are applied between iterations by the next iteration (or merged in the same iteration for fan-out). Pattern-matching on prior fixes defeats the cross-check; the no-bias rule is load-bearing.
+The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: review is fresh-context, prompted to check "is the implementation consistent with the paper and the code?", outputs findings only — not edits. Fixes are applied between iterations by the next iteration. Pattern-matching on prior fixes defeats the cross-check; the no-bias rule is load-bearing.
 
 ### Per-round fresh reviewer — system prompt
 
@@ -163,6 +161,6 @@ A retry attempt re-runs IMPLEMENT review (by iteration boundary) on the changed
 - **`lc run` is the canonical execution surface.** Scripts assume they will be invoked via the lightcone-cli runner. Do not hard-code working directories or assume environment activation.
 - **Determinism where possible.** Set random seeds, fix library versions, prefer reproducible installations. The IMPLEMENT goal is not just "produces output once" but "reproducibly produces output across runs."
 - **Tight coupling earns shared scripts.** When two outputs come from the same expensive computation (e.g. an MCMC produces both a parameter chain and a summary statistic), one script with multiple output paths is cleaner than two scripts that each re-do the work.
-- **The fresh-context discipline is the same as ARCHITECT's, SPECIFY's, and LITERATURE's review.** A reviewer that sees the prior round's findings stops finding the next class of inconsistency. Iteration boundaries give fresh context automatically; in-iteration fan-out reviewers each get fresh-from-merge state without prior-round contamination.
+- **The fresh-context discipline is the same as ARCHITECT's, SPECIFY's, and LITERATURE's review.** A reviewer that sees the prior round's findings stops finding the next class of inconsistency. Iteration boundaries give fresh context automatically.
 - **Minimize churn in fixes.** Targeted edits, not restructures. Big restructures defeat the round-over-round comparison the iteration sequence uses to decide termination.
 - **Commit per output as it lands.** One commit per script + recipe wiring; one commit per review-round file; one commit per fix pass. The next iteration reads `git log` to track progress.
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index 42cae3bf..1c4b5d89 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -165,13 +165,11 @@ Rules:
 
 When the iteration fans out to Haikus, each Haiku is spawned with `model="haiku"` and gets this contract plus its assigned subset of placeholders and substrate paths.
 
-## Review — by iteration boundary (default) or in-iteration fan-out (optional)
+## Review by iteration boundary
 
 After the merge lands, the cross-check question is: do the `evidence:` quotes belong to the cited paper at the cited page? Do the quotes actually justify the placeholders' claims, or are they technically present but tangential? Do the claims actually support the decision options that reference them via `Option.insights`?
 
-**Default: review by iteration boundary.** The iteration that did the merge exits; the next iteration enters fresh, surveys, finds `astra.yaml`'s `prior_insights:` Evidence entries populated with resolved `quote:` + `location:` selectors but no `work/notes/literature-review/round-N.md`, runs `astra validate --verify-evidence` for the deterministic check + a semantic re-read of each resolved insight, and writes review findings. The iteration after that applies the fixes (which may include re-running Haiku quote-finding for entries that need a different quote). Two consecutive review-iterations with verdict `clean` terminates the review cycle.
-
-**Optional: in-iteration fan-out.** When the placeholder count is large and the fidelity intent calls for *heavy*, the merge iteration (or a subsequent review iteration) can fan out parallel reviewers as one-level-deep sub-agents inside its own session, partitioned by cited-paper subset. Each reviewer writes findings for its subset; the iteration merges and applies fixes in the same session.
+The iteration that did the merge exits; the next iteration enters fresh, surveys, finds `astra.yaml`'s `prior_insights:` Evidence entries populated with resolved `quote:` + `location:` selectors but no `work/notes/literature-review/round-N.md`, runs `astra validate --verify-evidence` for the deterministic check + a semantic re-read of each resolved insight, and writes review findings. The iteration after that applies the fixes (which may include re-running Haiku quote-finding for entries that need a different quote). Two consecutive review-iterations with verdict `clean` terminates the review cycle.
 
 Sized from the constitution's Fidelity intent: *cheap* — one clean review-iteration is enough; *heavy* — require two consecutive clean.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index 0a53fbf0..ba50dee7 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -104,7 +104,7 @@ This commit is the durable mark that the reproduction has reached close-out. Fut
 - **This phase runs in the user's main session.** Do not invoke it from inside a ralph iteration. The whole point of REVIEW is that the user is reachable — every step uses `AskUserQuestion` (directly, or via the sibling skills it invokes), and iterations are detached.
 - **`/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`.** That's why REVIEW runs in the user's main session and they live here, not in any iteration. Invoking either inside an iteration fires prompts into nothing.
 - **The user owns the verdict-acceptance decision.** REVIEW's purpose is to let the user see what the loop's iterations did and decide whether they accept it. The skill renders surfaces and asks; it does not unilaterally close.
-- **Don't confuse with the per-phase reviews inside the loop.** ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT each have their own fresh-context review discipline that happens by iteration boundary (or in-iteration fan-out). Those are unrelated to this close-out — same word, different jobs. The phase boundary makes them unambiguous: per-phase reviews live inside their host phase's reference; this one is the post-loop close-out in the user's main session.
+- **Don't confuse with the per-phase reviews inside the loop.** ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT each have their own fresh-context review discipline that happens by iteration boundary. Those are unrelated to this close-out — same word, different jobs. The phase boundary makes them unambiguous: per-phase reviews live inside their host phase's reference; this one is the post-loop close-out in the user's main session.
 - **Open-question resolutions are durable.** Append to `open-questions.md`'s `## Resolutions` section so the next re-run / future session sees what was decided. Do not delete the original questions.
 - **Keep the report short.** Long reports get skimmed; short reports get read. Two pages is generous.
 - **Do not invent further work.** If the user has accepted the verdict and the opportunities are propagated, the reproduction is done. The next session, the user, or a future revisit can decide whether tightening any open opportunity still serves them.
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 8b7195f9..0d118ae7 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -4,7 +4,7 @@ Read the stub `astra.yaml` from ARCHITECT and fill in `decisions:`, `prior_insig
 
 SPECIFY is what a ralph iteration does when the workdir signals "stub `astra.yaml` present + sub-analyses' `decisions:` / `prior_insights:` / `findings:` blocks still empty." Iterations run detached in tmux; the user isn't reachable interactively, so the canonical-resolution default (code wins where paper and code disagree on a material choice) applies and disagreements are logged to CLAUDE.md's **Paper-vs-code disagreements** section plus `open-questions.md` for REVIEW close-out.
 
-The structure runs **two passes per sub-analysis** (paper, then code, when code exists), then iteration-boundary review (or optional in-iteration fan-out). The two passes are the cross-check: the paper pass authors what the paper says; the code pass surfaces where the code says something different; the difference is gold (it's where the reproduction has to make a decision).
+The structure runs **two passes per sub-analysis** (paper, then code, when code exists), then iteration-boundary review. The two passes are the cross-check: the paper pass authors what the paper says; the code pass surfaces where the code says something different; the difference is gold (it's where the reproduction has to make a decision).
 
 Per-sub-analysis work is parallelizable when sub-analyses are independent. Each sub-analysis's two passes (paper, then code) run sequentially within that sub-analysis; across sub-analyses the iteration can fan out parallel work as one-level-deep sub-agents from inside its main session. When SPECIFY needs paper- or code-side context, Grep into `work/reference/source/` / `document.md` for paper text or read targeted modules under `work/reference/code/`; the structural index at `work/reference/index.json` and the code inventory at `work/reference/code-index.md` give you the orientation to know where to look. Don't try to absorb the paper or code whole.
 
@@ -119,19 +119,17 @@ Read the code that implements this sub-analysis (`work/reference/code-index.md`'
 
 3. **Decision-option augmentation.** Where the code reveals an option the paper didn't mention but is defensible (a sibling implementation alternative used in the codebase or referenced in a comment), add it as a sibling option to the relevant `decisions:` entry. Do not pre-emptively author every code variant; only the ones that bear on a real choice.
 
-### Review — by iteration boundary (default) or in-iteration fan-out (optional)
+### Review by iteration boundary
 
 After the paper + code passes land for a sub-analysis, the cross-check question is: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
 
-**Default: review by iteration boundary.** The iteration that wrote a sub-analysis's passes exits when its passes are done; the next iteration enters fresh, surveys, finds the sub-analysis's passes present but no `work/notes/specify-review/<sub>-round-N.md`, reads the slice of `astra.yaml` + the paper + the code, and writes review findings. The iteration after that applies the fixes. Two consecutive review-iterations with verdict `clean` per sub-analysis terminates that sub-analysis's review cycle. The fresh-context-no-bias property is automatic at iteration boundaries — review iteration N doesn't see review iteration N-2's fixes any more than it would if they were sub-agent spawns. The depth is sized from the gap between CLAUDE.md's Rigor *Current state* for the sub-analysis and `constitution.md`'s Fidelity intent: *cheap* — accept after one clean review-iteration; *heavy* — require two consecutive clean.
+The iteration that wrote a sub-analysis's passes exits when its passes are done; the next iteration enters fresh, surveys, finds the sub-analysis's passes present but no `work/notes/specify-review/<sub>-round-N.md`, reads the slice of `astra.yaml` + the paper + the code, and writes review findings. The iteration after that applies the fixes. Two consecutive review-iterations with verdict `clean` per sub-analysis terminates that sub-analysis's review cycle. The fresh-context-no-bias property is automatic at iteration boundaries — review iteration N doesn't see review iteration N-2's fixes. The depth is sized from the gap between CLAUDE.md's Rigor *Current state* for the sub-analysis and `constitution.md`'s Fidelity intent: *cheap* — accept after one clean review-iteration; *heavy* — require two consecutive clean.
 
-**Optional: in-iteration fan-out.** When there are many independent sub-analyses to review and the iteration's fidelity-intent calculus calls for *heavy*, the iteration that holds the review work can fan out one-level-deep sub-agents (one per sub-analysis) inside its own session, merge findings, and apply fixes in the same iteration. This trades the natural fresh-context property between iterations for parallelism within an iteration. Reach for this only where the parallelism actually pays — most reproductions have small enough sub-analysis counts that sequential review via iteration boundaries is cleaner.
+#### Per-review-iteration prompt
 
-#### Per-review-iteration prompt (whether sequential or fan-out)
+The reviewer reads the slice fresh and writes findings only — never edits `astra.yaml` directly; that's the next iteration's job.
 
-The check-list and findings-file shape below applies whether the review work happens in a standalone iteration (default) or as a fan-out sub-agent inside an iteration (optional). Either way, the reviewer reads the slice fresh and writes findings only — never edits `astra.yaml` directly; that's the next iteration's (or the fan-out merge step's) job.
-
-> You are a SPECIFY reviewer for one sub-analysis. Read the relevant slice of `astra.yaml`, the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers (whether across iterations or across fan-out spawns); do not assume anything has already been fixed.
+> You are a SPECIFY reviewer for one sub-analysis. Read the relevant slice of `astra.yaml`, the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers across iterations; do not assume anything has already been fixed.
 >
 > ### Inputs
 >
@@ -157,7 +155,7 @@ The check-list and findings-file shape below applies whether the review work hap
 >
 > ### What NOT to do
 >
-> - **Do not edit `astra.yaml`** or any other file. Your output is a findings file; the next iteration (or the fan-out merge step) applies the fixes. Editing here defeats the fresh-context discipline that makes review work.
+> - **Do not edit `astra.yaml`** or any other file. Your output is a findings file; the next iteration applies the fixes. Editing here defeats the fresh-context discipline that makes review work.
 > - **Do not flag missing `recipes:`.** Recipes are IMPLEMENT's, not SPECIFY's.
 > - **Do not re-read the entire paper.** Use Grep on `work/reference/source/` (or `document.md`) for the specific claims you want to verify; lean on `work/reference/index.json`.
 > - **Do not invent problems.** If the sub-analysis is consistent with paper + code, say so briefly.
@@ -195,17 +193,14 @@ astra validate astra.yaml
 astra validate astra.yaml --verify-evidence  # after LITERATURE has resolved the prior_insights placeholders
 ```
 
-For in-iteration fan-out, the iteration that spawned the parallel reviewers reads each reviewer's findings file and merges fixes back into `astra.yaml` itself in the same iteration; the next iteration's survey acts as the consolidating review.
-
 #### Termination
 
-- **Cheap (sequential):** one review-iteration per sub-analysis after the passes land. Done after the next iteration applies the fixes (or immediately, if `fixes_needed` was 0).
-- **Heavy (sequential):**
+- **Cheap:** one review-iteration per sub-analysis after the passes land. Done after the next iteration applies the fixes (or immediately, if `fixes_needed` was 0).
+- **Heavy:**
   - If review-iteration N's `fixes_needed` was 0 AND review-iteration (N-1)'s was also 0 → done.
   - If review-iteration N is the first review (N=1), the next review-iteration runs unconditionally so we can compare across two fresh passes.
   - If review-iteration N produced fixes, the next iteration applies them, and the iteration after that runs the next review fresh.
   - If 5 review-iterations have happened without two consecutive clean rounds, log the unfinished tail in `open-questions.md` ("SPECIFY review for <sub-analysis-id> reached round cap; user should review during REVIEW close-out") and let the next iteration advance to LITERATURE.
-- **In-iteration fan-out (when chosen):** one iteration writes the passes; the next spawns N parallel reviewers and merges fixes in the same iteration. The iteration after that runs the next standalone review-iteration (or another fan-out, the iteration's call) for the second clean pass.
 
 When all sub-analyses' reviews terminate, SPECIFY produces the final outputs:
 
@@ -250,6 +245,6 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 - **Material disagreements** are appended to CLAUDE.md's **Paper-vs-code disagreements** section AND `open-questions.md`. CLAUDE.md is the at-a-glance summary every iteration sees; `open-questions.md` is the user-resolution accumulator. Both lead to the same place: the user resolves at REVIEW close-out.
 - **The narrative skill is the prose author, not the structure author.** SPECIFY's job is content correctness; `/narrative` invocation comes during the paper pass when authoring or extending the narrative prose to weave in anchor references.
 - **The target ledger is a derivation, not a separate phase's output.** Treat `targets/targets.md` as a small index produced alongside the filled `astra.yaml`, not a heavyweight artifact. The depth lives in `astra.yaml`'s `outputs:` / `findings:` / `decisions:`.
-- **Two-pass discipline is the cross-check.** Skipping the code pass (when code exists) loses the canonical-resolution surface and lets paper-vs-code material disagreements slip through. The fresh-context review (iteration boundary or in-iteration fan-out) can recover *some* of these but not all — the disciplined sequence (paper → code → review) catches more.
+- **Two-pass discipline is the cross-check.** Skipping the code pass (when code exists) loses the canonical-resolution surface and lets paper-vs-code material disagreements slip through. The fresh-context review can recover *some* of these but not all — the disciplined sequence (paper → code → review) catches more.
 - **Per-sub-analysis parallelism is opt-in.** When sub-analyses are independent (no shared decision blocks, no cross-sub-analysis findings), the iteration can fan out one-level-deep sub-agents (one per sub-analysis from inside its main session) to run their passes in parallel. When they share material decisions or findings (rare), serialize across iterations.
 - **Commit per sub-analysis as it lands.** Each sub-analysis's filled-in `astra.yaml` slice + its targets/implementation-notes/baseline updates earn one commit; review files commit one per review-iteration. The next iteration reads `git log` to track progress; small commits keep the trail readable.
diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index ac0fca2e..0c13eb89 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -90,9 +90,9 @@ Pointers, not snapshots.
 - **Rigor is a trajectory toward the user's intent.** Each iteration
   sizes its work from the gap between *Current state* and the Goal's
   fidelity intent — cheap (one clean review-iteration is enough) vs
-  heavy (two consecutive clean review-iterations required). Default is
-  sequential review via iteration boundaries; in-iteration fan-out is
-  the orthogonal parallelism option where it actually pays.
+  heavy (two consecutive clean review-iterations required). Review
+  happens sequentially via iteration boundaries; the fresh-context
+  property is automatic.
 - **arxiv-LaTeX-first acquisition.** PDF + Docling is the non-arxiv
   fallback only.
 - **No synthetic data.** Unless the paper itself uses synthetic data,

From 2481015e904e4c49dc6ae5ae95c6052ebe815461 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 23:08:20 +0200
Subject: [PATCH 083/124] skills: add /astra and /lc-cli reference skills; drop
 heavy session-start guide injection
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The two guide files (`astra-reference.md`, `lightcone-cli-reference.md`)
were referenced in 14 places across the bundle + docs *and* auto-injected
in full (~480 lines / ~30KB) by the session-start hook. Circular noise:
the same content arrived through every channel.

Convert each guide into a discoverable skill:

- `claude/lightcone/skills/astra/SKILL.md` — `/astra` (was
  `guides/astra-reference.md`). Comprehensive `astra.yaml` reference:
  structure, sub-analyses, decisions, options, prior insights, findings,
  evidence, narrative anchors, composition mechanics. Invoke whenever
  reading / writing / validating / debugging `astra.yaml`.
- `claude/lightcone/skills/lc-cli/SKILL.md` — `/lc-cli` (was
  `guides/lightcone-cli-reference.md`). Reference for `lc` workflow:
  commands, the Spec-Code Invariant, status interpretation, failure
  diagnosis, multiverse runs, WRROC export. Invoke whenever running,
  debugging, or diagnosing `lc` workflows.

Both files moved with `git mv` so history follows. Frontmatter added on
top. Internal cross-reference inside `lc-cli/SKILL.md` ("for spec syntax
see astra-reference.md") updated to invoke `/astra`.

Session-start hook reshape (`session-start.sh`):

- Drop the full-guide reference line.
- Add a tight primer:
  - Substrate CLIs (`lc init / run / status / verify / build / export
    wrroc`, `astra validate / paper add / universe generate`) so the
    agent has a vocabulary it can search and reach for.
  - Names of the two reference skills (`/astra`, `/lc-cli`) so Claude is
    primed to invoke them when relevant.

~15 lines instead of ~480; reliable activation instead of always-on cost.

Hyperlink updates across the 14 hit sites:

- Bundle skills (`lc-new/SKILL.md`, `lc-from-code/SKILL.md`,
  `templates/CLAUDE.md`): pointers to guide files converted to `/astra`
  / `/lc-cli` invocations, or deleted where activation alone suffices.
- Docs (`docs/index.md`, `docs/skills/index.md`,
  `docs/skills/authoring.md`, `docs/skills/lc-from-code.md`,
  `docs/skills/lc-new.md`, `docs/skills/narrative.md`,
  `docs/skills/paper-extraction.md`, `docs/user/multiverse.md`):
  hyperlinks pointed at the reference-skills section of
  `docs/skills/index.md`. New `### Reference skills (auto-primed via
  session-start)` sub-section added there; "Reference guides loaded by
  skills" table reframed as "Other plugin files" (ui-brand,
  lc-extractor, session-start.sh).

Reference skills are deliberately not mentioned in the top-level README
— researchers use the project-lifecycle skills directly; the reference
skills are infrastructure. Per Francois's call: "it needs to be very
clear for the user what they need to care about."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/scripts/session-start.sh     | 19 +++++++++-----
 .../astra/SKILL.md}                           | 14 ++++++++++
 .../lc-cli/SKILL.md}                          | 18 +++++++++++--
 claude/lightcone/skills/lc-from-code/SKILL.md | 12 +++------
 claude/lightcone/skills/lc-new/SKILL.md       |  9 ++-----
 claude/lightcone/templates/CLAUDE.md          |  7 ++---
 docs/index.md                                 |  7 ++---
 docs/skills/authoring.md                      | 14 +++++-----
 docs/skills/index.md                          | 26 ++++++++++++++-----
 docs/skills/lc-from-code.md                   |  4 +--
 docs/skills/lc-new.md                         |  2 +-
 docs/skills/narrative.md                      |  3 +--
 docs/skills/paper-extraction.md               |  3 +--
 docs/user/multiverse.md                       |  2 +-
 14 files changed, 89 insertions(+), 51 deletions(-)
 rename claude/lightcone/{guides/astra-reference.md => skills/astra/SKILL.md} (97%)
 rename claude/lightcone/{guides/lightcone-cli-reference.md => skills/lc-cli/SKILL.md} (90%)

diff --git a/claude/lightcone/scripts/session-start.sh b/claude/lightcone/scripts/session-start.sh
index 23b812ec..121f7c9f 100755
--- a/claude/lightcone/scripts/session-start.sh
+++ b/claude/lightcone/scripts/session-start.sh
@@ -1,11 +1,12 @@
 #!/bin/bash
 # SessionStart hook: surface a terse project status to the agent.
 #
-# Reports validation status, materialization counts, and pointers to the
-# canonical reference docs. Project name / decision count / universe count
-# are intentionally omitted -- they are trivia the agent reads from
-# astra.yaml and CLAUDE.md when needed, and they cost against the 10k
-# additionalContext budget.
+# Reports validation status, materialization counts, and a tight CLI
+# primer so the agent knows what substrate commands exist and which
+# reference skills carry the depth. Project name / decision count /
+# universe count are intentionally omitted -- they are trivia the agent
+# reads from astra.yaml and CLAUDE.md when needed, and they cost against
+# the 10k additionalContext budget.
 
 input=$(cat)
 cwd=$(echo "$input" | jq -r '.cwd // empty')
@@ -48,7 +49,13 @@ fi
 summary="$summary
 Materialization: ok=$ok_count stale=$stale_count missing=$missing_count alias=$alias_count
 
-References: .claude/guides/astra-reference.md (spec) and .claude/guides/lightcone-cli-reference.md (CLI)."
+Substrate CLIs (use --help on any):
+  lc init / lc run / lc status / lc verify / lc build / lc export wrroc
+  astra validate / astra paper add / astra universe generate
+
+Reference skills (invoke when the surface above isn't enough):
+  /astra   — astra.yaml spec: decisions, prior_insights, findings, evidence, sub-analyses, narrative anchors
+  /lc-cli  — lc workflow: spec-code invariant, status interpretation, failure diagnosis"
 
 if [ "$validation_ok" -ne 0 ]; then
     # tail rather than head -- the leading lines are success markers
diff --git a/claude/lightcone/guides/astra-reference.md b/claude/lightcone/skills/astra/SKILL.md
similarity index 97%
rename from claude/lightcone/guides/astra-reference.md
rename to claude/lightcone/skills/astra/SKILL.md
index 1c4ffec7..df11bb25 100644
--- a/claude/lightcone/guides/astra-reference.md
+++ b/claude/lightcone/skills/astra/SKILL.md
@@ -1,3 +1,17 @@
+---
+name: astra
+description: >
+  Comprehensive reference for the `astra.yaml` specification — top-level
+  structure, sub-analyses, inputs/outputs, decisions and options, prior
+  insights and findings, evidence and quote verification, narrative
+  anchors, and composition mechanics. Invoke whenever reading, writing,
+  validating, or debugging an `astra.yaml` spec; whenever working with
+  decisions, options, prior_insights, findings, or evidence; or whenever
+  the user asks about ASTRA schema, spec syntax, or sub-analysis
+  composition.
+allowed-tools: Read, Glob, Grep, Bash(astra:*)
+---
+
 # ASTRA Reference
 
 ## What an ASTRA Analysis Is
diff --git a/claude/lightcone/guides/lightcone-cli-reference.md b/claude/lightcone/skills/lc-cli/SKILL.md
similarity index 90%
rename from claude/lightcone/guides/lightcone-cli-reference.md
rename to claude/lightcone/skills/lc-cli/SKILL.md
index 7868ceff..6192ea2a 100644
--- a/claude/lightcone/guides/lightcone-cli-reference.md
+++ b/claude/lightcone/skills/lc-cli/SKILL.md
@@ -1,6 +1,20 @@
+---
+name: lc-cli
+description: >
+  Reference for `lc` CLI execution: commands (init/run/status/verify/build/export),
+  the Spec-Code Invariant (`astra.yaml` and code never diverge), status
+  interpretation (ok/stale/missing/alias), failure diagnosis, multiverse
+  runs, scratch overrides for HPC, sub-analysis scaffolding, publishing
+  via WRROC. Invoke whenever running, debugging, or diagnosing `lc`
+  workflows; whenever interpreting `lc status` / `lc verify` output; or
+  whenever the user asks about the development workflow surrounding
+  `astra.yaml`.
+allowed-tools: Read, Glob, Grep, Bash(lc:*), Bash(astra:*)
+---
+
 # lightcone-cli Reference
 
-Reference for lightcone-cli execution: CLI commands, development workflow, status interpretation, and failure diagnosis. For `astra.yaml` spec syntax, see `astra-reference.md`.
+Reference for lightcone-cli execution: CLI commands, development workflow, status interpretation, and failure diagnosis. For `astra.yaml` spec syntax, invoke `/astra`.
 
 ## CLI Reference
 
@@ -33,7 +47,7 @@ Sub-analyses are scaffolded by hand, since each one is just another `astra.yaml`
 2. Add a `path:` entry to the parent `astra.yaml` under `analyses:` (e.g. `analyses: { my_sub: { path: ./analyses/my_sub } }`).
 3. Add a `<name>: { universe: baseline }` entry to each existing parent universe file.
 
-Populate the sub-analysis's `astra.yaml` with inputs, outputs, and decisions. Use `from:` references to wire inputs and decisions to the parent or siblings — see `astra-reference.md` under "Composition Mechanics."
+Populate the sub-analysis's `astra.yaml` with inputs, outputs, and decisions. Use `from:` references to wire inputs and decisions to the parent or siblings — invoke `/astra` and see "Composition Mechanics" for the grammar.
 
 ## Development Workflow
 
diff --git a/claude/lightcone/skills/lc-from-code/SKILL.md b/claude/lightcone/skills/lc-from-code/SKILL.md
index 4480fffd..cd3894c1 100644
--- a/claude/lightcone/skills/lc-from-code/SKILL.md
+++ b/claude/lightcone/skills/lc-from-code/SKILL.md
@@ -14,13 +14,9 @@ This skill has two invocation contexts. The first is the user-driven default des
 
 The second is **scan-only**, used when `/lc-from-paper`'s ACQUIRE invokes this skill against a cloned reference repo at `work/reference/code/`. The invocation prompt will tell you explicitly to *do only Phase 1's scan*, write the inventory to a path it specifies (typically `work/reference/code-index.md`), and **stop** — do not touch `astra.yaml` at the project root, do not parameterize any code, do not run anything, do not modify the cloned repo. In scan-only mode, **do the inventory work inline** (using Read, Glob, Grep directly) rather than dispatching the Explore sub-agent that fresh-migration mode uses below. The scan-only branch can run nested inside another agent context (no sub-agent dispatch is safe in that case), and the inventory is bounded enough to do inline. Trust the invocation prompt's instructions over the fresh-migration defaults below; if the prompt says scan-only, the scan-only contract holds.
 
-## References
-
-- [ASTRA Reference](../../guides/astra-reference.md) -- spec structure, decision identification, recipes, universes
-
 ## Phase 1: Scan & Spec
 
-First, read the Decisions section of [ASTRA Reference](../../guides/astra-reference.md), then decide which mode applies:
+First, invoke `/astra` and read its Decisions section, then decide which mode applies:
 
 - **Fresh migration:** no meaningful `astra.yaml` exists yet. Use the code scan to draft `astra.yaml` and `universes/baseline.yaml`.
 - **Augment existing ASTRA:** `astra.yaml` already exists from a paper, user interview, or prior ASTRA work. Use the code scan to add to the current spec — recipes, dependencies, containers, code-backed decision options, baseline selections, implementation notes, and missing inputs / outputs where they naturally belong. Do not create a second `astra.yaml`, do not replace the existing structure wholesale, and surface major structure conflicts to the user before reshaping the spec.
@@ -57,7 +53,7 @@ be an analytical choice. The caller will filter down later.
 
 For reference, here are the decision criteria for classifying candidates:
 <decision-criteria>
-{paste Decisions section from astra-reference.md here}
+{paste Decisions section from `/astra` here}
 </decision-criteria>
 """)
 ```
@@ -67,9 +63,9 @@ In **scan-only** mode (invoked by `/lc-from-paper` ACQUIRE), do the same invento
 - `Glob` for `**/*.py`, `**/*.ipynb`, `**/Dockerfile`, `**/Containerfile`, `**/requirements*.txt`, `**/environment*.yml`, `**/pyproject.toml`, and any other relevant dependency / container manifests. Inventory the matches.
 - For each script and notebook, `Read` it (paginating with offset / limit for large files) to identify what it does, what it reads / writes, and any hardcoded analytical choices with `file:line` references.
 - `Grep` for repeated patterns when surveying for candidate decisions across the tree (magic numbers, common method-selector patterns, config-dict keys).
-- Apply the same decision criteria from the Decisions section of ASTRA Reference to classify candidates; the criteria are the filter regardless of whether the inventory came from an Explore sub-agent or inline reads.
+- Apply the same decision criteria from `/astra` (Decisions section) to classify candidates; the criteria are the filter regardless of whether the inventory came from an Explore sub-agent or inline reads.
 
-Either way, write the scan results to `CLAUDE.md` under `## Project Notes` (fresh migration) or to the path the invocation prompt specifies (scan-only — typically `work/reference/code-index.md`) as a script inventory, then in fresh migration mode draft or add to `astra.yaml` from the scan results following the spec structure documented in `.claude/guides/astra-reference.md`. In scan-only mode, stop after the inventory file lands; do not touch `astra.yaml`. Use the decision criteria from [ASTRA Reference](../../guides/astra-reference.md) to filter candidate decisions down to only true analytical choices — most hardcoded values are implementation details, not decisions. Use current hardcoded values as defaults.
+Either way, write the scan results to `CLAUDE.md` under `## Project Notes` (fresh migration) or to the path the invocation prompt specifies (scan-only — typically `work/reference/code-index.md`) as a script inventory, then in fresh migration mode draft or add to `astra.yaml` from the scan results following the spec structure documented in `/astra`. In scan-only mode, stop after the inventory file lands; do not touch `astra.yaml`. Use the decision criteria from `/astra` (Decisions section) to filter candidate decisions down to only true analytical choices — most hardcoded values are implementation details, not decisions. Use current hardcoded values as defaults.
 
 In augment mode, preserve the existing paper-derived or user-derived `inputs`, `outputs`, `decisions`, `findings`, and `narrative` unless the code scan shows a real conflict. Attach code evidence to the nearest existing home first. Create new ASTRA structure only when the code reveals a real analysis object that has no suitable home in the current spec.
 
diff --git a/claude/lightcone/skills/lc-new/SKILL.md b/claude/lightcone/skills/lc-new/SKILL.md
index db95c21a..c2db4c63 100644
--- a/claude/lightcone/skills/lc-new/SKILL.md
+++ b/claude/lightcone/skills/lc-new/SKILL.md
@@ -8,11 +8,6 @@ allowed-tools: Read, Write(astra.yaml), Write(universes/*), Write(CLAUDE.md), Ed
 
 Create a new ASTRA analysis project through conversation. Build the spec iteratively -- write to `astra.yaml` after each phase so the user sees progress. Literature search and decision identification happen in distinct phases -- talk first, then extract papers, then identify decisions informed by both conversation and literature.
 
-## References
-
-- [ASTRA Reference](../../guides/astra-reference.md) -- spec structure, decision identification, recipes, universes
-- [lightcone-cli Reference](../../guides/lightcone-cli-reference.md) -- `lc` workflow for the implementation phase that follows scoping
-
 ## Setup
 
 1. Read `astra.yaml` if it exists (to understand context or avoid overwriting)
@@ -40,7 +35,7 @@ Stage banner: ANALYSIS STRUCTURE
 
 > "Walk me through your analysis step by step. What goes in, what comes out at the end?"
 
-**Guidance on sub-analyses:** Analyses should only be split into multiple sub-analyses if each sub analysis genuinely has materially different inputs and outputs, and if the scope may be too broad if there is just one analysis; we overall want a sub-analysis to feel like it should genuinely be a self-contained product. For example, training + evaluation would typically be one analysis, because the product would be the trained and validated neural network estimator. When in doubt, opt for a single analysis at this stage. If it does need to be multi-stage, ask the user for confirmation and how to split it. For multi-stage analyses, make sure you confirm stage boundaries. See `.claude/guides/astra-reference.md` for YAML structure and sub-analysis guidance.
+**Guidance on sub-analyses:** Split an analysis into multiple sub-analyses only if each sub-analysis genuinely has materially different inputs and outputs and a single analysis would be too broad in scope; a sub-analysis should feel like a genuinely self-contained product. For example, training + evaluation would typically be one analysis, because the product would be the trained and validated neural network estimator. When in doubt, opt for a single analysis at this stage. If it does need to be multi-stage, ask the user to confirm the split and the stage boundaries. Invoke `/astra` for YAML structure and sub-analysis guidance.
 
 **One output per output.** Each output should be a single metric, a single plot, or a single artifact. Do not bundle multiple metrics into one output (e.g., "performance_metrics" containing accuracy, F1, and AUC). Each of those is its own output. Same for plots -- one figure per output.
 
@@ -79,7 +74,7 @@ Write extracted prior insights to astra.yaml immediately. Synthesize them by top
 
 ### Decision Identification
 
-Use the conversation and literature to identify decisions. Apply the decision criteria from [astra-reference.md](../../guides/astra-reference.md):
+Use the conversation and literature to identify decisions. Apply the decision criteria from `/astra` (Decisions section):
 
 - What could be done differently and still be defensible?
 - Where did papers disagree or compare alternatives?
diff --git a/claude/lightcone/templates/CLAUDE.md b/claude/lightcone/templates/CLAUDE.md
index 325f81e9..ecfd8428 100644
--- a/claude/lightcone/templates/CLAUDE.md
+++ b/claude/lightcone/templates/CLAUDE.md
@@ -2,10 +2,7 @@
 
 ASTRA analysis project, orchestrated by lightcone-cli.
 
-**Source of truth:**
-- `astra.yaml` — the analysis specification
-- `.claude/guides/astra-reference.md` — astra.yaml spec syntax
-- `.claude/guides/lightcone-cli-reference.md` — `lc` CLI commands, workflow, status, failures
+The single source of truth for this analysis is `astra.yaml`. Spec syntax and CLI workflow live in the `/astra` and `/lc-cli` reference skills (named in the session-start primer; invoke when you need depth).
 
 ### Quick Start
 
@@ -16,7 +13,7 @@ lc verify                 # check provenance integrity
 
 ### Keep astra.yaml and code in sync
 
-`astra.yaml` and the code must never diverge. When you change one, update the other in the same edit and run `astra validate astra.yaml`. See `lightcone-cli-reference.md` → "Spec-Code Invariant" for the full rules.
+`astra.yaml` and the code must never diverge. When you change one, update the other in the same edit and run `astra validate astra.yaml`.
 
 ---
 
diff --git a/docs/index.md b/docs/index.md
index 8c954908..e1c2b64e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -57,11 +57,12 @@ src/snakemake_executor_plugin_dask/   # Snakemake executor → dask.distributed
 
 claude/lightcone/               # Claude Code plugin (force-included into the wheel)
 ├── skills/                     # lc-new, lc-from-code, lc-from-paper,
-│                                # lc-feedback, ralph (+ bundle siblings)
+│                                # lc-feedback, ralph (+ bundle siblings);
+│                                # reference skills: astra, lc-cli
 ├── agents/                     # lc-extractor (literature subagent)
-├── guides/                     # astra-reference, lightcone-cli-reference, ui-brand
+├── guides/                     # ui-brand
 ├── templates/                  # project CLAUDE.md template
-└── scripts/                    # session hooks (bash): venv, validate-on-save, …
+└── scripts/                    # session hooks (bash): venv, validate-on-save, session-start primer
 
 tests/                          # pytest, mirrors src/
 pyproject.toml                  # hatchling + hatch-vcs; ASTRA + Snakemake as deps
diff --git a/docs/skills/authoring.md b/docs/skills/authoring.md
index d8b929fe..4e2544b9 100644
--- a/docs/skills/authoring.md
+++ b/docs/skills/authoring.md
@@ -48,17 +48,19 @@ Follow [`claude/lightcone/guides/ui-brand.md`](https://github.com/LightconeResea
 - A `## Restrictions` (or `## Hard rules`) section at the end listing
   invariants Claude must not break.
 
-## Referencing guide files
+## Referencing reference skills
 
-Guides live alongside the skills:
+Spec and CLI reference content lives in dedicated skills — `/astra` and
+`/lc-cli` — so any skill needing depth can invoke them directly:
 
 ```markdown
-Before starting, read `.claude/guides/astra-reference.md` for the
-spec, and `.claude/guides/lightcone-cli-reference.md` for the CLI.
+Invoke `/astra` and read the Decisions section before classifying
+candidate decisions, and `/lc-cli` for the Spec-Code Invariant rules.
 ```
 
-The plugin layout means these paths are stable across both bundled
-(installed-package) and dev (in-repo) modes.
+Both are named in the session-start primer so they're discoverable
+from the first turn; explicit invocation in a skill body is the right
+call when a specific section is load-bearing for that skill's work.
 
 ## Spawning subagents
 
diff --git a/docs/skills/index.md b/docs/skills/index.md
index 151c0308..8c076a97 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -41,6 +41,17 @@ dispatches them by role during the reproduction.
 
 See the [bundle README](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/README.md) for the rationale behind co-location vs plugin install.
 
+### Reference skills (auto-primed via session-start)
+
+Not direct entry points — invoked by other skills (or by Claude directly when relevant) to load reference content into the working session. The session-start hook names both in its primer, so Claude is aware they exist from the first turn.
+
+| Skill | Command | Purpose |
+|-------|---------|---------|
+| `astra` | `/astra` | Reference for the `astra.yaml` spec: structure, decisions, options, prior insights, findings, evidence, sub-analyses, narrative anchors, composition mechanics. |
+| `lc-cli` | `/lc-cli` | Reference for `lc` workflow: commands, the Spec-Code Invariant, status interpretation, failure diagnosis, multiverse runs, publishing via WRROC. |
+
+These intentionally don't appear in the top-level README — researchers use the project-lifecycle skills directly; the reference skills are infrastructure.
+
 ## How a skill is wired
 
 Each skill is a `claude/lightcone/skills/<name>/SKILL.md` file with
@@ -75,25 +86,28 @@ claude/lightcone/
 │   ├── paper-extraction/{SKILL.md, scripts/*.py}
 │   ├── narrative/{SKILL.md, references/*.md}
 │   ├── figure-comparison/{SKILL.md, scripts/*.py}
-│   └── check-sentence-by-sentence/SKILL.md
+│   ├── check-sentence-by-sentence/SKILL.md
+│   ├── astra/SKILL.md                  # reference: astra.yaml spec
+│   └── lc-cli/SKILL.md                 # reference: lc workflow
 ├── agents/lc-extractor.md             # literature subagent for /lc-new
-├── guides/                            # reference docs loaded by skills
+├── guides/ui-brand.md                 # visual formatting conventions
 ├── templates/CLAUDE.md                # the project CLAUDE.md template
-└── scripts/*.sh                       # session lifecycle hooks
+└── scripts/*.sh                       # session lifecycle hooks (incl. session-start primer)
 ```
 
 The plugin is force-included into the wheel via
 `pyproject.toml::tool.hatch.build.targets.wheel.force-include`, so
 `lc init` finds it whether you're running from source or PyPI.
 
-## Reference guides loaded by skills
+## Other plugin files
+
+The two reference *skills* (`/astra` and `/lc-cli`) live under `skills/` and are listed in the [Reference skills](#reference-skills-auto-primed-via-session-start) section above. Remaining plugin files:
 
 | File | Purpose |
 |------|---------|
-| `claude/lightcone/guides/astra-reference.md` | Full `astra.yaml` schema reference. Loaded by `lc-new` and `lc-from-code`. |
-| `claude/lightcone/guides/lightcone-cli-reference.md` | CLI commands, status interpretation, failure diagnosis. Loaded by implementation and validation workflows. |
 | `claude/lightcone/guides/ui-brand.md` | Visual formatting conventions for skill output. |
 | `claude/lightcone/agents/lc-extractor.md` | Literature extraction subagent invoked by `/lc-new`. |
+| `claude/lightcone/scripts/session-start.sh` | Session-start hook — surfaces validation + materialization status and primes Claude with the substrate CLIs and reference skill names. |
 
 ## Authoring a new skill
 
diff --git a/docs/skills/lc-from-code.md b/docs/skills/lc-from-code.md
index 52276c7a..53fcf22b 100644
--- a/docs/skills/lc-from-code.md
+++ b/docs/skills/lc-from-code.md
@@ -20,8 +20,8 @@ Agent, AskUserQuestion
 ### Phase 1 — Scan & spec
 
 The skill spawns an `Explore` subagent (Claude Code's general-purpose
-search agent) with the decision criteria from `astra-reference.md`
-inlined into the prompt. The subagent returns a structured inventory:
+search agent) with the decision criteria from `/astra` (Decisions
+section) inlined into the prompt. The subagent returns a structured inventory:
 
 - Per script/notebook: file path, what it does, files it reads & writes,
   hardcoded analytical choices (with file:line, current value, what it
diff --git a/docs/skills/lc-new.md b/docs/skills/lc-new.md
index a3de0a38..d99d0e15 100644
--- a/docs/skills/lc-new.md
+++ b/docs/skills/lc-new.md
@@ -65,5 +65,5 @@ at the end so the user has something visible to review at every step.
 
 - After `/lc-new`, ask the agent to implement the spec through the
   normal Claude Code workflow.
-- [`claude/lightcone/guides/astra-reference.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/astra-reference.md) — `astra.yaml` schema, decision criteria, prior insights / findings, universe management.
+- [`/astra`](index.md#reference-skills-auto-primed-via-session-start) — `astra.yaml` schema, decision criteria, prior insights / findings, universe management.
 - [`claude/lightcone/agents/lc-extractor.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/agents/lc-extractor.md) — the literature extraction subagent definition.
diff --git a/docs/skills/narrative.md b/docs/skills/narrative.md
index 4d659ab6..dcb6246b 100644
--- a/docs/skills/narrative.md
+++ b/docs/skills/narrative.md
@@ -104,5 +104,4 @@ Mode-specific anti-patterns live in each mode's reference under
 
 - [`/lc-from-paper`](lc-from-paper.md) — invokes `/narrative` during
   SPECIFY in paper-reproduction mode.
-- [`claude/lightcone/guides/astra-reference.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/astra-reference.md)
-  — full schema reference.
+- [`/astra`](index.md#reference-skills-auto-primed-via-session-start) — full schema reference.
diff --git a/docs/skills/paper-extraction.md b/docs/skills/paper-extraction.md
index b0b601bd..4c90d8c7 100644
--- a/docs/skills/paper-extraction.md
+++ b/docs/skills/paper-extraction.md
@@ -129,5 +129,4 @@ cached PDF — paraphrasing breaks the gate.
   during ACQUIRE for the target paper, and again from inside a ralph
   iteration for each cited paper during LITERATURE; each iteration
   reads `index.json` and the substrate directly.
-- [`claude/lightcone/guides/astra-reference.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/astra-reference.md)
-  — Insight + Evidence shape, `quote.exact` rules.
+- [`/astra`](index.md#reference-skills-auto-primed-via-session-start) — Insight + Evidence shape, `quote.exact` rules.
diff --git a/docs/user/multiverse.md b/docs/user/multiverse.md
index c98bbee7..26bed730 100644
--- a/docs/user/multiverse.md
+++ b/docs/user/multiverse.md
@@ -130,7 +130,7 @@ decisions:
 
 `requires:` means "this option is only valid when those conditions
 hold." `incompatible_with:` is the dual. The full schema is in the
-[ASTRA spec reference](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/astra-reference.md).
+[`/astra` reference skill](../skills/index.md#reference-skills-auto-primed-via-session-start).
 
 ## Sub-analyses
 

From d9c9a0aec23085e50922e1fe77c05ba62028b120 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 23:09:01 +0200
Subject: [PATCH 084/124] skills/README: add Reference skills section for
 /astra and /lc-cli

Bundle README missed the two reference skills added in the prior commit
(`/astra`, `/lc-cli`). Add a "Reference skills" section between Project
lifecycle and Paper-reproduction bundle, noting they're not direct entry
points but are primed by session-start so Claude is aware of them from
turn one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 2dec2528..bc26ab79 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -12,6 +12,15 @@ Each subdirectory is one Claude Code skill: `SKILL.md` plus optional `references
 | `lc-feedback` | Report bugs and feature requests upstream. |
 | `ralph` | Author a constitution and run a ralph loop against it (authoring + launching + iterating in one skill). `lc-from-paper` uses this for the long middle of a reproduction; standalone for any other long-running work. |
 
+## Reference skills
+
+Not direct entry points — Claude invokes these (or other skills invoke them) to load reference content into the session. The session-start hook primes their names so they're discoverable from turn one.
+
+| Skill | Role |
+|---|---|
+| `astra` | Reference for the `astra.yaml` spec: structure, decisions, options, prior insights, findings, evidence, sub-analyses, narrative anchors, composition mechanics. |
+| `lc-cli` | Reference for `lc` workflow: commands, the Spec-Code Invariant, status interpretation, failure diagnosis, multiverse runs, WRROC export. |
+
 ## Paper-reproduction bundle
 
 A self-contained toolkit for reproducing published papers in ASTRA. The bundle is co-located so a single `lc init` brings the full toolkit into a project — no plugin marketplace, no separate installs.

From f99770daec876850eaafefb2329d3c239bbf61c4 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Mon, 11 May 2026 23:56:26 +0200
Subject: [PATCH 085/124] docs + CLAUDE.md: purge stale ui-brand / guides/
 references

The ui-brand guide was deleted in f3fdc53 ("Remove lc-build, lc-verify
skills and ui-brand guide") but six surfaces still pointed at it (or
at the guides/ directory that's now empty after the astra/lc-cli
skillification):

- lightcone-cli/CLAUDE.md directory layout
- docs/index.md directory layout
- docs/skills/index.md plugin layout + "Other plugin files" table row
- docs/skills/authoring.md "Body conventions" had a dead link to ui-brand
  (kept the conventions themselves; they were inlined in this doc anyway)
- docs/user/tutorial.md "references in .claude/guides/" prose
- docs/cli/init.md .claude/ directory listing

Drop all six. Delete the now-empty claude/lightcone/guides/ directory.
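
A sweep like this one could be guarded by a small check. The sketch
below is illustrative only: `find_stale_refs` and `STALE_PATTERNS` are
hypothetical names, not part of lightcone-cli, and the patterns are an
assumption about what counts as stale after this commit.

```python
import re

# Hypothetical patterns for the references purged in this commit.
# "guides/" is deliberately broad and may also match unrelated paths.
STALE_PATTERNS = [
    re.compile(r"guides/"),
    re.compile(r"ui-brand"),
]


def find_stale_refs(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for each line matching a stale pattern.

    `files` maps a path to its text content; line numbers are 1-indexed.
    """
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in STALE_PATTERNS):
                hits.append((path, lineno, line.strip()))
    return hits
```

Feeding it the text of the six surfaces listed above should return an
empty list once the purge has landed.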

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CLAUDE.md                | 3 +--
 docs/cli/init.md         | 2 +-
 docs/index.md            | 1 -
 docs/skills/authoring.md | 2 --
 docs/skills/index.md     | 2 --
 docs/user/tutorial.md    | 4 ++--
 6 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 1bf0a8c3..1a5473d8 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -75,9 +75,8 @@ claude/lightcone/           # Claude plugin source — force-included into the w
 │                            # check-sentence-by-sentence
 │                            # (see skills/README.md for the full bundle map)
 ├── agents/                 # lc-extractor
-├── guides/                 # astra-reference, lightcone-cli-reference, ui-brand
 ├── templates/              # Project CLAUDE.md template
-└── scripts/                # Session hooks (bash): venv activation, validate-on-save, status display
+└── scripts/                # Session hooks (bash): venv activation, validate-on-save, session-start primer
 
 tests/                      # pytest — mirrors src/ structure
 pyproject.toml              # hatchling + hatch-vcs, ASTRA + Snakemake as deps
diff --git a/docs/cli/init.md b/docs/cli/init.md
index f408bbaf..27b2e58e 100644
--- a/docs/cli/init.md
+++ b/docs/cli/init.md
@@ -23,7 +23,7 @@ CLAUDE.md                     # short note pointing future agents at the project
 results/                      # placeholder; populated by `lc run`
 universes/                    # placeholder; populate via `astra universe generate -n …`
 .claude/                      # bundled Claude Code plugin
-  skills/, agents/, hooks/, scripts/, guides/, templates/
+  skills/, agents/, hooks/, scripts/, templates/
   settings.json               # the chosen permission tier
 .venv/                        # Python venv (skipped with --no-venv)
 ```
diff --git a/docs/index.md b/docs/index.md
index e1c2b64e..081a7570 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -60,7 +60,6 @@ claude/lightcone/               # Claude Code plugin (force-included into the wh
 │                                # lc-feedback, ralph (+ bundle siblings);
 │                                # reference skills: astra, lc-cli
 ├── agents/                     # lc-extractor (literature subagent)
-├── guides/                     # ui-brand
 ├── templates/                  # project CLAUDE.md template
 └── scripts/                    # session hooks (bash): venv, validate-on-save, session-start primer
 
diff --git a/docs/skills/authoring.md b/docs/skills/authoring.md
index 4e2544b9..545a6a0f 100644
--- a/docs/skills/authoring.md
+++ b/docs/skills/authoring.md
@@ -38,8 +38,6 @@ argument-hint: "[OPTIONAL ARG] [--flag VALUE]"
 
 ## Body conventions
 
-Follow [`claude/lightcone/guides/ui-brand.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/guides/ui-brand.md):
-
 - `##` for phase headings; lead with a "Stage banner" line that the
   skill prints to the chat.
 - `✓ / ○ / ✗` for status; never emojis except inside the agent's own
diff --git a/docs/skills/index.md b/docs/skills/index.md
index 8c076a97..32597550 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -90,7 +90,6 @@ claude/lightcone/
 │   ├── astra/SKILL.md                  # reference: astra.yaml spec
 │   └── lc-cli/SKILL.md                 # reference: lc workflow
 ├── agents/lc-extractor.md             # literature subagent for /lc-new
-├── guides/ui-brand.md                 # visual formatting conventions
 ├── templates/CLAUDE.md                # the project CLAUDE.md template
 └── scripts/*.sh                       # session lifecycle hooks (incl. session-start primer)
 ```
@@ -105,7 +104,6 @@ The two reference *skills* (`/astra` and `/lc-cli`) live under `skills/` and are
 
 | File | Purpose |
 |------|---------|
-| `claude/lightcone/guides/ui-brand.md` | Visual formatting conventions for skill output. |
 | `claude/lightcone/agents/lc-extractor.md` | Literature extraction subagent invoked by `/lc-new`. |
 | `claude/lightcone/scripts/session-start.sh` | Session-start hook — surfaces validation + materialization status and primes Claude with the substrate CLIs and reference skill names. |
 
diff --git a/docs/user/tutorial.md b/docs/user/tutorial.md
index 29c1667d..8745f972 100644
--- a/docs/user/tutorial.md
+++ b/docs/user/tutorial.md
@@ -111,8 +111,8 @@ Implement this analysis from astra.yaml. Write the scripts, run the baseline uni
 ```
 
 The agent reads everything (spec, universe file, empty `scripts/` dir,
-and the references in `.claude/guides/`) and makes an implementation
-checklist. It might look like this:
+plus the `/astra` and `/lc-cli` reference skills primed at session
+start) and makes an implementation checklist. It might look like this:
 
 ```
 1. Add Python deps (scikit-learn, matplotlib) to requirements.txt

From ba929df477dd393289271bccc7cfecb4df365d47 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Tue, 12 May 2026 00:15:52 +0200
Subject: [PATCH 086/124] astra: add Options subsection + flag
 Insight.created_at as required
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Drift cleanup against astra-spec v0.0.10. The hand-written /astra
reference was largely in sync but had a few schema fields underemphasized
or buried.

- Add **Options** subsection under Decisions, documenting:
  - `label:` (required) — short rendering name
  - `description:` (optional) — longer prose, was completely missing
  - `insights:` (optional) — promote the prior_insights back-reference
    here from line 227 (where it was buried in the Prior Insights section
    as an inline aside)
  - `excluded:` + `excluded_reason:` pointer to Constraints
- State the Decision.label requirement explicitly: `label:` and `options:`
  are required unless the decision is aliased via `from:`. The schema
  enforces this (analysis.yaml:570-582) but the skill only showed it by
  example.
- Update the Insight model summary (Prior Insights section) to flag
  `created_at` as required and ISO 8601 datetime — schema marks it
  required: true (insight.yaml:110-113), the skill listed it among other
  fields without distinction.
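
The rules above can be sketched as plain checks. This is not the
astra-spec validator: `check_insight` and `check_decision` are
hypothetical helpers that only mirror the requirements described in
this message, under the assumption that the spec is already loaded as
Python dicts.

```python
from datetime import datetime

# Required Insight fields as described above; everything else is optional.
INSIGHT_REQUIRED = ("id", "claim", "created_at", "evidence")


def check_insight(insight: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = [
        f"missing required field: {key}"
        for key in INSIGHT_REQUIRED
        if key not in insight
    ]
    created = insight.get("created_at")
    if created is not None:
        try:
            # Accepts ISO 8601 datetimes such as "2025-02-01T14:00:00".
            datetime.fromisoformat(created)
        except (TypeError, ValueError):
            problems.append(f"created_at is not an ISO 8601 datetime: {created!r}")
    return problems


def check_decision(decision: dict) -> list[str]:
    """Aliased decisions (`from:`) inherit label/options; others declare both."""
    if "from" in decision:
        return []
    problems = [
        f"missing required field: {key}"
        for key in ("label", "options")
        if key not in decision
    ]
    for key, option in decision.get("options", {}).items():
        if "label" not in option:
            problems.append(f"option {key!r} is missing label")
    return problems
```

The real enforcement lives in the schema files cited above; the sketch
only shows which fields a reader should expect to be mandatory.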

Skipped per "not fluff": Evidence.version DOI-only clarity (already
adequate), Insight.derived/scope pedagogy (examples cover it),
reserved-ID pattern technicality (already documented as reserved word
list). Net +12 lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/astra/SKILL.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/astra/SKILL.md b/claude/lightcone/skills/astra/SKILL.md
index df11bb25..e46a044e 100644
--- a/claude/lightcone/skills/astra/SKILL.md
+++ b/claude/lightcone/skills/astra/SKILL.md
@@ -114,6 +114,17 @@ A decision is a methodological choice where a different defensible option could
 
 Decisions may carry an optional `tags:` list for grouping (e.g. `[preprocessing]`, `[physics]`, `[stats]`). Keep the tag vocabulary **small and consolidated** -- reuse existing tags rather than minting new ones, since tags are mostly useful for cross-cutting views over a shared decision space, and that view fragments quickly when every decision invents its own label.
 
+### Options
+
+Each decision must have at least one option. Options are `key: { ... }` entries:
+
+- `label:` (required) -- short human-readable name for compact rendering.
+- `description:` (optional) -- longer prose explaining what the option means.
+- `insights:` (optional) -- list of `prior_insights:` IDs that justify this option; back-references the supporting evidence (see [Prior Insights and Findings](#prior-insights-and-findings)).
+- `excluded:` + `excluded_reason:` -- option considered but rejected. See [Constraints](#constraints).
+
+`label:` and `options:` are required on the decision itself. An aliased decision (one that points at another via `from: ../decisions.foo` -- see [Composition Mechanics](#composition-mechanics)) inherits both from its source and doesn't redeclare them.
+
 ### Parameterization
 
 **Every decision must be parameterized in code** -- never hardcode a decision value. The recipe's `command:` template references it via `{decisions.<id>}` (see [Command Template Substitution](#command-template-substitution)).
@@ -187,7 +198,7 @@ Two kinds of insight, distinguished by direction:
 - **Prior insights** (`prior_insights:`) — knowledge from outside the analysis that informs decisions. From literature (by DOI) or artifacts from a prior/parent analysis.
 - **Findings** (`findings:`) — conclusions from the analysis itself, backed by its own output artifacts.
 
-Both use the same Insight model: `id`, `label` (optional), `claim`, `created_at`, `evidence`, plus optional `derived` (true if synthesized/inferred from multiple sources), `scope` (applicability conditions), `tags`, `notes`. Placement determines direction.
+Both use the same Insight model. Required: `id`, `claim`, `created_at` (ISO 8601 datetime — e.g. `"2025-02-01T14:00:00"`), `evidence`. Optional: `label`, `derived` (true if synthesized/inferred from multiple sources), `scope` (applicability conditions), `tags`, `notes`. Placement determines direction.
 
 Each evidence item has its own fields: `id`, exactly one of `doi` (literature) or `artifact` (output ID), and either a `quote` (TextQuoteSelector with required `exact`, optional `prefix`/`suffix`) or `location` (FragmentSelector with `value` like `"page=6"` and/or 1-indexed `page`). DOI evidence may add `version` (arXiv version). Artifact evidence may add `snapshot` (path to an immutable artifact copy) and `source_commit` (git commit that produced it).
 

From 0fa6295fa4a3ef8aa7a53254034bb95102623968 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Tue, 12 May 2026 01:27:21 +0200
Subject: [PATCH 087/124] eval/harness: drop dead .claude/guides refs from
 default loop prompt
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`.claude/guides/lightcone-cli-reference.md` and `astra-reference.md` were
deleted in ba929df ("astra: add Options subsection…") when the heavy guides
got converted to the `/astra` and `/lc-cli` reference skills. The eval
harness's default loop prompt still pointed at the dead paths.

Rewrites the prompt to invoke the skills by slash-command name, matching
how the session-start hook already primes Claude with these references.

Caught during the plugin repackage sweep — orthogonal to that work, but
lives in the same eval module that the install-path search touched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 src/lightcone/eval/harness.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/lightcone/eval/harness.py b/src/lightcone/eval/harness.py
index e0343f5e..a9f1b58a 100644
--- a/src/lightcone/eval/harness.py
+++ b/src/lightcone/eval/harness.py
@@ -31,9 +31,9 @@
 DEFAULT_LOOP_PROMPT = """\
 Build the analysis specified in `astra.yaml` for universe `{{UNIVERSE}}`.
 
-Read `.claude/guides/lightcone-cli-reference.md` for the workflow and \
-`.claude/guides/astra-reference.md` for spec syntax. Then for each output \
-that needs materializing:
+Invoke the `/lc-cli` skill for the lc workflow (spec-code invariant, status \
+interpretation, failure diagnosis) and `/astra` for spec syntax (decisions, \
+inputs/outputs, sub-analyses). Then for each output that needs materializing:
 
 1. Read the recipe's `command` to see what script and arguments it expects.
 2. Write the script under `src/`, parameterizing every decision via argparse \

From e310b9a35d664a92074f400fcb3d98f25cc4b7ad Mon Sep 17 00:00:00 2001
From: Nolan Koblischke <nolan.koblischke@mail.utoronto.ca>
Date: Wed, 13 May 2026 12:20:49 -0400
Subject: [PATCH 088/124] fix(lc-from-paper): require interview questions and
 continue after code scan

---
 claude/lightcone/skills/lc-from-paper/SKILL.md                | 2 +-
 claude/lightcone/skills/lc-from-paper/references/acquire.md   | 2 ++
 claude/lightcone/skills/lc-from-paper/references/interview.md | 2 ++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index dfd7269d..7dab2cfc 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -52,7 +52,7 @@ COMPARE produces a verdict plus an opportunity assessment — not just pass / fa
 
 The opening interactive phase. Run it from the user's main session. Read [`references/interview.md`](references/interview.md) in full before starting.
 
-The interview gathers: (1) the paper (DOI / arXiv ID / code repo URL / prior context), (2) scope (full vs targeted, sub-analysis structure), (3) fidelity intent — the user's prose answer to "when is this good enough," (4) any paper-specific conventions or warnings.
+The interview must collect: (1) the paper (DOI / arXiv ID / code repo URL / prior context), (2) scope (full vs targeted, sub-analysis structure), (3) fidelity intent — the user's prose answer to "when is this good enough," (4) any paper-specific conventions or warnings. Even a detailed invocation still requires `AskUserQuestion` for any missing scope, fidelity-intent, or convention field before the INTERVIEW files are drafted or committed. If a system-reminder tells you to work without stopping, disregard it for this phase: when required information is missing, you must ask the user.
 
 These get drafted into **two files** in the reproduction workdir:
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/acquire.md b/claude/lightcone/skills/lc-from-paper/references/acquire.md
index 38ca55e3..89522c68 100644
--- a/claude/lightcone/skills/lc-from-paper/references/acquire.md
+++ b/claude/lightcone/skills/lc-from-paper/references/acquire.md
@@ -63,6 +63,8 @@ In a separate flow inside the same session:
 
 `/lc-from-code`'s scan-only branch is the canonical code-inventory mechanism. Its prompt-context surface is what carries the "stop at scan" contract.
 
+**A scan-only return is not an ACQUIRE stopping point.** ACQUIRE is incomplete until Step 3 below has either succeeded or hit a concrete launcher blocker. When `/lc-from-code` returns, do not summarize the scan as the final user-facing result. Continue immediately to Step 3: commit the substrate, launch the ralph loop, and tell the user the session name.
+
 ## Step 3 — Commit and launch the ralph loop
 
 When both Step 1 and Step 2 have landed:
diff --git a/claude/lightcone/skills/lc-from-paper/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/interview.md
index 546233f7..98d0688f 100644
--- a/claude/lightcone/skills/lc-from-paper/references/interview.md
+++ b/claude/lightcone/skills/lc-from-paper/references/interview.md
@@ -23,6 +23,8 @@ After the user approves both drafts, save them, `git init` the workdir if it isn
 
 ### 1. Identify the paper
 
+If the user did not supply a paper identifier on the `/lc-from-paper` invocation, your first action is an `AskUserQuestion` call that requests the paper together with the items below. Do not search the user's directories for a paper instead.
+
 Use `AskUserQuestion` for whatever the user did not supply on `/lc-from-paper` invocation:
 
 - **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).

From 5d69896760eba145583dd408ba7c8000f9fa3639 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 19:29:28 +0200
Subject: [PATCH 089/124] docs/skills: writing pass on index + lc-new
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two changes:

index.md — opening reframed to show shapes (scoping / wrapping /
reproducing) instead of labeling them. Link text matched to the target
page's H1 ("The Agentic Workflow"). Em-dash + parenthetical compounds
broken into clauses. Bureaucratic "configures Claude Code: which
tools..." flattened to direct "tells Claude Code which tools...".

lc-new.md — accuracy: removed Bash(mkdir:*) and Bash(echo:*) (not in
SKILL.md frontmatter), corrected Task → Agent (the SKILL uses Agent).
Writing: phase list openers from em-dash to period; "with no code
written" defensive tail flipped to "implementation comes later — /lc-new
writes spec, not code"; "Files it may touch" → "Touchable files".
---
 docs/skills/index.md  | 29 +++++++++--------
 docs/skills/lc-new.md | 75 ++++++++++++++++++++++---------------------
 2 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/docs/skills/index.md b/docs/skills/index.md
index 797dd693..ba960dbb 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -1,18 +1,19 @@
 # Skills
 
 Skills are Claude Code slash commands bundled in the lightcone-cli
-plugin. They give the agent a structured, phase-by-phase workflow for
-the most common research operations.
+plugin. Each shapes the agent's workflow around a recurring research
+operation: scoping an analysis, wrapping existing code, reproducing
+a paper.
 
-If you're a researcher trying to *use* these, the
-[Claude Code Workflow](../user/agent-workflow.md) page in the user
-guide is the friendly version. This page is for maintainers.
+If you want to *use* these, start with
+[The Agentic Workflow](../user/agent-workflow.md) in the user guide.
+This page is for maintainers.
 
 ## Available skills
 
-The `/lc-from-*` family is parallel by what you start from: a question,
+The `/lc-from-*` family is parallel in what you start from: a question,
 code, or a paper. `/lc-from-paper` is the entry point of a six-skill
-paper-reproduction bundle; the five bundle siblings stand alone and are
+paper-reproduction bundle; the five siblings stand alone and are
 user-invokable directly.
 
 ### Project lifecycle
@@ -43,14 +44,14 @@ See the [bundle README](https://github.com/LightconeResearch/lightcone-cli/blob/
 
 ### Reference skills (auto-primed via session-start)
 
-Not direct entry points — invoked by other skills (or by Claude directly when relevant) to load reference content into the working session. The session-start hook names both in its primer, so Claude is aware they exist from the first turn.
+Not entry points. Other skills invoke them — or Claude does, when a deeper reference would help — to load reference content into the working session. The session-start hook names both in its primer, so Claude knows they exist from the first turn.
 
 | Skill | Command | Purpose |
 |-------|---------|---------|
 | `astra` | `/astra` | Reference for the `astra.yaml` spec: structure, decisions, options, prior insights, findings, evidence, sub-analyses, narrative anchors, composition mechanics. |
 | `lc-cli` | `/lc-cli` | Reference for `lc` workflow: commands, the Spec-Code Invariant, status interpretation, failure diagnosis, multiverse runs, publishing via WRROC. |
 
-These intentionally don't appear in the top-level README — researchers use the project-lifecycle skills directly; the reference skills are infrastructure.
+These intentionally stay out of the top-level README. Researchers use the project-lifecycle skills directly; the reference skills are infrastructure.
 
 ## How a skill is wired
 
@@ -67,11 +68,11 @@ argument-hint: "[DESCRIPTION]"
 ---
 ```
 
-The frontmatter configures Claude Code: which tools the skill may
-invoke, and what the slash command's argument hint looks like. The
-body is the prompt — phase definitions, rules, references to guide
-files, anti-patterns. The skill bundles its own helper scripts under
-`scripts/` and its loop prompt template under `assets/` when relevant.
+The frontmatter tells Claude Code which tools the skill may invoke
+and what the slash command's argument hint looks like. The body is the
+prompt itself: phase definitions, rules, pointers to reference files,
+anti-patterns. Skills bundle their own helper scripts under `scripts/`
+and longer prompt fragments under `assets/` when relevant.
 
 ## Plugin layout
 
diff --git a/docs/skills/lc-new.md b/docs/skills/lc-new.md
index c030c779..038caace 100644
--- a/docs/skills/lc-new.md
+++ b/docs/skills/lc-new.md
@@ -1,8 +1,9 @@
 # /lc-new
 
-Scope a new ASTRA analysis from a research question through conversation.
-Produces a complete `astra.yaml` (and optionally a literature evidence
-trail) with no code written.
+Scope a new ASTRA analysis from a research question, through
+conversation. The output is a complete `astra.yaml` and (optionally) a
+literature evidence trail. Implementation comes later — `/lc-new`
+writes spec, not code.
 
 Source: [`claude/lightcone/skills/lc-new/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-new/SKILL.md).
 
@@ -11,55 +12,55 @@ Source: [`claude/lightcone/skills/lc-new/SKILL.md`](https://github.com/Lightcone
 ```text
 Read, Write(astra.yaml), Write(universes/*), Write(CLAUDE.md),
 Edit(astra.yaml), Edit(universes/*), Edit(CLAUDE.md),
-Glob, Grep, Bash(astra:*), Bash(lc:*), Bash(mkdir:*), Bash(echo:*),
-WebSearch, WebFetch, AskUserQuestion, Task
+Glob, Grep, Bash(astra:*), Bash(lc:*),
+WebSearch, WebFetch, AskUserQuestion, Agent
 ```
 
-The skill is locked to spec-only writes — it cannot write Python, R, or
-arbitrary files. The lc-extractor subagent is invoked via `Task`.
+Writes are locked to the spec surface — no Python, no R, no arbitrary
+files. The `lc-extractor` subagent is dispatched via `Agent`.
 
 ## Phases
 
-1. **Research question** — sharpen the question, write `version`, `name`,
-   `description` to `astra.yaml` immediately so the user sees progress.
-2. **Analysis structure** — walk through inputs, outputs, sub-analyses.
-   One output per output: a single metric, a single plot, a single
-   artifact. Updates `astra.yaml` with `inputs:` and `outputs:`.
-3. **Deep dive** (per section) — optional literature pass. Collect paper
-   candidates; for each approved paper, spawn one `lc-extractor`
-   subagent (parallel, via `Task`). Each subagent reads the PDF, pulls
-   verbatim quotes, runs `astra paper verify-quotes` to machine-verify
-   the quotes against the source, and returns extracted prior insights.
-   Then identify decisions informed by the conversation + literature
-   and write them to `astra.yaml`.
-4. **Finalize** — `astra validate astra.yaml`, `astra validate
-   --verify-evidence` if quotes exist, `astra universe generate -n
-   baseline`, populate the `narrative:` block in `astra.yaml` (`summary`,
-   `methods`, `inputs`, `outputs` — `findings` stays TODO until results
-   exist), then populate the `## Working Notes` section of `CLAUDE.md`
-   with conversational context not captured in the spec.
+1. **Research question.** Sharpen the question, then write `version`,
+   `name`, and `description` to `astra.yaml` so the user has something
+   visible to react to from the first turn.
+2. **Analysis structure.** Walk through inputs, outputs, and any
+   sub-analyses. Each output is a single thing: one metric, one plot, one
+   artifact. `inputs:` and `outputs:` land in `astra.yaml` as they
+   crystallize.
+3. **Deep dive (per section).** An optional literature pass. Collect
+   paper candidates with the user; for each approved paper, dispatch
+   one `lc-extractor` subagent in parallel. Each subagent reads the
+   PDF, pulls verbatim quotes, runs `astra paper verify-quotes` to
+   machine-verify the quotes against the source, and returns prior
+   insights. Decisions then fall out of the conversation and the
+   literature together.
+4. **Finalize.** `astra validate astra.yaml`; `astra validate
+   --verify-evidence` if quotes exist; `astra universe generate -n
+   baseline`. Populate the `narrative:` block (`summary`, `methods`,
+   `inputs`, `outputs` — `findings` stays TODO until results exist),
+   then fill the `## Working Notes` section of `CLAUDE.md` with
+   conversational context the spec doesn't carry.
 
-The skill writes to `astra.yaml` after each phase rather than in bulk
-at the end so the user has something visible to review at every step.
+Writes happen at the end of each phase, not in bulk — the user always
+has something visible to review.
 
 ## Hard restrictions (from the SKILL.md)
 
-
-- Specification agent only — cannot write Python, R, or other
-  implementation code.
-- Files it may touch: `astra.yaml`, `universes/*.yaml`, `CLAUDE.md`
+- Specification agent only. No Python, no R, no implementation code.
+- Touchable files: `astra.yaml`, `universes/*.yaml`, and `CLAUDE.md`
   (Finalize only).
-- Never fabricates quotes — all evidence must pass
+- Quotes are never fabricated; every evidence entry must pass
   `astra validate --verify-evidence`.
-- PDFs are read by lc-extractor subagents only; the main agent never
-  pulls a PDF into its own context.
+- PDFs stay inside `lc-extractor` subagents — the main agent never
+  pulls one into its own context.
 
 ## Anti-patterns called out in the prompt
 
 - Bulk-writing decisions at the end instead of after each crystallizes.
-- Accepting vague goals like "analyze this data" without sharpening.
-- Method-only decisions; the prompt actively probes for data
-  exclusion, variable operationalization, inference criteria.
+- Letting vague goals like "analyze this data" pass without sharpening.
+- Method-only decisions. The prompt actively probes data exclusion,
+  variable operationalization, and inference criteria.
 - Reading PDFs in the main agent context.
 - Skipping `astra validate --verify-evidence`.
 

From d1d76451ed28c6f566f6ef5e9cd885b64dde326c Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 19:29:43 +0200
Subject: [PATCH 090/124] lc init: scaffolded CLAUDE.md surfaces the three
 entry skills
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Previously the scaffolded CLAUDE.md only mentioned lc run/status/verify
— substrate, not on-ramps. A user opening claude in a fresh project
lost the entry-skill hints that lc init had just printed on the
terminal. Now CLAUDE.md names /lc-new, /lc-from-code, /lc-from-paper
with one-line use cases, so the first thing Claude reads about the
project tells it this is freshly scaffolded and which skill to reach
for.
---
 src/lightcone/cli/commands.py | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/lightcone/cli/commands.py b/src/lightcone/cli/commands.py
index 9987836e..c1870711 100644
--- a/src/lightcone/cli/commands.py
+++ b/src/lightcone/cli/commands.py
@@ -339,9 +339,17 @@ def init(
 
 _PROJECT_CLAUDE_MD = """# Project Notes for Claude
 
-This is an ASTRA project orchestrated by `lightcone-cli`.
+This is an ASTRA project orchestrated by `lightcone-cli`. It was just
+scaffolded by `lc init` and has not been scoped yet — `astra.yaml` holds
+the placeholder example, not real science.
 
-To materialize outputs declared in `astra.yaml`:
+The three entry skills cover the common starting points:
+
+- `/lc-new` — scope from a research question (empty `astra.yaml`).
+- `/lc-from-code` — wrap an existing codebase in ASTRA.
+- `/lc-from-paper` — reproduce a published paper end-to-end.
+
+Once scoped, the `lc` CLI keeps the substrate in sync:
 
 ```
 lc run                    # all outputs in the default universe

From 2b4a12005b424f88d322a8350cddf102b3e3641b Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 19:31:18 +0200
Subject: [PATCH 091/124] docs/skills/lc-from-code: writing pass + output-path
 accuracy
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Accuracy: outputs were described as files
(`results/{universe}/{output_id}.ext`), but each output is in fact a
directory (`results/{universe}/{output_id}/`). The recipe receives
`{output}` as the directory; scripts write artifacts inside it.
Reworded the convention bullet to match SKILL.md.
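
A minimal sketch of a script that follows the directory convention,
assuming the recipe passes the resolved `{output}` directory through a
flag. The flag name `--output` and the artifact name `metric.json` are
hypothetical; `--outlier_sigma` is the decision-ID example from the
skill's conventions.

```python
import argparse
import json
from pathlib import Path


def main(argv=None) -> Path:
    parser = argparse.ArgumentParser()
    # Decision IDs keep their underscores on the command line.
    parser.add_argument("--outlier_sigma", type=float, required=True)
    # Hypothetical flag name; the value is a directory, not a file path.
    parser.add_argument("--output", type=Path, required=True)
    args = parser.parse_args(argv)

    # The output is a directory: create it, then write artifacts inside it.
    args.output.mkdir(parents=True, exist_ok=True)
    artifact = args.output / "metric.json"  # hypothetical artifact name
    artifact.write_text(json.dumps({"outlier_sigma": args.outlier_sigma}))
    return artifact
```

A real script would call `main()` from its entry-point guard; the
function form just keeps the sketch easy to exercise.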

Writing: "Bring an existing project into ASTRA / lightcone-cli" → "Wrap
an existing codebase in ASTRA" — punchier verb, drops the slash
construction. "Should be minimal" → "are minimal parameter plumbing"
(removes the softener). Phase 1 collapsed a trailing single-sentence
paragraph into the main clause; em-dash bullet openers replaced with
bolded titles + periods; "behavior twice" in the Hard rules tightened.
---
 docs/skills/lc-from-code.md | 95 ++++++++++++++++++-------------------
 1 file changed, 47 insertions(+), 48 deletions(-)

diff --git a/docs/skills/lc-from-code.md b/docs/skills/lc-from-code.md
index 4f1cdafb..9745e078 100644
--- a/docs/skills/lc-from-code.md
+++ b/docs/skills/lc-from-code.md
@@ -1,9 +1,9 @@
 # /lc-from-code
 
-Bring an existing project into ASTRA / lightcone-cli, starting from the
-code. Scans the codebase, generates `astra.yaml`, parameterizes hardcoded
-analytical choices, and runs until outputs materialize. Existing logic
-stays intact — changes should be minimal.
+Wrap an existing codebase in ASTRA. The skill scans the project, drafts
+`astra.yaml` against what the code already does, parameterizes its
+hardcoded analytical choices, and runs until outputs materialize.
+Existing logic stays intact; the edits are minimal parameter plumbing.
 
 Source: [`claude/lightcone/skills/lc-from-code/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-from-code/SKILL.md).
 
@@ -20,63 +20,62 @@ Agent, AskUserQuestion
 ### Phase 1 — Scan & spec
 
 The skill spawns an `Explore` subagent (Claude Code's general-purpose
-search agent) with the decision criteria from `/astra` (Decisions
-section) inlined into the prompt. The subagent returns a structured inventory:
-
-- Per script/notebook: file path, what it does, files it reads & writes,
-  hardcoded analytical choices (with file:line, current value, what it
-  controls), how it's invoked.
-- Project-level: dependency files, data files, existing container
-  setup.
-
-The main agent filters the candidate decisions down to true analytical
-choices (most hardcoded values are implementation details, not
-decisions), drafts `astra.yaml` with `recipe:` blocks pointing at the
-existing scripts, and generates `universes/baseline.yaml` with all
-defaults matching the current hardcoded values — so the first run
-reproduces existing behavior. Spec is then validated with
-`astra validate astra.yaml`.
-
-The user is asked to review before Phase 2.
+search agent) with `/astra`'s Decisions criteria inlined into the
+prompt. The subagent returns a structured inventory:
+
+- **Per script/notebook**: file path, what it does, files it reads
+  and writes, hardcoded analytical choices (with `file:line`, current
+  value, what it controls), how it's invoked.
+- **Project-level**: dependency files, data files, any existing
+  container setup.
+
+The main agent keeps only the genuinely analytical choices (most
+hardcoded values are implementation details), drafts `astra.yaml` with
+`recipe:` blocks pointing at the existing scripts, and generates
+`universes/baseline.yaml` with defaults matching the current hardcoded
+values — so the first run reproduces existing behavior. `astra validate
+astra.yaml` then checks the spec, and the user reviews before Phase 2.
 
 ### Phase 2 — Implement (parameterize)
 
-The skill picks an approach per script type:
+The approach depends on the shape of each script:
 
-- **Script with hardcoded values** — add (or extend) argparse, replace
-  hardcoded values with parsed args.
-- **Notebook** — move the `.ipynb` to `notebooks/` (preserved as
-  reference), create a `.py` script that does the parameterized
-  version. The recipe points at the new script.
-- **Config-file-driven project** — write a thin wrapper script that
-  accepts ASTRA decision args, writes the config, then calls the
-  original entry point. The user's config-driven code stays untouched.
+- **Script with hardcoded values.** Add or extend `argparse`; replace
+  the hardcoded values with parsed args.
+- **Notebook.** Move the `.ipynb` to `notebooks/` (kept as reference)
+  and create a `.py` script that does the parameterized version. The
+  recipe points at the new script.
+- **Config-file-driven project.** Write a thin wrapper that accepts
+  ASTRA decision args, writes the config, and calls the original
+  entry point. The user's config-driven code stays untouched.
 
 Hard conventions enforced by the prompt:
 
-- Decision IDs use underscores in `astra.yaml` (`outlier_sigma`).
-  lightcone-cli passes `--outlier_sigma`. Argument parsing must match.
-- Output paths follow `results/{universe}/{output_id}.ext` (the
-  per-output convention).
-- Don't refactor, restructure, or "improve" existing code — only
-  parameter plumbing.
+- Decision IDs use underscores (`outlier_sigma`), and lightcone-cli
+  passes them as `--outlier_sigma`. Argument parsing must match.
+- Each output is a *directory*, `results/{universe}/{output_id}/`. The
+  recipe receives `{output}` as that directory; scripts write artifacts
+  inside it (`{output}/data.parquet`).
+- Don't refactor, restructure, or "improve" existing code — parameter
+  plumbing only.
 
 ### Phase 3 — Run & debug
 
-`lc run --universe baseline`. Iterate fixes until `lc status` shows all
-outputs `ok`. If the scan turned up existing results elsewhere in the
-project, compare them against the new `results/baseline/` to verify
-the migration preserved behavior. Then `astra validate astra.yaml` and
-present the summary.
+Run `lc run --universe baseline`, then iterate fixes until `lc status`
+shows every output `ok`. If the scan surfaced existing results
+elsewhere in the project, compare them against the new
+`results/baseline/<output_id>/` to confirm the migration preserved
+behavior. Re-validate with `astra validate astra.yaml` and present
+the summary.
 
 ## Hard rules
 
-- Minimal changes — no refactor, rename, reorganize.
-- Never guess — read every script before claiming what it does.
-- Filter decisions aggressively — most hardcoded values are
-  implementation details.
-- Preserve behavior — the baseline universe with default values must
-  reproduce the original behavior exactly.
+- **Minimal changes.** No refactor, no rename, no reorganize.
+- **Never guess.** Read every script before claiming what it does.
+- **Filter decisions aggressively.** Most hardcoded values are
+  implementation details, not decisions.
+- **Preserve behavior.** The baseline universe, with default values,
+  must reproduce the original exactly.
 
 ## Related
 

From 8f180c5050809b413cc141dd03236028bd3f5ead Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 19:59:12 +0200
Subject: [PATCH 092/124] =?UTF-8?q?docs/skills/lc-from-code:=20lede=20verb?=
 =?UTF-8?q?=20"Wrap"=20=E2=86=92=20"Import"?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

"Wrap" had an air of superficiality — implies just building a shell
around the code. "Import" matches what the skill actually does: bring
something into a different system, with real structural changes
(argparse, parameterization, spec generation) even though existing
logic stays intact.
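
A minimal sketch of the parameter plumbing meant here, not the skill's
actual output. The `--outlier_sigma` flag follows the underscore
convention from the doc. Everything else is an assumption made up for
illustration: the `--output` flag (how the output directory reaches the
script is not specified here), the `clip_outliers` helper, the sample
values, the default of 3.0, and the `data.csv` artifact name.

```python
import argparse
import csv
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Parameterized version of an existing script")
    # Decision ID as written in astra.yaml: underscores, not hyphens.
    # The default stands in for the value that used to be hardcoded (assumed 3.0).
    parser.add_argument("--outlier_sigma", type=float, default=3.0)
    # Hypothetical flag: the directory the recipe hands over as {output}.
    parser.add_argument("--output", type=Path, required=True)
    return parser


def clip_outliers(values, sigma):
    """Keep values within `sigma` population standard deviations of the mean."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [v for v in values if abs(v - mean) <= sigma * std]


def main(argv=None) -> Path:
    args = build_parser().parse_args(argv)
    # The output is a directory; artifacts are written inside it.
    args.output.mkdir(parents=True, exist_ok=True)
    kept = clip_outliers([1.0, 1.1, 0.9, 1.0, 25.0], args.outlier_sigma)
    artifact = args.output / "data.csv"
    with artifact.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["value"])
        writer.writerows([v] for v in kept)
    return artifact
```

The existing analysis logic (`clip_outliers` in this sketch) is left as
it was; only the argument parsing and the output location are new. A
real script would call `main()` under the usual `__main__` guard.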
---
 docs/skills/lc-from-code.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/skills/lc-from-code.md b/docs/skills/lc-from-code.md
index 9745e078..f0e2a31f 100644
--- a/docs/skills/lc-from-code.md
+++ b/docs/skills/lc-from-code.md
@@ -1,8 +1,8 @@
 # /lc-from-code
 
-Wrap an existing codebase in ASTRA. The skill scans the project, drafts
-`astra.yaml` against what the code already does, parameterizes its
-hardcoded analytical choices, and runs until outputs materialize.
+Import an existing codebase into ASTRA. The skill scans the project,
+drafts `astra.yaml` against what the code already does, parameterizes
+its hardcoded analytical choices, and runs until outputs materialize.
 Existing logic stays intact; the edits are minimal parameter plumbing.
 
 Source: [`claude/lightcone/skills/lc-from-code/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-from-code/SKILL.md).

From 3c4302eaa3f19348940098a603ad213c916f6431 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 20:05:47 +0200
Subject: [PATCH 093/124] docs/skills/lc-from-paper: writing pass
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Lede: comma-stuffed three-part run-on sentence broken into three
sentences that match its actual three-part shape (interactive bookends →
loop → return). Phase arrow chain pulled into em-dash framing for visual
breathing room.

Architecture: "Two pieces:" → "Two pieces." (the list itself
explains); folded the orphaned single-sentence "Parallel fan-out"
paragraph into part 2's closing line.

Per-paper substrate: "Drafted during INTERVIEW. The reproduction
workdir holds two files…" — the fragment-then-sentence shape merged
into a single declarative. Constitution + CLAUDE.md bullets repunctuated
with semicolons inside section enumerations so the nested parentheticals
breathe. "never-block-on-AskUserQuestion-mid-iteration" hyphen-chain
relaxed to "no blocking on AskUserQuestion mid-iteration".

Disciplines: "sizes its work" → "calibrates its work" (releases the
embalmed verb). "vs" → "versus" (formal context, three-word phrase).
"arxiv-LaTeX-first acquisition" → "arXiv LaTeX first" (proper casing,
drops the cargo word). "Open-questions" → "Open questions" (no
compound-adjective hyphen needed). Causal "; AskUserQuestion isn't
available" tightened to "so AskUserQuestion isn't available".
---
 docs/skills/lc-from-paper.md | 99 ++++++++++++++++++------------------
 1 file changed, 49 insertions(+), 50 deletions(-)

diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index 0c13eb89..1c16a8d8 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -1,25 +1,25 @@
 # /lc-from-paper
 
 Reproduce a published scientific paper as a complete ASTRA project. The
-skill is **interview-first** and **ralph-driven**: INTERVIEW + ACQUIRE
-run in the user's main session to set up the per-paper substrate, then
-a ralph loop carries the long middle (ARCHITECT → SPECIFY → LITERATURE
-→ IMPLEMENT → RUN → COMPARE) across many iterations against the same
-constitution, with REVIEW returning to the user's main session after
-the loop closes.
+skill is **interview-first** and **ralph-driven**. INTERVIEW and
+ACQUIRE run in the user's main session to set up the per-paper
+substrate. A ralph loop then carries the long middle —
+ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE —
+across many iterations against the same constitution. REVIEW returns
+to the user's main session once the loop closes.
 
 `/lc-from-paper` is the entry point of the paper-reproduction bundle.
-The sibling skills ([`ralph`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/ralph/SKILL.md)
+Sibling skills ([`ralph`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/ralph/SKILL.md)
 for the loop, [`paper-extraction`](paper-extraction.md),
 [`narrative`](narrative.md), [`figure-comparison`](figure-comparison.md),
-[`check-sentence-by-sentence`](check-sentence-by-sentence.md)) are
-co-located in the same plugin and invoked by role across the phases.
+[`check-sentence-by-sentence`](check-sentence-by-sentence.md)) live in
+the same plugin and are invoked by role across the phases.
 
 Source: [`claude/lightcone/skills/lc-from-paper/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/lc-from-paper/SKILL.md).
 
 ## Architecture
 
-Two pieces:
+Two pieces.
 
 1. **Interactive bookends in the user's main session.** INTERVIEW and
    REVIEW are conversations with the user. ACQUIRE is two parallel
@@ -30,16 +30,15 @@ Two pieces:
 2. **A ralph loop for the long middle.** Once `constitution.md` is
    drafted (INTERVIEW) and the substrate is on disk (ACQUIRE),
    `/lc-from-paper` launches a ralph loop against the constitution.
-   Each iteration starts a fresh tmux-detached Claude session with the
-   constitution as system prompt, surveys the workdir, picks the next
-   valuable move (typically one phase's worth of work), does it,
-   commits, exits. The fresh-context property is automatic — iteration
-   N+1 reads N's work without bias, which makes per-phase review
-   collapse into "the next iteration is the review."
-
-Parallel fan-out (LITERATURE Haiku quote-finders, SPECIFY per-sub-
-analysis work, IMPLEMENT per-output work) happens *inside* an
-iteration, one level deep from the iteration's main session.
+   Each iteration starts a fresh tmux-detached Claude session with
+   the constitution as system prompt, surveys the workdir, picks the
+   next valuable move (typically one phase's worth of work), does
+   it, commits, and exits. The fresh-context property is automatic:
+   iteration N+1 reads N's work without bias, so per-phase review
+   collapses into "the next iteration is the review." Parallel
+   fan-out (LITERATURE Haiku quote-finders, SPECIFY per-sub-analysis
+   work, IMPLEMENT per-output work) happens *inside* an iteration,
+   one level deep from the iteration's main session.
 
 ## Phases
 
@@ -60,47 +59,47 @@ user's main session; phases 2–7 run as ralph iterations.
 
 ## Per-paper substrate: constitution + CLAUDE.md
 
-Drafted during INTERVIEW. The reproduction workdir holds **two files**
-that iterations walk up to automatically:
+INTERVIEW drafts two files in the reproduction workdir; every
+iteration walks up to them automatically.
 
 - **`constitution.md`** — the ralph loop's driving document. YAML
-  frontmatter `status: active`; sections: Goal (with **fidelity
-  intent** prose — the user's own answer to "when is this good
-  enough"), Scope (in / out), Quality bar, Evidence (paper DOI, arXiv
-  ID, code repo URL), Open dimensions. Sharpens slowly — only when
-  something fundamental shifts.
-- **`CLAUDE.md`** — auto-loading walk-up. Paper identity at the top,
-  Rules (code-as-canonical, never-block-on-`AskUserQuestion`-
-  mid-iteration, arxiv-LaTeX-first, `astra validate --verify-evidence`
-  as the fidelity gate), Rigor accumulator (*Current state* per output
-  + *Open opportunities*, updated by iterations), Disagreements log
-  (running, updated by iterations), Pointers.
+  frontmatter declares `status: active`. Sections: Goal (carrying the
+  **fidelity intent** — the user's own "when is this good enough"),
+  Scope (in/out), Quality bar, Evidence (paper DOI, arXiv ID, code
+  repo URL), Open dimensions. Sharpens slowly, only when something
+  fundamental shifts.
+- **`CLAUDE.md`** — the auto-loading walk-up. Paper identity at the
+  top; Rules (code-as-canonical, no blocking on `AskUserQuestion`
+  mid-iteration, arXiv-LaTeX-first, `astra validate
+  --verify-evidence` as the fidelity gate); Rigor accumulator
+  (*Current state* per output plus *Open opportunities*, updated each
+  iteration); Disagreements log (running, also updated each
+  iteration); Pointers.
 
 Pointers, not snapshots.
 
 ## Disciplines
 
-- **Workdir is the state.** File existence + `git log` + `astra
-  validate` answer "what phase am I on" deterministically. No separate
-  state machine.
-- **Code-as-canonical, with disagreements recorded.** Where paper and
-  code disagree on something material, code wins for numerics but the
-  disagreement is preserved as a decision option and noted in
-  CLAUDE.md.
+- **Workdir is the state.** File existence, `git log`, and `astra
+  validate` answer "what phase am I on" deterministically — no
+  separate state machine.
+- **Code-as-canonical, with disagreements recorded.** Where paper
+  and code disagree on something material, code wins for numerics,
+  but the disagreement is preserved as a decision option and noted
+  in CLAUDE.md.
 - **Rigor is a trajectory toward the user's intent.** Each iteration
-  sizes its work from the gap between *Current state* and the Goal's
-  fidelity intent — cheap (one clean review-iteration is enough) vs
-  heavy (two consecutive clean review-iterations required). Review
-  happens sequentially via iteration boundaries; the fresh-context
-  property is automatic.
-- **arxiv-LaTeX-first acquisition.** PDF + Docling is the non-arxiv
-  fallback only.
+  calibrates its work from the gap between *Current state* and the
+  Goal's fidelity intent: cheap (one clean review-iteration is
+  enough) versus heavy (two consecutive clean review-iterations
+  required). Review happens sequentially via iteration boundaries;
+  the fresh-context property is automatic.
+- **arXiv LaTeX first.** PDF + Docling is the non-arXiv fallback only.
 - **No synthetic data.** Unless the paper itself uses synthetic data,
   every input must be real.
-- **Open-questions for autonomous iteration.** Iterations run detached
-  in tmux; `AskUserQuestion` isn't available. Questions go to
+- **Open questions for autonomous iteration.** Iterations run detached
+  in tmux, so `AskUserQuestion` isn't available. Questions go to
   `open-questions.md` with the iteration's best-judgment default
-  applied; the user resolves at REVIEW close-out.
+  applied; the user resolves them at REVIEW close-out.
 
 ## Anti-patterns
 

From fe54e8d7f8fe6bcb7b9c6f9efe81175784ce476d Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 20:06:54 +0200
Subject: [PATCH 094/124] docs/skills/ralph: writing pass
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Lede: split the long second sentence at the semicolon. "After a cold
survey" pulled into the main clause via comma pair so the agent
performing the action is foregrounded.
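
For reference, the termination check the lede describes could look like
the sketch below. It assumes the frontmatter is a leading block fenced
by lines of three hyphens and holds a flat `status:` key. The names
`frontmatter_status` and `loop_should_continue` are hypothetical and
are not part of the ralph runner.

```python
from typing import Optional


def frontmatter_status(text: str) -> Optional[str]:
    """Return the `status:` value from a leading frontmatter block, if any."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip().strip("\"'")
    return None


def loop_should_continue(text: str) -> bool:
    # Assumption: anything other than an explicit `closed` keeps the loop alive.
    return frontmatter_status(text) != "closed"
```

Only the frontmatter is inspected, so a `status:` line that appears in
the body of the constitution does not stop the loop.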

"Three modes" intro: "One applies at a time:" → "One mode applies at
a time." (the colon was promising a list that's already promised by
the section heading).

What-goes-in-a-constitution paragraph: "Nothing in it becomes confusing
or unnecessary as the desired state is reached" was passive and abstract.
Replaced with "nothing in it goes stale as the work progresses" —
direct verb, concrete process. "Write what remains true" → "write what
stays true" (lighter; "stays" carries duration the way "remains"
doesn't quite).
---
 docs/skills/ralph.md | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/docs/skills/ralph.md b/docs/skills/ralph.md
index 7c257715..91a911d7 100644
--- a/docs/skills/ralph.md
+++ b/docs/skills/ralph.md
@@ -1,11 +1,11 @@
 # /ralph
 
 Author a constitution — a markdown document describing a desired state
-for autonomous iteration — and run a ralph loop against it. The loop is
-a detached tmux session that respawns a fresh worker per iteration,
-with the constitution injected as the system prompt; iterations
-terminate when one flips the constitution's frontmatter `status:` to
-`closed` after a cold survey.
+for autonomous iteration — and run a ralph loop against it. The loop
+is a detached tmux session that respawns a fresh worker per iteration,
+with the constitution injected as system prompt. Iterations terminate
+when one of them, after a cold survey, flips the constitution's
+frontmatter `status:` to `closed`.
 
 Used by [`/lc-from-paper`](lc-from-paper.md) for the long middle of a
 reproduction (ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN →
@@ -17,7 +17,7 @@ Source: [`claude/lightcone/skills/ralph/SKILL.md`](https://github.com/LightconeR
 
 ## Three modes
 
-One applies at a time:
+One mode applies at a time.
 
 - **Authoring** — drafting a constitution from scratch (Study → Draft
   → Refine → Launch). Reference depth in
@@ -50,10 +50,9 @@ double-starting.
 ## What goes in a constitution
 
 A constitution describes what the system looks like when it's right —
-the desired state. It outlasts any single iteration. Nothing in it
-becomes confusing or unnecessary as the desired state is reached. The
-constitutional principle: write what remains true until the work is
-done.
+the desired state. It outlasts any single iteration; nothing in it
+goes stale as the work progresses. The constitutional principle:
+write what stays true until the work is done.
 
 Common sections — use what fits, skip what doesn't:
 

From 3c51609499dca4184ecfd23146af15a65460b8ea Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 20:08:06 +0200
Subject: [PATCH 095/124] docs/skills/paper-extraction: writing pass
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Lede: "stub astra.yaml treating the paper" → "stub astra.yaml that
treats the paper" (relative clause is grammatically cleaner than the
dangling participle).

"The deterministic structural work is done by …; the agent runs it" →
"The agent runs … for the deterministic structural pass" — passive
flipped to active, agent foregrounded.

Workflow phase openers: em-dash → bolded title + period, consistent
with lc-from-code Phases. Oxford comma added to the "equations,
ligatures, captions, and tables" list. "non-arxiv" → "non-arXiv"
(proper casing).
---
 docs/skills/paper-extraction.md | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/docs/skills/paper-extraction.md b/docs/skills/paper-extraction.md
index 4c90d8c7..09f2deb7 100644
--- a/docs/skills/paper-extraction.md
+++ b/docs/skills/paper-extraction.md
@@ -4,7 +4,7 @@ Turn an arXiv ID or DOI into a standardized, indexed `work/reference/`
 directory: substrate (arXiv LaTeX source preferred, PDF + Docling
 fallback), copied figures, per-table `.tex` files, a section outline
 with line numbers, deduplicated citation keys with resolved DOIs, the
-abstract, and a stub `astra.yaml` treating the paper as an ASTRA
+abstract, and a stub `astra.yaml` that treats the paper as an ASTRA
 artifact.
 
 Source: [`claude/lightcone/skills/paper-extraction/SKILL.md`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/paper-extraction/SKILL.md).
@@ -18,9 +18,9 @@ Argument hint: `<arxiv-id-or-doi>` — invoked as `/paper-extraction
 Read, Write, Edit, Bash, Grep, Glob, WebFetch, WebSearch
 ```
 
-The deterministic structural work is done by
-`scripts/extract-paper-substrate.py`; the agent runs it, then walks
-warnings and (optionally) fills `findings:`.
+The agent runs `scripts/extract-paper-substrate.py` for the
+deterministic structural pass, then walks any warnings and (optionally)
+fills `findings:`.
 
 ## Outputs
 
@@ -64,24 +64,23 @@ text.
 
 ## Workflow
 
-1. **Survey** — `ls work/reference/`; read `index.json` if present.
-   Skip work already done.
-2. **Acquire substrate** — Path A (arXiv → LaTeX source) or Path B
+1. **Survey.** `ls work/reference/`; read `index.json` if present. Skip
+   any work already done.
+2. **Acquire substrate.** Path A (arXiv → LaTeX source) or Path B
    (journal-only DOI → PDF + Docling).
-3. **Run the extraction script** — `extract-paper-substrate.py` does
-   the deterministic structural pass: figure copying, per-table
-   `.tex` extraction, outline, citation resolution, `astra.yaml`
-   stub.
-4. **Review warnings and fix structural gaps** — unresolved figures,
+3. **Run the extraction script.** `extract-paper-substrate.py` does
+   the deterministic structural pass: figure copying, per-table `.tex`
+   extraction, outline, citation resolution, `astra.yaml` stub.
+4. **Review warnings and fix structural gaps.** Unresolved figures,
    missing captions, unresolved citation DOIs, Path B caveats.
-5. **(Optional) Walk the paper for findings** — append the paper's
+5. **(Optional) Walk the paper for findings.** Append the paper's
    central numerical claims to `astra.yaml`'s `findings:` map with
    verbatim `quote.exact` evidence. Skip unless a downstream consumer
    needs it.
 
 Path A is preferred whenever the paper is on arXiv — equations,
-ligatures, captions, tables come through clean. Path B is for
-non-arxiv only.
+ligatures, captions, and tables come through clean. Path B is for
+non-arXiv only.
 
 ## Citation DOI resolution
 

From c9842d98b3f3f78b858b32d0ed0bc2a57c08e6a9 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 20:08:38 +0200
Subject: [PATCH 096/124] docs/skills/narrative: writing pass (one-line)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Flipped "/narrative is invoked by /lc-from-paper … and is also
user-invokable directly" — passive-then-passive — to "/lc-from-paper
invokes /narrative …; users can invoke it directly". Active agents in
both clauses; semicolon does the work the comma was doing.

Otherwise this file is already tight — leaving it alone.
---
 docs/skills/narrative.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/skills/narrative.md b/docs/skills/narrative.md
index dcb6246b..5a13e69f 100644
--- a/docs/skills/narrative.md
+++ b/docs/skills/narrative.md
@@ -23,8 +23,8 @@ If the second source isn't obvious, the skill asks. Hybrid is allowed
 (reproduction with co-drafted extensions; retrofit with co-drafted
 gap-filling).
 
-`/narrative` is invoked by `/lc-from-paper` during SPECIFY (paper-
-reproduction mode), and is also user-invokable directly in any mode.
+`/lc-from-paper` invokes `/narrative` during SPECIFY (paper-reproduction
+mode); users can invoke it directly in any mode.
 
 ## Allowed surfaces
 

From 0d9532181415c2d115030de0c0e7e507fb85e0b4 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 20:09:16 +0200
Subject: [PATCH 097/124] docs/skills/figure-comparison: writing pass
 (one-line)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

"portable to email, shared drives, or Slack without breaking links"
read as the HTML being portable *to* those venues — slight agency
wobble. Made the verbs concrete: "paste it into email, drop it on a
shared drive, or send it through Slack". Reader sees the actual moves.

Otherwise tight — leaving the rest.
---
 docs/skills/figure-comparison.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/skills/figure-comparison.md b/docs/skills/figure-comparison.md
index a54254af..3b2ad9df 100644
--- a/docs/skills/figure-comparison.md
+++ b/docs/skills/figure-comparison.md
@@ -57,8 +57,8 @@ and reproduced artifacts on the right. Helper scripts and intermediate
 manifests also live under `.lightcone/` so they don't pollute the
 baseline results.
 
-The HTML embeds figure images as base64 — portable to email, shared
-drives, or Slack without breaking links.
+The HTML embeds figure images as base64 — paste it into email, drop
+it on a shared drive, or send it through Slack without breaking links.
 
 ## When to invoke
 

From c8afb13cc154d1d9996f088b804aee701bc448f0 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 20:10:11 +0200
Subject: [PATCH 098/124] docs/skills/check-sentence-by-sentence: writing pass
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Setup #1: em-dash → period, to match #2 within the same section.

Section enumeration: "Each leaf (sub)section becomes one parallel
sub-agent dispatch" — "parallel sub-agent dispatch" was a noun pile.
Split into two sentences: each leaf becomes one sub-agent, parallelism
is in how they're issued. "Spawn them in a single message" → "Issue them
in a single tool-use block" (more precise for what's actually meant).

Hard rules: unified the four bullets to bold-period rhythm. "Quote
verbatim, trimmed to one sentence" → "Quote verbatim. Trim to one
sentence" (active verb). "Read only the assigned line range in each
sub-agent" was a hanging fragment — gave it a closing sentence,
"Each sub-agent stays inside its window".
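
The enumeration rule reworded above can be sketched as follows. The
`Section` type, its inclusive line numbers, and the "(lead-in)" label
are assumptions for illustration; the outline the skill actually walks
is not shown in this patch, and the doc leaves the lead-in window
optional where this sketch always emits it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Section:
    title: str
    start: int  # heading line, inclusive
    end: int    # last line of the section, inclusive
    children: List["Section"] = field(default_factory=list)


def leaf_windows(section: Section) -> List[Tuple[str, int, int]]:
    """Return one (title, start, end) window per sub-agent."""
    if not section.children:
        return [(section.title, section.start, section.end)]
    windows = []
    first = section.children[0]
    # Prose between the heading and the first subsection gets its own window.
    if first.start - section.start > 1:
        windows.append((f"{section.title} (lead-in)", section.start, first.start - 1))
    for child in section.children:
        windows.extend(leaf_windows(child))
    return windows
```

Each returned window is the line range one sub-agent is allowed to
read, which is what the "Read only the assigned line range" rule relies
on.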
---
 docs/skills/check-sentence-by-sentence.md | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/docs/skills/check-sentence-by-sentence.md b/docs/skills/check-sentence-by-sentence.md
index 9117faaf..4df74af0 100644
--- a/docs/skills/check-sentence-by-sentence.md
+++ b/docs/skills/check-sentence-by-sentence.md
@@ -23,7 +23,7 @@ execution.
 
 ## Setup
 
-1. **Confirm project root** — `astra.yaml` in cwd, or ask the user to
+1. **Confirm project root.** `astra.yaml` in cwd, or ask the user to
    `cd` to the ASTRA project.
 2. **Confirm paper source.** Resolve in order:
    - A `.tex` argument → `tex` mode.
@@ -54,10 +54,10 @@ Audit-relevant sections: methodology, results, discussion,
 appendices. Skip abstract, introduction, acknowledgements,
 references, author lists.
 
-Each leaf (sub)section becomes one parallel sub-agent dispatch — a
-section with subsections spawns one sub-agent per subsection plus
-optionally one for any pre-subsection prose span. Spawn them in a
-single message so they run in parallel.
+Each leaf (sub)section becomes one sub-agent. A section with
+subsections spawns one sub-agent per subsection, plus optionally one
+more for any pre-subsection prose span. Issue them in a single
+tool-use block so they run in parallel.
 
 ## Per-sub-agent output
 
@@ -95,11 +95,12 @@ appendices, with each entry's `quote`, `location`, and `note`.
 - **No execution.** Numerical results can be located at the line that
   computes them, but agreement isn't verifiable here. Use a note like
   "value computed at runtime".
-- **Quote verbatim**, trimmed to one sentence. Long sentences may keep
+- **Quote verbatim.** Trim to one sentence; long sentences may keep
   just the claim-bearing clause.
-- **`file:line` is specific** — the function call, parameter
-  assignment, or computed value, not just a file.
-- **Read only the assigned line range** in each sub-agent.
+- **`file:line` is specific.** The function call, parameter assignment,
+  or computed value — not just a file.
+- **Read only the assigned line range.** Each sub-agent stays inside
+  its window.
 
 ## When to invoke
 

From 6fd95dd978c57f3952a64e7b979690e643927e90 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 20:10:41 +0200
Subject: [PATCH 099/124] docs/skills/lc-feedback: writing pass (hard rules
 only)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hard rules unified to bold-period rhythm to match the pattern used in
other skill docs (lc-from-code, paper-extraction, check-sentence-by-
sentence). The workflow steps stay as-is — their inconsistency
reflects genuinely different shapes (silent command vs nested list vs
code block), not sloppy editing.
---
 docs/skills/lc-feedback.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/skills/lc-feedback.md b/docs/skills/lc-feedback.md
index f1dab913..ab79f8c1 100644
--- a/docs/skills/lc-feedback.md
+++ b/docs/skills/lc-feedback.md
@@ -69,11 +69,11 @@ Sections that don't apply are dropped.
 
 ## Hard rules
 
-- Be fast — minimize back-and-forth, one confirmation then file.
-- Read-only on the project.
-- Trim aggressively — only the relevant portion of errors.
-- No sensitive data — strip absolute paths, credentials, tokens.
-- Don't editorialize — report what happened.
+- **Be fast.** Minimize back-and-forth: one confirmation, then file.
+- **Read-only on the project.**
+- **Trim aggressively.** Only the relevant portion of errors.
+- **No sensitive data.** Strip absolute paths, credentials, tokens.
+- **Don't editorialize.** Report what happened.
 
 ## Notes for the maintainer who's looking
 

From 03719fd3960439348531b2f42198b7d97b2ee8bd Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 20:11:52 +0200
Subject: [PATCH 100/124] docs/skills/authoring: writing pass + accuracy fixes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Accuracy:
- "Use Task with subagent_type" → "Use Agent with subagent_type" (the
  code block right below already used Agent correctly; the prose was
  stale, same drift caught earlier in lc-new.md).
- "Action prompts in bold sentences (`> ...`)" → "Action prompts in
  blockquotes" (the example uses `>` markdown, which is blockquote,
  not bold; checked actual SKILL.md files to confirm).

Duplication: the Python heredoc in "Installing changes" was an
incomplete copy of the canonical one in docs/cli/update.md — only
syncing `skills/` rather than all five plugin subdirs. Replaced with
a pointer to update.md to keep one source of truth.

Writing: "never emojis except inside the agent's own branded output"
flipped to "Skip emoji elsewhere — they belong only inside the agent's
own branded banner output" (positive frame, narrower scope on what
counts as "branded").
---
 docs/skills/authoring.md | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/docs/skills/authoring.md b/docs/skills/authoring.md
index e7d32e57..c97003b3 100644
--- a/docs/skills/authoring.md
+++ b/docs/skills/authoring.md
@@ -40,9 +40,9 @@ argument-hint: "[OPTIONAL ARG] [--flag VALUE]"
 
 - `##` for phase headings; lead with a "Stage banner" line that the
   skill prints to the chat.
-- `✓ / ○ / ✗` for status; never emojis except inside the agent's own
-  branded output.
-- Action prompts in bold sentences (`> "What are you trying to learn?"`).
+- `✓ / ○ / ✗` for status. Skip emoji elsewhere — they belong only
+  inside the agent's own branded banner output.
+- Action prompts in blockquotes (`> "What are you trying to learn?"`).
 - A `## Restrictions` (or `## Hard rules`) section at the end listing
   invariants Claude must not break.
 
@@ -62,7 +62,7 @@ call when a specific section is load-bearing for that skill's work.
 
 ## Spawning subagents
 
-Use `Task` with `subagent_type` to delegate work. The
+Use `Agent` with `subagent_type` to delegate work. The
 `lc-extractor` subagent in `agents/` is the canonical example:
 
 ```python
@@ -92,19 +92,7 @@ from lightcone.eval.cli import run_cmd
 
 ## Installing changes into an existing project
 
-`lc init` copies the plugin once. To pull updated skills into an
-existing project after editing them:
-
-```bash
-python - <<'PY'
-import shutil
-from pathlib import Path
-from lightcone.cli.plugin import get_plugin_source_dir
-src = get_plugin_source_dir()
-dst = Path(".claude/skills")
-if dst.exists(): shutil.rmtree(dst)
-shutil.copytree(src / "skills", dst)
-PY
-```
-
-(See [`lc update`](../cli/update.md) for the longer story.)
+`lc init` copies the plugin once and refuses to run a second time on
+the same directory. See [`lc update`](../cli/update.md) for the Python
+heredoc that resyncs all the plugin subdirs (`skills`, `agents`,
+`scripts`, `guides`, `templates`) into an existing project.

From 599ccb3a18ab47cf13cd8201c47032c3fced3ce9 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 20:21:08 +0200
Subject: [PATCH 101/124] lc-from-paper: reshape rigor framing + surface
 CLAUDE.md as state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cail's review on docs/skills/lc-from-paper.md after the writing pass
caught five things, all of which want to land in both the user doc and
the SKILL itself.

Accuracy / phrasing:
- "constitution as system prompt" → "constitution loaded into [its]
  system prompt" — the ralph runner uses --append-system-prompt, so
  the constitution becomes part of the system prompt, not the whole.
- "fresh-context property is automatic" was throat-clearing — it
  announced a property without doing the work the property names.
  Dropped; the mechanism (iteration N+1 reads N's work cold) speaks for
  itself.
- "walks up to them automatically" was unintuitive jargon (echoing the
  CLAUDE.md walk-up). Now "picks them up on launch".

Substantive:
- Added "CLAUDE.md is a state-expresser too" as a sibling discipline
  to "Workdir is the state". The workdir holds ground-truth files;
  CLAUDE.md holds running pointers (Rigor *Current state*, Disagreements
  log, Open opportunities, paper identity). Each iteration keeps those
  current at its Update step so the next cold survey reads them as
  fact rather than as stale notes.
- Reshaped the Rigor discipline: led with the honest framing that
  fidelity intent is partly aesthetic ("how good does this need to be")
  and partly pragmatic ("what's feasible given compute, tokens, time,
  attention available"). The cheap/heavy mechanism stays but is
  folded into one sentence rather than dominating the section.
- INTERVIEW §3 (Fidelity intent) reshaped to match. Lead-in now
  explicitly names the meta-conversation: "what does the user want
  out of this first stretch, given what's spendable on it?". First
  pivot question pulls compute/time into the same breath as fidelity
  shape. Examples grew time/session bounds where natural.
---
 .../lightcone/skills/lc-from-paper/SKILL.md   | 16 ++++-----
 .../lc-from-paper/references/interview.md     | 14 ++++----
 docs/skills/lc-from-paper.md                  | 36 +++++++++++--------
 3 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 7dab2cfc..61551744 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -20,9 +20,9 @@ The architecture is two-piece:
 
 1. **Interactive bookends in the user's main session.** INTERVIEW and REVIEW are conversations with the user. ACQUIRE is two parallel sub-skill invocations (`/paper-extraction` and `/lc-from-code` in scan-only mode) that produce the on-disk substrate everything downstream consults.
 
-2. **A ralph loop for the long middle.** Once the per-paper `constitution.md` is drafted (INTERVIEW) and the substrate is on disk (ACQUIRE), you launch a ralph loop against the constitution. Each iteration starts a fresh session with the constitution as system prompt, surveys the workdir, picks the next valuable move (typically one phase's worth of work), does it, commits, exits. The fresh-context property is automatic — iteration N+1 reads N's work without bias, which makes per-phase review collapse into "the next iteration is the review."
+2. **A ralph loop for the long middle.** Once the per-paper `constitution.md` is drafted (INTERVIEW) and the substrate is on disk (ACQUIRE), you launch a ralph loop against the constitution. Each iteration starts a fresh session with the constitution loaded into its system prompt, surveys the workdir, picks the next valuable move (typically one phase's worth of work), does it, commits, and exits. Iteration N+1 reads N's work cold, so per-phase review collapses into "the next iteration is the review."
 
-The whole thing is driven by **the per-paper `constitution.md`** at the reproduction workdir root, plus the auto-loading `CLAUDE.md` walk-up. The constitution describes the goal (what "done" looks like, the user's fidelity intent, scope, quality bar); CLAUDE.md carries the standing rules plus running accumulators (rigor state per output, paper-vs-code disagreements log). Every iteration walks up to both.
+The whole thing is driven by **the per-paper `constitution.md`** at the reproduction workdir root, plus the auto-loading `CLAUDE.md` walk-up. The constitution describes the goal (what "done" looks like, the user's fidelity intent, scope, quality bar); CLAUDE.md carries the standing rules plus running accumulators (rigor state per output, paper-vs-code disagreements log). Every iteration picks up both on launch.
 
 ## Setup: git-tracked workdir
 
@@ -84,7 +84,7 @@ After INTERVIEW + ACQUIRE land, hand the rest of the reproduction off to a ralph
 
 (Or `--backend codex`, or pass `-- --model <id>` for a specific model. See `/ralph`'s **Launching** section for the full surface.)
 
-The launcher detaches a tmux session named `ralph-<workdir>-constitution`. The user attaches with `tmux attach -t <session>`. Iterations start firing immediately; each runs in a fresh Claude (or Codex) session with `constitution.md` injected as the system prompt and the workdir's `CLAUDE.md` auto-loading.
+The launcher detaches a tmux session named `ralph-<workdir>-constitution`. The user attaches with `tmux attach -t <session>`. Iterations start firing immediately; each runs in a fresh Claude (or Codex) session with `constitution.md` loaded into the system prompt and the workdir's `CLAUDE.md` auto-loading.
 
 The loop runs until an iteration flips `constitution.md`'s frontmatter `status:` to `closed` — typically after COMPARE returns `pass` (or `partial` with the un-acted opportunities logged) and the iteration that runs after that survey finds nothing left to do.
 
@@ -132,15 +132,13 @@ REVIEW runs in your main session because `/figure-comparison` and `/check-senten
 
 **Workdir is the state.** No state machine, no resume mechanic — file existence + `git log` + `astra validate` answer "what phase am I on" deterministically. Each iteration's first move is to survey the workdir on entry against the table above.
 
-**Code-as-canonical, with disagreements recorded.** When the original codebase is at `work/reference/code/`, every iteration that touches a sub-analysis reads relevant code on entry. Where paper and code disagree on something material (a different choice would plausibly change a numeric result the paper reports), **code is canonical** for numerics, plotting, and method — but the disagreement is recorded: as a decision option in `astra.yaml` with both alternatives preserved, and as an entry in CLAUDE.md's *Paper-vs-code disagreements* section so it's visible to every iteration and to the user at REVIEW. Stylistic / cosmetic / pure-tooling differences aren't material — note them in `implementation-notes.md` and move on. Without this discipline, iterations drift to "looks right" rather than "matches" and material disagreements get silently absorbed.
-
-**Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at INTERVIEW as prose — their own words for what "good enough" looks like (e.g. *"just checking the analysis is tractable"*, *"Figure 3 must be right; the rest can stay rough"*, *"every primary and secondary target lining up within stated tolerance"*).
+**CLAUDE.md is a state-expresser too.** Beyond the workdir's ground-truth files, CLAUDE.md carries running pointers — Rigor *Current state* per output, *Paper-vs-code disagreements*, *Open opportunities*, paper identity. Keep those pointers current at the **Update** step of each iteration's loop so the next cold survey reads them as fact, not stale.
 
-Each iteration translates the fidelity intent into a tactical sizing decision when working on an artifact-producing phase (ARCHITECT, SPECIFY, LITERATURE, IMPLEMENT). Derive how much review the artifact needs from the gap between where it currently stands (CLAUDE.md's Rigor *Current state* — *sketch / baseline / tightened / canonical*) and what the Goal's intent says the user cares about. *Cheap:* one clean review-iteration is enough — write, let the next iter read it fresh and review, accept after that single clean pass (with fixes applied in between if needed). *Heavy:* two consecutive clean review-iterations required — the review/fix cycle runs until two fresh-eyes passes both find nothing to fix. Either way, update CLAUDE.md's Rigor *Current state* so the trajectory stays honest across iterations.
+**Code-as-canonical, with disagreements recorded.** When the original codebase is at `work/reference/code/`, every iteration that touches a sub-analysis reads relevant code on entry. Where paper and code disagree on something material (a different choice would plausibly change a numeric result the paper reports), **code is canonical** for numerics, plotting, and method — but the disagreement is recorded: as a decision option in `astra.yaml` with both alternatives preserved, and as an entry in CLAUDE.md's *Paper-vs-code disagreements* section so it's visible to every iteration and to the user at REVIEW. Stylistic / cosmetic / pure-tooling differences aren't material — note them in `implementation-notes.md` and move on. Without this discipline, iterations drift to "looks right" rather than "matches" and material disagreements get silently absorbed.
 
-Review always happens via iteration boundaries — the fresh-context property is automatic. (Work fan-out for the artifact-producing phases is separate; see "Parallel fan-out lives inside an iteration" above.)
+**Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at INTERVIEW as prose. Intent is partly aesthetic ("how good does this need to be?") and partly pragmatic ("what's feasible given the compute, tokens, time, and attention available?"). Both dimensions belong in the prose — *"just checking the analysis is tractable"*, *"Figure 3 must be right; the rest can stay rough"*, *"every primary and secondary target lining up within stated tolerance, two sessions max"*.
 
-The *sketch / baseline / tightened / canonical* and *cheap / heavy* vocabularies are the iteration's internal scaffolding for sizing its work. The user's surface is the intent prose; the scaffolding only shows through when they ask how an iteration sized itself.
+Each iteration sizes its next move against that intent. Review happens via iteration boundaries: iteration N writes an artifact, N+1 reads it cold and either signs off or applies fixes. How many review passes an artifact gets depends on the gap between where it currently stands (CLAUDE.md's Rigor *Current state* — *sketch / baseline / tightened / canonical*) and what the intent says the user cares about. Cheap work gets one clean pass; canonical work waits for two consecutive clean passes; everything in between sizes by judgment. Update CLAUDE.md's Rigor *Current state* every iteration so the trajectory stays honest. (Work fan-out for the artifact-producing phases is separate; see "Parallel fan-out lives inside an iteration" above.)
 
 **arxiv-LaTeX-first acquisition.** When the paper is on arxiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arxiv only.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/interview.md
index 98d0688f..3d1795a2 100644
--- a/claude/lightcone/skills/lc-from-paper/references/interview.md
+++ b/claude/lightcone/skills/lc-from-paper/references/interview.md
@@ -46,23 +46,23 @@ These answers go into `constitution.md`'s **Scope** section (in / out) and infor
 
 ### 3. Fidelity intent
 
-A reproduction can land anywhere from a quick "does this even run" sanity check to a full match across every primary and secondary target. The user owns where they want this one to land. The job here is to **elicit prose intent** — their own words for what "good enough" looks like, captured into `constitution.md`'s Goal section.
+A reproduction can land anywhere from a quick "does this even run" sanity check to a full match across every primary and secondary target. The user owns where they want this one to land — but where it *can* land in this stretch depends on the compute, tokens, time, and attention available. The honest meta-conversation is the point: what does the user want out of this first stretch, given what's spendable on it? Don't ask the abstract "what would you like to get out of this" — too literal, lands as a wish list. Pivot on what's actually being weighed.
 
-Reach for whichever pivot fits the conversation; you usually only need one or two:
+The job is to **elicit prose intent** — the user's own words for what "good enough" looks like for this stretch — and capture it into `constitution.md`'s Goal section. Reach for whichever pivot fits; you usually only need one or two:
 
-- *"What's the moment you'd call this reproduction useful — when any number comes out at all, when a specific figure matches in shape, when the headline number matches within stated uncertainty, or when every primary and secondary target lines up?"*
+- *"What's the right shape for this stretch — a quick check that the analysis is tractable, getting one specific figure right, or a full match across primary targets? How much compute and time do you have to spend on it?"*
 - *"Is there a specific result you care about more than the rest, where you'd want full fidelity even if the others stay rough?"*
-- *"If this took several sessions of iteration to reach high fidelity everywhere, is that the right investment, or would you rather get a working version in a couple of sessions and decide later whether to push further?"*
 - *"Are you trying to verify the paper, build on it, or critique it? That shifts where the fidelity bar wants to sit."*
+- *"If this took several sessions to reach high fidelity everywhere, is that the right investment? Or would a working version in a couple of sessions be enough to decide where to push further?"*
 
 Record the answer verbatim or in close paraphrase under **Fidelity intent** in `constitution.md`'s Goal section. Concrete examples of what good prose intent looks like:
 
 - *"Just checking if the analysis is tractable — quick sanity that some headline number comes out close."*
 - *"I care about Figure 3 being right. The rest can stay rough."*
-- *"Full fidelity on the BAO fit specifically; the rest can stay rough."*
-- *"Every primary and secondary target lining up within stated tolerance, every paper-vs-code conflict adjudicated."*
+- *"Full fidelity on the BAO fit specifically; the rest can stay rough. One session of compute."*
+- *"Every primary and secondary target lining up within stated tolerance, every paper-vs-code conflict adjudicated. Two sessions max."*
 
-Each iteration reads this when deciding cheap vs heavy on the next move; COMPARE grades opportunities against it. If the user genuinely doesn't know yet, write that — *"Not sure yet; let's get something running and revisit"* is itself useful intent, and they can sharpen it at any future REVIEW.
+Each iteration reads this when sizing its next move, and COMPARE grades opportunities against it. If the user genuinely doesn't know yet, write that — *"Not sure yet; let's get something running and revisit"* is itself useful intent, and they can sharpen it at any future REVIEW.
 
 ### 4. Paper-specific conventions or warnings
 
diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index 1c16a8d8..9bae81eb 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -31,14 +31,14 @@ Two pieces.
    drafted (INTERVIEW) and the substrate is on disk (ACQUIRE),
    `/lc-from-paper` launches a ralph loop against the constitution.
    Each iteration starts a fresh tmux-detached Claude session with
-   the constitution as system prompt, surveys the workdir, picks the
-   next valuable move (typically one phase's worth of work), does
-   it, commits, and exits. The fresh-context property is automatic:
-   iteration N+1 reads N's work without bias, so per-phase review
-   collapses into "the next iteration is the review." Parallel
-   fan-out (LITERATURE Haiku quote-finders, SPECIFY per-sub-analysis
-   work, IMPLEMENT per-output work) happens *inside* an iteration,
-   one level deep from the iteration's main session.
+   the constitution loaded into its system prompt, surveys the
+   workdir, picks the next valuable move (typically one phase's
+   worth of work), does it, commits, and exits. Iteration N+1 reads
+   N's work cold, so per-phase review collapses into "the next
+   iteration is the review." Parallel fan-out (LITERATURE Haiku
+   quote-finders, SPECIFY per-sub-analysis work, IMPLEMENT per-output
+   work) happens *inside* an iteration, one level deep from the
+   iteration's main session.
 
 ## Phases
 
@@ -60,7 +60,7 @@ user's main session; phases 2–7 run as ralph iterations.
 ## Per-paper substrate: constitution + CLAUDE.md
 
 INTERVIEW drafts two files in the reproduction workdir; every
-iteration walks up to them automatically.
+iteration picks them up on launch.
 
 - **`constitution.md`** — the ralph loop's driving document. YAML
   frontmatter declares `status: active`. Sections: Goal (carrying the
@@ -83,16 +83,22 @@ Pointers, not snapshots.
 - **Workdir is the state.** File existence, `git log`, and `astra
   validate` answer "what phase am I on" deterministically — no
   separate state machine.
+- **CLAUDE.md is a state-expresser too.** Beyond the workdir's
+  ground-truth files, CLAUDE.md carries running pointers — Rigor
+  accumulator, Disagreements log, paper identity. Each iteration
+  keeps those pointers current so the next cold survey reads them
+  as fact.
 - **Code-as-canonical, with disagreements recorded.** Where paper
   and code disagree on something material, code wins for numerics,
   but the disagreement is preserved as a decision option and noted
   in CLAUDE.md.
-- **Rigor is a trajectory toward the user's intent.** Each iteration
-  calibrates its work from the gap between *Current state* and the
-  Goal's fidelity intent: cheap (one clean review-iteration is
-  enough) versus heavy (two consecutive clean review-iterations
-  required). Review happens sequentially via iteration boundaries;
-  the fresh-context property is automatic.
+- **Rigor is a trajectory toward the user's intent.** Fidelity
+  intent is partly aesthetic ("how good does this need to be?") and
+  partly pragmatic ("what's feasible given the compute, tokens, and
+  time available?"). The honest meta-conversation lives in INTERVIEW;
+  each iteration then sizes its work from the gap between *Current
+  state* and that intent. Review happens sequentially via iteration
+  boundaries.
 - **arXiv LaTeX first.** PDF + Docling is the non-arXiv fallback only.
 - **No synthetic data.** Unless the paper itself uses synthetic data,
   every input must be real.

From 7b9558a78aec94aea2dc9ff9728a908e5fb6f1bb Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 21:01:03 +0200
Subject: [PATCH 102/124] lc-from-paper: collapse review-and-fix into single
 iteration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cail's dogfood note from the rigor reshape: drop N+1 vs N+2 entirely.
The single review-and-fix pass is the protocol; if rigor needs to go
further than that pass delivers, a future loop closes the gap, picking
it up from the Open opportunities logged in CLAUDE.md.

The change cascades through every per-phase reference:

- **architect.md, specify.md, implement.md, literature.md** — collapse
  write/review/fix/re-review into write + review-and-fix. Iteration N
  writes; iteration N+1 reads cold, reviews silently, applies fixes
  inline, commits, exits. One pass. No intermediate review-N.md /
  round-N.md files. No cheap/heavy split. Termination is just "the
  review-and-fix commit landed."

- **review.md** — propagation step at close-out is now reconciliation
  rather than propagation. COMPARE iterations log un-acted-on Open
  opportunities directly into CLAUDE.md as they run, so REVIEW just
  cross-checks the list against comparison-report.yaml's opportunities
  block.
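
A minimal sketch of that cross-check, assuming both lists have already
been parsed into plain Python lists of opportunity identifiers. The
function name, the identifiers, and the parsing step are hypothetical
illustrations, not part of the skill or of this patch:

```python
def reconcile_opportunities(logged, reported):
    """Cross-check two opportunity lists and report what each side lacks.

    `logged` stands in for the Open opportunities recorded in CLAUDE.md;
    `reported` stands in for the opportunities block of
    comparison-report.yaml. Both are hypothetical, pre-parsed lists of IDs.
    """
    logged_set, reported_set = set(logged), set(reported)
    return {
        # In the report but never logged into CLAUDE.md.
        "missing_from_claude_md": sorted(reported_set - logged_set),
        # Logged in CLAUDE.md but absent from the report.
        "missing_from_report": sorted(logged_set - reported_set),
    }
```

An empty list on both sides would mean the two records agree; anything
else is a candidate for the user to look at during REVIEW.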

Companion split surfaced by the same dogfood note:

- **Rigor *Current state*** lives in constitution.md (task-bound;
  archivable once the reproduction closes).
- **Open opportunities** lives in CLAUDE.md (durable; future-Cail
  returning to this directory inherits it via the auto-loaded walk-up).

This is consistent with the principle Cail named: "the rigor question
is more for the constitution... CLAUDE.md should always be useful even
as the user continues to do work afterwards."

Templates updated to match. SKILL.md's Self-review and Rigor sections
rewritten. Doc surface (docs/skills/lc-from-paper.md) carries the new
shape. "session" cleanup from the earlier reshape preserved.
---
 .../lightcone/skills/lc-from-paper/SKILL.md   |  22 ++--
 .../lc-from-paper/references/architect.md     |  39 ++----
 .../lc-from-paper/references/compare.md       |   8 +-
 .../lc-from-paper/references/implement.md     | 120 +++++-------------
 .../lc-from-paper/references/interview.md     |  22 ++--
 .../lc-from-paper/references/literature.md    |  47 ++-----
 .../skills/lc-from-paper/references/review.md |  18 ++-
 .../lc-from-paper/references/specify.md       | 112 +++++-----------
 .../skills/lc-from-paper/templates/CLAUDE.md  |  20 +--
 .../lc-from-paper/templates/constitution.md   |  12 +-
 docs/skills/lc-from-paper.md                  |  48 +++----
 11 files changed, 166 insertions(+), 302 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 61551744..053af1bb 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -22,7 +22,7 @@ The architecture is two-piece:
 
 2. **A ralph loop for the long middle.** Once the per-paper `constitution.md` is drafted (INTERVIEW) and the substrate is on disk (ACQUIRE), you launch a ralph loop against the constitution. Each iteration starts a fresh session with the constitution loaded into its system prompt, surveys the workdir, picks the next valuable move (typically one phase's worth of work), does it, commits, and exits. Iteration N+1 reads N's work cold, so per-phase review collapses into "the next iteration is the review."
 
-The whole thing is driven by **the per-paper `constitution.md`** at the reproduction workdir root, plus the auto-loading `CLAUDE.md` walk-up. The constitution describes the goal (what "done" looks like, the user's fidelity intent, scope, quality bar); CLAUDE.md carries the standing rules plus running accumulators (rigor state per output, paper-vs-code disagreements log). Every iteration picks up both on launch.
+The whole thing is driven by **the per-paper `constitution.md`** at the reproduction workdir root, plus the auto-loading `CLAUDE.md` walk-up. The split is intentional: the constitution is *task-bound* (what this reproduction is trying to achieve and how it's progressing — Goal, fidelity intent, scope, quality bar, plus the running rigor accumulators) and can be archived once the reproduction lands. CLAUDE.md is *durable* (rules, paper-vs-code disagreements, pointers to substrate) — it stays useful when the user comes back to do follow-on work in this directory. Every iteration picks up both on launch.
 
 ## Setup: git-tracked workdir
 
@@ -44,7 +44,7 @@ Nine phases (zero-indexed). INTERVIEW and ACQUIRE run before the loop, in the us
 | 7 | COMPARE | ralph iteration | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
 | 8 | REVIEW | user's main session | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
 
-COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are, how much they likely matter, and how they sit relative to the constitution's fidelity intent. A subsequent iteration decides whether to spend another IMPLEMENT round (close a gap that sits below intent) or land the reproduction at its current trajectory and log the gap as an open opportunity in CLAUDE.md's Rigor section. Once the COMPARE → IMPLEMENT loop terminates (verdict `pass`, or `partial` with the un-acted opportunities logged), a subsequent cold-survey iteration finds nothing left to do and flips the constitution's `status:` to `closed`. The loop terminates; REVIEW runs in the user's main session.
+COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are, how much they likely matter, and how they sit relative to the constitution's fidelity intent. A subsequent iteration decides whether to spend another IMPLEMENT round (close a gap that sits below intent) or land the reproduction at its current trajectory and log the gap into CLAUDE.md's Open opportunities. Once the COMPARE → IMPLEMENT loop terminates (verdict `pass`, or `partial` with the un-acted opportunities logged), a subsequent cold-survey iteration finds nothing left to do and flips the constitution's `status:` to `closed`. The loop terminates; REVIEW runs in the user's main session.
 
 ## The two pre-loop bookends
 
@@ -56,8 +56,8 @@ The interview must collect: (1) the paper (DOI / arXiv ID / code repo URL / prio
 
 These get drafted into **two files** in the reproduction workdir:
 
-- **`constitution.md`** — the ralph loop's driving document. Goal, Fidelity intent, Scope, Quality bar, Evidence (paper DOI, arXiv ID, code repo URL), Open dimensions. Starts with YAML frontmatter `status: active` so the ralph launcher accepts it. Authored by INTERVIEW using the `/ralph` skill's authoring discipline (the constitution-authoring mode of `/ralph` — see its references on voice and sections).
-- **`CLAUDE.md`** — the auto-loading walk-up. Paper identity at the top, Rules (universal across reproductions; leave the template's defaults), Rigor accumulator (starts empty), Disagreements log (starts empty), Pointers (to `constitution.md`, `work/reference/`, etc.).
+- **`constitution.md`** — the ralph loop's driving document. Goal, Fidelity intent, Scope, Quality bar, Evidence (paper DOI, arXiv ID, code repo URL), Rigor *Current state* per output (starts empty), Open dimensions. Starts with YAML frontmatter `status: active` so the ralph launcher accepts it. Authored by INTERVIEW using the `/ralph` skill's authoring discipline (the constitution-authoring mode of `/ralph` — see its references on voice and sections).
+- **`CLAUDE.md`** — the auto-loading walk-up. Paper identity at the top, Rules (universal across reproductions; leave the template's defaults), Disagreements log (starts empty), Open opportunities (starts empty), Pointers (to `constitution.md`, `work/reference/`, etc.).
 
 Templates ship in [`templates/constitution.md`](templates/constitution.md) and [`templates/CLAUDE.md`](templates/CLAUDE.md). Show the user both drafts, take corrections, refine, save.
 
@@ -94,13 +94,13 @@ Tell the user explicitly: "Launching the ralph loop in tmux session `<name>`. At
 
 Iterations follow the `/ralph` skill's Loop protocol — Survey → Work → Update → Exit. The per-paper specifics layered on top:
 
-- **Survey starts with the constitution + CLAUDE.md, then the workdir.** Read the constitution to remember the goal and the fidelity intent. Read CLAUDE.md's Rigor accumulator to know where each output currently sits relative to the quality bar. Then survey the workdir against the **Workdir-as-state** table below to identify the next phase that needs work.
+- **Survey starts with the constitution + CLAUDE.md, then the workdir.** Read the constitution end-to-end — Goal, Fidelity intent, Scope, Quality bar, and the Rigor *Current state* table for where each output currently sits relative to the quality bar. Skim CLAUDE.md for rules, paper-vs-code disagreements, and pointers. Then survey the workdir against the **Workdir-as-state** table below to identify the next phase that needs work.
 - **One phase per iteration is the typical shape.** Don't try to do ARCHITECT *and* SPECIFY in one iteration; the fresh-context property of the next iteration is what makes review work, and conflating phases collapses the seam. (Exceptions: small targeted fixes after COMPARE may touch multiple phases in one iteration if they're tightly coupled.)
 - **Phase reference is your working spec for the iteration.** Whichever phase is next, read its `references/<phase>.md` on entry. That file carries the discipline for that phase's work (what to produce, code-as-canonical, rigor adjustment, etc.).
-- **Self-review is the next iteration.** Where ARCHITECT/SPECIFY/LITERATURE/IMPLEMENT used to spawn fresh-context reviewer sub-agents per round (broken — sub-agents can't spawn sub-agents), the discipline now collapses into iteration boundaries: iteration N writes the artifact, iteration N+1 reads it fresh and reviews, iteration N+2 applies fixes if needed, until the review reaches its termination — one clean review-iteration for cheap, two consecutive clean for heavy, or a 5-iteration cap. Each iteration is fresh by construction; the no-bias property is free.
+- **Self-review is the next iteration.** Where ARCHITECT/SPECIFY/LITERATURE/IMPLEMENT used to spawn fresh-context reviewer sub-agents per round (broken — sub-agents can't spawn sub-agents), the discipline now collapses into a single review-and-fix iteration: iteration N writes the artifact, iteration N+1 reads it cold, reviews silently, applies any fixes inline, commits, and exits. One pass. No intermediate review file, no second clean check. If the user wants more rigor than a single review-and-fix pass delivers, that becomes a future loop (logged as an Open opportunity in CLAUDE.md).
 - **Parallel fan-out lives inside an iteration.** LITERATURE Haiku quote-finders, SPECIFY per-sub-analysis work, IMPLEMENT per-output work — these fan out as one-level-deep `Agent(...)` spawns inside the iteration's main session. Sub-agents can't spawn sub-agents, but an iteration *is* the main session, so it can spawn freely.
 - **`AskUserQuestion` is not available inside an iteration.** Each iteration runs in a detached tmux session; the user isn't reachable interactively. Iterations append questions to `open-questions.md` with their best-judgment default applied, and the user resolves them at REVIEW close-out (back in their main session).
-- **Update the accumulators in CLAUDE.md** before exit: Rigor *Current state* per output that the iteration changed; *Paper-vs-code disagreements* for any material conflict the iteration surfaced; *Open opportunities* for COMPARE-surfaced gaps.
+- **Update the accumulators** before exit: in `constitution.md`, the Rigor *Current state* per output that the iteration changed; in `CLAUDE.md`, the Paper-vs-code disagreements log for any material conflict the iteration surfaced and Open opportunities for any COMPARE-surfaced gap the iteration didn't act on.
 - **Sharpen the constitution body itself** if something fundamental shifted — the user's fidelity intent reframed, a sub-analysis decomposition rethought, a quality-bar item that's now more concrete. Don't accrete amendment sections; rewrite the affected prose.
 
 ## Workdir-as-state
@@ -132,15 +132,15 @@ REVIEW runs in your main session because `/figure-comparison` and `/check-senten
 
 **Workdir is the state.** No state machine, no resume mechanic — file existence + `git log` + `astra validate` answer "what phase am I on" deterministically. Each iteration's first move is to survey the workdir on entry against the table above.
 
-**CLAUDE.md is a state-expresser too.** Beyond the workdir's ground-truth files, CLAUDE.md carries running pointers — Rigor *Current state* per output, *Paper-vs-code disagreements*, *Open opportunities*, paper identity. Keep those pointers current at the **Update** step of each iteration's loop so the next cold survey reads them as fact, not stale.
+**Constitution is task-bound; CLAUDE.md is durable.** The constitution describes what *this reproduction* is trying to achieve and how it's progressing — Goal, Fidelity intent, Scope, Quality bar, Evidence, Rigor *Current state*, Open dimensions. Once the reproduction lands, the constitution can be archived. CLAUDE.md carries what stays useful past the reproduction — paper identity, rules, paper-vs-code disagreements, open opportunities for future tightening, pointers to substrate — so a user returning to this directory for follow-on work inherits it. When deciding where to put something new, ask: does it stay useful once the task is done?
 
 **Code-as-canonical, with disagreements recorded.** When the original codebase is at `work/reference/code/`, every iteration that touches a sub-analysis reads relevant code on entry. Where paper and code disagree on something material (a different choice would plausibly change a numeric result the paper reports), **code is canonical** for numerics, plotting, and method — but the disagreement is recorded: as a decision option in `astra.yaml` with both alternatives preserved, and as an entry in CLAUDE.md's *Paper-vs-code disagreements* section so it's visible to every iteration and to the user at REVIEW. Stylistic / cosmetic / pure-tooling differences aren't material — note them in `implementation-notes.md` and move on. Without this discipline, iterations drift to "looks right" rather than "matches" and material disagreements get silently absorbed.
 
-**Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at INTERVIEW as prose. Intent is partly aesthetic ("how good does this need to be?") and partly pragmatic ("what's feasible given the compute, tokens, time, and attention available?"). Both dimensions belong in the prose — *"just checking the analysis is tractable"*, *"Figure 3 must be right; the rest can stay rough"*, *"every primary and secondary target lining up within stated tolerance, two sessions max"*.
+**Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at INTERVIEW as prose. Intent is partly aesthetic ("how good does this need to be?") and partly pragmatic ("what's feasible given the compute, tokens, and wall-clock available?"). Both dimensions belong in the prose — *"just checking the analysis is tractable — an afternoon"*, *"Figure 3 must be right; the rest can stay rough — overnight"*, *"every primary and secondary target lining up within stated tolerance, a few days"*.
 
-Each iteration sizes its next move against that intent. Review happens via iteration boundaries: iteration N writes an artifact, N+1 reads it cold and either signs off or applies fixes. How many review passes an artifact gets depends on the gap between where it currently stands (CLAUDE.md's Rigor *Current state* — *sketch / baseline / tightened / canonical*) and what the intent says the user cares about. Cheap work gets one clean pass; canonical work waits for two consecutive clean passes; everything in between sizes by judgment. Update CLAUDE.md's Rigor *Current state* every iteration so the trajectory stays honest. (Work fan-out for the artifact-producing phases is separate; see "Parallel fan-out lives inside an iteration" above.)
+The mechanism is simple: each artifact gets one fresh-context review-and-fix pass. Iteration N writes, iteration N+1 reads N's work cold, reviews silently, applies any fixes inline, updates the constitution's Rigor *Current state* (*sketch / baseline / tightened*), commits, exits. Outputs that the intent wants pushed further than a single review-and-fix pass can deliver get an Open opportunity entry in CLAUDE.md; a later loop relaunch closes them. (Work fan-out for the artifact-producing phases is separate; see "Parallel fan-out lives inside an iteration" above.)
 
-**arxiv-LaTeX-first acquisition.** When the paper is on arxiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arxiv only.
+**arXiv-LaTeX-first acquisition.** When the paper is on arXiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arXiv only.
 
 **Use the up-to-date `astra` CLI surfaces.** When `astra validate` already does the job, call it directly. Specifically: `astra validate <file>`, `astra validate --verify-evidence`, `astra paper add`. Use whatever the current `astra --help` surfaces — don't write skill-specific wrappers.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 2af9aab7..924e3fb2 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -18,7 +18,7 @@ ARCHITECT is what a ralph iteration does when the workdir signals "ACQUIRE subst
 ## Outputs
 
 - `astra.yaml` at the project root — **stub form**: sub-analyses named, architecture wired (inputs / outputs declared at the sub-analysis level), high-level `narrative:` prose blocks per analysis. **No `decisions:`, `prior_insights:`, `findings:`, or `astra-anchor:` references yet** — those entries don't exist for the narrative to reference.
-- `CLAUDE.md` updates: Rigor *Current state* appended with the stub's state (e.g. *stub: baseline* after a single-iteration write, *stub: tightened* if this iteration was a review pass that incorporated fixes).
+- `constitution.md` updates: Rigor *Current state* appended with the stub's state (e.g. *stub: baseline* after a single-iteration write, *stub: tightened* if this iteration was a review pass that incorporated fixes).
 
 ## Step 1 — Read the substrate, then write the stub
 
@@ -80,21 +80,15 @@ analyses:
 - **Validate before exit.** `astra validate astra.yaml` must return clean.
 - **Targeted reads, not whole-paper absorption.** The indices give you most of what you need; reach into the source / document / code for specific items, not as a default.
 
-After the stub is written and validates, commit it (`architect: stub astra.yaml`) and update `CLAUDE.md`'s Rigor with the stub's state (e.g. *stub: baseline*).
+After the stub is written and validates, update the constitution's Rigor *Current state* with the stub's state (e.g. *stub: baseline*) and commit (`architect: stub astra.yaml`).
 
-## Review — the next iteration
+## Review-and-fix — the next iteration
 
-There is no in-iteration review-round mechanism. The ralph loop's iteration boundary *is* the fresh-context review: iteration N writes the stub; iteration N+1 reads it fresh and reviews; iteration N+2 applies fixes if any; the cycle terminates when two consecutive iterations find nothing to fix or after a 5-iteration cap on this artifact. The fresh-context-no-bias property is automatic at iteration boundaries.
+There is no in-iteration review-round mechanism. The ralph loop's iteration boundary *is* the fresh-context review, and it's a single pass: iteration N writes the stub; iteration N+1 reads it cold, reviews silently, applies any fixes inline, commits, and exits. Done. No intermediate findings file, no separate fix iteration, no second clean check. If the user wants more rigor than one review-and-fix pass delivers, that's a future loop (logged as an Open opportunity in CLAUDE.md).
 
-When a subsequent iteration enters, surveys, and finds the stub `astra.yaml` exists but `work/notes/architect/review-N.md` is missing (or the prior review iteration left findings to apply), this is what its work looks like:
+### When entering as the review-and-fix iteration
 
-### When entering as a review iteration
-
-Don't edit `astra.yaml` on the first review pass — read it fresh and write findings. Apply fixes in a follow-up iteration so the next fresh iteration can review the fixes too.
-
-Write findings to `work/notes/architect/review-<N>.md` (incrementing `<N>` based on existing files). For the first review iteration after the stub lands, `<N> = 1`; for the next, `<N> = 2`; and so on.
-
-### What to check
+The signal is "stub `astra.yaml` exists at project root, `decisions:` / `prior_insights:` / `findings:` blocks empty, no `architect: review-and-fix` commit yet in the log." Read the stub cold, then check:
 
 1. **Sub-analysis decomposition.** Right cuts? Consistent with `code-index.md`? Defensible against the paper where the paper compresses?
 2. **Sub-analysis IDs.** Noun phrases. No reserved-name collisions (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
@@ -103,29 +97,20 @@ Write findings to `work/notes/architect/review-<N>.md` (incrementing `<N>` based
 5. **Narrative coverage.** Root narrative includes a data-flow paragraph (when sub-analyses exist). Each sub-analysis's narrative accurately describes its role. No `astra-anchor:` references at this stage; flag any that snuck in.
 6. **Validates.** `astra validate astra.yaml` returns clean.
 
-### What NOT to do during review
+Apply fixes inline as you find them. Don't write a separate findings file; the diff against the prior commit is the record of what changed. Update the constitution's Rigor *Current state* (e.g. *stub: tightened*), commit (`architect: review-and-fix stub`), and exit. The next iteration's survey moves on to SPECIFY.
+
+### What NOT to do during review-and-fix
 
 - Don't flag empty `decisions:` / `prior_insights:` / `findings:`. That's SPECIFY's territory.
-- Don't edit `astra.yaml` on the review iteration itself — write findings, exit, let the next iteration apply fixes (and the iteration after that re-review the fixes).
 - Don't re-read the entire paper or code. Use the indices and targeted reads.
-
-### Review-fix pass
-
-The iteration after the review-iteration reads `work/notes/architect/review-<N>.md`, applies the fixes to `astra.yaml`, commits (`architect: apply review-N fixes`), updates `CLAUDE.md`'s Rigor (e.g. *stub: tightened* after review-N fixes land), and exits. The iteration after *that* is the next review-iteration — fresh context, no memory of the prior round's fixes.
-
-### Termination
-
-- If two consecutive `work/notes/architect/review-<N>.md` files both have verdict `clean`, ARCHITECT is done; the next iteration's survey advances to SPECIFY.
-- If 5 review iterations have happened without two consecutive clean rounds, log the unfinished tail to `open-questions.md` ("ARCHITECT review reached round cap with N fixes still landing; user should review during REVIEW close-out") and let the next iteration advance to SPECIFY anyway. Don't loop forever on stub-level review.
-- If the iteration's fidelity-intent assessment calls for *cheap* — verdict `pass` on the first review-iteration is enough; skip the second-clean-round requirement and move on. The Rigor accumulator stays *stub: baseline*.
+- Don't open a second review pass on the same stub. One pass is the protocol; further tightening waits for a future loop.
 
 ## Survey signals (entry into ARCHITECT)
 
 - `work/reference/index.json` + `work/reference/astra.yaml` + `work/reference/code-index.md` (when code present) exist ⇒ ACQUIRE substrate is ready
 - `astra.yaml` at project root absent (or present-but-empty) ⇒ this iteration writes the stub
-- `astra.yaml` exists, validates clean, sub-analyses + inputs + outputs + narrative populated, `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty, but no `work/notes/architect/review-1.md` ⇒ this iteration writes review-1
-- `review-N.md` exists with `needs-fixes` verdict, fixes not yet applied ⇒ this iteration applies the fixes
-- Two consecutive `review-<N>.md` files both `clean` ⇒ ARCHITECT done; next iteration surveys for SPECIFY
+- `astra.yaml` exists, validates clean, sub-analyses + inputs + outputs + narrative populated, `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty, no `architect: review-and-fix` commit in the log ⇒ this iteration is the review-and-fix pass
+- `architect: review-and-fix` commit landed ⇒ ARCHITECT done; next iteration surveys for SPECIFY
 
 ## Notes
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/compare.md b/claude/lightcone/skills/lc-from-paper/references/compare.md
index 462008e4..fc6746cd 100644
--- a/claude/lightcone/skills/lc-from-paper/references/compare.md
+++ b/claude/lightcone/skills/lc-from-paper/references/compare.md
@@ -2,7 +2,7 @@
 
 Compare reproduced results against the paper's replication targets. COMPARE returns two things: a **verdict** (pass / partial / fail) and an **opportunity assessment** — where the gaps are, how much they likely matter, and how they sit relative to the user's fidelity intent in `constitution.md`'s Goal section. The verdict drives whether a subsequent iteration retries IMPLEMENT; the opportunity assessment tells the next iteration (and the user at REVIEW) which gaps fall below intent and would be high-leverage to close, even on `pass`. Together they replace the old yes/no framing.
 
-COMPARE is what a ralph iteration does when the workdir signals "RUN done (`results/` materialized) + `comparison-report.yaml` absent or stale relative to latest RUN." The iteration writes the report; what happens next depends on the verdict and the iteration's read of the constitution's Fidelity intent. If verdict is `partial`/`fail` AND an opportunity is below intent AND attempt budget remains, the next iteration takes a retry attempt at IMPLEMENT against the failing outputs first. If verdict is `pass` AND no opportunities are below intent (or budget is exhausted), the iteration logs un-acted opportunities into CLAUDE.md's **Rigor** *Open opportunities*; a subsequent cold-survey iteration with no contributions closes the constitution and REVIEW runs in the user's main session.
+COMPARE is what a ralph iteration does when the workdir signals "RUN done (`results/` materialized) + `comparison-report.yaml` absent or stale relative to latest RUN." The iteration writes the report; what happens next depends on the verdict and the iteration's read of the constitution's Fidelity intent. If attempt budget remains AND (verdict is `partial`/`fail` OR an opportunity is below intent), the next iteration takes a retry attempt at IMPLEMENT against the failing or below-intent outputs first. If verdict is `pass` AND no opportunities are below intent, OR the budget is exhausted, the iteration logs un-acted-on opportunities into CLAUDE.md's *Open opportunities*; a subsequent cold-survey iteration with no contributions closes the constitution and REVIEW runs in the user's main session.
 
 ## Inputs
 
@@ -101,7 +101,7 @@ Also write `comparison-report.md` with a human-readable summary. For figure / ta
 After writing the report, the iteration acts against the fidelity intent (iterations run detached; the user isn't reachable interactively):
 
 - If attempt < budget AND (verdict is `partial` / `fail` OR any opportunity is `below` intent), commit the report, exit. The next iteration surveys, sees the report's `below`-intent opportunities, and takes a retry attempt at IMPLEMENT targeting those gaps first.
-- If verdict is `pass` AND no opportunities are `below` intent, OR attempt budget is exhausted, log un-acted opportunities into CLAUDE.md's **Rigor** *Open opportunities* list, commit. A subsequent cold-survey iteration (no contributions) closes the constitution by flipping `status:` to `closed`, and REVIEW close-out runs in the user's main session.
+- If verdict is `pass` AND no opportunities are `below` intent, OR attempt budget is exhausted, log un-acted-on opportunities into CLAUDE.md's *Open opportunities* list, commit. A subsequent cold-survey iteration (no contributions) closes the constitution by flipping `status:` to `closed`, and REVIEW close-out runs in the user's main session.
 
 The verdict is the iteration's judgment from the data; the **decision to keep iterating or close** happens by iteration boundary — one iteration writes the report and the take, the next surveys and decides whether to retry or accept. The opportunity assessment — graded against the user's fidelity intent — is the bridge that turns a binary verdict into a picture the next iteration (and REVIEW) can navigate.
 
@@ -109,10 +109,10 @@ The verdict is the iteration's judgment from the data; the **decision to keep it
 
 - All outputs in `lc status --universe baseline` are `ok` ⇒ ready to compare
 - `comparison-report.yaml` exists with current `attempt` ⇒ COMPARE done for this attempt
-- `comparison-report.yaml` verdict is `pass` (or `partial` with un-acted opportunities logged to CLAUDE.md as Open opportunities) ⇒ COMPARE → IMPLEMENT loop terminated; the next cold-survey iteration closes the constitution and REVIEW runs in the user's main session
+- `comparison-report.yaml` verdict is `pass` (or `partial` with un-acted-on opportunities logged into CLAUDE.md's *Open opportunities*) ⇒ COMPARE → IMPLEMENT loop terminated; the next cold-survey iteration closes the constitution and REVIEW runs in the user's main session
 
 ## Notes
 
 - **One COMPARE per IMPLEMENT.** Each IMPLEMENT retry produces a fresh COMPARE; the report's `attempt` field increments. Do not overwrite prior reports — keep them at `comparison-report-attempt-<N>.yaml` if useful, or commit each between attempts so `git log` carries the history.
 - **The verdict is the iteration's judgment from the data; the keep-iterating decision happens at iteration boundary.** One iteration writes the report and the take on what should happen next; the next iteration surveys, reads the take, and either retries or accepts. The user's voice enters at REVIEW close-out, not mid-loop.
-- **The opportunity assessment is part of the durable record.** When the user accepts the current verdict, propagate the un-acted-on opportunities into CLAUDE.md's **Rigor** section's *Open opportunities* list. Future sessions and future-Cail returning to this reproduction see them; tightening any becomes a future IMPLEMENT pass against a clearer target.
+- **The opportunity assessment stays accessible past close-out.** Un-acted-on opportunities sit in CLAUDE.md's *Open opportunities* list, which is durable and auto-loaded in any future Claude Code session in this workdir. Tightening any of them becomes a future IMPLEMENT pass against a clearer target.
diff --git a/claude/lightcone/skills/lc-from-paper/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
index 6c5f16e1..dfaf57f4 100644
--- a/claude/lightcone/skills/lc-from-paper/references/implement.md
+++ b/claude/lightcone/skills/lc-from-paper/references/implement.md
@@ -1,6 +1,6 @@
-# IMPLEMENT — write scripts and recipes; review by iteration boundary
+# IMPLEMENT — write scripts and recipes; one review-and-fix pass
 
-Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, review by iteration boundary cross-checks the implementation against paper + code — same fresh-context-no-bias shape ARCHITECT, SPECIFY, and LITERATURE use, with the fresh-context property given for free by iteration boundaries.
+Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, one fresh-context iteration reads it cold, reviews against paper + code, applies fixes inline, and exits. One pass — same shape ARCHITECT and SPECIFY use.
 
 IMPLEMENT is what a ralph iteration does when the workdir signals "SPECIFY done + scripts/ absent (first pass) or comparison-report.yaml shows partial/fail (retry pass)". Most implementation is mechanical (translate spec → script). Where parallelization is feasible (multiple independent outputs from different scripts), the iteration fans out to one-level-deep sub-agents per output (inside its own main session) and merges.
 
@@ -11,16 +11,16 @@ IMPLEMENT is what a ralph iteration does when the workdir signals "SPECIFY done
 - `work/reference/index.json` — paper-side structural index (figures, tables, outline, citations); useful when the spec compresses or you need to find where in the paper a behavior is described.
 - `work/reference/code-index.md` (when code present) — code inventory: module map, candidate decisions with file:line, entry-points, data dependencies, gotchas (the canonical map of where each sub-analysis's logic lives in `work/reference/code/`).
 - `work/reference/code/` (if present) — **canonical reference. Read it when implementing each output.** Where paper and code disagree, code wins for numerics, plotting, and method.
-- `constitution.md` — Fidelity intent (used to size cheap vs heavy on this iteration's review).
-- CLAUDE.md — Rigor *Current state* per output; **Paper-vs-code disagreements** for prior conflicts already logged.
+- `constitution.md` — Fidelity intent, Rigor *Current state* per output.
+- `CLAUDE.md` — **Paper-vs-code disagreements** for prior conflicts already logged.
 
 ## Outputs
 
 - `scripts/<output>.py` (or `.sh`, or whatever fits) — one script per output (or shared scripts for tightly-coupled outputs)
 - `requirements.txt` — Python dependencies
 - Recipes in `astra.yaml` — each output gets a `recipe:` block with `command:` and `inputs:`
-- `work/notes/implement-review/round-<N>.md` — each review iteration's findings (one file per review-iteration; how many depends on the fidelity-intent calculus)
-- CLAUDE.md updates — append to **Paper-vs-code disagreements** for any new conflict surfaced during implementation; update **Rigor** *Current state* with the post-iteration state per output (e.g. *baseline* after a one-iteration pass, *tightened* after a review-iteration applied fixes).
+- `constitution.md` updates — Rigor *Current state* per output (e.g. *baseline* after the write iteration, *tightened* after the review-and-fix iteration)
+- `CLAUDE.md` updates — append to **Paper-vs-code disagreements** for any new conflict surfaced during implementation
 
 ## Step 1: write recipes + scripts
 
@@ -60,79 +60,29 @@ The iteration merges scripts and recipes after the per-output sub-agents finish.
 5. **Do not execute scripts** — the RUN phase handles execution via `lc run`.
 6. **Validate** with `astra validate astra.yaml` after adding recipes.
 
-## Step 2: review by iteration boundary
-
-After the first-pass implementation lands, the cross-check question is: is the implementation consistent with the paper and the code? The depth is sized from the gap between CLAUDE.md's Rigor *Current state* and `constitution.md`'s Fidelity intent.
-
-The iteration that wrote the first pass exits when `scripts/`, recipes, and `requirements.txt` are committed; the next iteration enters fresh, surveys, finds the implementation present but no `work/notes/implement-review/round-1.md`, reads `scripts/` + `astra.yaml`'s recipes + the paper, and writes findings to `round-1.md`. The iteration after that applies the fixes. Two consecutive review-iterations with verdict `clean` terminates the review cycle; the next iteration advances to RUN. Sized: *cheap* — accept after one clean review-iteration; *heavy* — require two consecutive clean.
-
-The discipline is the same shape ARCHITECT, SPECIFY, and LITERATURE use: review is fresh-context, prompted to check "is the implementation consistent with the paper and the code?", outputs findings only — not edits. Fixes are applied between iterations by the next iteration. Pattern-matching on prior fixes defeats the cross-check; the no-bias rule is load-bearing.
-
-### Per-round fresh reviewer — system prompt
-
-> You are a paper-vs-implementation review agent. Read the implementation (`scripts/`, `astra.yaml` recipes), the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers; do not assume anything has already been fixed.
->
-> ### Inputs
->
-> - `scripts/` — first-pass implementation
-> - `astra.yaml` — the spec (recipes are part of the implementation; structural + content fields are ARCHITECT's and SPECIFY's)
-> - `implementation-notes.md`
-> - `work/reference/index.json` — Grep into; do not re-read whole
-> - `work/reference/code-index.md` (when present) — natural decomposition + entry-points + gotchas
-> - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep)
-> - `work/reference/code/` (when present) — canonical reference for numerics + method
->
-> ### What to check
->
-> 1. **Recipe coverage.** Every output in `astra.yaml` has a recipe; every recipe runs a script that exists in `scripts/`.
-> 2. **Method fidelity.** For each output, the script implements the method described by the relevant sub-analysis's `decisions:` and `findings:` in `astra.yaml`. Where SPECIFY's code pass surfaced a material disagreement, the script follows the code's method (canonical-resolution rule), unless the spec recorded a different override in `decisions:` and `universes/baseline.yaml`.
-> 3. **Numerical correctness.** Constants, hyperparameters, threshold values match the paper (or the code, where the canonical-resolution rule applied). Flag mismatches with `path:line` of the script and the paper §/eq + the relevant `astra.yaml#analyses.<sub-id>.decisions.<key>` entry.
-> 4. **Data acquisition.** Scripts that fetch data use the real acquisition path from `astra.yaml`'s inputs — no synthetic / mock substitutes.
-> 5. **Determinism.** Scripts set random seeds where the paper's method is stochastic. Library versions in `requirements.txt` are pinned where reproducibility requires it.
-> 6. **Recipe wiring.** Recipe `inputs:` references match the data-flow the scripts actually consume; no orphan dependencies, no missing dependencies.
->
-> ### What NOT to do
->
-> - **Do not edit any file.** Your output is a findings file; an IMPLEMENT-fix pass responds to the findings.
-> - **Do not re-read the entire paper.** Grep into `work/reference/index.json`, `work/reference/code-index.md`, and `work/reference/source/` (or `document.md`) for specific items. The filled `astra.yaml` is your primary source for what each sub-analysis is supposed to do.
-> - **Do not invent problems.** If the implementation matches paper + code, say so briefly.
-> - **Do not assume a prior reviewer has been here.** You are fresh. First-principles read only.
->
-> ### Output format — `work/notes/implement-review/round-<N>.md`
->
-> ```markdown
-> # Implement-review round <N>
->
-> Reviewer ran fresh against scripts/, recipes in astra.yaml, paper, and code.
->
-> ## Findings
->
-> ### <category — e.g. "Method fidelity" / "Numerical correctness" / "Recipe wiring">
->
-> - **<one-line finding>**
->   - **What's wrong**: <quote or `script:line` of the implementation problem>
->   - **Where to fix**: <`scripts/<file>.py:line` or `astra.yaml#path/to/recipe`>
->   - **Suggested fix**: <one-line concrete change>
->   - **Source**: <paper §X.Y "quote" + `astra.yaml#analyses.<sub-id>.decisions.<key>` evidence, or code `path:line`>
->
-> ## Verdict
->
-> - **fixes_needed**: <count>
-> - **clean** | **needs-fixes**
-> ```
-
-### Step 3: IMPLEMENT-fix pass between rounds
-
-After each round's findings file lands, the iteration edits `scripts/`, `astra.yaml` recipes, `requirements.txt`, and `implementation-notes.md` per the suggested fixes. After any change to `astra.yaml`, run `astra validate astra.yaml`.
-
-### Step 4: termination check
-
-- **Cheap:** one pass. Done after fixes (or immediately, if `fixes_needed` was 0).
-- **Heavy:**
-  - If round N's `fixes_needed` was 0 AND round (N-1)'s was also 0 → done.
-  - If N hits the 5-round system cap without two consecutive clean rounds, an iteration logs the unfinished tail in `open-questions.md` ('IMPLEMENT review reached round cap with N fixes still landing; user should review during REVIEW close-out') and the next iteration advances to RUN anyway.
-
-The IMPLEMENT-review iterations are independent of the COMPARE → IMPLEMENT retry loop — review iterations run before RUN, on the spec/implementation alignment side; COMPARE retries run after RUN, on the result-matching side.
+## Step 2: review-and-fix — the next iteration
+
+After the first-pass implementation lands, one fresh-context iteration reads it cold, reviews silently against paper + code, applies any fixes inline, commits, exits.
+
+The cross-check question on entry: is the implementation consistent with the paper and the code?
+
+### What to check
+
+1. **Recipe coverage.** Every output in `astra.yaml` has a recipe; every recipe runs a script that exists in `scripts/`.
+2. **Method fidelity.** For each output, the script implements the method described by the relevant sub-analysis's `decisions:` and `findings:` in `astra.yaml`. Where SPECIFY's code pass surfaced a material disagreement, the script follows the code's method (canonical-resolution rule), unless the spec recorded a different override in `decisions:` and `universes/baseline.yaml`.
+3. **Numerical correctness.** Constants, hyperparameters, threshold values match the paper (or the code, where the canonical-resolution rule applied). Flag mismatches with `path:line` of the script and the paper §/eq + the relevant `astra.yaml#analyses.<sub-id>.decisions.<key>` entry.
+4. **Data acquisition.** Scripts that fetch data use the real acquisition path from `astra.yaml`'s inputs — no synthetic / mock substitutes.
+5. **Determinism.** Scripts set random seeds where the paper's method is stochastic. Library versions in `requirements.txt` are pinned where reproducibility requires it.
+6. **Recipe wiring.** Recipe `inputs:` references match the data-flow the scripts actually consume; no orphan dependencies, no missing dependencies.
+
+Apply fixes inline as you find them: `scripts/`, `astra.yaml` recipes, `requirements.txt`, `implementation-notes.md`, and the disagreements log in CLAUDE.md when a new material conflict surfaces. After any change to `astra.yaml`, run `astra validate astra.yaml`. Update the constitution's Rigor *Current state* per output (e.g. *baseline → tightened*), commit (`implement: review-and-fix`), and exit. The next iteration's survey advances to RUN.
+
+### What NOT to do during review-and-fix
+
+- **Don't re-read the entire paper.** Grep into `work/reference/index.json`, `work/reference/code-index.md`, and `work/reference/source/` (or `document.md`) for specific items.
+- **Don't open a second review pass.** One pass is the protocol. If the implementation needs more rigor than this delivers, log an Open opportunity in CLAUDE.md and let a future loop close it.
+
+The post-RUN COMPARE → IMPLEMENT retry loop is separate from this review-and-fix pass — that loop handles result-matching after the pipeline executes, not spec/implementation alignment before it.
 
 ## Data: REAL DATA ONLY
 
@@ -144,16 +94,15 @@ If a dataset is behind a paywall, requires registration, or is "available upon r
 
 ## Retry attempts (post-COMPARE)
 
-If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, a subsequent iteration may take on a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. Default attempt budget is 5; the iteration's first move is to check whether `attempt` in the report has reached the budget. If it has, accept partial, log the failure in CLAUDE.md's **Rigor** section as an open opportunity (so REVIEW close-out can decide whether to push further or accept the trajectory), and exit; subsequent iterations either accept the verdict via a cold close or pivot scope based on REVIEW's input.
+If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, a subsequent iteration may take on a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. Default attempt budget is 5; the iteration's first move is to check whether `attempt` in the report has reached the budget. If it has, accept partial, log the failure as an Open opportunity in CLAUDE.md (so REVIEW close-out can decide whether to push further or accept the trajectory), and exit; subsequent iterations either accept the verdict via a cold close or pivot scope based on REVIEW's input.
 
-A retry attempt re-runs IMPLEMENT review (by iteration boundary) on the changed scripts before the next iteration advances to RUN.
+A retry attempt re-runs the IMPLEMENT review-and-fix pass on the changed scripts before the next iteration advances to RUN.
 
 ## Survey signals (entry into IMPLEMENT)
 
 - `astra.yaml` validates and `implementation-notes.md` exists ⇒ ready to implement first pass
 - `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ first-pass IMPLEMENT done
-- For cheap: `work/notes/implement-review/round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ IMPLEMENT done
-- For heavy: two consecutive `work/notes/implement-review/round-<N>.md` files both have verdict `clean` ⇒ IMPLEMENT done; the next iteration surveys and advances to RUN
+- An `implement: review-and-fix` commit lands ⇒ IMPLEMENT done; the next iteration surveys and advances to RUN
 - `comparison-report.yaml` returns `pass` ⇒ COMPARE → IMPLEMENT loop terminated; the constitution can close after a cold survey, and REVIEW close-out runs in the user's main session
 
 ## Notes
@@ -161,6 +110,5 @@ A retry attempt re-runs IMPLEMENT review (by iteration boundary) on the changed
 - **`lc run` is the canonical execution surface.** Scripts assume they will be invoked via the lightcone-cli runner. Do not hard-code working directories or assume environment activation.
 - **Determinism where possible.** Set random seeds, fix library versions, prefer reproducible installations. The IMPLEMENT goal is not just "produces output once" but "reproducibly produces output across runs."
 - **Tight coupling earns shared scripts.** When two outputs come from the same expensive computation (e.g. an MCMC produces both a parameter chain and a summary statistic), one script with multiple output paths is cleaner than two scripts that each re-do the work.
-- **The fresh-context discipline is the same as ARCHITECT's, SPECIFY's, and LITERATURE's review.** A reviewer that sees the prior round's findings stops finding the next class of inconsistency. Iteration boundaries give fresh context automatically.
-- **Minimize churn in fixes.** Targeted edits, not restructures. Big restructures defeat the round-over-round comparison the iteration sequence uses to decide termination.
-- **Commit per output as it lands.** One commit per script + recipe wiring; one commit per review-round file; one commit per fix pass. The next iteration reads `git log` to track progress.
+- **One review-and-fix pass per artifact.** The fresh-context property is automatic at iteration boundaries. Re-opening a second pass on the same artifact in the same loop is an anti-pattern; log an Open opportunity and let a future loop close it.
+- **Commit as you go.** One commit per script + recipe wiring; one commit for the review-and-fix pass. The next iteration reads `git log` to track progress.
diff --git a/claude/lightcone/skills/lc-from-paper/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/interview.md
index 3d1795a2..57fd4b79 100644
--- a/claude/lightcone/skills/lc-from-paper/references/interview.md
+++ b/claude/lightcone/skills/lc-from-paper/references/interview.md
@@ -10,8 +10,8 @@ The interview is short. Three to six `AskUserQuestion` rounds, total. The user d
 
 Two files at the reproduction workdir root:
 
-- **`constitution.md`** — drafted from [`../templates/constitution.md`](../templates/constitution.md). YAML frontmatter `status: active`, then Goal, Fidelity intent, Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL, where the substrate lives), Open dimensions. The ralph loop's driving document; each iteration reads it on entry. Sharpens slowly; the user can revise it at any point (including mid-loop — successive iterations re-read it).
-- **`CLAUDE.md`** — drafted from [`../templates/CLAUDE.md`](../templates/CLAUDE.md). Paper identity at the top (DOI, title, one-line subject), Rules (universal across reproductions; leave the template's defaults), Rigor accumulator (starts empty; iterations append), Disagreements log (starts empty; iterations append), Pointers (to `constitution.md`, `work/reference/`, etc.). The auto-loading walk-up; every Claude Code session in the workdir picks it up.
+- **`constitution.md`** — drafted from [`../templates/constitution.md`](../templates/constitution.md). YAML frontmatter `status: active`, then Goal, Fidelity intent, Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL, where the substrate lives), Rigor *Current state* per output (starts empty; iterations append), Open dimensions. The ralph loop's driving document; each iteration reads it on entry. The top half (Goal, Scope, Quality bar, Evidence) sharpens slowly; the bottom half (Rigor *Current state*) is updated each iteration. Task-bound — archivable once the reproduction closes.
+- **`CLAUDE.md`** — drafted from [`../templates/CLAUDE.md`](../templates/CLAUDE.md). Paper identity at the top (DOI, title, one-line subject), Rules (universal across reproductions; leave the template's defaults), Disagreements log (starts empty; iterations append), Open opportunities (starts empty; iterations append), Pointers (to `constitution.md`, `work/reference/`, etc.). The auto-loading walk-up; every Claude Code session in the workdir picks it up. Durable — stays useful for any follow-on work in this directory once the reproduction lands.
 
 There is no separate "constitution skill" invocation — `/ralph`'s Authoring mode (Study → Draft → Refine → Launch) is what you're following here; the constitution authoring discipline + reference materials live there. Pull the discipline mentally; the deliverable is these two markdown files.
 
@@ -29,7 +29,7 @@ Use `AskUserQuestion` for whatever the user did not supply on `/lc-from-paper` i
 
 - **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).
 - **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) When code is available, every iteration that touches a sub-analysis reads from `work/reference/code/` and treats code as canonical for numerics + method. This is recorded in CLAUDE.md's Rules.
-- **User's prior familiarity.** Has the user reproduced this paper before? Read it recently? Worked with the original authors? Affects how much you'd lean toward heavy in-iteration review on first iterations.
+- **User's prior familiarity.** Has the user reproduced this paper before? Read it recently? Worked with the original authors? Useful context for the iterations (and for the user at REVIEW).
 - **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; iterations will read it during ARCHITECT.
 
 ### 2. Scope the reproduction
@@ -50,19 +50,19 @@ A reproduction can land anywhere from a quick "does this even run" sanity check
 
 The job is to **elicit prose intent** — the user's own words for what "good enough" looks like for this stretch — and capture it into `constitution.md`'s Goal section. Reach for whichever pivot fits; you usually only need one or two:
 
-- *"What's the right shape for this stretch — a quick check that the analysis is tractable, getting one specific figure right, or a full match across primary targets? How much compute and time do you have to spend on it?"*
+- *"What's the right shape for this stretch — a quick check that the analysis is tractable, getting one specific figure right, or a full match across primary targets? How much compute and wall-clock do you have to spend on it?"*
 - *"Is there a specific result you care about more than the rest, where you'd want full fidelity even if the others stay rough?"*
 - *"Are you trying to verify the paper, build on it, or critique it? That shifts where the fidelity bar wants to sit."*
-- *"If this took several sessions to reach high fidelity everywhere, is that the right investment? Or would a working version in a couple of sessions be enough to decide where to push further?"*
+- *"If this took a few days of iteration to reach high fidelity everywhere, is that the right investment? Or would a working version overnight be enough to decide where to push further?"*
 
 Record the answer verbatim or in close paraphrase under **Fidelity intent** in `constitution.md`'s Goal section. Concrete examples of what good prose intent looks like:
 
-- *"Just checking if the analysis is tractable — quick sanity that some headline number comes out close."*
-- *"I care about Figure 3 being right. The rest can stay rough."*
-- *"Full fidelity on the BAO fit specifically; the rest can stay rough. One session of compute."*
-- *"Every primary and secondary target lining up within stated tolerance, every paper-vs-code conflict adjudicated. Two sessions max."*
+- *"Just checking if the analysis is tractable — quick sanity that some headline number comes out close. An afternoon."*
+- *"I care about Figure 3 being right. The rest can stay rough. Overnight if needed."*
+- *"Full fidelity on the BAO fit specifically; the rest can stay rough. A day or two."*
+- *"Every primary and secondary target lining up within stated tolerance, every paper-vs-code conflict adjudicated. No hard deadline."*
 
-Each iteration reads this when sizing its next move, and COMPARE grades opportunities against it. If the user genuinely doesn't know yet, write that — *"Not sure yet; let's get something running and revisit"* is itself useful intent, and they can sharpen it at any future REVIEW.
+Time/compute bounds are part of the intent — the user's spendable budget shapes what "good enough" can mean for this stretch. Each iteration reads the intent when sizing its next move; COMPARE grades opportunities against it. If the user genuinely doesn't know yet, write that — *"Not sure yet; let's get something running and revisit"* is itself useful intent, and they can sharpen it at any future REVIEW.
 
 ### 4. Paper-specific conventions or warnings
 
@@ -75,7 +75,7 @@ Light touch. Ask the user if there's anything they want every iteration to know
 Open both templates side-by-side:
 
 - [`../templates/constitution.md`](../templates/constitution.md) — fill in the header, Goal (with fidelity intent), Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL — these are the user-supplied identifiers; the substrate-path bullets in the template stay as boilerplate, naming where each substrate lives on disk), Open dimensions. Leave the YAML frontmatter `status: active` intact.
-- [`../templates/CLAUDE.md`](../templates/CLAUDE.md) — fill in the header (paper title + arXiv ID + DOI + one-line subject), any paper-specific Pointers. Leave Rules in the template state (universal across reproductions). Leave Rigor and Disagreements sections empty — iterations populate them.
+- [`../templates/CLAUDE.md`](../templates/CLAUDE.md) — fill in the header (paper title + arXiv ID + DOI + one-line subject), any paper-specific Pointers. Leave Rules in the template state (universal across reproductions). Leave the Disagreements log and Open opportunities sections empty — iterations populate them.
 
 Show both drafts to the user, take corrections, refine, save. Then `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline) and commit both as the first commit.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index 1c4b5d89..f0372c1e 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -13,8 +13,7 @@ LITERATURE is what a ralph iteration does when the workdir signals "SPECIFY done
 - `astra.yaml` — filled by SPECIFY's paper (and code) passes; each sub-analysis has `prior_insights:` entries shaped as syntactically-complete `Insight` blocks (`id`, `claim`, `created_at`, `evidence: [{id, doi}]`) where each Evidence carries a `doi:` but no `quote:` selector. These are the placeholders LITERATURE resolves by writing `quote: {exact, prefix, suffix}` and `location: {page}` onto each Evidence entry. The option↔insight linkage already lives on the option side (`Option.insights`); LITERATURE does not touch it.
 - `work/reference/index.json#citations` — paper-extraction's cite-key → `{locations, citation, doi}` mapping for every entry in the target paper's bibliography. Used as the canonical cite-key → DOI lookup when cross-checking placeholder DOIs and surfacing unresolved-DOI cases.
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — target paper text. Grep into for context on how the cited paper is invoked, when a placeholder's claim is ambiguous.
-- `constitution.md` — Fidelity intent (used to size cheap vs heavy on this iteration's review).
-- CLAUDE.md — Rigor *Current state* per output (so this iteration knows where prior insights currently sit).
+- `constitution.md` — Fidelity intent; Rigor *Current state* per output (so this iteration knows where prior insights currently sit).
 
 ## Outputs
 
@@ -165,40 +164,23 @@ Rules:
 
 When the iteration fans out to Haikus, each Haiku is spawned with `model="haiku"` and gets this contract plus its assigned subset of placeholders and substrate paths.
 
-## Review by iteration boundary
+## Review-and-fix — the next iteration
 
-After the merge lands, the cross-check question is: do the `evidence:` quotes belong to the cited paper at the cited page? Do the quotes actually justify the placeholders' claims, or are they technically present but tangential? Do the claims actually support the decision options that reference them via `Option.insights`?
+After the merge lands, one fresh-context iteration reads cold, runs `astra validate --verify-evidence` for the deterministic check, does a semantic re-read of each resolved insight, applies fixes inline, commits, exits.
 
-The iteration that did the merge exits; the next iteration enters fresh, surveys, finds `astra.yaml`'s `prior_insights:` Evidence entries populated with resolved `quote:` + `location:` selectors but no `work/notes/literature-review/round-N.md`, runs `astra validate --verify-evidence` for the deterministic check + a semantic re-read of each resolved insight, and writes review findings. The iteration after that applies the fixes (which may include re-running Haiku quote-finding for entries that need a different quote). Two consecutive review-iterations with verdict `clean` terminates the review cycle.
+The cross-check questions on entry:
 
-Sized from the constitution's Fidelity intent: *cheap* — one clean review-iteration is enough; *heavy* — require two consecutive clean.
+1. **Evidence integrity.** `astra validate --verify-evidence` handles the deterministic check; do the semantic check yourself.
+2. **Evidence justifies claim.** Does the quote actually support the claim, or is it tangential?
+3. **Claim supports the decision.** Does the placeholder's claim justify the decision option that references it via `Option.insights`?
+4. **Cited paper is the right paper.** Does the target paper actually invoke this DOI for this claim?
+5. **Unresolved entries are honest.** For entries in `open-questions.md` flagged unresolved, does a closer read of the cited paper find supporting evidence the resolver missed?
 
-### Per-round fresh reviewer — prompt shape
+Apply fixes inline as you find them, directly in `astra.yaml`'s `prior_insights:` entries (re-running Haiku quote-finding for entries that need a different quote, when the gap is mechanical rather than semantic). Commit (`literature: review-and-fix`), update the constitution's Rigor *Current state* (e.g. *baseline → tightened*), exit. The next iteration's survey advances to IMPLEMENT.
 
-```
-You are a LITERATURE reviewer. Read astra.yaml's prior_insights:
-entries, the cited papers (substrate at work/cited/<doi-slug>/), and
-the target paper. Report inconsistencies. You are one of several
-independent reviewers; assume nothing has been fixed.
-
-Check:
-  1. Evidence integrity. (astra validate --verify-evidence handles the
-     deterministic check; you do the semantic check.)
-  2. Evidence justifies claim. Does the quote actually support the
-     claim, or is it tangential?
-  3. Claim supports the decision. Does the placeholder's claim justify
-     the linked decision option?
-  4. Cited paper is the right paper. Does the target paper actually
-     invoke this DOI for this claim?
-  5. Unresolved entries are honest. For entries in open-questions.md
-     flagged unresolved, does a closer read of the cited paper find
-     supporting evidence the resolver missed?
-
-Output findings to work/notes/literature-review/round-<N>.md, one fix
-per F-N entry. Verdict is `clean` or a count. Do NOT edit astra.yaml.
-```
+If an entry genuinely has no supporting quote in the cited paper, log it to `open-questions.md` with a "no support found" note and leave the entry as-is for the user to resolve at REVIEW. Don't fabricate evidence.
 
-If 5 review-iterations have happened without two consecutive clean rounds, log the unfinished tail in `open-questions.md` ("LITERATURE review reached round cap with N fixes still landing; user should review during REVIEW close-out") and let the next iteration advance to IMPLEMENT anyway. Don't loop forever on literature review.
+One pass. If LITERATURE needs more rigor than this delivers, that's an Open opportunity for a future loop.
 
 ## Survey signals (entry into LITERATURE)
 
@@ -207,8 +189,7 @@ If 5 review-iterations have happened without two consecutive clean rounds, log t
 - `work/notes/literature/resolutions.yaml` exists with non-empty resolutions / unresolved sections ⇒ quote-finding done
 - `astra.yaml`'s `prior_insights:` entries each have a resolved `quote:` (+ `location:`) selector on their Evidence ⇒ merge done
 - `astra validate astra.yaml --verify-evidence` returns clean ⇒ structural validation done
-- For cheap: at least one `work/notes/literature-review/round-<N>.md` with verdict `clean` (or no fixes were incorporated) ⇒ LITERATURE review done
-- For heavy: two consecutive `round-<N>.md` files with verdict `clean` ⇒ LITERATURE review done
+- A `literature: review-and-fix` commit lands ⇒ LITERATURE review-and-fix done
 
 When all of the above hold ⇒ LITERATURE complete; the next iteration surveys and advances to IMPLEMENT.
 
@@ -220,4 +201,4 @@ When all of the above hold ⇒ LITERATURE complete; the next iteration surveys a
 - **Resume is automatic.** If `work/cited/<doi-slug>/work/reference/index.json` exists, skip that DOI's fetch. If `work/notes/literature/resolutions.yaml` has an entry for a placeholder, skip that placeholder's quote-finding.
 - **Unresolved is not failure.** A placeholder that no quote in the cited paper supports is a real signal — the target paper cited loosely or paraphrased beyond what the source actually says. Surface to `open-questions.md`; don't fabricate evidence.
 - **`astra validate --verify-evidence` runs after the merge**, not after each Haiku's per-placeholder output. Haikus write to disjoint files; the deterministic check happens once `astra.yaml` is updated.
-- **Commit per stage.** Fetches commit together once Stage 1 completes (one commit for all cited-paper substrates). Quote-finding commits together once Stage 2 completes (`resolutions.yaml` + Haiku files). The merge into `astra.yaml` is its own commit. Each review round file commits as it lands. The next iteration reads `git log` to see progress.
+- **Commit per stage.** Fetches commit together once Stage 1 completes (one commit for all cited-paper substrates). Quote-finding commits together once Stage 2 completes (`resolutions.yaml` + Haiku files). The merge into `astra.yaml` is its own commit. The review-and-fix iteration commits its diff. The next iteration reads `git log` to see progress.
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index ba50dee7..63b0a381 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -2,7 +2,7 @@
 
 The reproduction has converged: the constitution's `status:` is `closed` (after COMPARE returned `pass`, or `partial` with the un-acted opportunities logged, and the next cold-survey iteration found nothing left to do). The ralph loop's tmux session has exited. REVIEW runs back in the user's main session — the second of two interactive bookends, the first being INTERVIEW. It runs in the user's main session (not as an iteration) because both `/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`, which isn't available inside detached ralph iterations.
 
-Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, draft the final report, and propagate any un-acted-on opportunities from the latest COMPARE into CLAUDE.md's **Rigor** section — in one interactive arc.
+Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, and draft the final report — in one interactive arc. The Open opportunities list in CLAUDE.md already carries un-acted-on opportunities from the latest COMPARE (those iterations logged them directly); REVIEW just reads them.
 
 The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding into ARCHITECT, SPECIFY, LITERATURE, and IMPLEMENT as their per-iteration self-review passes. This close-out is what the previous shape called SUMMARIZE_RUN.
 
@@ -15,8 +15,8 @@ The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding
 - `open-questions.md` at the workdir root — running report from the iteration-phases (paper-vs-code conflicts, ambiguities, anything iterations flagged for user resolution)
 - `work/reference/index.json` and `work/reference/code-index.md` — for context
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) and `work/reference/code/` — directly available for follow-up questions the user asks during REVIEW that the report and CLAUDE.md don't answer ("remind me what the paper says about X", "did the original code do Y"). Grep into for specifics; read targeted spans by offset/limit.
-- `CLAUDE.md` at the workdir root — paper identity, Rules, Rigor, Paper-vs-code disagreements (the at-a-glance summary that's accumulated across iterations)
-- `constitution.md` at the workdir root — Goal, Fidelity intent, Scope, Quality bar, Evidence (the driving document the loop has been working against)
+- `CLAUDE.md` at the workdir root — paper identity, Rules, Paper-vs-code disagreements, Open opportunities (the durable surface, accumulated across iterations)
+- `constitution.md` at the workdir root — Goal, Fidelity intent, Scope, Quality bar, Evidence, Rigor *Current state* (the driving document the loop has been working against)
 
 ## Outputs
 
@@ -25,7 +25,7 @@ The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding
 - `open-questions.md` — same file, but with `## Resolutions` section appended capturing what the user said for each entry
 - Edits to `astra.yaml` / `implementation-notes.md` / `universes/baseline.yaml` if any open-question resolution warrants a spec change
 - `REPRODUCTION-SUMMARY.md` — final report; concise (~1–2 pages); the canonical record of what the reproduction landed on
-- CLAUDE.md updates — *Open opportunities* list under **Rigor** propagated from COMPARE's un-acted-on opportunities; **Paper-vs-code disagreements** entries reconciled with their resolutions
+- CLAUDE.md updates — **Paper-vs-code disagreements** entries reconciled with their resolutions (Open opportunities already there from COMPARE iterations)
 - A commit closing out the reproduction
 
 ## Step 1: render the validation surfaces
@@ -67,18 +67,16 @@ A single markdown file at the project root, ~1–2 pages. The canonical record o
 2. **Verdict** — pass / partial. If partial, what failed and why we accepted it.
 3. **Material decisions** — the paper-vs-code conflicts SPECIFY's code pass (and any IMPLEMENT pass) surfaced, what the user chose (in prose ratification or by canonical-resolution default), and why.
 4. **Outputs** — pointers to the figures / tables / metrics produced. One bullet per primary target with the path to the reproduced result and a one-line match note from the comparison report.
-5. **Open opportunities** — pull from `comparison-report.yaml`'s `opportunities:` block, plus anything in CLAUDE.md's **Rigor** section's *Open opportunities* list. One bullet each with the leverage assessment. This is what a future session (or a future-Cail revisiting) would tighten next.
+5. **Open opportunities** — pull from CLAUDE.md's *Open opportunities* list (already carries un-acted-on opportunities from the latest COMPARE), plus anything fresh in `comparison-report.yaml`'s `opportunities:` block not yet reflected there. One bullet each with the leverage assessment. This is what a future session (or a future-Cail revisiting) would tighten next.
 6. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut stricter than stated, etc.). The reproduction's value to the broader literature.
 7. **Resolved open questions** — pull from `open-questions.md`'s `## Resolutions` section. One bullet per question + its resolution.
 8. **Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the relevant `astra.yaml`, where CLAUDE.md lives so future Claude Code sessions auto-load it on walk-up).
 
 Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes; the summary is the door into them.
 
-## Step 4: propagate opportunities into CLAUDE.md
+## Step 4: reconcile the Open opportunities list
 
-For each opportunity in `comparison-report.yaml`'s `opportunities:` block that the user did NOT act on (i.e. they accepted the current verdict and chose to land here), append it to CLAUDE.md's **Rigor** section's *Open opportunities* list. Format: `<area> — <what could be tightened> — <leverage>`. This is what future sessions and future loop re-launches walk up to; it's how the reproduction stays honest about what's at sketch / baseline / tightened / canonical rigor across its outputs.
-
-If the user acted on an opportunity (e.g. authorized one more IMPLEMENT round to close a gap), it doesn't go in the open list — but its closure is worth noting in *Current state* (e.g. *Figure 3: tightened* if the systematics treatment got a heavier pass).
+COMPARE iterations have been logging un-acted-on opportunities into CLAUDE.md's *Open opportunities* list as they run, so the list is already populated. REVIEW's job here is reconciliation: cross-check that every opportunity in `comparison-report.yaml`'s `opportunities:` block that the user did NOT act on is present in CLAUDE.md's list, and remove any that the user acted on at REVIEW (e.g. authorized one more IMPLEMENT round to close). Note any acted-on closures in the constitution's Rigor *Current state* (e.g. *Figure 3: tightened* if the systematics treatment got a heavier pass).
 
 ## Step 5: commit
 
@@ -96,7 +94,7 @@ This commit is the durable mark that the reproduction has reached close-out. Fut
 - `.lightcone/comparison.html` exists ⇒ `/figure-comparison` rendered
 - `open-questions.md` has a `## Resolutions` section covering every entry ⇒ open-questions walkthrough done
 - `REPRODUCTION-SUMMARY.md` exists ⇒ final report written
-- CLAUDE.md's **Rigor** section's *Open opportunities* list reflects the un-acted-on opportunities from the latest COMPARE ⇒ propagation done
+- CLAUDE.md's *Open opportunities* list reflects the un-acted-on opportunities from the latest COMPARE ⇒ reconciliation done
 - A `review:` commit lands ⇒ REVIEW done; reproduction complete
 
 ## Notes
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 0d118ae7..0a14109e 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -11,8 +11,8 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 ## Inputs
 
 - `astra.yaml` — the stub from ARCHITECT (sub-analyses, inputs, outputs, narrative; empty `decisions:` / `prior_insights:` / `findings:` blocks)
-- `constitution.md` — Goal (scope), Fidelity intent (used to size cheap-vs-heavy on this iteration's work), Quality bar
-- `CLAUDE.md` — Rules; Rigor *Current state* for the per-output trajectory tracking; **Paper-vs-code disagreements** for prior-iteration entries
+- `constitution.md` — Goal (scope), Fidelity intent, Quality bar, Rigor *Current state* for the per-output trajectory tracking
+- `CLAUDE.md` — Rules; **Paper-vs-code disagreements** for prior-iteration entries
 - `work/reference/index.json` — paper-extraction's structural index: figures, tables, section outline, citations. The `citations:` block maps each cited paper's BibTeX key (Path A) or synthetic `<lastname>_<year>` key (Path B) to `{locations, citation, doi}`. SPECIFY uses this to write each `prior_insights:` placeholder's `doi:` so LITERATURE knows which paper to fetch.
 - `work/reference/code-index.md` (when code present) — code inventory: module map, candidate decisions with file:line, entry-points, data dependencies, gotchas.
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text. Grep into for specific facts; read targeted spans by offset/limit when you need more context. Don't re-read whole.
@@ -26,8 +26,8 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 - `universes/baseline.yaml` — selects the paper's choices (where paper and code disagree per the canonical-resolution rule, see "Material conflicts" below)
 - `implementation-notes.md` — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
 - `targets/targets.md` — small target ledger COMPARE consumes: per output (already declared by ARCHITECT), a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure / table / metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
-- CLAUDE.md updates — append entries to **Paper-vs-code disagreements** for each material conflict surfaced; update **Rigor** *Current state* with the post-iteration state of `astra.yaml` per sub-analysis (e.g. *baseline* after a one-iteration write, *tightened* after a review-iteration applied fixes)
-- `work/notes/specify-review/<sub-analysis>-round-<N>.md` — each review iteration's findings (one file per review-iteration per sub-analysis; subsequent fix iterations apply them)
+- `constitution.md` updates — Rigor *Current state* per sub-analysis (e.g. *baseline* after the write iteration, *tightened* after the review-and-fix iteration)
+- `CLAUDE.md` updates — append entries to **Paper-vs-code disagreements** for each material conflict surfaced
 
 ## Substrate skills to invoke
 
@@ -119,90 +119,39 @@ Read the code that implements this sub-analysis (`work/reference/code-index.md`'
 
 3. **Decision-option augmentation.** Where the code reveals an option the paper didn't mention but is defensible (a sibling implementation alternative used in the codebase or referenced in a comment), add it as a sibling option to the relevant `decisions:` entry. Do not pre-emptively author every code variant; only the ones that bear on a real choice.
 
-### Review by iteration boundary
-
-After the paper + code passes land for a sub-analysis, the cross-check question is: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
-
-The iteration that wrote a sub-analysis's passes exits when its passes are done; the next iteration enters fresh, surveys, finds the sub-analysis's passes present but no `work/notes/specify-review/<sub>-round-N.md`, reads the slice of `astra.yaml` + the paper + the code, and writes review findings. The iteration after that applies the fixes. Two consecutive review-iterations with verdict `clean` per sub-analysis terminates that sub-analysis's review cycle. The fresh-context-no-bias property is automatic at iteration boundaries — review iteration N doesn't see review iteration N-2's fixes. The depth is sized from the gap between CLAUDE.md's Rigor *Current state* for the sub-analysis and `constitution.md`'s Fidelity intent: *cheap* — accept after one clean review-iteration; *heavy* — require two consecutive clean.
-
-#### Per-review-iteration prompt
-
-The reviewer reads the slice fresh and writes findings only — never edits `astra.yaml` directly; that's the next iteration's job.
-
-> You are a SPECIFY reviewer for one sub-analysis. Read the relevant slice of `astra.yaml`, the paper, and the code (when present), and report any inconsistencies you find. You will be one of several independent reviewers across iterations; do not assume anything has already been fixed.
->
-> ### Inputs
->
-> - `astra.yaml` — focus on `analyses.<sub-analysis-id>` (`decisions:`, `prior_insights:`, `findings:`, `narrative:`, `inputs:`, `outputs:`)
-> - `universes/baseline.yaml`
-> - `implementation-notes.md`
-> - `work/reference/index.json` — the decision clusters and result loci that scoped the work
-> - `work/reference/code-index.md` (when code present)
-> - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — paper text (Grep into; do not re-read whole)
-> - `work/reference/code/` (when present) — canonical reference for numerics + method (read targeted modules pointed at by code-index.md)
-> - `work/reference/index.json#citations` — cite-key → `{locations, citation, doi}` mapping from paper-extraction (use to confirm each `prior_insights:` placeholder's `doi:` matches what the paper cites)
->
-> ### What to check
->
-> 1. **Decision coverage.** Does this sub-analysis's `decisions:` block cover every choice in the paper-side index's decision clusters? Cosmetic / pure-tooling choices should NOT be decisions; anything material that's missing should be added.
-> 2. **Decision options.** Each decision has the option the paper selects (named in `default:`) plus any sibling alternatives the paper discusses or the code reveals. The decision-level `rationale:` is grounded in the paper's stated reasoning (or the code's, where canonical-resolution applied). Per the 0.0.10 grammar, options do not carry per-option `rationale:` or `evidence:`; cited support is back-referenced via `Option.insights` into a `prior_insights:` entry.
-> 3. **Evidence verification.** Every `findings:` Evidence entry uses `TextQuoteSelector` with a verbatim `exact:` quote, real surrounding-text `prefix:` / `suffix:`, and a `location: {page: N}` (1-indexed). Quotes that are paraphrased or whose `prefix:` / `suffix:` are editorial parentheticals will fail `--verify-evidence`. `prior_insights:` placeholders intentionally have `evidence: [{id, doi}]` without a `quote:` at this stage — LITERATURE authors the quotes — so do not flag a missing quote on placeholder entries. After LITERATURE resolves the placeholders, run `astra validate astra.yaml --verify-evidence`.
-> 4. **Findings traceability.** Each `findings:` Insight's `evidence:` resolves either to a real paper claim (target-paper DOI + verbatim `quote:` + page) or to a real declared output via `artifact: <output_id>` (with optional `source_commit:` and `snapshot:`).
-> 5. **Material-disagreement surfacing.** Where paper and code disagree on a material choice, the spec records both options under the relevant `decisions:` entry, `universes/baseline.yaml` selects the code's option (canonical-resolution default), and the conflict is appended to CLAUDE.md's *Paper-vs-code disagreements* section plus `open-questions.md` for the user to resolve at REVIEW close-out. Flag any material disagreement that got silently dropped, that didn't make it into the disagreements log, or where the baseline picked the paper without the canonical-resolution rule applying.
-> 6. **Narrative anchors.** The sub-analysis's `narrative:` weaves `astra-anchor:` references to the new `decisions:` / `prior_insights:` / `findings:` entries — the tree-path grammar must be valid, and entries actually exist at the referenced paths.
-> 7. **`narrative:` voice fidelity.** Hedges and qualifiers from the paper survive (per the narrative skill's discipline). Editorial commentary added beyond what the paper supports gets flagged.
-> 8. **No synthetic data.** Unless the paper itself uses synthetic data, every input has a real acquisition source — no mock / synthetic substitutes anywhere in the sub-analysis's inputs, decisions, or implementation-notes.
->
-> ### What NOT to do
->
-> - **Do not edit `astra.yaml`** or any other file. Your output is a findings file; the next iteration applies the fixes. Editing here defeats the fresh-context discipline that makes review work.
-> - **Do not flag missing `recipes:`.** Recipes are IMPLEMENT's, not SPECIFY's.
-> - **Do not re-read the entire paper.** Use Grep on `work/reference/source/` (or `document.md`) for the specific claims you want to verify; lean on `work/reference/index.json`.
-> - **Do not invent problems.** If the sub-analysis is consistent with paper + code, say so briefly.
-> - **Do not assume a prior reviewer has been here.** You are fresh. First-principles read only.
->
-> ### Output format — `work/notes/specify-review/<sub-analysis>-round-<N>.md`
->
-> ```markdown
-> # Specify-review round <N> — <sub-analysis-id>
->
-> Reviewer ran fresh against astra.yaml's <sub-analysis-id> slice, paper, and code.
->
-> ## Findings
->
-> ### <category — e.g. "Decision coverage" / "Evidence" / "Material disagreement">
->
-> - **<one-line finding>**
->   - **What's wrong**: <quote or location of the spec problem>
->   - **Where to fix**: <`astra.yaml#analyses.<sub-id>.path/to/key` or `implementation-notes.md`>
->   - **Suggested fix**: <one-line concrete change>
->   - **Source**: <paper §X.Y "quote" + index row, or code `path:line`>
->
-> ## Verdict
->
-> - **fixes_needed**: <count>
-> - **clean** | **needs-fixes**
-> ```
-
-#### Applying fixes (the iteration after the review-iteration)
-
-The iteration after each review-iteration reads `work/notes/specify-review/<sub-analysis>-round-<N>.md`, applies the fixes to `astra.yaml` for the sub-analysis (plus `universes/baseline.yaml` and `implementation-notes.md` per the suggested fixes), commits, and exits. After any change to `astra.yaml`:
+### Review-and-fix — the next iteration
+
+After the paper + code passes land for a sub-analysis, one fresh-context iteration reads the slice cold, reviews silently, applies any fixes inline, commits, and exits. One pass. No intermediate findings file, no second clean check.
+
+The cross-check question on entry: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
+
+#### What to check
+
+1. **Decision coverage.** Does this sub-analysis's `decisions:` block cover every choice in the paper-side index's decision clusters? Cosmetic / pure-tooling choices should NOT be decisions; anything material that's missing should be added.
+2. **Decision options.** Each decision has the option the paper selects (named in `default:`) plus any sibling alternatives the paper discusses or the code reveals. The decision-level `rationale:` is grounded in the paper's stated reasoning (or the code's, where canonical-resolution applied). Per the 0.0.10 grammar, options do not carry per-option `rationale:` or `evidence:`; cited support is back-referenced via `Option.insights` into a `prior_insights:` entry.
+3. **Evidence verification.** Every `findings:` Evidence entry uses `TextQuoteSelector` with a verbatim `exact:` quote, real surrounding-text `prefix:` / `suffix:`, and a `location: {page: N}` (1-indexed). Quotes that are paraphrased or whose `prefix:` / `suffix:` are editorial parentheticals will fail `--verify-evidence`. `prior_insights:` placeholders intentionally have `evidence: [{id, doi}]` without a `quote:` at this stage — LITERATURE authors the quotes — so do not flag a missing quote on placeholder entries. After LITERATURE resolves the placeholders, run `astra validate astra.yaml --verify-evidence`.
+4. **Findings traceability.** Each `findings:` Insight's `evidence:` resolves either to a real paper claim (target-paper DOI + verbatim `quote:` + page) or to a real declared output via `artifact: <output_id>` (with optional `source_commit:` and `snapshot:`).
+5. **Material-disagreement surfacing.** Where paper and code disagree on a material choice, the spec records both options under the relevant `decisions:` entry, `universes/baseline.yaml` selects the code's option (canonical-resolution default), and the conflict is appended to CLAUDE.md's *Paper-vs-code disagreements* section plus `open-questions.md` for the user to resolve at REVIEW close-out. Flag any material disagreement that got silently dropped, that didn't make it into the disagreements log, or where the baseline picked the paper without the canonical-resolution rule applying.
+6. **Narrative anchors.** The sub-analysis's `narrative:` weaves `astra-anchor:` references to the new `decisions:` / `prior_insights:` / `findings:` entries — the tree-path grammar must be valid, and entries actually exist at the referenced paths.
+7. **`narrative:` voice fidelity.** Hedges and qualifiers from the paper survive (per the narrative skill's discipline). Editorial commentary added beyond what the paper supports gets flagged.
+8. **No synthetic data.** Unless the paper itself uses synthetic data, every input has a real acquisition source — no mock / synthetic substitutes anywhere in the sub-analysis's inputs, decisions, or implementation-notes.
+
+Apply fixes inline as you find them — `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md`, the disagreements log in CLAUDE.md as needed. The diff against the prior commit is the record of what changed; no separate findings file. After any change to `astra.yaml`:
 
 ```bash
 astra validate astra.yaml
 astra validate astra.yaml --verify-evidence  # after LITERATURE has resolved the prior_insights placeholders
 ```
 
-#### Termination
+Commit (`specify: review-and-fix <sub-analysis-id>`), update the constitution's Rigor *Current state* for the sub-analysis (e.g. *baseline → tightened*), exit. The next iteration's survey moves on (next sub-analysis, or next phase if all sub-analyses are done).
+
+#### What NOT to do during review-and-fix
 
-- **Cheap:** one review-iteration per sub-analysis after the passes land. Done after the next iteration applies the fixes (or immediately, if `fixes_needed` was 0).
-- **Heavy:**
-  - If review-iteration N's `fixes_needed` was 0 AND review-iteration (N-1)'s was also 0 → done.
-  - If review-iteration N is the first review (N=1), the next review-iteration runs unconditionally so we can compare across two fresh passes.
-  - If review-iteration N produced fixes, the next iteration applies them, and the iteration after that runs the next review fresh.
-  - If 5 review-iterations have happened without two consecutive clean rounds, log the unfinished tail in `open-questions.md` ("SPECIFY review for <sub-analysis-id> reached round cap; user should review during REVIEW close-out") and let the next iteration advance to LITERATURE.
+- **Don't flag missing `recipes:`.** Recipes are IMPLEMENT's, not SPECIFY's.
+- **Don't re-read the entire paper.** Use Grep on `work/reference/source/` (or `document.md`) for the specific claims you want to verify; lean on `work/reference/index.json`.
+- **Don't open a second review pass on the same sub-analysis.** One pass is the protocol. If the sub-analysis needs more rigor than this delivers, log an Open opportunity in CLAUDE.md and let a future loop close it.
 
-When all sub-analyses' reviews terminate, SPECIFY produces the final outputs:
+When every sub-analysis has had its review-and-fix pass, SPECIFY produces the final outputs:
 
 ## Target-ledger output
 
@@ -233,8 +182,7 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 - `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative; empty decisions / prior_insights / findings) ⇒ ready to specify
 - For each sub-analysis: `decisions:` populated with decision-level `rationale:` + options (paper's choice at `default:`); `findings:` populated as full Insight blocks with paper-anchored Evidence (DOI + `quote: {exact, prefix, suffix}` + `location: {page}`); `prior_insights:` populated as citation placeholders (`id`, `claim`, `created_at`, `evidence: [{id, doi}]` with `quote:` omitted — LITERATURE fills the quotes next); `Option.insights` back-references wired up where options draw on placeholders ⇒ paper pass done
 - For each sub-analysis: when `work/reference/code/` exists, code-pass material-disagreement entries land in `decisions:` (with both options) and `universes/baseline.yaml` selects the canonical-resolution choice; `implementation-notes.md` carries non-material gotchas ⇒ code pass done
-- For cheap: each sub-analysis has at least a `work/notes/specify-review/<sub>-round-1.md` with verdict `clean` (or no fixes were incorporated) ⇒ SPECIFY review done
-- For heavy: each sub-analysis has two consecutive `<sub>-round-<N>.md` files with verdict `clean` ⇒ SPECIFY review done
+- For each sub-analysis: a `specify: review-and-fix <sub-analysis-id>` commit lands ⇒ that sub-analysis's review-and-fix pass is done
 - `astra validate astra.yaml` returns clean (placeholders whose Evidence carries `doi:` without `quote:` are valid at this stage) ⇒ structural side validated; `--verify-evidence` waits until LITERATURE has authored the `quote:` + `location:` selectors
 - `targets/targets.md` exists with each entry mapped to a spec home ⇒ target-ledger done
 - `implementation-notes.md` exists ⇒ practical-guidance side done
diff --git a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
index 4e8fdf1d..66e73a59 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
@@ -2,33 +2,27 @@
 
 Reproduction of **<paper title>** (<arXiv ID>). DOI: <doi>. One-line subject: <e.g. "BAO scale measurement from DESI DR1">.
 
-The driving document for this reproduction is [`constitution.md`](constitution.md) — Goal, Fidelity intent, Scope, Quality bar, Evidence. Every ralph iteration reads it on entry. This file (`CLAUDE.md`) is the auto-loading walk-up: rules + running accumulators.
+The driving document for this reproduction is [`constitution.md`](constitution.md) — Goal, Fidelity intent, Scope, Quality bar, Evidence, and the running Rigor *Current state* per output. Every ralph iteration reads it on entry. This file (`CLAUDE.md`) is the auto-loading walk-up: rules + durable findings that stay useful past the reproduction (Open opportunities for future tightening, Paper-vs-code disagreements, pointers).
 
 ## Rules
 
 - **Code-as-canonical when `work/reference/code/` exists.** Every iteration that touches a sub-analysis reads the relevant code first. Where paper and code disagree, code is canonical for numerics, plotting, and method. When `work/reference/code/` is absent, paper is the only anchor — implement fresh from the spec, expect slower convergence, surface gaps honestly to the user rather than dressing them up.
 - **Never block on `AskUserQuestion` mid-iteration.** Each ralph iteration runs in a fresh detached session; the user isn't reachable interactively. Append questions to `open-questions.md` and continue with the best-judgment default. The user resolves accumulated questions at REVIEW close-out (which runs in the user's main session).
-- **arxiv-LaTeX-first acquisition.** PDF + Docling is a fallback for non-arxiv only.
+- **arXiv-LaTeX-first acquisition.** PDF + Docling is a fallback for non-arXiv only.
 - **`astra validate --verify-evidence`** is the fidelity gate; evidence quotes must match source PDFs.
 - **No synthetic data.** Unless the paper itself uses synthetic data as input, every input dataset must be downloaded or queried from its real source.
 - **Commit as you go.** Small, descriptive commits per significant change. The git log is the chronological trail of the reproduction; the next iteration reads it to know what landed.
-- **Updates go in code, files, and the accumulators below — not progress notes scattered in the body.** Discoverable updates; the next iteration finds what changed by inspecting the system.
+- **Updates go in code, files, and the accumulators in `constitution.md` and below — not progress notes scattered in the body.** Discoverable updates; the next iteration finds what changed by inspecting the system.
 
-## Rigor — current state
-
-Per-output trajectory tracking, updated by iterations as they produce artifacts. Coarse adjectives per output or per phase: *sketch / baseline / tightened / canonical*. Read alongside [`constitution.md`](constitution.md)'s Fidelity intent to decide how much to push on the next iteration. Empty until the first iteration produces something:
-
-- (none yet)
-
-### Open opportunities
+## Paper-vs-code disagreements
 
-Gaps that could be tightened if the reproduction comes back. Each carries a sense of leverage and where it sits relative to the constitution's Fidelity intent. Format: `<area> — <what could be tightened> — <leverage> — <above|at|below intent>`. Empty until a COMPARE iteration surfaces one:
+Material disagreements between paper and code, logged here as iterations find them. Code is canonical for numerics, plotting, and method (per the rule above); both options are preserved in `astra.yaml` as decision alternatives. Each entry summarizes the disagreement and points to the corresponding decision so any iteration can see them at a glance. Surfaced to the user at REVIEW close-out (or earlier if they're around).
 
 - (none yet)
 
-## Paper-vs-code disagreements
+## Open opportunities
 
-Material disagreements between paper and code, logged here as iterations find them. Code is canonical for numerics, plotting, and method (per the rule above); both options are preserved in `astra.yaml` as decision alternatives. Each entry summarizes the disagreement and points to the corresponding decision so any iteration can see them at a glance. Surfaced to the user at REVIEW close-out (or earlier if they're around).
+Gaps that could be tightened in a future pass, surfaced by COMPARE iterations and persisted past close-out. Each carries a sense of leverage. Format: `<area> — <what could be tightened> — <leverage>`. A future Claude Code session walking into this directory reads this list and knows where another loop would have the most return. Empty until a COMPARE iteration surfaces one:
 
 - (none yet)
 
diff --git a/claude/lightcone/skills/lc-from-paper/templates/constitution.md b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
index 3607f81e..e5da61ef 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/constitution.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
@@ -4,13 +4,13 @@ status: active
 
 # <paper-slug> — reproduction constitution
 
-The driving document for the ralph loop reproducing <paper title> (<arXiv ID>, DOI <doi>). Every iteration reads this on entry to know what "done" looks like and how to size its next move. **Sharpened slowly** — only when something fundamental shifts (target moves, scope opens or fences, a material disagreement makes us re-think a sub-analysis). Running accumulators (per-output rigor state, the disagreements log, opportunities) live in `CLAUDE.md`, not here.
+The driving document for the ralph loop reproducing <paper title> (<arXiv ID>, DOI <doi>). Every iteration reads this on entry to know what "done" looks like and where each output currently sits. The top half (Goal, Scope, Quality bar, Evidence) **sharpens slowly** — only when something fundamental shifts (target moves, scope opens or fences, a material disagreement makes us re-think a sub-analysis). The bottom half (Rigor *Current state*, Open dimensions) is updated each iteration. Durable findings that stay useful past the reproduction — paper-vs-code disagreements, open opportunities for future tightening, pointers to substrate — live in `CLAUDE.md`.
 
 ## Goal
 
 <What "done" looks like for this reproduction. Concrete: which targets, what verdict against them, what validation passes. E.g.: "A complete `astra.yaml` with recipes that produce reproduced versions of <list of targets>, validated by `astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml` verdict `pass` against the targets in `targets/targets.md`.">
 
-**Fidelity intent.** <The user's prose answer from INTERVIEW to "when is this good enough" — captured verbatim or in close paraphrase. E.g. "just checking if the analysis is tractable — quick sanity on a headline number", "Figure 3 must be right; the rest can stay rough", "full fidelity on the BAO fit, baseline elsewhere", "every primary and secondary target lining up within stated tolerance". Each iteration reads this when deciding cheap vs heavy next moves; COMPARE grades opportunities against it. Static once approved at INTERVIEW; the user can sharpen at any REVIEW.>
+**Fidelity intent.** <The user's prose answer from INTERVIEW to "what do you want out of this stretch, given what you have to spend on it" — captured verbatim or in close paraphrase. Carries both the aesthetic dimension (what "good enough" looks like) and the pragmatic dimension (compute, tokens, wall-clock budget). E.g.: "just checking if the analysis is tractable — an afternoon of compute", "Figure 3 must be right; the rest can stay rough — overnight", "full fidelity on the BAO fit, baseline elsewhere — a few days", "every primary and secondary target lining up within stated tolerance, no hard deadline". Each iteration reads this when sizing its next move; COMPARE grades opportunities against it. Static once approved at INTERVIEW; the user can sharpen at any REVIEW.>
 
 ## Scope
 
@@ -26,7 +26,7 @@ What "canonical" rigor looks like for *this* paper. The bar that primary-target
 - <e.g. "magnitude cuts and selection match the code's defaults exactly; any deviation is recorded as a paper-vs-code disagreement with both options preserved">
 - <e.g. "every prior insight cites a real verbatim quote from the cited paper">
 
-This is the ceiling; the fidelity intent determines which outputs need to actually reach it. CLAUDE.md's *Rigor — current state* table tracks where each output currently sits relative to this bar.
+This is the ceiling; the fidelity intent determines which outputs need to actually reach it. The *Rigor — current state* table below tracks where each output currently sits relative to this bar.
 
 ## Evidence
 
@@ -38,6 +38,12 @@ The substrate this reproduction is built against — the canonical sources itera
 - **arXiv ID:** <id> (if applicable)
 - **Code repo URL:** <url>
 
+## Rigor — current state
+
+Per-output trajectory tracking, updated by iterations as they produce artifacts. Coarse adjectives per output or per phase: *sketch / baseline / tightened*. Read alongside Fidelity intent above so each iteration knows where each output currently sits. Empty until the first iteration produces something:
+
+- (none yet)
+
 ## Open dimensions
 
 Decisions worth surfacing to the user — places the reproduction could go differently and the call benefits from human ratification. Iterations append here when something material comes up that isn't itself a paper-vs-code disagreement (those go to `CLAUDE.md`'s disagreements log instead). The user resolves these at REVIEW close-out, or earlier if they're around.
diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index 9bae81eb..1d07499e 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -62,19 +62,19 @@ user's main session; phases 2–7 run as ralph iterations.
 INTERVIEW drafts two files in the reproduction workdir; every
 iteration picks them up on launch.
 
-- **`constitution.md`** — the ralph loop's driving document. YAML
-  frontmatter declares `status: active`. Sections: Goal (carrying the
-  **fidelity intent** — the user's own "when is this good enough"),
-  Scope (in/out), Quality bar, Evidence (paper DOI, arXiv ID, code
-  repo URL), Open dimensions. Sharpens slowly, only when something
-  fundamental shifts.
-- **`CLAUDE.md`** — the auto-loading walk-up. Paper identity at the
-  top; Rules (code-as-canonical, no blocking on `AskUserQuestion`
-  mid-iteration, arXiv-LaTeX-first, `astra validate
-  --verify-evidence` as the fidelity gate); Rigor accumulator
-  (*Current state* per output plus *Open opportunities*, updated each
-  iteration); Disagreements log (running, also updated each
-  iteration); Pointers.
+- **`constitution.md`** — the ralph loop's driving document, *task-bound*.
+  YAML frontmatter declares `status: active`. Top half (sharpens slowly):
+  Goal (carrying the **fidelity intent** — the user's own "what do you
+  want out of this stretch, given what you have to spend on it"), Scope
+  (in/out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL).
+  Bottom half (updates each iteration): Rigor *Current state* per
+  output, Open dimensions. Archivable once the reproduction closes.
+- **`CLAUDE.md`** — the auto-loading walk-up, *durable*. Paper identity
+  at the top; Rules (code-as-canonical, no blocking on `AskUserQuestion`
+  mid-iteration, arXiv-LaTeX-first, `astra validate --verify-evidence`
+  as the fidelity gate); Disagreements log (running); Open opportunities
+  (gaps that future work could tighten); Pointers. Stays useful for any
+  follow-on work in this directory.
 
 Pointers, not snapshots.
 
@@ -83,11 +83,12 @@ Pointers, not snapshots.
 - **Workdir is the state.** File existence, `git log`, and `astra
   validate` answer "what phase am I on" deterministically — no
   separate state machine.
-- **CLAUDE.md is a state-expresser too.** Beyond the workdir's
-  ground-truth files, CLAUDE.md carries running pointers — Rigor
-  accumulator, Disagreements log, paper identity. Each iteration
-  keeps those pointers current so the next cold survey reads them
-  as fact.
+- **Constitution is task-bound; CLAUDE.md is durable.** The constitution
+  carries what *this reproduction* is trying to achieve and how it's
+  progressing — archivable once the reproduction closes. CLAUDE.md carries
+  what stays useful past the reproduction: paper identity, rules,
+  paper-vs-code disagreements, pointers to substrate. Keep both current
+  so the next cold survey reads them as fact.
 - **Code-as-canonical, with disagreements recorded.** Where paper
   and code disagree on something material, code wins for numerics,
   but the disagreement is preserved as a decision option and noted
@@ -95,10 +96,13 @@ Pointers, not snapshots.
 - **Rigor is a trajectory toward the user's intent.** Fidelity
   intent is partly aesthetic ("how good does this need to be?") and
   partly pragmatic ("what's feasible given the compute, tokens, and
-  time available?"). The honest meta-conversation lives in INTERVIEW;
-  each iteration then sizes its work from the gap between *Current
-  state* and that intent. Review happens sequentially via iteration
-  boundaries.
+  wall-clock available?"). The honest meta-conversation lives in
+  INTERVIEW; each iteration then sizes its work from the gap between
+  the constitution's Rigor *Current state* and that intent. The
+  per-artifact protocol is simple: iteration N writes, iteration N+1
+  reads cold and does one review-and-fix pass. Outputs that need
+  more rigor than that delivers become Open opportunities in
+  CLAUDE.md for a future loop.
 - **arXiv LaTeX first.** PDF + Docling is the non-arXiv fallback only.
 - **No synthetic data.** Unless the paper itself uses synthetic data,
   every input must be real.

From 2d007102baeb075f6bb4c1c968e83e466821ebd1 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 21:06:26 +0200
Subject: [PATCH 103/124] =?UTF-8?q?lc-from-paper:=20correct=20review=20ter?=
 =?UTF-8?q?mination=20=E2=80=94=20clean=20pass=20after=20last=20fix?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cail's clarification: the simplification I made in the prior commit
overshot. Dropping "N+1 vs N+2" was meant to drop the
two-consecutive-clean requirement, not the second review pass entirely.

The actual protocol:
- Iteration N writes the artifact (Rigor: baseline).
- The next fresh-context iteration reads cold, reviews, applies fixes
  inline. If fixes landed: Rigor → tightened. If nothing needed
  fixing: Rigor → canonical (and the review cycle terminates).
- The iteration that applied fixes cannot declare the artifact done.
  A subsequent fresh-context iteration earns canonical by reading the
  work and finding nothing to fix.

This restores *canonical* to the rigor vocabulary as the meaningful
terminal state: sketch → baseline → tightened → canonical. Termination
is one clean review pass after the last fix (not two consecutive
clean passes; not the same iteration that landed the fixes).

Updates land in SKILL.md (Self-review bullet + Rigor section), the
constitution template (vocabulary defined), all four per-phase
reference files (architect, specify, implement, literature), and the
user-facing doc. Survey signals across phases now read Rigor *Current
state* directly: *baseline*/*tightened* → review-and-fix iteration;
*canonical* → phase done, advance.
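
The transitions described above can be sketched as a small state
function. This is only an illustration of the protocol as worded in
this message: the names `rigor_after_review` and `next_move` are
hypothetical and do not exist in the skill files, and the handling of
*sketch* is an assumption, since this message does not say what the
survey does with it.

```python
# Illustrative sketch only; nothing here is part of the skill itself.
RIGOR_ORDER = ["sketch", "baseline", "tightened", "canonical"]


def rigor_after_review(current: str, fixes_applied: bool) -> str:
    """Rigor state after one fresh-context review-and-fix iteration."""
    if current not in ("baseline", "tightened"):
        raise ValueError(f"no review pass expected from state {current!r}")
    # The iteration that landed fixes cannot declare the artifact done,
    # so any fix keeps the artifact at "tightened"; a clean read earns
    # "canonical" and ends the review cycle.
    return "tightened" if fixes_applied else "canonical"


def next_move(current: str) -> str:
    """Map a Rigor *Current state* value to the survey's next move."""
    if current == "canonical":
        return "advance"
    if current in ("baseline", "tightened"):
        return "review-and-fix"
    # Assumption: a "sketch" artifact still needs its write iteration.
    return "write"
```

Under this reading, an artifact that is fixed once and then read clean
goes baseline, tightened, canonical over three iterations, while an
artifact that needs no fixes goes from baseline straight to canonical.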
---
 .../lightcone/skills/lc-from-paper/SKILL.md   |  4 +--
 .../lc-from-paper/references/architect.md     | 23 +++++++++-------
 .../lc-from-paper/references/implement.md     | 27 ++++++++++---------
 .../lc-from-paper/references/literature.md    | 14 ++++------
 .../lc-from-paper/references/specify.md       | 16 +++++------
 .../lc-from-paper/templates/constitution.md   |  2 +-
 docs/skills/lc-from-paper.md                  | 10 ++++---
 7 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 053af1bb..c87d971e 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -97,7 +97,7 @@ Iterations follow the `/ralph` skill's Loop protocol — Survey → Work → Upd
 - **Survey starts with the constitution + CLAUDE.md, then the workdir.** Read the constitution end-to-end — Goal, Fidelity intent, Scope, Quality bar, and the Rigor *Current state* table for where each output currently sits relative to the quality bar. Skim CLAUDE.md for rules, paper-vs-code disagreements, and pointers. Then survey the workdir against the **Workdir-as-state** table below to identify the next phase that needs work.
 - **One phase per iteration is the typical shape.** Don't try to do ARCHITECT *and* SPECIFY in one iteration; the fresh-context property of the next iteration is what makes review work, and conflating phases collapses the seam. (Exceptions: small targeted fixes after COMPARE may touch multiple phases in one iteration if they're tightly coupled.)
 - **Phase reference is your working spec for the iteration.** Whichever phase is next, read its `references/<phase>.md` on entry. That file carries the discipline for that phase's work (what to produce, code-as-canonical, rigor adjustment, etc.).
-- **Self-review is the next iteration.** Where ARCHITECT/SPECIFY/LITERATURE/IMPLEMENT used to spawn fresh-context reviewer sub-agents per round (broken — sub-agents can't spawn sub-agents), the discipline now collapses into a single review-and-fix iteration: iteration N writes the artifact, iteration N+1 reads it cold, reviews silently, applies any fixes inline, commits, and exits. One pass. No intermediate review file, no second clean check. If the user wants more rigor than a single review-and-fix pass delivers, that becomes a future loop (logged as an Open opportunity in CLAUDE.md).
+- **Self-review is the next iteration.** Where ARCHITECT/SPECIFY/LITERATURE/IMPLEMENT used to spawn fresh-context reviewer sub-agents per round (broken — sub-agents can't spawn sub-agents), the discipline now lives on iteration boundaries. Each iteration that touches an artifact reads it cold, reviews silently, applies any fixes inline, updates the constitution's Rigor *Current state* (*baseline* → *tightened* if changes landed, → *canonical* if nothing needed fixing), commits, and exits. **The iteration that makes a change cannot declare the artifact done** — a subsequent fresh-context iteration has to read it and find nothing to fix for the artifact to reach *canonical*. No intermediate review files; the diff against the prior commit is the record. The cycle terminates when the artifact hits *canonical*; if 5 review iterations pass without reaching *canonical*, log the unfinished tail to `open-questions.md` and let the survey advance the phase.
 - **Parallel fan-out lives inside an iteration.** LITERATURE Haiku quote-finders, SPECIFY per-sub-analysis work, IMPLEMENT per-output work — these fan out as one-level-deep `Agent(...)` spawns inside the iteration's main session. Sub-agents can't spawn sub-agents, but an iteration *is* the main session, so it can spawn freely.
 - **`AskUserQuestion` is not available inside an iteration.** Each iteration runs in a detached tmux session; the user isn't reachable interactively. Iterations append questions to `open-questions.md` with their best-judgment default applied, and the user resolves them at REVIEW close-out (back in their main session).
 - **Update the accumulators** before exit: in `constitution.md`, the Rigor *Current state* per output that the iteration changed; in `CLAUDE.md`, the Paper-vs-code disagreements log for any material conflict the iteration surfaced and Open opportunities for any COMPARE-surfaced gap the iteration didn't act on.
@@ -138,7 +138,7 @@ REVIEW runs in your main session because `/figure-comparison` and `/check-senten
 
 **Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at INTERVIEW as prose. Intent is partly aesthetic ("how good does this need to be?") and partly pragmatic ("what's feasible given the compute, tokens, and wall-clock available?"). Both dimensions belong in the prose — *"just checking the analysis is tractable — an afternoon"*, *"Figure 3 must be right; the rest can stay rough — overnight"*, *"every primary and secondary target lining up within stated tolerance, a few days"*.
 
-The mechanism is simple: each artifact gets one fresh-context review-and-fix pass. Iteration N writes, iteration N+1 reads N's work cold, reviews silently, applies any fixes inline, updates the constitution's Rigor *Current state* (*sketch / baseline / tightened*), commits, exits. Outputs that the intent wants pushed further than a single review-and-fix pass get an Open opportunity entry in CLAUDE.md; a later loop relaunch closes them. (Work fan-out for the artifact-producing phases is separate; see "Parallel fan-out lives inside an iteration" above.)
+The Rigor *Current state* vocabulary (*sketch / baseline / tightened / canonical*) tracks the trajectory per output. Each fresh-context iteration that reads the artifact and applies fixes moves it from *baseline* to *tightened*; the first subsequent fresh-context iteration that reads it and finds nothing to fix moves it to *canonical*. An artifact reaching *canonical* terminates its review cycle. If the intent wants outputs pushed even further than *canonical*, that's a future loop, logged as an Open opportunity in CLAUDE.md. (Work fan-out for the artifact-producing phases is separate; see "Parallel fan-out lives inside an iteration" above.)
 
 **arXiv-LaTeX-first acquisition.** When the paper is on arXiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arXiv only.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 924e3fb2..0ed76a2b 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -82,13 +82,18 @@ analyses:
 
 After the stub is written and validates, commit it (`architect: stub astra.yaml`) and update the constitution's Rigor *Current state* with the stub's state (e.g. *stub: baseline*).
 
-## Review-and-fix — the next iteration
+## Review-and-fix — the iterations after the write
 
-There is no in-iteration review-round mechanism. The ralph loop's iteration boundary *is* the fresh-context review, and it's a single pass: iteration N writes the stub; iteration N+1 reads it cold, reviews silently, applies any fixes inline, commits, and exits. Done. No intermediate findings file, no separate fix iteration, no second clean check. If the user wants more rigor than one review-and-fix pass delivers, that's a future loop (logged as an Open opportunity in CLAUDE.md).
+There is no in-iteration review-round mechanism. The ralph loop's iteration boundary *is* the fresh-context review: iteration N writes the stub at *baseline*; the next iteration reads it cold, reviews silently, applies any fixes inline, updates the constitution's Rigor *Current state* (*baseline → tightened* if fixes landed, *baseline → canonical* if nothing needed fixing), commits, and exits. **The iteration that makes changes cannot declare the stub done** — a subsequent fresh-context iteration has to read it and find nothing to fix for it to reach *canonical*.
 
-### When entering as the review-and-fix iteration
+The cycle terminates when the stub hits *canonical*. Typical shapes:
+- Write iteration writes a clean stub → next iteration finds nothing → stub: *canonical* in two iterations.
+- Write iteration leaves small issues → next iteration fixes them (→ *tightened*) → iteration after that finds nothing → stub: *canonical* in three iterations.
+- Bigger gaps may take more, but cap at 5 review iterations: if *canonical* isn't reached by then, log the tail in `open-questions.md` ("ARCHITECT review cap reached; user to review at close-out") and let the survey advance to SPECIFY.
 
-The signal is "stub `astra.yaml` exists at project root, `decisions:` / `prior_insights:` / `findings:` blocks empty, no `architect: review-and-fix` commit yet in the log." Read the stub cold, then check:
+### When entering as a review-and-fix iteration
+
+The signal is "stub `astra.yaml` exists, Rigor *Current state* says *stub: baseline* or *stub: tightened*." Read the stub cold, then check:
 
 1. **Sub-analysis decomposition.** Right cuts? Consistent with `code-index.md`? Defensible against the paper where the paper compresses?
 2. **Sub-analysis IDs.** Noun phrases. No reserved-name collisions (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
@@ -97,20 +102,20 @@ The signal is "stub `astra.yaml` exists at project root, `decisions:` / `prior_i
 5. **Narrative coverage.** Root narrative includes a data-flow paragraph (when sub-analyses exist). Each sub-analysis's narrative accurately describes its role. No `astra-anchor:` references at this stage; flag any that snuck in.
 6. **Validates.** `astra validate astra.yaml` returns clean.
 
-Apply fixes inline as you find them. Don't write a separate findings file — the diff against the prior commit is the record of what changed. Commit (`architect: review-and-fix stub`), update the constitution's Rigor *Current state* (e.g. *stub: tightened*), exit. The next iteration's survey moves on to SPECIFY.
+Apply fixes inline as you find them. Don't write a separate findings file — the diff against the prior commit is the record of what changed. If fixes landed, commit (`architect: review-and-fix stub`) and update the constitution to *stub: tightened*. If nothing needed fixing, commit (`architect: review confirmed clean`, possibly empty) and update the constitution to *stub: canonical*. Exit.
 
 ### What NOT to do during review-and-fix
 
 - Don't flag empty `decisions:` / `prior_insights:` / `findings:`. That's SPECIFY's territory.
 - Don't re-read the entire paper or code. Use the indices and targeted reads.
-- Don't open a second review pass on the same stub. One pass is the protocol; further tightening waits for a future loop.
+- Don't declare the stub *canonical* in the same iteration where you applied fixes — the next fresh-context iteration earns that judgment.
 
 ## Survey signals (entry into ARCHITECT)
 
 - `work/reference/index.json` + `work/reference/astra.yaml` + `work/reference/code-index.md` (when code present) exist ⇒ ACQUIRE substrate is ready
-- `astra.yaml` at project root absent (or present-but-empty) ⇒ this iteration writes the stub
-- `astra.yaml` exists, validates clean, sub-analyses + inputs + outputs + narrative populated, `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty, no `architect: review-and-fix` commit ⇒ this iteration is the review-and-fix
-- `architect: review-and-fix` commit landed ⇒ ARCHITECT done; next iteration surveys for SPECIFY
+- `astra.yaml` at project root absent (or present-but-empty) ⇒ this iteration writes the stub (records *stub: baseline*)
+- `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty), Rigor *Current state* shows *stub: baseline* or *stub: tightened* ⇒ this iteration is review-and-fix
+- Rigor *Current state* shows *stub: canonical* ⇒ ARCHITECT done; next iteration surveys for SPECIFY
 
 ## Notes
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
index dfaf57f4..b242c0c6 100644
--- a/claude/lightcone/skills/lc-from-paper/references/implement.md
+++ b/claude/lightcone/skills/lc-from-paper/references/implement.md
@@ -1,6 +1,6 @@
-# IMPLEMENT — write scripts and recipes; one review-and-fix pass
+# IMPLEMENT — write scripts and recipes; review by iteration boundary
 
-Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, one fresh-context iteration reads it cold, reviews against paper + code, applies fixes inline, and exits. One pass — same shape ARCHITECT and SPECIFY use.
+Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands at *baseline*, each subsequent fresh-context iteration reads it cold, reviews against paper + code, applies fixes inline if any, updates Rigor *Current state* (*baseline → tightened* or → *canonical*), and exits. The cycle terminates when the implementation reaches *canonical* (a fresh-context iteration read it and found nothing to fix). Same shape ARCHITECT and SPECIFY use.
 
 IMPLEMENT is what a ralph iteration does when the workdir signals "SPECIFY done + scripts/ absent (first pass) or comparison-report.yaml shows partial/fail (retry pass)". Most implementation is mechanical (translate spec → script). Where parallelization is feasible (multiple independent outputs from different scripts), the iteration fans out to one-level-deep sub-agents per output (inside its own main session) and merges.
 
@@ -19,7 +19,7 @@ IMPLEMENT is what a ralph iteration does when the workdir signals "SPECIFY done
 - `scripts/<output>.py` (or `.sh`, or whatever fits) — one script per output (or shared scripts for tightly-coupled outputs)
 - `requirements.txt` — Python dependencies
 - Recipes in `astra.yaml` — each output gets a `recipe:` block with `command:` and `inputs:`
-- `constitution.md` updates — Rigor *Current state* per output (e.g. *baseline* after the write iteration, *tightened* after the review-and-fix iteration)
+- `constitution.md` updates — Rigor *Current state* per output (*baseline* after the write iteration, *tightened* after a review-and-fix iteration that landed changes, *canonical* after a fresh-context iteration found nothing to fix)
 - `CLAUDE.md` updates — append to **Paper-vs-code disagreements** for any new conflict surfaced during implementation
 
 ## Step 1: write recipes + scripts
@@ -60,9 +60,9 @@ The iteration merges scripts and recipes after the per-output sub-agents finish.
 5. **Do not execute scripts** — the RUN phase handles execution via `lc run`.
 6. **Validate** with `astra validate astra.yaml` after adding recipes.
 
-## Step 2: review-and-fix — the next iteration
+## Step 2: review-and-fix — iterations after the write
 
-After the first-pass implementation lands, one fresh-context iteration reads it cold, reviews silently against paper + code, applies any fixes inline, commits, exits.
+After the first-pass implementation lands at *baseline*, the next fresh-context iteration reads it cold, reviews silently against paper + code, applies any fixes inline if needed, updates Rigor *Current state* (*baseline → tightened* if fixes landed, → *canonical* if nothing needed fixing), and exits. **The iteration that applied fixes cannot declare the implementation done** — a subsequent fresh-context iteration earns *canonical* by reading the implementation and finding nothing to fix. Cap at 5 review iterations: if *canonical* isn't reached by then, log the tail to `open-questions.md` and let the survey advance to RUN.
 
 The cross-check question on entry: is the implementation consistent with the paper and the code?
 
@@ -75,14 +75,14 @@ The cross-check question on entry: is the implementation consistent with the pap
 5. **Determinism.** Scripts set random seeds where the paper's method is stochastic. Library versions in `requirements.txt` are pinned where reproducibility requires it.
 6. **Recipe wiring.** Recipe `inputs:` references match the data-flow the scripts actually consume; no orphan dependencies, no missing dependencies.
 
-Apply fixes inline as you find them — `scripts/`, `astra.yaml` recipes, `requirements.txt`, `implementation-notes.md`, the disagreements log in CLAUDE.md when a new material conflict surfaces. After any change to `astra.yaml`, run `astra validate astra.yaml`. Commit (`implement: review-and-fix`), update the constitution's Rigor *Current state* per output (e.g. *baseline → tightened*), exit. The next iteration's survey advances to RUN.
+Apply fixes inline as you find them — `scripts/`, `astra.yaml` recipes, `requirements.txt`, `implementation-notes.md`, the disagreements log in CLAUDE.md when a new material conflict surfaces. After any change to `astra.yaml`, run `astra validate astra.yaml`. If fixes landed: commit (`implement: review-and-fix`), update Rigor to *tightened*. If nothing needed fixing: commit (`implement: review confirmed clean`, possibly empty), update Rigor to *canonical*. Exit.
 
 ### What NOT to do during review-and-fix
 
 - **Don't re-read the entire paper.** Grep into `work/reference/index.json`, `work/reference/code-index.md`, and `work/reference/source/` (or `document.md`) for specific items.
-- **Don't open a second review pass.** One pass is the protocol. If the implementation needs more rigor than this delivers, log an Open opportunity in CLAUDE.md and let a future loop close it.
+- **Don't declare the implementation *canonical* in the same iteration where you applied fixes.** That's the next fresh-context iteration's call.
 
-The post-RUN COMPARE → IMPLEMENT retry loop is separate from this review-and-fix pass — that loop handles result-matching after the pipeline executes, not spec/implementation alignment before it.
+The post-RUN COMPARE → IMPLEMENT retry loop is separate from this review cycle — that loop handles result-matching after the pipeline executes, not spec/implementation alignment before it.
 
 ## Data: REAL DATA ONLY
 
@@ -96,13 +96,14 @@ If a dataset is behind a paywall, requires registration, or is "available upon r
 
 If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, a subsequent iteration may take on a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. Default attempt budget is 5; the iteration's first move is to check whether `attempt` in the report has reached the budget. If it has, accept partial, log the failure as an Open opportunity in CLAUDE.md (so REVIEW close-out can decide whether to push further or accept the trajectory), and exit; subsequent iterations either accept the verdict via a cold close or pivot scope based on REVIEW's input.
 
-A retry attempt re-runs the IMPLEMENT review-and-fix pass on the changed scripts before the next iteration advances to RUN.
+A retry attempt restarts the IMPLEMENT review cycle on the changed scripts (back to *baseline*) before the next iteration advances to RUN.
 
 ## Survey signals (entry into IMPLEMENT)
 
 - `astra.yaml` validates and `implementation-notes.md` exists ⇒ ready to implement first pass
-- `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ first-pass IMPLEMENT done
-- An `implement: review-and-fix` commit lands ⇒ IMPLEMENT done; the next iteration surveys and advances to RUN
+- `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ first-pass IMPLEMENT done (Rigor: *baseline*)
+- Rigor *Current state* shows *baseline* or *tightened* for the implementation ⇒ this iteration is review-and-fix
+- Rigor *Current state* reaches *canonical* ⇒ IMPLEMENT done; the next iteration surveys and advances to RUN
 - `comparison-report.yaml` returns `pass` ⇒ COMPARE → IMPLEMENT loop terminated; the constitution can close after a cold survey, and REVIEW close-out runs in the user's main session
 
 ## Notes
@@ -110,5 +111,5 @@ A retry attempt re-runs the IMPLEMENT review-and-fix pass on the changed scripts
 - **`lc run` is the canonical execution surface.** Scripts assume they will be invoked via the lightcone-cli runner. Do not hard-code working directories or assume environment activation.
 - **Determinism where possible.** Set random seeds, fix library versions, prefer reproducible installations. The IMPLEMENT goal is not just "produces output once" but "reproducibly produces output across runs."
 - **Tight coupling earns shared scripts.** When two outputs come from the same expensive computation (e.g. an MCMC produces both a parameter chain and a summary statistic), one script with multiple output paths is cleaner than two scripts that each re-do the work.
-- **One review-and-fix pass per artifact.** The fresh-context property is automatic at iteration boundaries. Re-opening a second pass on the same artifact in the same loop is anti-pattern — log an Open opportunity and let a future loop close it.
-- **Commit as you go.** One commit per script + recipe wiring; one commit for the review-and-fix pass. The next iteration reads `git log` to track progress.
+- **The iteration that fixed the artifact can't declare it canonical.** Termination requires a subsequent fresh-context iteration to read the work and find nothing to fix. This is what the fresh-context property (no bias from having written the fix) buys you; conflating the fix iteration with the *canonical* judgment defeats it.
+- **Commit as you go.** One commit per script + recipe wiring; one commit per review-and-fix pass; one commit per confirmed-clean review. The next iteration reads `git log` and Rigor *Current state* to track progress.
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index f0372c1e..e1915c19 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -164,9 +164,9 @@ Rules:
 
 When the iteration fans out to Haikus, each Haiku is spawned with `model="haiku"` and gets this contract plus its assigned subset of placeholders and substrate paths.
 
-## Review-and-fix — the next iteration
+## Review-and-fix — iterations after the merge
 
-After the merge lands, one fresh-context iteration reads cold, runs `astra validate --verify-evidence` for the deterministic check, does a semantic re-read of each resolved insight, applies fixes inline, commits, exits.
+After the merge lands (Rigor: *baseline*), the next fresh-context iteration reads cold, runs `astra validate --verify-evidence` for the deterministic check, does a semantic re-read of each resolved insight, applies fixes inline if needed, updates Rigor *Current state* (*baseline → tightened* if fixes landed, → *canonical* if nothing needed fixing), and exits. **The iteration that applied fixes cannot declare LITERATURE done** — a subsequent fresh-context iteration earns *canonical* by reading the resolutions and finding nothing to fix. Cap at 5 review iterations: if *canonical* isn't reached by then, log the tail to `open-questions.md` and let the survey advance.
 
 The cross-check questions on entry:
 
@@ -176,22 +176,18 @@ The cross-check questions on entry:
 4. **Cited paper is the right paper.** Does the target paper actually invoke this DOI for this claim?
 5. **Unresolved entries are honest.** For entries in `open-questions.md` flagged unresolved, does a closer read of the cited paper find supporting evidence the resolver missed?
 
-Apply fixes inline as you find them — `astra.yaml`'s `prior_insights:` entries (including re-running Haiku quote-finding for entries that need a different quote, when the gap is mechanical rather than semantic). Commit (`literature: review-and-fix`), update the constitution's Rigor *Current state* (e.g. *baseline → tightened*), exit. The next iteration's survey advances to IMPLEMENT.
+Apply fixes inline as you find them — `astra.yaml`'s `prior_insights:` entries (including re-running Haiku quote-finding for entries that need a different quote, when the gap is mechanical rather than semantic). If fixes landed: commit (`literature: review-and-fix`), update Rigor to *tightened*. If nothing needed fixing: commit (`literature: review confirmed clean`, possibly empty), update Rigor to *canonical*. Exit.
 
 If the entry genuinely has no supporting quote in the cited paper, log it to `open-questions.md` with a "no support found" note and leave the entry as-is for the user to resolve at REVIEW. Don't fabricate evidence.
 
-One pass. If LITERATURE needs more rigor than this delivers, that's an Open opportunity for a future loop.
-
 ## Survey signals (entry into LITERATURE)
 
 - `astra.yaml` has `prior_insights:` placeholders — entries with `claim:` plus Evidence carrying `doi:` but no `quote:` selector ⇒ ready to resolve
 - `work/cited/<doi-slug>/work/reference/index.json` exists for each unique cited DOI ⇒ fetches done
 - `work/notes/literature/resolutions.yaml` exists with non-empty resolutions / unresolved sections ⇒ quote-finding done
-- `astra.yaml`'s `prior_insights:` entries each have a resolved `quote:` (+ `location:`) selector on their Evidence ⇒ merge done
+- `astra.yaml`'s `prior_insights:` entries each have a resolved `quote:` (+ `location:`) selector on their Evidence ⇒ merge done (Rigor: *baseline*)
 - `astra validate astra.yaml --verify-evidence` returns clean ⇒ structural validation done
-- A `literature: review-and-fix` commit lands ⇒ LITERATURE review-and-fix done
-
-When all of the above hold ⇒ LITERATURE complete; the next iteration surveys and advances to IMPLEMENT.
+- Rigor *Current state* reaches *canonical* for LITERATURE ⇒ LITERATURE done; the next iteration surveys and advances to IMPLEMENT
 
 ## Notes
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 0a14109e..8228f9ba 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -26,7 +26,7 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 - `universes/baseline.yaml` — selects the paper's choices (where paper and code disagree per the canonical-resolution rule, see "Material conflicts" below)
 - `implementation-notes.md` — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
 - `targets/targets.md` — small target ledger COMPARE consumes: per output (already declared by ARCHITECT), a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure / table / metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
-- `constitution.md` updates — Rigor *Current state* per sub-analysis (e.g. *baseline* after the write iteration, *tightened* after the review-and-fix iteration)
+- `constitution.md` updates — Rigor *Current state* per sub-analysis (*baseline* after the write iteration, *tightened* after a review-and-fix iteration that landed changes, *canonical* after a fresh-context iteration found nothing to fix)
 - `CLAUDE.md` updates — append entries to **Paper-vs-code disagreements** for each material conflict surfaced
 
 ## Substrate skills to invoke
@@ -119,9 +119,9 @@ Read the code that implements this sub-analysis (`work/reference/code-index.md`'
 
 3. **Decision-option augmentation.** Where the code reveals an option the paper didn't mention but is defensible (a sibling implementation alternative used in the codebase or referenced in a comment), add it as a sibling option to the relevant `decisions:` entry. Do not pre-emptively author every code variant; only the ones that bear on a real choice.
 
-### Review-and-fix — the next iteration
+### Review-and-fix — the iterations after the passes
 
-After the paper + code passes land for a sub-analysis, one fresh-context iteration reads the slice cold, reviews silently, applies any fixes inline, commits, and exits. One pass. No intermediate findings file, no second clean check.
+After the paper + code passes land for a sub-analysis (Rigor: *baseline*), the next fresh-context iteration reads the slice cold, reviews silently, applies any fixes inline, updates the Rigor *Current state* (*baseline → tightened* if fixes landed, *baseline → canonical* if nothing needed fixing), commits, and exits. **The iteration that makes changes cannot declare the sub-analysis done** — a subsequent fresh-context iteration earns *canonical* by reading the work and finding nothing to fix. Cap at 5 review iterations per sub-analysis: if *canonical* isn't reached by then, log the tail to `open-questions.md` and let the survey advance.
 
 The cross-check question on entry: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
 
@@ -136,22 +136,22 @@ The cross-check question on entry: are the decisions covering everything materia
 7. **`narrative:` voice fidelity.** Hedges and qualifiers from the paper survive (per the narrative skill's discipline). Editorial commentary added beyond what the paper supports gets flagged.
 8. **No synthetic data.** Unless the paper itself uses synthetic data, every input has a real acquisition source — no mock / synthetic substitutes anywhere in the sub-analysis's inputs, decisions, or implementation-notes.
 
-Apply fixes inline as you find them — `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md`, the disagreements log in CLAUDE.md as needed. The diff against the prior commit is the record of what changed; no separate findings file. After any change to `astra.yaml`:
+Apply fixes inline as you find them — `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md`, the disagreements log in CLAUDE.md as needed. The diff against the prior commit is the record of what changed. After any change to `astra.yaml`:
 
 ```bash
 astra validate astra.yaml
 astra validate astra.yaml --verify-evidence  # after LITERATURE has resolved the prior_insights placeholders
 ```
 
-Commit (`specify: review-and-fix <sub-analysis-id>`), update the constitution's Rigor *Current state* for the sub-analysis (e.g. *baseline → tightened*), exit. The next iteration's survey moves on (next sub-analysis, or next phase if all sub-analyses are done).
+If fixes landed: commit (`specify: review-and-fix <sub-analysis-id>`), update Rigor to *tightened*. If nothing needed fixing: commit (`specify: review confirmed clean <sub-analysis-id>`, possibly empty), update Rigor to *canonical*. Exit. The next iteration's survey checks each sub-analysis's Rigor state to decide what's next.
 
 #### What NOT to do during review-and-fix
 
 - **Don't flag missing `recipes:`.** Recipes are IMPLEMENT's, not SPECIFY's.
 - **Don't re-read the entire paper.** Use Grep on `work/reference/source/` (or `document.md`) for the specific claims you want to verify; lean on `work/reference/index.json`.
-- **Don't open a second review pass on the same sub-analysis.** One pass is the protocol. If the sub-analysis needs more rigor than this delivers, log an Open opportunity in CLAUDE.md and let a future loop close it.
+- **Don't declare a sub-analysis *canonical* in the same iteration where you applied fixes.** That's the next fresh-context iteration's call.
 
-When every sub-analysis has had its review-and-fix pass, SPECIFY produces the final outputs:
+When every sub-analysis reaches *canonical*, SPECIFY produces the final outputs:
 
 ## Target-ledger output
 
@@ -182,7 +182,7 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 - `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative; empty decisions / prior_insights / findings) ⇒ ready to specify
 - For each sub-analysis: `decisions:` populated with decision-level `rationale:` + options (paper's choice at `default:`); `findings:` populated as full Insight blocks with paper-anchored Evidence (DOI + `quote: {exact, prefix, suffix}` + `location: {page}`); `prior_insights:` populated as citation placeholders (`id`, `claim`, `created_at`, `evidence: [{id, doi}]` with `quote:` omitted — LITERATURE fills the quotes next); `Option.insights` back-references wired up where options draw on placeholders ⇒ paper pass done
 - For each sub-analysis: when `work/reference/code/` exists, code-pass material-disagreement entries land in `decisions:` (with both options) and `universes/baseline.yaml` selects the canonical-resolution choice; `implementation-notes.md` carries non-material gotchas ⇒ code pass done
-- For each sub-analysis: a `specify: review-and-fix <sub-analysis-id>` commit lands ⇒ that sub-analysis's review-and-fix pass is done
+- For each sub-analysis: Rigor *Current state* reaches *canonical* ⇒ that sub-analysis's review cycle is done
 - `astra validate astra.yaml` returns clean (placeholders whose Evidence carries `doi:` without `quote:` are valid at this stage) ⇒ structural side validated; `--verify-evidence` waits until LITERATURE has authored the `quote:` + `location:` selectors
 - `targets/targets.md` exists with each entry mapped to a spec home ⇒ target-ledger done
 - `implementation-notes.md` exists ⇒ practical-guidance side done
diff --git a/claude/lightcone/skills/lc-from-paper/templates/constitution.md b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
index e5da61ef..47666373 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/constitution.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
@@ -40,7 +40,7 @@ The substrate this reproduction is built against — the canonical sources itera
 
 ## Rigor — current state
 
-Per-output trajectory tracking, updated by iterations as they produce artifacts. Coarse adjectives per output or per phase: *sketch / baseline / tightened*. Read alongside Fidelity intent above so each iteration knows where each output currently sits. Empty until the first iteration produces something:
+Per-output trajectory tracking, updated by iterations as they produce or review artifacts. Coarse adjectives per output or per phase: *sketch / baseline / tightened / canonical*. *baseline* — first version written. *tightened* — at least one fresh-context iteration reviewed and applied fixes. *canonical* — a fresh-context iteration reviewed and found nothing to fix (terminates the review cycle for that artifact). Read alongside Fidelity intent above so each iteration knows where each output currently sits. Empty until the first iteration produces something:
 
 - (none yet)
 
diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index 1d07499e..963247b2 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -99,10 +99,12 @@ Pointers, not snapshots.
   wall-clock available?"). The honest meta-conversation lives in
   INTERVIEW; each iteration then sizes its work from the gap between
   the constitution's Rigor *Current state* and that intent. The
-  per-artifact protocol is simple: iteration N writes, iteration N+1
-  reads cold and does one review-and-fix pass. Outputs that need
-  more rigor than that delivers become Open opportunities in
-  CLAUDE.md for a future loop.
+  vocabulary is *sketch / baseline / tightened / canonical*: the
+  write iteration produces *baseline*; each subsequent fresh-context
+  iteration either lands fixes (→ *tightened*) or finds nothing to fix
+  (→ *canonical*, which terminates the review cycle). The iteration
+  that applied fixes can't declare the artifact *canonical* — that
+  judgment belongs to the next fresh-context read.
 - **arXiv LaTeX first.** PDF + Docling is the non-arXiv fallback only.
 - **No synthetic data.** Unless the paper itself uses synthetic data,
   every input must be real.

From 5254111294bcae7a509ff6f35b593e4204abe784 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 21:27:46 +0200
Subject: [PATCH 104/124] lc-from-paper: split INTERVIEW into two beats with
 /paper-extraction between
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Restructures INTERVIEW to run the paper-extraction substrate inline,
between two question beats. This way every interview question after the
paper identifier is grounded in the actual paper — scope shown as a
menu of real figures/tables, fidelity intent referencing actual headline
numbers, code repo confirmed against what data-availability /
acknowledgements surface, conventions/warnings proactively proposed
from the paper, "prior notes" reframed as "external context outside
paper + code."

The paper identifier is asked in prose (not AskUserQuestion — the
answer is inherently free-form: arXiv ID, DOI, or PDF path).
AskUserQuestion lives entirely in Beat 2, after paper substrate is on
disk.

ACQUIRE becomes thin: just /lc-from-code scan-only against the cloned
reference repo, plus code-status.yaml. Paper substrate is INTERVIEW's
deliverable (committed as part of INTERVIEW's first commit, alongside
constitution.md + CLAUDE.md). No-public-repo case is one step: write
code-status.yaml found=false and launch.

Knock-on doc edits:
- SKILL.md Phases table: paper substrate moves to phase 0 outputs.
- SKILL.md "The two pre-loop bookends": INTERVIEW now describes the
  two-beat shape; ACQUIRE drops the "two parallel sub-skill" framing.
- SKILL.md Workdir-as-state: INTERVIEW row now includes paper substrate;
  ACQUIRE row simplified.
- SKILL.md Resuming: split the substrate-incomplete recovery into
  paper-side (re-run /paper-extraction) and code-side (re-run
  /lc-from-code) cases.
- references/acquire.md: full rewrite — drops paper-extraction step,
  becomes code-substrate-only with a "no-public-repo" branch.
- references/architect.md: survey signal reworded to reference paper
  substrate (from INTERVIEW) and code substrate (from ACQUIRE)
  separately.

Surfaced via Cail's dogfood run of /lc-from-paper on arXiv:2604.03227.
Friction entries #2 (AskUserQuestion is wrong shape for free-form paper
ID), #3 (INTERVIEW should ground questions in actual paper), and #6
(prior-notes framing doesn't match either scientist persona) all
addressed here.

Co-Authored-By: Claude Sonnet 4.7 (1M context) <noreply@anthropic.com>
---
 .../lightcone/skills/lc-from-paper/SKILL.md   |  32 ++--
 .../lc-from-paper/references/acquire.md       |  93 +++++------
 .../lc-from-paper/references/architect.md     |   4 +-
 .../lc-from-paper/references/interview.md     | 149 +++++++++++++-----
 4 files changed, 174 insertions(+), 104 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index c87d971e..eb45feb6 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -18,7 +18,7 @@ You are helping the user reproduce a published scientific paper as a complete AS
 
 The architecture is two-piece:
 
-1. **Interactive bookends in the user's main session.** INTERVIEW and REVIEW are conversations with the user. ACQUIRE is two parallel sub-skill invocations (`/paper-extraction` and `/lc-from-code` in scan-only mode) that produce the on-disk substrate everything downstream consults.
+1. **Interactive bookends in the user's main session.** INTERVIEW and REVIEW are conversations with the user; INTERVIEW also runs `/paper-extraction` inline between its two beats so its second beat can ground every remaining question in the actual paper. ACQUIRE is thin — one `/lc-from-code` scan-only invocation against the cloned reference code (or `found: false` when the paper has no public repo).
 
 2. **A ralph loop for the long middle.** Once the per-paper `constitution.md` is drafted (INTERVIEW) and the substrate is on disk (ACQUIRE), you launch a ralph loop against the constitution. Each iteration starts a fresh session with the constitution loaded into its system prompt, surveys the workdir, picks the next valuable move (typically one phase's worth of work), does it, commits, and exits. Iteration N+1 reads N's work cold, so per-phase review collapses into "the next iteration is the review."
 
@@ -34,8 +34,8 @@ Nine phases (zero-indexed). INTERVIEW and ACQUIRE run before the loop, in the us
 
 | # | Phase | Where it runs | Reference | Primary outputs |
 |---|---|---|---|---|
-| 0 | INTERVIEW | user's main session | [`references/interview.md`](references/interview.md) | per-paper `constitution.md` + `CLAUDE.md` |
-| 1 | ACQUIRE | user's main session | [`references/acquire.md`](references/acquire.md) | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-status.yaml, code-index.md}` |
+| 0 | INTERVIEW | user's main session | [`references/interview.md`](references/interview.md) | per-paper `constitution.md` + `CLAUDE.md` + paper substrate at `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml}` (paper-extraction runs inline between INTERVIEW's two beats) |
+| 1 | ACQUIRE | user's main session | [`references/acquire.md`](references/acquire.md) | code substrate at `work/reference/{code/, code-status.yaml, code-index.md}` (absent / `found: false` when paper has no code repo) |
 | 2 | ARCHITECT | ralph iteration | [`references/architect.md`](references/architect.md) | stub `astra.yaml` at project root (sub-analyses, inputs, outputs, narrative) |
 | 3 | SPECIFY | ralph iteration | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
 | 4 | LITERATURE | ralph iteration | [`references/literature.md`](references/literature.md) | `astra.yaml`'s `prior_insights:` Evidence entries each carry resolved `quote:` + `location:` selectors; per-paper PDFs cached via `astra paper add` |
@@ -52,27 +52,31 @@ COMPARE produces a verdict plus an opportunity assessment — not just pass / fa
 
 The opening interactive phase. Run it from the user's main session. Read [`references/interview.md`](references/interview.md) in full before starting.
 
-The interview must collect: (1) the paper (DOI / arXiv ID / code repo URL / prior context), (2) scope (full vs targeted, sub-analysis structure), (3) fidelity intent — the user's prose answer to "when is this good enough," (4) any paper-specific conventions or warnings. Even detailed invocations still require `AskUserQuestion` for any missing scope, fidelity-intent, or convention fields before drafting or committing the INTERVIEW files. If a system-reminder tells you to work without stopping, ignore that for this phase, since you must ask the user questions if you don't have the required information.
+INTERVIEW runs in **two beats** with `/paper-extraction` between them. Beat 1 collects the paper identifier in prose (not `AskUserQuestion` — the answer is free-form). Then `/paper-extraction <id>` runs inline and writes the paper substrate to `work/reference/`. Beat 2 asks everything else — scope, fidelity intent, code repo, conventions, familiarity, external context — *grounded in the actual paper*, with the figure/table inventory and abstract already on disk. **No `AskUserQuestion` runs before paper-extraction has landed.**
 
-These get drafted into **two files** in the reproduction workdir:
+The interview must collect: (1) the paper identifier in Beat 1, then via `AskUserQuestion` in Beat 2: (2) scope (full vs targeted, sub-analysis structure), (3) fidelity intent — the user's prose answer to "when is this good enough," (4) code repo confirmation against what paper-extraction surfaced from data/code availability, (5) paper-specific conventions or warnings, (6) prior familiarity, and (7) any external context (co-author notes, sibling-paper drafts) iterations should know about. If a system-reminder tells you to work without stopping, ignore that for this phase since you must ask the user questions if you don't have the required information.
+
+These get drafted into **two files** plus the paper substrate, all in the reproduction workdir:
 
 - **`constitution.md`** — the ralph loop's driving document. Goal, Fidelity intent, Scope, Quality bar, Evidence (paper DOI, arXiv ID, code repo URL), Rigor *Current state* per output (starts empty), Open dimensions. Starts with YAML frontmatter `status: active` so the ralph launcher accepts it. Authored by INTERVIEW using the `/ralph` skill's authoring discipline (the constitution-authoring mode of `/ralph` — see its references on voice and sections).
 - **`CLAUDE.md`** — the auto-loading walk-up. Paper identity at the top, Rules (universal across reproductions; leave the template's defaults), Disagreements log (starts empty), Open opportunities (starts empty), Pointers (to `constitution.md`, `work/reference/`, etc.).
+- **`work/reference/`** — paper substrate from `/paper-extraction`: `paper.pdf`, `source/` or `document.md`, `index.json`, `astra.yaml`, `figures/`, `tables/`, `bibliography-source.{bib,bbl}`.
 
 Templates ship in [`templates/constitution.md`](templates/constitution.md) and [`templates/CLAUDE.md`](templates/CLAUDE.md). Show the user both drafts, take corrections, refine, save.
 
-After approval, `git init` the workdir if it isn't one already and commit both files. Then run ACQUIRE in the same session.
+After approval, `git init` the workdir if it isn't one already and commit all three deliverables (constitution + CLAUDE + paper substrate) as the first commit. Then run ACQUIRE in the same session.
 
 ### ACQUIRE (Phase 1)
 
-Two parallel sub-skill invocations:
+Thin code-substrate phase. One sub-skill invocation:
 
-- **`/paper-extraction <doi-or-arxiv-id>`** — produces the paper substrate at `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml, figures/, tables/, bibliography-source.{bib,bbl}}`.
 - **`/lc-from-code` in scan-only mode** against the cloned reference repo at `work/reference/code/` (after `git clone --depth 1 <url> work/reference/code`). Produces `work/reference/code-status.yaml` + `work/reference/code-index.md`.
 
-See [`references/acquire.md`](references/acquire.md) for the full step-by-step. Both happen in your main session — no orchestration overhead, just two skill invocations that produce on-disk artifacts.
+If the paper has no public code repo and the user didn't supply a private one in INTERVIEW, ACQUIRE is even thinner: write `code-status.yaml` with `found: false` and proceed to launch. The code-as-canonical rule self-disables in that case.
+
+See [`references/acquire.md`](references/acquire.md) for the full step-by-step. The paper substrate is INTERVIEW's deliverable, not ACQUIRE's — INTERVIEW reads the paper to ground its second-beat questions, so the substrate is already on disk when ACQUIRE starts.
 
-When ACQUIRE returns, commit the new substrate and launch the ralph loop (see **Launching the loop** below).
+When ACQUIRE returns, commit the code substrate (`code-status.yaml` + `code-index.md`; the `code/` clone itself can be `.gitignore`d for large monorepos) and launch the ralph loop (see **Launching the loop** below).
 
 ## Launching the loop
 
@@ -109,9 +113,8 @@ Each iteration's survey reads the workdir to determine what phase is next. File
 
 | Signal | Phase done |
 |---|---|
-| `constitution.md` + `CLAUDE.md` at workdir root, both committed | INTERVIEW |
-| `work/reference/source/` (arxiv tarball) **or** `work/reference/document.md` (Docling fallback) + `work/reference/index.json` + `work/reference/astra.yaml` | ACQUIRE paper substrate |
-| `work/reference/code/` (or `code-status.yaml` with `found: false`) + `work/reference/code-index.md` | ACQUIRE code substrate |
+| `constitution.md` + `CLAUDE.md` at workdir root, both committed, **and** `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml}` present (paper substrate is INTERVIEW's deliverable) | INTERVIEW |
+| `work/reference/code/` present **or** `code-status.yaml` records `found: false`, **and** `code-index.md` present (or absent when `found: false`) | ACQUIRE |
 | `astra.yaml` at project root validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
 | `astra.yaml` non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` placeholders + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
 | `astra.yaml`'s `prior_insights:` Evidence entries each carry resolved `quote:` + `location:` selectors; `work/cited/<doi-slug>/` populated per cited paper | LITERATURE |
@@ -155,7 +158,8 @@ When the user walks back into a workdir that already has artifacts:
 1. **Skip INTERVIEW** unless the user explicitly wants to revise scope (in which case edit `constitution.md` together, no re-draft from scratch).
 2. **If `constitution.md`'s `status:` is `active` and the tmux session isn't running**, re-launch the ralph loop: `.claude/skills/ralph/scripts/ralph constitution.md`. The next iteration surveys the workdir and picks up wherever the prior loop left off.
 3. **If `constitution.md`'s `status:` is `closed`**, the reproduction is at REVIEW. Run REVIEW close-out in your main session.
-4. **If ACQUIRE substrate is incomplete**, finish ACQUIRE in your main session before launching the loop — re-invoke `/paper-extraction` and/or `/lc-from-code` against the existing partial state (both are survey-first and skip done work).
+4. **If the paper substrate is incomplete** (INTERVIEW didn't finish cleanly because paper-extraction errored or completed only partially), re-invoke `/paper-extraction` in your main session against the existing partial state (idempotent; skips done work). Confirm the constitution + CLAUDE.md are consistent with what's on disk before continuing.
+5. **If ACQUIRE substrate is incomplete**, finish ACQUIRE in your main session before launching the loop — re-invoke `/lc-from-code` scan-only against the existing partial state.
 
 ## Anti-patterns
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/acquire.md b/claude/lightcone/skills/lc-from-paper/references/acquire.md
index 89522c68..683baee8 100644
--- a/claude/lightcone/skills/lc-from-paper/references/acquire.md
+++ b/claude/lightcone/skills/lc-from-paper/references/acquire.md
@@ -1,77 +1,66 @@
-# ACQUIRE — stand up the on-disk substrate
+# ACQUIRE — stand up the code substrate
 
-The pre-loop substrate phase. Runs in the user's main session, right after INTERVIEW has committed `constitution.md` and `CLAUDE.md`. Two parallel sub-skill invocations produce the on-disk material every subsequent ralph iteration consults: `/paper-extraction` for the paper side, `/lc-from-code` in scan-only mode for the code side. Both write to `work/reference/`; both are survey-first and skip already-done work, so re-invoking on a partial state is safe.
+The post-INTERVIEW substrate phase. Runs in the user's main session, right after INTERVIEW has committed `constitution.md`, `CLAUDE.md`, and the paper substrate produced by `/paper-extraction`. ACQUIRE is now thin: it stands up the **code substrate** if there's a reference code repository, then commits. The paper substrate is INTERVIEW's deliverable, not ACQUIRE's: INTERVIEW reads the paper to ground its second-beat questions, so the substrate already exists on disk when ACQUIRE starts.
 
-There is no `acquire` sub-agent. ACQUIRE's work *is* the two sub-skill invocations. Once they return, commit the substrate and launch the ralph loop (per SKILL.md's *Launching the loop* section).
+There is no `acquire` sub-agent. ACQUIRE's work is at most one skill invocation (`/lc-from-code` in scan-only mode) plus the surrounding clone, status file, and commit.
+
+If the paper has no reference code repo (and the user didn't supply a private one in INTERVIEW), ACQUIRE is one step: write `code-status.yaml` with `found: false` and proceed to launch the loop.
 
 ## Where this runs
 
-User's main session, directly. Sub-skills are invoked as `/paper-extraction <id>` and `/lc-from-code` against the cloned reference repo.
+User's main session, directly. The one sub-skill invocation is `/lc-from-code` in scan-only mode against the cloned reference repo.
 
 ## Inputs
 
-- The paper's DOI or arXiv ID (from `constitution.md`'s Evidence section)
-- An optional code repo URL (from the interview, if the user knew it; recorded in `constitution.md`'s Evidence section)
+- **Code repo URL** (from `constitution.md`'s Evidence section, surfaced during INTERVIEW Beat 2). May be absent if the paper has no public code and the user didn't supply a private one.
+- **Paper substrate** at `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml, figures/, tables/}` — produced by `/paper-extraction` during INTERVIEW. Read-only from ACQUIRE's perspective; iterations consult it, ACQUIRE doesn't modify it.
 
 ## Outputs
 
-All on-disk; no persistent agents:
-
-- `work/reference/paper.pdf`
-- `work/reference/source/` (Path A — arxiv LaTeX) **or** `work/reference/document.md` (Path B — Docling fallback)
-- `work/reference/index.json` — paper-side structural index (figures, tables, outline with line numbers, citations with resolved DOIs)
-- `work/reference/astra.yaml` — paper-extraction's ASTRA-shape stub of the paper (id, name, narrative.summary, optionally findings)
-- `work/reference/figures/`, `work/reference/tables/`, `work/reference/bibliography-source.{bib,bbl}`
-- `work/reference/code/` — cloned reference repo (absent if not found)
-- `work/reference/code-status.yaml` — record of where the code came from
-- `work/reference/code-index.md` — script inventory, candidate decisions, dependencies, container hints
-
-## Step 1 — Invoke `/paper-extraction`
-
-```
-/paper-extraction <doi-or-arxiv-id>
-```
+All on-disk:
 
-This runs the full paper-extraction workflow against the workdir. It writes everything under `work/reference/` listed above. The skill is idempotent; re-invoking on a partially-populated `work/reference/` is safe.
+- `work/reference/code/` — cloned reference repo (absent if `code-status.yaml` records `found: false`)
+- `work/reference/code-status.yaml` — record of where the code came from (or that it wasn't found)
+- `work/reference/code-index.md` — script inventory, candidate decisions, dependencies, container hints (absent when no code substrate)
 
-## Step 2 — Locate, clone, and scan the reference code (parallel with Step 1)
-
-In a separate flow inside the same session:
+## Step 1 — Locate, clone, scan
 
 1. **Locate the reference code repository.**
-   - If a URL was provided at INTERVIEW (in `constitution.md`'s Evidence section), use it.
-   - Otherwise, grep the paper materials in `work/reference/` for repo URLs (abstract, intro, conclusion, footnotes, "Code Availability" / "Data Availability" sections). Path A: grep across `work/reference/source/*.tex`. Path B: grep `work/reference/document.md`. If `/paper-extraction` hasn't finished yet when you need to grep, wait briefly or skip ahead and come back.
-   - If still nothing, web-search: paper title + "github", Papers With Code, or the first author's GitHub profile. A few searches max — record failure and move on.
+   - If a URL was supplied at INTERVIEW (recorded in `constitution.md`'s Evidence section), use it.
+   - Otherwise, the paper has no public code repo and the user didn't supply a private one; skip to item 4 of this step and record `found: false`.
 
 2. **Clone if found:**
    ```bash
    git clone --depth 1 <url> work/reference/code
    ```
 
-3. **Write `work/reference/code-status.yaml`:**
+   For multi-project monorepos where the user pointed at specific subpaths (e.g. GitHub `tree/<branch>/<path>` URLs), clone the whole repo on the named branch — don't sparse-checkout — and capture the primary subpaths in `code-status.yaml` so `/lc-from-code` knows where to focus.
+
+3. **If `work/reference/code/` exists, run `/lc-from-code` in scan-only mode against it:**
+   - Invoke `/lc-from-code` pointing at the cloned repo, with an invocation prompt that names the primary subpaths (if any; the same ones item 4 records in `code-status.yaml`) and reminds it of the scan-only contract: write `work/reference/code-index.md` only; do not touch `astra.yaml` at the project root; do not parameterize any code; do not run anything; do not modify the cloned repo.
+   - The scan-only branch of `/lc-from-code` does the inventory pass and writes to `work/reference/code-index.md`.
+
+4. **Write `work/reference/code-status.yaml`:**
    ```yaml
    found: true        # or false
    url: "https://..."  # null if not found
+   branch: "main"     # or whichever branch was cloned; null if not found
    cloned: true       # false if found but clone failed
+   primary_subpaths:  # optional; for multi-project monorepos
+     - "notebooks/..."
+     - "..."
    notes: "..."
    ```
 
-4. **If `work/reference/code/` exists, run `/lc-from-code` in scan-only mode against it:**
-   - Invoke `/lc-from-code` pointing at the cloned repo.
-   - The scan-only branch of `/lc-from-code` does the inventory pass inline (no Explore sub-agent spawn); it writes to `work/reference/code-index.md`.
-   - Do not touch `astra.yaml` at the project root, do not parameterize any code, do not run anything, do not modify the cloned repo.
-
 `/lc-from-code`'s scan-only branch is the canonical code-inventory mechanism. Its prompt-context surface is what carries the "stop at scan" contract.
 
-**A scan-only return is not an ACQUIRE stopping point.** ACQUIRE is incomplete until Step 3 below has either succeeded or hit a concrete launcher blocker. When `/lc-from-code` returns, do not summarize the scan as the final user-facing result. Continue immediately to Step 3: commit the substrate, launch the ralph loop, and tell the user the session name.
+**A scan-only return is not an ACQUIRE stopping point.** ACQUIRE is incomplete until Step 2 below has either succeeded or hit a concrete launcher blocker. When `/lc-from-code` returns, do not summarize the scan as the final user-facing result. Continue immediately to Step 2: commit the code substrate, launch the ralph loop, and tell the user the session name.
 
-## Step 3 — Commit and launch the ralph loop
+## Step 2 — Commit and launch the ralph loop
 
-When both Step 1 and Step 2 have landed:
+1. **Commit the code substrate.** Stage `code-status.yaml` + `code-index.md` and commit with a small, descriptive message (e.g. "acquire: code substrate (sp_validation @ develop)"). The `work/reference/code/` clone itself can be `.gitignore`d or committed depending on the project's preference; the inventory file (`code-index.md`) is what downstream iterations actually consult, and gitignoring the clone keeps the tracked workdir small when the repo is a 50+ MB monorepo.
 
-1. **Commit the substrate.** Stage `work/reference/` and commit — small, descriptive ("acquire: paper-extraction substrate"). For the code side: commit `code-status.yaml` + `code-index.md`. The `work/reference/code/` clone itself can be `.gitignore`d or committed depending on the project's preference; the inventory file (`code-index.md`) is what downstream iterations actually consult.
-
-2. **Tell the user** the ralph loop is about to launch. Surface anything notable from Step 2 — if `code-status.yaml` records `found: false` or the cloned repo is gnarly, mention it now so the user can adjust scope before iterations start working against the substrate.
+2. **Tell the user** the ralph loop is about to launch. Surface anything notable from Step 1 — if `code-status.yaml` records `found: false` or the cloned repo is gnarly (no `requirements.txt`, abandoned-looking, etc.), mention it now so the user can adjust scope before iterations start working against the substrate.
 
 3. **Launch the loop** (per SKILL.md's *Launching the loop* section):
    ```bash
@@ -83,18 +72,20 @@ When both Step 1 and Step 2 have landed:
 
 Run `ls work/reference/` first.
 
-- `paper.pdf` + path indicator (`source/` for Path A, `document.md` for Path B) + `index.json` + paper-side `astra.yaml` present → `/paper-extraction` has done its work (or is mid-run; re-invoking is idempotent and will skip done work).
-- `work/reference/code/` present, **or** `code-status.yaml` records `found: false`, **and** `code-index.md` is present → code-side work is done.
-- When both sides are present and committed → ACQUIRE is complete; commit any unstaged changes and launch the loop.
-- Otherwise, re-invoke whichever side is missing. Both skills are survey-first and skip already-done work.
+- `work/reference/code/` and `code-index.md` both present, **or** `code-status.yaml` records `found: false` → ACQUIRE is done. Commit any unstaged changes and launch the loop.
+- Otherwise, run Step 1.
+
+If the paper substrate (`paper.pdf`, `index.json`, etc.) is missing, INTERVIEW didn't complete cleanly — re-invoke `/paper-extraction` against the partial state (idempotent; skips done work) and confirm the constitution + CLAUDE.md are consistent with what's on disk, before continuing.
 
 ## Notes
 
-- **paper-extraction is the substrate authority.** Don't re-fetch the LaTeX source, don't re-run Docling, don't re-parse the paper from inside ACQUIRE. If a substrate need surfaces — including mid-reproduction, raised by an iteration — fix it in `/paper-extraction`, not here. Bibliography resolution is paper-extraction's: cited-paper text and DOIs live inside `index.json#citations[key]`, not in a side file.
 - **lc-from-code is the code-inventory authority** for the scan portion. ACQUIRE's invocation constrains it to scan-only via the prompt; the parameterization and run portions of `/lc-from-code` are not invoked at this phase.
-- **arXiv DOI form is `10.48550/arXiv.<id>`.** Useful when downstream tools want a DOI rather than an arXiv ID.
-- **Equation numbers and section numbers must match the rendered paper.** When citing "eq. N" or "§N" downstream, find by content, not by a naïve count of TeX blocks or markdown headings. Path A: source preserves printed numbers in `\label{}`s. Path B: Docling preserves printed numbers.
-- **This phase is acquisition, not understanding.** ACQUIRE doesn't write `astra.yaml` at the project root and doesn't compare paper to code. ARCHITECT does that, in the first ralph iteration after the loop launches.
-- **Code-as-canonical** is loaded by every iteration via `CLAUDE.md`'s Rules. ACQUIRE just stands up the reference so the rule has something to point at.
 - **The cloned code is read-only reference.** Iterations may re-read it; nothing modifies `work/reference/code/`. (When the reproduction's implementation needs to happen, that's an IMPLEMENT-phase decision, not an ACQUIRE one.)
-- **Surface anti-patterns from the scan.** If `code-status.yaml` reports the clone failed or the repo is clearly dead, or if `/paper-extraction` reports the paper substrate is broken, surface to the user immediately rather than launching a loop against half-acquired substrate.
+- **Code-as-canonical** is loaded by every iteration via `CLAUDE.md`'s Rules. ACQUIRE just stands up the reference so the rule has something to point at.
+- **This phase is acquisition, not understanding.** ACQUIRE doesn't write `astra.yaml` at the project root and doesn't compare paper to code. ARCHITECT does that, in the first ralph iteration after the loop launches.
+- **No reference code is still a valid ACQUIRE outcome.** When `code-status.yaml` records `found: false`, iterations operate in paper-only mode — methodology lives in the paper's prose; no code-as-canonical adjudication is needed. CLAUDE.md's code-as-canonical Rule self-disables in that case.
+- **Surface anti-patterns from the scan.** If `code-status.yaml` reports the clone failed or the repo is clearly dead, surface to the user immediately rather than launching a loop against half-acquired substrate.
+
+## Future substrate types
+
+ACQUIRE's purpose is "stand up reference substrate that wasn't surfaced in INTERVIEW." Today, that's just the code. If a future paper requires substrate types that aren't paper-or-code (a specific dataset to fetch from an open archive, supplementary materials, calibration files), they fit naturally as Step 1.5 in ACQUIRE — produced before commit + launch, with a status file recording what was acquired. Don't accrete those into INTERVIEW (which is about conversation) or into the ralph loop (which is about iteration over committed substrate).
diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 0ed76a2b..606d7b98 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -2,7 +2,7 @@
 
 ARCHITECT is the structural seam: decide the sub-analysis decomposition, wire the inputs and outputs at the sub-analysis level, and author high-level narrative prose for each analysis — all in one stub `astra.yaml`. SPECIFY then fills the stub with `decisions:`, `prior_insights:`, `findings:`, and `astra-anchor:` references. Splitting **structure** from **content** keeps each iteration's cognitive load manageable: ARCHITECT decides *what the analyses are*; SPECIFY decides *what's inside each one*.
 
-ARCHITECT is what a ralph iteration does when the workdir signals "ACQUIRE substrate present + project-root `astra.yaml` absent (or empty stub)." The heavy work of *understanding* the paper and code happened in `/paper-extraction` and `/lc-from-code`'s scan-only branch; their on-disk substrate (the structural `index.json`, the paper-extraction `astra.yaml`, the `code-index.md`) is what you read on entry. No persistent expert sub-agents; targeted reads against the substrate carry the orientation.
+ARCHITECT is what a ralph iteration does when the workdir signals "paper substrate (from INTERVIEW) + code substrate (from ACQUIRE) both present + project-root `astra.yaml` absent (or empty stub)." The heavy work of *understanding* the paper and code happened in `/paper-extraction` (which INTERVIEW invokes inline) and `/lc-from-code`'s scan-only branch (which ACQUIRE invokes); their on-disk substrate (the structural `index.json`, the paper-extraction `astra.yaml`, the `code-index.md`) is what you read on entry. No persistent expert sub-agents; targeted reads against the substrate carry the orientation.
 
 ## Inputs
 
@@ -112,7 +112,7 @@ Apply fixes inline as you find them. Don't write a separate findings file — th
 
 ## Survey signals (entry into ARCHITECT)
 
-- `work/reference/index.json` + `work/reference/astra.yaml` + `work/reference/code-index.md` (when code present) exist ⇒ ACQUIRE substrate is ready
+- `work/reference/index.json` + `work/reference/astra.yaml` (paper substrate from INTERVIEW) + `work/reference/code-index.md` (code substrate from ACQUIRE, when code present) exist ⇒ paper + code substrate is ready
 - `astra.yaml` at project root absent (or present-but-empty) ⇒ this iteration writes the stub (records *stub: baseline*)
 - `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty), Rigor *Current state* shows *stub: baseline* or *stub: tightened* ⇒ this iteration is review-and-fix
 - Rigor *Current state* shows *stub: canonical* ⇒ ARCHITECT done; next iteration surveys for SPECIFY
diff --git a/claude/lightcone/skills/lc-from-paper/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/interview.md
index 57fd4b79..5612ee82 100644
--- a/claude/lightcone/skills/lc-from-paper/references/interview.md
+++ b/claude/lightcone/skills/lc-from-paper/references/interview.md
@@ -2,71 +2,144 @@
 
 The opening interactive phase. Runs from the user's main session, before the ralph loop launches. Its job is to crystallize what the user actually wants — which paper, what scope, any paper-specific gotchas — and bake that into the per-paper `constitution.md` (the ralph loop's driving document) and `CLAUDE.md` (the auto-loading walk-up with rules and accumulators) the loop's iterations will walk up to.
 
-The interview is short. Three to six `AskUserQuestion` rounds, total. The user does not need to teach you the paper; they need to tell you what they want reproduced.
+The interview runs in **two beats** with `/paper-extraction` between them. Beat 1 is short and cold — it collects just the paper identifier so the substrate can be acquired. Then `/paper-extraction` runs inline. Beat 2 asks everything else — scope, fidelity, conventions, code repo, familiarity, external context — *grounded in the actual paper*, with the figure/table inventory, abstract, conclusions, and data-availability section already on disk.
+
+The two-beat shape exists because most interview questions are inherently hard to answer well in the abstract. Asked cold, "which figures matter?" forces the user to recall the paper from memory; asked after extraction, it's a menu of the paper's actual figures. The same applies to fidelity intent, code repo URL, and paper-specific conventions — the paper knows; the user shouldn't have to reach for it.
 
 ---
 
 ## What the interview produces
 
-Two files at the reproduction workdir root:
+Three things in the reproduction workdir, all committed together at the end:
 
 - **`constitution.md`** — drafted from [`../templates/constitution.md`](../templates/constitution.md). YAML frontmatter `status: active`, then Goal, Fidelity intent, Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL, where the substrate lives), Rigor *Current state* per output (starts empty; iterations append), Open dimensions. The ralph loop's driving document; each iteration reads it on entry. The top half (Goal, Scope, Quality bar, Evidence) sharpens slowly; the bottom half (Rigor *Current state*) is updated each iteration. Task-bound — archivable once the reproduction closes.
 - **`CLAUDE.md`** — drafted from [`../templates/CLAUDE.md`](../templates/CLAUDE.md). Paper identity at the top (DOI, title, one-line subject), Rules (universal across reproductions; leave the template's defaults), Disagreements log (starts empty; iterations append), Open opportunities (starts empty; iterations append), Pointers (to `constitution.md`, `work/reference/`, etc.). The auto-loading walk-up; every Claude Code session in the workdir picks it up. Durable — stays useful for any follow-on work in this directory once the reproduction lands.
+- **`work/reference/` paper substrate** — produced by `/paper-extraction` between beats: `paper.pdf`, `source/` (Path A) or `document.md` (Path B), `index.json`, `astra.yaml`, `figures/`, `tables/`, `bibliography-source.{bib,bbl}`.
+
+There is no separate "constitution skill" invocation — `/ralph`'s Authoring mode (Study → Draft → Refine → Launch) is what you're following here; the constitution authoring discipline + reference materials live there. Pull the discipline mentally; the deliverable is these two markdown files (plus the substrate produced by `/paper-extraction`).
+
+After the user approves both drafts, save them, `git init` the workdir if it isn't one already, commit `constitution.md` + `CLAUDE.md` + the paper substrate as the first commit, then proceed to ACQUIRE in the same session.
+
+---
+
+## Beat 1 — Cold: identify the paper
+
+Ask the user for the paper identifier in **prose** — not `AskUserQuestion`. The answer is inherently free-form (an arXiv ID, a DOI, or a path to a PDF on disk), and a multiple-choice modal is the wrong shape for it.
+
+Wording is up to you, but cover the three forms cleanly. Something like:
+
+> *"What paper would you like to reproduce? An arXiv ID, a DOI, or a path to a PDF on disk all work — arXiv ID gives the cleanest acquisition because the LaTeX source comes through."*
 
-There is no separate "constitution skill" invocation — `/ralph`'s Authoring mode (Study → Draft → Refine → Launch) is what you're following here; the constitution authoring discipline + reference materials live there. Pull the discipline mentally; the deliverable is these two markdown files.
+If the user supplied the identifier on the `/lc-from-paper` invocation, skip the ask. If not, ask once and continue when you have it. Don't batch other questions into this beat — everything else lives in Beat 2, after the paper is on disk.
 
-After the user approves both drafts, save them, `git init` the workdir if it isn't one already, commit both files as the first commit, then proceed to ACQUIRE in the same session.
+**No other `AskUserQuestion` rounds in Beat 1.** Anything beyond the identifier is either inferable from the paper (next step) or belongs in Beat 2.
 
 ---
 
-## The four jobs
+## Between beats — Run `/paper-extraction` inline
 
-### 1. Identify the paper
+With the paper identifier in hand, invoke the paper-extraction skill directly:
 
-If the user did not supply a paper identifier on the `/lc-from-paper` invocation, your first action is `AskUserQuestion` asking for the paper along with the following items rather than trying to search for a paper in their directories.
+```
+/paper-extraction <doi-or-arxiv-id-or-pdf-path>
+```
 
-Use `AskUserQuestion` for whatever the user did not supply on `/lc-from-paper` invocation:
+This produces the paper substrate under `work/reference/` (see [`acquire.md`](acquire.md) — ACQUIRE no longer touches the paper side). When it returns, the substrate is on disk. **Read it before continuing to Beat 2** so the next questions are grounded:
 
-- **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE).
-- **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) When code is available, every iteration that touches a sub-analysis reads from `work/reference/code/` and treats code as canonical for numerics + method. This is recorded in CLAUDE.md's Rules.
-- **User's prior familiarity.** Has the user reproduced this paper before? Read it recently? Worked with the original authors? Useful context for the iterations (and for the user at REVIEW).
-- **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; iterations will read it during ARCHITECT.
+- **`work/reference/index.json`** — title, abstract, figure/table inventory with captions, section outline, citations with resolved DOIs. The structural surface.
+- **The abstract and the conclusions section of the paper** — give you the claimed headline results, with actual numbers.
+- **The "Data availability" / "Code availability" sections of the paper** — usually the canonical place for repo URLs and dataset locations. If neither section exists, grep across `work/reference/source/*.tex` (Path A) or `work/reference/document.md` (Path B) for `github.com`, `gitlab`, `zenodo`, `softwarex`, `\url{}` patterns.
+- **The acknowledgements section** — sometimes carries software repos, dataset attributions, cluster acknowledgements that hint at the execution environment.
 
-### 2. Scope the reproduction
+You do *not* need to read the paper end-to-end. The goal is to ground Beat 2's questions — abstract for claims, conclusions for what the paper says it found, data/code availability for substrate hints. Iterations will read the rest as they need it.
 
-A paper has many figures, tables, numbers. The user usually does not want all of them.
+If `/paper-extraction` fails or returns partial substrate (network issue, ambiguous arXiv ID, etc.), surface the failure to the user before continuing — Beat 2 can't ground itself against missing substrate.
 
-Ask:
+---
+
+## Beat 2 — Grounded: everything else
+
+Now `AskUserQuestion` is the right tool — each remaining question is a constrained choice with structured options, and the user has paper context loaded from your summary or from the substrate they can browse. Ask in whatever order reads naturally; batching related questions in a single `AskUserQuestion` call (up to 4) is fine.
+
+### Scope
+
+Present the paper's actual primary outputs as a menu:
 
-- **Full reproduction or targeted?** Full = every primary result the paper reports. Targeted = "I only care about figures 3, 4, 7 and the headline number in Table 2." Targeted is cheaper and produces a tighter `astra.yaml`.
-- **Specific decisions of interest.** A paper makes many choices. The user may care most about a few — e.g. "I want the BAO fit to use a different damping prior than the paper." These become first-class decisions in the spec, with the alternative preserved as a sibling option.
-- **Sub-analysis structure.** Does the paper have genuinely independent stages (e.g. reconstruction → clustering → BAO fit)? If so, the spec wants sub-analyses; ARCHITECT will mirror that structure as the stub's decomposition. If the paper is monolithic, one analysis suffices.
+> *"The paper claims [N] figures + [M] tables + [headline numerical results]. What's in scope for this reproduction?"*
+>
+> - Full — every primary result the paper reports
+> - Targeted — specific figures / tables / numbers (you'll list which)
+> - Use the paper's natural primary-result set (default)
+
+When the user picks "targeted," follow up with the list of the paper's figures/tables (from `index.json`) so they can pick the subset directly rather than recalling from memory.
+
+If the paper has sub-analyses with genuinely independent stages (e.g. reconstruction → clustering → BAO fit), ask about decomposition; if the paper is monolithic, one analysis suffices.
 
 These answers go into `constitution.md`'s **Scope** section (in / out) and inform ARCHITECT's structural decomposition.
 
-### 3. Fidelity intent
+### Fidelity intent
+
+A reproduction can land anywhere from a quick "does this even run" sanity check to a full match across every primary and secondary target. The user owns where they want this one to land — but where it *can* land in this stretch depends on the compute, tokens, time, and attention available. The honest meta-conversation is the point: what does the user want out of this first stretch, given what's spendable on it?
+
+Don't ask the abstract "what would you like to get out of this" — too literal, lands as a wish list. Pivot on what's actually being weighed. With the paper's actual headline numbers in hand from the abstract/conclusions, name them in the prompt so the answer can lock onto something concrete:
+
+> *"The paper's headline is `S_8 = 0.795 ± 0.014`. What's the right shape for this stretch — a quick check that the analysis is tractable, getting that one number right within stated uncertainty, or a full match across every primary target? How much compute and wall-clock do you have to spend on it?"*
+
+Offer the prose options as `AskUserQuestion` options the user can pick from or replace via "Other":
+
+- *"Just checking the analysis is tractable — quick sanity that some headline number comes out close. An afternoon."*
+- *"The headline matches within stated uncertainty; secondary results can stay rough. Overnight."*
+- *"One specific figure / result fully matches; rest stay rough. A day or two."* (follow up: which one?)
+- *"Every primary and secondary target lining up within stated tolerance; every paper-vs-code conflict adjudicated. No hard deadline."*
+
+Record the answer verbatim or in close paraphrase under **Fidelity intent** in `constitution.md`'s Goal section. Time/compute bounds are part of the intent — the user's spendable budget shapes what "good enough" can mean for this stretch. Each iteration reads the intent when sizing its next move; COMPARE grades opportunities against it.
+
+If the user genuinely doesn't know yet, write that — *"Not sure yet; let's get something running and revisit"* is itself useful intent, and they can sharpen it at any future REVIEW.
+
+### Code repository
+
+Use what `/paper-extraction` surfaced. If there's a single candidate URL from the data/code availability or acknowledgements section, lead with that confirmation:
+
+> *"The paper's Data availability section points at `https://github.com/...`. Should we clone that as the reference code? Or is there a different/private repo?"*
+
+If paper-extraction found nothing, ask plainly:
+
+> *"I didn't find a code repo URL in the paper. Is there a private / unpublished repo we should clone? Or proceed paper-only?"*
+
+When the user provides a URL, capture it into `constitution.md`'s **Evidence** section. When the paper has no code repo and the user doesn't supply one, capture *"no public code; paper prose is the only methodological anchor"* into the Evidence section so iterations don't waste effort searching.
+
+When the code is available, every iteration that touches a sub-analysis reads from `work/reference/code/` and treats code as canonical for numerics + method. This is recorded in `CLAUDE.md`'s Rules.
+
+### Paper-specific conventions or warnings
+
+By now you have read enough of the paper to *propose* one-line conventions / warnings rather than asking the user to volunteer them cold. Surface candidates from your post-extraction read:
+
+> *"From the paper I noticed: (a) Paper II of a 5-paper series; siblings in prep with no DOI. (b) Uses non-standard convention for X. (c) Four-way catalog comparison drives every figure. Want any of those as iteration-level pointers in `CLAUDE.md`?"*
+
+Let the user toggle the ones to keep, edit them, add more, or skip cleanly if none apply. The selected items land in `CLAUDE.md`'s **Pointers** section as one-line notes — context every iteration sees on entry.
+
+### Prior familiarity
 
-A reproduction can land anywhere from a quick "does this even run" sanity check to a full match across every primary and secondary target. The user owns where they want this one to land — but where it *can* land in this stretch depends on the compute, tokens, time, and attention available. The honest meta-conversation is the point: what does the user want out of this first stretch, given what's spendable on it? Don't ask the abstract "what would you like to get out of this" — too literal, lands as a wish list. Pivot on what's actually being weighed.
+A single question, post-extraction:
 
-The job is to **elicit prose intent** — the user's own words for what "good enough" looks like for this stretch — and capture it into `constitution.md`'s Goal section. Reach for whichever pivot fits; you usually only need one or two:
+> *"How familiar are you with this paper?"*
+>
+> - Haven't read it / barely skimmed
+> - Skimmed it / general sense of the claims
+> - Read carefully / know the methodology
+> - Coauthor / worked closely with the authors
 
-- *"What's the right shape for this stretch — a quick check that the analysis is tractable, getting one specific figure right, or a full match across primary targets? How much compute and wall-clock do you have to spend on it?"*
-- *"Is there a specific result you care about more than the rest, where you'd want full fidelity even if the others stay rough?"*
-- *"Are you trying to verify the paper, build on it, or critique it? That shifts where the fidelity bar wants to sit."*
-- *"If this took a few days of iteration to reach high fidelity everywhere, is that the right investment? Or would a working version overnight be enough to decide where to push further?"*
+This affects how confidently iterations should defer to the user when adjudicating paper-vs-code disagreements, and how heavily first-iteration review should lean. It does **not** gate paper acquisition, which is why it's in Beat 2, not Beat 1.
 
-Record the answer verbatim or in close paraphrase under **Fidelity intent** in `constitution.md`'s Goal section. Concrete examples of what good prose intent looks like:
+### External context
 
-- *"Just checking if the analysis is tractable — quick sanity that some headline number comes out close. An afternoon."*
-- *"I care about Figure 3 being right. The rest can stay rough. Overnight if needed."*
-- *"Full fidelity on the BAO fit specifically; the rest can stay rough. A day or two."*
-- *"Every primary and secondary target lining up within stated tolerance, every paper-vs-code conflict adjudicated. No hard deadline."*
+The real probe is: *"is there context outside the paper substrate + codebase that should inform the spec?"* — co-author feedback, sibling-paper drafts (common for papers in a series), internal blinding documentation, decision-history docs, referee responses, a relevant talk or slide deck. The artifact form varies; what matters is whether such context exists and whether you should point ARCHITECT at it.
 
-Time/compute bounds are part of the intent — the user's spendable budget shapes what "good enough" can mean for this stretch. Each iteration reads the intent when sizing its next move; COMPARE grades opportunities against it. If the user genuinely doesn't know yet, write that — *"Not sure yet; let's get something running and revisit"* is itself useful intent, and they can sharpen it at any future REVIEW.
+Ask in those terms:
 
-### 4. Paper-specific conventions or warnings
+> *"Beyond the paper and any code repo, is there context an iteration should know about — co-author / referee feedback, internal notes, a sibling paper still in prep, decisions documented elsewhere? If yes, point at the path(s). Otherwise the paper substrate + code are the source of truth."*
 
-Light touch. Ask the user if there's anything they want every iteration to know about this paper up front — a known pitfall, a non-obvious convention, a thing the authors did unusually. These go into `CLAUDE.md`'s **Pointers** section as one-line notes. Skip cleanly if nothing comes to mind; iterations surface their own as they work.
+Capture paths into `CLAUDE.md`'s **Pointers** section. Don't proactively read them in INTERVIEW — that's ARCHITECT's job when it scopes the sub-analyses.
 
 ---
 
@@ -77,16 +150,18 @@ Open both templates side-by-side:
 - [`../templates/constitution.md`](../templates/constitution.md) — fill in the header, Goal (with fidelity intent), Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL — these are the user-supplied identifiers; the substrate-path bullets in the template stay as boilerplate, naming where each substrate lives on disk), Open dimensions. Leave the YAML frontmatter `status: active` intact.
 - [`../templates/CLAUDE.md`](../templates/CLAUDE.md) — fill in the header (paper title + arXiv ID + DOI + one-line subject), any paper-specific Pointers. Leave Rules in the template state (universal across reproductions). Leave the Disagreements log and Open opportunities sections empty — iterations populate them.
 
-Show both drafts to the user, take corrections, refine, save. Then `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline) and commit both as the first commit.
+Show both drafts to the user, take corrections, refine, save. Then `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline) and commit `constitution.md` + `CLAUDE.md` + the paper substrate (everything under `work/reference/` that paper-extraction produced) as the first commit. A single commit captures the full INTERVIEW deliverable.
 
-After the user approves and the workdir is initialized, run ACQUIRE in your same main session (see [`acquire.md`](acquire.md)). When ACQUIRE completes, commit the substrate and launch the ralph loop (per SKILL.md's *Launching the loop* section). Tell the user the tmux session name and the attach command, and that you'll be ready for REVIEW close-out when the loop terminates.
+After the user approves and the workdir is initialized, run ACQUIRE in your same main session (see [`acquire.md`](acquire.md)). ACQUIRE is now thin — just the code substrate side. When ACQUIRE completes, commit the code substrate and launch the ralph loop (per SKILL.md's *Launching the loop* section). Tell the user the tmux session name and the attach command, and that you'll be ready for REVIEW close-out when the loop terminates.
 
 ---
 
 ## Discipline
 
-- **The interview is short.** Three to six `AskUserQuestion` rounds, total. If the user is grinding through detail, gently steer back to scope.
-- **Two files, both drafted at INTERVIEW.** No deferring — both `constitution.md` and `CLAUDE.md` are committed before ACQUIRE runs and before the loop launches.
+- **No `AskUserQuestion` before paper-extraction has run.** Beat 1 collects the identifier in prose; everything else waits until Beat 2, after the paper is on disk and you can ground the questions in actual content.
+- **The paper-identifier question is prose.** It's the one question that doesn't fit `AskUserQuestion`'s multiple-choice shape; the free-form answer (arXiv ID / DOI / PDF path) belongs in a prose ask.
+- **Three to six `AskUserQuestion` rounds total in Beat 2** — scope, fidelity, code repo, conventions, familiarity, external context. Some can batch into a single multi-question call when they're independent.
+- **Three deliverables, one commit.** `constitution.md` + `CLAUDE.md` + `work/reference/` paper substrate are committed together at the end of INTERVIEW. No intermediate commits for "paper-extraction landed but the user hasn't approved yet."
 - **Defaults are the path.** When the user says "you choose," take the defaults — full reproduction, the paper's natural sub-analysis structure if any. The defaults reflect what the architecture has learned about which seams matter.
 - **One paper at a time.** A single `constitution.md` + `CLAUDE.md` pair covers one paper. If the user wants two, run the interview twice — two reproduction directories, two pairs.
 

From 4524774a87674945cbcbaff4cbd9e33ef13feef9 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 21:35:29 +0200
Subject: [PATCH 105/124] =?UTF-8?q?lc-from-paper:=20drop=20the=20Rigor=20v?=
 =?UTF-8?q?ocabulary=20=E2=80=94=20review=20is=20"read=20critically=20as?=
 =?UTF-8?q?=20part=20of=20survey"?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The sketch/baseline/tightened/canonical vocabulary was decorative. The
actual protocol collapses to "every iteration reads the most recent
artifact critically; fixes what's broken or advances if nothing is."
No state machine, no review-state tracking, no commit-message regex.

Dropped:
- Rigor — current state section from the constitution template
- The four-word vocabulary across SKILL.md and all per-phase references
- The "review-and-fix iteration" framing as a distinct kind of iteration
- All "Rigor: canonical" survey signals
- The 5-iteration cap encoded around vocabulary states

Kept:
- The single discipline line in SKILL.md: "Read the most recent
  artifact critically as part of survey. If you see real issues, fix
  them and commit before adding more. Termination of any phase is
  implicit: a fresh-context iteration finds nothing to critique and
  moves forward."
- The "iteration that landed fixes can't also judge it done"
  property — now automatic, not enforced by vocabulary.
- Open opportunities in CLAUDE.md as the durable "wanted more than
  the loop delivered" surface.

The per-phase reference files lost their elaborate review-and-fix
sections; survey signals now use natural file-existence / content
checks (stub absent? decisions empty? recipes present?). The cascade
nets out at -27 lines across 10 files — meaningful simplification.
---
 .../lightcone/skills/lc-from-paper/SKILL.md   | 14 ++++----
 .../lc-from-paper/references/architect.md     | 32 ++++++-------------
 .../lc-from-paper/references/implement.md     | 32 ++++++++-----------
 .../lc-from-paper/references/interview.md     |  2 +-
 .../lc-from-paper/references/literature.md    | 13 ++++----
 .../skills/lc-from-paper/references/review.md |  4 +--
 .../lc-from-paper/references/specify.md       | 20 ++++++------
 .../skills/lc-from-paper/templates/CLAUDE.md  |  2 +-
 .../lc-from-paper/templates/constitution.md   | 10 ++----
 docs/skills/lc-from-paper.md                  | 28 ++++++++--------
 10 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index eb45feb6..28b38622 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -22,7 +22,7 @@ The architecture is two-piece:
 
 2. **A ralph loop for the long middle.** Once the per-paper `constitution.md` is drafted (INTERVIEW) and the substrate is on disk (ACQUIRE), you launch a ralph loop against the constitution. Each iteration starts a fresh session with the constitution loaded into its system prompt, surveys the workdir, picks the next valuable move (typically one phase's worth of work), does it, commits, and exits. Iteration N+1 reads N's work cold, so per-phase review collapses into "the next iteration is the review."
 
-The whole thing is driven by **the per-paper `constitution.md`** at the reproduction workdir root, plus the auto-loading `CLAUDE.md` walk-up. The split is intentional: the constitution is *task-bound* (what this reproduction is trying to achieve and how it's progressing — Goal, fidelity intent, scope, quality bar, plus the running rigor accumulators) and can be archived once the reproduction lands. CLAUDE.md is *durable* (rules, paper-vs-code disagreements, pointers to substrate) — it stays useful when the user comes back to do follow-on work in this directory. Every iteration picks up both on launch.
+The whole thing is driven by **the per-paper `constitution.md`** at the reproduction workdir root, plus the auto-loading `CLAUDE.md` walk-up. The split is intentional: the constitution is *task-bound* (what this reproduction is trying to achieve — Goal, fidelity intent, scope, quality bar, Open dimensions) and can be archived once the reproduction lands. CLAUDE.md is *durable* (rules, paper-vs-code disagreements, Open opportunities, pointers to substrate) — it stays useful when the user comes back to do follow-on work in this directory. Every iteration picks up both on launch.
 
 ## Setup: git-tracked workdir
 
@@ -58,7 +58,7 @@ The interview must collect: (1) the paper identifier in Beat 1, then via `AskUse
 
 These get drafted into **two files** plus the paper substrate, all in the reproduction workdir:
 
-- **`constitution.md`** — the ralph loop's driving document. Goal, Fidelity intent, Scope, Quality bar, Evidence (paper DOI, arXiv ID, code repo URL), Rigor *Current state* per output (starts empty), Open dimensions. Starts with YAML frontmatter `status: active` so the ralph launcher accepts it. Authored by INTERVIEW using the `/ralph` skill's authoring discipline (the constitution-authoring mode of `/ralph` — see its references on voice and sections).
+- **`constitution.md`** — the ralph loop's driving document. Goal, Fidelity intent, Scope, Quality bar, Evidence (paper DOI, arXiv ID, code repo URL), Open dimensions. Starts with YAML frontmatter `status: active` so the ralph launcher accepts it. Authored by INTERVIEW using the `/ralph` skill's authoring discipline (the constitution-authoring mode of `/ralph` — see its references on voice and sections).
 - **`CLAUDE.md`** — the auto-loading walk-up. Paper identity at the top, Rules (universal across reproductions; leave the template's defaults), Disagreements log (starts empty), Open opportunities (starts empty), Pointers (to `constitution.md`, `work/reference/`, etc.).
 - **`work/reference/`** — paper substrate from `/paper-extraction`: `paper.pdf`, `source/` or `document.md`, `index.json`, `astra.yaml`, `figures/`, `tables/`, `bibliography-source.{bib,bbl}`.
 
@@ -98,13 +98,13 @@ Tell the user explicitly: "Launching the ralph loop in tmux session `<name>`. At
 
 Iterations follow the `/ralph` skill's Loop protocol — Survey → Work → Update → Exit. The per-paper specifics layered on top:
 
-- **Survey starts with the constitution + CLAUDE.md, then the workdir.** Read the constitution end-to-end — Goal, Fidelity intent, Scope, Quality bar, and the Rigor *Current state* table for where each output currently sits relative to the quality bar. Skim CLAUDE.md for rules, paper-vs-code disagreements, and pointers. Then survey the workdir against the **Workdir-as-state** table below to identify the next phase that needs work.
+- **Survey starts with the constitution + CLAUDE.md, then the workdir.** Read the constitution for Goal, Fidelity intent, Scope, and Quality bar. Skim CLAUDE.md for rules, paper-vs-code disagreements, Open opportunities, and pointers. Then survey the workdir against the **Workdir-as-state** table below to identify the next phase that needs work — and read the most recent artifact critically before extending it.
 - **One phase per iteration is the typical shape.** Don't try to do ARCHITECT *and* SPECIFY in one iteration; the fresh-context property of the next iteration is what makes review work, and conflating phases collapses the seam. (Exceptions: small targeted fixes after COMPARE may touch multiple phases in one iteration if they're tightly coupled.)
 - **Phase reference is your working spec for the iteration.** Whichever phase is next, read its `references/<phase>.md` on entry. That file carries the discipline for that phase's work (what to produce, code-as-canonical, rigor adjustment, etc.).
-- **Self-review is the next iteration.** Where ARCHITECT/SPECIFY/LITERATURE/IMPLEMENT used to spawn fresh-context reviewer sub-agents per round (broken — sub-agents can't spawn sub-agents), the discipline now lives on iteration boundaries. Each iteration that touches an artifact reads it cold, reviews silently, applies any fixes inline, updates the constitution's Rigor *Current state* (*baseline* → *tightened* if changes landed, → *canonical* if nothing needed fixing), commits, and exits. **The iteration that makes a change cannot declare the artifact done** — a subsequent fresh-context iteration has to read it and find nothing to fix for the artifact to reach *canonical*. No intermediate review files; the diff against the prior commit is the record. The cycle terminates when the artifact hits *canonical*; if 5 review iterations pass without reaching *canonical*, log the unfinished tail to `open-questions.md` and let the survey advance the phase.
+- **Read the most recent artifact critically as part of survey.** Every iteration enters fresh and reads the last phase's work cold. If you see real issues, fix them and commit before adding more — that's the review. If nothing needs fixing, advance to the next valuable move. Termination of any phase is implicit: a fresh-context iteration finds nothing to critique in the prior work and moves forward. The iteration that just landed fixes can't also be the iteration that judges the work clean — by construction, it found something to fix.
 - **Parallel fan-out lives inside an iteration.** LITERATURE Haiku quote-finders, SPECIFY per-sub-analysis work, IMPLEMENT per-output work — these fan out as one-level-deep `Agent(...)` spawns inside the iteration's main session. Sub-agents can't spawn sub-agents, but an iteration *is* the main session, so it can spawn freely.
 - **`AskUserQuestion` is not available inside an iteration.** Each iteration runs in a detached tmux session; the user isn't reachable interactively. Iterations append questions to `open-questions.md` with their best-judgment default applied, and the user resolves them at REVIEW close-out (back in their main session).
-- **Update the accumulators** before exit: in `constitution.md`, the Rigor *Current state* per output that the iteration changed; in `CLAUDE.md`, the Paper-vs-code disagreements log for any material conflict the iteration surfaced and Open opportunities for any COMPARE-surfaced gap the iteration didn't act on.
+- **Update the accumulators** before exit: in `CLAUDE.md`, the Paper-vs-code disagreements log for any material conflict the iteration surfaced and Open opportunities for any COMPARE-surfaced gap the iteration didn't act on; in `constitution.md`, Open dimensions for anything material that warrants user ratification at REVIEW.
 - **Sharpen the constitution body itself** if something fundamental shifted — the user's fidelity intent reframed, a sub-analysis decomposition rethought, a quality-bar item that's now more concrete. Don't accrete amendment sections; rewrite the affected prose.
 
 ## Workdir-as-state
@@ -135,13 +135,13 @@ REVIEW runs in your main session because `/figure-comparison` and `/check-senten
 
 **Workdir is the state.** No state machine, no resume mechanic — file existence + `git log` + `astra validate` answer "what phase am I on" deterministically. Each iteration's first move is to survey the workdir on entry against the table above.
 
-**Constitution is task-bound; CLAUDE.md is durable.** The constitution describes what *this reproduction* is trying to achieve and how it's progressing — Goal, Fidelity intent, Scope, Quality bar, Evidence, Rigor *Current state*, Open dimensions. Once the reproduction lands, the constitution can be archived. CLAUDE.md carries what stays useful past the reproduction — paper identity, rules, paper-vs-code disagreements, open opportunities for future tightening, pointers to substrate — so a user returning to this directory for follow-on work inherits it. When deciding where to put something new, ask: does it stay useful once the task is done?
+**Constitution is task-bound; CLAUDE.md is durable.** The constitution describes what *this reproduction* is trying to achieve — Goal, Fidelity intent, Scope, Quality bar, Evidence, Open dimensions. Once the reproduction lands, the constitution can be archived. CLAUDE.md carries what stays useful past the reproduction — paper identity, rules, paper-vs-code disagreements, open opportunities for future tightening, pointers to substrate — so a user returning to this directory for follow-on work inherits it. When deciding where to put something new, ask: does it stay useful once the task is done?
 
 **Code-as-canonical, with disagreements recorded.** When the original codebase is at `work/reference/code/`, every iteration that touches a sub-analysis reads relevant code on entry. Where paper and code disagree on something material (a different choice would plausibly change a numeric result the paper reports), **code is canonical** for numerics, plotting, and method — but the disagreement is recorded: as a decision option in `astra.yaml` with both alternatives preserved, and as an entry in CLAUDE.md's *Paper-vs-code disagreements* section so it's visible to every iteration and to the user at REVIEW. Stylistic / cosmetic / pure-tooling differences aren't material — note them in `implementation-notes.md` and move on. Without this discipline, iterations drift to "looks right" rather than "matches" and material disagreements get silently absorbed.
 
 **Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at INTERVIEW as prose. Intent is partly aesthetic ("how good does this need to be?") and partly pragmatic ("what's feasible given the compute, tokens, and wall-clock available?"). Both dimensions belong in the prose — *"just checking the analysis is tractable — an afternoon"*, *"Figure 3 must be right; the rest can stay rough — overnight"*, *"every primary and secondary target lining up within stated tolerance, a few days"*.
 
-The Rigor *Current state* vocabulary (*sketch / baseline / tightened / canonical*) tracks the trajectory per output. Each fresh-context iteration that reads the artifact and applies fixes moves it from *baseline* to *tightened*; the first subsequent fresh-context iteration that reads it and finds nothing to fix moves it to *canonical*. An artifact reaching *canonical* terminates its review cycle. If the intent wants outputs pushed even further than *canonical*, that's a future loop, logged as an Open opportunity in CLAUDE.md. (Work fan-out for the artifact-producing phases is separate; see "Parallel fan-out lives inside an iteration" above.)
+There's no explicit review state machine. Each iteration reads the prior phase's artifact critically as part of survey, fixes what needs fixing or advances if nothing does, commits, and exits. The fresh-context property at iteration boundaries makes the next iteration the review. Gaps that the intent wants pushed further than the loop has time to deliver become Open opportunities in CLAUDE.md; a future loop relaunch closes them. (Work fan-out for the artifact-producing phases is separate; see "Parallel fan-out lives inside an iteration" above.)
 
 **arXiv-LaTeX-first acquisition.** When the paper is on arXiv, the source tarball is the substrate; equations, ligatures, captions, tables come through clean. PDF + Docling is a fallback for non-arXiv only.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 606d7b98..2c981c4a 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -18,7 +18,7 @@ ARCHITECT is what a ralph iteration does when the workdir signals "paper substra
 ## Outputs
 
 - `astra.yaml` at the project root — **stub form**: sub-analyses named, architecture wired (inputs / outputs declared at the sub-analysis level), high-level `narrative:` prose blocks per analysis. **No `decisions:`, `prior_insights:`, `findings:`, or `astra-anchor:` references yet** — those entries don't exist for the narrative to reference.
-- `constitution.md` updates: Rigor *Current state* appended with the stub's state (e.g. *stub: baseline* after a single-iteration write, *stub: tightened* if this iteration was a review pass that incorporated fixes).
+- `constitution.md` updates: Open dimensions, when something material surfaces that warrants user ratification at REVIEW.
 
 ## Step 1 — Read the substrate, then write the stub
 
@@ -80,42 +80,28 @@ analyses:
 - **Validate before exit.** `astra validate astra.yaml` must return clean.
 - **Targeted reads, not whole-paper absorption.** The indices give you most of what you need; reach into the source / document / code for specific items, not as a default.
 
-After the stub is written and validates, commit it (`architect: stub astra.yaml`) and update the constitution's Rigor *Current state* with the stub's state (e.g. *stub: baseline*).
+After the stub is written and validates, commit it (`architect: stub astra.yaml`) and exit.
 
-## Review-and-fix — the iterations after the write
+## Reviewing prior ARCHITECT work as part of survey
 
-There is no in-iteration review-round mechanism. The ralph loop's iteration boundary *is* the fresh-context review: iteration N writes the stub at *baseline*; the next iteration reads it cold, reviews silently, applies any fixes inline, updates the constitution's Rigor *Current state* (*baseline → tightened* if fixes landed, *baseline → canonical* if nothing needed fixing), commits, and exits. **The iteration that makes changes cannot declare the stub done** — a subsequent fresh-context iteration has to read it and find nothing to fix for it to reach *canonical*.
+There is no separate review phase. Every iteration that enters and finds an ARCHITECT stub on disk reads it critically before doing anything else. If you see real issues — wrong sub-analysis decomposition, reserved-name collision, missing in-scope output, narrative gap — fix them inline, commit (`architect: fix <what>`), and exit. Only when a fresh-context read finds nothing to fix does the iteration move on to SPECIFY work. The fresh-context property at iteration boundaries makes the next iteration the review; nothing else is needed.
 
-The cycle terminates when the stub hits *canonical*. Typical shapes:
-- Write iteration writes a clean stub → next iteration finds nothing → stub: *canonical* in two iterations.
-- Write iteration leaves small issues → next iteration fixes them (→ *tightened*) → iteration after that finds nothing (→ *canonical*) in three.
-- Bigger gaps may take more, but cap at 5 review iterations: if *canonical* isn't reached by then, log the tail in `open-questions.md` ("ARCHITECT review cap reached; user to review at close-out") and let the survey advance to SPECIFY.
-
-### When entering as a review-and-fix iteration
-
-The signal is "stub `astra.yaml` exists, Rigor *Current state* says *stub: baseline* or *stub: tightened*." Read the stub cold, then check:
+What to look at:
 
 1. **Sub-analysis decomposition.** Right cuts? Consistent with `code-index.md`? Defensible against the paper where the paper compresses?
 2. **Sub-analysis IDs.** Noun phrases. No reserved-name collisions (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`).
 3. **Inputs at sub-analysis level.** Each input has a stable id; the data dependency is real (cross-check against `code-index.md`'s External-data-dependencies and the paper's data section).
 4. **Outputs at sub-analysis level.** Each output corresponds to a result locus from `index.json` OR an intermediate artifact a downstream sub-analysis consumes. Targeted scope from `constitution.md`'s Scope is honored — no out-of-scope outputs sneaking in, no in-scope targets missed.
-5. **Narrative coverage.** Root narrative includes a data-flow paragraph (when sub-analyses exist). Each sub-analysis's narrative accurately describes its role. No `astra-anchor:` references at this stage; flag any that snuck in.
+5. **Narrative coverage.** Root narrative includes a data-flow paragraph (when sub-analyses exist). Each sub-analysis's narrative accurately describes its role. No `astra-anchor:` references at this stage.
 6. **Validates.** `astra validate astra.yaml` returns clean.
 
-Apply fixes inline as you find them. Don't write a separate findings file — the diff against the prior commit is the record of what changed. If fixes landed, commit (`architect: review-and-fix stub`) and update the constitution to *stub: tightened*. If nothing needed fixing, commit (`architect: review confirmed clean`, possibly empty) and update the constitution to *stub: canonical*. Exit.
-
-### What NOT to do during review-and-fix
-
-- Don't flag empty `decisions:` / `prior_insights:` / `findings:`. That's SPECIFY's territory.
-- Don't re-read the entire paper or code. Use the indices and targeted reads.
-- Don't declare the stub *canonical* in the same iteration where you applied fixes — the next fresh-context iteration earns that judgment.
+Don't flag empty `decisions:` / `prior_insights:` / `findings:` — that's SPECIFY's territory. Don't re-read the entire paper or code; use the indices and targeted reads. If you see the same artifact getting churned across many recent commits without convergence, log the situation to `open-questions.md` and advance the phase anyway.
 
 ## Survey signals (entry into ARCHITECT)
 
 - `work/reference/index.json` + `work/reference/astra.yaml` (paper substrate from INTERVIEW) + `work/reference/code-index.md` (code substrate from ACQUIRE, when code present) exist ⇒ paper + code substrate is ready
-- `astra.yaml` at project root absent (or present-but-empty) ⇒ this iteration writes the stub (records *stub: baseline*)
-- `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty), Rigor *Current state* shows *stub: baseline* or *stub: tightened* ⇒ this iteration is review-and-fix
-- Rigor *Current state* shows *stub: canonical* ⇒ ARCHITECT done; next iteration surveys for SPECIFY
+- `astra.yaml` at project root absent (or present-but-empty) ⇒ this iteration writes the stub
+- `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty) ⇒ ARCHITECT's output is on disk; read it critically. Fix anything wrong; otherwise the iteration moves on to SPECIFY.
 
 ## Notes
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/implement.md b/claude/lightcone/skills/lc-from-paper/references/implement.md
index b242c0c6..7822e346 100644
--- a/claude/lightcone/skills/lc-from-paper/references/implement.md
+++ b/claude/lightcone/skills/lc-from-paper/references/implement.md
@@ -1,6 +1,6 @@
-# IMPLEMENT — write scripts and recipes; review by iteration boundary
+# IMPLEMENT — write scripts and recipes
 
-Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands at *baseline*, each subsequent fresh-context iteration reads it cold, reviews against paper + code, applies fixes inline if any, updates Rigor *Current state* (*baseline → tightened* or → *canonical*), and exits. The cycle terminates when the implementation reaches *canonical* (a fresh-context iteration read it and found nothing to fix). Same shape ARCHITECT and SPECIFY use.
+Read `astra.yaml` (the filled spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end. After the first-pass implementation lands, the next fresh-context iteration reads it critically against paper + code; if it sees issues it fixes them and exits, otherwise it advances to RUN. Same shape ARCHITECT and SPECIFY use.
 
 IMPLEMENT is what a ralph iteration does when the workdir signals "SPECIFY done + scripts/ absent (first pass) or comparison-report.yaml shows partial/fail (retry pass)". Most implementation is mechanical (translate spec → script). Where parallelization is feasible (multiple independent outputs from different scripts), the iteration fans out to one-level-deep sub-agents per output (inside its own main session) and merges.
 
@@ -11,7 +11,7 @@ IMPLEMENT is what a ralph iteration does when the workdir signals "SPECIFY done
 - `work/reference/index.json` — paper-side structural index (figures, tables, outline, citations); useful when the spec compresses or you need to find where in the paper a behavior is described.
 - `work/reference/code-index.md` (when code present) — code inventory: module map, candidate decisions with file:line, entry-points, data dependencies, gotchas (the canonical map of where each sub-analysis's logic lives in `work/reference/code/`).
 - `work/reference/code/` (if present) — **canonical reference. Read it when implementing each output.** Where paper and code disagree, code wins for numerics, plotting, and method.
-- `constitution.md` — Fidelity intent, Rigor *Current state* per output.
+- `constitution.md` — Fidelity intent.
 - `CLAUDE.md` — **Paper-vs-code disagreements** for prior conflicts already logged.
 
 ## Outputs
@@ -19,7 +19,6 @@ IMPLEMENT is what a ralph iteration does when the workdir signals "SPECIFY done
 - `scripts/<output>.py` (or `.sh`, or whatever fits) — one script per output (or shared scripts for tightly-coupled outputs)
 - `requirements.txt` — Python dependencies
 - Recipes in `astra.yaml` — each output gets a `recipe:` block with `command:` and `inputs:`
-- `constitution.md` updates — Rigor *Current state* per output (*baseline* after the write iteration, *tightened* after a review-and-fix iteration that landed changes, *canonical* after a fresh-context iteration found nothing to fix)
 - `CLAUDE.md` updates — append to **Paper-vs-code disagreements** for any new conflict surfaced during implementation
 
 ## Step 1: write recipes + scripts
@@ -60,13 +59,13 @@ The iteration merges scripts and recipes after the per-output sub-agents finish.
 5. **Do not execute scripts** — the RUN phase handles execution via `lc run`.
 6. **Validate** with `astra validate astra.yaml` after adding recipes.
 
-## Step 2: review-and-fix — iterations after the write
+## Step 2: reviewing prior IMPLEMENT work as part of survey
 
-After the first-pass implementation lands at *baseline*, the next fresh-context iteration reads it cold, reviews silently against paper + code, applies any fixes inline if needed, updates Rigor *Current state* (*baseline → tightened* if fixes landed, → *canonical* if nothing needed fixing), and exits. **The iteration that applied fixes cannot declare the implementation done** — a subsequent fresh-context iteration earns *canonical* by reading the implementation and finding nothing to fix. Cap at 5 review iterations: if *canonical* isn't reached by then, log the tail to `open-questions.md` and let the survey advance to RUN.
+There is no separate review phase. Every iteration that enters and finds `scripts/` + recipes on disk reads them critically against paper + code before doing anything else. If you see real issues — wrong constant, missing recipe, paper-vs-code drift, synthetic-data shortcut — fix them inline, commit (`implement: fix <what>`), and exit. When a fresh-context read finds nothing to fix, the iteration advances to RUN.
 
 The cross-check question on entry: is the implementation consistent with the paper and the code?
 
-### What to check
+### What to look at
 
 1. **Recipe coverage.** Every output in `astra.yaml` has a recipe; every recipe runs a script that exists in `scripts/`.
 2. **Method fidelity.** For each output, the script implements the method described by the relevant sub-analysis's `decisions:` and `findings:` in `astra.yaml`. Where SPECIFY's code pass surfaced a material disagreement, the script follows the code's method (canonical-resolution rule), unless the spec recorded a different override in `decisions:` and `universes/baseline.yaml`.
@@ -75,14 +74,11 @@ The cross-check question on entry: is the implementation consistent with the pap
 5. **Determinism.** Scripts set random seeds where the paper's method is stochastic. Library versions in `requirements.txt` are pinned where reproducibility requires it.
 6. **Recipe wiring.** Recipe `inputs:` references match the data-flow the scripts actually consume; no orphan dependencies, no missing dependencies.
 
-Apply fixes inline as you find them — `scripts/`, `astra.yaml` recipes, `requirements.txt`, `implementation-notes.md`, the disagreements log in CLAUDE.md when a new material conflict surfaces. After any change to `astra.yaml`, run `astra validate astra.yaml`. If fixes landed: commit (`implement: review-and-fix`), update Rigor to *tightened*. If nothing needed fixing: commit (`implement: review confirmed clean`, possibly empty), update Rigor to *canonical*. Exit.
+Apply fixes inline as you find them — `scripts/`, `astra.yaml` recipes, `requirements.txt`, `implementation-notes.md`, the disagreements log in CLAUDE.md when a new material conflict surfaces. After any change to `astra.yaml`, run `astra validate astra.yaml`. Commit the diff and exit.
 
-### What NOT to do during review-and-fix
+Don't re-read the entire paper; grep into `work/reference/index.json`, `work/reference/code-index.md`, and `work/reference/source/` (or `document.md`) for specific items. Don't declare the implementation done in the same iteration where you landed fixes — the next fresh-context iteration reads it cold; if nothing needs fixing, it advances to RUN, which is the "done" signal.
 
-- **Don't re-read the entire paper.** Grep into `work/reference/index.json`, `work/reference/code-index.md`, and `work/reference/source/` (or `document.md`) for specific items.
-- **Don't declare the implementation *canonical* in the same iteration where you applied fixes.** That's the next fresh-context iteration's call.
-
-The post-RUN COMPARE → IMPLEMENT retry loop is separate from this review cycle — that loop handles result-matching after the pipeline executes, not spec/implementation alignment before it.
+The post-RUN COMPARE → IMPLEMENT retry loop is separate from this critical-read pattern — that loop handles result-matching after the pipeline executes, not spec/implementation alignment before it.
 
 ## Data: REAL DATA ONLY
 
@@ -96,14 +92,12 @@ If a dataset is behind a paywall, requires registration, or is "available upon r
 
 If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, a subsequent iteration may take on a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. Default attempt budget is 5; the iteration's first move is to check whether `attempt` in the report has reached the budget. If it has, accept partial, log the failure as an Open opportunity in CLAUDE.md (so REVIEW close-out can decide whether to push further or accept the trajectory), and exit; subsequent iterations either accept the verdict via a cold close or pivot scope based on REVIEW's input.
 
-A retry attempt restarts the IMPLEMENT review cycle on the changed scripts (back to *baseline*) before the next iteration advances to RUN.
+A retry attempt restarts the critical-read pattern on the changed scripts before the next iteration advances to RUN.
 
 ## Survey signals (entry into IMPLEMENT)
 
 - `astra.yaml` validates and `implementation-notes.md` exists ⇒ ready to implement first pass
-- `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ first-pass IMPLEMENT done (Rigor: *baseline*)
-- Rigor *Current state* shows *baseline* or *tightened* for the implementation ⇒ this iteration is review-and-fix
-- Rigor *Current state* reaches *canonical* ⇒ IMPLEMENT done; the next iteration surveys and advances to RUN
+- `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ IMPLEMENT's output is on disk; read it critically. Fix anything wrong; otherwise the iteration advances to RUN.
 - `comparison-report.yaml` returns `pass` ⇒ COMPARE → IMPLEMENT loop terminated; the constitution can close after a cold survey, and REVIEW close-out runs in the user's main session
 
 ## Notes
@@ -111,5 +105,5 @@ A retry attempt restarts the IMPLEMENT review cycle on the changed scripts (back
 - **`lc run` is the canonical execution surface.** Scripts assume they will be invoked via the lightcone-cli runner. Do not hard-code working directories or assume environment activation.
 - **Determinism where possible.** Set random seeds, fix library versions, prefer reproducible installations. The IMPLEMENT goal is not just "produces output once" but "reproducibly produces output across runs."
 - **Tight coupling earns shared scripts.** When two outputs come from the same expensive computation (e.g. an MCMC produces both a parameter chain and a summary statistic), one script with multiple output paths is cleaner than two scripts that each re-do the work.
-- **The iteration that fixed the artifact can't declare it canonical.** Termination requires a subsequent fresh-context iteration to read the work and find nothing to fix. This is what the fresh-context-no-bias property buys you; conflating the fix-iteration with the canonical-judgment defeats it.
-- **Commit as you go.** One commit per script + recipe wiring; one commit per review-and-fix pass; one commit per confirmed-clean review. The next iteration reads `git log` and Rigor *Current state* to track progress.
+- **The iteration that fixed the artifact can't also be the iteration that judges it clean.** That's the fresh-context-no-bias property at iteration boundaries; conflating fix-iteration with done-judgment defeats it.
+- **Commit as you go.** One commit per script + recipe wiring; one commit per fix. The next iteration reads `git log` to track progress.
diff --git a/claude/lightcone/skills/lc-from-paper/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/interview.md
index 5612ee82..620a44ca 100644
--- a/claude/lightcone/skills/lc-from-paper/references/interview.md
+++ b/claude/lightcone/skills/lc-from-paper/references/interview.md
@@ -12,7 +12,7 @@ The two-beat shape exists because most interview questions are inherently hard t
 
 Three things in the reproduction workdir, all committed together at the end:
 
-- **`constitution.md`** — drafted from [`../templates/constitution.md`](../templates/constitution.md). YAML frontmatter `status: active`, then Goal, Fidelity intent, Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL, where the substrate lives), Rigor *Current state* per output (starts empty; iterations append), Open dimensions. The ralph loop's driving document; each iteration reads it on entry. The top half (Goal, Scope, Quality bar, Evidence) sharpens slowly; the bottom half (Rigor *Current state*) is updated each iteration. Task-bound — archivable once the reproduction closes.
+- **`constitution.md`** — drafted from [`../templates/constitution.md`](../templates/constitution.md). YAML frontmatter `status: active`, then Goal, Fidelity intent, Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL, where the substrate lives), Open dimensions. The ralph loop's driving document; each iteration reads it on entry. The body sharpens slowly; Open dimensions is updated each iteration as decisions worth user ratification surface. Task-bound — archivable once the reproduction closes.
 - **`CLAUDE.md`** — drafted from [`../templates/CLAUDE.md`](../templates/CLAUDE.md). Paper identity at the top (DOI, title, one-line subject), Rules (universal across reproductions; leave the template's defaults), Disagreements log (starts empty; iterations append), Open opportunities (starts empty; iterations append), Pointers (to `constitution.md`, `work/reference/`, etc.). The auto-loading walk-up; every Claude Code session in the workdir picks it up. Durable — stays useful for any follow-on work in this directory once the reproduction lands.
 - **`work/reference/` paper substrate** — produced by `/paper-extraction` between beats: `paper.pdf`, `source/` (Path A) or `document.md` (Path B), `index.json`, `astra.yaml`, `figures/`, `tables/`, `bibliography-source.{bib,bbl}`.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index e1915c19..640807f4 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -13,7 +13,7 @@ LITERATURE is what a ralph iteration does when the workdir signals "SPECIFY done
 - `astra.yaml` — filled by SPECIFY's paper (and code) passes; each sub-analysis has `prior_insights:` entries shaped as syntactically-complete `Insight` blocks (`id`, `claim`, `created_at`, `evidence: [{id, doi}]`) where each Evidence carries a `doi:` but no `quote:` selector. These are the placeholders LITERATURE resolves by writing `quote: {exact, prefix, suffix}` and `location: {page}` onto each Evidence entry. The option↔insight linkage already lives on the option side (`Option.insights`); LITERATURE does not touch it.
 - `work/reference/index.json#citations` — paper-extraction's cite-key → `{locations, citation, doi}` mapping for every entry in the target paper's bibliography. Used as the canonical cite-key → DOI lookup when cross-checking placeholder DOIs and surfacing unresolved-DOI cases.
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) — target paper text. Grep into for context on how the cited paper is invoked, when a placeholder's claim is ambiguous.
-- `constitution.md` — Fidelity intent; Rigor *Current state* per output (so this iteration knows where prior insights currently sit).
+- `constitution.md` — Fidelity intent.
 
 ## Outputs
 
@@ -164,9 +164,9 @@ Rules:
 
 When the iteration fans out to Haikus, each Haiku is spawned with `model="haiku"` and gets this contract plus its assigned subset of placeholders and substrate paths.
 
-## Review-and-fix — iterations after the merge
+## Reviewing prior LITERATURE work as part of survey
 
-After the merge lands (Rigor: *baseline*), the next fresh-context iteration reads cold, runs `astra validate --verify-evidence` for the deterministic check, does a semantic re-read of each resolved insight, applies fixes inline if needed, updates Rigor *Current state* (*baseline → tightened* if fixes landed, → *canonical* if nothing needed fixing), and exits. **The iteration that applied fixes cannot declare LITERATURE done** — a subsequent fresh-context iteration earns *canonical* by reading the resolutions and finding nothing to fix. Cap at 5 review iterations: if *canonical* isn't reached by then, log the tail to `open-questions.md` and let the survey advance.
+There is no separate review phase. Every iteration that enters and finds `prior_insights:` placeholders resolved on disk reads them critically — running `astra validate --verify-evidence` for the deterministic check, plus a semantic re-read of each insight. If you see real issues — tangential quote, wrong cited paper, broken `Option.insights` linkage — fix them inline, commit (`literature: fix <what>`), exit. When a fresh-context read finds nothing to fix, the iteration advances to IMPLEMENT.
 
 The cross-check questions on entry:
 
@@ -176,7 +176,7 @@ The cross-check questions on entry:
 4. **Cited paper is the right paper.** Does the target paper actually invoke this DOI for this claim?
 5. **Unresolved entries are honest.** For entries in `open-questions.md` flagged unresolved, does a closer read of the cited paper find supporting evidence the resolver missed?
 
-Apply fixes inline as you find them — `astra.yaml`'s `prior_insights:` entries (including re-running Haiku quote-finding for entries that need a different quote, when the gap is mechanical rather than semantic). If fixes landed: commit (`literature: review-and-fix`), update Rigor to *tightened*. If nothing needed fixing: commit (`literature: review confirmed clean`, possibly empty), update Rigor to *canonical*. Exit.
+Apply fixes inline as you find them — `astra.yaml`'s `prior_insights:` entries (including re-running Haiku quote-finding for entries that need a different quote, when the gap is mechanical rather than semantic). Commit the diff and exit.
 
 If the entry genuinely has no supporting quote in the cited paper, log it to `open-questions.md` with a "no support found" note and leave the entry as-is for the user to resolve at REVIEW. Don't fabricate evidence.
 
@@ -185,9 +185,8 @@ If the entry genuinely has no supporting quote in the cited paper, log it to `op
 - `astra.yaml` has `prior_insights:` placeholders — entries with `claim:` plus Evidence carrying `doi:` but no `quote:` selector ⇒ ready to resolve
 - `work/cited/<doi-slug>/work/reference/index.json` exists for each unique cited DOI ⇒ fetches done
 - `work/notes/literature/resolutions.yaml` exists with non-empty resolutions / unresolved sections ⇒ quote-finding done
-- `astra.yaml`'s `prior_insights:` entries each have a resolved `quote:` (+ `location:`) selector on their Evidence ⇒ merge done (Rigor: *baseline*)
-- `astra validate astra.yaml --verify-evidence` returns clean ⇒ structural validation done
-- Rigor *Current state* reaches *canonical* for LITERATURE ⇒ LITERATURE done; the next iteration surveys and advances to IMPLEMENT
+- `astra.yaml`'s `prior_insights:` entries each have a resolved `quote:` (+ `location:`) selector on their Evidence ⇒ merge done
+- `astra validate astra.yaml --verify-evidence` returns clean ⇒ structural validation done; read the resolutions critically. Fix anything wrong; otherwise the iteration advances to IMPLEMENT.
 
 ## Notes
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index 63b0a381..f71e1c7b 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -16,7 +16,7 @@ The phase name **REVIEW** is freed by the old pre-implement REVIEW phase folding
 - `work/reference/index.json` and `work/reference/code-index.md` — for context
 - `work/reference/source/` (Path A) or `work/reference/document.md` (Path B) and `work/reference/code/` — directly available for follow-up questions the user asks during REVIEW that the report and CLAUDE.md don't answer ("remind me what the paper says about X", "did the original code do Y"). Grep into for specifics; read targeted spans by offset/limit.
 - `CLAUDE.md` at the workdir root — paper identity, Rules, Paper-vs-code disagreements, Open opportunities (the durable surface, accumulated across iterations)
-- `constitution.md` at the workdir root — Goal, Fidelity intent, Scope, Quality bar, Evidence, Rigor *Current state* (the driving document the loop has been working against)
+- `constitution.md` at the workdir root — Goal, Fidelity intent, Scope, Quality bar, Evidence, Open dimensions (the driving document the loop has been working against)
 
 ## Outputs
 
@@ -76,7 +76,7 @@ Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes;
 
 ## Step 4: reconcile the Open opportunities list
 
-COMPARE iterations have been logging un-acted-on opportunities into CLAUDE.md's *Open opportunities* list as they run, so the list is already populated. REVIEW's job here is reconciliation: cross-check that every opportunity in `comparison-report.yaml`'s `opportunities:` block that the user did NOT act on is present in CLAUDE.md's list, and remove any that the user acted on at REVIEW (e.g. authorized one more IMPLEMENT round to close). Note any acted-on closures in the constitution's Rigor *Current state* (e.g. *Figure 3: tightened* if the systematics treatment got a heavier pass).
+COMPARE iterations have been logging un-acted-on opportunities into CLAUDE.md's *Open opportunities* list as they run, so the list is already populated. REVIEW's job here is reconciliation: cross-check that every opportunity in `comparison-report.yaml`'s `opportunities:` block that the user did NOT act on is present in CLAUDE.md's list, and remove any that the user acted on at REVIEW (e.g. authorized one more IMPLEMENT round to close).
 
 ## Step 5: commit
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 8228f9ba..47488e28 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -11,7 +11,7 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 ## Inputs
 
 - `astra.yaml` — the stub from ARCHITECT (sub-analyses, inputs, outputs, narrative; empty `decisions:` / `prior_insights:` / `findings:` blocks)
-- `constitution.md` — Goal (scope), Fidelity intent, Quality bar, Rigor *Current state* for the per-output trajectory tracking
+- `constitution.md` — Goal (scope), Fidelity intent, Quality bar
 - `CLAUDE.md` — Rules; **Paper-vs-code disagreements** for prior-iteration entries
 - `work/reference/index.json` — paper-extraction's structural index: figures, tables, section outline, citations. The `citations:` block maps each cited paper's BibTeX key (Path A) or synthetic `<lastname>_<year>` key (Path B) to `{locations, citation, doi}`. SPECIFY uses this to write each `prior_insights:` placeholder's `doi:` so LITERATURE knows which paper to fetch.
 - `work/reference/code-index.md` (when code present) — code inventory: module map, candidate decisions with file:line, entry-points, data dependencies, gotchas.
@@ -26,8 +26,8 @@ Per-sub-analysis work is parallelizable when sub-analyses are independent. Each
 - `universes/baseline.yaml` — selects the paper's choices (where paper and code disagree per the canonical-resolution rule, see "Material conflicts" below)
 - `implementation-notes.md` — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays.
 - `targets/targets.md` — small target ledger COMPARE consumes: per output (already declared by ARCHITECT), a brief entry with type, priority, paper value, expected match criteria, and the path to the reference figure / table / metric (when applicable, copy the reference file into `targets/` so the directory is self-contained)
-- `constitution.md` updates — Rigor *Current state* per sub-analysis (*baseline* after the write iteration, *tightened* after a review-and-fix iteration that landed changes, *canonical* after a fresh-context iteration found nothing to fix)
 - `CLAUDE.md` updates — append entries to **Paper-vs-code disagreements** for each material conflict surfaced
+- `constitution.md` updates — Open dimensions when something material warrants user ratification at REVIEW
 
 ## Substrate skills to invoke
 
@@ -119,11 +119,11 @@ Read the code that implements this sub-analysis (`work/reference/code-index.md`'
 
 3. **Decision-option augmentation.** Where the code reveals an option the paper didn't mention but is defensible (a sibling implementation alternative used in the codebase or referenced in a comment), add it as a sibling option to the relevant `decisions:` entry. Do not pre-emptively author every code variant; only the ones that bear on a real choice.
 
-### Review-and-fix — the iterations after the passes
+### Reviewing prior SPECIFY work as part of survey
 
-After the paper + code passes land for a sub-analysis (Rigor: *baseline*), the next fresh-context iteration reads the slice cold, reviews silently, applies any fixes inline, updates the Rigor *Current state* (*baseline → tightened* if fixes landed, *baseline → canonical* if nothing needed fixing), commits, and exits. **The iteration that makes changes cannot declare the sub-analysis done** — a subsequent fresh-context iteration earns *canonical* by reading the work and finding nothing to fix. Cap at 5 review iterations per sub-analysis: if *canonical* isn't reached by then, log the tail to `open-questions.md` and let the survey advance.
+There is no separate review phase. Every iteration that enters and finds a SPECIFY-filled sub-analysis on disk reads it critically before doing anything else. If you see real issues — missing decision, paraphrased quote, dropped disagreement, broken anchor — fix them inline, commit (`specify: fix <sub-analysis-id> <what>`), and exit. When a fresh-context read finds nothing to fix in a sub-analysis, the iteration moves on (next sub-analysis, or next phase if every sub-analysis is clean).
 
-The cross-check question on entry: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
+The cross-check questions on entry: are the decisions covering everything material? Are the evidence quotes verbatim? Are the findings actually traceable to the paper or code? Did any material disagreement get silently dropped?
 
 #### What to check
 
@@ -143,15 +143,15 @@ astra validate astra.yaml
 astra validate astra.yaml --verify-evidence  # after LITERATURE has resolved the prior_insights placeholders
 ```
 
-If fixes landed: commit (`specify: review-and-fix <sub-analysis-id>`), update Rigor to *tightened*. If nothing needed fixing: commit (`specify: review confirmed clean <sub-analysis-id>`, possibly empty), update Rigor to *canonical*. Exit. The next iteration's survey checks each sub-analysis's Rigor state to decide what's next.
+Commit the diff (`specify: fix <sub-analysis-id> <what>`) and exit.
 
-#### What NOT to do during review-and-fix
+#### What NOT to do
 
 - **Don't flag missing `recipes:`.** Recipes are IMPLEMENT's, not SPECIFY's.
 - **Don't re-read the entire paper.** Use Grep on `work/reference/source/` (or `document.md`) for the specific claims you want to verify; lean on `work/reference/index.json`.
-- **Don't declare a sub-analysis *canonical* in the same iteration where you applied fixes.** That's the next fresh-context iteration's call.
+- **Don't declare the sub-analysis done in the iteration where you landed fixes.** The next fresh-context iteration reads it cold; if nothing needs fixing, it moves on, which is the "done" signal.
 
-When every sub-analysis reaches *canonical*, SPECIFY produces the final outputs:
+When every sub-analysis is clean, SPECIFY produces its final outputs (target ledger, baseline universe, implementation-notes):
 
 ## Target-ledger output
 
@@ -182,7 +182,7 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 - `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative; empty decisions / prior_insights / findings) ⇒ ready to specify
 - For each sub-analysis: `decisions:` populated with decision-level `rationale:` + options (paper's choice at `default:`); `findings:` populated as full Insight blocks with paper-anchored Evidence (DOI + `quote: {exact, prefix, suffix}` + `location: {page}`); `prior_insights:` populated as citation placeholders (`id`, `claim`, `created_at`, `evidence: [{id, doi}]` with `quote:` omitted — LITERATURE fills the quotes next); `Option.insights` back-references wired up where options draw on placeholders ⇒ paper pass done
 - For each sub-analysis: when `work/reference/code/` exists, code-pass material-disagreement entries land in `decisions:` (with both options) and `universes/baseline.yaml` selects the canonical-resolution choice; `implementation-notes.md` carries non-material gotchas ⇒ code pass done
-- For each sub-analysis: Rigor *Current state* reaches *canonical* ⇒ that sub-analysis's review cycle is done
+- For each sub-analysis: a fresh-context iteration reads the slice and finds nothing to fix ⇒ that sub-analysis is done; the next iteration moves on
 - `astra validate astra.yaml` returns clean (placeholders whose Evidence carries `doi:` without `quote:` are valid at this stage) ⇒ structural side validated; `--verify-evidence` waits until LITERATURE has authored the `quote:` + `location:` selectors
 - `targets/targets.md` exists with each entry mapped to a spec home ⇒ target-ledger done
 - `implementation-notes.md` exists ⇒ practical-guidance side done
diff --git a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
index 66e73a59..c8ecaa25 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/CLAUDE.md
@@ -2,7 +2,7 @@
 
 Reproduction of **<paper title>** (<arXiv ID>). DOI: <doi>. One-line subject: <e.g. "BAO scale measurement from DESI DR1">.
 
-The driving document for this reproduction is [`constitution.md`](constitution.md) — Goal, Fidelity intent, Scope, Quality bar, Evidence, and the running Rigor *Current state* per output. Every ralph iteration reads it on entry. This file (`CLAUDE.md`) is the auto-loading walk-up: rules + durable findings that stay useful past the reproduction (Open opportunities for future tightening, Paper-vs-code disagreements, pointers).
+The driving document for this reproduction is [`constitution.md`](constitution.md) — Goal, Fidelity intent, Scope, Quality bar, Evidence, Open dimensions. Every ralph iteration reads it on entry. This file (`CLAUDE.md`) is the auto-loading walk-up: rules + durable findings that stay useful past the reproduction (Open opportunities for future tightening, Paper-vs-code disagreements, pointers).
 
 ## Rules
 
diff --git a/claude/lightcone/skills/lc-from-paper/templates/constitution.md b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
index 47666373..a45cd41c 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/constitution.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
@@ -4,7 +4,7 @@ status: active
 
 # <paper-slug> — reproduction constitution
 
-The driving document for the ralph loop reproducing <paper title> (<arXiv ID>, DOI <doi>). Every iteration reads this on entry to know what "done" looks like and where each output currently sits. The top half (Goal, Scope, Quality bar, Evidence) **sharpens slowly** — only when something fundamental shifts (target moves, scope opens or fences, a material disagreement makes us re-think a sub-analysis). The bottom half (Rigor *Current state*, Open dimensions) is updated each iteration. Durable findings that stay useful past the reproduction — paper-vs-code disagreements, open opportunities for future tightening, pointers to substrate — live in `CLAUDE.md`.
+The driving document for the ralph loop reproducing <paper title> (<arXiv ID>, DOI <doi>). Every iteration reads this on entry to know what "done" looks like. The body **sharpens slowly** — only when something fundamental shifts (target moves, scope opens or fences, a material disagreement makes us re-think a sub-analysis); Open dimensions is updated each iteration as decisions worth user ratification surface. Durable findings that stay useful past the reproduction — paper-vs-code disagreements, open opportunities for future tightening, pointers to substrate — live in `CLAUDE.md`.
 
 ## Goal
 
@@ -26,7 +26,7 @@ What "canonical" rigor looks like for *this* paper. The bar that primary-target
 - <e.g. "magnitude cuts and selection match the code's defaults exactly; any deviation is recorded as a paper-vs-code disagreement with both options preserved">
 - <e.g. "every prior insight cites a real verbatim quote from the cited paper">
 
-This is the ceiling; the fidelity intent determines which outputs need to actually reach it. The *Rigor — current state* table below tracks where each output currently sits relative to this bar.
+This is the ceiling; the fidelity intent determines which outputs need to actually reach it.
 
 ## Evidence
 
@@ -38,12 +38,6 @@ The substrate this reproduction is built against — the canonical sources itera
 - **arXiv ID:** <id> (if applicable)
 - **Code repo URL:** <url>
 
-## Rigor — current state
-
-Per-output trajectory tracking, updated by iterations as they produce or review artifacts. Coarse adjectives per output or per phase: *sketch / baseline / tightened / canonical*. *baseline* — first version written. *tightened* — at least one fresh-context iteration reviewed and applied fixes. *canonical* — a fresh-context iteration reviewed and found nothing to fix (terminates the review cycle for that artifact). Read alongside Fidelity intent above so each iteration knows where each output currently sits. Empty until the first iteration produces something:
-
-- (none yet)
-
 ## Open dimensions
 
 Decisions worth surfacing to the user — places the reproduction could go differently and the call benefits from human ratification. Iterations append here when something material comes up that isn't itself a paper-vs-code disagreement (those go to `CLAUDE.md`'s disagreements log instead). The user resolves these at REVIEW close-out, or earlier if they're around.
diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index 963247b2..f4b690f4 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -63,12 +63,13 @@ INTERVIEW drafts two files in the reproduction workdir; every
 iteration picks them up on launch.
 
 - **`constitution.md`** — the ralph loop's driving document, *task-bound*.
-  YAML frontmatter declares `status: active`. Top half (sharpens slowly):
-  Goal (carrying the **fidelity intent** — the user's own "what do you
-  want out of this stretch, given what you have to spend on it"), Scope
-  (in/out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL).
-  Bottom half (updates each iteration): Rigor *Current state* per
-  output, Open dimensions. Archivable once the reproduction closes.
+  YAML frontmatter declares `status: active`. Goal (carrying the
+  **fidelity intent** — the user's own "what do you want out of this
+  stretch, given what you have to spend on it"), Scope (in/out),
+  Quality bar, Evidence (paper DOI, arXiv ID, code repo URL), Open
+  dimensions (decisions worth user ratification, updated each
+  iteration). The body sharpens slowly. Archivable once the
+  reproduction closes.
 - **`CLAUDE.md`** — the auto-loading walk-up, *durable*. Paper identity
   at the top; Rules (code-as-canonical, no blocking on `AskUserQuestion`
   mid-iteration, arXiv-LaTeX-first, `astra validate --verify-evidence`
@@ -97,14 +98,13 @@ Pointers, not snapshots.
   intent is partly aesthetic ("how good does this need to be?") and
   partly pragmatic ("what's feasible given the compute, tokens, and
   wall-clock available?"). The honest meta-conversation lives in
-  INTERVIEW; each iteration then sizes its work from the gap between
-  the constitution's Rigor *Current state* and that intent. The
-  vocabulary is *sketch / baseline / tightened / canonical*: the
-  write iteration produces *baseline*; each subsequent fresh-context
-  iteration either lands fixes (→ *tightened*) or finds nothing to fix
-  (→ *canonical*, which terminates the review cycle). The iteration
-  that applied fixes can't declare the artifact *canonical* — that
-  judgment belongs to the next fresh-context read.
+  INTERVIEW. There's no explicit review state machine: every
+  iteration reads the most recent artifact critically as part of
+  survey, fixes what needs fixing or advances if nothing does. The
+  fresh-context property at iteration boundaries makes the next
+  iteration the review. Gaps the intent wants pushed further than
+  the loop has time to deliver become Open opportunities in
+  CLAUDE.md for a future loop.
 - **arXiv LaTeX first.** PDF + Docling is the non-arXiv fallback only.
 - **No synthetic data.** Unless the paper itself uses synthetic data,
   every input must be real.

From b944a3e5c7abcb369426b7c99f8550e4df09253c Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 21:39:17 +0200
Subject: [PATCH 106/124] paper-extraction: handle nested caption braces

---
 .../lc-from-paper/references/acquire.md       | 91 -------------------
 .../references/{interview.md => orient.md}    |  0
 .../scripts/extract-paper-substrate.py        | 12 ++-
 tests/test_paper_extraction_caption.py        | 24 +++++
 4 files changed, 34 insertions(+), 93 deletions(-)
 delete mode 100644 claude/lightcone/skills/lc-from-paper/references/acquire.md
 rename claude/lightcone/skills/lc-from-paper/references/{interview.md => orient.md} (100%)
 create mode 100644 tests/test_paper_extraction_caption.py
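
Note on the caption change: the new `extract_caption` calls `walk_balanced_braces(text, match.end() - 1)`, but that helper's body is not part of this diff. The sketch below shows one way such a helper could behave. It assumes the second argument is the index of the opening `{` and that the return value is the brace-delimited body, or `None` when the braces never balance. The name `walk_balanced_braces_sketch` and its details are hypothetical, not the script's actual implementation.

```python
from __future__ import annotations


def walk_balanced_braces_sketch(text: str, open_idx: int) -> str | None:
    """Hypothetical stand-in for the script's walk_balanced_braces helper.

    Returns the text between the `{` at open_idx and its matching `}`,
    or None if open_idx is not an opening brace or the braces never close.
    """
    if not 0 <= open_idx < len(text) or text[open_idx] != "{":
        return None
    depth = 0
    i = open_idx
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Skip the escaped character (e.g. \{ or \}) so it does not
            # affect the nesting depth.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[open_idx + 1 : i]
        i += 1
    return None  # ran off the end: unbalanced input
```

With the new `CAPTION` pattern ending on the opening brace, `match.end() - 1` is that brace's index, so a walker of this shape would receive the brace position directly. The old regex tolerated only one level of nesting, which is why a caption like `$A^{\mathrm{Y}}$` needs the explicit depth counter.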

diff --git a/claude/lightcone/skills/lc-from-paper/references/acquire.md b/claude/lightcone/skills/lc-from-paper/references/acquire.md
deleted file mode 100644
index 683baee8..00000000
--- a/claude/lightcone/skills/lc-from-paper/references/acquire.md
+++ /dev/null
@@ -1,91 +0,0 @@
-# ACQUIRE — stand up the code substrate
-
-The post-INTERVIEW substrate phase. Runs in the user's main session, right after INTERVIEW has committed `constitution.md`, `CLAUDE.md`, and the paper substrate produced by `/paper-extraction`. ACQUIRE is now thin: it stands up the **code substrate** if there's a reference code repository, then commits. The paper substrate is INTERVIEW's deliverable, not ACQUIRE's — INTERVIEW reads the paper to ground its grounded-beat questions, so the substrate already exists on disk when ACQUIRE starts.
-
-There is no `acquire` sub-agent. ACQUIRE's work is at most one skill invocation (`/lc-from-code` in scan-only mode) plus the surrounding clone, status file, and commit.
-
-If the paper has no reference code repo (and the user didn't supply a private one in INTERVIEW), ACQUIRE is one step: write `code-status.yaml` with `found: false` and proceed to launch the loop.
-
-## Where this runs
-
-User's main session, directly. The one sub-skill invocation is `/lc-from-code` in scan-only mode against the cloned reference repo.
-
-## Inputs
-
-- **Code repo URL** (from `constitution.md`'s Evidence section, surfaced during INTERVIEW Beat 2). May be absent if the paper has no public code and the user didn't supply a private one.
-- **Paper substrate** at `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml, figures/, tables/}` — produced by `/paper-extraction` during INTERVIEW. Read-only from ACQUIRE's perspective; iterations consult it, ACQUIRE doesn't modify it.
-
-## Outputs
-
-All on-disk:
-
-- `work/reference/code/` — cloned reference repo (absent if `code-status.yaml` records `found: false`)
-- `work/reference/code-status.yaml` — record of where the code came from (or that it wasn't found)
-- `work/reference/code-index.md` — script inventory, candidate decisions, dependencies, container hints (absent when no code substrate)
-
-## Step 1 — Locate, clone, scan
-
-1. **Locate the reference code repository.**
-   - If a URL was supplied at INTERVIEW (recorded in `constitution.md`'s Evidence section), use it.
-   - Otherwise, the paper has no public code repo and the user didn't supply a private one — go to Step 1.4 and record `found: false`.
-
-2. **Clone if found:**
-   ```bash
-   git clone --depth 1 <url> work/reference/code
-   ```
-
-   For multi-project monorepos where the user pointed at specific subpaths (e.g. GitHub `tree/<branch>/<path>` URLs), clone the whole repo on the named branch — don't sparse-checkout — and capture the primary subpaths in `code-status.yaml` so `/lc-from-code` knows where to focus.
-
-3. **If `work/reference/code/` exists, run `/lc-from-code` in scan-only mode against it:**
-   - Invoke `/lc-from-code` pointing at the cloned repo, with an invocation prompt that names the primary subpaths from `code-status.yaml` (if any) and reminds it of the scan-only contract: write `work/reference/code-index.md` only; do not touch `astra.yaml` at the project root; do not parameterize any code; do not run anything; do not modify the cloned repo.
-   - The scan-only branch of `/lc-from-code` does the inventory pass and writes to `work/reference/code-index.md`.
-
-4. **Write `work/reference/code-status.yaml`:**
-   ```yaml
-   found: true        # or false
-   url: "https://..."  # null if not found
-   branch: "main"     # or whichever branch was cloned; null if not found
-   cloned: true       # false if found but clone failed
-   primary_subpaths:  # optional; for multi-project monorepos
-     - "notebooks/..."
-     - "..."
-   notes: "..."
-   ```
-
-`/lc-from-code`'s scan-only branch is the canonical code-inventory mechanism. Its prompt-context surface is what carries the "stop at scan" contract.
-
-**A scan-only return is not an ACQUIRE stopping point.** ACQUIRE is incomplete until Step 2 below has either succeeded or hit a concrete launcher blocker. When `/lc-from-code` returns, do not summarize the scan as the final user-facing result. Continue immediately to Step 2: commit the code substrate, launch the ralph loop, and tell the user the session name.
-
-## Step 2 — Commit and launch the ralph loop
-
-1. **Commit the code substrate.** Stage `code-status.yaml` + `code-index.md` and commit — small, descriptive ("acquire: code substrate (sp_validation @ develop)"). The `work/reference/code/` clone itself can be `.gitignore`d or committed depending on the project's preference; the inventory file (`code-index.md`) is what downstream iterations actually consult, and gitignoring keeps the workdir tracked-size small for a 50+ MB monorepo clone.
-
-2. **Tell the user** the ralph loop is about to launch. Surface anything notable from Step 1 — if `code-status.yaml` records `found: false` or the cloned repo is gnarly (no `requirements.txt`, abandoned-looking, etc.), mention it now so the user can adjust scope before iterations start working against the substrate.
-
-3. **Launch the loop** (per SKILL.md's *Launching the loop* section):
-   ```bash
-   .claude/skills/ralph/scripts/ralph constitution.md
-   ```
-   Tell the user the tmux session name and the attach command. Iterations start firing immediately.
-
-## Survey signals (entry into ACQUIRE)
-
-Run `ls work/reference/` first.
-
-- `work/reference/code/` present, **or** `code-status.yaml` records `found: false`, **and** `code-index.md` is present → ACQUIRE is done. Commit any unstaged changes and launch the loop.
-- Otherwise, run Step 1.
-
-If the paper substrate (`paper.pdf`, `index.json`, etc.) is missing, INTERVIEW didn't complete cleanly — re-invoke `/paper-extraction` against the partial state (idempotent; skips done work) and confirm the constitution + CLAUDE.md are consistent with what's on disk, before continuing.
-
-## Notes
-
-- **lc-from-code is the code-inventory authority** for the scan portion. ACQUIRE's invocation constrains it to scan-only via the prompt; the parameterization and run portions of `/lc-from-code` are not invoked at this phase.
-- **The cloned code is read-only reference.** Iterations may re-read it; nothing modifies `work/reference/code/`. (When the reproduction's implementation needs to happen, that's an IMPLEMENT-phase decision, not an ACQUIRE one.)
-- **Code-as-canonical** is loaded by every iteration via `CLAUDE.md`'s Rules. ACQUIRE just stands up the reference so the rule has something to point at.
-- **This phase is acquisition, not understanding.** ACQUIRE doesn't write `astra.yaml` at the project root and doesn't compare paper to code. ARCHITECT does that, in the first ralph iteration after the loop launches.
-- **No reference code is still a valid ACQUIRE outcome.** When `code-status.yaml` records `found: false`, iterations operate in paper-only mode — methodology lives in the paper's prose; no code-as-canonical adjudication is needed. CLAUDE.md's code-as-canonical Rule self-disables in that case.
-- **Surface anti-patterns from the scan.** If `code-status.yaml` reports the clone failed or the repo is clearly dead, surface to the user immediately rather than launching a loop against half-acquired substrate.
-
-## Future substrate types
-
-ACQUIRE's purpose is "stand up reference substrate that wasn't surfaced in INTERVIEW." Today, that's just the code. If a future paper requires substrate types that aren't paper-or-code (a specific dataset to fetch from an open archive, supplementary materials, calibration files), they fit naturally as Step 1.5 in ACQUIRE — produced before commit + launch, with a status file recording what was acquired. Don't accrete those into INTERVIEW (which is about conversation) or into the ralph loop (which is about iteration over committed substrate).
diff --git a/claude/lightcone/skills/lc-from-paper/references/interview.md b/claude/lightcone/skills/lc-from-paper/references/orient.md
similarity index 100%
rename from claude/lightcone/skills/lc-from-paper/references/interview.md
rename to claude/lightcone/skills/lc-from-paper/references/orient.md
diff --git a/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py b/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
index 93e50843..623dd1db 100755
--- a/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
+++ b/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
@@ -80,7 +80,9 @@
 )
 ADS_API = "https://api.adsabs.harvard.edu/v1/search/query"
 NETWORK_TIMEOUT_S = 10
-CAPTION = re.compile(r"\\caption\{((?:[^{}]|\{[^}]*\})*)\}", re.DOTALL)
+# Match caption commands; the body itself is walked with balanced-brace logic so
+# nested braces and escaped braces survive intact.
+CAPTION = re.compile(r"\\caption\*?\s*(?:\[[^\]]*\])?\s*\{")
 LABEL = re.compile(r"\\label\{([^}]+)\}")
 INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")
 PLOTONE = re.compile(r"\\plotone\{([^}]+)\}")
@@ -104,8 +106,14 @@ def extract_caption(text: str, macros: dict[str, str]) -> str:
 
     Composite figures often have empty subfigure captions before the real
     top-level caption; taking the first caption produces a false warning.
+    Balanced-brace walking preserves nested LaTeX and escaped braces inside
+    caption bodies.
     """
-    captions = [m.group(1).strip() for m in CAPTION.finditer(text)]
+    captions = []
+    for match in CAPTION.finditer(text):
+        body = walk_balanced_braces(text, match.end() - 1)
+        if body is not None:
+            captions.append(body.strip())
     nonempty = [caption for caption in captions if caption]
     return expand_macros(nonempty[-1], macros) if nonempty else ""
 
diff --git a/tests/test_paper_extraction_caption.py b/tests/test_paper_extraction_caption.py
new file mode 100644
index 00000000..579f8d69
--- /dev/null
+++ b/tests/test_paper_extraction_caption.py
@@ -0,0 +1,24 @@
+"""Tests for paper-extraction caption parsing."""
+from __future__ import annotations
+
+from importlib.util import module_from_spec, spec_from_file_location
+from pathlib import Path
+
+SCRIPT_PATH = (
+    Path(__file__).resolve().parents[1]
+    / "claude"
+    / "lightcone"
+    / "skills"
+    / "paper-extraction"
+    / "scripts"
+    / "extract-paper-substrate.py"
+)
+_SPEC = spec_from_file_location("paper_extraction_extract_script", SCRIPT_PATH)
+assert _SPEC is not None and _SPEC.loader is not None
+_SCRIPT = module_from_spec(_SPEC)
+_SPEC.loader.exec_module(_SCRIPT)
+
+
+def test_extract_caption_handles_nested_braces_and_last_nonempty_caption() -> None:
+    text = r"\caption{}\caption{X $A^{\mathrm{Y}}$ Z}"
+    assert _SCRIPT.extract_caption(text, {}) == r"X $A^{\mathrm{Y}}$ Z"

From 1d4f6b8959f80a604068ac9496c5fd914c28dffc Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 21:45:36 +0200
Subject: [PATCH 107/124] lc-from-paper: collapse INTERVIEW + ACQUIRE into a
 single ORIENT phase
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cail's call after walking through dogfood friction #3: instead of two
pre-loop phases (INTERVIEW asks → ACQUIRE clones + scans), one ORIENT
phase that runs the right thing at the right moment.

ORIENT runs as one phase with seven stages, all in the user's main
session:

1. Ask for the paper (prose).
2. /paper-extraction inline; read the substrate.
3. Interview the user (AskUserQuestion rounds, grounded in paper).
4. Clone repo + /lc-from-code scan-only (skip if no public code).
5. Optional follow-up questions if the code-index surfaced anything.
6. Draft constitution.md + CLAUDE.md (now informed by paper AND code).
7. User review → refine → single commit (constitution + CLAUDE +
   paper substrate + code substrate) → launch loop.

The key shape: knowing the code BEFORE writing the constitution lets
the constitution's Scope and sub-analysis decomposition lean on the
actual pipeline. Previously the constitution was drafted from paper-only
context, and ACQUIRE added the code substrate afterward. Cleaner this way.

Knock-on edits:
- SKILL.md: phases table goes 0 ORIENT → 7 REVIEW (eight phases, was
  nine). "Two pre-loop bookends" section collapses into "The pre-loop
  bookend: ORIENT". Workdir-as-state's INTERVIEW + ACQUIRE rows merge.
  Resuming + anti-patterns updated.
- references/orient.md: new file (content carried from interview.md
  via git mv in the prior commit, fully rewritten here with the seven
  stages, Stage 4's code-substrate work folded in from the deleted
  acquire.md, Cail's "Author / worked closely with authors" replacing
  the prior "Coauthor / ..." option in familiarity).
- references/architect.md: survey signal reworded for ORIENT.
- references/review.md: "first being INTERVIEW" → "first being ORIENT".
- templates/constitution.md: "INTERVIEW" / "ACQUIRE" → "ORIENT".

Friction #3 closes cleanly with this; #2 (prose paper-identifier) and
#6 (reframed external-context question) were already folded into the
prior commit and carry forward into the new orient.md.

Co-Authored-By: Claude Sonnet 4.7 (1M context) <noreply@anthropic.com>
---
 .../lightcone/skills/lc-from-paper/SKILL.md   |  88 +++++------
 .../lc-from-paper/references/architect.md     |   4 +-
 .../skills/lc-from-paper/references/orient.md | 138 ++++++++++++------
 .../skills/lc-from-paper/references/review.md |   2 +-
 .../lc-from-paper/templates/constitution.md   |   6 +-
 5 files changed, 139 insertions(+), 99 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 28b38622..24718535 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -5,22 +5,23 @@ description: >
   scientific paper in ASTRA — has a DOI, arXiv ID, or PDF — or asks to
   "reproduce <paper>", "set up reproduction", or "import a paper". Also
   use when continuing or resuming an existing reproduction workdir. The
-  skill instructs Claude to run INTERVIEW + ACQUIRE in the user's main
-  session, then hand the reproduction off to a ralph loop whose
-  iterations carry the remaining phases (ARCHITECT → SPECIFY → LITERATURE
-  → IMPLEMENT → RUN → COMPARE) until the constitution closes, at which
-  point REVIEW close-out runs back in the user's main session.
+  skill instructs Claude to run ORIENT in the user's main session
+  (paper-extraction + interview + code scan, all grounded), then hand
+  the reproduction off to a ralph loop whose iterations carry the
+  remaining phases (ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN
+  → COMPARE) until the constitution closes, at which point REVIEW
+  close-out runs back in the user's main session.
 ---
 
 # lc-from-paper
 
-You are helping the user reproduce a published scientific paper as a complete ASTRA project. This is a long, complex task that won't fit in a single context window — it spans discrete phases: acquire the paper and its code, architect the spec, specify decisions and findings, resolve cited literature, implement, run, compare, review.
+You are helping the user reproduce a published scientific paper as a complete ASTRA project. This is a long, complex task that won't fit in a single context window — it spans discrete phases: orient (figure out what the user wants, acquire paper + code), architect the spec, specify decisions and findings, resolve cited literature, implement, run, compare, review.
 
 The architecture is two-piece:
 
-1. **Interactive bookends in the user's main session.** INTERVIEW and REVIEW are conversations with the user; INTERVIEW also runs `/paper-extraction` inline between its two beats so its second beat can ground every remaining question in the actual paper. ACQUIRE is thin — one `/lc-from-code` scan-only invocation against the cloned reference code (or `found: false` when the paper has no public repo).
+1. **Interactive bookends in the user's main session.** ORIENT and REVIEW are conversations with the user. ORIENT runs in stages: ask for the paper, run `/paper-extraction` inline, interview the user (grounded in the paper), clone the code and run `/lc-from-code` scan-only (if a repo exists), possibly ask follow-up questions, then draft `constitution.md` + `CLAUDE.md` from the full paper-plus-code context for user review.
 
-2. **A ralph loop for the long middle.** Once the per-paper `constitution.md` is drafted (INTERVIEW) and the substrate is on disk (ACQUIRE), you launch a ralph loop against the constitution. Each iteration starts a fresh session with the constitution loaded into its system prompt, surveys the workdir, picks the next valuable move (typically one phase's worth of work), does it, commits, and exits. Iteration N+1 reads N's work cold, so per-phase review collapses into "the next iteration is the review."
+2. **A ralph loop for the long middle.** Once ORIENT lands — `constitution.md` + `CLAUDE.md` drafted, paper and code substrate on disk — you launch a ralph loop against the constitution. Each iteration starts a fresh session with the constitution loaded into its system prompt, surveys the workdir, picks the next valuable move (typically one phase's worth of work), does it, commits, and exits. Iteration N+1 reads N's work cold, so per-phase review collapses into "the next iteration is the review."
 
 The whole thing is driven by **the per-paper `constitution.md`** at the reproduction workdir root, plus the auto-loading `CLAUDE.md` walk-up. The split is intentional: the constitution is *task-bound* (what this reproduction is trying to achieve — Goal, fidelity intent, scope, quality bar, Open dimensions) and can be archived once the reproduction lands. CLAUDE.md is *durable* (rules, paper-vs-code disagreements, Open opportunities, pointers to substrate) — it stays useful when the user comes back to do follow-on work in this directory. Every iteration picks up both on launch.
 
@@ -30,57 +31,50 @@ The reproduction's directory should be a git repo — if not already, `git init`
 
 ## The phases
 
-Nine phases (zero-indexed). INTERVIEW and ACQUIRE run before the loop, in the user's main session; the loop's iterations carry phases 2–7; REVIEW runs after the loop closes, back in the user's main session.
+Eight phases (zero-indexed). ORIENT runs before the loop, in the user's main session; the loop's iterations carry phases 1–6; REVIEW runs after the loop closes, back in the user's main session.
 
 | # | Phase | Where it runs | Reference | Primary outputs |
 |---|---|---|---|---|
-| 0 | INTERVIEW | user's main session | [`references/interview.md`](references/interview.md) | per-paper `constitution.md` + `CLAUDE.md` + paper substrate at `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml}` (paper-extraction runs inline between INTERVIEW's two beats) |
-| 1 | ACQUIRE | user's main session | [`references/acquire.md`](references/acquire.md) | code substrate at `work/reference/{code/, code-status.yaml, code-index.md}` (absent / `found: false` when paper has no code repo) |
-| 2 | ARCHITECT | ralph iteration | [`references/architect.md`](references/architect.md) | stub `astra.yaml` at project root (sub-analyses, inputs, outputs, narrative) |
-| 3 | SPECIFY | ralph iteration | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
-| 4 | LITERATURE | ralph iteration | [`references/literature.md`](references/literature.md) | `astra.yaml`'s `prior_insights:` Evidence entries each carry resolved `quote:` + `location:` selectors; per-paper PDFs cached via `astra paper add` |
-| 5 | IMPLEMENT | ralph iteration | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
-| 6 | RUN | ralph iteration | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
-| 7 | COMPARE | ralph iteration | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
-| 8 | REVIEW | user's main session | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
+| 0 | ORIENT | user's main session | [`references/orient.md`](references/orient.md) | per-paper `constitution.md` + `CLAUDE.md` + paper substrate at `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml}` (from inline `/paper-extraction`) + code substrate at `work/reference/{code/, code-status.yaml, code-index.md}` (from inline `/lc-from-code` scan-only, when a repo exists) |
+| 1 | ARCHITECT | ralph iteration | [`references/architect.md`](references/architect.md) | stub `astra.yaml` at project root (sub-analyses, inputs, outputs, narrative) |
+| 2 | SPECIFY | ralph iteration | [`references/specify.md`](references/specify.md) | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
+| 3 | LITERATURE | ralph iteration | [`references/literature.md`](references/literature.md) | `astra.yaml`'s `prior_insights:` Evidence entries each carry resolved `quote:` + `location:` selectors; per-paper PDFs cached via `astra paper add` |
+| 4 | IMPLEMENT | ralph iteration | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
+| 5 | RUN | ralph iteration | [`references/run.md`](references/run.md) | `results/<universe>/<output>/` |
+| 6 | COMPARE | ralph iteration | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
+| 7 | REVIEW | user's main session | [`references/review.md`](references/review.md) | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
 
 COMPARE produces a verdict plus an opportunity assessment — not just pass / fail, but where the gaps are, how much they likely matter, and how they sit relative to the constitution's fidelity intent. A subsequent iteration decides whether to spend another IMPLEMENT round (close a gap that sits below intent) or land the reproduction at its current trajectory and log the gap into CLAUDE.md's Open opportunities. Once the COMPARE → IMPLEMENT loop terminates (verdict `pass`, or `partial` with the un-acted opportunities logged), a subsequent cold-survey iteration finds nothing left to do and flips the constitution's `status:` to `closed`. The loop terminates; REVIEW runs in the user's main session.
 
-## The two pre-loop bookends
+## The pre-loop bookend: ORIENT (Phase 0)
 
-### INTERVIEW (Phase 0)
+The opening interactive phase. Run it from the user's main session. Read [`references/orient.md`](references/orient.md) in full before starting.
 
-The opening interactive phase. Run it from the user's main session. Read [`references/interview.md`](references/interview.md) in full before starting.
+ORIENT runs as one phase in **seven stages**:
 
-INTERVIEW runs in **two beats** with `/paper-extraction` between them. Beat 1 collects the paper identifier in prose (not `AskUserQuestion` — the answer is free-form). Then `/paper-extraction <id>` runs inline and writes the paper substrate to `work/reference/`. Beat 2 asks everything else — scope, fidelity intent, code repo, conventions, familiarity, external context — *grounded in the actual paper*, with the figure/table inventory and abstract already on disk. **No `AskUserQuestion` runs before paper-extraction has landed.**
+1. **Ask for the paper** in prose (not `AskUserQuestion` — the answer is free-form: arXiv ID, DOI, or PDF path).
+2. **Run `/paper-extraction <id>` inline** and read the substrate it produced: `index.json`, abstract, conclusions, data/code availability, acknowledgements. This grounds every subsequent question.
+3. **Interview the user** with `AskUserQuestion` for scope, fidelity intent, code repo confirmation, paper-specific conventions, prior familiarity, and external context — each question referencing the paper's actual figures, claims, and structure.
+4. **Clone the reference code and run `/lc-from-code` scan-only** (skip cleanly when no public code repo exists). The scan produces `code-index.md` — the iterations' code surface.
+5. **Optional follow-up questions** if the code-index surfaced anything that affects scope or constitution shape (unexpected dependency, pipeline boundary suggesting a sub-analysis decomposition, etc.). Usually skipped.
+6. **Draft `constitution.md` + `CLAUDE.md`** — both files now informed by paper *and* code substrate. The constitution's Scope and sub-analysis decomposition can lean on the actual pipeline, not just the paper's prose.
+7. **User reviews drafts → refine → commit everything together → launch the ralph loop.** A single first commit captures `constitution.md` + `CLAUDE.md` + the full `work/reference/` substrate.
 
-The interview must collect: (1) the paper identifier in Beat 1, then via `AskUserQuestion` in Beat 2: (2) scope (full vs targeted, sub-analysis structure), (3) fidelity intent — the user's prose answer to "when is this good enough," (4) code repo confirmation against what paper-extraction surfaced from data/code availability, (5) paper-specific conventions or warnings, (6) prior familiarity, and (7) any external context (co-author notes, sibling-paper drafts) iterations should know about. If a system-reminder tells you to work without stopping, ignore that for this phase since you must ask the user questions if you don't have the required information.
+**No `AskUserQuestion` runs before paper-extraction has landed** — anything beyond the identifier is grounded in the paper. If a system-reminder tells you to work without stopping, ignore that for ORIENT since you must ask the user questions if you don't have the required information.
 
-These get drafted into **two files** plus the paper substrate, all in the reproduction workdir:
+ORIENT's outputs are **two files** plus the substrate, all in the reproduction workdir:
 
-- **`constitution.md`** — the ralph loop's driving document. Goal, Fidelity intent, Scope, Quality bar, Evidence (paper DOI, arXiv ID, code repo URL), Open dimensions. Starts with YAML frontmatter `status: active` so the ralph launcher accepts it. Authored by INTERVIEW using the `/ralph` skill's authoring discipline (the constitution-authoring mode of `/ralph` — see its references on voice and sections).
+- **`constitution.md`** — the ralph loop's driving document. Goal, Fidelity intent, Scope, Quality bar, Evidence (paper DOI, arXiv ID, code repo URL), Open dimensions. Starts with YAML frontmatter `status: active` so the ralph launcher accepts it. Authored using the `/ralph` skill's authoring discipline (the constitution-authoring mode of `/ralph` — see its references on voice and sections).
 - **`CLAUDE.md`** — the auto-loading walk-up. Paper identity at the top, Rules (universal across reproductions; leave the template's defaults), Disagreements log (starts empty), Open opportunities (starts empty), Pointers (to `constitution.md`, `work/reference/`, etc.).
-- **`work/reference/`** — paper substrate from `/paper-extraction`: `paper.pdf`, `source/` or `document.md`, `index.json`, `astra.yaml`, `figures/`, `tables/`, `bibliography-source.{bib,bbl}`.
+- **`work/reference/`** — paper substrate from `/paper-extraction` + code substrate from `/lc-from-code` scan-only (when a code repo exists).
 
-Templates ship in [`templates/constitution.md`](templates/constitution.md) and [`templates/CLAUDE.md`](templates/CLAUDE.md). Show the user both drafts, take corrections, refine, save.
+Templates ship in [`templates/constitution.md`](templates/constitution.md) and [`templates/CLAUDE.md`](templates/CLAUDE.md). Show the user both drafts at Stage 7, take corrections, refine, save.
 
-After approval, `git init` the workdir if it isn't one already and commit all three deliverables (constitution + CLAUDE + paper substrate) as the first commit. Then run ACQUIRE in the same session.
-
-### ACQUIRE (Phase 1)
-
-Thin code-substrate phase. One sub-skill invocation:
-
-- **`/lc-from-code` in scan-only mode** against the cloned reference repo at `work/reference/code/` (after `git clone --depth 1 <url> work/reference/code`). Produces `work/reference/code-status.yaml` + `work/reference/code-index.md`.
-
-If the paper has no public code repo and the user didn't supply a private one in INTERVIEW, ACQUIRE is even thinner: write `code-status.yaml` with `found: false` and proceed to launch. The code-as-canonical rule self-disables in that case.
-
-See [`references/acquire.md`](references/acquire.md) for the full step-by-step. The paper substrate is INTERVIEW's deliverable, not ACQUIRE's — INTERVIEW reads the paper to ground its second-beat questions, so the substrate is already on disk when ACQUIRE starts.
-
-When ACQUIRE returns, commit the code substrate (`code-status.yaml` + `code-index.md`; the `code/` clone itself can be `.gitignore`d for large monorepos) and launch the ralph loop (see **Launching the loop** below).
+After approval, `git init` the workdir if it isn't already a git repo and commit all deliverables (constitution + CLAUDE + paper substrate + code substrate when present) as the first commit. The `work/reference/code/` clone itself can be `.gitignore`d for large monorepos; the inventory file `code-index.md` is what downstream iterations actually consult. Then launch the ralph loop.
 
 ## Launching the loop
 
-After INTERVIEW + ACQUIRE land, hand the rest of the reproduction off to a ralph loop. From the reproduction workdir:
+After ORIENT lands, hand the rest of the reproduction off to a ralph loop. From the reproduction workdir:
 
 ```bash
 .claude/skills/ralph/scripts/ralph constitution.md
@@ -113,8 +107,7 @@ Each iteration's survey reads the workdir to determine what phase is next. File
 
 | Signal | Phase done |
 |---|---|
-| `constitution.md` + `CLAUDE.md` at workdir root, both committed, **and** `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml}` present (paper substrate is INTERVIEW's deliverable) | INTERVIEW |
-| `work/reference/code/` present **or** `code-status.yaml` records `found: false`, **and** `code-index.md` present (or absent when `found: false`) | ACQUIRE |
+| `constitution.md` + `CLAUDE.md` at workdir root, both committed, **and** `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml}` present, **and** (`work/reference/code/` present **or** `code-status.yaml` records `found: false`) | ORIENT |
 | `astra.yaml` at project root validates with empty `decisions:` / `prior_insights:` / `findings:` blocks | ARCHITECT (stub) |
 | `astra.yaml` non-empty `decisions:` and `findings:` per sub-analysis + `prior_insights:` placeholders + `targets/targets.md` + `implementation-notes.md` | SPECIFY |
 | `astra.yaml`'s `prior_insights:` Evidence entries each carry resolved `quote:` + `location:` selectors; `work/cited/<doi-slug>/` populated per cited paper | LITERATURE |
@@ -139,7 +132,7 @@ REVIEW runs in your main session because `/figure-comparison` and `/check-senten
 
 **Code-as-canonical, with disagreements recorded.** When the original codebase is at `work/reference/code/`, every iteration that touches a sub-analysis reads relevant code on entry. Where paper and code disagree on something material (a different choice would plausibly change a numeric result the paper reports), **code is canonical** for numerics, plotting, and method — but the disagreement is recorded: as a decision option in `astra.yaml` with both alternatives preserved, and as an entry in CLAUDE.md's *Paper-vs-code disagreements* section so it's visible to every iteration and to the user at REVIEW. Stylistic / cosmetic / pure-tooling differences aren't material — note them in `implementation-notes.md` and move on. Without this discipline, iterations drift to "looks right" rather than "matches" and material disagreements get silently absorbed.
 
-**Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at INTERVIEW as prose. Intent is partly aesthetic ("how good does this need to be?") and partly pragmatic ("what's feasible given the compute, tokens, and wall-clock available?"). Both dimensions belong in the prose — *"just checking the analysis is tractable — an afternoon"*, *"Figure 3 must be right; the rest can stay rough — overnight"*, *"every primary and secondary target lining up within stated tolerance, a few days"*.
+**Rigor is a trajectory toward the user's intent.** A reproduction isn't one-shot — it reaches a baseline, then accumulates. The anchor is the user's **fidelity intent**, captured in `constitution.md`'s Goal section at ORIENT as prose. Intent is partly aesthetic ("how good does this need to be?") and partly pragmatic ("what's feasible given the compute, tokens, and wall-clock available?"). Both dimensions belong in the prose — *"just checking the analysis is tractable — an afternoon"*, *"Figure 3 must be right; the rest can stay rough — overnight"*, *"every primary and secondary target lining up within stated tolerance, a few days"*.
 
 There's no explicit review state machine. Each iteration reads the prior phase's artifact critically as part of survey, fixes what needs fixing or advances if nothing does, commits, exits. The fresh-context property at iteration boundaries makes the next iteration the review. Gaps that the intent wants pushed further than the loop has time to deliver become Open opportunities in CLAUDE.md; a future loop relaunch closes them. (Work fan-out for the artifact-producing phases is separate; see "Parallel fan-out lives inside an iteration" above.)
 
@@ -155,16 +148,15 @@ There's no explicit review state machine. Each iteration reads the prior phase's
 
 When the user walks back into a workdir that already has artifacts:
 
-1. **Skip INTERVIEW** unless the user explicitly wants to revise scope (in which case edit `constitution.md` together, no re-draft from scratch).
+1. **Skip ORIENT** unless the user explicitly wants to revise scope (in which case edit `constitution.md` together, no re-draft from scratch).
 2. **If `constitution.md`'s `status:` is `active` and the tmux session isn't running**, re-launch the ralph loop: `.claude/skills/ralph/scripts/ralph constitution.md`. The next iteration surveys the workdir and picks up wherever the prior loop left off.
 3. **If `constitution.md`'s `status:` is `closed`**, the reproduction is at REVIEW. Run REVIEW close-out in your main session.
-4. **If the paper substrate is incomplete** (INTERVIEW didn't finish cleanly — paper-extraction errored or partial), re-invoke `/paper-extraction` in your main session against the existing partial state (idempotent; skips done work). Confirm the constitution + CLAUDE.md are consistent before continuing.
-5. **If ACQUIRE substrate is incomplete**, finish ACQUIRE in your main session before launching the loop — re-invoke `/lc-from-code` scan-only against the existing partial state.
+4. **If ORIENT substrate is incomplete** — paper-extraction errored mid-flight, or the code clone / scan didn't land — finish the missing stages in your main session before launching the loop. Both `/paper-extraction` and `/lc-from-code` are survey-first and skip done work; re-invoking against partial state is safe.
 
 ## Anti-patterns
 
 - **Spawning a "loop manager" sub-agent inside your main session.** The whole point of the ralph loop is fresh per-iteration context; you launch the loop, the loop runs detached, you come back when it's done. No nested orchestrator.
-- **Doing the long middle in your main session instead of launching the loop.** INTERVIEW and ACQUIRE belong in your session; ARCHITECT through COMPARE belong in the loop. Doing phase work in your main session burns context that doesn't get reset; the loop exists precisely to give each phase fresh context.
+- **Doing the long middle in your main session instead of launching the loop.** ORIENT belongs in your session; ARCHITECT through COMPARE belong in the loop. Doing phase work in your main session burns context that doesn't get reset; the loop exists precisely to give each phase fresh context.
 - **Asking an iteration to use `AskUserQuestion`.** Iterations run detached. Surface questions to `open-questions.md` with a default applied; the user resolves at REVIEW.
 - **Re-implementing what `astra` already does.** If `astra validate` returns clean, don't write a separate validator. If `astra paper add` caches the PDF, don't write a separate cache.
 - **Bundling phases into one iteration.** Each iteration does one phase's worth of work. Conflating phases re-creates the failure mode the loop exists to avoid: no fresh-context review between phases.
diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 2c981c4a..84bf6d0c 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -2,7 +2,7 @@
 
 ARCHITECT is the structural seam: decide the sub-analysis decomposition, wire the inputs and outputs at the sub-analysis level, and author high-level narrative prose for each analysis — all in one stub `astra.yaml`. SPECIFY then fills the stub with `decisions:`, `prior_insights:`, `findings:`, and `astra-anchor:` references. Splitting **structure** from **content** keeps each iteration's cognitive load manageable: ARCHITECT decides *what the analyses are*; SPECIFY decides *what's inside each one*.
 
-ARCHITECT is what a ralph iteration does when the workdir signals "paper substrate (from INTERVIEW) + code substrate (from ACQUIRE) both present + project-root `astra.yaml` absent (or empty stub)." The heavy work of *understanding* the paper and code happened in `/paper-extraction` (which INTERVIEW invokes inline) and `/lc-from-code`'s scan-only branch (which ACQUIRE invokes); their on-disk substrate (the structural `index.json`, the paper-extraction `astra.yaml`, the `code-index.md`) is what you read on entry. No persistent expert sub-agents; targeted reads against the substrate carry the orientation.
+ARCHITECT is what a ralph iteration does when the workdir signals "ORIENT substrate present + project-root `astra.yaml` absent (or empty stub)." The heavy work of *understanding* the paper and code happened in `/paper-extraction` and `/lc-from-code`'s scan-only branch — both invoked inline during ORIENT in the user's main session. Their on-disk substrate (the structural `index.json`, the paper-extraction `astra.yaml`, the `code-index.md`) is what you read on entry. No persistent expert sub-agents; targeted reads against the substrate carry the orientation.
 
 ## Inputs
 
@@ -99,7 +99,7 @@ Don't flag empty `decisions:` / `prior_insights:` / `findings:` — that's SPECI
 
 ## Survey signals (entry into ARCHITECT)
 
-- `work/reference/index.json` + `work/reference/astra.yaml` (paper substrate from INTERVIEW) + `work/reference/code-index.md` (code substrate from ACQUIRE, when code present) exist ⇒ paper + code substrate is ready
+- `work/reference/index.json` + `work/reference/astra.yaml` + `work/reference/code-index.md` (when code present) exist ⇒ ORIENT substrate is ready
 - `astra.yaml` at project root absent (or present-but-empty) ⇒ this iteration writes the stub
 - `astra.yaml` exists with stub form (sub-analyses + inputs + outputs + narrative populated; `decisions:` / `prior_insights:` / `findings:` blocks present-and-empty) ⇒ ARCHITECT's output is on disk; read it critically. Fix anything wrong; otherwise the iteration moves on to SPECIFY.
 
diff --git a/claude/lightcone/skills/lc-from-paper/references/orient.md b/claude/lightcone/skills/lc-from-paper/references/orient.md
index 620a44ca..9b4edb5d 100644
--- a/claude/lightcone/skills/lc-from-paper/references/orient.md
+++ b/claude/lightcone/skills/lc-from-paper/references/orient.md
@@ -1,28 +1,30 @@
-# INTERVIEW — Phase 0
+# ORIENT — Phase 0
 
-The opening interactive phase. Runs from the user's main session, before the ralph loop launches. Its job is to crystallize what the user actually wants — which paper, what scope, any paper-specific gotchas — and bake that into the per-paper `constitution.md` (the ralph loop's driving document) and `CLAUDE.md` (the auto-loading walk-up with rules and accumulators) the loop's iterations will walk up to.
+The opening pre-loop phase. Runs in the user's main session, before the ralph loop launches. Its job is to figure out what the user wants to reproduce, stand up the reference substrate (paper + code), and write the per-paper `constitution.md` + `CLAUDE.md` the ralph loop's iterations will walk up to.
 
-The interview runs in **two beats** with `/paper-extraction` between them. Beat 1 is short and cold — it collects just the paper identifier so the substrate can be acquired. Then `/paper-extraction` runs inline. Beat 2 asks everything else — scope, fidelity, conventions, code repo, familiarity, external context — *grounded in the actual paper*, with the figure/table inventory, abstract, conclusions, and data-availability section already on disk.
+One phase, executed in stages so each later decision is grounded in what was acquired earlier. The paper is read before the interview questions land (so questions reference actual figures and claims); the code is scanned before the constitution is drafted (so the constitution's Scope and sub-analysis decomposition lean on the actual pipeline). The user reviews the drafts before anything commits.
 
-The two-beat shape exists because most interview questions are inherently hard to answer well in the abstract. Asked cold, "which figures matter?" forces the user to recall the paper from memory; asked after extraction, it's a menu of the paper's actual figures. The same applies to fidelity intent, code repo URL, and paper-specific conventions — the paper knows; the user shouldn't have to reach for it.
+ORIENT is the only pre-loop bookend. REVIEW is the post-loop one. Everything else lives inside the ralph loop.
 
 ---
 
-## What the interview produces
+## What ORIENT produces
 
 Three things in the reproduction workdir, all committed together at the end:
 
 - **`constitution.md`** — drafted from [`../templates/constitution.md`](../templates/constitution.md). YAML frontmatter `status: active`, then Goal, Fidelity intent, Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL, where the substrate lives), Open dimensions. The ralph loop's driving document; each iteration reads it on entry. The body sharpens slowly; Open dimensions is updated each iteration as decisions worth user ratification surface. Task-bound — archivable once the reproduction closes.
 - **`CLAUDE.md`** — drafted from [`../templates/CLAUDE.md`](../templates/CLAUDE.md). Paper identity at the top (DOI, title, one-line subject), Rules (universal across reproductions; leave the template's defaults), Disagreements log (starts empty; iterations append), Open opportunities (starts empty; iterations append), Pointers (to `constitution.md`, `work/reference/`, etc.). The auto-loading walk-up; every Claude Code session in the workdir picks it up. Durable — stays useful for any follow-on work in this directory once the reproduction lands.
-- **`work/reference/` paper substrate** — produced by `/paper-extraction` between beats: `paper.pdf`, `source/` (Path A) or `document.md` (Path B), `index.json`, `astra.yaml`, `figures/`, `tables/`, `bibliography-source.{bib,bbl}`.
+- **`work/reference/` substrate** — paper substrate from `/paper-extraction` (`paper.pdf`, `source/` or `document.md`, `index.json`, `astra.yaml`, `figures/`, `tables/`, `bibliography-source.{bib,bbl}`) + code substrate from `/lc-from-code` scan-only (`code/`, `code-status.yaml`, `code-index.md`) when there's a reference code repo.
 
-There is no separate "constitution skill" invocation — `/ralph`'s Authoring mode (Study → Draft → Refine → Launch) is what you're following here; the constitution authoring discipline + reference materials live there. Pull the discipline mentally; the deliverable is these two markdown files (plus the substrate produced by `/paper-extraction`).
+There is no separate "constitution skill" invocation — `/ralph`'s Authoring mode (Study → Draft → Refine → Launch) is what you're following here; the constitution authoring discipline + reference materials live there. Pull the discipline mentally; the deliverable is these two markdown files (plus the substrate produced by the inline skill invocations).
 
-After the user approves both drafts, save them, `git init` the workdir if it isn't one already, commit `constitution.md` + `CLAUDE.md` + the paper substrate as the first commit, then proceed to ACQUIRE in the same session.
+After the user approves both drafts, save them, `git init` the workdir if it isn't one already, commit `constitution.md` + `CLAUDE.md` + the full `work/reference/` substrate as the first commit, then launch the ralph loop (per SKILL.md's *Launching the loop* section).
 
 ---
 
-## Beat 1 — Cold: identify the paper
+## The stages
+
+### Stage 1 — Ask for the paper
 
 Ask the user for the paper identifier in **prose** — not `AskUserQuestion`. The answer is inherently free-form (an arXiv ID, a DOI, or a path to a PDF on disk), and a multiple-choice modal is the wrong shape for it.
 
@@ -30,13 +32,9 @@ Wording is up to you, but cover the three forms cleanly. Something like:
 
 > *"What paper would you like to reproduce? An arXiv ID, a DOI, or a path to a PDF on disk all work — arXiv ID gives the cleanest acquisition because the LaTeX source comes through."*
 
-If the user supplied the identifier on the `/lc-from-paper` invocation, skip the ask. If not, ask once and continue when you have it. Don't batch other questions into this beat — everything else lives in Beat 2, after the paper is on disk.
-
-**No other `AskUserQuestion` rounds in Beat 1.** Anything beyond the identifier is either inferable from the paper (next step) or belongs in Beat 2.
-
----
+If the user supplied the identifier on the `/lc-from-paper` invocation, skip the ask. **No `AskUserQuestion` runs before paper-extraction has landed** — anything beyond the identifier is either inferable from the paper or belongs in a later stage where you can ground the question.
 
-## Between beats — Run `/paper-extraction` inline
+### Stage 2 — Run `/paper-extraction` inline; read the substrate
 
 With the paper identifier in hand, invoke the paper-extraction skill directly:
 
@@ -44,24 +42,22 @@ With the paper identifier in hand, invoke the paper-extraction skill directly:
 /paper-extraction <doi-or-arxiv-id-or-pdf-path>
 ```
 
-This produces the paper substrate under `work/reference/` (see [`acquire.md`](acquire.md) — ACQUIRE no longer touches the paper side). When it returns, the substrate is on disk. **Read it before continuing to Beat 2** so the next questions are grounded:
+This produces the paper substrate under `work/reference/`. When it returns, the substrate is on disk. **Read it before continuing to Stage 3** so the next questions are grounded:
 
 - **`work/reference/index.json`** — title, abstract, figure/table inventory with captions, section outline, citations with resolved DOIs. The structural surface.
 - **The abstract and the conclusions section of the paper** — give you the claimed headline results, with actual numbers.
 - **The "Data availability" / "Code availability" sections of the paper** — usually the canonical place for repo URLs and dataset locations. If neither section exists, grep across `work/reference/source/*.tex` (Path A) or `work/reference/document.md` (Path B) for `github.com`, `gitlab`, `zenodo`, `softwarex`, `\url{}` patterns.
 - **The acknowledgements section** — sometimes carries software repos, dataset attributions, cluster acknowledgements that hint at the execution environment.
 
-You do *not* need to read the paper end-to-end. The goal is to ground Beat 2's questions — abstract for claims, conclusions for what the paper says it found, data/code availability for substrate hints. Iterations will read the rest as they need it.
+You do *not* need to read the paper end-to-end. The goal is to ground Stage 3's questions — abstract for claims, conclusions for what the paper says it found, data/code availability for substrate hints. Iterations will read the rest as they need it.
 
-If `/paper-extraction` fails or returns partial substrate (network issue, ambiguous arXiv ID, etc.), surface the failure to the user before continuing — Beat 2 can't ground itself against missing substrate.
+If `/paper-extraction` fails or returns partial substrate (network issue, ambiguous arXiv ID, etc.), surface the failure to the user before continuing.
 
----
-
-## Beat 2 — Grounded: everything else
+### Stage 3 — Interview the user, grounded in the paper
 
 Now `AskUserQuestion` is the right tool — each remaining question is a constrained choice with structured options, and the user has paper context loaded from your summary or from the substrate they can browse. Ask in whatever order reads naturally; batching related questions in a single `AskUserQuestion` call (up to 4) is fine.
 
-### Scope
+#### Scope
 
 Present the paper's actual primary outputs as a menu:
 
@@ -77,7 +73,7 @@ If the paper has sub-analyses with genuinely independent stages (e.g. reconstruc
 
 These answers go into `constitution.md`'s **Scope** section (in / out) and inform ARCHITECT's structural decomposition.
 
-### Fidelity intent
+#### Fidelity intent
 
 A reproduction can land anywhere from a quick "does this even run" sanity check to a full match across every primary and secondary target. The user owns where they want this one to land — but where it *can* land in this stretch depends on the compute, tokens, time, and attention available. The honest meta-conversation is the point: what does the user want out of this first stretch, given what's spendable on it?
 
@@ -96,7 +92,7 @@ Record the answer verbatim or in close paraphrase under **Fidelity intent** in `
 
 If the user genuinely doesn't know yet, write that — *"Not sure yet; let's get something running and revisit"* is itself useful intent, and they can sharpen it at any future REVIEW.
 
-### Code repository
+#### Code repository
 
 Use what `/paper-extraction` surfaced. If there's a single candidate URL from the data/code availability or acknowledgements section, lead with that confirmation:
 
@@ -106,11 +102,9 @@ If paper-extraction found nothing, ask plainly:
 
 > *"I didn't find a code repo URL in the paper. Is there a private / unpublished repo we should clone? Or proceed paper-only?"*
 
-When the user provides a URL, capture it into `constitution.md`'s **Evidence** section. When the paper has no code repo and the user doesn't supply one, capture *"no public code; paper prose is the only methodological anchor"* into the Evidence section so iterations don't waste effort searching.
-
-When the code is available, every iteration that touches a sub-analysis reads from `work/reference/code/` and treats code as canonical for numerics + method. This is recorded in `CLAUDE.md`'s Rules.
+When the user provides a URL, capture it. When the paper has no code repo and the user doesn't supply one, note *"no public code; paper prose is the only methodological anchor"*; after the remaining Stage 3 questions, Stage 4 only records `found: false` in `code-status.yaml` and Stage 5 is skipped (there is no code substrate to acquire). When the code is available, every iteration that touches a sub-analysis reads from `work/reference/code/` and treats code as canonical for numerics + method; this is recorded in `CLAUDE.md`'s Rules.
 
-### Paper-specific conventions or warnings
+#### Paper-specific conventions or warnings
 
 Now Claude has read the paper enough to *propose* one-line conventions / warnings rather than asking the user to volunteer cold. Surface candidates from your post-extraction read:
 
@@ -118,20 +112,20 @@ Now Claude has read the paper enough to *propose* one-line conventions / warning
 
 Let the user toggle the ones to keep, edit them, add more, or skip cleanly if none apply. The selected items land in `CLAUDE.md`'s **Pointers** section as one-line notes — context every iteration sees on entry.
 
-### Prior familiarity
+#### Prior familiarity
 
-A single question, post-extraction:
+A single question:
 
 > *"How familiar are you with this paper?"*
 >
 > - Haven't read it / barely skimmed
 > - Skimmed it / general sense of the claims
 > - Read carefully / know the methodology
-> - Coauthor / worked closely with the authors
+> - Author / worked closely with the authors
 
-This affects how confidently iterations should defer to the user when adjudicating paper-vs-code disagreements, and how heavy first-iteration review should lean. It does **not** gate paper acquisition — that's why it's in Beat 2, not Beat 1.
+This affects how confidently iterations should defer to the user when adjudicating paper-vs-code disagreements, and how heavily first-iteration review should lean on the user.
 
-### External context
+#### External context
 
 The real probe is: *"is there context outside the paper substrate + codebase that should inform the spec?"* — co-author feedback, sibling-paper drafts (common for papers in a series), internal blinding documentation, decision-history docs, referee responses, a relevant talk or slide deck. The artifact form varies; what matters is whether such context exists and whether you should point ARCHITECT at it.
 
@@ -139,35 +133,77 @@ Ask in those terms:
 
 > *"Beyond the paper and any code repo, is there context an iteration should know about — co-author / referee feedback, internal notes, a sibling paper still in prep, decisions documented elsewhere? If yes, point at the path(s). Otherwise the paper substrate + code are the source of truth."*
 
-Capture paths into `CLAUDE.md`'s **Pointers** section. Don't proactively read them in INTERVIEW — that's ARCHITECT's job when it scopes the sub-analyses.
+Capture paths into `CLAUDE.md`'s **Pointers** section. Don't proactively read them in ORIENT — that's ARCHITECT's job when it scopes the sub-analyses.
 
----
+### Stage 4 — Clone the code (if any) and run `/lc-from-code` scan-only
+
+When Stage 3's code-repo answer was "no public code," skip the clone and the scan and write only the `found: false` status file described at the end of this stage. Otherwise:
+
+1. **Clone the repo:**
+   ```bash
+   git clone --depth 1 <url> work/reference/code
+   ```
+   For multi-project monorepos where the user pointed at specific subpaths (e.g. GitHub `tree/<branch>/<path>` URLs), clone the whole repo on the named branch by adding `--branch <branch>` to the command above (don't sparse-checkout), and capture the primary subpaths in `code-status.yaml` so `/lc-from-code` knows where to focus.
+
+2. **Write `work/reference/code-status.yaml`:**
+   ```yaml
+   found: true        # or false
+   url: "https://..."  # null if not found
+   branch: "main"     # or whichever branch was cloned; null if not found
+   cloned: true       # false if found but clone failed
+   primary_subpaths:  # optional; for multi-project monorepos
+     - "notebooks/..."
+   notes: "..."
+   ```
+
+3. **Invoke `/lc-from-code` in scan-only mode:**
+   ```
+   /lc-from-code scan-only against work/reference/code/. From inside /lc-from-paper's ORIENT phase. Produce work/reference/code-index.md only — do not touch the project-root astra.yaml, do not parameterize any code, do not run anything, do not modify the cloned repo. Primary subpaths (per code-status.yaml): <list>.
+   ```
+
+   The scan-only branch of `/lc-from-code` does the inventory pass and writes to `work/reference/code-index.md`. The invocation prompt above is what carries the "stop at scan" contract.
 
-## Drafting the two files
+When no public code repo exists, write `code-status.yaml` with `found: false` and skip `/lc-from-code` entirely. The code-as-canonical rule self-disables in that case.
+
+### Stage 5 — Follow-up questions if the code surfaced anything new
+
+If the code-index reveals something the user should weigh in on — an unexpected dependency, a clear pipeline boundary that suggests a sub-analysis decomposition different from the paper's, an unusual container requirement, an explicit data-availability gate not visible in the paper — ask before drafting the constitution.
+
+Usually this is light or skipped entirely. The code-index is the iterations' surface, not the user's; most of what it reveals doesn't need user adjudication at ORIENT. But when something genuinely affects scope or constitution shape, surface it now rather than waiting for an iteration to file an open question.
+
+### Stage 6 — Draft `constitution.md` + `CLAUDE.md`
 
 Open both templates side-by-side:
 
-- [`../templates/constitution.md`](../templates/constitution.md) — fill in the header, Goal (with fidelity intent), Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL — these are the user-supplied identifiers; the substrate-path bullets in the template stay as boilerplate, naming where each substrate lives on disk), Open dimensions. Leave the YAML frontmatter `status: active` intact.
-- [`../templates/CLAUDE.md`](../templates/CLAUDE.md) — fill in the header (paper title + arXiv ID + DOI + one-line subject), any paper-specific Pointers. Leave Rules in the template state (universal across reproductions). Leave the Disagreements log and Open opportunities sections empty — iterations populate them.
+- [`../templates/constitution.md`](../templates/constitution.md) — fill in the header, Goal (with fidelity intent), Scope (in / out), Quality bar, Evidence (paper DOI, arXiv ID, code repo URL — these are the user-supplied identifiers; the substrate-path bullets in the template stay as boilerplate, naming where each substrate lives on disk), Open dimensions. Leave the YAML frontmatter `status: active` intact. Both paper and code substrate are on disk by now — the constitution can lean on the actual pipeline decomposition, named figures/tables, and concrete file paths.
+- [`../templates/CLAUDE.md`](../templates/CLAUDE.md) — fill in the header (paper title + arXiv ID + DOI + one-line subject), any paper-specific Pointers from Stage 3. Leave Rules in the template state (universal across reproductions). Leave the Disagreements log and Open opportunities sections empty — iterations populate them.
+
+### Stage 7 — User review, refine, commit, launch
+
+Show both drafts to the user, take corrections, refine, save. When the user approves:
 
-Show both drafts to the user, take corrections, refine, save. Then `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline) and commit `constitution.md` + `CLAUDE.md` + the paper substrate (everything under `work/reference/` that paper-extraction produced) as the first commit. A single commit captures the full INTERVIEW deliverable.
+1. `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline).
+2. Commit `constitution.md` + `CLAUDE.md` + the full `work/reference/` substrate (paper + code, when code present) as the first commit. A single commit captures the full ORIENT deliverable.
+3. For large monorepos, the `work/reference/code/` clone itself can be `.gitignore`d (add the entry before making the commit in step 2); `code-index.md` is what downstream iterations actually consult. The clone is reproducible from `code-status.yaml`'s URL.
+4. Launch the ralph loop per SKILL.md's *Launching the loop* section.
 
-After the user approves and the workdir is initialized, run ACQUIRE in your same main session (see [`acquire.md`](acquire.md)). ACQUIRE is now thin — just the code substrate side. When ACQUIRE completes, commit the code substrate and launch the ralph loop (per SKILL.md's *Launching the loop* section). Tell the user the tmux session name and the attach command, and that you'll be ready for REVIEW close-out when the loop terminates.
+Tell the user the tmux session name and the attach command, and that you'll be ready for REVIEW close-out when the loop terminates.
 
 ---
 
 ## Discipline
 
-- **No `AskUserQuestion` before paper-extraction has run.** Beat 1 collects the identifier in prose; everything else waits until Beat 2, after the paper is on disk and you can ground the questions in actual content.
+- **No `AskUserQuestion` before paper-extraction has run.** Stage 1 collects the identifier in prose; everything else waits until Stage 3, after the paper is on disk and you can ground the questions in actual content.
 - **The paper-identifier question is prose.** It's the one question that doesn't fit `AskUserQuestion`'s multiple-choice shape; the free-form answer (arXiv ID / DOI / PDF path) belongs in a prose ask.
-- **Three to six `AskUserQuestion` rounds total in Beat 2** — scope, fidelity, code repo, conventions, familiarity, external context. Some can batch into a single multi-question call when they're independent.
-- **Three deliverables, one commit.** `constitution.md` + `CLAUDE.md` + `work/reference/` paper substrate are committed together at the end of INTERVIEW. No intermediate commits for "paper-extraction landed but the user hasn't approved yet."
+- **Three to six `AskUserQuestion` rounds total across Stages 3 + 5** — scope, fidelity, code repo, conventions, familiarity, external context, plus any Stage 5 follow-ups. Some can batch into a single multi-question call when they're independent.
+- **One commit at the end, with everything.** `constitution.md` + `CLAUDE.md` + paper substrate + code substrate are committed together. No intermediate commits for "paper-extraction landed but the user hasn't approved yet" or "code cloned but constitution not drafted yet."
 - **Defaults are the path.** When the user says "you choose," take the defaults — full reproduction, the paper's natural sub-analysis structure if any. The defaults reflect what the architecture has learned about which seams matter.
-- **One paper at a time.** A single `constitution.md` + `CLAUDE.md` pair covers one paper. If the user wants two, run the interview twice — two reproduction directories, two pairs.
+- **One paper at a time.** A single `constitution.md` + `CLAUDE.md` pair covers one paper. If the user wants two, run ORIENT twice — two reproduction directories, two pairs.
+- **No code repo is still a valid ORIENT outcome.** When `code-status.yaml` records `found: false`, iterations operate in paper-only mode — methodology lives in the paper's prose; no code-as-canonical adjudication is needed. CLAUDE.md's code-as-canonical Rule self-disables.
 
 ---
 
-## When the interview gets stuck
+## When ORIENT gets stuck
 
 Most failure modes resolve into "the user has not yet decided what 'reproduce' means for them." If the conversation is circling, ask one of these directly:
 
@@ -178,3 +214,15 @@ Most failure modes resolve into "the user has not yet decided what 'reproduce' m
 - *"Is there anything weird about this paper you want every iteration to know up front?"* — pins paper-specific conventions.
 
 When these answer cleanly, both files draft themselves.
+
+---
+
+## Survey signals (entry into ORIENT)
+
+If the user is walking into a workdir mid-flow, check what's already on disk before re-running stages:
+
+- `constitution.md` + `CLAUDE.md` at workdir root, committed → ORIENT already produced its files. If the loop didn't launch (or has exited), skip ahead to launching.
+- `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml}` present → paper substrate from Stage 2 exists. `/paper-extraction` is idempotent — re-invoke if anything looks partial; it skips done work.
+- `work/reference/code/` and `code-index.md` both present, **or** `code-status.yaml` records `found: false` → code substrate from Stage 4 exists (or was confirmed absent).
+
+When all three are committed, ORIENT is done. Otherwise, identify the earliest missing piece and resume from there.
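
The survey signals above can be sketched as a small on-disk check. This is a hypothetical helper, not part of the skill: the function names are invented for illustration, it tests file presence only (it does not ask git whether anything is committed), and it assumes the code signal means "clone plus `code-index.md`, or a status file recording `found: false`".

```python
from pathlib import Path


def paper_substrate_ready(ref: Path) -> bool:
    """Stage 2 signal: paper.pdf, index.json, astra.yaml, plus source/ or document.md."""
    has_text = (ref / "source").is_dir() or (ref / "document.md").is_file()
    required = ("paper.pdf", "index.json", "astra.yaml")
    return has_text and all((ref / name).is_file() for name in required)


def status_says_not_found(status_file: Path) -> bool:
    """Crude line scan for `found: false` (avoids a YAML dependency; assumption)."""
    if not status_file.is_file():
        return False
    for line in status_file.read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "found":
            # Drop any trailing `# comment` before comparing.
            return value.split("#")[0].strip() == "false"
    return False


def code_substrate_ready(ref: Path) -> bool:
    """Stage 4 signal, under the reading stated in the lead-in."""
    if status_says_not_found(ref / "code-status.yaml"):
        return True
    return (ref / "code").is_dir() and (ref / "code-index.md").is_file()


def earliest_missing(workdir: Path) -> str:
    """Name the earliest ORIENT piece that is absent from disk."""
    ref = workdir / "work" / "reference"
    if not paper_substrate_ready(ref):
        return "paper substrate (Stage 2)"
    if not code_substrate_ready(ref):
        return "code substrate (Stage 4)"
    drafts = ("constitution.md", "CLAUDE.md")
    if not all((workdir / name).is_file() for name in drafts):
        return "drafts (Stage 6)"
    return "complete"
```

Calling `earliest_missing(Path("."))` from the workdir root names the stage to resume from. A result of `"complete"` still leaves the "is it committed?" question to a manual `git status`.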
diff --git a/claude/lightcone/skills/lc-from-paper/references/review.md b/claude/lightcone/skills/lc-from-paper/references/review.md
index f71e1c7b..5a080bf7 100644
--- a/claude/lightcone/skills/lc-from-paper/references/review.md
+++ b/claude/lightcone/skills/lc-from-paper/references/review.md
@@ -1,6 +1,6 @@
 # REVIEW — close-out in the user's main session
 
-The reproduction has converged: the constitution's `status:` is `closed` (after COMPARE returned `pass`, or `partial` with the un-acted opportunities logged, and the next cold-survey iteration found nothing left to do). The ralph loop's tmux session has exited. REVIEW runs back in the user's main session — the second of two interactive bookends, the first being INTERVIEW. It runs in the user's main session (not as an iteration) because both `/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`, which isn't available inside detached ralph iterations.
+The reproduction has converged: the constitution's `status:` is `closed` (after COMPARE returned `pass`, or `partial` with the un-acted opportunities logged, and the next cold-survey iteration found nothing left to do). The ralph loop's tmux session has exited. REVIEW runs back in the user's main session — the second of two interactive bookends, the first being ORIENT. It runs in the user's main session (not as an iteration) because both `/figure-comparison` and `/check-sentence-by-sentence` use `AskUserQuestion`, which isn't available inside detached ralph iterations.
 
 Its job is to render the validation surfaces, walk the user through the accumulated open questions, land the resolutions, and draft the final report — in one interactive arc. The Open opportunities list in CLAUDE.md already carries un-acted-on opportunities from the latest COMPARE (those iterations logged them directly); REVIEW just reads them.
 
diff --git a/claude/lightcone/skills/lc-from-paper/templates/constitution.md b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
index a45cd41c..b5d1adb7 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/constitution.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
@@ -10,7 +10,7 @@ The driving document for the ralph loop reproducing <paper title> (<arXiv ID>, D
 
 <What "done" looks like for this reproduction. Concrete: which targets, what verdict against them, what validation passes. E.g.: "A complete `astra.yaml` with recipes that produce reproduced versions of <list of targets>, validated by `astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml` verdict `pass` against the targets in `targets/targets.md`.">
 
-**Fidelity intent.** <The user's prose answer from INTERVIEW to "what do you want out of this stretch, given what you have to spend on it" — captured verbatim or in close paraphrase. Carries both the aesthetic dimension (what "good enough" looks like) and the pragmatic dimension (compute, tokens, wall-clock budget). E.g.: "just checking if the analysis is tractable — an afternoon of compute", "Figure 3 must be right; the rest can stay rough — overnight", "full fidelity on the BAO fit, baseline elsewhere — a few days", "every primary and secondary target lining up within stated tolerance, no hard deadline". Each iteration reads this when sizing its next move; COMPARE grades opportunities against it. Static once approved at INTERVIEW; the user can sharpen at any REVIEW.>
+**Fidelity intent.** <The user's prose answer from ORIENT to "what do you want out of this stretch, given what you have to spend on it" — captured verbatim or in close paraphrase. Carries both the aesthetic dimension (what "good enough" looks like) and the pragmatic dimension (compute, tokens, wall-clock budget). E.g.: "just checking if the analysis is tractable — an afternoon of compute", "Figure 3 must be right; the rest can stay rough — overnight", "full fidelity on the BAO fit, baseline elsewhere — a few days", "every primary and secondary target lining up within stated tolerance, no hard deadline". Each iteration reads this when sizing its next move; COMPARE grades opportunities against it. Static once approved at ORIENT; the user can sharpen at any REVIEW.>
 
 ## Scope
 
@@ -32,8 +32,8 @@ This is the ceiling; the fidelity intent determines which outputs need to actual
 
 The substrate this reproduction is built against — the canonical sources iterations consult:
 
-- **Paper:** `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml}` (from `/paper-extraction` during ACQUIRE). The `index.json#citations` block carries each cited paper's resolved DOI for LITERATURE.
-- **Code:** `work/reference/code/` (cloned during ACQUIRE; scan inventory at `work/reference/code-index.md`).
+- **Paper:** `work/reference/{paper.pdf, source/ or document.md, index.json, astra.yaml}` (from `/paper-extraction` during ORIENT). The `index.json#citations` block carries each cited paper's resolved DOI for LITERATURE.
+- **Code:** `work/reference/code/` (cloned during ORIENT; scan inventory at `work/reference/code-index.md`).
 - **Paper DOI:** <doi>
 - **arXiv ID:** <id> (if applicable)
 - **Code repo URL:** <url>

From 6936c54eced9c0da18be16f7034c1bc839348c75 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 22:08:03 +0200
Subject: [PATCH 108/124] lc-from-code: drop inline-only rule in scan-only
 mode; embrace Explore
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The "do the inventory work inline (using Read, Glob, Grep) rather than
dispatching the Explore sub-agent" rule was historical — it came from
the era when /lc-from-code scan-only could run nested inside another
agent context where Agent-spawn-Agent was unsafe. With /lc-from-paper's
ORIENT phase, scan-only always runs in the user's main session at
Stage 4. There's no nested-context concern.

The rule was also empirically wrong about "the inventory is bounded
enough to do inline." For real cosmology monorepos (e.g. CosmoStat's
sp_validation in the dogfood run: 271-line Snakefile + ~2000 lines of
rules + 40 analysis scripts), inline scan is slow and tedious;
Explore is the right tool.

Drops the inline-only branch entirely. Scan-only and fresh-migration
both spawn an Explore sub-agent. For large multi-project codebases,
spawn parallel Explores against coherent subtrees.

Surfaced via dogfood friction #10 against arXiv:2604.03227.

Co-Authored-By: Claude Sonnet 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-code/SKILL.md | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)
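
One way to choose the "coherent subtrees" for parallel Explore spawns is to count code files per top-level directory of the clone. The sketch below is illustrative only and is not part of the skill: `plan_explore_regions`, the suffix set, and the per-directory grouping are all assumptions.

```python
from collections import Counter
from pathlib import Path

# Assumption: which files count as "code" when sizing a region.
CODE_SUFFIXES = {".py", ".ipynb", ".smk"}


def plan_explore_regions(root: Path) -> dict[str, int]:
    """Count code files per top-level directory of a cloned repo.

    Files sitting directly in the repo root are grouped under ".".
    """
    counts = Counter()
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip git internals and anything that is not a regular file.
        if ".git" in rel.parts or not path.is_file():
            continue
        if path.suffix not in CODE_SUFFIXES and path.name != "Snakefile":
            continue
        region = rel.parts[0] if len(rel.parts) > 1 else "."
        counts[region] += 1
    return dict(sorted(counts.items()))
```

Regions with many files would each get their own Explore, while small ones can share a single pass. Where to draw that line is left to judgment.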

diff --git a/claude/lightcone/skills/lc-from-code/SKILL.md b/claude/lightcone/skills/lc-from-code/SKILL.md
index cd3894c1..a14b5172 100644
--- a/claude/lightcone/skills/lc-from-code/SKILL.md
+++ b/claude/lightcone/skills/lc-from-code/SKILL.md
@@ -12,7 +12,7 @@ End-to-end migration: scan existing code, draft or add to `astra.yaml`, paramete
 
 This skill has two invocation contexts. The first is the user-driven default described in the phases below: do the full scan → spec → parameterize → run flow.
 
-The second is **scan-only**, used when `/lc-from-paper`'s ACQUIRE invokes this skill against a cloned reference repo at `work/reference/code/`. The invocation prompt will tell you explicitly to *do only Phase 1's scan*, write the inventory to a path it specifies (typically `work/reference/code-index.md`), and **stop** — do not touch `astra.yaml` at the project root, do not parameterize any code, do not run anything, do not modify the cloned repo. In scan-only mode, **do the inventory work inline** (using Read, Glob, Grep directly) rather than dispatching the Explore sub-agent that fresh-migration mode uses below. The scan-only branch can run nested inside another agent context (no sub-agent dispatch is safe in that case), and the inventory is bounded enough to do inline. Trust the invocation prompt's instructions over the fresh-migration defaults below; if the prompt says scan-only, the scan-only contract holds.
+The second is **scan-only**, used when `/lc-from-paper`'s ORIENT Stage 4 invokes this skill against a cloned reference repo at `work/reference/code/`. The invocation prompt will tell you explicitly to *do only Phase 1's scan*, write the inventory to a path it specifies (typically `work/reference/code-index.md`), and **stop** — do not touch `astra.yaml` at the project root, do not parameterize any code, do not run anything, do not modify the cloned repo. Reach for an Explore sub-agent (or parallel Explore spawns when the repo is large enough that one survey misses the breadth) — that's the cost-effective tool for inventorying a real codebase, and there's no longer any nested-context concern that would forbid it. Trust the invocation prompt's instructions over the fresh-migration defaults below; if the prompt says scan-only, the scan-only contract holds (stop after writing the inventory file).
 
 ## Phase 1: Scan & Spec
 
@@ -23,7 +23,7 @@ First, invoke `/astra` and read its Decisions section, then decide which mode ap
 
 ### Scanning the project
 
-In **fresh migration** mode (user's main session, full migration flow), spawn an Explore subagent to scan the project. Include the decision criteria in the prompt so the subagent can classify candidates:
+In both modes, spawn an Explore sub-agent to scan the project. Include the decision criteria in the prompt so the sub-agent can classify candidates:
 
 ```
 Agent(subagent_type="Explore", prompt="""
@@ -58,14 +58,9 @@ For reference, here are the decision criteria for classifying candidates:
 """)
 ```
 
-In **scan-only** mode (invoked by `/lc-from-paper` ACQUIRE), do the same inventory work inline using Read / Glob / Grep:
+When the codebase is large enough that one Explore pass risks missing depth (a multi-project monorepo, a workflow folder plus a notebooks tree plus a `src/` package), spawn Explores in parallel against the named subtrees — one Explore per coherent region. Aggregate their inventories into the final scan output.
 
-- `Glob` for `**/*.py`, `**/*.ipynb`, `**/Dockerfile`, `**/Containerfile`, `**/requirements*.txt`, `**/environment*.yml`, `**/pyproject.toml`, and any other relevant dependency / container manifests. Inventory the matches.
-- For each script and notebook, `Read` it (paginating with offset / limit for large files) to identify what it does, what it reads / writes, and any hardcoded analytical choices with `file:line` references.
-- `Grep` for repeated patterns when surveying for candidate decisions across the tree (magic numbers, common method-selector patterns, config-dict keys).
-- Apply the same decision criteria from `/astra` (Decisions section) to classify candidates; the criteria are the filter regardless of whether the inventory came from an Explore sub-agent or inline reads.
-
-Either way, write the scan results to `CLAUDE.md` under `## Project Notes` (fresh migration) or to the path the invocation prompt specifies (scan-only — typically `work/reference/code-index.md`) as a script inventory, then in fresh migration mode draft or add to `astra.yaml` from the scan results following the spec structure documented in `/astra`. In scan-only mode, stop after the inventory file lands; do not touch `astra.yaml`. Use the decision criteria from `/astra` (Decisions section) to filter candidate decisions down to only true analytical choices — most hardcoded values are implementation details, not decisions. Use current hardcoded values as defaults.
+Write the scan results to `CLAUDE.md` under `## Project Notes` (fresh migration) or to the path the invocation prompt specifies (scan-only — typically `work/reference/code-index.md`) as a script inventory, then in fresh migration mode draft or add to `astra.yaml` from the scan results following the spec structure documented in `/astra`. In scan-only mode, stop after the inventory file lands; do not touch `astra.yaml`. Use the decision criteria from `/astra` (Decisions section) to filter candidate decisions down to only true analytical choices — most hardcoded values are implementation details, not decisions. Use current hardcoded values as defaults.
 
 In augment mode, preserve the existing paper-derived or user-derived `inputs`, `outputs`, `decisions`, `findings`, and `narrative` unless the code scan shows a real conflict. Attach code evidence to the nearest existing home first. Create new ASTRA structure only when the code reveals a real analysis object that has no suitable home in the current spec.
 

From 1deb58865aa480106b4c85fb4d6c0c92c2dd8cdb Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 22:15:36 +0200
Subject: [PATCH 109/124] =?UTF-8?q?docs:=20propagate=20INTERVIEW=20+=20ACQ?=
 =?UTF-8?q?UIRE=20=E2=86=92=20ORIENT=20renaming?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Following the ORIENT collapse in 1d4f6b8, sweep the docs to match.

- README.md: top-level pitch paragraph for /lc-from-paper rewritten
  around ORIENT's seven stages.
- claude/lightcone/skills/README.md: bundle table entries for
  lc-from-paper, ralph, paper-extraction updated.
- docs/skills/lc-from-paper.md: phase table goes 0 ORIENT → 7 REVIEW;
  new "ORIENT stages" section enumerates the seven; per-paper substrate
  + disciplines + anti-patterns updated.
- docs/skills/index.md: ralph row mentions ORIENT.
- docs/skills/paper-extraction.md: invocation context updated to
  "ORIENT Stage 2."
- docs/user/agent-workflow.md: end-user description of the
  reproduction flow rewritten around ORIENT.

Surfaced via Cail's dogfood: he asked whether we'd been updating the
docs and we had not. Now we have.

Co-Authored-By: Claude Sonnet 4.7 (1M context) <noreply@anthropic.com>
---
 README.md                         |   2 +-
 claude/lightcone/skills/README.md |   6 +-
 docs/skills/index.md              |   2 +-
 docs/skills/lc-from-paper.md      | 103 +++++++++++++++++++-----------
 docs/skills/paper-extraction.md   |   4 +-
 docs/user/agent-workflow.md       |  27 ++++----
 6 files changed, 90 insertions(+), 54 deletions(-)
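
Reviewer note: the phase table in docs/skills/lc-from-paper.md is renumbered by this patch, so anything that refers to phases by index shifts. The sketch below is only an illustration of that renumbering, with phase names copied from the old and new tables in the diff; `old_to_new` and the constant names are hypothetical and are not part of the repository.

```python
# Illustrative only: names and order come from the old (nine-phase) and
# new (eight-phase) tables in docs/skills/lc-from-paper.md.
# `old_to_new`, OLD_PHASES, NEW_PHASES and MERGED are hypothetical names.
OLD_PHASES = ["INTERVIEW", "ACQUIRE", "ARCHITECT", "SPECIFY", "LITERATURE",
              "IMPLEMENT", "RUN", "COMPARE", "REVIEW"]
NEW_PHASES = ["ORIENT", "ARCHITECT", "SPECIFY", "LITERATURE",
              "IMPLEMENT", "RUN", "COMPARE", "REVIEW"]

# INTERVIEW and ACQUIRE both collapse into ORIENT; other names are unchanged.
MERGED = {"INTERVIEW": "ORIENT", "ACQUIRE": "ORIENT"}


def old_to_new(old_index):
    """Map a zero-based phase index from the old table to the new table."""
    name = OLD_PHASES[old_index]
    return NEW_PHASES.index(MERGED.get(name, name))


# Phases 1-6 of the new table are the ones that run as ralph iterations.
RALPH_PHASES = NEW_PHASES[1:7]
```

Under these assumptions, the two pre-loop phases both map to index 0 and every later phase moves down by one.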

diff --git a/README.md b/README.md
index 56fcbdad..7145b1aa 100644
--- a/README.md
+++ b/README.md
@@ -42,7 +42,7 @@ Scans an existing codebase, drafts an `astra.yaml` that captures its inputs, out
 
 ### `/lc-from-paper` — Reproduce a published paper
 
-Interview-first driver for reproducing a published paper in ASTRA. INTERVIEW + ACQUIRE run in the user's main session — drafting a per-paper `constitution.md` (the ralph loop's driving document) plus a `CLAUDE.md` (auto-loading rules + accumulators), then standing up the on-disk substrate (`/paper-extraction` for the paper, `/lc-from-code` in scan-only mode for the code). Then the rest of the reproduction hands off to a **ralph loop** whose iterations carry the long middle: ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE. Each iteration runs in a fresh tmux session against the constitution; the fresh-context property between iterations is what makes per-phase review work. When the loop closes (constitution `status: closed` after COMPARE returns `pass`), REVIEW runs back in the user's main session. Composes a bundle of sibling skills (`ralph`, `paper-extraction`, `narrative`, `figure-comparison`, `check-sentence-by-sentence`). See [`claude/lightcone/skills/README.md`](claude/lightcone/skills/README.md) for the full bundle map.
+ORIENT-first driver for reproducing a published paper in ASTRA. ORIENT runs in the user's main session in seven stages — asks for the paper, runs `/paper-extraction` inline to acquire it, interviews the user (grounded in the paper), clones the reference code and runs `/lc-from-code` scan-only (if a repo exists), optionally follows up, then drafts a per-paper `constitution.md` (the ralph loop's driving document) + `CLAUDE.md` (auto-loading rules + accumulators) from the full paper-plus-code context for user review. Then the rest of the reproduction hands off to a **ralph loop** whose iterations carry the long middle: ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE. Each iteration runs in a fresh tmux session against the constitution; the fresh-context property between iterations is what makes per-phase review work. When the loop closes (constitution `status: closed` after COMPARE returns `pass`), REVIEW runs back in the user's main session. Composes a bundle of sibling skills (`ralph`, `paper-extraction`, `narrative`, `figure-comparison`, `check-sentence-by-sentence`). See [`claude/lightcone/skills/README.md`](claude/lightcone/skills/README.md) for the full bundle map.
 
 ### `/lc-feedback` — Report a bug
 
diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index bc26ab79..6e33c8a5 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -27,10 +27,10 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 
 | Skill | Role |
 |---|---|
-| [`lc-from-paper`](lc-from-paper/SKILL.md) | **Reproduction driver.** Interview-first; INTERVIEW + ACQUIRE run in the user's main session (drafts a per-paper `constitution.md` + `CLAUDE.md`, stands up the substrate via `/paper-extraction` and `/lc-from-code` scan-only). Then hands off to a ralph loop whose iterations carry the long middle: ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE. When the loop closes (constitution `status: closed` after COMPARE returns `pass`), REVIEW runs back in the user's main session. Fidelity intent — captured as prose at INTERVIEW — is what every iteration translates into per-move cheap/heavy decisions, and what COMPARE grades opportunities against. |
-| [`ralph`](ralph/SKILL.md) | The loop substrate. `lc-from-paper`'s INTERVIEW invokes `/ralph`'s Authoring mode to draft the per-paper constitution; ACQUIRE's hand-off invokes the launcher. Each iteration runs `/ralph`'s Loop protocol against the constitution. |
+| [`lc-from-paper`](lc-from-paper/SKILL.md) | **Reproduction driver.** ORIENT-first; one pre-loop phase in the user's main session that asks for the paper, runs `/paper-extraction` inline, interviews the user (grounded in the paper), clones the reference code and runs `/lc-from-code` scan-only (when a repo exists), and drafts the per-paper `constitution.md` + `CLAUDE.md`. Then hands off to a ralph loop whose iterations carry the long middle: ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE. When the loop closes (constitution `status: closed` after COMPARE returns `pass`), REVIEW runs back in the user's main session. Fidelity intent — captured as prose at ORIENT — is what every iteration translates into per-move cheap/heavy decisions, and what COMPARE grades opportunities against. |
+| [`ralph`](ralph/SKILL.md) | The loop substrate. `lc-from-paper`'s ORIENT invokes `/ralph`'s Authoring mode to draft the per-paper constitution; the loop launcher hands off after ORIENT lands. Each iteration runs `/ralph`'s Loop protocol against the constitution. |
 | [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by `lc-from-paper`'s ARCHITECT (for the structural narrative) and SPECIFY (for anchored content narrative). |
-| [`paper-extraction`](paper-extraction/SKILL.md) | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: structural index (figures, tables, outline, citations with resolved DOIs) plus a stub `astra.yaml` for the paper. Primary acquisition path for `lc-from-paper`'s ACQUIRE; also invoked per cited paper by LITERATURE. |
+| [`paper-extraction`](paper-extraction/SKILL.md) | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: structural index (figures, tables, outline, citations with resolved DOIs) plus a stub `astra.yaml` for the paper. Primary acquisition path for `lc-from-paper`'s ORIENT (Stage 2); also invoked per cited paper by LITERATURE. |
 | [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Audit paper claims against code locations (`file:line` or `NOT FOUND`). Invoked from `lc-from-paper`'s REVIEW close-out (opt-in); also user-invokable directly. |
 | [`figure-comparison`](figure-comparison/SKILL.md) | Build a self-contained HTML side-by-side: original figures/tables/numerics vs replicated. Invoked from `lc-from-paper`'s REVIEW close-out (mandatory); also user-invokable directly. |
 
diff --git a/docs/skills/index.md b/docs/skills/index.md
index ba960dbb..6fa9c513 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -34,7 +34,7 @@ dispatches them by role during the reproduction.
 
 | Skill | Command | Purpose |
 |-------|---------|---------|
-| [ralph](ralph.md) | `/ralph` | Loop substrate. `lc-from-paper`'s INTERVIEW invokes ralph's Authoring mode to draft the per-paper constitution; ACQUIRE's hand-off invokes the launcher; each iteration runs ralph's Loop protocol. Also user-invokable standalone (see the Project lifecycle row above). |
+| [ralph](ralph.md) | `/ralph` | Loop substrate. `lc-from-paper`'s ORIENT invokes ralph's Authoring mode to draft the per-paper constitution; the loop launcher hands off after ORIENT lands; each iteration runs ralph's Loop protocol. Also user-invokable standalone (see the Project lifecycle row above). |
 | [paper-extraction](paper-extraction.md) | `/paper-extraction` | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: substrate, figures, tables, citations (with resolved DOIs), and a stub `astra.yaml`. |
 | [narrative](narrative.md) | `/narrative` | Author the `narrative:` prose and decision `rationale:` against an existing `astra.yaml`, in paper-reproduction, retrofit, or co-drafting mode. |
 | [figure-comparison](figure-comparison.md) | `/figure-comparison` | Build a self-contained HTML side-by-side: paper figures, tables, and numerics vs reproduced artifacts. |
diff --git a/docs/skills/lc-from-paper.md b/docs/skills/lc-from-paper.md
index f4b690f4..89f3c53f 100644
--- a/docs/skills/lc-from-paper.md
+++ b/docs/skills/lc-from-paper.md
@@ -1,12 +1,13 @@
 # /lc-from-paper
 
 Reproduce a published scientific paper as a complete ASTRA project. The
-skill is **interview-first** and **ralph-driven**. INTERVIEW and
-ACQUIRE run in the user's main session to set up the per-paper
-substrate. A ralph loop then carries the long middle —
-ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE —
-across many iterations against the same constitution. REVIEW returns
-to the user's main session once the loop closes.
+skill is **ORIENT-first** and **ralph-driven**. ORIENT runs in the
+user's main session — figuring out what the user wants, standing up the
+paper and code substrate, and drafting the per-paper constitution. A
+ralph loop then carries the long middle — ARCHITECT → SPECIFY →
+LITERATURE → IMPLEMENT → RUN → COMPARE — across many iterations against
+the same constitution. REVIEW returns to the user's main session once
+the loop closes.
 
 `/lc-from-paper` is the entry point of the paper-reproduction bundle.
 Sibling skills ([`ralph`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/ralph/SKILL.md)
@@ -21,14 +22,16 @@ Source: [`claude/lightcone/skills/lc-from-paper/SKILL.md`](https://github.com/Li
 
 Two pieces.
 
-1. **Interactive bookends in the user's main session.** INTERVIEW and
-   REVIEW are conversations with the user. ACQUIRE is two parallel
-   sub-skill invocations (`/paper-extraction` and `/lc-from-code` in
-   scan-only mode) that produce the on-disk substrate everything
-   downstream consults.
+1. **Interactive bookends in the user's main session.** ORIENT and
+   REVIEW are conversations with the user. ORIENT runs in stages —
+   ask for the paper, run `/paper-extraction` inline, interview
+   (grounded in the paper), clone the code and run `/lc-from-code`
+   scan-only (if a repo exists), optionally follow up, then draft
+   `constitution.md` + `CLAUDE.md` from the full paper-plus-code
+   context for user review.
 
-2. **A ralph loop for the long middle.** Once `constitution.md` is
-   drafted (INTERVIEW) and the substrate is on disk (ACQUIRE),
+2. **A ralph loop for the long middle.** Once ORIENT lands —
+   `constitution.md` drafted, paper and code substrate on disk —
    `/lc-from-paper` launches a ralph loop against the constitution.
    Each iteration starts a fresh tmux-detached Claude session with
    the constitution loaded into its system prompt, surveys the
@@ -42,25 +45,53 @@ Two pieces.
 
 ## Phases
 
-Nine phases, zero-indexed. INTERVIEW + ACQUIRE + REVIEW run in the
-user's main session; phases 2–7 run as ralph iterations.
+Eight phases, zero-indexed. ORIENT + REVIEW run in the user's main
+session; phases 1–6 run as ralph iterations.
 
 | # | Phase | Where | Primary outputs |
 |---|-------|-------|------------------|
-| 0 | INTERVIEW | user's main session | per-paper `constitution.md` + `CLAUDE.md` |
-| 1 | ACQUIRE | user's main session | `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml, code/, code-status.yaml, code-index.md}` |
-| 2 | ARCHITECT | ralph iteration | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative) |
-| 3 | SPECIFY | ralph iteration | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
-| 4 | LITERATURE | ralph iteration | `prior_insights:` Evidence entries each carry resolved `quote:` + `location:` selectors; per-paper PDFs cached via `astra paper add` |
-| 5 | IMPLEMENT | ralph iteration | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
-| 6 | RUN | ralph iteration | `results/<universe>/<output>/` |
-| 7 | COMPARE | ralph iteration | `comparison-report.{yaml,md}` plus an opportunity assessment graded against the user's fidelity intent |
-| 8 | REVIEW | user's main session | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
+| 0 | ORIENT | user's main session | per-paper `constitution.md` + `CLAUDE.md` + paper substrate at `work/reference/{paper.pdf, source/ or document.md, figures/, tables/, index.json, astra.yaml}` (from inline `/paper-extraction`) + code substrate at `work/reference/{code/, code-status.yaml, code-index.md}` (from inline `/lc-from-code` scan-only, when a repo exists) |
+| 1 | ARCHITECT | ralph iteration | stub `astra.yaml` (sub-analyses, inputs, outputs, narrative) |
+| 2 | SPECIFY | ralph iteration | filled `astra.yaml` (`decisions:`, `findings:`, `prior_insights:` placeholders, anchored narrative); `targets/targets.md`; `implementation-notes.md`; `universes/baseline.yaml` |
+| 3 | LITERATURE | ralph iteration | `prior_insights:` Evidence entries each carry resolved `quote:` + `location:` selectors; per-paper PDFs cached via `astra paper add` |
+| 4 | IMPLEMENT | ralph iteration | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
+| 5 | RUN | ralph iteration | `results/<universe>/<output>/` |
+| 6 | COMPARE | ralph iteration | `comparison-report.{yaml,md}` plus an opportunity assessment graded against the user's fidelity intent |
+| 7 | REVIEW | user's main session | `REPRODUCTION-SUMMARY.md`, `/figure-comparison` HTML, resolved `open-questions.md`, finalized reproduction outcome |
+
+## ORIENT stages
+
+ORIENT is one phase executed in seven stages, each grounded in what
+the earlier stages produced:
+
+1. **Ask for the paper** in prose (the answer is free-form: arXiv ID,
+   DOI, or PDF path). No `AskUserQuestion` here — it's the wrong
+   shape for a free-form string.
+2. **Run `/paper-extraction <id>` inline** and read the substrate
+   it produced — index.json, abstract, conclusions, data/code
+   availability, acknowledgements. This grounds every subsequent
+   question.
+3. **Interview the user** with `AskUserQuestion` for scope, fidelity
+   intent, code repo confirmation, paper-specific conventions, prior
+   familiarity, and external context — each question referencing the
+   paper's actual figures, claims, and structure.
+4. **Clone the reference code and run `/lc-from-code` scan-only**
+   (skip cleanly when no public code repo exists). The scan produces
+   `code-index.md` — the iterations' code surface.
+5. **Optional follow-up questions** if the code-index surfaced
+   something that affects scope or constitution shape. Usually
+   skipped.
+6. **Draft `constitution.md` + `CLAUDE.md`** — both files now
+   informed by paper *and* code substrate. The constitution's Scope
+   and sub-analysis decomposition can lean on the actual pipeline.
+7. **User reviews drafts → refine → single first commit (constitution
+   + CLAUDE + paper substrate + code substrate) → launch the ralph
+   loop.**
 
 ## Per-paper substrate: constitution + CLAUDE.md
 
-INTERVIEW drafts two files in the reproduction workdir; every
-iteration picks them up on launch.
+ORIENT drafts two files in the reproduction workdir; every iteration
+picks them up on launch.
 
 - **`constitution.md`** — the ralph loop's driving document, *task-bound*.
   YAML frontmatter declares `status: active`. Goal (carrying the
@@ -98,13 +129,13 @@ Pointers, not snapshots.
   intent is partly aesthetic ("how good does this need to be?") and
   partly pragmatic ("what's feasible given the compute, tokens, and
   wall-clock available?"). The honest meta-conversation lives in
-  INTERVIEW. There's no explicit review state machine: every
-  iteration reads the most recent artifact critically as part of
-  survey, fixes what needs fixing or advances if nothing does. The
-  fresh-context property at iteration boundaries makes the next
-  iteration the review. Gaps the intent wants pushed further than
-  the loop has time to deliver become Open opportunities in
-  CLAUDE.md for a future loop.
+  ORIENT. There's no explicit review state machine: every iteration
+  reads the most recent artifact critically as part of survey,
+  fixes what needs fixing or advances if nothing does. The fresh-context
+  property at iteration boundaries makes the next iteration the
+  review. Gaps the intent wants pushed further than the loop has
+  time to deliver become Open opportunities in CLAUDE.md for a future
+  loop.
 - **arXiv LaTeX first.** PDF + Docling is the non-arXiv fallback only.
 - **No synthetic data.** Unless the paper itself uses synthetic data,
   every input must be real.
@@ -116,8 +147,8 @@ Pointers, not snapshots.
 ## Anti-patterns
 
 - Doing the long middle in the user's main session instead of launching
-  the loop. INTERVIEW + ACQUIRE + REVIEW belong in the main session;
-  ARCHITECT through COMPARE belong in iterations.
+  the loop. ORIENT and REVIEW belong in the main session; ARCHITECT
+  through COMPARE belong in iterations.
 - Asking an iteration to use `AskUserQuestion` — iterations are
   detached.
 - Re-implementing what `astra` already does (`astra validate`, `astra
@@ -132,7 +163,7 @@ Pointers, not snapshots.
   — why the bundle is co-located rather than a separate plugin install.
 - [`/ralph`](https://github.com/LightconeResearch/lightcone-cli/blob/main/claude/lightcone/skills/ralph/SKILL.md)
   — the loop substrate (authoring + launching + iterating).
-- [`/paper-extraction`](paper-extraction.md) — ACQUIRE's primary
+- [`/paper-extraction`](paper-extraction.md) — ORIENT Stage 2's
   acquisition path; also invoked per cited paper by LITERATURE.
 - [`/narrative`](narrative.md) — ARCHITECT's structural narrative and
   SPECIFY's anchored content narrative.
diff --git a/docs/skills/paper-extraction.md b/docs/skills/paper-extraction.md
index 09f2deb7..7c7cc05f 100644
--- a/docs/skills/paper-extraction.md
+++ b/docs/skills/paper-extraction.md
@@ -125,7 +125,7 @@ cached PDF — paraphrasing breaks the gate.
 ## Related
 
 - [`/lc-from-paper`](lc-from-paper.md) — invokes `/paper-extraction`
-  during ACQUIRE for the target paper, and again from inside a ralph
-  iteration for each cited paper during LITERATURE; each iteration
+  during ORIENT Stage 2 for the target paper, and again from inside a
+  ralph iteration for each cited paper during LITERATURE; each iteration
   reads `index.json` and the substrate directly.
 - [`/astra`](index.md#reference-skills-auto-primed-via-session-start) — Insight + Evidence shape, `quote.exact` rules.
diff --git a/docs/user/agent-workflow.md b/docs/user/agent-workflow.md
index b051f7e3..349054f1 100644
--- a/docs/user/agent-workflow.md
+++ b/docs/user/agent-workflow.md
@@ -83,17 +83,22 @@ driven by an interview-first agent that hands off to a long-running
 ralph loop for the heavy middle.**
 
 `/lc-from-paper` is the entry point of the paper-reproduction bundle.
-It opens with a short interactive interview — paper identity, scope
-(full vs targeted), fidelity intent (your prose answer to "when is
-this good enough"), and any paper-specific conventions — then drafts
-**two files** at the reproduction workdir root: `constitution.md`
-(the ralph loop's driving document — Goal, fidelity intent, scope,
-quality bar, evidence) and `CLAUDE.md` (the auto-loading walk-up with
-rules, the Rigor accumulator, and the paper-vs-code disagreements log).
-ACQUIRE then runs in the same session, standing up the paper and code
-substrate via `/paper-extraction` and `/lc-from-code` in scan-only mode.
-
-After ACQUIRE lands, the skill launches a **ralph loop** in a detached
+It opens with **ORIENT** — one pre-loop phase in your main session
+that runs in seven stages: ask for the paper, run `/paper-extraction`
+inline (so subsequent questions are grounded in the actual paper),
+interview you (scope, fidelity intent — your prose answer to "when is
+this good enough" — code repo confirmation, paper-specific
+conventions, prior familiarity, external context), clone the
+reference code and run `/lc-from-code` scan-only (when a repo exists),
+optionally follow up, then draft **two files** at the workdir root:
+`constitution.md` (the ralph loop's driving document — Goal, fidelity
+intent, scope, quality bar, evidence) and `CLAUDE.md` (the auto-loading
+walk-up with rules, the paper-vs-code disagreements log, open
+opportunities). You review the drafts, then a single first commit
+captures `constitution.md` + `CLAUDE.md` + the full `work/reference/`
+substrate.
+
+After ORIENT lands, the skill launches a **ralph loop** in a detached
 tmux session against `constitution.md`. Each iteration starts a fresh
 worker that surveys the workdir, picks the next valuable move
 (typically one of ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN

From dab2d4d4e18de4fc8594610f0ac394d1b0812871 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:11:42 +0200
Subject: [PATCH 110/124] docs/skills: ORIENT-first, not interview-first
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The index row still said "interview-first driver", while the rest of
the docs moved to ORIENT-first when INTERVIEW + ACQUIRE collapsed to
ORIENT (1deb588). Align the row with docs/skills/lc-from-paper.md's
"ORIENT-first and ralph-driven" lede.
---
 docs/skills/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/skills/index.md b/docs/skills/index.md
index 6fa9c513..e1c352b7 100644
--- a/docs/skills/index.md
+++ b/docs/skills/index.md
@@ -22,7 +22,7 @@ user-invokable directly.
 |-------|---------|---------|
 | [lc-new](lc-new.md) | `/lc-new` | Scope a research question into an `astra.yaml`, with optional literature extraction. |
 | [lc-from-code](lc-from-code.md) | `/lc-from-code` | Wrap an existing codebase in ASTRA: scan, generate spec, parameterize, run. |
-| [lc-from-paper](lc-from-paper.md) | `/lc-from-paper` | Reproduce a published paper in ASTRA — interview-first driver that hands off to a ralph loop for the long middle. |
+| [lc-from-paper](lc-from-paper.md) | `/lc-from-paper` | Reproduce a published paper in ASTRA — ORIENT-first driver that hands off to a ralph loop for the long middle. |
 | [lc-feedback](lc-feedback.md) | `/lc-feedback` | File a GitHub issue against the right Lightcone repo with auto-collected context. |
 | [ralph](ralph.md) | `/ralph` | Author a constitution and run a ralph loop against it. Used by `lc-from-paper` for the long middle; standalone for any other long-running work. |
 

From aa58ac4405caf437b81975181bf573395e40f663 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:11:49 +0200
Subject: [PATCH 111/124] ralph: drop Rigor from the CLAUDE.md accumulator
 example

The Loop protocol's Update step listed "Rigor *Current state*,
Paper-vs-code disagreements, open opportunities" as example
accumulators. Rigor was dropped from the lc-from-paper bundle's
CLAUDE.md template in 4524774 (review is "read critically as part of
survey," not a Rigor accumulator), so naming it as an example
accumulator in ralph's discipline points at vocabulary the only
bundled consumer no longer carries. The remaining two examples
(Paper-vs-code disagreements, Open opportunities) are both present in
the template.
---
 claude/lightcone/skills/ralph/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/ralph/SKILL.md b/claude/lightcone/skills/ralph/SKILL.md
index 5155c2f0..f65013f2 100644
--- a/claude/lightcone/skills/ralph/SKILL.md
+++ b/claude/lightcone/skills/ralph/SKILL.md
@@ -147,7 +147,7 @@ Anything after a literal `--` separator forwards to the backend unchanged. Commo
 
 1. **Survey** — Fresh eyes. Read the constitution and the workdir's `CLAUDE.md`. Check `git log`, glance at sub-fibers or notes the prior iteration left, look at what's actually in the workdir.
 2. **Work** — Stay and work from the vantage point the survey built. Make 1–3 substantial contributions; don't try to clear the queue in one iteration.
-3. **Update** — Before exiting: commit your work; update `CLAUDE.md`'s accumulators (Rigor *Current state*, Paper-vs-code disagreements, open opportunities — whichever the project carries) if anything sharpened; sharpen the constitution body itself if a fact stable enough to belong in *Context* or *Desired State* landed.
+3. **Update** — Before exiting: commit your work; update `CLAUDE.md`'s accumulators (Paper-vs-code disagreements, Open opportunities — whichever the project carries) if anything sharpened; sharpen the constitution body itself if a fact stable enough to belong in *Context* or *Desired State* landed.
 4. **Exit** — `kill $PPID`.
 
 ### Earn the vantage point

From 3a50a19048de4991534ebdea47d13d7a5db867da Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:19:59 +0200
Subject: [PATCH 112/124] lc-from-paper: drop the orchestrator-era review-file
 commit cadence
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

architect.md's and specify.md's commit-cadence bullets still mentioned
"review-N files" / "review files" as separate artifacts produced by
the per-phase reviewer sub-agents. In the ralph model, reviews happen
inline as part of survey — fixes go straight into astra.yaml etc. and
there are no per-iteration review files. Rephrase the bullets to talk
about fix passes instead, matching the surrounding 'Reviewing prior
<phase> work as part of survey' sections in each reference.
---
 claude/lightcone/skills/lc-from-paper/references/architect.md | 2 +-
 claude/lightcone/skills/lc-from-paper/references/specify.md   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/references/architect.md b/claude/lightcone/skills/lc-from-paper/references/architect.md
index 84bf6d0c..a2556e77 100644
--- a/claude/lightcone/skills/lc-from-paper/references/architect.md
+++ b/claude/lightcone/skills/lc-from-paper/references/architect.md
@@ -109,4 +109,4 @@ Don't flag empty `decisions:` / `prior_insights:` / `findings:` — that's SPECI
 - **The stub's empty blocks are intentional.** `decisions: {}`, `prior_insights: {}`, `findings: {}` make it clear at a glance that ARCHITECT's job is structural and SPECIFY fills them. Don't try to half-author content — empty is honest.
 - **Code-as-canonical for structure, paper-as-canonical for narrative voice.** The code reveals where the real stage boundaries are; the paper provides the words to describe them. The stub uses both.
 - **The narrative skill is the prose author, not the structure author.** Invoke `/narrative` for the prose blocks; ARCHITECT's job is the structural skeleton plus invoking `/narrative` to fill the `narrative:` keys cleanly.
-- **Commit each artifact as it lands.** Stub commits before any review-N file; review-N files commit one per iteration; each fix pass commits separately. Small, descriptive commits keep `git log` legible to the next iteration.
+- **Commit each artifact as it lands.** The stub commits when it lands; each subsequent fix pass commits separately. Small, descriptive commits keep `git log` legible to the next iteration.
diff --git a/claude/lightcone/skills/lc-from-paper/references/specify.md b/claude/lightcone/skills/lc-from-paper/references/specify.md
index 47488e28..eba368dc 100644
--- a/claude/lightcone/skills/lc-from-paper/references/specify.md
+++ b/claude/lightcone/skills/lc-from-paper/references/specify.md
@@ -195,4 +195,4 @@ Out-of-scope targets stay in `targets/targets.md` with an explicit reason and sh
 - **The target ledger is a derivation, not a separate phase's output.** Treat `targets/targets.md` as a small index produced alongside the filled `astra.yaml`, not a heavyweight artifact. The depth lives in `astra.yaml`'s `outputs:` / `findings:` / `decisions:`.
 - **Two-pass discipline is the cross-check.** Skipping the code pass (when code exists) loses the canonical-resolution surface and lets paper-vs-code material disagreements slip through. The fresh-context review can recover *some* of these but not all — the disciplined sequence (paper → code → review) catches more.
 - **Per-sub-analysis parallelism is opt-in.** When sub-analyses are independent (no shared decision blocks, no cross-sub-analysis findings), the iteration can fan out one-level-deep sub-agents (one per sub-analysis from inside its main session) to run their passes in parallel. When they share material decisions or findings (rare), serialize across iterations.
-- **Commit per sub-analysis as it lands.** Each sub-analysis's filled-in `astra.yaml` slice + its targets/implementation-notes/baseline updates earn one commit; review files commit one per review-iteration. The next iteration reads `git log` to track progress; small commits keep the trail readable.
+- **Commit per sub-analysis as it lands.** Each sub-analysis's filled-in `astra.yaml` slice + its targets/implementation-notes/baseline updates earn one commit; subsequent fix passes commit separately. The next iteration reads `git log` to track progress; small commits keep the trail readable.

From 05dc3413f642f2600855eb85d416a11e206b390d Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:20:06 +0200
Subject: [PATCH 113/124] docs/user/agent-workflow: propagate ORIENT-first +
 drop the Rigor section
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two leftover-rename slips in the user-facing agent-workflow doc, both of
the same shape as the ones targeted by the prior iteration's
docs/skills/index.md and ralph SKILL.md fixes:

- 'driven by an interview-first agent' → 'driven by an ORIENT-first
  agent' (matches the propagated INTERVIEW + ACQUIRE → ORIENT rename in
  1d4f6b8 / 1deb588 / dab2d4d).
- COMPARE's opportunity assessment propagates 'into CLAUDE.md's Rigor
  section' → 'into CLAUDE.md's *Open opportunities* list' (matches the
  CLAUDE.md template after the Rigor *Current state* accumulator was
  dropped in 4524774 and aa58ac4).
---
 docs/user/agent-workflow.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/user/agent-workflow.md b/docs/user/agent-workflow.md
index 349054f1..9ebf6b63 100644
--- a/docs/user/agent-workflow.md
+++ b/docs/user/agent-workflow.md
@@ -79,7 +79,7 @@ parameter plumbing.
 ## `/lc-from-paper` — reproduce a published paper
 
 **You have a DOI or arXiv ID. You end with a reproduction project
-driven by an interview-first agent that hands off to a long-running
+driven by an ORIENT-first agent that hands off to a long-running
 ralph loop for the heavy middle.**
 
 `/lc-from-paper` is the entry point of the paper-reproduction bundle.
@@ -118,8 +118,8 @@ session: `/figure-comparison` against the targets, optional
 questions, a `REPRODUCTION-SUMMARY.md`. COMPARE's opportunity
 assessment — where the gaps are, how much they likely matter, and how
 they sit relative to your fidelity intent — propagates into
-CLAUDE.md's Rigor section as the trajectory of what could be tightened
-on a return visit.
+CLAUDE.md's *Open opportunities* list as the trajectory of what could
+be tightened on a return visit.
 
 The bundle composes sibling skills: `ralph` (the loop substrate),
 `paper-extraction`, `narrative`, `figure-comparison`, and

From eb811860b096b5a64d42c9d72cf8bbe77f51be96 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:26:41 +0200
Subject: [PATCH 114/124] lc-from-paper: drop two more Rigor-vocabulary orphans
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Commit 4524774 dropped the cheap/heavy review-termination axis and the
"review-and-fix iteration" framing from every artifact-producing
reference, but two adjacent surfaces preserved the dropped vocabulary:

- bundle README's lc-from-paper row described iterations translating
  fidelity intent into "per-move cheap/heavy decisions" — the cheap/heavy
  axis no longer exists anywhere else in the bundle. Replaced with the
  surviving framing from orient.md:91 ("each iteration reads the intent
  when sizing its next move").

- literature.md's commit-cadence bullet listed "the review-and-fix
  iteration commits its diff" as a distinct commit type. Same family
  as the architect.md / specify.md fixes in 3a50a19, applied to a third
  file with the same rephrase pattern ("Subsequent fix passes commit
  separately").
---
 claude/lightcone/skills/README.md                              | 2 +-
 claude/lightcone/skills/lc-from-paper/references/literature.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
index 6e33c8a5..6d3a238d 100644
--- a/claude/lightcone/skills/README.md
+++ b/claude/lightcone/skills/README.md
@@ -27,7 +27,7 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i
 
 | Skill | Role |
 |---|---|
-| [`lc-from-paper`](lc-from-paper/SKILL.md) | **Reproduction driver.** ORIENT-first; one pre-loop phase in the user's main session that asks for the paper, runs `/paper-extraction` inline, interviews the user (grounded in the paper), clones the reference code and runs `/lc-from-code` scan-only (when a repo exists), and drafts the per-paper `constitution.md` + `CLAUDE.md`. Then hands off to a ralph loop whose iterations carry the long middle: ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE. When the loop closes (constitution `status: closed` after COMPARE returns `pass`), REVIEW runs back in the user's main session. Fidelity intent — captured as prose at ORIENT — is what every iteration translates into per-move cheap/heavy decisions, and what COMPARE grades opportunities against. |
+| [`lc-from-paper`](lc-from-paper/SKILL.md) | **Reproduction driver.** ORIENT-first; one pre-loop phase in the user's main session that asks for the paper, runs `/paper-extraction` inline, interviews the user (grounded in the paper), clones the reference code and runs `/lc-from-code` scan-only (when a repo exists), and drafts the per-paper `constitution.md` + `CLAUDE.md`. Then hands off to a ralph loop whose iterations carry the long middle: ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE. When the loop closes (constitution `status: closed` after COMPARE returns `pass`), REVIEW runs back in the user's main session. Fidelity intent — captured as prose at ORIENT — is what every iteration reads when sizing its next move, and what COMPARE grades opportunities against. |
 | [`ralph`](ralph/SKILL.md) | The loop substrate. `lc-from-paper`'s ORIENT invokes `/ralph`'s Authoring mode to draft the per-paper constitution; the loop launcher hands off after ORIENT lands. Each iteration runs `/ralph`'s Loop protocol against the constitution. |
 | [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by `lc-from-paper`'s ARCHITECT (for the structural narrative) and SPECIFY (for anchored content narrative). |
 | [`paper-extraction`](paper-extraction/SKILL.md) | Turn an arXiv ID or DOI into a standardized `work/reference/` directory: structural index (figures, tables, outline, citations with resolved DOIs) plus a stub `astra.yaml` for the paper. Primary acquisition path for `lc-from-paper`'s ORIENT (Stage 2); also invoked per cited paper by LITERATURE. |
diff --git a/claude/lightcone/skills/lc-from-paper/references/literature.md b/claude/lightcone/skills/lc-from-paper/references/literature.md
index 640807f4..5c49f61f 100644
--- a/claude/lightcone/skills/lc-from-paper/references/literature.md
+++ b/claude/lightcone/skills/lc-from-paper/references/literature.md
@@ -196,4 +196,4 @@ If the entry genuinely has no supporting quote in the cited paper, log it to `op
 - **Resume is automatic.** If `work/cited/<doi-slug>/work/reference/index.json` exists, skip that DOI's fetch. If `work/notes/literature/resolutions.yaml` has an entry for a placeholder, skip that placeholder's quote-finding.
 - **Unresolved is not failure.** A placeholder that no quote in the cited paper supports is a real signal — the target paper cited loosely or paraphrased beyond what the source actually says. Surface to `open-questions.md`; don't fabricate evidence.
 - **`astra validate --verify-evidence` runs after the merge**, not after each Haiku's per-placeholder output. Haikus write to disjoint files; the deterministic check happens once `astra.yaml` is updated.
-- **Commit per stage.** Fetches commit together once Stage 1 completes (one commit for all cited-paper substrates). Quote-finding commits together once Stage 2 completes (`resolutions.yaml` + Haiku files). The merge into `astra.yaml` is its own commit. The review-and-fix iteration commits its diff. The next iteration reads `git log` to see progress.
+- **Commit per stage.** Fetches commit together once Stage 1 completes (one commit for all cited-paper substrates). Quote-finding commits together once Stage 2 completes (`resolutions.yaml` + Haiku files). The merge into `astra.yaml` is its own commit. Subsequent fix passes commit separately. The next iteration reads `git log` to see progress.

From 37232eca9fe54e7a6d10b18308eb60bd92f6e5f1 Mon Sep 17 00:00:00 2001
From: Nolan Koblischke <nolan.koblischke@mail.utoronto.ca>
Date: Wed, 13 May 2026 17:30:23 -0400
Subject: [PATCH 115/124] fix(check-sentence-by-sentence): mark skill as
 user-invoked only

---
 .../lightcone/skills/check-sentence-by-sentence/SKILL.md  | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md b/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
index 9a9cd09c..1eeeaba5 100644
--- a/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
+++ b/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
@@ -4,11 +4,9 @@ description: >
   Sentence-by-sentence audit of a paper against an ASTRA project's code. For
   every claim about implementation or results in the methodology, results,
   discussion, and appendices, locate the corresponding code (file:line) or
-  mark NOT FOUND. Use when the user says "check reproduction", "verify the
-  paper line by line", or "sentence-by-sentence audit". Run from the project
-  folder containing astra.yaml. In lc-from-paper projects, read paper sources
-  from work/reference/: prefer arXiv TeX under work/reference/source/, fall
-  back to Docling/Pandoc markdown at work/reference/document.md.
+  mark NOT FOUND. Only the user can invoke this skill, though this skill can be suggested for the user to invoke during paper reproduction. Other skills may mention this skill as an optional follow-up, but should not invoke it themselves. Run from the project folder containing astra.yaml. In lc-from-paper projects, read paper sources from
+  work/reference/: prefer arXiv TeX under work/reference/source/, fall back to
+  Docling/Pandoc markdown at work/reference/document.md.
 allowed-tools: Read, Glob, Grep, Bash(ls:*), Bash(wc:*), Bash(grep:*), Bash(find:*), AskUserQuestion, Agent
 argument-hint: "[path to paper source, e.g. work/reference/source/main.tex or work/reference/document.md]"
 ---

From d2a90dff07d830ea0ec0e5990313544b11beb042 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:32:42 +0200
Subject: [PATCH 116/124] lc-from-paper: drop four more Rigor-vocabulary
 orphans
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Same slip-class as eb81186 / 3a50a19 — 4524774 dropped the
sketch/baseline/tightened/canonical state-machine but a handful of
surfaces preserved the dropped vocabulary because the original commit
didn't touch their files:

- SKILL.md:97 listed "rigor adjustment" as a representative discipline
  a phase reference carries, but the per-phase references lost their
  rigor-adjustment sections in 4524774. Replaced with "evidence shape",
  which actually does cross-cut the references.

- compare.md:93's illustrative example used "sketch-level" /
  "canonical-grade" / "sketchy" as quality descriptors — the dropped
  four-word vocabulary deployed as if it were still defined. Rephrased
  to plain English ("rough" / "tight").

- compare.md:95's empty-opportunities framing said "the reproduction is
  at canonical rigor across the targets" — same vocabulary, inverted
  word order. The surrounding section explicitly tracks
  `relative_to_intent`, so the surviving framing is "reaches the
  fidelity intent."

- templates/constitution.md:23 (the Quality bar section's lede) said
  'What "canonical" rigor looks like for *this* paper' — 4524774
  deleted the constitution template's "Rigor — current state" section
  but missed this echo in the section just above it.

Confirmed by `git show 4524774 -- compare.md` returning empty: that
file was never touched by the rigor-drop commit, so its vocabulary
references were missed wholesale.

Cold sweep across the bundle (`canonical rigor`, `canonical-grade`,
`sketch-level`, `sketchy`, `rigor adjustment`, `"canonical" rigor`,
`review-and-fix`, stale INTERVIEW/ACQUIRE phase names, "you are the X
sub-agent" framing) now returns clean.
---
 claude/lightcone/skills/lc-from-paper/SKILL.md                | 2 +-
 claude/lightcone/skills/lc-from-paper/references/compare.md   | 4 ++--
 .../lightcone/skills/lc-from-paper/templates/constitution.md  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index 24718535..fb84802c 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -94,7 +94,7 @@ Iterations follow the `/ralph` skill's Loop protocol — Survey → Work → Upd
 
 - **Survey starts with the constitution + CLAUDE.md, then the workdir.** Read the constitution for Goal, Fidelity intent, Scope, Quality bar. Skim CLAUDE.md for rules, paper-vs-code disagreements, Open opportunities, and pointers. Then survey the workdir against the **Workdir-as-state** table below to identify the next phase that needs work — and read the most recent artifact critically before extending it.
 - **One phase per iteration is the typical shape.** Don't try to do ARCHITECT *and* SPECIFY in one iteration; the fresh-context property of the next iteration is what makes review work, and conflating phases collapses the seam. (Exceptions: small targeted fixes after COMPARE may touch multiple phases in one iteration if they're tightly coupled.)
-- **Phase reference is your working spec for the iteration.** Whichever phase is next, read its `references/<phase>.md` on entry. That file carries the discipline for that phase's work (what to produce, code-as-canonical, rigor adjustment, etc.).
+- **Phase reference is your working spec for the iteration.** Whichever phase is next, read its `references/<phase>.md` on entry. That file carries the discipline for that phase's work (what to produce, code-as-canonical, evidence shape, etc.).
 - **Read the most recent artifact critically as part of survey.** Every iteration enters fresh and reads the last phase's work cold. If you see real issues, fix them and commit before adding more — that's the review. If nothing needs fixing, advance to the next valuable move. Termination of any phase is implicit: a fresh-context iteration finds nothing to critique in the prior work and moves forward. The iteration that just landed fixes can't also be the iteration that judges the work clean — by construction, it found something to fix.
 - **Parallel fan-out lives inside an iteration.** LITERATURE Haiku quote-finders, SPECIFY per-sub-analysis work, IMPLEMENT per-output work — these fan out as one-level-deep `Agent(...)` spawns inside the iteration's main session. Sub-agents can't spawn sub-agents, but an iteration *is* the main session, so it can spawn freely.
 - **`AskUserQuestion` is not available inside an iteration.** Each iteration runs in a detached tmux session; the user isn't reachable interactively. Iterations append questions to `open-questions.md` with their best-judgment default applied, and the user resolves them at REVIEW close-out (back in their main session).
diff --git a/claude/lightcone/skills/lc-from-paper/references/compare.md b/claude/lightcone/skills/lc-from-paper/references/compare.md
index fc6746cd..2dd64106 100644
--- a/claude/lightcone/skills/lc-from-paper/references/compare.md
+++ b/claude/lightcone/skills/lc-from-paper/references/compare.md
@@ -90,9 +90,9 @@ Each opportunity gets two grades: a **leverage** one-liner (impact if closed) an
 - `at` — closing the gap reaches the intent; further tightening would be gravy.
 - `above` — already past the intent; log it but it doesn't pull on attention.
 
-Read the Goal's fidelity intent prose to make the call. "Figure 3 must be right" + a sketch-level figure 3 systematics = `below`. "Just checking the analysis is tractable" + a canonical-grade outputs block + a sketchy sub-analysis = `above` everywhere except the headline. When intent is silent on something, default to `at` for primary targets, `above` for secondaries.
+Read the Goal's fidelity intent prose to make the call. "Figure 3 must be right" + a rough figure 3 systematics = `below`. "Just checking the analysis is tractable" + a tight outputs block + a rough sub-analysis = `above` everywhere except the headline. When intent is silent on something, default to `at` for primary targets, `above` for secondaries.
 
-Empty `opportunities:` is a strong signal — say "the reproduction is at canonical rigor across the targets" rather than padding.
+Empty `opportunities:` is a strong signal — say "the reproduction reaches the fidelity intent across the targets" rather than padding.
 
 Also write `comparison-report.md` with a human-readable summary. For figure / table comparisons, describe what you see in both and explain your match judgment. Include the opportunity assessment as its own section — group by `relative_to_intent` so the `below` items lead.
 
diff --git a/claude/lightcone/skills/lc-from-paper/templates/constitution.md b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
index b5d1adb7..ef278951 100644
--- a/claude/lightcone/skills/lc-from-paper/templates/constitution.md
+++ b/claude/lightcone/skills/lc-from-paper/templates/constitution.md
@@ -20,7 +20,7 @@ The driving document for the ralph loop reproducing <paper title> (<arXiv ID>, D
 
 ## Quality bar
 
-What "canonical" rigor looks like for *this* paper. The bar that primary-target outputs aim for when the fidelity intent calls for it:
+What the quality bar looks like for *this* paper. The level primary-target outputs aim for when the fidelity intent calls for it:
 
 - <e.g. "BAO fit posteriors match the paper's Figure 4 within 1σ across the full damping prior range">
 - <e.g. "magnitude cuts and selection match the code's defaults exactly; any deviation is recorded as a paper-vs-code disagreement with both options preserved">

From b593fe56ebd8cd676dedd3a10f2c64a1a7f2e9f3 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:42:58 +0200
Subject: [PATCH 117/124] lc-from-code: drop allowed-tools so the agent isn't
 unintentionally limited
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Follows the same call as lc-from-paper (PR #86 review thread on
lc-from-paper line 14): allowed-tools is optional, and pre-declaring
tends to bite when the agent hits a path the author didn't foresee.
Also subsumes Alexandre's "add Bash(uv:*)" suggestion — with the
field gone, uv is available alongside everything else.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-code/SKILL.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-from-code/SKILL.md b/claude/lightcone/skills/lc-from-code/SKILL.md
index a14b5172..b4da467a 100644
--- a/claude/lightcone/skills/lc-from-code/SKILL.md
+++ b/claude/lightcone/skills/lc-from-code/SKILL.md
@@ -1,7 +1,6 @@
 ---
 name: lc-from-code
 description: Bring an existing project into ASTRA / lightcone-cli, starting from the code. Scans the codebase, drafts or augments astra.yaml, parameterizes decisions, and runs until outputs materialize. Triggers on "migrate", "convert", "existing project", "wrap this code", "start from code".
-allowed-tools: Read, Write, Edit, Glob, Grep, Bash(astra:*), Bash(lc:*), Bash(python:*), Bash(pip:*), Bash(git:*), Bash(mkdir:*), Bash(ls:*), Agent, AskUserQuestion
 ---
 
 # /lc-from-code

From de89415e5bfe1d1a6ab57c70de1b63628c554d26 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:43:05 +0200
Subject: [PATCH 118/124] paper-extraction: derive ASTRA_SCHEMA_VERSION from
 installed astra-spec
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The script used to hardcode ASTRA_SCHEMA_VERSION = "0.0.7" with a
"bump when the ASTRA spec version we target changes" comment — easy
to forget. Pull it from importlib.metadata.version("astra-spec")
instead so the stub astra.yaml always stamps the version actually
present in the environment, with a defensive "0.0.0" fallback for
the (unlikely) case astra-spec isn't installed.

Addresses Alexandre's review on extract-paper-substrate.py line 68.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../scripts/extract-paper-substrate.py                | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py b/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
index 623dd1db..ce735e13 100755
--- a/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
+++ b/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
@@ -44,6 +44,7 @@
 import urllib.parse
 import urllib.request
 from difflib import SequenceMatcher
+from importlib.metadata import PackageNotFoundError, version as _pkg_version
 from pathlib import Path
 
 
@@ -65,7 +66,15 @@
     r"autocite|textcite|parencite|footcite|smartcite)\*?"
     r"(?:\[[^\]]*\]){0,2}\{([^}]+)\}"
 )
-ASTRA_SCHEMA_VERSION = "0.0.7"  # bump when the ASTRA spec version we target changes
+# Derived from the installed astra-spec package so the stub `astra.yaml` always
+# stamps the version actually present in the environment — `astra validate` will
+# warn if the analysis declares a version the installed astra-spec can't honour.
+# Falls back to "0.0.0" only if astra-spec isn't importable (defensive — this
+# script ships with lightcone-cli, which depends on astra-spec).
+try:
+    ASTRA_SCHEMA_VERSION = _pkg_version("astra-spec")
+except PackageNotFoundError:
+    ASTRA_SCHEMA_VERSION = "0.0.0"
 
 # Bump when the structural shape of `index.json` changes in a backwards-incompatible
 # way (a new key added is fine; renaming/reshaping an existing value breaks consumers).

From 0433eeaa6ca500c155d4d374666ab3484c687f6c Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:43:12 +0200
Subject: [PATCH 119/124] docs/skills/authoring: stop calling the resync recipe
 "lc update"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`lc update` was removed; docs/cli/update.md is now a tombstone page
that explains the situation and carries the Python heredoc for
resyncing plugin subdirs. authoring.md still linked to it as if
"lc update" were a live command — reword to "Updating an existing
project" so the prose matches reality (link target unchanged).

Addresses Alexandre's review on docs/skills/authoring.md line 96.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/skills/authoring.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/skills/authoring.md b/docs/skills/authoring.md
index c97003b3..5fbfbf72 100644
--- a/docs/skills/authoring.md
+++ b/docs/skills/authoring.md
@@ -93,6 +93,6 @@ from lightcone.eval.cli import run_cmd
 ## Installing changes into an existing project
 
 `lc init` copies the plugin once and refuses to run a second time on
-the same directory. See [`lc update`](../cli/update.md) for the Python
-heredoc that resyncs all the plugin subdirs (`skills`, `agents`,
-`scripts`, `guides`, `templates`) into an existing project.
+the same directory. See [Updating an existing project](../cli/update.md)
+for the Python heredoc that resyncs all the plugin subdirs (`skills`,
+`agents`, `scripts`, `guides`, `templates`) into an existing project.

From 6b754209e90e14def5ccce97ab25920a03853428 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:44:18 +0200
Subject: [PATCH 120/124] paper-extraction: fail loud on missing astra-spec
 instead of falling back

Caught by Cail in review: silent fallback to "0.0.0" hides a real bug
(lightcone-cli depends on astra-spec, so a missing install is a broken
environment, not an edge case). Drop the try/except and let
PackageNotFoundError propagate at import time with its native message.
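
A minimal sketch of the fail-loud behaviour, for illustration only. The
helper name `installed_version` is hypothetical and not part of the
script; the only real API assumed is `importlib.metadata.version`, which
raises `PackageNotFoundError` when no metadata for the named
distribution is found.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str:
    """Return the installed version string of the distribution dist_name.

    Hypothetical helper: there is deliberately no try/except here, so a
    missing distribution surfaces as PackageNotFoundError at the call
    site instead of being masked by a placeholder such as "0.0.0".
    """
    return version(dist_name)


def is_installed(dist_name: str) -> bool:
    """Report whether the lookup above would succeed for dist_name."""
    try:
        installed_version(dist_name)
    except PackageNotFoundError:
        return False
    return True
```

Called once at module level, e.g.
`ASTRA_SCHEMA_VERSION = installed_version("astra-spec")`, a broken
environment would stop the script at import time rather than stamping a
misleading version into the stub.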

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../scripts/extract-paper-substrate.py                | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py b/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
index ce735e13..ce2309ee 100755
--- a/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
+++ b/claude/lightcone/skills/paper-extraction/scripts/extract-paper-substrate.py
@@ -44,7 +44,7 @@
 import urllib.parse
 import urllib.request
 from difflib import SequenceMatcher
-from importlib.metadata import PackageNotFoundError, version as _pkg_version
+from importlib.metadata import version as _pkg_version
 from pathlib import Path
 
 
@@ -69,12 +69,9 @@
 # Derived from the installed astra-spec package so the stub `astra.yaml` always
 # stamps the version actually present in the environment — `astra validate` will
 # warn if the analysis declares a version the installed astra-spec can't honour.
-# Falls back to "0.0.0" only if astra-spec isn't importable (defensive — this
-# script ships with lightcone-cli, which depends on astra-spec).
-try:
-    ASTRA_SCHEMA_VERSION = _pkg_version("astra-spec")
-except PackageNotFoundError:
-    ASTRA_SCHEMA_VERSION = "0.0.0"
+# Let PackageNotFoundError propagate: this script ships with lightcone-cli, which
+# depends on astra-spec, so a missing install is a real bug we want loud.
+ASTRA_SCHEMA_VERSION = _pkg_version("astra-spec")
 
 # Bump when the structural shape of `index.json` changes in a backwards-incompatible
 # way (a new key added is fine; renaming/reshaping an existing value breaks consumers).

From 09d1af08ead4a72314df9dc72913caeb37dce351 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:53:07 +0200
Subject: [PATCH 121/124] skills: drop allowed-tools from paper-extraction,
 check-sentence-by-sentence, figure-comparison, lc-feedback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Same principle Cail articulated on the lc-from-paper review thread:
allowed-tools is optional, and pre-declaring tends to bite when the
agent hits a path the author didn't foresee (a blocked agent can even
hallucinate tool output). None of these four had safety/security
reasoning behind their allowlist — just conventional intent, which is
already communicated in each skill's prose:

  - paper-extraction: was already Bash + Write unscoped, so the
    allowlist was effectively unrestricted anyway.
  - check-sentence-by-sentence: read-only-ness is stated in the body;
    the narrow Bash list could bite if the agent reaches for diff,
    head, sed, etc.
  - figure-comparison: python:*/base64:* would block magick or other
    image tools the agent might reasonably need.
  - lc-feedback: gh+python+uname is "just enough" by accident — a bug
    report might want Read for CLAUDE.md or Bash(git:*) for branch sha.

lc-new keeps its allowed-tools per Cail's preference (the spec-only
Write restriction encodes intentional separation from implement-time
skills, and it's been working).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/check-sentence-by-sentence/SKILL.md | 1 -
 claude/lightcone/skills/figure-comparison/SKILL.md          | 1 -
 claude/lightcone/skills/lc-feedback/SKILL.md                | 1 -
 claude/lightcone/skills/paper-extraction/SKILL.md           | 1 -
 4 files changed, 4 deletions(-)

diff --git a/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md b/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
index 1eeeaba5..06c0211a 100644
--- a/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
+++ b/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md
@@ -7,7 +7,6 @@ description: >
   mark NOT FOUND. Only the user can invoke this skill, though this skill can be suggested for the user to invoke during paper reproduction. Other skills may mention this skill as an optional follow-up, but should not invoke it themselves. Run from the project folder containing astra.yaml. In lc-from-paper projects, read paper sources from
   work/reference/: prefer arXiv TeX under work/reference/source/, fall back to
   Docling/Pandoc markdown at work/reference/document.md.
-allowed-tools: Read, Glob, Grep, Bash(ls:*), Bash(wc:*), Bash(grep:*), Bash(find:*), AskUserQuestion, Agent
 argument-hint: "[path to paper source, e.g. work/reference/source/main.tex or work/reference/document.md]"
 ---
 
diff --git a/claude/lightcone/skills/figure-comparison/SKILL.md b/claude/lightcone/skills/figure-comparison/SKILL.md
index e9638ce2..0c5247ce 100644
--- a/claude/lightcone/skills/figure-comparison/SKILL.md
+++ b/claude/lightcone/skills/figure-comparison/SKILL.md
@@ -11,7 +11,6 @@ description: >
   "compare results", "side-by-side comparison", "build comparison HTML", or
   "did we reproduce the paper". Run from the project folder containing
   astra.yaml.
-allowed-tools: Read, Write, Glob, Grep, Bash(ls:*), Bash(wc:*), Bash(grep:*), Bash(find:*), Bash(file:*), Bash(python3:*), Bash(python:*), Bash(base64:*), AskUserQuestion, Agent
 argument-hint: "[path to paper reference dir, e.g. work/reference/]"
 ---
 
diff --git a/claude/lightcone/skills/lc-feedback/SKILL.md b/claude/lightcone/skills/lc-feedback/SKILL.md
index 7cf6f09c..23cf7b10 100644
--- a/claude/lightcone/skills/lc-feedback/SKILL.md
+++ b/claude/lightcone/skills/lc-feedback/SKILL.md
@@ -3,7 +3,6 @@ name: lc-feedback
 description: >
   File a bug report from the current session. Use when something breaks:
   /lc-feedback <description of what went wrong>
-allowed-tools: Bash(gh:*), Bash(python:*), Bash(uname:*), AskUserQuestion
 argument-hint: "<what went wrong>"
 ---
 
diff --git a/claude/lightcone/skills/paper-extraction/SKILL.md b/claude/lightcone/skills/paper-extraction/SKILL.md
index 44cef06b..dd4dce32 100644
--- a/claude/lightcone/skills/paper-extraction/SKILL.md
+++ b/claude/lightcone/skills/paper-extraction/SKILL.md
@@ -13,7 +13,6 @@ description: >
   for the semantic surface. Triggers on: "read paper", "prep paper",
   "ingest paper", "extract paper", "set up paper", "fetch arxiv", "arxiv
   id", "DOI", "find paper", or `/paper-extraction <id>`.
-allowed-tools: Read, Write, Edit, Bash, Grep, Glob, WebFetch, WebSearch
 ---
 
 # paper-extraction

From 20c708bf4b9f3673f9af8b7aaea697a702d53ece Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Wed, 13 May 2026 23:54:27 +0200
Subject: [PATCH 122/124] lc-cli: drop lc eval from the agent-facing CLI
 reference
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

lc eval is the internal harness used to measure how well Claude
navigates the lightcone-cli build loop — it spawns Daytona sandboxes,
builds wheels, runs Claude against seed tasks, and grades the outputs.
The 'eval' extra pulls in heavy deps (anthropic, daytona-sdk,
python-dotenv, build) that aren't useful to an analysis-running agent.

Surfacing it in lc-cli (the reference an agent loads while doing real
analysis work) is just noise. Whether lc eval should ship at launch at
all is a separate question, tracked in fiber
[[lightcone/paper2astra-as-skill/launch-cluster/eval-ship-decision]]
and lightcone-cli#131. That question is orthogonal: even if eval stays
for internal/CI use, it doesn't belong in the agent CLI reference.

Addresses Alexandre's review on lc-cli/SKILL.md line 28.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-cli/SKILL.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/claude/lightcone/skills/lc-cli/SKILL.md b/claude/lightcone/skills/lc-cli/SKILL.md
index 6192ea2a..66e9b643 100644
--- a/claude/lightcone/skills/lc-cli/SKILL.md
+++ b/claude/lightcone/skills/lc-cli/SKILL.md
@@ -25,7 +25,6 @@ lc build [--force] [--runtime docker]                             # Build contai
 lc status [--universe NAME] [--json]                              # Materialization status (text or JSON)
 lc verify [--universe NAME]                                       # Recompute hashes and walk the provenance chain
 lc export wrroc [--output PATH] [--universe NAME] [--zip] [--metadata-only] [--author "NAME <EMAIL>"]  # Export Workflow Run RO-Crate bundle
-lc eval {run,report,compare}                                      # Run/inspect eval suites (requires the 'eval' extra)
 ```
 
 `lc run` is quiet by default — pass `--verbose` to see worker output. `--scratch` is only relevant on HPC sites where `$HOME` doesn't honor `flock` (NERSC etc.); it redirects Snakemake state and Dask spill onto the named filesystem.

From 01cb01409e7665d855fbfd104cd5c3bb06742065 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Thu, 14 May 2026 00:01:32 +0200
Subject: [PATCH 123/124] docs/glossary: refresh stale Ralph loop entry
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

/cancel-ralph and .claude/ralph-loop.local.md are both stale: neither
exists in the codebase anymore. The current mechanics are a tmux runner
with the constitution as system prompt. To cancel, set status: closed
in the constitution frontmatter (the next iteration sees it and exits)
or kill the tmux session.
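
A rough sketch of the stop check described above, for illustration
only. The function name and the hand-rolled frontmatter parsing are
assumptions, not the bundled runner's actual code:

```python
import re


def loop_is_closed(constitution_text: str) -> bool:
    """Return True when the leading frontmatter block sets `status: closed`."""
    # Assumption: frontmatter is the block between the first two `---` lines.
    match = re.match(r"\A---\s*\n(.*?)\n---\s*(\n|\Z)", constitution_text, re.DOTALL)
    if match is None:
        return False  # no frontmatter, so nothing asks the loop to stop
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip().strip("'\"") == "closed"
    return False


open_doc = "---\nstatus: open\n---\n# Goal\n"
closed_doc = "---\nstatus: closed\n---\n# Goal\n"
print(loop_is_closed(open_doc), loop_is_closed(closed_doc))  # prints: False True
```

A runner built this way would call the check at the top of each
iteration and exit before spawning a worker when it returns True.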

Addresses Alexandre's review on docs/user/glossary.md line 200.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/user/glossary.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/user/glossary.md b/docs/user/glossary.md
index 4515139f..529bfd1d 100644
--- a/docs/user/glossary.md
+++ b/docs/user/glossary.md
@@ -194,10 +194,12 @@ The three labels `lc verify` produces when something's wrong:
 
 A reusable autonomous iteration pattern for long-running agent work.
 Each iteration surveys state, decides what to do next, writes or runs
-code, commits, and exits. The Claude Code stop hook can re-inject the
-loop prompt until the agent emits its completion signal or hits an
-iteration limit. State persists across crashes in
-`.claude/ralph-loop.local.md`. Cancel with `/cancel-ralph`.
+code, commits, and exits. A bundled tmux runner spawns a fresh worker
+per iteration with the *constitution* — a markdown file describing what
+"done" looks like — as system prompt; the constitution stays editable
+across iterations. Stop the loop by setting `status: closed` in the
+constitution's frontmatter (the next iteration sees it and exits) or by
+killing the tmux session.
 
 ## Permission tier
 

From 811280dbbcb17f4366cbc713ed87a2289ddc3c88 Mon Sep 17 00:00:00 2001
From: Cail Daley <cailmdaley@gmail.com>
Date: Thu, 14 May 2026 00:11:33 +0200
Subject: [PATCH 124/124] lc-from-paper: mandate an explicit user-approval gate
 before launching ralph
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Caught during Cail's dogfood test: after constitution.md + CLAUDE.md
were drafted, the loop launched immediately with no opportunity to
review. The old Stage-7 prose ("Show drafts, take corrections,
refine, save. When the user approves: ...") didn't *mandate* a halt;
it read as a one-shot announcement followed by launch.

This is the only chance the user gets to shape the reproduction
before it goes autonomous — every subsequent iteration consumes the
constitution as system prompt and CLAUDE.md as walk-up. Missing the
gate means the user reviews the entire reproduction post-hoc at
REVIEW, with no editorial pass at the framing-and-scope level.

Three reinforcements:
  - SKILL.md Stage 7 stage description: "Halt for explicit user
    approval, then commit, then launch. ... silence is not approval."
  - Prose under the deliverables block: gate on AskUserQuestion and
    surface any open questions of your own at the same exchange
    (questions held back are much harder to raise once iterations
    are running cold).
  - New anti-pattern at the top of the list: "Auto-launching the
    ralph loop without an explicit user-approval gate."

orient.md Stage 7 expanded to mirror, with concrete AskUserQuestion
option suggestions ("Looks good - commit and launch", "I want to edit
first", "I have feedback").
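
The gate semantics can be sketched as below. The enum and function
names are hypothetical; the option labels are the ones suggested for
AskUserQuestion, and None models a missing reply:

```python
from enum import Enum
from typing import Optional


class Answer(Enum):
    # Hypothetical encoding of the three suggested options.
    APPROVE = "Looks good - commit and launch"
    EDIT = "I want to edit first"
    FEEDBACK = "I have feedback"


def may_launch(answer: Optional[Answer]) -> bool:
    """Only an explicit approval unlocks commit + launch."""
    # None (silence), EDIT and FEEDBACK all keep the loop halted.
    return answer is Answer.APPROVE


print(may_launch(None), may_launch(Answer.FEEDBACK), may_launch(Answer.APPROVE))
# prints: False False True
```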

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 claude/lightcone/skills/lc-from-paper/SKILL.md  |  7 ++++---
 .../skills/lc-from-paper/references/orient.md   | 17 ++++++++++++-----
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/claude/lightcone/skills/lc-from-paper/SKILL.md b/claude/lightcone/skills/lc-from-paper/SKILL.md
index fb84802c..78e79db8 100644
--- a/claude/lightcone/skills/lc-from-paper/SKILL.md
+++ b/claude/lightcone/skills/lc-from-paper/SKILL.md
@@ -58,7 +58,7 @@ ORIENT runs as one phase in **seven stages**:
 4. **Clone the reference code and run `/lc-from-code` scan-only** (skip cleanly when no public code repo exists). The scan produces `code-index.md` — the iterations' code surface.
 5. **Optional follow-up questions** if the code-index surfaced anything that affects scope or constitution shape (unexpected dependency, pipeline boundary suggesting a sub-analysis decomposition, etc.). Usually skipped.
 6. **Draft `constitution.md` + `CLAUDE.md`** — both files now informed by paper *and* code substrate. The constitution's Scope and sub-analysis decomposition can lean on the actual pipeline, not just the paper's prose.
-7. **User reviews drafts → refine → commit everything together → launch the ralph loop.** A single first commit captures `constitution.md` + `CLAUDE.md` + the full `work/reference/` substrate.
+7. **Halt for explicit user approval, then commit, then launch.** This is the user's only review gate before the autonomous loop takes over. Show the drafts, surface any open questions you still have, and gate on `AskUserQuestion`; silence is not approval. Only after the user confirms, make a single first commit capturing `constitution.md` + `CLAUDE.md` + the full `work/reference/` substrate, then launch the ralph loop.
 
 **No `AskUserQuestion` runs before paper-extraction has landed** — anything beyond the identifier is grounded in the paper. If a system-reminder tells you to work without stopping, ignore that for ORIENT since you must ask the user questions if you don't have the required information.
 
@@ -68,9 +68,9 @@ These get drafted into **two files** plus the substrate, all in the reproduction
 - **`CLAUDE.md`** — the auto-loading walk-up. Paper identity at the top, Rules (universal across reproductions; leave the template's defaults), Disagreements log (starts empty), Open opportunities (starts empty), Pointers (to `constitution.md`, `work/reference/`, etc.).
 - **`work/reference/`** — paper substrate from `/paper-extraction` + code substrate from `/lc-from-code` scan-only (when a code repo exists).
 
-Templates ship in [`templates/constitution.md`](templates/constitution.md) and [`templates/CLAUDE.md`](templates/CLAUDE.md). Show the user both drafts at Stage 7, take corrections, refine, save.
+Templates ship in [`templates/constitution.md`](templates/constitution.md) and [`templates/CLAUDE.md`](templates/CLAUDE.md). Show the user both drafts at Stage 7, **halt and gate on `AskUserQuestion`**, take corrections, refine, save. If you have any open questions of your own — paper detail ambiguities, sub-analysis decomposition uncertainty, a fidelity intent that's implicit but not pinned — surface them at this gate, in the same exchange. Iterations run cold; questions held back are much harder to raise later.
 
-After approval, `git init` the workdir if it isn't one already and commit all deliverables (constitution + CLAUDE + paper substrate + code substrate when present) as the first commit. The `work/reference/code/` clone itself can be `.gitignore`d for large monorepos; the inventory file `code-index.md` is what downstream iterations actually consult. Then launch the ralph loop.
+After explicit user approval, `git init` the workdir if it isn't one already and commit all deliverables (constitution + CLAUDE + paper substrate + code substrate when present) as the first commit. The `work/reference/code/` clone itself can be `.gitignore`d for large monorepos; the inventory file `code-index.md` is what downstream iterations actually consult. Then launch the ralph loop.
 
 ## Launching the loop
 
@@ -155,6 +155,7 @@ When the user walks back into a workdir that already has artifacts:
 
 ## Anti-patterns
 
+- **Auto-launching the ralph loop without an explicit user-approval gate.** Stage 7 halts. The user sees `constitution.md` + `CLAUDE.md` only once before every fresh iteration starts consuming them; "drafts written → launch" skips the one editorial pass that shapes the entire reproduction. Gate on `AskUserQuestion`; treat silence as not-yet-approved.
 - **Spawning a "loop manager" sub-agent inside your main session.** The whole point of the ralph loop is fresh per-iteration context; you launch the loop, the loop runs detached, you come back when it's done. No nested orchestrator.
 - **Doing the long middle in your main session instead of launching the loop.** ORIENT belongs in your session; ARCHITECT through COMPARE belong in the loop. Doing phase work in your main session burns context that doesn't get reset; the loop exists precisely to give each phase fresh context.
 - **Asking an iteration to use `AskUserQuestion`.** Iterations run detached. Surface questions to `open-questions.md` with a default applied; the user resolves at REVIEW.
diff --git a/claude/lightcone/skills/lc-from-paper/references/orient.md b/claude/lightcone/skills/lc-from-paper/references/orient.md
index 9b4edb5d..0c6db49c 100644
--- a/claude/lightcone/skills/lc-from-paper/references/orient.md
+++ b/claude/lightcone/skills/lc-from-paper/references/orient.md
@@ -180,12 +180,19 @@ Open both templates side-by-side:
 
 ### Stage 7 — User review, refine, commit, launch
 
-Show both drafts to the user, take corrections, refine, save. When the user approves:
+**Halt here for explicit user approval.** This is the user's only review point before the autonomous loop takes over; treat it as the final author-mode editorial pass. Do not commit or launch the ralph loop until the user explicitly confirms — silence is not approval.
 
-1. `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline).
-2. Commit `constitution.md` + `CLAUDE.md` + the full `work/reference/` substrate (paper + code, when code present) as the first commit. A single commit captures the full ORIENT deliverable.
-3. The `work/reference/code/` clone itself can be `.gitignore`d for large monorepos; `code-index.md` is what downstream iterations actually consult. The clone is reproducible from `code-status.yaml`'s URL.
-4. Launch the ralph loop per SKILL.md's *Launching the loop* section.
+1. **Show the drafts.** Point the user at `constitution.md` and `CLAUDE.md` (file paths plus a brief inline summary of what each carries: Goal / Fidelity intent / Scope / Quality bar / Evidence for the constitution; paper header + Pointers for `CLAUDE.md`). The user reads the actual files; don't paste the full bodies inline.
+
+2. **Surface any open questions you have at this gate.** If a paper detail is ambiguous, a scope choice didn't fully resolve in Stages 3–5, a sub-analysis decomposition is uncertain, or a fidelity intent is implicit but not pinned — ask now, in this same exchange, *before* the loop launches. Each ralph iteration runs cold from `constitution.md` + `CLAUDE.md`; an open question held back here is much harder to raise later.
+
+3. **Gate on `AskUserQuestion`.** Offer options like "Looks good — commit and launch", "I want to edit first" (point them at the file paths), "I have feedback" (collect, refine, re-show, gate again). The launch decision waits on this answer.
+
+4. **When the user approves:**
+   - `git init` the workdir if it isn't one already (per SKILL.md's *Setup: git-tracked workdir* discipline).
+   - Commit `constitution.md` + `CLAUDE.md` + the full `work/reference/` substrate (paper + code, when code present) as the first commit. A single commit captures the full ORIENT deliverable.
+   - The `work/reference/code/` clone itself can be `.gitignore`d for large monorepos; `code-index.md` is what downstream iterations actually consult. The clone is reproducible from `code-status.yaml`'s URL.
+   - Launch the ralph loop per SKILL.md's *Launching the loop* section.
 
 Tell the user the tmux session name and the attach command, and that you'll be ready for REVIEW close-out when the loop terminates.