From b0b9fb037f1365488ecee86419c086c0113e885c Mon Sep 17 00:00:00 2001 From: Cail Daley Date: Thu, 30 Apr 2026 09:48:14 +0200 Subject: [PATCH 001/124] Add /narrative skill (ported from lightcone-ui#10) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Authors or revises the `narrative:` prose inside an ASTRA analysis (`astra.yaml` and its sub-analyses) plus decision `rationale:` fields. Five fixed keys at each scale (`summary`, `findings`, `methods`, `inputs`, `outputs`) with three working modes: - paper-reproduction (production-ready) - existing-analysis retrofit (under development) - interactive in-flight authoring (under development) Originally landed on lightcone-ui#10; relocated to lightcone-cli per Liam's request — skills live alongside `lc-new`, `lc-build`, `lc-verify`, `lc-migrate`, `lc-feedback` in `claude/lightcone/skills/`. Self-contained: all references are intra-skill (`references/*.md`); no cross-skill or guides imports needed. Co-Authored-By: Claude Opus 4.7 (1M context) Signed-off-by: Cail Daley --- claude/lightcone/skills/narrative/SKILL.md | 399 ++++++++++++++++++ .../narrative/references/existing-analysis.md | 172 ++++++++ .../narrative/references/interactive.md | 184 ++++++++ .../references/paper-reproduction.md | 221 ++++++++++ 4 files changed, 976 insertions(+) create mode 100644 claude/lightcone/skills/narrative/SKILL.md create mode 100644 claude/lightcone/skills/narrative/references/existing-analysis.md create mode 100644 claude/lightcone/skills/narrative/references/interactive.md create mode 100644 claude/lightcone/skills/narrative/references/paper-reproduction.md diff --git a/claude/lightcone/skills/narrative/SKILL.md b/claude/lightcone/skills/narrative/SKILL.md new file mode 100644 index 00000000..e08d3f1b --- /dev/null +++ b/claude/lightcone/skills/narrative/SKILL.md @@ -0,0 +1,399 @@ +--- +name: narrative +description: > + Author or revise the `narrative:` prose inside an ASTRA analysis + 
(`astra.yaml` and its sub-analyses) plus decision `rationale:` fields. + Five fixed keys at each scale (`summary`, `findings`, `methods`, + `inputs`, `outputs`). Three working modes — paper reproduction + (ready), existing-analysis retrofit (under development), and + interactive in-flight authoring (under development). Use when the + `narrative:` block is empty or stub, when a decision needs a + `rationale:`, when a sub-analysis needs its own narrative, or when + revising existing prose. Triggers on "narrative", "draft the + narrative", "narrate this analysis", "narrate this sub-analysis", + "rationale for this decision", "write the summary", or any request + for reader-facing prose keyed off an astra.yaml. +--- + +# narrative + +## What this skill writes + +One field: `narrative:` on an analysis or sub-analysis, or `rationale:` on a decision. +Per-element prose (what each `Input`, `Output`, `Decision`, `Option`, or `Insight` is and why it matters) lives on those elements' own `description` / `rationale` / `notes` fields. +`narrative` is the analysis-level story that weaves the pieces together. + +## What a narrative is + +Science, from a single decision to a review paper, is a practice of +engaging with previous work and telling the story of what was tried +and what it means. Any honest account does three things. + +**Grounding.** Where the work sits — state of the field, open +questions, prior work it responds to, upstream decisions that shape +its choices. Tells the reader why before the work shows its own +value. May foreshadow findings. + +**Movement of learning.** Not the tidied retrospective ("we did X, +obtained Y") but traces of the process: what was tried, what failed, +what forced a step back. The best papers convey this; most compress +it away under length pressure. ASTRA's telescoping makes it cheap — +a sentence at the top about global-vs-per-object PSF leakage, one +level down where the nerd gets the two pages on how the team got +there. 
Papers don't have this affordance and so compress iteration +away; ASTRA does, and authors should spend it. + +**Implications.** What the results mean and where they point. +Results are facts; what they do to the field is the argument. +Forward-look matters even when unformed — that is where science +passes the baton. + +A narrative that does all three at the appropriate scale is honest. +One that presents only results and methods elides the meaning-making. + +The three phases repeat at every scale. A top-level analysis +narrates them across five keys (`summary`, `methods`, `findings`, +`inputs`, `outputs`); a sub-analysis does the same; a decision +narrates in one paragraph of `rationale:`. The telescope gives the +reader a short view at their current depth and the option to drill +in — without exploding the parent. + +## Length as forcing function + +1–3 paragraphs per key, at any level. + +Length is the mechanism that keeps analyses modular, not a style +preference. If the references don't fit in three paragraphs, the +analysis is too big — split it. The narrative is a compressor; if +it won't compress, split the thing being compressed. + +## What this prose is for + +ASTRA preserves the decision structure that papers compress into +linear argument; the narrative keeps that structure legible. Three +consequences: + +- **Not wiki, not paper.** A wiki page summarizes ("BAO is the + baryon acoustic oscillation feature"); a paper compresses ("we + chose the Gaussian prior"). An ASTRA narrative **points into + reasoning** — it names the load-bearing decision, anchors to the + structured node that records it, and lets the reader follow. The + prose does not re-explain the field or re-list the spec. +- **Read and queried.** The narrative is consumed by human readers + *and* by agent retrievers. Anchor coverage and clarity are + substrate, not style — an uncited decision is invisible to both + readings. 
+- **Asymmetric load.** The three phases don't map onto ASTRA's + structure evenly. Movement-of-learning has strong structural + support — `decisions`, `options`, `prior_insights`, the + sub-analysis DAG — and `methods` condenses what structure already + carries. Grounding has partial support at the decision site; + implications have none. On those two phases, the narrative is the + reader's only access — carry just enough, and err toward brevity + and certainty. + +## Pick a mode first + +**Paper reproduction is production-ready. Retrofit and interactive +are under active development — their references are working drafts.** + +Three modes. Read the matching reference file in full before drafting. + +| Mode | Reference | Status | When | +|---|---|---|---| +| **Paper reproduction** | [`references/paper-reproduction.md`](references/paper-reproduction.md) | **Ready.** | A published paper exists and the analysis mirrors it. Primarily in-house Lightcone work (DESI BAO and similar) plus end users bringing a paper to reproduce. Covers paper sourcing (arXiv LaTeX preferred), paper→ASTRA mapping, voice seams, fidelity rules. | +| **Existing-analysis retrofit** | [`references/existing-analysis.md`](references/existing-analysis.md) | Under development. | Code, results, or an in-flight project being imported into ASTRA with no source paper. Archaeological work: triage, reconstruction of intent, gaps where the record is silent. | +| **Interactive (in-flight research)** | [`references/interactive.md`](references/interactive.md) | Under development. | New research being done now; the narrative drafted alongside the work. Provisional voice, ask-first discipline. | + +If unsure which applies, confirm with the user via `AskUserQuestion`. + +The rest of this file is the **mode-independent substrate** every +reference relies on. 
+ +--- + +## Narrate what you declare + +The five keys are schema-optional, but `astra validate` applies a +**conditional requirement** — a section must hold non-empty prose +when the corresponding structured data exists on the Analysis node. + +| Key | Required when | +|---|---| +| `findings` | `Analysis.findings` has entries | +| `methods` | `Analysis.decisions` or `Analysis.analyses` has entries | +| `inputs` | `Analysis.inputs` has entries | +| `outputs` | `Analysis.outputs` has entries | +| `summary` | always optional (no structured counterpart) | + +Three consequences worth internalizing: + +- **A stub analysis with only `summary` is valid.** Use that for + stage-zero scoping. +- **Don't write a `findings` key before findings are declared.** If the + spec's `findings:` list is empty, the narrative's `findings` key + should not appear — adding prose about findings that don't exist is + fiction. +- **`summary` is the one key without a structural peer.** It's the + "question, scope, orientation" key — the only place prose stands + alone, not framing something structural. + +--- + +## The spec renders alongside the narrative + +ASTRA's structural content — decisions, findings, inputs, outputs, +sub-analyses, options — surfaces alongside the narrative. Structural +peers will be presented; **prose does not duplicate them.** An +abstract does not list every methods subsection; a methods section +does not re-state every appendix equation. Prose assumes its +structural peers exist and focuses on argument. + +Applied to the five keys: + +- `summary` **orients** — question, scope, headline shape. +- `methods` **walks the pipeline**, citing each decision and + sub-analysis by anchor where they appear. Movement-of-learning + lives here. +- `findings` **synthesizes** — each finding cited by anchor as part of + the argument, not an enumeration. +- `inputs` **names provenance**. +- `outputs` **names what was promoted and why**, citing each by anchor. 
+- Decision `rationale:` **names why the default won**. + +--- + +## Anchor coverage + +`astra validate` checks: + +- **Broken references** → error. Anchor doesn't resolve to a real id. +- **Uncited declared elements** → warning. Every declared finding, + decision, output, and sub-analysis must be cited somewhere in the + narrative tree. + +If a declared element is genuinely not worth a prose mention, consider +whether it should be declared at all. + +--- + +## User presence + +Multi-turn back-and-forth → user present; use `AskUserQuestion` to +clarify mode, scale, and reproduction-vs-extension before drafting. +Single-shot or pipeline invocation → autonomous; make the reasonable +default inference and note it inline in the narrative. Ambiguous → +treat the user as present and ask. + +--- + +## Phase → key mapping + +The three phases (see top) map onto the five keys unevenly: + +| Key | Dominant phase | +|---|---| +| `summary` | all three, telescoped | +| `findings` | implications | +| `inputs` | grounding | +| `methods` | movement of learning | +| `outputs` | structural; phase-thin | + +There is no `discussion` key. Implications distribute into `summary` +and `findings`. + +--- + +## Anchor syntax + +Markdown link syntax, `#`-target, **tree-path-first**. + +| Target | Anchor | +|---|---| +| Input | `#inputs.<id>` | +| Output | `#outputs.<id>` | +| Decision | `#decisions.<id>` | +| Option within a decision | `#decisions.<id>.options.<option_id>` | +| Finding | `#findings.<id>` | +| Prior insight | `#prior_insights.<id>` | +| Sub-analysis (whole node) | `#analyses.<id>` | +| Element inside sub-analysis | `#<sub_analysis_id>.<category>.<id>` (e.g. `#reconstruction.decisions.algorithm`) | +| Parent scope (from a sub-analysis) | `#../decisions.<id>` | + +Note the sub-analysis form: **sub-analysis first, then category**. +`#reconstruction.decisions.algorithm`, not `#decisions.reconstruction.algorithm`. +References are interpreted **relative to the hosting analysis**; use +`../` to escape to parent scope (matches decision `from_ref` syntax).
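A quick way to see which anchors a narrative already cites is to pull the link targets out of the file. The sketch below assumes anchors are written as ordinary markdown links, as in the table above. `list_anchors` is a hypothetical helper name, not part of the `astra` CLI, and it does no resolution against the spec; that remains the job of `astra validate`.

```sh
# Hypothetical helper (not part of the astra CLI): print each distinct
# anchor target cited in a file, one per line, without the leading '#'.
# Assumes anchors are markdown links of the form [text](#target).
list_anchors() {
  grep -oE ']\(#[^)]+\)' "$1" | sed -E 's/^]\(#//; s/\)$//' | sort -u
}
```

Running `list_anchors astra.yaml` and comparing the result against the declared ids gives a rough preview of what the uncited-element warning is likely to flag. It is only a text match, so a target that appears in a comment or a code sample is listed too.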
+ +Rules: + +- Anchor text is authored prose, **not** the raw id. +- Inline refs do the work of a citation; don't footnote or parenthesize. +- One ref per idea. Stacking three on a sentence means the sentence + carries too much. +- Findings cannot currently appear in `decisions.options.insights` + (see [astra-spec#16](https://github.com/LightconeResearch/astra-spec/issues/16)). + When a finding motivates a decision, cite it from the decision's + `rationale:` prose. + +--- + +## Reserved entity names + +These names cannot be used as entity IDs (they collide with the +anchor grammar): `inputs`, `outputs`, `decisions`, `findings`, +`prior_insights`, `analyses`, `options`, `content`, `narrative`. + +If you find an entity using one (legacy spec), flag it; the authoring +tooling and validator will reject it. + +--- + +## Linking relationships: structural vs narrative + +| Relationship | Structural | Narrative | +|---|---|---| +| Prior insight → decision option | `decisions.<id>.options.<option_id>.insights: [ids]` | inline in `methods` when the decision is discussed | +| Finding → output | `findings.<id>.evidence` → `outputs.<id>` | inline in `findings` | +| Finding → decision | *no structural link yet* (#16) | inline in decision's `rationale:` | +| Decision → decision | `decisions.<id>.from: <other_id>` or `from: ../decisions.<id>` | inline in the inheriting decision's `rationale:` | + +If a relationship is structural, don't duplicate it in prose; cite +it by anchor. + +--- + +## Self-contained example + +A minimal (not necessarily valid) sketch showing how the blocks fit +together. The point is the *shape*. + +```yaml +id: example_analysis +version: "0.1.0" +name: "Example analysis" + +narrative: + summary: | + We measure in . The feature is + [detected at high significance](#findings.headline_detection) and + [exceeds prior precision by 1.2×](#findings.precision_improvement), + with [an anomalous feature at ](#findings.anomaly) + motivating follow-up.
+ + inputs: | + Primary data are [the ](#inputs.primary_data); validation + uses [](#inputs.validation_mocks). + + methods: | + The pipeline runs in two stages. + [Preparation](#analyses.preparation) ingests the raw catalog and + produces [cleaned two-point statistics + ](#preparation.outputs.clean_stats). [Fitting + ](#analyses.fitting) consumes those statistics and fits model + parameters. Both stages inherit the parent's + [fiducial cosmology](#decisions.fiducial_cosmology) so the + distance-redshift relation is used end-to-end. + + findings: | + Three findings constitute the result: a + [headline detection](#findings.headline_detection), a + [precision comparison with prior work + ](#findings.precision_improvement), and + [an anomalous feature](#findings.anomaly). The anomaly is the + most-discussed qualitative feature. + + outputs: | + Two artifacts are promoted to the top level: + [the final measurement table](#outputs.final_table) and + [the headline figure](#outputs.headline_figure), both produced by + [fitting](#analyses.fitting). + +decisions: + fiducial_cosmology: + label: "Fiducial cosmology" + rationale: | + Planck 2018-ΛCDM is the community reference; distance-redshift + conversion is downstream of this choice, and fixing it lets + results be compared directly to prior measurements. Inherited + by [fitting](#analyses.fitting) so the end-to-end chain uses one + distance scale. + default: planck2018 + options: + planck2018: + label: "Planck 2018-ΛCDM" + wmap9: + label: "WMAP9" + excluded_reason: "Superseded; no longer the community reference." +``` + +What to notice: + +- Anchor text is prose, not an id. +- `methods` uses the sub-analysis-first form + (`#preparation.outputs.clean_stats`) for cross-scope refs. +- `findings` synthesizes how three findings relate; each cited by + anchor, not recited. +- `outputs` is thin — two sentences. 
+- Decision rationale cites a sub-analysis by anchor when the choice + propagates, and says why the default won without enumerating options. + +For a canonical reproduction narrative in context, see +`Reproductions/DESI/desi-dr1-bao/astra.yaml` in +`LightconeResearch/Reproductions`. + +--- + +## Craft + +- **Economy.** Every sentence introduces a new idea or sharpens an + existing one. Release real verbs: `conducted cross-correlation` → + `cross-correlated`. +- **Epistemic honesty.** Hedges carry information about certainty. + "This suggests" reflects real uncertainty; "may perhaps indicate" is + decorative. +- **Show, don't label.** Describe the tension; don't announce it. Cut + signposting: "the key insight is," "importantly," "it is worth + noting." +- **Specificity.** Names, numbers, references over generic claims. +- **Arrive through content.** No "in this analysis we will describe…"; + the content is the opening. + +--- + +## Anti-patterns (mode-independent) + +- **Narrative-per-element.** Writing `narrative:` on findings, inputs, + outputs, or insights. The five-key analysis narrative is the only + home; per-element prose is `description` / `rationale` / `notes`. +- **Results-only narrative.** Methods without movement-of-learning + elides the meaning-making. At minimum, name one pivot or abandoned + option per scale. +- **Decision-list paragraph.** "We made the following decisions: A, + B, C." Cite each decision where it shapes the pipeline, not as + recitation. Too many to weave coherently → the spec wants more + sub-analyses. +- **Wiki-style what-is framing.** "BAO is the baryon acoustic + oscillation feature." A wiki summarizes; an ASTRA narrative points + into reasoning. Replace with "we chose the Gaussian BAO damping + prior over flat because flat admitted spurious minima" — with the + anchor. Applies to every key. +- **`summary` as primer.** Teaching what the field is. Readers arrive + with context. + +--- + +## Lint + +1. 
`astra validate ` — catches broken anchors, schema + violations, uncited declared elements. +2. Paragraph count per key — flag anything over three. +3. Only conditionally-required keys present — if `findings:` is + empty, `narrative.findings` is absent. + +--- + +## Now read the mode reference + +Before drafting, open the reference file that matches the user's +situation. diff --git a/claude/lightcone/skills/narrative/references/existing-analysis.md b/claude/lightcone/skills/narrative/references/existing-analysis.md new file mode 100644 index 00000000..765f525c --- /dev/null +++ b/claude/lightcone/skills/narrative/references/existing-analysis.md @@ -0,0 +1,172 @@ +# Existing-analysis retrofit mode + +> **Status: under development.** This mode is scaffolded but not yet +> production-ready. The workflow below is a working draft — treat it +> as a starting point, not a locked spec. For the production-ready +> path, use paper reproduction mode if applicable. Report friction +> back so this reference can firm up. + +A project has been running — with code, results, a working directory, +possibly a partial spec — and is being imported into ASTRA. There is +no published paper; the narrative is being built from artifacts, not +reconstructed from prose. + +Read the main SKILL.md first. This file adds what's specific to +retrofit. + +Retrofit is distinct from paper reproduction (there is no source +narrative to reconstruct) and from interactive authoring (the work is +already done, or at least substantially done, rather than in flight). +The core move is **archaeology**: classifying what's live, harvesting +intent from whatever artifacts carry it, marking gaps where the record +is silent. + +## Workflow + +### 1 · Triage + +Before writing a single sentence, classify the project's contents. 
+ +Go through `astra.yaml` and each sub-analysis and mark: + +- **live** — current, active, still used downstream +- **superseded** — kept in the spec for record, but no longer what's + actually run +- **abandoned** — tried and dropped; may or may not belong in the + narrative as movement-of-learning +- **unclear** — decision or finding with no documentation; the + original rationale is not recoverable from the spec alone + +Produce this as a short summary and surface via `AskUserQuestion`. +Confirm with the user: + +- What stays, what is explicitly deprecated, what is abandoned. +- Whether abandoned options should appear as movement-of-learning + (sometimes yes: "we initially tried X, which gave Y; switched to Z" + is honest). Sometimes no: trivial or confidential choices don't + belong. +- Which `unclear` items the user can reconstruct, vs. which are + genuinely lost. + +The narrative only speaks for live content unless the user explicitly +wants a history section. + +### 2 · Harvest + +The project's substrate substitutes for a paper's narrative. Mine +these, in roughly decreasing order of value: + +- **`README.md`, `CLAUDE.md`, `NOTES.md`, `TODO.md`** at project root. + Often contain the clearest statement of intent. +- **`.felt/`** or a fibers directory. The author's active thinking, + decisions with rationale, meeting notes, open questions. +- **Notebook markdown cells.** Often the narrative the author wrote + for themselves. +- **Code comments** at function-level decision points. "We drop + rows where X < 0.1 because …" is a rationale waiting to be lifted. +- **Commit messages** at milestone commits. `git log --grep` for + keywords like "decided," "switched," "abandoned," "fix" can surface + turning points. +- **Meeting notes, old proposals, grant text.** Grant paragraphs are + often where motivation lives in its cleanest form. +- **Open issues and closed PRs.** Rejected options often have a PR + describing what was tried. 
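The commit-message pass in the list above can be sketched as a small shell function. This is one possible shape, not a prescribed tool: `turning_points` is a hypothetical name, the keyword list simply reuses the examples from the text, and the output format is an arbitrary choice. Only standard `git log` options are used.

```sh
# Hypothetical helper: list commits whose messages mention a likely
# turning point. Takes the repository path as $1 (defaults to ".").
# The keywords are the examples from the text; adjust them per project.
turning_points() {
  git -C "${1:-.}" log --regexp-ignore-case --extended-regexp \
    --grep='decided|switched|abandoned|fix' \
    --date=short --format='%h %ad %s'
}
```

Each hit is a candidate only. Read the commit itself before lifting a rationale from its message, and keep the short hash next to any harvested text so the source stays traceable.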
+ +Make a list of candidate motivation, methodology, and findings text +before starting to draft. Where possible, anchor each harvested piece +to its source so rationales can be traced. + +### 3 · Fill the gaps + +For each `unclear` decision, try in order: + +1. **Ask the user.** `AskUserQuestion` with the decision and its + options, asking for a one-sentence rationale. +2. **If the user doesn't know**, write a fair description of what was + chosen and mark it as reconstructed. Example: + ```yaml + rationale: >- + _(Reconstructed 2026-04: original rationale not recorded. Current + reading is that option X was chosen because Y, based on the + downstream code's assumptions about Z.)_ + ... + ``` +3. **If the rationale is actually lost**, name that. A narrative that + admits "the reasoning for this cut was not recorded and cannot be + reconstructed" is honest; one that fabricates a plausible-sounding + justification is not. + +Do the same for findings without evidence, inputs without provenance, +and outputs without a clear source sub-analysis. + +### 4 · Draft order + +Same as reproduction: inputs → methods → findings → outputs → +summary. Retrofit is stable enough for compression-last to +work. Unlike interactive authoring, you're narrating after the fact. + +### 5 · Voice + +- **Past tense for what happened**; present tense only for the living + structure ("the pipeline runs three stages"). +- **Don't impose a narrative of inevitability.** If the project tried + Option A for six months, abandoned it, and switched to B, say so. + The iteration is the substance of movement-of-learning — retrofit is + where that content has to come from the archaeology, not from a + researcher narrating live. +- **Mark reconstructions.** `_(Reconstructed)_` or a brief prose note + when the authoring draws on harvested material whose original author + is absent. 
+ +### 6 · Critique + +In addition to SKILL.md's three-phase and craft audits: + +**Triage audit.** + +- Does the narrative speak only for live content, unless a deliberate + history section is included? +- Are deprecated / abandoned elements explicitly named as such, or do + they appear as if current? + +**Harvest audit.** + +- Does every load-bearing claim in the narrative trace to a project + artifact (commit, notebook cell, fiber, code comment, meeting note) + — or to the user's confirmation? +- Are gaps named rather than fabricated? + +## Anti-patterns (retrofit-specific) + +- **Fabricated rationales.** Writing a plausible-sounding justification + for a decision whose actual rationale was "someone chose this and + nobody remembers." Mark the reconstruction, or say the reasoning is + lost. +- **Smoothing over abandoned work.** If the project pivoted mid-way, + retrofit is exactly the place where that iteration belongs. Don't + write a narrative of smooth progress that contradicts the git log. +- **Narrating around gaps.** A sub-analysis with no findings doesn't + need filler prose explaining what it didn't find; the narrative + should say the finding work is not yet done (or was never done). +- **Missing the archaeology step.** Jumping straight to drafting + without triage and harvest produces a narrative in the author's + voice about work they didn't do. The result sounds invented because + it is. +- **Treating CLAUDE.md like a paper.** Harvest from it; don't import + its style. `CLAUDE.md` is agent-facing; the narrative is + reader-facing. + +## When retrofit becomes reproduction + +If, during retrofit, it becomes clear that the project is actually +reproducing an unacknowledged paper (code based on a published +analysis, derived from another group's method), switch to paper +reproduction mode for the parts that map. Hybrid is fine: reproduce +what's published; retrofit what's novel or local. 
+ +## When retrofit becomes interactive + +If the retrofit surfaces that core decisions are still open and the +user wants to revisit them now, the narrative isn't yet stable. Flag +to the user and switch to interactive mode for those sections — +provisional voice, revisit after decisions land. diff --git a/claude/lightcone/skills/narrative/references/interactive.md b/claude/lightcone/skills/narrative/references/interactive.md new file mode 100644 index 00000000..e0861db2 --- /dev/null +++ b/claude/lightcone/skills/narrative/references/interactive.md @@ -0,0 +1,184 @@ +# Interactive mode — in-flight new research + +> **Status: under development.** This mode is scaffolded but not yet +> production-ready. The workflow below is a working draft — treat it +> as a starting point, not a locked spec. For the production-ready +> path, use paper reproduction mode if applicable. Report friction +> back so this reference can firm up. + +Research is being done now. A narrative is being drafted alongside the +work, not reconstructed from a paper or archaeological sources. The +narrative is expected to change as results land. + +Read the main SKILL.md first. This file adds what's specific to +interactive. + +Interactive differs from reproduction (no source paper to reconstruct +from — the narrative is the researcher's own) and from retrofit (the +work is still happening, not finished — you are authoring live, with +the researcher in the loop). + +The core discipline is **provisional voice**: the narrative makes its +own incompleteness visible, so a reader can tell at a glance what's +settled and what's pending. + +## Workflow + +### 1 · Orient + +1. `astra.yaml` and each sub-analysis — whole files. Note where + `findings` are stub-level, where decisions are unresolved, where + outputs don't exist yet. +2. Any project `CLAUDE.md` / working notes. +3. Active fibers at `.felt/` (if present). 
Fibers are the best + substrate in interactive mode — they carry the researcher's live + thinking, recent pivots, open questions. Read the relevant + top-level fiber and anything it wikilinks. +4. Existing narrative, if any. Revision preserves what lands. + +### 2 · Ask first, draft second + +Interactive mode is not archaeology. The researcher is available. +Don't guess at motivation or the headline finding — ask. Use +`AskUserQuestion` to batch: + +- **Research question.** What are we trying to learn? One sentence. +- **Current headline finding.** What, if anything, has been + established so far? One sentence. +- **Movement so far.** What has already happened in the work that + belongs in movement-of-learning? (Pivots, abandoned options, things + that surprised the researcher.) +- **Implications the researcher would claim today.** What does the + result — as far as it's gone — *mean*? A gesture is fine; a + premature strong claim is not. + +The researcher's framing is the substrate. Don't draft around a guess +at it. + +### 3 · Draft order (inverted from reproduction) + +In interactive mode, the executive summary is drafted *first* (as a +stub, to fix intent) and revised last. This is the opposite of +reproduction. + +1. **`summary` — stub.** One paragraph, provisional. States + the question and the current best-guess outcome. Explicitly marked + provisional (see below). Useful because it forces a clear statement + of intent the rest of the narrative can align with. +2. **`methods`** — the substance. The process is live; methods is + where the live thinking goes. Name decisions in flight. Name + pivots. Use first-person plural, with dates where iteration + matters. Use `[: ]` inline if it's load-bearing. +3. **`findings`** — what's been established so far, with anchors to + `findings.` that actually exist. Phrase claims to make + dependency visible: "pending validation in + [reconstruction](#analyses.reconstruction)." +4. **`inputs`** — what the work rests on. +5. 
**`outputs`** — thin; what's been promoted to the top level, if + any. +6. **Return to `summary`** and revise it against the rest of + the draft. Re-mark provisional. + +For a decision in flight, `rationale:` can explicitly call out +open-ness: "We are currently running with option X, pending validation +of Y. See [[fiber or sub-analysis]]." + +### 4 · Provisional voice + +Make incompleteness visible in three ways: + +**Phrasing.** Not "we constrain X to 3%"; rather "our current best +constraint on X is 3%, pending validation of the covariance in +[reconstruction](#analyses.reconstruction)." Not "we detect Y"; rather +"we detect Y at the 4σ level in the current fit, with the fit being +revisited after the prior rescope lands." + +**Explicit markers.** At the top of `summary` (and optionally +on any key that's unusually volatile), an italic note: + +```yaml +summary: > + _(Provisional — revisit after bao_fitting. Last updated 2026-04-23.)_ + We are measuring the BAO scale in the DESI DR1 LRG tracer as a + warm-up before folding in ELGs and QSOs. Current best result is + [an 8σ detection of the acoustic peak at z = 0.7 + ](#findings.lrg_bao_detection), with the aggregate precision + constraint pending completion of the covariance validation in + [reconstruction](#analyses.reconstruction). +``` + +The `_(Provisional ...)_` prefix is a convention, not a spec field. It +reads as expected-to-change without breaking the narrative shape. + +### 5 · Revision cadence + +Interactive narratives accrete. File fibers for: + +- The ceiling date for next revision. +- Open questions that will force rewrites when they close. +- Decisions in flight and what a different resolution would change in + the narrative. 
+ +When a major result lands (headline finding solidified, pivotal +decision settled), a full revision pass — including re-drafting the +executive summary in reproduction-style (past tense, declarative) for +the now-settled content, while keeping provisional markers on what's +still open. + +### 6 · Voice + +- **First person plural** ("we are measuring," "we found"), present + tense for live work, past tense for completed steps. +- **Hedge when uncertain; claim when confident.** Interactive mode has + a sharper hedging signal than reproduction — the author's current + confidence *is* what the reader needs to know. Don't over-hedge + defensively and don't under-hedge performatively. +- **Name sub-analyses that don't exist yet.** If the plan is to run + `reconstruction` next and the current narrative anticipates its + output, say so: "Once [reconstruction](#analyses.reconstruction) is + run, we expect X; if the expectation fails, Y follows." This is + legitimate movement-of-learning: it captures what a result is being + interpreted *against*. + +### 7 · Critique (adds to SKILL.md base) + +**Provisional audit.** + +- Is every claim phrased consistently with the actual confidence level? +- Are provisional markers present where the content is volatile? +- Will a reader one week from now know which pieces need revisiting + vs. which are settled? + +**Freshness audit.** + +- Any "last updated" or "revisit after" markers still current, or + stale? +- Any referenced sub-analysis or finding that has since changed but + the narrative still reflects the old state? + +## Anti-patterns (interactive-specific) + +- **False completeness.** Writing in reproduction voice ("we measure," + "we constrain") when the measurement is in flight. Use "we are + measuring" / "our current constraint is X, pending Y." +- **Over-committing to implications.** Promising what results will + mean before they land. A gesture is honest; a claim before evidence + is not. 
+- **Skipping movement-of-learning because "it's still moving."** The + live process *is* the movement. Capture it while it's cheap; it's + the hardest content to reconstruct later. +- **Solo drafting.** Interactive is the one mode where authoring + without asking produces fiction. The researcher is available; ask. +- **Provisional everywhere.** If every sentence is hedged, the + narrative reads as afraid of itself. Hedge the genuinely uncertain + claims; state the settled ones plainly. +- **Stale markers.** A "revisit after X" comment left in place after + X has landed is worse than no marker at all. Revise on each touch. + +## When interactive stabilizes + +When the work is done (paper draft ready, results published, project +wrapping up), the narrative should be rewritten in reproduction voice. +Interactive was scaffolding; the final narrative reads as a stable +artifact. That rewrite is its own pass — switch modes and treat the +project's own prior drafts as a source, like a paper. diff --git a/claude/lightcone/skills/narrative/references/paper-reproduction.md b/claude/lightcone/skills/narrative/references/paper-reproduction.md new file mode 100644 index 00000000..a9fd42d2 --- /dev/null +++ b/claude/lightcone/skills/narrative/references/paper-reproduction.md @@ -0,0 +1,221 @@ +# Paper reproduction mode + +A published paper exists. Reconstruct its narrative into ASTRA's +five-key shape — against an `astra.yaml` that's already built, or +alongside one being built concurrently — preserving the paper's +confidence level and sequence. + +## Where the paper lives + +Prefer arXiv LaTeX source. It's the most natural form to work with: +sections are delimited, captions are inline, citations resolve to a +`.bib`, equations are parseable. + +### 1 · arXiv LaTeX source (default) + +If the paper is on arXiv, fetch the source: + +```sh +arxiv_id= # e.g. 
2404.03000 +mkdir -p paper +cd paper +curl -L "https://arxiv.org/e-print/${arxiv_id}" -o "${arxiv_id}.tar.gz" +tar -xzf "${arxiv_id}.tar.gz" +``` + +The archive unpacks to the paper's working tree — typically a main +`.tex` file, section includes, figures, a `.bib`. Identify the main +file with `grep -l '\\documentclass' *.tex`. Read sections in order; +resolve citation keys against the bundled `.bib`. + +### 2 · Existing parsed paper in the project + +Some reproductions ship the paper already parsed. Check for: + +- `desi_dr1_paper/` or `paper/` at the project root. +- Single `.md` file (Docling output or manual conversion), + `.pdf`, or the arXiv tarball unpacked. + +If a markdown parse exists, use it as the primary source; fall back +to the PDF or the arXiv source to resolve ambiguities. + +### 3 · User-provided + +Ask the user where the paper is if nothing lands automatically. + +If no paper is accessible, this is not a reproduction task — fall +back to `references/existing-analysis.md` (currently under +development). + +## Paper-to-ASTRA mapping + +Write this down before drafting a sentence. + +| Paper element | ASTRA home | +|---|---| +| Abstract | `summary` | +| Introduction (motivation, related work) | `summary` + `findings` intro | +| Methods section N | corresponding sub-analysis's `narrative.methods` | +| Results | structural `findings.` claims; narrative intro in `findings` | +| Discussion | `findings` narrative + `summary` implications | +| Conclusions | reinforces `summary` | +| Figures / tables | `outputs.` — referenced in `findings` via anchors | +| "We chose X because Y" sentences | decision `rationale:` | + +Not every paper maps cleanly section-to-sub-analysis. When it +doesn't, the sub-analysis DAG in `astra.yaml` is authoritative. +Narrate according to the DAG, harvesting the paper's prose for +content. If the spec has deliberately reorganized relative to the +paper, say so briefly in `methods`. 
+ +## Workflow + +### 1 · Orient + +The spec may be stable, in flux, or both — narrative drafting often +runs concurrently with spec refinement. Read what's there; expect to +revisit as the spec moves. + +1. `astra.yaml` at the project root. Whole file. Note `inputs`, + `outputs`, `decisions`, `findings`, `analyses`, existing + `narrative:`. Notice which of the five keys are present vs. empty. +2. Each sub-analysis `astra.yaml`. Skim decisions (inherited vs. + local), findings, outputs, existing narrative. A sub-analysis may + use `description:` (legacy) instead of the five-key `narrative:` + block — promoting it may be part of the job. +3. The paper — abstract, intro open/close, methods section headers, + discussion, conclusions. Read full sections when drafting the + corresponding ASTRA piece. +4. Any project `CLAUDE.md` or working notes. + +Infer authoring state (from-scratch, extending, revising) from what +is already on disk. If the user is present, confirm via +`AskUserQuestion`: + +- Scale: top-level, a specific sub-analysis, or a decision's + `rationale:`? +- Pure reproduction, or with reproducer extensions (e.g., the + reproduction's covariance differs from the posted table)? + +If the spec is iterating, draft narrative concurrently — rationale +when a decision is added, five-key narrative when a sub-analysis +splits, findings synthesis updated when a finding is added. Narrative +and spec quality rise together when they share context. + +### 2 · Draft order + +Not `summary` first. `summary` compresses the rest; draft it last. + +1. **`inputs`** — shortest. Name the data and its provenance. One + short paragraph. Let the inputs structure carry the dataset + detail. +2. **`methods`** — walk the pipeline in DAG order. Cite each + sub-analysis and decision by anchor as part of the argument, not + as an enumeration. If there are too many to weave coherently, the + analysis wants more sub-analyses. 
Inheritance that propagates + across sub-analyses gets called out because it's load-bearing + end-to-end. Movement-of-learning lives here — a pivot the paper + narrates ("we initially tried X, but…") is cheap because of + telescoping. +3. **`findings`** — **only if findings are declared structurally.** + If `findings:` is empty, skip this key (per narrate-what-you- + declare). If findings exist, synthesize how they fit together — + each cited by anchor, not an enumeration. +4. **`outputs`** — thin. Which artifacts were promoted and why; + point to the sub-analysis that produced them. +5. **`summary`** — last. Two paragraphs. Open with the question and + the headline finding; thread motivation, method, and implications. + No primer material. + +For sub-analyses, same order, same length target (1–3 paragraphs per +key). For a decision's `rationale:`, one paragraph: what was decided, +the insight(s) that motivated it (by anchor), what the load-bearing +alternative was and why it lost. The alternatives themselves are in +the options structure. + +**Conditional keys on sub-analyses.** Only include keys whose +structural counterpart is non-empty. A reconstruction sub-analysis +with no findings gets `summary`, `methods`, `inputs`, `outputs` — no +`findings`. + +### 3 · Reproduction-specific moves + +- **Fidelity to source confidence.** Don't sharpen or soften. If the + paper says "we detect," don't write "we strongly detect." If it + hedges, preserve the hedge. +- **Harvest, don't invent.** The paper's prose is the first source. + Paraphrase — don't lift verbatim — but preserve meaning and + confidence register. +- **Voice seams.** If reproducer-specific content enters ("during + reproduction we found the published covariance differs from the + posted table"), mark the transition. A sentence mixing paper + claims and reproduction claims without a seam confuses both. 
+- **Paper sequence is usually load-bearing.** DAG order should match + the paper's section order unless the spec deliberately + reorganized. +- **No primer material.** `summary` is not a field-introduction. + Don't teach what BAO or weak lensing is. Readers arrive with + context. +- **Rationales come from the paper.** "We chose reconstruction + convention X because Y" becomes the backbone of a decision's + `rationale`. Keep Y; cite the supporting prior insight by anchor + if one exists. +- **Published = done.** Reproduction narrative is declarative, + present-tense matching the paper's voice ("The analysis is + organised as…", "The pipeline runs in…"). Not "we are measuring." +- **Scope-limited reproductions.** Real-world reproductions often + cover a subset of the paper (e.g., DESI BAO reproducing only + LRG1+LRG2). Name the scope in `summary` so a reader knows what's + in and out. + +## Critique pass + +Run these reproduction-specific checks alongside the three-phase and +craft audits from SKILL.md. + +**Fidelity audit.** + +- No sharpened or softened claims relative to the paper. +- Voice seams marked where reproducer content enters. +- Rationales traceable to the paper's justifications or to a prior + insight in the spec. +- No invented citations. Every anchor resolves to a real spec id. +- Scope (what's reproduced, what isn't) stated in `summary` if + narrower than the paper. + +**Sequence audit.** + +- `methods` walks sub-analyses in DAG order; DAG order matches the + paper's narrative sequence (or the deviation is named in prose). +- `summary` opens with the question, not a field primer. + +**Structural-peer-redundancy audit.** + +- Every declared decision, finding, output, and sub-analysis cited + somewhere in the narrative (validator enforces). Citations woven + into argument, not recited as a list. +- `findings` narrative synthesizes relationships between findings; + `inputs` narrative names provenance. Neither catalogs fields. 
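As a rough aid for the "every anchor resolves" and citation-coverage checks above, the anchors a narrative cites can be listed with standard tools and compared by eye against the declared ids. This is a minimal sketch under stated assumptions: anchors appear in the `](#section.id)` link form used in this skill's examples, and `list_anchors` is a hypothetical helper name. It complements the validator rather than replacing it.

```sh
# Hypothetical helper: print each distinct anchor cited in a file,
# one per line, assuming the markdown link form `[text](#section.id)`.
list_anchors() {
  # grep -o emits only the matched `](#...)` fragments; sed strips the
  # leading `](#` and trailing `)`; sort -u removes duplicates.
  grep -o '](#[^)]*)' "$1" | sed -e 's/^](#//' -e 's/)$//' | sort -u
}
```

Running `list_anchors astra.yaml` would list entries such as `analyses.reconstruction`. An entry with no matching declared id is a candidate invented citation.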
+ +**Anchor coverage audit.** + +- `astra validate` warns on any declared finding / decision / output + / sub-analysis not cited in the narrative. Review the warnings; + either cite the element or consider whether it should be declared. + +## Anti-patterns (reproduction-specific) + +- **Lifting verbatim.** Copy-pasting abstract sentences into + `summary`. Paraphrase — otherwise the narrative reads as a citation + of itself. +- **Adding implications the paper didn't make.** Fidelity cuts both + ways. +- **Eliding the reproducer's voice entirely.** If the reproduction + caught something the paper missed, name it with the seam. +- **Treating paper sections as sub-analyses.** A paper's Section 3.2 + isn't automatically a sub-analysis; the DAG is the authority. +- **Listing instead of weaving.** Narrate each decision where it + shapes the pipeline. Too many to weave coherently → the spec wants + more sub-analyses. +- **Drafting `findings` on a sub-analysis that has no declared + findings.** Skip the key. From 4f9a7246b75f3dd07c6831369d5806defcc160dd Mon Sep 17 00:00:00 2001 From: Cail Daley Date: Mon, 4 May 2026 03:08:50 +0200 Subject: [PATCH 002/124] skills: add ralph-loops, managing-bibliography, constitution; update narrative for bundle + #108 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bundles three more skills into the lightcone-cli paper-reproduction toolkit alongside the existing /narrative skill, and updates /narrative for bundle-aware orchestration plus the downstream-consumer discipline that closes #108. - ralph-loops: direct copy from cailmdaley/skills (felt-agnostic). Carries the loop iteration discipline + scripts/ralph runner + assets/spec.md template. /paper2astra will launch this against per-paper constitutions. - managing-bibliography: direct copy of personal version (~/.claude/skills/ managing-bibliography). 
Becomes the canonical paper-acquisition path inside the bundle (arxiv-LaTeX-first; PDF + Docling fallback for non-arxiv). - constitution: merged version. Public skills/skills/constitution provided the procedural backbone (study → draft → refine → launch); personal felt references (constitute.md, crafting.md) provided the depth (two-diamonds rhythm, six stances, funnel ledger, qualitative self-check). Felt-specific commands have been softened to felt-optional framing so the skill stands alone in lightcone-cli. - narrative: (a) acknowledges paper-reproduction bundle context and SPECIFY- phase invocation; (b) lands the lightcone-cli#108 fix — new "Data flow" section in the mode-independent substrate, requiring narrative.outputs to name downstream consumers (. form) and root narratives to include a top-down end-to-end data-flow paragraph when sub-analyses exist. All copied skills carry attribution to their canonical home in a Provenance section. Closes lightcone-cli#108 as a side-effect of the bundle PR. Refs: lightcone/paper2astra-as-skill/skill-bundle constitution. 
Co-Authored-By: Claude Opus 4.7 --- claude/lightcone/skills/constitution/SKILL.md | 126 ++++++++++++ .../constitution/references/constitute.md | 136 ++++++++++++ .../constitution/references/crafting.md | 193 ++++++++++++++++++ .../skills/managing-bibliography/SKILL.md | 162 +++++++++++++++ claude/lightcone/skills/narrative/SKILL.md | 42 +++- claude/lightcone/skills/ralph-loops/SKILL.md | 70 +++++++ .../skills/ralph-loops/assets/spec.md | 29 +++ .../skills/ralph-loops/scripts/ralph | 124 +++++++++++ 8 files changed, 881 insertions(+), 1 deletion(-) create mode 100644 claude/lightcone/skills/constitution/SKILL.md create mode 100644 claude/lightcone/skills/constitution/references/constitute.md create mode 100644 claude/lightcone/skills/constitution/references/crafting.md create mode 100644 claude/lightcone/skills/managing-bibliography/SKILL.md create mode 100644 claude/lightcone/skills/ralph-loops/SKILL.md create mode 100644 claude/lightcone/skills/ralph-loops/assets/spec.md create mode 100755 claude/lightcone/skills/ralph-loops/scripts/ralph diff --git a/claude/lightcone/skills/constitution/SKILL.md b/claude/lightcone/skills/constitution/SKILL.md new file mode 100644 index 00000000..58384960 --- /dev/null +++ b/claude/lightcone/skills/constitution/SKILL.md @@ -0,0 +1,126 @@ +--- +name: constitution +description: > + Draft a constitution — a markdown spec describing a desired state for + autonomous iteration. Study the problem space, shape the spec + interactively (two-diamonds rhythm; six stances on demand), then hand + it to a runner — a ralph loop, a shuttle dispatch, or any other + iteration-runner. Use for any work where adaptation matters more than + a fixed plan: science, refactoring, exploration, creative work. + Triggers: "constitution", "constitute", "ralph spec", "set up a ralph", + "create a ralph", "write a spec". +--- + +# Constitution + +A constitution is a design document with trust built in. 
Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It's designed to outlast any single agent or iteration and remain valid as the world changes around it. A good constitution never says "50 files remain" because that's a snapshot that goes stale; it says "check `grep -r 'old_pattern'`" because that's a principle that stays true until the work is done. + +Constitutions don't prescribe steps. They describe what the system looks like when it's right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what's highest value. In a ralph loop, each iteration does this with fresh context. + +This matters most in science and exploratory work, where each decision is informed by the result just before it. A plan assumes you know the path; a constitution trusts the agent to find it — with taste, judgment, and fresh eyes each time. + +**Separation of context: if you craft, you never do the work yourself.** + +## Workflow + +1. **Study** — Read relevant files, understand existing patterns. This informs the *spec*, not implementation. The goal is pointers that iterations will follow. + +2. **Draft** — Create a markdown spec file. The bundled template lives in the sibling `ralph-loops` skill: + ```bash + cp ../ralph-loops/assets/spec.md my-spec.md + ``` + (or copy directly from `claude/lightcone/skills/ralph-loops/assets/spec.md` if you're outside a skill). + Fill in what you can — don't wait until it's perfect. + +3. **Refine** — Show the draft, get feedback, revise. Use AskUserQuestion for structured choices. The two-diamonds rhythm and six stances in [`references/crafting.md`](references/crafting.md) help most when the user is deciding something non-trivial. Apply the qualitative ambiguity self-check before launching. + +4. 
**Launch** — When approved, hand the spec to whichever runner is appropriate. Common options: + + - **`/ralph-loops`** — bundled, manual loop runner. Tmux session re-spawns iterations against the spec until status flips off open/active. + ```bash + ../ralph-loops/scripts/ralph my-spec.md [--backend claude|codex] [-- extra-flags...] + ``` + Add `-- --chrome` for visual/frontend work. Session: `ralph-`. Attach: `tmux attach -t ralph-`. + - **External dispatchers** (e.g. shuttle, when felt is installed) — watch a fiber tree for dispatch-eligible blocks and spawn single-shot workers. Their configuration is owned outside this skill. + + The constitution stays editable while iteration runs; successive iterations re-read it each cycle, so refinements between iterations are normal. + +## What goes in a constitution + +A constitution needs enough structure that an iteration landing cold can orient itself, and enough freedom that it can adapt. Common sections — use what fits, skip what doesn't, add what's missing: + +```markdown +## Desired State +What the system looks like when it's done. Invariants, quality bar, +done-conditions. Fence the scope — what to aim for AND what to leave alone. + +## Context +File paths, existing patterns, architectural constraints. Things iterations +need to *find* but not *achieve*. + +## Skills +Which skills to activate before working (e.g., /snakemake, /narrative). + +## Evidence +How to check progress — commands, test suites, grep patterns. Pointers to +the ground truth that iterations measure themselves against. + +## Open Questions +Uncertainties the user should weigh in on. Iterations add to this; the user +resolves between loops. +``` + +For deeper reference on each section's voice and the discipline that keeps a constitution from drifting into a plan, see [`references/constitute.md`](references/constitute.md). + +## Principles + +**Constitution, not plan.** Say what the system looks like when it's right. 
Never describe the current state — anything that becomes false or irrelevant as work progresses doesn't belong. If a section would be outdated after one iteration, it's a snapshot — replace it with a pointer. + +**Pointers, not snapshots.** "Check `grep -r 'old_pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done. + +**Prefer existing systems.** Before designing anything new: can what's there handle this? + +**Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies. + +**Scope is a gift.** A clear fence — "only rename, don't refactor" — saves iterations from well-intentioned drift. Explicit scope frees the agent to work confidently within it. + +## Constitutions that shape artifacts + +Some constitutions don't build code — they shape artifacts like documentation, dashboards, or research narratives. These have different rhythms: + +- **The desired state is comprehension, not correctness.** "A reviewer can follow the narrative cold" is harder to test than "all tests pass" — but it's the right bar. Evidence for progress: fewer redundant plots, clearer prose, more natural flow. +- **The artifact continues to grow.** Unlike a refactoring (which finishes), a research narrative keeps acquiring nodes. The constitution shapes how growth presents itself, not when growth stops. + +## Anti-patterns + +**Checklists.** "1. Add X, 2. Add Y" — iterations race through without judgment. + +**Vague done.** "Make it better" — when does iteration stop? + +**Over-specification.** Prescribing *how* instead of *what*. Trust the agent's taste. + +**Snapshot language.** "Currently 50 files" — will be wrong after one iteration. + +**Decision logs in the body.** "Resolved choices" / "Process notes" sections turn the constitution into a process journal. 
When a question gets answered, fold the answer into the narrative where it's contextually relevant — into Invariants, Desired State, Context — and let the runner's history surface (`felt history`, commits, etc.) carry the chronology. + +--- + +## References + +- [`references/constitute.md`](references/constitute.md) — depth on + drafting voice, sections, and the felt-flavored crafting workflow. + Felt-optional: read past the felt-specific commands if felt isn't installed. +- [`references/crafting.md`](references/crafting.md) — two-diamonds + rhythm, six stances, the funnel ledger, and the qualitative ambiguity + self-check. Use this when the conversation has careful-thinking + character — not every constitution drafting needs it, but the ones that + do are the ones that benefit most. + +## Provenance + +Merged from two sources: + +- [`cailmdaley/skills/skills/constitution/`](https://github.com/cailmdaley/skills/tree/main/skills/constitution) (public, procedural, felt-agnostic) — provided the SKILL body backbone. +- `~/.claude/skills/felt/references/{constitute,crafting}.md` (personal felt skill) — provided the depth references; felt-specific commands have been softened to felt-optional framing so this skill stands alone in lightcone-cli. + +Copied here for the paper-reproduction bundle so `/paper2astra` can invoke `/constitution` to draft the per-paper reproduction constitution during its interview phase. The merged shape may flow back upstream; re-sync as needed. diff --git a/claude/lightcone/skills/constitution/references/constitute.md b/claude/lightcone/skills/constitution/references/constitute.md new file mode 100644 index 00000000..198a3a18 --- /dev/null +++ b/claude/lightcone/skills/constitution/references/constitute.md @@ -0,0 +1,136 @@ +# Constitute — depth reference + +Drafting a constitution. The SKILL body covers the procedural backbone (Study → Draft → Refine → Launch). 
This reference goes deeper on voice, sections, and the discipline that keeps a constitution from sliding into a plan. + +The constitution itself is just a markdown file with YAML frontmatter that a runner reads on each iteration. Common runners: the bundled `ralph-loops` (tmux loop, `scripts/ralph`), or external dispatchers like felt-shuttle (when felt is installed). The runner is interchangeable; the constitution is what matters. + +--- + +## What a constitution is + +A constitution is a design document with trust built in. Like a governmental constitution, it lays out principles and aspirations — not specific laws, not the current state of affairs. It is designed to outlast any single iteration and remain valid as the world changes around it. + +**A good constitution never says "50 files remain"** — that is a snapshot that goes stale. It says `check "grep -r 'old_pattern'"` — that is a principle that stays true until the work is done. + +Constitutions do not prescribe steps. They describe what the system looks like when it is right — the desired state, in both senses of the word. Nothing in the constitution should become confusing or unnecessary as the desired state is reached. Whoever works from it surveys reality, reasons about the gap, and decides what is highest value. Each iteration of the work does this with fresh context. + +**Constitution, not plan.** Plans assume you know the path; constitutions trust the agent to find it — with taste, judgment, and fresh eyes each time. This matters most in science and exploratory work, where each decision is informed by the result just before it. + +**Separation of context: if you craft, you never do the work yourself.** The constitution is designed by one role; iterations are run by another. 
+ +--- + +## When to constitute + +- Work where adaptation matters more than a fixed plan: scientific investigation, exploratory refactoring, creative writing +- The desired state is clear (or can be made clear) but the path is not +- Iterations need to re-read with fresh context and make judgment calls +- A checklist would either be wrong after one step or race through without judgment + +Do not constitute for: clearly-scoped atomic tasks, work that could be a snakemake rule, anything where a plan actually is the right shape. + +--- + +## Workflow (deeper) + +### 1. Study + +Read relevant files, understand existing patterns. This informs the **constitution**, not implementation — the goal is pointers that iterations will follow, not a head start on the work. + +### 2. Draft + +Create the spec file from the bundled template: + +```bash +cp ../ralph-loops/assets/spec.md my-spec.md +``` + +(Or, if felt is installed and you are working in a felt-tracked project, you can create the constitution as a fiber and the runner will treat it as the spec — `felt add "Constitution title" -s open -t constitution` then edit the body. This is felt-only; the bundled template above works without felt.) + +Use the crafting process from [`crafting.md`](crafting.md): + +- **Wonder → Ontology:** what IS the desired state? Name it precisely. +- **Design → Delivery:** what sections does this constitution need? Which are pointers vs snapshots? + +Stances that help most during constitution drafting: + +- **Ontologist** for naming the desired state ("what IS 'done' here?") +- **Simplifier** for fencing scope ("what are we explicitly leaving alone?") +- **Contrarian** for pressure-testing whether the whole framing is right +- **Architect** when the constitution is about refactoring structure + +### 3. Refine + +Show the draft, get feedback, revise. Use AskUserQuestion for structured choices. 
Apply the qualitative ambiguity self-check from `crafting.md` — goal, constraints, success — before launching. + +Repeat until it feels solid. It does not have to be complete; open questions belong in the Open Questions section. + +### 4. Launch + +When approved, hand to a runner. Bundled option: `../ralph-loops/scripts/ralph my-spec.md`. The runner re-reads the spec each iteration, so refinements between iterations are normal. + +--- + +## Constitutional sections + +A constitution needs enough structure that an iteration landing cold can orient itself, and enough freedom that it can adapt. Common sections — use what fits, skip what does not, add what is missing: + +```markdown +## Desired State +What the system looks like when it is done. Invariants, quality bar, +done-conditions. Fence the scope — what to aim for AND what to leave alone. + +## Context +File paths, existing patterns, architectural constraints. Things iterations +need to *find* but not *achieve*. + +## Skills +Which skills to activate before working (e.g., /snakemake, /narrative). + +## Evidence +How to check progress — commands, test suites, grep patterns. Pointers to +ground truth that iterations measure themselves against. + +## Open Questions +Uncertainties the user should weigh in on. Iterations add to this; the user +resolves between loops. +``` + +--- + +## Principles (deeper) + +**Pointers, not snapshots.** `check "grep -r 'old_pattern'"` not "50 files remain." Snapshots go stale; pointers stay valid across iterations. This is the constitutional principle: write what remains true until the work is done. + +**Prefer existing systems.** Before designing anything new: can what is there handle this? + +**Constraints need reasons.** Bare constraints get creatively circumvented. Include enough *why* that an iteration knows when it applies. + +**Scope is a gift.** A clear fence — "only rename, don't refactor" — saves iterations from well-intentioned drift. 
Explicit scope frees the agent to work confidently within it. + +--- + +## Constitutions that shape artifacts + +Some constitutions do not build code — they shape artifacts like documentation or research narratives. These have different rhythms: + +- **The desired state is comprehension, not correctness.** "A reviewer can follow the narrative cold" is harder to test than "all tests pass" — but it is the right bar. Evidence for progress: fewer redundant plots, clearer prose, more natural flow. +- **The artifact continues to grow.** Unlike a refactoring (which finishes), a research narrative keeps acquiring nodes. The constitution shapes how growth presents itself, not when growth stops. + +--- + +## Anti-patterns + +- **Checklists.** "1. Add X, 2. Add Y" — iterations race through without judgment. +- **Vague done.** "Make it better" — when does iteration stop? What would a reader see? +- **Over-specification.** Prescribing *how* instead of *what*. Trust the agent's taste. +- **Snapshot language.** "Currently 50 files" — will be wrong after one iteration. +- **Immutable seed.** Not our shape. The constitution is meant to be edited between iterations; do not treat it as frozen. +- **Numerical convergence.** "Iteration stops when similarity ≥ 0.95" — wrong shape for science. Stop when the Evidence section says the desired state has been reached. +- **Decision logs in the body.** "Resolved choices" / "Decisions made" / "Process notes" sections turn the constitution into a process journal. When a question gets answered (in conversation, via `AskUserQuestion`, in a review), fold the answer into the narrative where it is contextually relevant — into Invariants, Desired State, Context — and let the runner's chronological surface (commits, `felt history` if felt is in use) carry the chronology. The constitution describes *what is*, not *how we got here*; an "Open Questions" section that has been fully resolved should be deleted, not left as a victory log. 
+ +--- + +## When crafting lands here + +The crafting rhythm in [`crafting.md`](crafting.md) applies to all careful interactive thinking; this reference kicks in when the target artifact is specifically a constitution. The diamonds do most of the work — the funnel mechanic used for open-ended exploration is not the primary move here, because there is already one specific artifact being produced. See the Workflow section above for which stances help most at each drafting phase. diff --git a/claude/lightcone/skills/constitution/references/crafting.md b/claude/lightcone/skills/constitution/references/crafting.md new file mode 100644 index 00000000..15f65b32 --- /dev/null +++ b/claude/lightcone/skills/constitution/references/crafting.md @@ -0,0 +1,193 @@ +# Crafting + +How to help the user think through something that hasn't crystallized, and turn the result into structured commitments — frontmatter on a fiber if felt is in use, otherwise inline structure in the constitution itself (decisions with excluded options, evidence pointers, scoped findings). + +Use it when the user is deciding something non-trivial, scoping a sub-analysis, drafting a living spec, or talking through an open question — any time careful interactive thinking is happening and the output can land in structured form. + +The rhythm is two diamonds: first understand what the thing IS, then decide what to DO about it. Each diamond diverges to explore and converges to commit. The ontological question — *what IS this, really?* — is the convergence point of the first diamond, and it is the most practical question you can ask. + +``` + ◇ Wonder ◇ Design + ╱ (diverge) ╱ (diverge) + ╱ surface ╱ alternatives + ╱ questions ╱ trade-offs +●─────────────────────●─────────────────────● + ╲ ╲ + ╲ crystallize ╲ commit + ╲ the name ╲ with reasons + ◇ (converge) ◇ (converge) + Ontology Delivery +``` + +Diamond 1 diverges into questions and converges on a name (*"this IS a decision about covariance estimation"*). 
Diamond 2 diverges into alternatives and converges on a commit (a default with `excluded_reason` for each rejection). The second diamond inherits the ontological commit from the first. + +--- + +## The two diamonds + +### Diamond 1: Wonder → Ontology + +**Wonder (diverge).** What are we actually trying to figure out? Surface questions, assumptions, ambiguities. Do not propose answers yet. If the user is already pitching solutions, back them up to the question. + +**Ontology (converge).** What IS this, really? Crystallize into a claim, decision, or question specific enough to act on. The convergence is complete when you can **name** the thing precisely — "this is a decision about covariance estimation" or "this is a question about whether leakage matters below ℓ=100." A good name is often the entire output of Diamond 1. + +**Output of Diamond 1:** a stub with a real name and at least one structural placeholder — a decision label, an insight claim, or input/output IDs. Not a full block — just the hook that identifies what kind of thing this is. + +### Diamond 2: Design → Delivery + +**Design (diverge).** What are the real alternatives? For each, what would make it right or wrong? Trade-offs, excluded options, edge cases. This is where the Contrarian and Simplifier stances are most useful. + +**Delivery (converge).** Commit to a default, write the `excluded_reason` for each rejected option, identify inputs and outputs, stage the evidence. The structure is now formalizable. + +**Output of Diamond 2:** structured fields populated — `decisions` with options and default, `inputs`/`outputs` with IDs and types, `insights` with claim and evidence. (If felt is in use, these go on the fiber; otherwise they live in the spec itself or in `astra.yaml`.) + +The two diamonds are sequential but the boundary is soft. If you find yourself naming alternatives before the thing is clear, back up to the ontology convergence point. 
If you converge too early on "this is a decision" when it is actually a question, the Design phase will feel forced — that is the cue to re-enter Wonder. + +--- + +## Stances + +Six lightweight lenses for when the conversation needs pressure. **Default is no stance** — straight conversation. Invoke a stance when pressure would help, announce it in one sentence, drop it when it has done its work. Do not stack or pipeline them. + +### Socratic — *"What are you assuming?"* + +Question-only. Never proposes answers. Surfaces the assumptions under the user's framing. + +- What are you assuming is true that might not be? +- What would make option A right vs option B? What is the actual fork? +- If you had to write the `excluded_reason` for the option you are about to reject, what would it say? + +**Use in Wonder and early Design.** When the user is about to commit to a path and you want the reasons made explicit. + +### Ontologist — *"What IS this, really?"* + +Pushes on definition before mechanism. Four questions: + +1. **Essence** — what is the true nature, stripping away accidental properties? +2. **Root cause or symptom** — is this the fundamental issue or a surface effect? +3. **Prerequisites** — what must exist first for this even to make sense? +4. **Hidden assumptions** — what implicit beliefs is the framing resting on? + +**Use at the Ontology convergence point.** When a word is doing heavy lifting and may mean different things in different sentences. + +### Contrarian — *"What if the opposite were true?"* + +Challenges premises, not details. + +- What if the choice does not actually matter for your signal? +- What if the constraint you are designing around is not real? +- What if the simplest version is already good enough? + +**Use in Design.** When the conversation is burning effort on a distinction that may not matter, or a third option (do nothing, use the default) is being ignored. 
+ +### Simplifier — *"Is this complexity earning its keep?"* + +YAGNI, concrete first, data over code. + +- What can we remove without losing the core value? +- What is the simplest version that would work? +- Can a data structure replace this logic? + +**Use in Design and early Delivery.** When the design is drifting toward over-engineering or a feature list is growing without anchoring reasons. + +### Researcher — *"What do we actually know?"* + +Evidence before interpretation. Especially useful for scientific work where a claim needs to be defensible. + +- What does the actual source say, not what we remember? +- What would count as evidence here? What would falsify the claim? +- What is the most specific claim we can make with the data in hand? + +**Use in Delivery.** When an insight needs a defensible claim, or when the user is about to write an outcome that is stronger than the evidence supports. + +### Architect — *"If we started over, would we build it this way?"* + +Structural root cause. The question behind the question when friction keeps recurring. + +- Is the same problem showing up in different forms? +- Which abstraction does not match reality? +- What assumption was wrong from the start? + +**Use when a debate keeps returning.** The user is circling a decision they have already made three times and cannot stick to — the real question is probably structural, not tactical. 
+ +--- + +## The funnel + +When the conversation is exploratory — no single topic, things are accumulating — keep a private running ledger of what is falling out, classified by destination: + +| Item kind | What it looks like | Destination | +|-----------|--------------------|-------------| +| **Decision** | A choice between real alternatives | `decisions` block in spec / `astra.yaml` / fiber | +| **Finding** | A claim with at least the start of evidence | `insights` block / fiber | +| **Sub-analysis** | "Compute X from Y" with identifiable inputs/outputs | New `astra.yaml` sub-analysis or new fiber with `inputs`/`outputs` stubs | +| **Question** | An open thread worth tracking, not yet answered | "Open Questions" section of the constitution / annotated fiber | +| **Root-fiber change** | A pattern or gotcha that belongs in CLAUDE.md | Edit CLAUDE.md / root fiber | + +The ledger is your own working memory. **Do not surface it mid-conversation** unless the user asks or a flush cue fires. + +**Flush cues:** + +- User says "OK we should write this down" or similar +- Three or more items have accumulated and the topic is about to shift +- A natural pause after a decision or finding lands + +On flush, present the ledger grouped by destination, then file with the user's assent. If the user declines an item, discard it without argument. + +--- + +## Qualitative ambiguity self-check + +Before committing to a path — filing a decision, launching an iteration loop, sealing an outcome — check three things qualitatively. **No scoring, no thresholds.** If any feels fuzzy, resolve it with AskUserQuestion. + +1. **Goal.** Is what the user wants specific enough that two competent people would build the same thing from it? If not, what would pin it down? +2. **Constraints.** Are the limits named? What cannot change, what must be preserved, what would break everything? Missing constraints tend to show up as "oh wait, we also need…" after the commit. +3. 
**Success.** How will we know it is done or right? What is the evidence condition? Qualitative is fine ("a reviewer can follow the narrative cold"), but it has to be checkable. + +When one is fuzzy, use AskUserQuestion with concrete options rather than open prose questions. Iterate until the answer is "yeah, that's it." **Stop when the fuzziness resolves, not when a score crosses a threshold.** Scores on qualitative priors add false precision; the honest signal is whether the user knows what they want. + +This is a mirror, not a gate. If the user wants to file anyway with one dimension still fuzzy, file it — the fuzziness itself can live in an Open Questions section, and future iterations can refine it. + +--- + +## When to bring in /confer + +`/confer` routes a prompt through Codex for adversarial review. Good fits inside a crafting session: + +- A design choice where two plausible paths both look right and the user is stuck +- Validating that an insight claim actually follows from its evidence +- Pressure-testing a constitution's desired state before launching iteration + +Bad fits: routine decisions, the user has already committed, the dispute is stylistic, or the answer only needs three more seconds of thought. `/confer` is not a substitute for the user's taste — it is a second opinion when the first opinion is honestly unsure. 
+ +--- + +## Mapping outputs to structure + +What comes out of the diamonds maps onto wherever you keep structured commitments: + +| Diamond output | Destination | +|----------------|-------------| +| Wonder questions left open | "Open Questions" section in the constitution; or a fiber with `status: open` (felt) | +| Ontology convergence — "this IS a decision about X" | A `decisions..label` entry — in `astra.yaml`, in the constitution body, or on a fiber | +| Design alternatives with trade-offs | `decisions..options`; rejected options get `excluded_reason` | +| Delivery — the commit | `decisions..default` | +| Finding at end of Delivery | `insights.` with `claim` + `evidence` (or finding in `astra.yaml`) | +| Sub-analysis scope | New sub-analysis in `astra.yaml`, or a new fiber with `inputs`/`outputs` | +| Process-level lesson that generalizes | Edit to root CLAUDE.md / root fiber | + +If felt is installed, the [`felt:felt`](https://github.com/cailmdaley/felt) skill carries the tier ladder (Annotated → Formalized → Tempered) and the common frontmatter shapes. Without felt, the same shapes apply directly inline in `astra.yaml` or the constitution itself. + +--- + +## Anti-patterns + +- **Ambiguity gates.** Do not withhold help until the user clarifies N dimensions. The self-check is a mirror, not a door. +- **Numerical scoring.** Do not introduce 0–1 clarity scores with thresholds. The underlying signal is qualitative and the number adds false precision. +- **Stance pipelines.** Do not run Socratic → Ontologist → Contrarian in sequence. Pick one when it helps; drop it when it has. +- **Mandatory interview.** No prepared question list. Stances are responsive to the actual conversation. +- **Surfacing the ledger too early.** A single item is not a flush. Wait for accumulation or a pause. +- **Immutable outputs.** Nothing filed here is locked. Everything is editable; reversals are normal. +- **Nine-minds overload.** Six stances is already generous. 
Add more only when a specific gap shows up, never preemptively. +- **Interrogation without a ceiling.** Three questions is usually enough. If the user is getting irritated, stop asking and file what you have. +- **Converging before the name is clear.** If Diamond 2 feels forced, Diamond 1 has not finished. Back up. diff --git a/claude/lightcone/skills/managing-bibliography/SKILL.md b/claude/lightcone/skills/managing-bibliography/SKILL.md new file mode 100644 index 00000000..c9143a46 --- /dev/null +++ b/claude/lightcone/skills/managing-bibliography/SKILL.md @@ -0,0 +1,162 @@ +--- +name: managing-bibliography +description: > + Read arXiv paper source and add BibTeX entries via ADS API. Use for + research that requires reading full paper text and managing citations. + Also the canonical paper-acquisition path inside the lightcone-cli + paper-reproduction bundle: `/paper2astra` calls this during the ACQUIRE + phase to fetch arXiv LaTeX source, with PDF + Docling as the non-arXiv + fallback. Triggers on: "read paper", "cite", "add to bibliography", + "bibtex", "ADS", "arXiv", "find paper", "add citation", or any request + to read scientific papers or manage references. +--- + +Read scientific papers and manage citations. Two capabilities: + +1. **Read papers** — Download arXiv LaTeX source to read full text, verify claims, understand methodology +2. 
**Cite papers** — Fetch BibTeX from NASA ADS and add to bibliography + +**Activation**: Use this skill when you need to: +- Read a paper's full text (not just abstract) +- Verify a claim before citing it +- Add citations to your bibliography +- Research how other papers phrase similar findings +- Acquire a paper for the `/paper2astra` reproduction pipeline (ACQUIRE phase) + +**Usage pattern**: +- "Read the KiDS-Legacy paper to see how they report B-mode PTEs" +- "Add [paper description] to the bibliography" +- "Find and cite [author name] [year] [topic]" + +--- + +## Reading Papers + +Download arXiv LaTeX source to read full paper text: + +```bash +# Download source (replace ID as needed) +curl -L -o /tmp/2503.19441.tar.gz "https://arxiv.org/src/2503.19441" + +# Extract +mkdir -p /tmp/2503.19441 && cd /tmp/2503.19441 && tar -xzf /tmp/2503.19441.tar.gz + +# Find the main tex file +ls *.tex +``` + +This gives you: +- Full paper text (not just abstract) +- Equations and methodology details +- How authors phrased specific claims +- Their bibliography (.bib or .bbl files) + +Use when you need to: +- Verify a claim before citing +- See exact phrasing in another paper +- Understand methodology not in abstract +- Cross-reference their citations + +--- + +## ADS API Setup + +The ADS API requires an API token. Before using citation features: + +1. **Check for token**: The skill reads `$ADS_API_TOKEN` from the environment +2. **If missing**: Tell the user to create one at https://ui.adsabs.harvard.edu/user/settings/token and set it: + ```bash + # Add to ~/.zshrc or ~/.bashrc + export ADS_API_TOKEN="your-token-here" + ``` +3. **Do not proceed** with ADS API calls until the token is available — check with `echo $ADS_API_TOKEN` + +--- + +## Citing Papers + +When adding a paper to the bibliography: + +1. **Web search** for the paper using description + "arxiv" + - Look for arXiv ID in format `YYMM.NNNNN` + - If multiple results, show options and ask user to select + +2. 
**Query ADS API** to get bibcode using arXiv ID + ```bash + curl -H "Authorization: Bearer $ADS_API_TOKEN" \ + 'https://api.adsabs.harvard.edu/v1/search/query?q=arXiv:YYMM.NNNNN&fl=bibcode' + ``` + +3. **Fetch BibTeX entry** with abstract from ADS + ```bash + curl -H "Authorization: Bearer $ADS_API_TOKEN" \ + 'https://api.adsabs.harvard.edu/v1/export/bibtexabs/{bibcode}' + ``` + +4. **Parse BibTeX** to extract author names and year: + - Parse `author = {...}` field for last names + - Parse `year = YYYY` field for publication year + - Generate citation key based on author count: + - 1 author: `firstauthor{YY}` (e.g., `asgari17`) + - 2 authors: `firstauthor.secondauthor{YY}` (e.g., `schneider.kilbinger12`) + - 3+ authors: `firstauthor.etal{YY}` (e.g., `wright.etal25`) + - Use only last names, lowercase, final 2 digits of year + +5. **Replace citation key** in BibTeX entry + - Update the entry key on the first line (before the opening brace) + - Keep all other fields unchanged + +6. **Append to bibliography** file + - Add the modified entry to the project's `.bib` file + - Check for duplicate keys first and warn if found + +7. **Report success** + - Show the user the complete entry that was added + - Confirm file location + +## Citation Key Generation + +**Examples from BibTeX parsing**: +- `author = {{Wright}, Angus H. 
and {Stölzner}, Benjamin and ...}` + `year = 2025` → `wright.etal25` +- `author = {{Schneider}, Peter and {Kilbinger}, Martin}` + `year = 2012` → `schneider.kilbinger12` +- `author = {{Asgari}, Marika}` + `year = 2017` → `asgari17` + +## Error Handling + +- **No arXiv ID found**: Ask user to provide it manually or search for the paper directly +- **Multiple search results**: Show options and ask user to select the correct paper +- **ADS API fails**: Show error and suggest manual bibcode lookup or entry +- **Duplicate citation key**: Warn user, show existing entry, offer to replace or rename +- **Missing bibliography file**: Report error and ask for correct file path + +## Key Configuration Points + +- **ADS API Token**: Read from `$ADS_API_TOKEN` environment variable +- **ADS Search endpoint**: `https://api.adsabs.harvard.edu/v1/search/query` +- **ADS Export endpoint**: `https://api.adsabs.harvard.edu/v1/export/bibtexabs/{bibcode}` +- **Export format**: Use `bibtexabs` endpoint to include abstracts + +## Bibliography File Paths + +Adapt to your project structure: +- `docs/unions_bmodes/unions_bmodes.bib` (example UNIONS project) +- `references/bibliography.bib` (common alternative) +- User should specify their bibliography file path + +## Notes + +- Always use the `bibtexabs` endpoint to include abstract in the entry +- Parse author list carefully: format is `author = {{LastName}, FirstName and {LastName}, FirstName ...}` +- Year is straightforward: `year = YYYY` +- Before appending, verify file exists and has proper BibTeX format +- Preserve existing entries when appending new ones + +--- + +## Provenance + +Originally maintained at `~/.claude/skills/managing-bibliography/SKILL.md` +(Cail's personal version). Copied here so the lightcone-cli +paper-reproduction bundle has the full toolkit available via `lc init` — +without depending on a separate plugin install. The personal copy may be +ahead; re-sync as needed. 
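As a rough illustration of the citation-key rules in the managing-bibliography skill above, here is a minimal sketch. It assumes the year and the authors' last names have already been extracted from the BibTeX entry, and that the year has four digits. The function name `make_citekey` and its argument order are hypothetical and not part of the skill. Only ASCII names are lowercased reliably, since `tr` may not fold accented letters in every locale.

```bash
# Hypothetical helper (not part of the skill): build a citation key from a
# four-digit year followed by the authors' last names, in author order.
make_citekey() {
  if [ "$#" -lt 2 ]; then
    echo "usage: make_citekey YEAR LASTNAME [LASTNAME...]" >&2
    return 1
  fi
  local year="$1"
  shift
  # Final two digits of a four-digit year, e.g. 2025 -> 25.
  local yy="${year#??}"
  # Lowercase only the names that end up in the key.
  local first second
  first="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$#" in
    1)
      # One author: firstauthorYY
      printf '%s%s\n' "$first" "$yy"
      ;;
    2)
      # Two authors: firstauthor.secondauthorYY
      second="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
      printf '%s.%s%s\n' "$first" "$second" "$yy"
      ;;
    *)
      # Three or more authors: firstauthor.etalYY
      printf '%s.etal%s\n' "$first" "$yy"
      ;;
  esac
}
```

Under these assumptions, `make_citekey 2012 Schneider Kilbinger` prints `schneider.kilbinger12`, which matches the two-author example in the skill. Checking the target `.bib` file for an existing key before appending is left to the caller, as the skill's duplicate-key step describes.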
diff --git a/claude/lightcone/skills/narrative/SKILL.md b/claude/lightcone/skills/narrative/SKILL.md index e08d3f1b..32524830 100644 --- a/claude/lightcone/skills/narrative/SKILL.md +++ b/claude/lightcone/skills/narrative/SKILL.md @@ -23,6 +23,14 @@ One field: `narrative:` on an analysis or sub-analysis, or `rationale:` on a dec Per-element prose (what each `Input`, `Output`, `Decision`, `Option`, or `Insight` is and why it matters) lives on those elements' own `description` / `rationale` / `notes` fields. `narrative` is the analysis-level story that weaves the pieces together. +This skill is also part of the lightcone-cli paper-reproduction bundle: the +`/paper2astra` orchestrator invokes it during the SPECIFY phase to author the +narrative for the spec it has just crafted. Sibling skills in the bundle — +`constitution`, `ralph-loops`, `managing-bibliography`, +`check-sentence-by-sentence`, `figure-comparison` — solve adjacent pieces of +the reproduction story; this skill stands alone and does not need to know +about them. + ## What a narrative is Science, from a single decision to a review paper, is a practice of @@ -158,11 +166,43 @@ Applied to the five keys: - `findings` **synthesizes** — each finding cited by anchor as part of the argument, not an enumeration. - `inputs` **names provenance**. -- `outputs` **names what was promoted and why**, citing each by anchor. +- `outputs` **names what was promoted and why**, citing each by anchor — + **and names its downstream consumers** when they exist (see "Data flow" below). - Decision `rationale:` **names why the default won**. --- +## Data flow — name where each output goes + +Recipe `inputs:` wires the DAG; the narrative makes the wiring legible. The +schema already encodes who consumes what — readers should not have to grep +49 `inputs:` lists to learn what an intermediate output is *for*. + +Two rules — both load-bearing for projects with sub-analyses: + +1. 
**`narrative.outputs` names downstream consumers.** When authoring + `outputs` prose on a sub-analysis or the root, name where each output + gets consumed using the `.` form that recipe `inputs:` + already uses. *"`xi_post_recon_lrg1` feeds + [`bao_fit_post_iso_ap_lrg1`](#analyses.bao_fit.outputs.bao_fit_post_iso_ap_lrg1) + and [`bao_detection_chi2_lrg1`](#findings.bao_detection_chi2_lrg1)."* + Anchor where you can; bare `.` text is acceptable when + no anchor is reachable from the current scope. + +2. **Root narrative includes a top-down data-flow paragraph.** When the + project has sub-analyses, the root analysis's `methods` (or `summary`) + must include one paragraph that traces the pipeline end-to-end: + *"raw catalogs → [reconstruction.post_recon_catalog_*](#analyses.reconstruction) + → [clustering.xi_*_recon_*](#analyses.clustering) → root [bao_fit_*](#outputs.bao_fit_post_iso_ap_lrg1)."* + This is the one place a reader can land cold and get the shape of the + pipeline without reading every recipe declaration. + +Closes [lightcone-cli#108](https://github.com/LightconeResearch/lightcone-cli/issues/108). +The validator does not (yet) enforce this; treat both rules as authorial +discipline. The information is already in the spec — surface it. + +--- + ## Anchor coverage `astra validate` checks: diff --git a/claude/lightcone/skills/ralph-loops/SKILL.md b/claude/lightcone/skills/ralph-loops/SKILL.md new file mode 100644 index 00000000..649a813c --- /dev/null +++ b/claude/lightcone/skills/ralph-loops/SKILL.md @@ -0,0 +1,70 @@ +--- +name: ralph-loops +description: > + Autonomous loop iteration toward a desired state. You are inside a ralph + loop — your spec is in the system prompt. Survey, contribute, update state + discoverably, exit. Activated automatically inside ralph loops. + Triggers: "ralph-loops", "ralph", "ralph loop", "iterate", "autonomous loop". +--- + +# Ralph Loops + +You are inside a loop. Your spec is in the system prompt above. 
Each iteration: survey freely, work substantially, update state discoverably, exit. + +## Loop + +1. **Survey** — Fresh eyes. Explore agents, git log, tests. You decide what to check. +2. **Contribute** — Work on 1–3 substantial pieces. Do NOT try to clear the whole queue in one iteration. +3. **Update** — Before exiting: commit your work, update CLAUDE.md if warranted. +4. **Exit** — `kill $PPID` + +**CRITICAL: Exit before compaction.** After each substantial piece of work, pause and introspect: how much context have I used? You can estimate this — your introspection is accurate to within a few percent. If you feel past 50%, wrap up and exit. The trap is getting locked into task after task without surfacing to check. Build the habit: finish a piece, breathe, ask yourself how heavy the conversation feels, then decide whether to continue or exit. Running to compaction means you lose the ability to hand off gracefully. The loop continues — you don't have to finish everything. + +## Rules + +**State, not checklist.** The spec describes what "done" looks like. Survey reality, decide what's highest value, work on that. + +**Discoverable updates.** Commits, test results, documentation — not notes or progress files. The next iteration finds what changed by inspecting the system. + +**Pointers, not snapshots.** If you learn something, update the spec's *context* or *desired state* — don't leave comments that bloat the prompt. + +**You have authority.** Trust the spec, don't ask permission. Make substantial contributions. Don't avoid ambitious solutions just because they span multiple iterations. + +**File uncertain decisions** so the user can answer after the loop. Use AskUserQuestion to batch up to 4 high-leverage questions before exiting — choices where user input redirects substantial work. + +### Long-Running Jobs + +Some iterations require waiting on computation (builds, cluster jobs, CI). When jobs are running: + +1. **Check state** — tail logs, check output +2. 
**Sleep** — interval proportional to expected runtime (30s for minute-scale, 5m for hour-scale) +3. **Check again** — look for errors or completion +4. **Repeat** until jobs finish or fail + +Stay and shepherd computation through. Don't exit and hope the next iteration picks it up. + +## Exit + +If you **made substantial contributions**, `kill $PPID`. Do NOT close the spec — the loop continues. + +If you **cannot find any remaining work**, update the spec's YAML frontmatter to `status: closed` with a summary of what was accomplished. + +--- + +Pattern adapted from [Ralph Wiggum](https://ghuntley.com/ralph/). + +--- + +## Provenance + +Originally from [`cailmdaley/skills`](https://github.com/cailmdaley/skills/tree/main/skills/ralph-loops). +Copied into the lightcone-cli paper-reproduction bundle so it can compose +with `paper2astra`, `constitution`, and the rest of the bundle without a +separate plugin install. The canonical version may be ahead; re-sync as +needed. + +In the bundle, `/paper2astra` invokes `/constitution` to draft a per-paper +reproduction constitution and then launches a ralph loop against it via +`scripts/ralph`. Successive iterations of the loop survey the workdir and +git history, execute the next phase, and exit cleanly — see the bundle +README at `../README.md`. diff --git a/claude/lightcone/skills/ralph-loops/assets/spec.md b/claude/lightcone/skills/ralph-loops/assets/spec.md new file mode 100644 index 00000000..0da84d2a --- /dev/null +++ b/claude/lightcone/skills/ralph-loops/assets/spec.md @@ -0,0 +1,29 @@ +--- +status: open +--- + +This is your spec for an autonomous iteration loop, a meditative iteration toward a desired state. + +## Desired State + +[Describe what you're building and why. Someone unfamiliar with the project should understand the goal from this section alone. + +Be detailed about "done": the architecture, behavior, constraints, quality bar. You'll check reality against this and work to close the gap. 
+ +Use pointers, not snapshots. Say "check `grep -r 'pattern'`" not "50 files remain." Snapshots go stale; pointers stay valid.] + +## Context + +[Point to relevant files and existing patterns. When you see real implementations, you build coherently on them rather than introducing alien patterns.] + +## Skills + +[Skills to activate before working. Use `/skill-name`.] + +## Evidence + +[How to check progress — commands, test suites, grep patterns. Pointers to the ground truth that iterations measure themselves against.] + +## Open Questions + +[Uncertainties the user should weigh in on. Iterations add to this; the user resolves between loops.] diff --git a/claude/lightcone/skills/ralph-loops/scripts/ralph b/claude/lightcone/skills/ralph-loops/scripts/ralph new file mode 100755 index 00000000..a7269366 --- /dev/null +++ b/claude/lightcone/skills/ralph-loops/scripts/ralph @@ -0,0 +1,124 @@ +#!/bin/bash +# Run a ralph loop on a spec file +# Loops while spec status is open/active, appending spec content to system prompt +# Usage: ralph [--backend claude|codex] [-- extra-flags...] +# +# Supports both Claude Code and Codex backends. +# Default: claude. Set RALPH_BACKEND=codex or pass --backend codex. + +set -e + +SPEC_FILE="${1:?Usage: ralph [--backend claude|codex] [-- extra-flags...]}" +shift + +BACKEND="${RALPH_BACKEND:-claude}" +if [[ "$1" == "--backend" ]]; then + BACKEND="$2" + shift 2 +fi + +EXTRA_FLAGS="" +if [[ "$1" == "--" ]]; then + shift + EXTRA_FLAGS="$*" +fi + +# Resolve to absolute path +SPEC_FILE="$(cd "$(dirname "$SPEC_FILE")" && pwd)/$(basename "$SPEC_FILE")" + +if [[ ! 
-f "$SPEC_FILE" ]]; then + echo "Spec file not found: $SPEC_FILE" + exit 1 +fi + +SESSION="ralph-$(basename "$SPEC_FILE" .md)" +WORK_DIR="$(pwd)" + +# Check if already running +if tmux has-session -t "$SESSION" 2>/dev/null; then + echo "Ralph already running: $SESSION" + echo " Attach: tmux attach -t $SESSION" + exit 0 +fi + +# Write loop script to temp file (avoids heredoc quoting hell) +LOOP_SCRIPT=$(mktemp /tmp/ralph-loop-XXXXXX.sh) +cat > "$LOOP_SCRIPT" << 'LOOP' +#!/bin/bash +SPEC_FILE="$1" +WORK_DIR="$2" +BACKEND="$3" +EXTRA_FLAGS="$4" + +iteration=0 + +# Check YAML frontmatter for status field +check_status() { + head -50 "$SPEC_FILE" | sed -n '/^---$/,/^---$/p' | grep -qiE 'status:.*(open|active)' +} + +while check_status; do + cd "$WORK_DIR" + iteration=$((iteration + 1)) + echo "" + echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" + echo "Ralph iteration $iteration — $(date '+%H:%M:%S')" + echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" + + SPEC_CONTENT=$(cat "$SPEC_FILE") + + SYSPROMPT_FILE=$(mktemp /tmp/ralph-sys-XXXXXX.txt) + PROMPT_FILE=$(mktemp /tmp/ralph-prompt-XXXXXX.txt) + + cat > "$SYSPROMPT_FILE" << SYSEOF +Ralph iteration $iteration. Spec: $SPEC_FILE + +$SPEC_CONTENT +SYSEOF + + cat > "$PROMPT_FILE" << 'PROMPTEOF' +You are inside a Ralph loop — a meditative iteration toward a desired state. Activate the ralph-loops skill and follow its instructions for iterating on the spec above. 
+PROMPTEOF + + PROMPT=$(cat "$PROMPT_FILE") + + if [[ "$BACKEND" == "codex" ]]; then + codex --dangerously-bypass-approvals-and-sandbox \ + --config "developer_instructions=$(cat "$SYSPROMPT_FILE")" \ + $EXTRA_FLAGS \ + "$PROMPT" + else + claude --dangerously-skip-permissions \ + $EXTRA_FLAGS \ + --append-system-prompt "$(cat "$SYSPROMPT_FILE")" \ + <<< "$PROMPT" + fi + + rm -f "$SYSPROMPT_FILE" "$PROMPT_FILE" + + echo "--- Iteration complete ---" + sleep 2 +done + +echo "" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" +echo "Ralph complete — $iteration iterations" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" +echo "" +echo "Session kept open for inspection. Type exit to close." +exec bash -l +LOOP + +chmod +x "$LOOP_SCRIPT" + +echo "Starting ralph on $SPEC_FILE" +echo " Backend: $BACKEND" +echo " Work dir: $WORK_DIR" +[[ -n "$EXTRA_FLAGS" ]] && echo " Flags: $EXTRA_FLAGS" + +# Launch tmux with a login shell running the loop script +tmux new-session -d -s "$SESSION" -c "$WORK_DIR" \ + bash -l "$LOOP_SCRIPT" "$SPEC_FILE" "$WORK_DIR" "$BACKEND" "$EXTRA_FLAGS" + +echo " Session: $SESSION" +echo " Attach: tmux attach -t $SESSION" From 7d53081df62d7b9af485b4c222b8fe9c19c06aab Mon Sep 17 00:00:00 2001 From: Cail Daley Date: Mon, 4 May 2026 03:19:23 +0200 Subject: [PATCH 003/124] skills: add /paper2astra orchestrator + bundle README Builds the paper2astra skill on the lightcone-cli plugin: an interview-first orchestrator that crafts a per-paper reproduction constitution and hands it to a ralph loop. Composes the rest of the bundle (managing-bibliography, narrative, constitution, ralph-loops; +check-sentence-by-sentence and figure-comparison when Nolan pushes them). paper2astra/SKILL.md frames the workflow: - Interview is the only interactive phase (once per project). Drafts the per-paper constitution via /constitution. - After interview, /ralph-loops/scripts/ralph drives the multi-session reproduction. 
Each iteration surveys the workdir to determine the current phase from file existence + git history; no Pydantic state machine, no resume mechanic. - 11 phases (interview + acquire + parse + summarize + extract_targets + literature + specify + review + implement + run + compare + summarize_run) each have a self-contained reference under paper2astra/references/. The phase prose ports 1:1 from the existing Paper2ASTRA Python prompts at LightconeResearch/Paper2ASTRA, with the surfacing-seam discipline pulled in from the 2026-04-30 design plan. - Per-phase mode (interactive vs sub-agent) is a constitution choice the interview surfaces. SPECIFY and COMPARE are always interactive (mandatory user-ratification seams); SUMMARIZE and LITERATURE are always sub-agent (parallel grunt-work); the rest default to sub-agent but the user can flip them. - Material conflicts at SPECIFY (paper-vs-code disagreement that plausibly changes a numeric result) surface to the user via AskUserQuestion. Default on user silence is paper. Resolves Paper2ASTRA#8. - ACQUIRE rewrites for arxiv-LaTeX-first via /managing-bibliography; PDF + Docling stays as the non-arxiv fallback. Resolves Paper2ASTRA#7 as a side-effect of the migration. skills/README.md indexes the bundle: lifecycle skills (lc-*) plus the seven paper-reproduction skills, with origin attribution for each. Documents the pending bundle additions (Nolan's two skills not yet pushed) so the next worker knows what to add when they land. Refs: lightcone/paper2astra-as-skill/skill-bundle constitution. 
Co-Authored-By: Claude Opus 4.7
---
 claude/lightcone/skills/README.md | 42 +++++
 claude/lightcone/skills/paper2astra/SKILL.md | 168 ++++++++++++++++++
 .../skills/paper2astra/references/acquire.md | 85 +++++++++
 .../skills/paper2astra/references/compare.md | 97 ++++++++++
 .../paper2astra/references/extract_targets.md | 61 +++++++
 .../paper2astra/references/implement.md | 57 ++++++
 .../paper2astra/references/interview.md | 160 +++++++++++++++++
 .../paper2astra/references/literature.md | 163 +++++++++++++++++
 .../skills/paper2astra/references/parse.md | 79 ++++++++
 .../skills/paper2astra/references/review.md | 79 ++++++++
 .../skills/paper2astra/references/run.md | 56 ++++++
 .../skills/paper2astra/references/specify.md | 105 +++++++++++
 .../paper2astra/references/summarize.md | 120 +++++++++++++
 .../paper2astra/references/summarize_run.md | 58 ++++++
 14 files changed, 1330 insertions(+)
 create mode 100644 claude/lightcone/skills/README.md
 create mode 100644 claude/lightcone/skills/paper2astra/SKILL.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/acquire.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/compare.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/extract_targets.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/implement.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/interview.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/literature.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/parse.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/review.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/run.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/specify.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/summarize.md
 create mode 100644 claude/lightcone/skills/paper2astra/references/summarize_run.md

diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md
new file mode 100644
index 00000000..439337cd
--- /dev/null
+++ b/claude/lightcone/skills/README.md
@@ -0,0 +1,42 @@
+# lightcone-cli skills
+
+Each subdirectory is one Claude Code skill: `SKILL.md` plus optional `references/`, `assets/`, and `scripts/`. `lc init` copies these into a project's `.claude/skills/` so they are discoverable to Claude Code sessions.
+
+## Project lifecycle skills
+
+| Skill | Role |
+|---|---|
+| `lc-new` | Scaffold a new ASTRA-shaped project from scratch. |
+| `lc-build` | Build container images and dependencies for a project. |
+| `lc-verify` | Run validation across an ASTRA project. |
+| `lc-migrate` | Migrate legacy projects to current conventions. |
+| `lc-feedback` | Report bugs and feature requests upstream. |
+
+## Paper-reproduction bundle
+
+A self-contained toolkit for reproducing published papers in ASTRA. The bundle is co-located so a single `lc init` brings the full toolkit into a project — no plugin marketplace, no separate installs.
+
+| Skill | Role | Origin |
+|---|---|---|
+| [`paper2astra`](paper2astra/SKILL.md) | **Orchestrator.** Interview-first; drafts a per-paper reproduction constitution and launches a ralph loop against it. | New for the bundle. |
+| [`narrative`](narrative/SKILL.md) | Author the `narrative:` prose and decision `rationale:` in `astra.yaml`. Invoked by paper2astra during SPECIFY. | Cail's ([lightcone-cli#86](https://github.com/LightconeResearch/lightcone-cli/pull/86), ported from lightcone-ui#10). |
+| [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. | Merged from [`cailmdaley/skills/skills/constitution`](https://github.com/cailmdaley/skills/tree/main/skills/constitution) (procedural backbone) + Cail's personal felt references (taste — two diamonds, six stances, funnel ledger, qualitative self-check), with felt-optional framing. |
+| [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Launched by paper2astra after the interview. | Direct copy from [`cailmdaley/skills/skills/ralph-loops`](https://github.com/cailmdaley/skills/tree/main/skills/ralph-loops). |
+| [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. Primary acquisition path for paper2astra's ACQUIRE phase. | Direct copy of Cail's personal `~/.claude/skills/managing-bibliography` (newer than the public version). |
+| `check-sentence-by-sentence` | Paper-vs-code TeX audit via sub-agents; locates `file:line` or `NOT FOUND`. Invoked by paper2astra during COMPARE. | Nolan Koblischke's, on his Reproductions-branch. **Not yet pushed publicly** — see "Pending bundle additions" below. |
+| `figure-comparison` | HTML side-by-side: original figures/tables/numerics vs replicated. Invoked by paper2astra during COMPARE. | Same — Nolan's, pending. |
+
+The full reproduction story spans these seven skills. paper2astra's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about paper2astra.
+
+### Why bundle (not depend on plugin install)
+
+- **Testability.** We want to verify paper2astra invokes constitution + ralph-loops + the others correctly. That only works if all are in the same checkout.
+- **Single install path.** `lc init` is the install path for lightcone-cli skills. Adding a separate "also install Cail's public skills via plugin marketplace" step is friction we don't need.
+- **Copy-with-credit costs nothing.** The copied skills retain attribution to their original authors in the SKILL body; if those skills update upstream, we re-sync.
+- **Future consolidation is open.** Per Francois's "next week we improve" framing, the long-run shape might be `astra` ships skills in `astra`, `lc` ships skills in `lightcone-cli`, plus a centralized external-skills list. Today: bundle it all.
+
+### Pending bundle additions
+
+- **`check-sentence-by-sentence`** and **`figure-comparison`** — Nolan Koblischke's two skills. Per the bundle constitution ([`lightcone/.felt/lightcone/paper2astra-as-skill/skill-bundle`](https://github.com/LightconeResearch/lightcone/blob/main/.felt/lightcone/paper2astra-as-skill/skill-bundle.md)), these are part of the bundle, but at first cut they were not yet pushed to any public branch (only living on Nolan's local working tree on his Reproductions checkout). When Nolan pushes them, copy with attribution into this directory; paper2astra's SKILL.md and COMPARE reference already name them as expected siblings, so the integration is wire-compatible the moment they land.
+
+  Until then, COMPARE falls back to direct image-diff judgment without `/figure-comparison`'s structured per-panel rendering, and SPECIFY's evidence-quote re-verification (when COMPARE flags `partial`) falls back to manual Grep against `work/reference/document.md` without `/check-sentence-by-sentence`'s sub-agent audit. Both fallbacks are workable but lossier than the intended path.
diff --git a/claude/lightcone/skills/paper2astra/SKILL.md b/claude/lightcone/skills/paper2astra/SKILL.md
new file mode 100644
index 00000000..5ea9fc58
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/SKILL.md
@@ -0,0 +1,168 @@
+---
+name: paper2astra
+description: >
+  Reproduce a published scientific paper in ASTRA. Interview the user
+  about the paper and the intended scope, draft a per-paper reproduction
+  constitution, then launch a ralph loop that drives the multi-session
+  reproduction work. Composes sibling skills for each phase: managing-
+  bibliography for ACQUIRE, narrative for SPECIFY, check-sentence-by-
+  sentence + figure-comparison for COMPARE. Use when the user wants to
+  reproduce a paper, has a DOI or arXiv ID and wants to start a
+  reproduction project, or asks to "reproduce ", "set up
+  reproduction", "paper2astra", "/paper2astra ", or hands you a
+  published paper as a starting point for ASTRA work.
+---
+
+# paper2astra
+
+Reproduce a published paper in ASTRA. The skill is **interview-first**: a short interactive crafting phase up front that produces a per-paper reproduction constitution. After the interview, paper2astra hands the constitution to a ralph loop that drives multi-session reproduction. Successive iterations of the loop survey the workdir, execute one or two phases, exit cleanly, and re-spawn with fresh context until the constitution is realized.
+
+This is a Claude-Code-native skill. There is no Python orchestrator, no state machine, no resume mechanic — the workdir on disk + git history are the substrate.
+
+## When to use this skill
+
+- The user has a paper (DOI, arXiv ID, or PDF) and wants to reproduce its analysis
+- The user invokes `/paper2astra` (with or without an argument)
+- The user is starting a fresh reproduction project under `Reproductions///`
+- An existing paper-reproduction workdir needs the next phase driven forward (in which case skip the interview, see "Resuming an in-flight reproduction" below)
+
+## The bundle
+
+paper2astra composes the rest of the lightcone-cli paper-reproduction bundle. All siblings live in the same `claude/lightcone/skills/` directory and are available without separate installs:
+
+| Sibling skill | Where it's invoked |
+|---|---|
+| [`/managing-bibliography`](../managing-bibliography/SKILL.md) | ACQUIRE — arXiv LaTeX source download (primary) and BibTeX caching |
+| [`/constitution`](../constitution/SKILL.md) | INTERVIEW — drafting the per-paper reproduction constitution |
+| [`/ralph-loops`](../ralph-loops/SKILL.md) | After interview — launches the loop that drives all subsequent phases |
+| [`/narrative`](../narrative/SKILL.md) | SPECIFY — authoring the `narrative:` and `rationale:` prose in `astra.yaml` |
+| [`/check-sentence-by-sentence`](../check-sentence-by-sentence/SKILL.md) | COMPARE — paper-vs-code TeX audit (Nolan's skill) |
+| [`/figure-comparison`](../figure-comparison/SKILL.md) | COMPARE — HTML side-by-side reference vs reproduced figures (Nolan's skill) |
+
+paper2astra does not re-implement what these skills already do — it tells the agent at each phase to invoke them.
+
+## Workflow
+
+### Interview (interactive — once per project)
+
+The interview is the only phase paper2astra runs interactively. Read [`references/interview.md`](references/interview.md) in full before starting.
+
+The interview has four jobs:
+
+1. **Identify the paper** — DOI / arXiv ID / title; whether code is available; whether the user has prior experience with this paper.
+2. **Scope the reproduction** — full reproduction vs targeted (e.g. only the BAO fit), which figures/tables/numbers are the targets.
+3. **Choose interactive vs sub-agent per phase** — see "Per-phase mode" below. The defaults are reasonable; the user gets to flip any of them.
+4. **Draft the per-paper constitution** — invoke `/constitution`. The constitution lives at the project root (or wherever the user prefers). It captures the paper, the scope, the per-phase mode choices, and the evidence checks.
+
+After the constitution is approved, the interview ends. Launch the ralph loop:
+
+```bash
+../ralph-loops/scripts/ralph paper2astra-constitution.md
+```
+
+Tell the user: *"Constitution drafted. Launching ralph loop in tmux session `ralph-paper2astra-constitution`. Each iteration will run one or two phases and exit; the next iteration picks up where it left off. Attach with `tmux attach -t ralph-paper2astra-constitution`."*
+
+### Phases (driven by ralph iterations after the interview)
+
+Inside each ralph iteration, the agent reads the per-paper constitution, surveys the workdir to determine which phase is current (file existence + git log), and runs that phase's reference. Each phase reference is self-contained — read the matching one in full before working:
+
+| Phase | Reference | Outputs |
+|---|---|---|
+| ACQUIRE | [`references/acquire.md`](references/acquire.md) | `work/reference/{document.md, paper.pdf, code/, code-status.yaml}` |
+| PARSE | [`references/parse.md`](references/parse.md) | `work/reference/{figures/, tables/, metadata.json}` |
+| SUMMARIZE | [`references/summarize.md`](references/summarize.md) | `work/notes/{methodology.md, cited_papers.yaml, code-analysis.md}` |
+| EXTRACT_TARGETS | [`references/extract_targets.md`](references/extract_targets.md) | `targets/targets.md` + reference files |
+| LITERATURE | [`references/literature.md`](references/literature.md) | `work/notes/literature.yaml` + per-paper YAMLs |
+| SPECIFY | [`references/specify.md`](references/specify.md) | `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md` |
+| REVIEW | [`references/review.md`](references/review.md) | (in-place edits to spec + notes) |
+| IMPLEMENT | [`references/implement.md`](references/implement.md) | `scripts/`, `requirements.txt`, recipes in `astra.yaml` |
+| RUN | [`references/run.md`](references/run.md) | `results///` |
+| COMPARE | [`references/compare.md`](references/compare.md) | `comparison-report.{yaml,md}` |
+| SUMMARIZE_RUN | [`references/summarize_run.md`](references/summarize_run.md) | Final write-up; constitution outcome update |
+
+The COMPARE → IMPLEMENT loop iterates until the verdict is `pass` or attempts are exhausted. The constitution carries the attempt budget; the ralph iterations consult it.
+
+### Per-phase mode (interactive vs sub-agent)
+
+A reproduction's most consequential decisions show up at known seams. The interview decides — for this paper — which phases run interactively (in the main loop session, the user can be reached via `AskUserQuestion`) and which delegate to a sub-agent (Task tool with fresh context, no user reach).
+
+Defaults the constitution starts with:
+
+| Phase | Default | Why |
+|---|---|---|
+| ACQUIRE | sub-agent | Mostly mechanical; surfacing happens only on download failures. |
+| PARSE | sub-agent | Deterministic Docling / arXiv extraction. |
+| SUMMARIZE | sub-agent | Parallel paper + code reading benefits from fresh context per task. |
+| EXTRACT_TARGETS | user choice | The selection of replication targets is sometimes obvious, sometimes wants user input. |
+| LITERATURE | sub-agent | One sub-agent per cited paper — pure parallel grunt-work. |
+| SPECIFY | **interactive** | Material paper-vs-code conflicts surface here; the user must ratify. |
+| REVIEW | user choice | Pre-implement sanity check; can be either. |
+| IMPLEMENT | user choice | Mostly mechanical, but algorithm choices may want ratification. |
+| RUN | user choice | Mechanical, but failures need diagnosis. |
+| COMPARE | **interactive** | Verdict (was the reproduction close enough?) is the second mandatory user-ratification seam. |
+| SUMMARIZE_RUN | sub-agent | Final report; no decisions remain. |
+
+The constitution records the choice; ralph iterations honor it. Sub-agent phases are spawned via the `Task` tool from inside the main loop session — that gives them fresh context but no user-reach. Interactive phases run inline in the loop session and may pause with `AskUserQuestion` at material seams.
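As a rough illustration, the defaults table can be read as a lookup that the constitution's recorded choices override. The sketch below is one possible way to express that in Python. `DEFAULT_MODES`, `ALWAYS_INTERACTIVE`, `resolve_mode`, and the `overrides` mapping are hypothetical names, and how a constitution actually stores these choices is an assumption rather than something this skill specifies. The COMPARE reference describes that phase as always interactive, so the sketch refuses an override there.

```python
from typing import Dict, Optional

# Defaults copied from the table above. "user choice" means the table gives
# no fixed default, so the interview is expected to record one.
DEFAULT_MODES: Dict[str, str] = {
    "ACQUIRE": "sub-agent",
    "PARSE": "sub-agent",
    "SUMMARIZE": "sub-agent",
    "EXTRACT_TARGETS": "user choice",
    "LITERATURE": "sub-agent",
    "SPECIFY": "interactive",
    "REVIEW": "user choice",
    "IMPLEMENT": "user choice",
    "RUN": "user choice",
    "COMPARE": "interactive",
    "SUMMARIZE_RUN": "sub-agent",
}

# Assumption: COMPARE may not be delegated, per the COMPARE reference.
ALWAYS_INTERACTIVE = {"COMPARE"}


def resolve_mode(phase: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return 'interactive' or 'sub-agent' for one phase.

    `overrides` is a hypothetical mapping of the per-phase choices that the
    constitution records; it takes precedence over the table defaults.
    """
    if phase not in DEFAULT_MODES:
        raise KeyError(f"unknown phase: {phase}")
    mode = (overrides or {}).get(phase, DEFAULT_MODES[phase])
    if mode not in ("interactive", "sub-agent"):
        # Covers an unresolved "user choice" as well as typos in overrides.
        raise ValueError(f"{phase}: no usable mode recorded ({mode!r})")
    if phase in ALWAYS_INTERACTIVE and mode != "interactive":
        raise ValueError(f"{phase} must stay interactive")
    return mode
```

Raising on an unresolved "user choice" phase, instead of silently picking a side, keeps the decision with the interview, which is where the text above places it.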
+
+### Material conflicts (the SPECIFY seam)
+
+Inside SPECIFY, when paper and code disagree on something material, do not silently pick one. Use `AskUserQuestion` to surface the conflict:
+
+- **Material** = a different choice would plausibly change a numeric result the paper reports.
+- **Stylistic / cosmetic / pure-tooling differences** are not material — record them in `implementation-notes.md` and move on.
+- **Default on user silence is paper.** If the user does not respond, take the paper's stated method as canonical and record the override (with reason) in a finding or insight.
+
+Both choices land in `astra.yaml` as decision options. Whichever the user picks becomes the option selected by `universes/baseline.yaml`; the alternative is preserved as a sibling option for future universe runs. See `references/specify.md` for the full SPECIFY discipline.
+
+### Resuming an in-flight reproduction
+
+If the workdir already exists (`work/reference/document.md` is present, `astra.yaml` exists, etc.):
+
+1. **Skip the interview** unless the user explicitly wants to revise scope.
+2. Read the per-paper constitution if it exists; if it does not, draft a minimal one from the current workdir state.
+3. Launch (or re-attach to) the ralph loop. Each iteration's first move is to survey the workdir and determine the current phase.
+
+Workdir signals (file existence implies the phase has been done):
+
+| Signal | Phase done |
+|---|---|
+| `work/reference/document.md` | ACQUIRE + PARSE |
+| `work/notes/methodology.md` | SUMMARIZE (paper) |
+| `work/notes/code-analysis.md` | SUMMARIZE (code) |
+| `targets/targets.md` | EXTRACT_TARGETS |
+| `work/notes/literature.yaml` | LITERATURE |
+| `astra.yaml` valid (`astra validate astra.yaml`) | SPECIFY |
+| `implementation-notes.md` | SPECIFY |
+| recipes present in `astra.yaml` | IMPLEMENT |
+| `results///` | RUN |
+| `comparison-report.yaml` | COMPARE |
+
+`git log --oneline` complements this — phase commits are the chronological view.
+
+## Skills (activate before working)
+
+- [`/constitution`](../constitution/SKILL.md) — for the interview's drafting phase
+- [`/ralph-loops`](../ralph-loops/SKILL.md) — for the loop that drives phases
+- [`/managing-bibliography`](../managing-bibliography/SKILL.md) — for ACQUIRE
+- [`/narrative`](../narrative/SKILL.md) — for SPECIFY
+- `/check-sentence-by-sentence`, `/figure-comparison` — for COMPARE (Nolan's skills; see Provenance)
+
+## Discipline
+
+- **paper2astra is the workflow story; phase references are the depth.** SKILL.md tells you when to read which reference; the references carry the prompt prose ported from the legacy Paper2ASTRA Python package.
+- **Use the up-to-date CLI surfaces, not skill-specific wrappers.** When `astra validate` already does the job, call it directly. Specifically: `astra validate `, `astra validate --verify-evidence`, `astra paper add`. Use whatever the current `astra --help` surfaces.
+- **No synthetic data.** Unless the paper itself uses synthetic data as its input, every input dataset must be real (downloaded, queried, or fetched from a real archive). The implement phase reference repeats this; treat it as load-bearing.
+- **Workdir conventions stay.** The phase references preserve Paper2ASTRA's workdir layout (`work/reference/`, `work/notes/`, `targets/`, `astra.yaml`, `universes/`, `results/`) so workdirs from the legacy Paper2ASTRA package are interoperable with workdirs driven by this skill.
+
+## Anti-patterns
+
+- **Asking the user mid-sub-agent.** Sub-agent phases cannot reach the user. If the constitution puts SPECIFY in sub-agent mode and a material conflict surfaces, the sub-agent must record the conflict in a `decisions:` block (with both options preserved) and let the next interactive phase ratify it. Never make the sub-agent pick silently.
+- **Re-implementing what astra already does.** If `astra validate` returns clean, do not write a separate validator. If `astra paper add` caches the PDF, do not write a separate cache.
+- **Treating Paper2ASTRA workdir as legacy.** It is not legacy — it is the substrate. The phase references inherit its conventions intentionally.
+- **Bundling everything into one ralph iteration.** Each iteration runs one or two phases, then exits. The constitution is realized across many iterations.
+
+## Provenance
+
+`paper2astra` is a fresh skill, but the phase prose ports 1:1 from the prompts in [`LightconeResearch/Paper2ASTRA/src/paper2astra/prompts/`](https://github.com/LightconeResearch/Paper2ASTRA/tree/main/src/paper2astra/prompts) (commit b3b54b5 and onward on `feat/skill-form-redesign`). The Paper2ASTRA Python package retires once this skill is in regular use; the repo persists as a reference for the original prompts and pipeline structure.
+
+The two compare-phase sibling skills (`check-sentence-by-sentence` and `figure-comparison`) originate from Nolan Koblischke's work on the [Reproductions](https://github.com/LightconeResearch/Reproductions) repo. They are credited in their own SKILL.md bodies; tag him post-publish so he can PR the canonical versions wherever they should ultimately live.
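The workdir-signals table in the paper2astra SKILL.md above is essentially a lookup from file paths to finished phases. The following is a minimal sketch of that survey in Python, offered as an illustration and not as part of the skill. `FILE_SIGNALS` and `phases_done` are hypothetical names. Only the signals that reduce to "this file exists" are included; the `astra validate` check, the recipes check, and the `results///` check need more than an existence test and are left out.

```python
from pathlib import Path
from typing import List, Tuple, Union

# (relative path, phase label) pairs taken from the workdir-signals table.
# Signals that need more than a file-existence test are deliberately omitted.
FILE_SIGNALS: List[Tuple[str, str]] = [
    ("work/reference/document.md", "ACQUIRE + PARSE"),
    ("work/notes/methodology.md", "SUMMARIZE (paper)"),
    ("work/notes/code-analysis.md", "SUMMARIZE (code)"),
    ("targets/targets.md", "EXTRACT_TARGETS"),
    ("work/notes/literature.yaml", "LITERATURE"),
    ("implementation-notes.md", "SPECIFY"),
    ("comparison-report.yaml", "COMPARE"),
]


def phases_done(workdir: Union[str, Path]) -> List[str]:
    """List the phase labels whose signal file exists under `workdir`.

    The result follows table order, which matches phase order.
    """
    root = Path(workdir)
    return [label for rel, label in FILE_SIGNALS if (root / rel).is_file()]
```

A survey like this only answers "what exists on disk"; the text above pairs it with `git log --oneline` for the chronological view.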
diff --git a/claude/lightcone/skills/paper2astra/references/acquire.md b/claude/lightcone/skills/paper2astra/references/acquire.md
new file mode 100644
index 00000000..a0d8aa2d
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/acquire.md
@@ -0,0 +1,85 @@
+# ACQUIRE — fetch the paper and code
+
+Acquire the paper's full text and (when available) its reference code repository. The bundle's primary acquisition path is **arXiv LaTeX source via `/managing-bibliography`**; PDF + Docling is the fallback for non-arXiv papers.
+
+The constitution's per-phase mode controls whether this runs interactively or as a sub-agent. Default is sub-agent.
+
+## Inputs
+
+- The paper's DOI or arXiv ID (from the constitution)
+- An optional code repo URL (from the interview, if the user knew it)
+
+## Outputs
+
+- `work/reference/document.md` — paper as markdown (LaTeX-rendered when arXiv source available; Docling-extracted for PDF fallback)
+- `work/reference/paper.pdf` — paper PDF (still needed for evidence verification via `astra validate --verify-evidence`)
+- `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` — extracted artifacts (PARSE may move some of this to `work/reference/`)
+- `work/reference/code/` — clone of the code repo (or absent if not found)
+- `work/reference/code-status.yaml` — record of where the code came from
+
+## Step 1: Acquire the paper text
+
+### Path A — arXiv ID is available (preferred)
+
+Invoke `/managing-bibliography`. Use it to download the arXiv LaTeX source tarball:
+
+```bash
+curl -L -o /tmp/.tar.gz "https://arxiv.org/src/"
+mkdir -p work/reference/source && cd work/reference/source && tar -xzf /tmp/.tar.gz
+ls *.tex
+```
+
+The LaTeX source gives clean equations, captions, tables, and bibliography — none of the math collapse, ligature artifacts, or caption flattening that plagues PDF extraction. Use the main `.tex` file as the primary text source. Render it to markdown if a downstream phase needs that form (`pandoc`, or just preserve TeX where it is).
+
+Also cache the paper for ASTRA's evidence-verification surface:
+
+```bash
+astra paper add 10.48550/arXiv.
+cp "$(astra paper path 10.48550/arXiv.)" work/reference/paper.pdf
+```
+
+`astra paper add` for arXiv DOIs fetches the PDF directly. The PDF stays as a backup for `astra validate --verify-evidence`, even though the LaTeX source is the primary text.
+
+### Path B — non-arXiv paper (PDF + Docling fallback)
+
+```bash
+astra paper add
+cp "$(astra paper path )" work/reference/paper.pdf
+file work/reference/paper.pdf
+```
+
+The `file` output must say "PDF document". If it says "HTML document" or anything else, the download was blocked (CAPTCHA, paywall). Search the web for an open-access copy (NASA ADS, arXiv, Unpaywall, Semantic Scholar, the journal's open-access link), download with `curl -L -o work/reference/paper.pdf `, re-validate, then `astra paper add --pdf work/reference/paper.pdf` to register the resolved file.
+
+If a valid PDF cannot be obtained, write a clear error to `work/reference/acquire-error.txt` and stop.
+
+Skip Step 1 if `work/reference/paper.pdf` already exists and is a valid PDF.
+
+## Step 2: Search for the code repository
+
+1. Search the paper text for repository URLs — abstract, intro, conclusion, footnotes, "Code Availability" or "Data Availability" sections.
+2. If none found, web search: paper title + "github", Papers With Code, or the first author's GitHub profile.
+3. Clone if found:
+   ```bash
+   git clone --depth 1 work/reference/code
+   ```
+4. Write `work/reference/code-status.yaml`:
+   ```yaml
+   found: true  # or false
+   url: "https://..."  # null if not found
+   cloned: true  # false if found but clone failed
+   notes: "..."
+   ```
+
+Spend no more than a few searches before recording failure and moving on. **Do NOT modify cloned code.**
+
+Skip Step 2 if `work/reference/code/` already exists.
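Step 1's `file` check (a real PDF versus an HTML page served by a CAPTCHA or paywall) can also be approximated from Python when the `file` utility is unavailable. This is a rough stand-in under stated assumptions, not a replacement for the documented command: `classify_download` is a hypothetical helper, it only inspects the first bytes of the file, and it treats the `%PDF-` signature at offset zero as the test for a PDF.

```python
from pathlib import Path
from typing import Union


def classify_download(path: Union[str, Path]) -> str:
    """Classify a downloaded file as 'pdf', 'html', or 'other'.

    Only the first kilobyte is read. A PDF is expected to begin with the
    bytes '%PDF-'; a blocked download usually begins with an HTML doctype
    or an <html> tag, possibly after leading whitespace.
    """
    with open(path, "rb") as handle:
        head = handle.read(1024)
    if head.startswith(b"%PDF-"):
        return "pdf"
    lowered = head.lstrip().lower()
    if lowered.startswith(b"<!doctype html") or lowered.startswith(b"<html"):
        return "html"
    return "other"
```

Anything other than `"pdf"` would correspond to the "download was blocked" branch described in Path B.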
+
+## Survey signals (entry into ACQUIRE)
+
+Run `ls work/reference/` first. If `paper.pdf` and `document.md` (or `source/` for arXiv) are present, ACQUIRE is done. If only `paper.pdf` is present, PARSE handles the rest. If nothing is there, run ACQUIRE.
+
+## Notes
+
+- **arXiv DOI form is `10.48550/arXiv.`.** `astra paper add` accepts that form directly.
+- **Journal DOIs that 403 on Unpaywall** can be aliased to a locally-downloaded arXiv preprint via `astra paper add --pdf `.
+- This phase's job is acquisition, not understanding. Do not start summarizing the paper here — that's SUMMARIZE.
diff --git a/claude/lightcone/skills/paper2astra/references/compare.md b/claude/lightcone/skills/paper2astra/references/compare.md
new file mode 100644
index 00000000..ee00a4f3
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/compare.md
@@ -0,0 +1,97 @@
+# COMPARE — judge whether the reproduction matches
+
+Compare reproduced results against the paper's replication targets. Produce a structured verdict the IMPLEMENT-retry loop consumes. COMPARE is the **second mandatory user-ratification seam** — the verdict (was it close enough?) is a judgment the user owns, not the agent.
+
+The constitution's per-phase mode is **always interactive** for this phase. Pause for verdict ratification.
+
+## Inputs
+
+- `targets/targets.md` — target ledger with priorities, expected values, comparison guidance
+- `astra.yaml` — output definitions (each target maps to an output)
+- `targets/` — reference figures / tables for comparison
+- `results///` — reproduced results
+
+## Outputs
+
+- `comparison-report.yaml` — structured verdict
+- `comparison-report.md` — human-readable summary
+
+## Sibling skills to invoke
+
+- **`/figure-comparison`** — HTML side-by-side reference vs reproduced figures, with structured judgment per panel. Invoke per figure target. (Nolan's skill; see `../figure-comparison/SKILL.md`.)
+- **`/check-sentence-by-sentence`** — paper-vs-code TeX audit. Use when SPECIFY's evidence quotes need re-verification against the source paper, particularly when COMPARE flags a result as `partial` and the cause may be a misinterpretation of paper text. (Nolan's skill; see `../check-sentence-by-sentence/SKILL.md`.)
+
+## Result path convention
+
+For an output with `id: X`, the reproduced result lives at `results//X.`:
+
+- metrics: `.json` containing `{"value": ...}`
+- figures: `.png`
+- tables: `.csv`
+
+## Task
+
+1. **Read `targets/targets.md`.** Every replication target with its priority, expected values, comparison guidance, and the path to its reference file in `targets/`.
+2. **Read `astra.yaml`.** Outputs correspond to targets. Match each target to its output.
+3. **For every target**, find its reproduced result in `results//` and compare against the reference file in `targets/`. Missing results are `match: false`.
+4. **Write `comparison-report.yaml` and `comparison-report.md`.**
+
+## Comparison guidance
+
+**Metrics.** Judge whether the reproduced value is scientifically equivalent to the expected value from `targets/targets.md`. Numerical tolerance comes from the target's stated precision; bare match is not the bar.
+
+**Figures.** Read the reference figure from `targets/` and compare to the reproduced image. Focus on shape / trend, axis ranges, key features (peaks, inflections, curve ordering), and magnitudes. **Do NOT require pixel-perfect matches** — stochastic methods produce variation. Judge whether the same scientific conclusion follows from both figures. **Use `/figure-comparison`** for HTML side-by-side rendering and structured per-panel judgment.
+
+**Tables.** Compare key values noted in `targets/targets.md` first, then remaining values. Reference tables are in `targets/`.
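One possible reading of "tolerance comes from the target's stated precision" is sketched below in Python. It is an assumption, not the skill's definition of scientific equivalence: when the target quotes an uncertainty, that uncertainty is used as the tolerance, and otherwise the tolerance falls back to half a unit in the last digit the paper quotes. `rounding_tolerance` and `metric_matches` are hypothetical names, and the final judgment still belongs to the agent and the user.

```python
from decimal import Decimal
from typing import Optional, Union

Number = Union[int, float, str]


def rounding_tolerance(quoted: str) -> Decimal:
    """Half a unit in the last quoted digit, e.g. '0.315' gives 0.0005."""
    value = Decimal(quoted)
    if not value.is_finite():
        raise ValueError(f"not a finite number: {quoted!r}")
    exponent = value.as_tuple().exponent  # -3 for '0.315', 0 for '67'
    return Decimal(5).scaleb(exponent - 1)


def metric_matches(quoted: str, reproduced: Number,
                   sigma: Optional[Number] = None) -> bool:
    """Compare a reproduced metric with the value quoted in the paper.

    `quoted` is the value exactly as written in targets.md, kept as text so
    the number of quoted digits is not lost. `sigma` is an optional quoted
    uncertainty; when absent, rounding precision is used instead.
    """
    expected = Decimal(quoted)
    if sigma is not None:
        tolerance = Decimal(str(sigma))
    else:
        tolerance = rounding_tolerance(quoted)
    return abs(Decimal(str(reproduced)) - expected) <= tolerance
```

Passing the quoted value as a string matters here: converting it to a float first would discard how many digits the paper actually reported.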
+
+## Output: `comparison-report.yaml`
+
+```yaml
+verdict: pass|partial|fail
+attempt:
+outputs:
+  :
+    type: metric|figure|table
+    priority: high|medium|low
+    paper_value: ""
+    reproduced_value: ""
+    reference_file: ""
+    reproduced_file: ""
+    match: true|false
+    notes: ""
+failure_diagnosis: null|""
+fix_suggestions:
+  - ""
+```
+
+## Verdict rules
+
+- **`pass`**: ALL high-priority targets match, no major issues with medium-priority.
+- **`partial`**: some high-priority match, or all high-priority match but medium has issues.
+- **`fail`**: most high-priority don't match, or fundamental methodological issue.
+
+If verdict is not `pass`, **`fix_suggestions` MUST reference specific scripts and line numbers**. "The result is wrong" is not actionable; "scripts/bao_fit.py:42 uses `damping_prior=flat`, paper specifies Gaussian; change to gaussian per Howlett+2017 §4.2" is.
+
+Also write `comparison-report.md` with a human-readable summary. For figure / table comparisons, describe what you see in both and explain your match judgment.
+
+## Verdict ratification (the user seam)
+
+After writing the report, surface the verdict to the user via `AskUserQuestion`:
+
+- **If `pass`**: confirm with the user before exiting the COMPARE → IMPLEMENT loop. *"All high-priority targets match. Mark reproduction complete?"* The user accepts → SUMMARIZE_RUN runs; the user rejects → name what's still off and re-enter the loop.
+- **If `partial`**: show the user the failing targets and the diagnosis. *"Partial match. outputs failing: . Continue retrying or accept partial?"* If the attempt budget (from the constitution) is reached, this surfacing is mandatory.
+- **If `fail`**: same shape, but the loop's continuation should be questioned more sharply. A fundamental methodological issue may need a constitution amendment, not another implement retry.
+
+The verdict is the agent's judgment; the **decision to keep iterating** is the user's. Default on user silence: continue the loop until the attempt budget is exhausted, then mandatory user surfacing.
+
+## Survey signals (entry into COMPARE)
+
+- All outputs in `lc status --universe baseline` are `ok` ⇒ ready to compare
+- `comparison-report.yaml` exists with current `attempt` ⇒ COMPARE done for this attempt
+- `comparison-report.yaml` verdict is `pass` ⇒ COMPARE → IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN
+
+## Notes
+
+- **One COMPARE per IMPLEMENT.** Each IMPLEMENT retry produces a fresh COMPARE; the report's `attempt` field increments. Do not overwrite prior reports — keep them at `comparison-report-attempt-.yaml` if useful, or commit each between iterations so git carries the history.
+- **The verdict is the agent's; the keep-iterating decision is the user's.** Treat them as separate.
+- **`/figure-comparison` is the trustworthy figure-judgment surface.** Direct image diffing without it tends to either over-fail (any pixel-level variation triggers a no-match) or over-pass (it sees that there are *some* shared features and rubber-stamps). The skill's structured prompt is the discipline.
diff --git a/claude/lightcone/skills/paper2astra/references/extract_targets.md b/claude/lightcone/skills/paper2astra/references/extract_targets.md
new file mode 100644
index 00000000..6af2c22b
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/extract_targets.md
@@ -0,0 +1,61 @@
+# EXTRACT_TARGETS — pick the replication targets
+
+Take the results inventory from SUMMARIZE and select the concrete figures, tables, and metrics the reproduction will iterate against. Build a self-contained `targets/` directory the COMPARE phase will measure against.
+
+The constitution's per-phase mode is **user choice** for this phase — defaults to sub-agent. The selection of replication targets is sometimes obvious (paper has 3 primary figures) and sometimes wants user input (which sub-analyses are in scope).
+
+## Inputs
+
+- `work/notes/methodology.md` — has the results inventory split into primary / secondary
+- `work/reference/metadata.json` — index of figures and tables with captions
+- `work/reference/figures/`, `work/reference/tables/` — the actual extracted artifacts
+
+## Outputs
+
+- `targets/targets.md` — the target ledger
+- `targets/` — copies of selected reference files (figures, tables) so `targets/` is self-contained
+
+## Step 1: Read the results inventory
+
+Read `work/notes/methodology.md`. The results inventory section already separates primary from secondary results and notes which decisions feed into each. **Use this as your starting point** — do not re-analyze the paper from scratch.
+
+## Step 2: Select replication targets
+
+For each result in the inventory, find the corresponding figure, table, or in-text metric in `work/reference/`. Apply the constitution's scope:
+
+- **Primary results should almost always be included.** The constitution's Desired State names them.
+- **Secondary results** should be included only if they are useful checkpoints along the pipeline (i.e., if getting them right helps verify intermediate steps).
+- **Targeted reproduction** (per the constitution): include only the targets in scope. Mark out-of-scope primary results in `targets.md` with a reason.
+
+## Step 3: Populate `targets/`
+
+The `targets/` directory is the self-contained reference set the COMPARE phase consumes.
+
+1. **Copy relevant reference files** from `work/reference/figures/` and `work/reference/tables/` into `targets/`. Only copy the files corresponding to selected targets — not everything.
+
+2. **Write `targets/targets.md`.** For each target, a brief entry:
+
+   - What it is and where its reference file lives in `targets/`
+   - Expected values / trends and how to judge if a reproduction matches
+   - Which decisions from the decision map feed into this result
+   - Whether reference code covers this computation (from `code-analysis.md` if present)
+   - Priority: `primary` or `secondary`
+
+   Keep entries brief — a few lines per target, not paragraphs.
+
+## Rules
+
+- All paths in `targets/targets.md` are relative to `targets/`.
+- For figures: describe scientific content, not just "a plot" — name the panels, the axis ranges, the qualitative shape.
+- For tables: note which specific values matter most.
+- For metrics: quote the exact value from the paper text (with the section / equation / sentence reference).
+
+## Survey signals (entry into EXTRACT_TARGETS)
+
+- `work/notes/methodology.md` exists ⇒ ready to extract targets
+- `targets/targets.md` exists and reference files have been copied ⇒ EXTRACT_TARGETS done
+
+## Notes
+
+- **Targets are coverage obligations, not the spec.** SPECIFY maps each target to its appropriate ASTRA home — outputs for artifacts, findings for claims, inputs / decisions / universe defaults for constants. EXTRACT_TARGETS' job is the ledger; SPECIFY's job is the structural placement.
+- **Out-of-scope targets stay in `targets.md`** with an explicit reason, not silently dropped. The constitution's scope is the source of truth for what's in.
diff --git a/claude/lightcone/skills/paper2astra/references/implement.md b/claude/lightcone/skills/paper2astra/references/implement.md
new file mode 100644
index 00000000..153c6c9d
--- /dev/null
+++ b/claude/lightcone/skills/paper2astra/references/implement.md
@@ -0,0 +1,57 @@
+# IMPLEMENT — write scripts and recipes
+
+Read `astra.yaml` (the spec) and `implementation-notes.md` (practical guidance). Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml` so the asset graph is wired end to end.
+
+The constitution's per-phase mode is **user choice** for this phase — defaults to sub-agent. Most implementation is mechanical (translate spec → script), but algorithm choices on tricky steps may want ratification.
+
+## Inputs
+
+- `astra.yaml` — the structural spec
+- `implementation-notes.md` — tricky algorithms, numerical gotchas, data-format quirks
+- `work/notes/methodology.md` — for context when the spec compresses
+- `work/reference/code/` (if present) — reference code; **read for ambiguity resolution, do not copy verbatim**
+
+## Outputs
+
+- `scripts/.py` (or `.sh`, or whatever fits) — one script per output (or shared scripts for tightly-coupled outputs)
+- `requirements.txt` — Python dependencies
+- Recipes in `astra.yaml` — each output gets a `recipe:` block with `command:` and `inputs:`
+
+## Task
+
+Read `astra.yaml` and `implementation-notes.md`. Write scripts in `scripts/` that produce each output, then add recipes to `astra.yaml`.
+
+If `work/reference/code/` exists, **use it as a reference to resolve ambiguities** — but write clean scripts following ASTRA conventions, not verbatim copies of the reference code.
+
+## Data: REAL DATA ONLY
+
+**NEVER generate synthetic, mock, or fake data.** Every input dataset must be downloaded or queried from its real source (archive URL, database query, API, etc.). The methodology notes and `astra.yaml` inputs describe where each dataset comes from — write scripts that fetch the actual data.
+
+The only exception is if the paper itself uses synthetic / simulated data as its input (e.g., N-body simulations, Monte Carlo samples). In that case, reproduce the paper's data generation procedure exactly as described — but this is reproducing the paper's methodology, not substituting real data with fakes.
+ +If a dataset is behind a paywall, requires registration, or is "available upon request," write the download script with a clear error message explaining what the user needs to do manually. **Do NOT substitute synthetic data as a workaround.** + +## Rules + +1. **One script per output** (or a shared script for tightly-coupled outputs). +2. **Parameterize by decisions.** Each decision is a CLI argument; scripts also receive `--universe `. See lightcone-cli's `CLAUDE.md` for the full convention. +3. **Add recipes** to each output in `astra.yaml` with `command:` and `inputs:` (dependencies). Recipe inputs use the same `.` form the narrative skill's data-flow rules require. +4. **Create `requirements.txt`** with needed packages. Do not install them — the RUN phase manages environments. +5. **Do not execute scripts** — the RUN phase handles execution via `prism run` (now `lc run`). +6. **Validate** with `astra validate astra.yaml` after adding recipes. + +## Retry attempts + +If `comparison-report.yaml` exists from a prior COMPARE that returned `partial` or `fail`, the IMPLEMENT iteration is a **retry attempt**. Read `comparison-report.yaml` to understand what went wrong; focus on the outputs marked as non-matching. The constitution carries the attempt budget (default 5); the iteration's first move is to check whether `attempt` in the report has reached the budget. If it has, surface to the user via `AskUserQuestion` ("verdict still failing after N attempts — continue, change scope, or accept partial?") rather than burning more cycles. 
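The budget check that opens a retry iteration can be sketched as below. This is a minimal illustration, not the loop's actual implementation: it assumes `comparison-report.yaml` has already been parsed into a dict, reads only the `verdict` and `attempt` keys named above, and the function name and the three return labels are hypothetical.

```python
def next_move(report: dict, budget: int = 5) -> str:
    """Pick the first move of an IMPLEMENT retry iteration.

    `report` is assumed to be comparison-report.yaml parsed into a dict.
    The return labels are illustrative, not part of any real schema.
    """
    if report.get("verdict") == "pass":
        return "proceed"  # loop terminated; move on to SUMMARIZE_RUN
    if report.get("attempt", 0) >= budget:
        return "ask_user"  # budget reached: surface via AskUserQuestion
    return "retry"  # focus on the outputs marked as non-matching
```

Checking the budget before reading the per-output details keeps an exhausted loop from spending another cycle on diagnosis the user may not want.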
+ +## Survey signals (entry into IMPLEMENT) + +- `astra.yaml` validates and `implementation-notes.md` exists ⇒ ready to implement +- `scripts/` has one entry per output id; `requirements.txt` exists; recipes appear in `astra.yaml` ⇒ first-pass IMPLEMENT done +- `comparison-report.yaml` returns `pass` ⇒ IMPLEMENT loop terminated; proceed to SUMMARIZE_RUN + +## Notes + +- **`lc run` is the canonical execution surface.** Scripts assume they will be invoked via the lightcone-cli runner. Do not hard-code working directories or assume environment activation. +- **Determinism where possible.** Set random seeds, fix library versions, prefer reproducible installations. The IMPLEMENT goal is not just "produces output once" but "reproducibly produces output across runs." +- **Tight coupling earns shared scripts.** When two outputs come from the same expensive computation (e.g. an MCMC produces both a parameter chain and a summary statistic), one script with multiple output paths is cleaner than two scripts that each re-do the work. diff --git a/claude/lightcone/skills/paper2astra/references/interview.md b/claude/lightcone/skills/paper2astra/references/interview.md new file mode 100644 index 00000000..8a7ca8f0 --- /dev/null +++ b/claude/lightcone/skills/paper2astra/references/interview.md @@ -0,0 +1,160 @@ +# Interview — drafting the per-paper reproduction constitution + +The interview is the only phase paper2astra runs interactively. It happens once per project, up front, before any ralph loop is launched. Its job is to crystallize what the user actually wants — which paper, what scope, which seams want their attention, which they want delegated — and bake that into a constitution the ralph loop can drive. + +Use the [`/constitution`](../../constitution/SKILL.md) skill to draft. The interview's job is to *gather* the inputs the constitution needs; the constitution skill carries the discipline of writing it. 
+ +--- + +## What the interview produces + +A single markdown file at the project root — by convention `paper2astra-constitution.md` (or whatever name the user prefers). Its YAML frontmatter has `status: open`. Its body has the standard constitution sections: Desired State, Context, Skills, Evidence, Open Questions — populated for *this specific paper*. + +After the interview, paper2astra hands this file to ralph: + +```bash +../ralph-loops/scripts/ralph paper2astra-constitution.md +``` + +The constitution is the durable artifact; the interview's work product *is* the constitution. There is no separate "interview state" file. + +--- + +## The four jobs + +### 1. Identify the paper + +Use `AskUserQuestion` if the user did not supply enough on `/paper2astra` invocation: + +- **DOI or arXiv ID.** arXiv ID preferred when available — it unlocks the LaTeX-source acquisition path (see ACQUIRE). +- **Code repo URL** if the user knows it. (If not, ACQUIRE will search.) +- **User's prior familiarity.** Has the user reproduced this paper before? Read the paper recently? Worked with the original authors? This affects how much of the SUMMARIZE / EXTRACT_TARGETS work needs human ratification. +- **Notes file.** If the user has any prior notes (their own writeup, a sketch of which figures matter), capture the path; SUMMARIZE will read it. + +### 2. Scope the reproduction + +A paper has many figures, tables, and numbers. The user usually does not want all of them. + +Ask: + +- **Full reproduction or targeted?** Full = every primary result the paper reports. Targeted = "I only care about figures 3, 4, 7 and the headline number in Table 2." Targeted is cheaper and produces a tighter astra.yaml. +- **Specific decisions of interest.** A paper makes many choices. The user may care most about a few — e.g. "I want the BAO fit to use a different damping prior than the paper." These become first-class decisions in the spec, with the alternative preserved as a sibling option. 
+- **Sub-analysis structure.** Does the paper have genuinely independent stages (e.g. reconstruction → clustering → BAO fit)? If so, the spec wants sub-analyses; SPECIFY will mirror the structure. If the paper is monolithic, one analysis suffices. + +These answers live in the constitution's **Desired State** section. + +### 3. Choose interactive vs sub-agent per phase + +Read the "Per-phase mode" table in `../SKILL.md`. The defaults are reasonable. Walk the user through it briefly: + +- **Phases that are always interactive (defaults you should not flip):** SPECIFY, COMPARE. These are the ratification seams; the user has to be reachable. +- **Phases that are always sub-agent (defaults you should not flip):** SUMMARIZE, LITERATURE. These benefit from parallel fresh-context runs. +- **Phases the user chooses:** ACQUIRE, PARSE, EXTRACT_TARGETS, REVIEW, IMPLEMENT, RUN. These default to sub-agent (mostly mechanical) but may want user attention if the paper is unfamiliar or the user has strong opinions about implementation. + +If the user has no opinion, take the defaults. The choice goes into the constitution's **Context** section as a per-phase mode table. + +### 4. Draft the constitution + +Invoke `/constitution`. Pass in: + +- The paper identity (DOI, arXiv ID, code URL) +- The scope (full vs targeted, sub-analysis structure if known) +- The per-phase mode table +- Any prior context the user has shared + +The constitution skill carries the discipline of section voice (pointers, not snapshots; constitution, not plan; constraints with reasons). The constitution it produces will look approximately like: + +```markdown +--- +status: open +--- + +# Reproduce () + +## Desired State + +A complete `astra.yaml` for at this workdir, with recipes that +produce reproduced versions of , validated by +`astra validate astra.yaml --verify-evidence`, with `comparison-report.yaml` +verdict `pass` against the targets in `targets/targets.md`. + +Non-goals: . 
+ +## Context + +- Paper DOI: +- arXiv ID: ; LaTeX source acquisition path is the primary +- Code repo: (or "to be searched in ACQUIRE") +- Workdir layout: standard Paper2ASTRA conventions — + `work/reference/`, `work/notes/`, `targets/`, `astra.yaml`, + `universes/`, `results/` +- Per-phase mode: + | Phase | Mode | + |---|---| + | ACQUIRE | sub-agent | + | PARSE | sub-agent | + | SUMMARIZE | sub-agent | + | EXTRACT_TARGETS | | + | LITERATURE | sub-agent | + | SPECIFY | interactive | + | REVIEW | | + | IMPLEMENT | | + | RUN | | + | COMPARE | interactive | + | SUMMARIZE_RUN | sub-agent | + +## Skills + +- `/paper2astra` — this skill (the orchestrator) +- `/managing-bibliography` — ACQUIRE +- `/narrative` — SPECIFY +- `/check-sentence-by-sentence`, `/figure-comparison` — COMPARE + +## Evidence + +- `ls work/reference/document.md` — ACQUIRE + PARSE done +- `ls work/notes/methodology.md` — SUMMARIZE done +- `ls targets/targets.md` — EXTRACT_TARGETS done +- `ls astra.yaml && astra validate astra.yaml` — SPECIFY done and valid +- `astra validate astra.yaml --verify-evidence` — evidence quotes match source PDFs +- `ls comparison-report.yaml && yq '.verdict' comparison-report.yaml` — most-recent COMPARE verdict +- `git log --oneline` — chronological view of phase commits + +The COMPARE → IMPLEMENT loop iterates until verdict is `pass` or +attempt budget (default 5) is exhausted. + +## Open Questions + +(empty — populated as the loop runs and surfaces material conflicts +the user must ratify) +``` + +Show the draft, take corrections, refine. When the user is happy: + +- Save the constitution at the project root +- Tell the user how to launch the loop: `../ralph-loops/scripts/ralph paper2astra-constitution.md` +- Optionally launch it for them if they say yes + +The interview ends here. Subsequent work happens inside ralph iterations. + +--- + +## Discipline + +- **The interview is short.** Do not turn it into a full paper-summarization session. 
The user does not need to teach you the paper — they need to tell you what they want reproduced. Three to five `AskUserQuestion` rounds, total. If the user is grinding through detail, gently steer back to scope. +- **The constitution is the work product.** Do not file separate "interview notes" or "scope document" files. Everything goes into the constitution. +- **The defaults are the path.** When the user says "I don't know, you choose," take the defaults from the per-phase mode table. The defaults reflect what the loops have learned about which seams matter. +- **One paper at a time.** A single constitution covers one paper. If the user wants two, run the interview twice — two constitutions, two ralph loops, two project workdirs. + +--- + +## When the interview gets stuck + +Most failure modes resolve into "the user has not yet decided what 'reproduce' means for them." If the conversation is circling, ask one of these directly: + +- *"If we ran this and it produced figure 3 plus the headline number in Table 2, would you be done?"* — pins targeted vs full. +- *"Is there a specific decision in the paper you want to vary, or are we trying to match the paper exactly?"* — pins whether universes need to span alternatives. +- *"Do you want to look at every paper-vs-code conflict, or just the ones I think are material?"* — pins SPECIFY mode. + +When all three answer cleanly, the constitution writes itself. diff --git a/claude/lightcone/skills/paper2astra/references/literature.md b/claude/lightcone/skills/paper2astra/references/literature.md new file mode 100644 index 00000000..07c77814 --- /dev/null +++ b/claude/lightcone/skills/paper2astra/references/literature.md @@ -0,0 +1,163 @@ +# LITERATURE — extract prior insights from cited papers + +For each cited paper that informed a methodological decision, extract evidence-quote-backed insights and link them to the relevant decisions and options. 
Synthesize across papers into `work/notes/literature.yaml`, which SPECIFY consumes when authoring `astra.yaml`'s `prior_insights` block. + +The constitution's per-phase mode is **always sub-agent** for this phase. Spawn one Task-tool sub-agent per cited paper for parallel extraction; spawn a final sub-agent for synthesis. This is pure parallel grunt-work. + +## Inputs + +- `work/notes/cited_papers.yaml` — the list of papers to mine, from SUMMARIZE +- `work/notes/methodology.md` — has the decision map; each per-paper sub-agent gets it as context +- `work/reference/document.md` — the target paper (for reference) + +## Outputs + +- `work/notes/literature/.yaml` — one file per cited paper (per-paper extraction) +- `work/notes/literature.yaml` — synthesized merged view (final output) + +## Per-paper extraction sub-agent — system prompt + +> You are an ASTRA insight extraction agent with self-validation capability. Your task is to extract scientific insights from a single cited paper that bear on specific methodological decisions already identified in the target paper. +> +> ### Instructions +> +> 1. Read the PDF at the path provided below using the Read tool. +> 2. Review the decision map provided below — these are the specific decisions you are looking for evidence about. +> 3. Scan the cited paper for findings that support, contradict, or compare the options listed in those decisions. Focus on: +> - Empirical comparisons between approaches listed as decision options +> - Performance benchmarks or validation results relevant to the choices +> - Recommendations or caveats about specific methods/parameters +> 4. For each relevant finding, extract: +> - A clear claim (1–2 sentences stating what we learned) +> - An exact quote from the paper (verbatim, 1–3 sentences) +> - The page number where the quote appears +> - Prefix and suffix context — REAL surrounding text from the page (~20–100 chars each), used to disambiguate the quote among similar passages. 
This follows the W3C TextQuoteSelector convention: prefix and suffix are literal substrings of the source page, NOT editorial parentheticals. Wording like "(Section 3.1 of Foo+19)" or "(see Figure 4)" will fail verification because the validator concatenates `prefix + quote + suffix` and matches against actual page text. +> 5. Cache the paper so spec-level verification can find it (see below). +> 6. Write the extracted insights as YAML to the specified output file. +> +> ### Caching the source PDF +> +> Before extraction completes, register each paper with the validator's PDF cache so downstream evidence verification can find it: +> +> ```bash +> astra paper add "" +> ``` +> +> For arXiv DOIs (`10.48550/arXiv.`) this fetches directly. Journal DOIs that 403 on Unpaywall can be aliased to a locally-downloaded arXiv preprint: +> +> ```bash +> astra paper add "" --pdf +> ``` +> +> ### Quote fidelity rules +> +> Quotes are NOT verified during this per-paper extraction phase — verification is spec-level (`astra validate astra.yaml --verify-evidence`) and runs once SPECIFY has authored `astra.yaml` referencing each paper. Your job here is to extract quotes that will pass that verification cleanly. The checks are: +> +> - Each `exact` quote must be present on the cited page, fuzzy-matched at RapidFuzz `partial_ratio` ≥ 70. Copy verbatim from the PDF; do not paraphrase, normalize whitespace, or strip mathematical typesetting. +> - The validator concatenates `prefix + quote + suffix` and matches that against the page text at a context score ≥ 80. Choose prefix/suffix as REAL surrounding page text (W3C TextQuoteSelector convention), not editorial commentary. Wording like "(Section 3.1 of Foo+19)" or "(see Figure 4)" silently lowers the context score below threshold even when the quote itself is in the PDF. 
+> - Avoid YAML `|` block-literal style for `exact`, `prefix`, and `suffix` values: embedded newlines from block-literal folding can mishandle the context-score concatenation. Single-line strings or `>` folded-block style are safer. +> - Math-formula quotes (with superscripts, subscripts, inline footnote markers) are likely to fail because the PDF text extractor collapses these. Quote the surrounding English narrative instead, or skip that piece of evidence if a sibling quote already establishes the finding. +> +> The verification cache is keyed by `(doi, version, sha256(quote_text))` plus `pdf_sha256`, so any edit to a quote in the eventual YAML automatically invalidates that entry — there is no need to delete the cache between runs. +> +> ### Quote granularity and finding attribution +> +> - **Quotes carry the claim on their own.** A four-word fragment ("two widely used fitting codes", "the actual quantity being fit") satisfies fuzzy-match but fails the reader: lift the quote out of context and the claim it supports must still stand. The validator is happy with any string that fuzzy-matches; a downstream agent or human reader following the evidence pointer needs to learn what the paper actually said. Default to full sentences with TeX-anchored prefix/suffix; split a long passage into two evidence rows rather than truncate a quote into a fragment that depends on context. Fragments creep in at exactly the spots where inline math forces shrinking, which is also where claims hide. +> - **Cross-section methodology gets separate insights.** When a paper's relevant methodology is split across multiple sections — a methods chapter defining a tool, a results chapter setting a threshold, an application chapter running it — file one insight per piece, each citing the section where that piece is *defined*. Do not collapse all the borrowed pieces into the application section's number. 
The application section gets all the credit and the methodology section disappears, which is a real fidelity-sweep failure mode. +> +> ### Output format +> +> Write ONLY this YAML structure to the output file. No other text. +> +> ```yaml +> insights: +> : +> id: +> claim: "" +> created_at: "" +> evidence: +> - id: ev1 +> doi: "" +> quote: +> type: TextQuoteSelector +> exact: "" +> prefix: "<~20-100 chars of REAL surrounding text BEFORE the quote>" +> suffix: "<~20-100 chars of REAL surrounding text AFTER the quote>" +> location: +> type: FragmentSelector +> page: +> scope: "" +> +> decision_links: +> : +> : +> - +> ``` +> +> ### Rules +> +> - Use `lowercase_with_underscores` for insight IDs. +> - Quotes must be EXACT — copy verbatim from the PDF, no paraphrasing or whitespace normalization. +> - Prefix and suffix must be real surrounding page text, not editorial parentheticals. +> - One claim per insight — do not combine multiple findings. +> - Only extract insights relevant to the target decisions listed below. +> - If no relevant insights found, write `insights: {}` and `decision_links: {}`. +> - prefix and suffix are REQUIRED for every TextQuoteSelector. + +## Synthesis sub-agent — system prompt + +> You are a literature synthesis agent. Read all per-paper extraction YAML files in `work/notes/literature/` and merge them into a single `work/notes/literature.yaml` that consolidates insights from all cited papers. +> +> ### Task +> +> 1. Read all per-paper YAML files in `work/notes/literature/`. +> 2. Merge insights, de-duplicating where multiple papers support the same claim. +> 3. Merge decision links across all papers. +> 4. Write the consolidated output to `work/notes/literature.yaml`. 
+> +> ### Output format +> +> ```yaml +> prior_insights: +> : +> id: +> claim: "" +> evidence: +> - id: e1 +> doi: "" +> quote: +> type: TextQuoteSelector +> exact: "" +> prefix: "<~20-100 chars before>" +> suffix: "<~20-100 chars after>" +> location: +> type: FragmentSelector +> page: +> scope: "" +> +> decision_links: +> : +> : [insight_id1, insight_id2] +> ``` +> +> ### Rules +> +> - Preserve all verified evidence exactly as-is (do not rewrite quotes). +> - When two papers support the same claim, merge their evidence lists under a single insight entry. +> - When papers support different but related claims, keep them as separate insights. +> - `decision_links` should map decision IDs to option IDs to lists of insight IDs. Merge across all papers so each decision collects all relevant insights. +> - Use consistent insight IDs (`lowercase_with_underscores`). +> - Drop any insights that had zero verified quotes. +> - If no papers produced insights, write `prior_insights: {}` and `decision_links: {}`. + +## Survey signals (entry into LITERATURE) + +- `work/notes/cited_papers.yaml` exists ⇒ ready to extract +- `work/notes/literature/` directory has one YAML per paper in `cited_papers.yaml` ⇒ extraction done +- `work/notes/literature.yaml` exists ⇒ synthesis done; LITERATURE complete + +## Notes + +- **Run per-paper extractions in parallel.** One sub-agent per entry in `cited_papers.yaml`. They are fully independent. +- **Synthesis is a single sub-agent.** It reads everything in `work/notes/literature/` and writes one merged `literature.yaml`. +- **Resume is automatic.** If `work/notes/literature/.yaml` already exists, skip the per-paper extraction for that paper. The synthesis re-runs whenever new per-paper files appear. 
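The `decision_links` merge rule from the synthesis prompt can be sketched as follows. This is an illustration under stated assumptions, not the synthesis agent's actual code: each per-paper `decision_links` block is assumed to be already parsed into nested dicts (decision ID to option ID to list of insight IDs), and the function name is hypothetical.

```python
def merge_decision_links(per_paper: list) -> dict:
    """Merge per-paper decision_links into one mapping.

    Each element of `per_paper` is assumed to look like
    {decision_id: {option_id: [insight_id, ...]}}.
    Insight IDs are de-duplicated while keeping first-seen order.
    """
    merged = {}
    for links in per_paper:
        for decision_id, options in links.items():
            for option_id, insight_ids in options.items():
                bucket = merged.setdefault(decision_id, {}).setdefault(option_id, [])
                for insight_id in insight_ids:
                    if insight_id not in bucket:
                        bucket.append(insight_id)
    return merged
```

A paper that produced nothing contributes an empty mapping, so the `decision_links: {}` case needs no special handling.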
diff --git a/claude/lightcone/skills/paper2astra/references/parse.md b/claude/lightcone/skills/paper2astra/references/parse.md new file mode 100644 index 00000000..b14c2d78 --- /dev/null +++ b/claude/lightcone/skills/paper2astra/references/parse.md @@ -0,0 +1,79 @@ +# PARSE — structure the paper + +Turn the acquired paper into structured artifacts the rest of the pipeline can consume: markdown text, individual figures, individual tables, and a metadata index. This is mostly a deterministic pre-processing step. + +The constitution's per-phase mode controls interactive vs sub-agent. Default is sub-agent. + +## Inputs + +- `work/reference/source/` — arXiv LaTeX source tree (Path A from ACQUIRE), or +- `work/reference/paper.pdf` — PDF (Path B fallback) + +## Outputs + +- `work/reference/document.md` — paper as markdown +- `work/reference/figures/` — extracted figures (PNG / PDF / vector) +- `work/reference/tables/` — extracted tables (CSV when machine-readable, MD otherwise) +- `work/reference/metadata.json` — index of figures and tables with captions and page numbers + +## Path A — arXiv LaTeX source (when `work/reference/source/` exists) + +The LaTeX source is already structured — sections are `\section{}`, equations are TeX, figures cite their files by name, tables are `tabular` environments. Convert to markdown while preserving equation TeX: + +```bash +# Find the main file (usually has \documentclass at the top) +grep -l '\\documentclass' work/reference/source/*.tex + +# Convert with pandoc, preserving math and structure +pandoc -f latex -t markdown -o work/reference/document.md work/reference/source/
.tex +``` + +Adjust pandoc invocation if the main file uses `\input{}` heavily — pandoc resolves them when run from the right cwd. Verify the output by reading the first ~200 lines and checking the section structure looks sensible. + +Extract figure files from the source tree into `work/reference/figures/`: + +```bash +mkdir -p work/reference/figures +# Copy referenced figure files; common extensions are .pdf .png .eps .jpg +find work/reference/source -type f \( -name "*.pdf" -o -name "*.png" -o -name "*.eps" -o -name "*.jpg" \) \ + -not -path "*/aux/*" -exec cp {} work/reference/figures/ \; +``` + +For tables, the LaTeX `tabular` blocks remain as TeX inside the rendered markdown. If a downstream phase needs them as CSV, extract them on demand. + +Build `work/reference/metadata.json` — index of figures and tables. The structure: + +```json +{ + "figures": [ + {"id": "fig1", "caption": "...", "file": "figures/fig1.pdf", "label": "fig:bao"} + ], + "tables": [ + {"id": "tab1", "caption": "...", "file": "tables/tab1.csv", "label": "tab:results"} + ] +} +``` + +The `label` field is the LaTeX `\label{}` so SPECIFY's anchor work and EXTRACT_TARGETS' selection can both reference the same artifact. + +## Path B — PDF fallback (when `work/reference/source/` does not exist) + +Use Docling — the lightcone-cli stack ships its CLI: + +```bash +# Run Docling against the PDF; outputs into work/reference/ +docling --output work/reference work/reference/paper.pdf +``` + +Docling produces `document.md`, `figures/`, `tables/`, and `metadata.json` with the same shape Path A produces. + +If Docling fails, the PDF may be corrupt — re-run ACQUIRE's download step before giving up. + +## Survey signals (entry into PARSE) + +If `work/reference/document.md` exists and `work/reference/metadata.json` exists, PARSE is done — proceed to SUMMARIZE. + +## Notes + +- **Path A is preferred whenever arXiv source was acquired.** PDF + Docling is the fallback for non-arXiv papers, not the default. 
The bundle's design philosophy is that math, ligatures, and caption fidelity are easier from LaTeX source than from re-extracted PDF text. +- **Equation numbers and section numbers must match the rendered paper.** Whether you use Path A or Path B, downstream phases (SPECIFY's evidence quotes, COMPARE's references) cite "eq. N" or "§N" by the printed number. Verify by spot-checking against the PDF. diff --git a/claude/lightcone/skills/paper2astra/references/review.md b/claude/lightcone/skills/paper2astra/references/review.md new file mode 100644 index 00000000..13363378 --- /dev/null +++ b/claude/lightcone/skills/paper2astra/references/review.md @@ -0,0 +1,79 @@ +# REVIEW — pre-implementation sanity check + +Verify that the ASTRA specification is complete, consistent, and ready for the IMPLEMENT phase. REVIEW edits the spec in place when fixes are obvious; it surfaces gaps to the user (or as Open Questions) when judgment is required. + +The constitution's per-phase mode is **user choice** for this phase — defaults to sub-agent. REVIEW is mostly mechanical (cross-reference, validation), so sub-agent suits it; but a paper that hits the SPECIFY conflict-surfacing path heavily may want REVIEW interactive too. + +## Inputs + +- `astra.yaml` — the spec from SPECIFY +- `universes/baseline.yaml` +- `implementation-notes.md` +- `work/notes/methodology.md` +- `targets/targets.md` +- `work/reference/document.md` (Grep into; do not re-read whole) +- `work/notes/literature.yaml` (if present) — for evidence verification + +## Outputs + +- In-place edits to `astra.yaml`, `universes/baseline.yaml`, `implementation-notes.md` as needed +- No new files unless a missing data-acquisition path needs to be flagged with content + +## Checks + +1. **Target coverage.** Every replication target from `targets/targets.md` must appear as an output (or finding, or input/decision/universe default) in `astra.yaml`. 
Any missing target either gets added or earns an explicit out-of-scope reason in `targets.md`. + +2. **Output definitions.** Each output has a clear `type` and sufficient description. + +3. **Methodology detail.** Cross-check `work/notes/methodology.md` against the spec for gaps: missing hyperparameters, underspecified algorithms, vague data-processing steps. Re-read targeted sections of the paper to fill them in. Use Grep on `work/reference/document.md` rather than re-reading the whole thing. + +4. **Decisions.** Decisions should cover what actually affects reproducibility. Remove cosmetic choices; add anything material that is missing. Ensure `universes/baseline.yaml` stays consistent. + +5. **Data obtainability.** Every data source needs a concrete path (URL, package name, or generation code). Flag anything vague or "available upon request." + +6. **Data acquisition.** Every input in `astra.yaml` must have a concrete acquisition path — a download URL, database query, API call, or package name. Verify that `methodology.md` documents how to obtain each dataset. Flag any dataset that is vague so IMPLEMENT knows what to handle. + +7. **Implementation notes.** Check `implementation-notes.md` for completeness — does it flag the tricky parts? Add anything IMPLEMENT should know. + +8. **Evidence verification.** If `work/notes/literature.yaml` exists, run: + ```bash + astra validate astra.yaml --verify-evidence + ``` + This verifies that all prior-insight quotes match the source PDFs. Flag any misquotes or unsupported claims; these typically arise when a quote was paraphrased or when prefix/suffix carry editorial commentary instead of real surrounding text. + +## Fixes + +Edit files directly. After any change to `astra.yaml`, run: + +```bash +astra validate astra.yaml +``` + +## CRITICAL: No synthetic data + +Unless the paper itself uses synthetic / simulated data as input, the pipeline must use **real data only**. 
Check that: + +- Every `astra.yaml` input has a real acquisition source (URL, query, etc.) +- `implementation-notes.md` does NOT suggest generating mock / synthetic data +- The methodology notes describe real data sources with concrete download paths + +If any input lacks a concrete acquisition path, add one by searching the paper for URLs, DOIs, or archive references. If the data truly cannot be obtained programmatically, document this clearly in `implementation-notes.md` so IMPLEMENT writes a script that fails with a helpful message rather than silently substituting fake data. + +## Rules + +- Use Grep to search `work/reference/document.md` for specific claims to verify — do not read the entire markdown at once. Work primarily from notes and the spec. +- **Minimize churn** — don't restructure or rename unnecessarily. +- If everything looks good, say so briefly; don't invent problems. +- Do **NOT** add implementation recipes — that is IMPLEMENT's job. + +## Survey signals (entry into REVIEW) + +- `astra.yaml` exists and validates ⇒ ready to review +- `astra validate astra.yaml --verify-evidence` returns clean (when literature.yaml exists) ⇒ evidence side done +- All `targets/targets.md` entries map to spec homes (output / finding / input / decision / universe default) ⇒ coverage side done +- Both ⇒ REVIEW complete; proceed to IMPLEMENT + +## Notes + +- **REVIEW does not write code.** Its outputs are edits to the spec and additions to `implementation-notes.md`, not new scripts. +- **A clean REVIEW reduces IMPLEMENT thrash.** It is worth running even when the spec looks fine after SPECIFY — the cross-check catches "looks fine in isolation, breaks under full coverage" gaps. 
diff --git a/claude/lightcone/skills/paper2astra/references/run.md b/claude/lightcone/skills/paper2astra/references/run.md new file mode 100644 index 00000000..7f3240ef --- /dev/null +++ b/claude/lightcone/skills/paper2astra/references/run.md @@ -0,0 +1,56 @@ +# RUN — execute the recipes + +Materialize every output in `astra.yaml` for the requested universe. RUN is mostly mechanical — `lc run --universe ` does the heavy lifting. The phase exists as a discrete step so failures get diagnosed and re-run before COMPARE. + +The constitution's per-phase mode is **user choice** — defaults to sub-agent. Failures may want diagnosis support; the user chooses based on how much trust they have in IMPLEMENT's first pass. + +## Inputs + +- `astra.yaml` with recipes (from IMPLEMENT) +- `universes/.yaml` — defaults to `baseline` + +## Outputs + +- `results///` for every output declared in `astra.yaml` + +## Task + +Execute all recipes: + +```bash +lc run --universe baseline +``` + +(Use whatever the constitution's `universe` field says; `baseline` is the default.) + +Check status: + +```bash +lc status --universe baseline +``` + +Status states are `ok` (materialized), `pending` (has recipe, not run), `no_recipe` (declared, no recipe — bug). Every output declared in `astra.yaml` must reach `ok`. + +If outputs fail: + +1. **Read the script's error.** `results///.log` (or wherever the runner emits stderr) usually has the message. +2. **Diagnose.** Common failures: missing data dependency (a referenced URL changed; the data archive moved), missing Python package (`requirements.txt` was incomplete), spec / script mismatch (the recipe's `inputs:` does not match what the script reads). +3. **Fix.** Edit the script or `requirements.txt` or the spec, whichever applies. +4. **Re-run.** `lc run --universe baseline` resumes from where things failed; it does not re-execute already-materialized outputs. +5. **Repeat** until all outputs are `ok`. 
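The exit condition for the loop above can be sketched as a small helper. This is a hedged illustration only: the output format of `lc status` is not specified here, so the sketch assumes the statuses have already been collected into a mapping of output ID to state, and the function name is hypothetical.

```python
def outputs_needing_work(statuses: dict) -> dict:
    """Group every output that has not reached `ok` by its current state.

    `statuses` is assumed to map output IDs to one of the states
    `ok`, `pending`, or `no_recipe`. An empty result means RUN is done.
    """
    remaining = {}
    for output_id, state in statuses.items():
        if state != "ok":
            remaining.setdefault(state, []).append(output_id)
    return remaining
```

Grouping by state separates outputs that only need another run (`pending`) from those that point at a spec bug (`no_recipe`).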
+ +## Rules + +- **Always use `lc run`** — do not run scripts directly. The runner manages dependencies, environments, and artifact paths; bypassing it produces inconsistent results. +- **Re-runs are idempotent.** `lc run` skips outputs that are already materialized. To force re-execution, the runner has a flag for that — check `lc run --help`. +- **Failures stay failures until fixed.** Do not "move on" past a failed output by editing it out of `astra.yaml`. Either fix the script or surface the failure as a constitution Open Question and stop. + +## Survey signals (entry into RUN) + +- `astra.yaml` has recipes and validates ⇒ ready to run +- `lc status --universe baseline` returns all `ok` ⇒ RUN done; proceed to COMPARE + +## Notes + +- The runner backend (Docker / local / SLURM) comes from the project's target configuration — `~/.lightcone/config.yaml` and `.lightcone/lightcone.yaml`. RUN does not need to choose; the runner picks based on config. +- For long-running computations, the script's stdout / stderr stream into the result directory's log file. The phase agent should `tail` the log file to monitor progress, not poll `lc status` repeatedly. diff --git a/claude/lightcone/skills/paper2astra/references/specify.md b/claude/lightcone/skills/paper2astra/references/specify.md new file mode 100644 index 00000000..655cd575 --- /dev/null +++ b/claude/lightcone/skills/paper2astra/references/specify.md @@ -0,0 +1,105 @@ +# SPECIFY — author the ASTRA spec + +Read the paper and accumulated notes; produce the structured ASTRA spec, the baseline universe, and the implementation notes. SPECIFY is the **first mandatory user-ratification seam** — material paper-vs-code conflicts surface here and require user input. + +The constitution's per-phase mode is **always interactive** for this phase. The user must be reachable. 
+ +## Inputs + +- `work/notes/methodology.md` — decision map, results inventory, data sources +- `work/notes/code-analysis.md` (if present) — code structure, parameter values +- `work/notes/literature.yaml` (if present) — prior insights with evidence quotes and decision links +- `work/reference/document.md` — paper text (Grep into; do not re-read whole) +- `work/reference/figures/`, `work/reference/tables/` — extracted artifacts +- `work/reference/metadata.json` — figure / table index +- `targets/targets.md` — selected replication targets +- `work/notes/notes.md` — user-supplied context (read by every phase if present) + +## Outputs + +1. **`astra.yaml`** — the full ASTRA specification +2. **`universes/baseline.yaml`** — exactly the paper's choices (where paper and code disagree, see "Material conflicts" below) +3. **`implementation-notes.md`** — concise practical guidance for the IMPLEMENT phase: tricky algorithms, numerical gotchas, data-format quirks, things the spec can't capture. Bullets, not essays. + +## Substrate skills to invoke + +- **`/narrative`** — narrative authoring (any of the five `narrative.{summary,inputs,methods,findings,outputs}` keys, plus decision `rationale:` fields) is owned by the narrative skill. Invoke it when authoring the prose. The narrative skill teaches reserved entity names, the tree-path anchor grammar, the conditional-narrative requirement (which keys are required when), the five-key authoring order, paper-reproduction fidelity discipline, and the new downstream-consumer discipline (lightcone-cli#108). Do not duplicate that content. + +Your responsibility in this phase is the **structure**: build a spec whose entities are narrative-ready (human-readable labels, no ID collisions with reserved names, sub-analysis IDs as noun phrases) so `/narrative` can author cleanly downstream. + +## Decisions + +The notes identify many candidate decisions. 
Include every choice where a different defensible option could plausibly shift a numerical result — algorithmic methods, thresholds, statistical approaches, data selection criteria, calibration choices. + +Read `.claude/guides/decision-guide.md` (in lightcone-cli's plugin bundle) for the full definition of what counts. **Only exclude pure tooling choices** (language, library, file format) and fixed constraints. Use `when`, `incompatible_with`, and `requires` constraints for non-independent decisions. A typical analysis has 8–20 decisions; if you have fewer than 5, revisit `methodology.md` and reconsider what you excluded. + +## Prior insights from literature + +If `work/notes/literature.yaml` exists, incorporate its `prior_insights` into `astra.yaml`. Use the `decision_links` mapping to attach each insight to the relevant decision options, so the multiverse captures evidence-backed alternative choices from the literature. + +## Target coverage + +Targets are coverage obligations, not necessarily outputs. Map each target to the right ASTRA home: + +- **Figures, tables, equations-as-artifacts, generated data products** → `outputs` +- **Paper-level claims and quantitative results** → `findings` with source-anchored evidence +- **Constants and configuration values** → `inputs`, `decisions`, `universes/baseline.yaml` + +Out-of-scope targets stay in `targets/targets.md` with an explicit reason and should not be forced into the spec. Keep the target ledger's "spec home" pointers specific enough that a later reviewer can tell which claim was discharged where. + +--- + +## Material conflicts — the user-ratification seam + +When `methodology.md` or `code-analysis.md` mentions a paper-vs-code disagreement, **classify it before writing**: + +- **Material**: a different choice would plausibly change a numeric result the paper reports. +- **Stylistic / cosmetic / pure-tooling**: not material — record in `implementation-notes.md` and move on. 
+ +For **material** conflicts, the SPECIFY phase pauses and surfaces the conflict to the user via `AskUserQuestion`. Present: + +- The paper's stated method (with quote / section reference) +- The code's actual method (with file / line reference) +- The plausible impact ("changes the BAO peak amplitude by ~5%") +- Three options: paper, code, *something else* (custom, with the user's choice spelled out) + +**Default on user silence is paper.** If the AskUserQuestion times out or the user declines to choose, the universe selects the paper's method. The override (paper-vs-code conflict, what was selected, why) is preserved in `astra.yaml` as: + +- A `decisions:` entry with both options preserved +- The `universes/baseline.yaml` selecting whichever option the user chose +- A finding (or an insight if the conflict matters for replication discipline broadly) that records the conflict with quote / line evidence + +This makes the override surface in any later review of the spec — *"the paper says X, the code does Y, the user chose Z, here's why."* The fidelity-of-prose side of this (voice seams, hedge preservation, evidence-quote verification) is the `/narrative` skill's job. + +--- + +## Sub-analysis structure + +Split into sub-analyses **only if the paper has genuinely independent analysis stages**. Examples: + +- A reconstruction stage that produces a catalog consumed by a clustering stage which produces inputs to a BAO fit — three sub-analyses. +- A monolithic analysis that runs end-to-end with no clean intermediate handoff — one analysis. + +Sub-analysis IDs should be **noun phrases** (not verb phrases): `reconstruction`, `clustering`, `bao_fit`. Avoid reserved names (`inputs`, `outputs`, `decisions`, `findings`, `prior_insights`, `analyses`, `options`, `content`, `narrative`). + +When sub-analyses exist, the root narrative MUST include a top-down end-to-end data-flow paragraph (per the narrative skill's data-flow rules — closes lightcone-cli#108). 
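The reserved-name rule for sub-analysis IDs above lends itself to a quick pre-validation check. The sketch below is illustrative only: `reserved_collisions` is a hypothetical helper, not an `astra` API, and it covers only the reserved-name list as written in this phase (whether an ID is a noun phrase still needs a human or agent judgment).

```python
# Reserved names copied from the sub-analysis rule in SPECIFY.
RESERVED_NAMES = frozenset({
    "inputs", "outputs", "decisions", "findings", "prior_insights",
    "analyses", "options", "content", "narrative",
})


def reserved_collisions(sub_analysis_ids):
    """Return the sorted sub-analysis IDs that collide with reserved names."""
    return sorted(set(sub_analysis_ids) & RESERVED_NAMES)
```

An empty result means the IDs are free of reserved-name collisions; it does not replace `astra validate astra.yaml`.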
+ +## Other rules + +- **Do NOT add executable implementation code or invented run commands.** Do add concise provenance / recipe descriptions where ASTRA fields support them, especially for paper-derived calculations, figure generation, imported constants, and values that IMPLEMENT will need to regenerate. +- **Equation and section numbers must match the rendered paper / PDF**, not a naïve count of TeX blocks or markdown headings. When citing "eq. N" or "§N", find the equation or heading by content in the rendered paper and use the printed number. +- **When adding finding evidence**, verify the quoted text against the paper source by Grep or PDF search. `astra validate --verify-evidence` currently verifies `prior_insights` evidence; artifact-anchored `findings` evidence still needs a manual quote check. +- **Validate** with `astra validate astra.yaml` and fix until it passes. +- **Work primarily from `work/notes/`** — SUMMARIZE has already distilled the paper. Use `work/reference/document.md` only to look up specific details (Grep for terms, or read targeted sections with offset/limit). Do not read the entire markdown at once. + +## Survey signals (entry into SPECIFY) + +- `work/notes/methodology.md` exists; `targets/targets.md` exists ⇒ ready to specify +- `astra.yaml` exists; `astra validate astra.yaml` returns clean ⇒ structural SPECIFY done +- `implementation-notes.md` exists ⇒ practical-guidance side done +- Both ⇒ SPECIFY complete; proceed to REVIEW + +## Notes + +- **Material conflicts that the user explicitly defers** become `Open Questions` in the constitution. The next iteration sees them and either re-surfaces them or notes their continued deferral. +- **The narrative skill is the prose author, not the structure author.** SPECIFY's job is structural correctness; `/narrative` invocation comes after the structural skeleton exists. 
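The default for material conflicts in the ratification seam (no answer from the user selects the paper's method) can be written down as a tiny resolution step. This is a sketch under stated assumptions: `ConflictResolution` and `resolve_conflict` are hypothetical names, and representing "no answer" as `None` or an empty string is an illustrative choice. It does not model explicit deferral, which this phase routes to the constitution's Open Questions instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConflictResolution:
    selected: str     # "paper", "code", or a custom option id chosen by the user
    by_default: bool  # True when the user gave no answer and the paper default applied


def resolve_conflict(user_choice: Optional[str]) -> ConflictResolution:
    """Apply the 'default on user silence is paper' rule to one material conflict."""
    if user_choice is None or not user_choice.strip():
        return ConflictResolution(selected="paper", by_default=True)
    return ConflictResolution(selected=user_choice.strip(), by_default=False)
```

Keeping `by_default` alongside the selection makes it easy to record in the finding whether the paper's method was chosen or merely fell out of a timeout.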
diff --git a/claude/lightcone/skills/paper2astra/references/summarize.md b/claude/lightcone/skills/paper2astra/references/summarize.md new file mode 100644 index 00000000..6df42b52 --- /dev/null +++ b/claude/lightcone/skills/paper2astra/references/summarize.md @@ -0,0 +1,120 @@ +# SUMMARIZE — extract methodology, decisions, and results inventory + +Read the parsed paper and (in parallel, when present) the reference code, and extract everything the SPECIFY phase will need to author `astra.yaml`. The substance lives in `work/notes/methodology.md`, `work/notes/cited_papers.yaml`, and (when code exists) `work/notes/code-analysis.md`. + +The constitution's per-phase mode is **always sub-agent** for this phase. Spawn one Task-tool sub-agent for the paper analysis and (in parallel) a separate sub-agent for the code analysis if `work/reference/code/` exists. Each sub-agent gets fresh context and writes one file. + +## Inputs + +- `work/reference/document.md` — paper as markdown (from PARSE) +- `work/reference/figures/`, `work/reference/tables/`, `work/reference/metadata.json` +- `work/reference/code/` — code repo, if cloned + +## Outputs + +- `work/notes/methodology.md` — decision map + results inventory + data sources +- `work/notes/cited_papers.yaml` — papers worth following up on for prior insights +- `work/notes/code-analysis.md` — code structure (only when `work/reference/code/` exists) + +--- + +## Paper sub-agent — system prompt + +> You are a research paper analysis agent. Your job is to read a parsed paper and extract everything needed to reproduce the analysis. +> +> ### Approach +> +> Read `work/reference/document.md` **section by section** — do not try to read the entire file at once. Start by scanning the headers to understand the structure, then work through each section in order. +> +> **Write as you go.** After reading each section, immediately update `work/notes/methodology.md` and `work/notes/cited_papers.yaml` with what you learned. 
Do not wait until the end — build the outputs incrementally. This ensures partial progress is saved and forces you to consolidate your understanding at each step. +> +> Skip acknowledgments and author affiliations. Do read the references section — you will need it to resolve citations to DOIs. +> +> ### What to extract +> +> As you read each section, look for: +> +> - **Data sources** — every external dataset, catalog, survey, or archive the paper uses as input. For each one, record the exact name/version, where to obtain it (URL, database query, package name), and any selection criteria or quality cuts applied. This is critical — the implement phase must download real data, not generate synthetic substitutes. +> - **Decisions** — every choice that shaped the analysis (methods, parameters, data cuts, calibrations, etc.) and *what informed each one* (a cited paper, a physical argument, an empirical finding, internal results from the paper). +> - **Results** — numeric values, figures, tables; which are the paper's core claims vs. supporting/diagnostic outputs. +> - **Key references** — cited papers that actually influenced methodology (not general background). +> +> ### Output format — `work/notes/methodology.md` +> +> #### Decision map (most important) +> +> A complete list of every decision that shaped the analysis, grouped by pipeline stage. For each decision: +> +> - **What** was chosen (the specific value, method, or approach) +> - **Why** — what informed the choice: cite the specific paper, physical argument, or empirical finding. Use the citation as it appears in the text (e.g., "Freedman et al. 2020"). This is critical — decisions without traced justifications are much harder to reproduce. +> - **Alternatives** — what else could have been chosen, if mentioned +> +> #### Results inventory +> +> List the paper's outputs, separated into: +> +> - **Primary results** — the core claims; what you'd check to evaluate whether the work was reproduced. 
Flag which are most important. +> - **Secondary results** — supporting/diagnostic outputs. +> +> For each result, note which decisions feed into it and the expected values. +> +> #### Data sources (critical) +> +> For **every** external dataset the paper uses, document: +> +> - **Name and version** (e.g., "OGLE-III SMC LPV catalog, Soszynski+2011") +> - **How to obtain it** — exact URL, database query (with SQL if applicable), API endpoint, or package name. Be as specific as possible. +> - **Selection criteria** — any spatial, magnitude, quality, or flag cuts applied to the raw data. +> - **Format** — what columns/fields are used downstream. +> +> This section is essential. The implement phase will use it to write data download scripts. If acquisition details are vague in the paper, flag this explicitly so the review phase can investigate further. +> +> #### Additional context (brief) +> +> - Software and dependencies — languages, libraries, versions mentioned. +> +> ### Output format — `work/notes/cited_papers.yaml` +> +> ```yaml +> papers: +> - doi: "10.xxxx/yyyy" +> citation: "Smith et al. (2020)" +> relevance: "One-line description of why this paper matters for replication" +> ``` +> +> **Include** papers that: informed a methodological decision, provided a method or algorithm the paper builds on, contain calibration data or corrections the paper applies. +> +> **Exclude** papers cited only for general background or final-result comparisons. +> +> Only include papers whose DOI you can find in the references. Aim for 5–15 papers; quality over quantity. +> +> ### Style +> +> Be concise but precise. Use bullet points. Include exact numeric values and parameter choices. Do not pad with background or motivation — only include what is needed to reproduce the analysis. + +## Code sub-agent — system prompt (only when `work/reference/code/` exists) + +> You are a code exploration agent. 
Explore the repository at `work/reference/code/` and write up a detailed understanding of the codebase to `work/notes/code-analysis.md`. +> +> ### What to produce +> +> 1. **Architecture** — how the codebase is structured, what the main modules / scripts are, and how they relate to each other. +> 2. **Execution flow** — where things are run from, in what order, and where to look for different stages of the analysis. +> 3. **Key variables and parameters** — the main variables defined in the code, configuration values, and any decisions baked into the implementation. +> 4. **Outputs** — what the code produces, where results are written, what format they take. +> +> Be thorough — explore the file tree, read the main scripts, and trace the execution path. Focus on implementation decisions and parameter values that the paper might not mention. +> +> Do NOT modify any code in the repository. + +## Survey signals (entry into SUMMARIZE) + +- `work/reference/document.md` exists ⇒ ready to summarize the paper +- `work/notes/methodology.md` exists ⇒ paper sub-agent already ran +- `work/reference/code/` exists ∧ `work/notes/code-analysis.md` does not ⇒ code sub-agent should run +- Both `methodology.md` and (if code exists) `code-analysis.md` exist ⇒ SUMMARIZE done, proceed to EXTRACT_TARGETS + +## Notes + +- **Run the two sub-agents in parallel** when both apply. The paper agent and the code agent are fully independent; each writes one file. +- The methodology notes are the substrate everything downstream consumes. SPECIFY reads them, REVIEW cross-checks them, IMPLEMENT writes scripts based on them. Their quality determines the rest. 
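The SUMMARIZE survey signals above are plain file-existence checks, so they can be sketched directly. This is a minimal illustration, not the skill's implementation: `summarize_state` and its dictionary keys are hypothetical, and the `run_paper_agent` entry is an inference from the first two signals rather than a rule stated in the text.

```python
from pathlib import Path


def summarize_state(root):
    """Evaluate the SUMMARIZE survey signals for a workdir rooted at `root`."""
    root = Path(root)
    has_paper = (root / "work/reference/document.md").is_file()
    has_code = (root / "work/reference/code").is_dir()
    has_methodology = (root / "work/notes/methodology.md").is_file()
    has_code_notes = (root / "work/notes/code-analysis.md").is_file()
    return {
        # document.md exists => ready to summarize the paper
        "ready": has_paper,
        # inferred: the paper sub-agent has not produced methodology.md yet
        "run_paper_agent": has_paper and not has_methodology,
        # code/ exists and code-analysis.md does not => code sub-agent should run
        "run_code_agent": has_code and not has_code_notes,
        # methodology.md and (if code exists) code-analysis.md => SUMMARIZE done
        "done": has_methodology and (has_code_notes or not has_code),
    }
```

When both `run_paper_agent` and `run_code_agent` are true, the two sub-agents can be spawned in parallel, as the notes recommend.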
diff --git a/claude/lightcone/skills/paper2astra/references/summarize_run.md b/claude/lightcone/skills/paper2astra/references/summarize_run.md new file mode 100644 index 00000000..f72ba4ee --- /dev/null +++ b/claude/lightcone/skills/paper2astra/references/summarize_run.md @@ -0,0 +1,58 @@ +# SUMMARIZE_RUN — final report and constitution outcome + +The reproduction has converged (verdict `pass` or user-accepted `partial`). Write the final summary, update the constitution's outcome, and prepare the workdir for handoff. + +The constitution's per-phase mode is **always sub-agent**. There are no decisions left; this is reportage. + +## Inputs + +- `astra.yaml` — final spec +- `comparison-report.yaml`, `comparison-report.md` — final verdict +- `targets/targets.md` — what was being matched against +- `work/notes/methodology.md` — for context +- The constitution at the project root — its `outcome:` field needs rewriting + +## Outputs + +- `REPRODUCTION-SUMMARY.md` (or whatever name fits the project) — final report; concise. +- Updated `outcome:` on the constitution. +- A final commit on the reproduction branch with a clear message. + +## What the final report covers + +A single markdown file at the project root, ~1–2 pages. Sections: + +1. **What was reproduced** — the paper, the scope, the targets. +2. **Verdict** — pass / partial. If partial, what failed and why we accepted it. +3. **Material decisions** — the paper-vs-code conflicts the SPECIFY phase surfaced, what the user chose, and why. +4. **Outputs** — pointers to the figures / tables / metrics produced. One bullet per primary target, with the path to the reproduced result. +5. **What was learned** — anything the reproduction surfaced that wasn't visible from the paper alone (a parameter the code uses but the paper doesn't mention, a data cut that's stricter than stated, etc.). This is where the reproduction's value to the broader literature gets recorded. +6. 
**Re-running** — one paragraph: how to re-run from this workdir (`lc run --universe baseline`, the constitution path, the relevant `astra.yaml`). + +Brief, not exhaustive. The depth lives in `astra.yaml` and the workdir's notes; the summary is the door into them. + +## Constitution outcome + +Rewrite the constitution's `outcome:` field to reflect the realized state. A good outcome teaches: + +> Reproduced against the targets in `targets/targets.md` with verdict `pass` (attempt 4). All 7 primary targets match within stated tolerance; 2 of 5 secondary targets show <5% offset attributable to . Material conflicts surfaced and resolved: . Spec at `astra.yaml` (validates with `--verify-evidence`); reproduction summary at `REPRODUCTION-SUMMARY.md`. + +The constitution's `status:` flips to `closed` only when the user accepts. This sub-agent does not flip status — it prepares the outcome and surfaces to the user (via the iteration's exit message) that the constitution is ready for closure. + +## Final commit + +Stage the report, the updated constitution, the final `astra.yaml`, the comparison report, and any housekeeping changes. Commit with a message that names the verdict: + +``` +reproduction: verdict , summary at REPRODUCTION-SUMMARY.md +``` + +## Survey signals (entry into SUMMARIZE_RUN) + +- `comparison-report.yaml` verdict is `pass` (or user has accepted `partial`) ⇒ ready +- `REPRODUCTION-SUMMARY.md` exists; constitution outcome is rewritten ⇒ done + +## Notes + +- **This phase does not flip the constitution's status to closed.** The user does that, after reviewing the summary. The phase's job is to produce the summary cleanly; the human keeps the close authority. +- **Keep the report short.** Long reports get skimmed; short reports get read. Two pages is generous. 
From 8bd67ffca521d9567a2cb030a1ee5d653918df87 Mon Sep 17 00:00:00 2001 From: Cail Daley Date: Mon, 4 May 2026 03:20:00 +0200 Subject: [PATCH 004/124] CLAUDE.md: surface paper-reproduction skill bundle in repo overview Repository-structure section's skills/ entry only listed lc-* skills; paper-reproduction bundle additions (paper2astra, narrative, constitution, ralph-loops, managing-bibliography, plus the pending check-sentence-by-sentence and figure-comparison) need to be discoverable from a CLAUDE.md walk-up. Points to claude/lightcone/skills/README.md for the full bundle map. Co-Authored-By: Claude Opus 4.7 --- CLAUDE.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/CLAUDE.md b/CLAUDE.md index b0631b4a..63f91d0e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -43,7 +43,10 @@ src/lightcone/ # namespace — NO __init__.py ├── harness.py, sandbox.py, graders.py, build.py, report.py, models.py claude/lightcone/ # Claude plugin source — force-included into the wheel -├── skills/ # lc-new, lc-build, lc-verify, lc-migrate, lc-feedback +├── skills/ # lc-new, lc-build, lc-verify, lc-migrate, lc-feedback; +│ # paper-reproduction bundle: paper2astra, narrative, +│ # constitution, ralph-loops, managing-bibliography +│ # (see skills/README.md for the full bundle map) ├── agents/ # lc-extractor ├── guides/ # astra-reference, lightcone-cli-reference, ui-brand ├── templates/ # Project CLAUDE.md template From 272599be78e344ace435f8549bc773cd0e1f5250 Mon Sep 17 00:00:00 2001 From: Cail Daley Date: Mon, 4 May 2026 03:53:55 +0200 Subject: [PATCH 005/124] skills/ralph-loops: tighten description triggers skill-creator audit (per the paper-reproduction bundle constitution) flagged that the bundle copy of ralph-loops over-fired on bare "ralph" and "iterate" outside of an active loop, and overlapped with /constitution on "spec" / "set up a ralph" triggers.
Narrow the trigger list to in-loop auto-activation + explicit launch-time triggers ("launch ralph", "run ralph", "ralph loop on "); route spec-drafting intents to /constitution. The bundle copy now diverges from cailmdaley/skills upstream by one description block. Re-sync as needed. Co-Authored-By: Claude Opus 4.7 --- claude/lightcone/skills/ralph-loops/SKILL.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/claude/lightcone/skills/ralph-loops/SKILL.md b/claude/lightcone/skills/ralph-loops/SKILL.md index 649a813c..7910e49b 100644 --- a/claude/lightcone/skills/ralph-loops/SKILL.md +++ b/claude/lightcone/skills/ralph-loops/SKILL.md @@ -3,8 +3,10 @@ name: ralph-loops description: > Autonomous loop iteration toward a desired state. You are inside a ralph loop — your spec is in the system prompt. Survey, contribute, update state - discoverably, exit. Activated automatically inside ralph loops. - Triggers: "ralph-loops", "ralph", "ralph loop", "iterate", "autonomous loop". + discoverably, exit. Activated automatically inside ralph loops, or when + launching one against an existing spec via scripts/ralph; for drafting + the spec itself, use /constitution. + Triggers: "ralph-loops", "launch ralph", "run ralph", "ralph loop on ". 
--- # Ralph Loops From 9f19380a71fd0b62086d8e03731a673a863fd708 Mon Sep 17 00:00:00 2001 From: Nolan Koblischke Date: Mon, 4 May 2026 13:57:13 -0400 Subject: [PATCH 006/124] Add paper2astra follow-up skills: check-sentence-by-sentence and figure-comparison --- claude/lightcone/skills/README.md | 10 +- .../check-sentence-by-sentence/SKILL.md | 372 +++++++++++ .../skills/figure-comparison/SKILL.md | 579 ++++++++++++++++++ claude/lightcone/skills/paper2astra/SKILL.md | 24 +- .../skills/paper2astra/references/compare.md | 10 +- .../paper2astra/references/interview.md | 1 - .../paper2astra/references/summarize_run.md | 4 + 7 files changed, 973 insertions(+), 27 deletions(-) create mode 100644 claude/lightcone/skills/check-sentence-by-sentence/SKILL.md create mode 100644 claude/lightcone/skills/figure-comparison/SKILL.md diff --git a/claude/lightcone/skills/README.md b/claude/lightcone/skills/README.md index 439337cd..b0ed3d68 100644 --- a/claude/lightcone/skills/README.md +++ b/claude/lightcone/skills/README.md @@ -23,8 +23,8 @@ A self-contained toolkit for reproducing published papers in ASTRA. The bundle i | [`constitution`](constitution/SKILL.md) | Draft a constitution — a markdown spec for an iteration runner. Invoked by paper2astra during the interview. | Merged from [`cailmdaley/skills/skills/constitution`](https://github.com/cailmdaley/skills/tree/main/skills/constitution) (procedural backbone) + Cail's personal felt references (taste — two diamonds, six stances, funnel ledger, qualitative self-check), with felt-optional framing. | | [`ralph-loops`](ralph-loops/SKILL.md) | Drive an autonomous iteration loop. Includes `scripts/ralph` runner. Launched by paper2astra after the interview. | Direct copy from [`cailmdaley/skills/skills/ralph-loops`](https://github.com/cailmdaley/skills/tree/main/skills/ralph-loops). | | [`managing-bibliography`](managing-bibliography/SKILL.md) | Read arXiv LaTeX source; manage BibTeX via ADS API. 
Primary acquisition path for paper2astra's ACQUIRE phase. | Direct copy of Cail's personal `~/.claude/skills/managing-bibliography` (newer than the public version). | -| `check-sentence-by-sentence` | Paper-vs-code TeX audit via sub-agents; locates `file:line` or `NOT FOUND`. Invoked by paper2astra during COMPARE. | Nolan Koblischke's, on his Reproductions-branch. **Not yet pushed publicly** — see "Pending bundle additions" below. | -| `figure-comparison` | HTML side-by-side: original figures/tables/numerics vs replicated. Invoked by paper2astra during COMPARE. | Same — Nolan's, pending. | +| [`check-sentence-by-sentence`](check-sentence-by-sentence/SKILL.md) | Complementary paper-vs-code source audit via sub-agents; locates `file:line` or `NOT FOUND`. | Copy of Nolan's. | +| [`figure-comparison`](figure-comparison/SKILL.md) | Generates an HTML side-by-side report: original figures/tables/numerics vs replicated. Useful for manual review. | Copy of Nolan's. | The full reproduction story spans these seven skills. paper2astra's `SKILL.md` names each by role and tells the agent when to invoke them; the siblings stand alone and don't know about paper2astra. @@ -34,9 +34,3 @@ The full reproduction story spans these seven skills. paper2astra's `SKILL.md` n - **Single install path.** `lc init` is the install path for lightcone-cli skills. Adding a separate "also install Cail's public skills via plugin marketplace" step is friction we don't need. - **Copy-with-credit costs nothing.** The copied skills retain attribution to their original authors in the SKILL body; if those skills update upstream, we re-sync. - **Future consolidation is open.** Per Francois's "next week we improve" framing, the long-run shape might be `astra` ships skills in `astra`, `lc` ships skills in `lightcone-cli`, plus a centralized external-skills list. Today: bundle it all.
- -### Pending bundle additions - -- **`check-sentence-by-sentence`** and **`figure-comparison`** — Nolan Koblischke's two skills. Per the bundle constitution ([`lightcone/.felt/lightcone/paper2astra-as-skill/skill-bundle`](https://github.com/LightconeResearch/lightcone/blob/main/.felt/lightcone/paper2astra-as-skill/skill-bundle.md)), these are part of the bundle, but at first cut they were not yet pushed to any public branch (only living on Nolan's local working tree on his Reproductions checkout). When Nolan pushes them, copy with attribution into this directory; paper2astra's SKILL.md and COMPARE reference already name them as expected siblings, so the integration is wire-compatible the moment they land. - - Until then, COMPARE falls back to direct image-diff judgment without `/figure-comparison`'s structured per-panel rendering, and SPECIFY's evidence-quote re-verification (when COMPARE flags `partial`) falls back to manual Grep against `work/reference/document.md` without `/check-sentence-by-sentence`'s sub-agent audit. Both fallbacks are workable but lossier than the intended path. diff --git a/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md b/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md new file mode 100644 index 00000000..e7a2ca2b --- /dev/null +++ b/claude/lightcone/skills/check-sentence-by-sentence/SKILL.md @@ -0,0 +1,372 @@ +--- +name: check-sentence-by-sentence +description: > + Sentence-by-sentence audit of a paper against an ASTRA project's code. For + every claim about implementation or results in the methodology, results, + discussion, and appendices, locate the corresponding code (file:line) or + mark NOT FOUND. Use when the user says "check reproduction", "verify the + paper line by line", or "sentence-by-sentence audit". Run from the project + folder containing astra.yaml. 
In paper2astra projects, read paper sources + from work/reference/: prefer arXiv TeX under work/reference/source/, fall + back to Docling/Pandoc markdown at work/reference/document.md. +allowed-tools: Read, Glob, Grep, Bash(ls:*), Bash(wc:*), Bash(grep:*), Bash(find:*), AskUserQuestion, Agent +argument-hint: "[path to paper source, e.g. work/reference/source/main.tex or work/reference/document.md]" +--- + +# /check-sentence-by-sentence + +Audit a paper against the code in this ASTRA project, sentence by sentence. +Every sentence that asserts an implementation detail or a numerical/empirical +result is located in the code (`file:line`) or marked NOT FOUND. The agent +does NOT run any code -- this is a static reading audit. + +In paper2astra projects, the paper substrate comes from `work/reference/`. +Path A is arXiv source at `work/reference/source/`; Path B is the parsed +markdown fallback at `work/reference/document.md`, produced by Docling or +Pandoc. + +## Setup + +1. **Confirm project root.** Read `astra.yaml` in the current working + directory. If it is missing, ask the user: + + > "I do not see an `astra.yaml` in the current directory. Please point me + > to the ASTRA project folder, or `cd` there and re-invoke." + + Stop until resolved. + +2. **Confirm paper source.** The user may have passed a path as an + argument. Resolve it in this order: + + 1. If the argument is a `.tex` file, use it in `tex` mode. + 2. If the argument is `work/reference/` or another directory, first look + for TeX source under `/source/`, then for `/document.md`. + 3. If no argument was supplied, prefer the paper2astra layout: + - `work/reference/source/
.tex` if TeX source exists. Identify the + main file with `grep -l '\\documentclass' work/reference/source/*.tex`; + if exactly one file matches, use it. If multiple files match, ask the + user which one is the main paper file. After identifying the main + file, expand its local `\input{...}` and `\include{...}` files before + section enumeration; many arXiv papers keep most prose outside the + main TeX wrapper. + - `work/reference/document.md` if there is no TeX source. This is the + Docling/Pandoc fallback and should be audited in `markdown` mode. + 4. Only after those paper2astra paths fail, look for an obvious legacy + `.tex` source in cwd: a top-level `*.tex`, or one inside `paper/`, + `tex/`, or a similarly named subdirectory. If exactly one obvious + candidate is found, use it in `tex` mode. + + If no usable source is found, ask: + + > "Which paper source should I audit? Please give me a `.tex` path or + > `work/reference/document.md`." + + If only `work/reference/paper.pdf` exists, ask the user to run the PARSE + phase first so `work/reference/document.md` exists. Do not audit PDFs + directly. + +## Section enumeration + +This is **your job in the main agent** -- do it carefully so each subagent +gets a precise line range. Do NOT read full section content; only enough to +identify boundaries. + +1. Enumerate sections according to source mode: + - In `tex` mode, first build the ordered audit source list. Start with the + main TeX file, scan it for local `\input{...}` and `\include{...}` paths, + normalize missing `.tex` suffixes, and include those files when they + exist under the same source tree. Recurse one level deeper when an + included file itself includes local TeX files. Ignore package/style + imports (`\usepackage`, `.sty`, `.cls`) and remote/generated files. If + the main file is mostly a wrapper, the leaf included files will carry + most audit units. 
+ - For every file in the TeX audit source list, use `grep -n` for + `^\\section`, `^\\subsection`, and `^\\appendix`. Record each match's + file path, line number, and label. + - In `markdown` mode, use `grep -n` for markdown headings + (`^#`, `^##`, `^###`, etc.) in `work/reference/document.md`. Treat + heading depth the way TeX treats section/subsection. If Docling emitted + unnumbered headings, use their text labels. +2. Get the file's total line count with `wc -l`. +3. Compute each section's line range: **start = the section's own line + number; end = (next section/subsection or same/lower heading-depth start + minus 1 in the same source file), or that source file's last line for the + final section in that file.** For a section that contains subsections, + each subsection's range runs from its own line to (next subsection + start − 1), and the section's pre-subsection prose (if any) becomes its + own audit unit covering (section line + 1) to (first subsection − 1) if + that span is non-trivial. +4. Mark sections appearing after `\appendix` (TeX) or after an `Appendix` / + `Appendices` heading (markdown) as appendices regardless of label. + +Identify the audit-relevant sections: + +- Methodology (often `Methods`, `Analysis`, `Data`, `Sample selection`) +- Results +- Discussion (often `Discussion and Conclusions`) +- Appendices (every section after `\appendix`) + +Skip Abstract, Introduction, Acknowledgements, References, author lists. + +For each retained section, check whether it has subsections. **Spin up one +subagent per leaf (sub)section** -- a section with subsections becomes one +subagent per subsection (plus optionally one for any pre-subsection prose +span); a section without subsections becomes one subagent for the whole +section. Spawn them all in a single message so they run in parallel. + +## Subagent prompt + +Use `Agent(subagent_type="general-purpose", ...)`. 
Pass each subagent: + +- The absolute path to the paper source file for this section +- The paper source mode: `tex` or `markdown` +- The exact section/subsection label and the line range in the source file + it covers (so it knows where to read) +- The absolute path to the project root (which contains `astra.yaml`) +- The instructions below, verbatim + +``` +You are auditing one (sub)section of a paper against an ASTRA project's +code. Your job is mechanical and exhaustive. + +INPUTS +- Paper source file: +- Source mode: +- Section: , lines - +- Project root: + +PROCEDURE +1. Read the assigned section of the paper. Split it into sentences using + common sense, not naive period-splitting. In `tex` mode, use TeX-aware + splitting; in `markdown` mode, preserve Docling/Pandoc math blocks, + captions, and headings as source text. Treat `e.g.`, `i.e.`, `et al.`, + `Fig.`, `Eq.`, `Sec.`, `Dr.`, decimals (`0.5`), inline math `$...$`, + and citation commands (`\citep{...}`, `\citet{...}`) as part of the + surrounding sentence, not boundaries. Display equations belong to + whichever sentence introduces them. +2. For each sentence, decide using common sense: does it make a concrete + claim about an IMPLEMENTATION DETAIL (a method, parameter, threshold, + formula, data cut, model choice, sample definition, algorithmic step) + or a RESULTS DETAIL (a numerical value, plot, fitted parameter, + statistical outcome)? If neither -- pure motivation, citation prose, + or generic framing -- skip it. +3. Before searching, **read `astra.yaml` once** -- it is a pre-built + paper↔code map maintained by the project. Harvest specifically: + - `narrative.methods` — links paper methodology concepts to decision + IDs (e.g. 
paper prose "the chosen " → `#decisions.`) + - `narrative.findings` — links paper claims/values to result anchors + - `prior_insights` (if present) — extracted paper quotes already tied + to decisions + - per-decision `evidence` quotes and `description` fields + Treat these as your translation table: paper prose → decision/output + IDs → script files. Do not re-derive what the spec already encodes. + + For everything not covered by the spec, use common sense to translate + concepts. In general: + - A quality cut stated as a ratio or threshold may appear in code + under an inverted form or a different variable name -- map by + meaning, not by symbol. + - A named model or distribution will usually appear as a function + whose name describes its shape or role, not as the paper's prose + phrasing. + - A cited constant from a referenced paper will usually appear as a + module-level constant or as an option value in a decision. + Grep for the underlying concept, not just the paper's wording. +4. For every claim-bearing sentence, search the project code (`scripts/`, + source files, `universes/`, `astra.yaml`, `results/`) for where the + claim is implemented or computed. Use Grep, Glob, and Read. +5. Record one of: + - (quote, path/file.py:LINE, optional <10-word note) + when the sentence's claim is implemented or computed at that location + - (quote, NOT FOUND, optional <10-word note) + when no implementation or matching computation is present + +CONSTRAINTS +- Do NOT run any code. No Bash beyond ls/grep/find/wc for searching. +- Do NOT read the paper outside the assigned line range. +- Quote the sentence verbatim, trimmed to a single sentence. If the + sentence is long, you may include just the claim-bearing clause but + preserve enough text to identify it. +- file:line should point to the most specific line that implements or + states the claim (the function call, parameter assignment, or computed + value -- not just the file). +- Notes must be under 10 words. 
Use them for nuance like "approximate + match", "different constant", "implemented but commented out", + "value computed at runtime, not statically comparable", "produced as + figure but printed value not stored". +- For numerical results that the paper states as a final number, point + at the line that computes the value and use a note like "value + computed at runtime" -- you cannot verify numerical agreement without + executing code, and that is fine. + +OUTPUT +Return a JSON-ish list, one entry per sentence, in paper order: + +[ + {"quote": "...", "location": "scripts/foo.py:142", "note": "..."}, + {"quote": "...", "location": "NOT FOUND", "note": "..."}, + ... +] + +Return nothing else. +``` + +## Aggregation + +When all subagents return, you receive raw entries from every claim-bearing +sentence each subagent kept. **Do not just concatenate and print them.** +Two filtering passes happen here, in this order: + +### Pass 1 — drop non-computational sentences + +Subagents are deliberately generous about what they keep, so the raw list +contains a long tail of sentences that quote the paper but do not actually +correspond to anything you would expect to find in code. **Drop any entry +whose sentence is:** + +- **Framing / motivation** — sentences whose job is to set up the next + step, e.g. "the first step is...", "to investigate this...", "we want + to look at...", "for this reason..." +- **Citation prose / literature comparison** — sentences that compare to + or quote prior literature, e.g. "agrees with values typical of previous + measurements...", "much like Author+YYYY they show...", "in particular, + Author found ..." +- **Theoretical framing or derivations** — sentences asserting a property + expected from theory rather than implemented in code, and restatements + of textbook identities used only to introduce the next equation +- **Rhetorical / interpretive claims** — qualitative readings of a + figure or trend, e.g. 
"the trend clearly has an oscillatory + behaviour", "the trend seems to be independent of ", "this + supports that..." +- **Conclusions / justifications / qualitative observations** — + "thus we conclude that...", "we choose not to include this + because...", "by and large the trends are similar" +- **Future work / speculation** — "this could be improved by...", "the + discrepancy could be explained by..." +- **Forward/backward references with no claim** — "we discuss this in + Sec X below", "as described in Sec Y above" +- **NOT FOUND entries that fall in any of the above categories** — most + framing/motivation sentences will land as NOT FOUND because there is + nothing to find. Drop them silently; they are noise, not gaps. + +Keep an entry only if it asserts something a reader would expect to be +implemented or computed: a parameter value, a cut, a formula, an +algorithmic step, a fitted/measured value, a figure that the project +should produce, a sample size after a specific cut. + +When in doubt about a NOT FOUND, ask: "if this sentence is not in the +code, is that a real gap?" If no, drop it. + +### Pass 2 — deduplicate / merge near-duplicates + +Subagents do not see each other, and the same claim is often restated +across sentences within a (sub)section -- e.g. a prose statement of a +cut followed by a sentence asserting "this is the only cut we make", or +two sub-equations of one larger formula that map to the same line. +Collapse these: + +- If two adjacent sentences make the same claim and resolve to the same + `file:line`, keep one entry whose quote is the more specific or + formula-bearing of the two, and append the other in a short + parenthetical only if it adds information. +- If a paper-text claim and an explicit equation/quoted code map to the + same line, prefer the equation/quoted-code form. +- Do not merge across (sub)sections. 
+- Do not merge if the two sentences resolve to different `file:line` + locations -- they may look similar but are doing different things. + +### Pass 3 — render + +After filtering and deduplication, present the result to the user as +markdown, organized by section -> subsection -> sentence, in paper order: + +``` +# Sentence-by-sentence reproduction audit + +Paper: +Project: + +##
+ +### (omit if no subsections) + +- "" + → ✅ `scripts/foo.py:142` -- + +- "" + → ❌ NOT FOUND -- +- ... +``` + +Use `→ ✅ \`file:line\`` for found entries and `→ ❌ NOT FOUND` for +missing ones. Notes are optional; only include the trailing `-- ` +when the subagent supplied one. + +End with a one-line summary: + +> N sentences audited across M sections. K implemented, J not found. + +### Follow-up suggestion (conditional) + +After the summary, scan the NOT FOUND entries and **cluster them**. A +cluster is a group of NOT FOUND sentences that all relate to the same +missing piece of work (a missing analysis, a missing diagnostic, an +unimplemented model variant) -- usually a few consecutive sentences in +one (sub)section, or sentences that all reference the same concept across +sections. + +**Only emit the follow-up block if there is at least one major +unimplemented cluster** -- a cluster of genuine missing computation +substantial enough to be worth offering to add (rule of thumb: ≥3 +sentences of related missing-computation claims, or a single +heavyweight missing artifact like an entire missing analysis or +figure). If every NOT FOUND is isolated framing, motivation, or +qualitative interpretation -- or if the only clusters are tiny -- stop +after the one-line summary. Do not pad with a follow-up just to have +one. + +When the threshold is met, write a short follow-up block in this shape: + +> Major unimplemented clusters: (1) `` +> (`<§section>`, ~`` sentences), and (2) ` cluster 2>` (`<§section>`, ~`` sentences). The rest of the NOT +> FOUND entries are pure framing/motivation/qualitative interpretation, +> not computational claims. Worth considering as a follow-up if you +> want full coverage — want me to add `` and +> ``? + +Rules for this block: +- Only call out clusters that look like genuine missing computation, not + rhetoric. +- Keep it to 1–3 clusters. Do not enumerate every NOT FOUND entry. 
+- The closing offer must name **concrete artifacts** the user could add + (a new output ID, a new script filename, a new decision option, a new + figure) -- not vague promises like "fill in the gaps". +- Cite the section reference in the project's own notation (`§2.1`, + `Appendix B`, etc.) and an approximate sentence count. +- One short paragraph; do not pad. + +## Restrictions + +- You MUST NOT run project code, recipes, or `lc run`. This is static. +- You MUST NOT read the paper source wholesale into the main context; + delegate to subagents. +- You MUST NOT modify any project file. Read-only. +- You MUST NOT fabricate `file:line` locations -- if a subagent's location + looks suspicious, ask it to re-verify rather than guessing. +- You MUST spawn one subagent per leaf (sub)section, in parallel. + +## Anti-patterns + +- **Auditing intro/abstract** -- skip narrative-only sections; only + methodology, results, discussion, and appendices. +- **Bundling sentences** -- one entry per sentence. Do not collapse + multiple claims into one row even if they share a citation or location. +- **Vague locations** -- a bare filename (`scripts/foo.py`) is not + enough; a line number is required for found entries. +- **Long notes** -- the 10-word cap is a hard limit; reserve notes for + signal, not commentary. +- **Running code to verify** -- this skill is a reading audit. If a claim + cannot be verified by reading code alone, mark it found at the + computing line and note "value computed at runtime" rather than + executing anything. 
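The line-range rule in the Section enumeration step above can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the names `Heading`, `audit_units`, and the `min_prose_lines` threshold for a "non-trivial" pre-subsection span are all hypothetical, and the skill itself derives the same ranges by hand from `grep -n` and `wc -l` output for one source file at a time.

```python
from typing import List, NamedTuple, Tuple


class Heading(NamedTuple):
    line: int   # 1-based line number as reported by `grep -n`
    depth: int  # 1 = section, 2 = subsection (hypothetical encoding)
    label: str


def audit_units(
    headings: List[Heading],
    total_lines: int,
    min_prose_lines: int = 3,  # assumed cutoff for a "non-trivial" span
) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) audit units for one source file.

    `headings` must be sorted by line number. A leaf heading runs from its
    own line to the line before the next heading, or to the end of the file.
    A heading with children contributes only its pre-subsection prose, and
    only when that span is long enough to be worth a subagent.
    """
    units = []
    for i, h in enumerate(headings):
        nxt = headings[i + 1] if i + 1 < len(headings) else None
        end = nxt.line - 1 if nxt else total_lines
        has_children = nxt is not None and nxt.depth > h.depth
        if not has_children:
            units.append((h.label, h.line, end))
        elif end - h.line >= min_prose_lines:
            # Prose between the section line and its first subsection.
            units.append((h.label + " (pre-subsection prose)", h.line + 1, end))
    return units


if __name__ == "__main__":
    demo = [
        Heading(10, 1, "Methods"),
        Heading(20, 2, "Data"),
        Heading(40, 2, "Cuts"),
        Heading(60, 1, "Results"),
    ]
    for unit in audit_units(demo, total_lines=100):
        print(unit)
```

Each returned tuple corresponds to one subagent, so the length of the list is the number of subagents to spawn for that file.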
diff --git a/claude/lightcone/skills/figure-comparison/SKILL.md b/claude/lightcone/skills/figure-comparison/SKILL.md new file mode 100644 index 00000000..dfed8dd4 --- /dev/null +++ b/claude/lightcone/skills/figure-comparison/SKILL.md @@ -0,0 +1,579 @@ +--- +name: figure-comparison +description: > + Build a self-contained HTML report comparing the figures, tables, and + numerical results in paper2astra's `work/reference/` paper substrate + against artifacts produced under `results//`. When + `comparison-report.yaml` or `targets/targets.md` exists, use that scoped + target set first; otherwise fall back to paper-driven inventory from arXiv + TeX or Docling/Pandoc artifacts under `work/reference/`. Images are + base64-embedded; missing matches are flagged. Use when the user says + "compare results", "side-by-side comparison", "build comparison HTML", or + "did we reproduce the paper". Run from the project folder containing + astra.yaml. +allowed-tools: Read, Write, Glob, Grep, Bash(ls:*), Bash(wc:*), Bash(grep:*), Bash(find:*), Bash(file:*), Bash(python3:*), Bash(python:*), Bash(base64:*), AskUserQuestion, Agent +argument-hint: "[path to paper reference dir, e.g. work/reference/]" +--- + +# /figure-comparison + +Generate a single self-contained HTML report (`.lightcone/comparison.html`) +that places paper reference artifacts from `work/reference/` on the left +and the project's reproduced artifacts from `results//` on the +right, with red flags wherever a counterpart is missing. Images are embedded +as base64 so the HTML is portable. The helper script and intermediate +manifest also live under `.lightcone/` so they don't pollute the baseline +results. + +## Setup + +1. **Confirm project root.** Read `astra.yaml` in the cwd. If missing, ask: + + > "I do not see an `astra.yaml` here. Please `cd` to the ASTRA project + > and re-invoke." + + Stop until resolved. + +2. 
**Confirm results exist.** Default universe is `baseline`, unless + `comparison-report.yaml` names reproduced files under another universe or + the user supplied a universe explicitly. Check `ls results//`. + If the directory is missing or empty, ask: + + > "I cannot find populated results under `results//`. Build the + > universe first (`lc run --universe ` or equivalent), then + > re-invoke." + + Stop. Do NOT attempt to run the pipeline yourself -- this skill is + read-only over the build artifacts. + +3. **Locate the paper reference substrate.** The user may have passed a + path. Resolve it in this order: + + 1. If the argument is a directory containing `metadata.json`, + `document.md`, `figures/`, or `tables/`, use that directory as the + paper reference root. + 2. If the argument is an arXiv source directory containing `.tex` files, + use it as `source_root`, and use its parent `work/reference/` as the + paper reference root when that parent exists. + 3. If no argument was supplied, prefer paper2astra's layout: + - `work/reference/source/` when arXiv TeX source exists. Use the TeX + files there for labels/captions and the parsed artifacts under + `work/reference/{figures,tables,metadata.json}` for renderable + reference files. + - `work/reference/document.md` plus + `work/reference/{figures,tables,metadata.json}` when no TeX source + exists. This is the PDF + Docling fallback from paper2astra. + 4. Only after paper2astra paths fail, look for a legacy unzipped arXiv + dir in cwd: a directory containing both a `*.tex` file and figure + files (`*.pdf`, `*.png`, `*.eps`). Common names: `paper_source/`, + `arxiv_source/`, `*_Original_Paper/`. + + If no usable reference substrate is found, ask: + + > "Where is the paper reference directory? In a paper2astra project this + > should usually be `work/reference/`, containing `document.md`, + > `metadata.json`, and extracted `figures/` / `tables/`." 
+ + If only `work/reference/paper.pdf` exists, ask the user to run the PARSE + phase first so Docling or the TeX parser populates `work/reference/`. + Do not compare directly against a whole PDF. + +## Phase 1 -- Understand the paper's main results + +Read, in this order: + +1. **Scoped comparison artifacts, if present.** + - If `comparison-report.yaml` exists, treat it as the highest-priority + scope because it records what paper2astra actually compared. Use its + `outputs:` entries, including `type`, `priority`, `paper_value`, + `reproduced_value`, `reference_file`, `reproduced_file`, `match`, and + `notes` when present. + - Else if `targets/targets.md` exists, treat it as the scope ledger. Use + only the targets it names, including out-of-scope notes, priorities, + reference paths, expected values/trends, and output/spec-home pointers. + - If neither file exists, use the default paper-driven flow below and + build a best-effort report from `astra.yaml` plus `work/reference/`. + +2. **`astra.yaml`** -- specifically `narrative.summary`, `narrative.outputs`, + `narrative.findings`, `outputs:`, and `findings:` if present. Use it to + map scoped targets to output IDs and to harvest declared findings. Do not + assume ASTRA outputs have a dedicated filename-hint field; result paths + come from the output ID and the result resolver in Phase 2. + +3. **The paper reference substrate**, in this order: + - Read `work/reference/metadata.json` when present. It is the primary + index for paper figures and tables; its paths are relative to + `work/reference/` and usually point into `figures/` or `tables/`. + - If `work/reference/source/` exists, grep its TeX files for + `\includegraphics`, `\label{fig:...}`, `\caption{...}`, and + `\begin{table}` to recover labels/captions that metadata may have + missed. + - If only `work/reference/document.md` exists, use the markdown plus + `metadata.json` as the source of captions, table text, and in-text + numerical claims. 
This is the Docling/Pandoc fallback; preserve its + line numbers and do not pretend it is TeX. + - Grep the abstract, results, and discussion sections of the TeX or + markdown source for in-text numerical claims that look like primary + results -- typically a quantity with value + uncertainty (e.g. + `$X = a \pm b$ unit`). Prefer values that `astra.yaml`'s `findings:` + already names; do not try to extract every number in the paper. + + Do NOT read the paper wholesale. For long papers (>500 lines), read + only the abstract, results, and discussion sections. + +If the paper is large or has many sections and neither `comparison-report.yaml` +nor `targets/targets.md` exists, **delegate the figure / table / value +enumeration to a single subagent** with +`subagent_type="general-purpose"` -- pass it the paper path, the output +schema below, and ask it to return only the inventory. One subagent is +enough; do not fan out. Multiple subagents would have to re-read the +same file. + +## Phase 2 -- Build the comparison manifest + +Produce a manifest in memory (you'll write it as JSON in Phase 3) with +three sections: `figures`, `tables`, `values`. Each entry pairs a +paper-side artifact with a project-side artifact. + +Build entries in this priority order: + +1. **From `comparison-report.yaml` if present.** One manifest entry per + `outputs.` item. Use `type` to route it to `figures`, + `tables`, or `values`. Use `reference_file` as the paper-side path and + `reproduced_file` as the project-side path when present. Preserve the + report's `paper_value`, `reproduced_value`, `match`, and `notes` in the + manifest so the HTML reflects the completed COMPARE verdict. +2. **Else from `targets/targets.md` if present.** One manifest entry per + in-scope target. Use each target's reference path under `targets/`, its + expected values/trends, and its output/spec-home pointer. 
If the ledger + marks a target out of scope, omit it from the HTML unless the user asked + for out-of-scope targets too. +3. **Else use the default paper-driven inventory.** Enumerate figures, + tables, and values from `astra.yaml` plus `work/reference/`, and fall back + to filename-stem similarity only when no scoped ledger exists. + +For project-side result paths, resolve every output ID with this order: +- Use an explicit `reproduced_file` from `comparison-report.yaml` or an + explicit reproduced path/glob from `targets/targets.md`, if present and + the file exists. +- Search for flat files at `results//.` with the + first suitable type-specific extension: images (`.png`, `.jpg`, `.jpeg`, + `.pdf`, `.eps`), tables (`.csv`, `.parquet`, `.md`, `.txt`), values + (`.json`, `.yaml`, `.yml`, `.txt`, `.md`). +- If still unmatched and no scoped ledger exists, fall back to filename-stem + similarity within `results//`. +- If no match is found, use `project_path: null` and render a red + `NOT PRODUCED` panel. Do not include unrelated result files; the report is + target-driven when target/report files exist, and paper-driven otherwise. + +For tables: use `work/reference/metadata.json` and `work/reference/tables/` +when present. If TeX source exists, capture the raw LaTeX of the `tabular` +block and any `\caption{...}`. If only `work/reference/document.md` exists, +capture the Docling/Pandoc markdown table or the extracted table artifact +under `work/reference/tables/`. The project side is whatever artifact +carries the same content -- typically a CSV / parquet / markdown file at +`results//.`. If `astra.yaml` declares no matching +output, use `project_path: null`. 
**If the paper contains no tables at all, +leave the manifest's `tables` list empty; the helper must omit the entire +Tables section from the HTML in that case (no header, no "no tables" +placeholder).** + +For values: each entry is `{name, paper_value, paper_uncertainty?, +project_value?, project_value_source?, paper_quote}`. Pull +`paper_value` from the in-text claim or `astra.yaml`'s +`findings.*.paper_value`. Pull `project_value` from +`astra.yaml`'s `findings.*.replicated_value` if present, otherwise from +a scoped `comparison-report.yaml` entry or a flat result summary file at +`results//.` that you can read statically. +**Never compute or re-derive values yourself.** If no project value can +be located statically, leave it null and flag in the HTML. + +When `comparison-report.yaml` or `targets/targets.md` exists, the values list +is scoped to that file. Otherwise, be exhaustive about values, not selective. +A common failure mode is the values section ending up with only 1--3 entries, +which makes the report feel thin. Aim for **every** numerical claim that the +paper asserts and the project tracks. Concretely, harvest from: +- Every entry under `findings:` in `astra.yaml` -- one manifest entry + per finding, even when several findings share a parent quantity. +- The paper's abstract: every ` ± ` it reports. +- The paper's results and discussion sections: every fitted parameter, + every feature location ("dip near x = X₁", "peak at x = X₂"), every + reported sample size after a specific cut, every bin width or step + used as a result-defining choice, every reported accuracy / score / + metric. +- Any explicit reproduction targets in `astra.yaml`'s `narrative.findings`. + +It is fine to repeat one quantity in multiple manifest entries when the +paper reports it under different conditions (preliminary vs. final, +per-subset, per-bin median, per-method variant). Each condition is its +own row. 
Feature locations are values too: encode "feature located at +domain coordinate X" as +`{name: "", paper_value: "", paper_unit: +""}`. **Target ≥6 value entries on a typical paper.** If you end +up with fewer than 4, you are filtering too aggressively -- re-read +`astra.yaml`'s `findings:` and the paper's results section. + +## Phase 3 -- Generate the HTML + +Use a small Python helper rather than embedding base64 inline through +your tool calls -- multi-MB image base64 strings would balloon your +context. + +Use the existing `.lightcone/` directory in the project root. Do not create +directories in this skill. All three files this skill writes -- manifest, +helper, and final HTML -- live there. + +1. **Write the manifest** as JSON to + `.lightcone/comparison_manifest.json`. Schema: + + ```json + { + "project_name": "...", + "paper_path": "work/reference/document.md", + "scope_source": "comparison-report.yaml", + "universe": "baseline", + "results_path": "results/baseline", + "figures": [ + { + "paper_label": "fig:main_result", + "paper_caption": "...", + "paper_path": "targets/main_result.pdf", + "project_output_id": "primary_metric_plot", + "project_path": "results/baseline/primary_metric_plot.png" + } + ], + "tables": [ + { + "paper_label": "tab:summary", + "paper_caption": "...", + "paper_latex": "\\begin{tabular}{...}\\end{tabular}", + "project_output_id": "...", + "project_path": "results/baseline/summary_table.csv" + } + ], + "values": [ + { + "name": "primary_metric", + "paper_value": "12.5", + "paper_uncertainty": "0.4", + "paper_unit": "", + "paper_quote": "we find $\\mathrm{metric} = 12.5 \\pm 0.4$ ", + "project_value": "12.47", + "project_uncertainty": "0.41", + "project_value_source": "results/baseline/metric.json" + } + ] + } + ``` + + `figures`, `tables`, and `values` may each be `[]`. Empty lists mean + the helper skips that section entirely. 
There is no
+   `unmatched_baseline` field -- baseline files the paper does not
+   reference are not in scope for this report.
+
+   Use `null` for any missing field. Paths are relative to the project
+   root.
+
+2. **Write the helper script** to `.lightcone/build_comparison.py`.
+   The helper must:
+   - Read the manifest JSON.
+   - For each figure entry: emit one row container per figure,
+     with the structure described in **"Required HTML structure"**
+     below -- a single `.row-head` containing a row title and one
+     row-level status badge, followed by a `.row-grid` of two `.cell`s
+     (paper, project). One badge per row, in flow inside `.row-head`.
+     **Never emit per-cell absolutely-positioned badges.**
+     Read `paper_path` and `project_path` as bytes, base64-encode, and
+     embed each image inside its cell. **PDFs must be converted to PNG
+     before base64-encoding -- never embed PDFs as PDF data URIs.** Use
+     `<img src="data:image/png;base64,...">` uniformly for every
+     figure cell. Conversion order to try, falling back if a tool is
+     unavailable:
+     1. `pdf2image` (Python) -- `convert_from_path(path, dpi=150)[0]`
+     2. `pypdfium2` -- render page 1 at 150 DPI to a PIL image
+     3. shell out to `pdftoppm -png -r 150 -f 1 -l 1 <pdf> <prefix>`
+        and read the resulting PNG
+     4. shell out to `magick -density 150 <pdf>[0] <png>` (ImageMagick;
+        `-density` must precede the input to set the render resolution)
+     If none are available, the helper renders a small ⚠️ panel that
+     says `PDF preview unavailable -- install pdf2image or pdftoppm`
+     and links to the `.pdf` file path. Do not fall back to embedding
+     the PDF binary. PNG / JPG inputs skip conversion and are
+     base64-encoded directly. For any non-image type, embed as a
+     UTF-8 text block. Missing path → render a red panel saying
+     `❌ NOT PRODUCED` with the expected output ID. Captions live in
+     a caption element inside each cell, never as a row-spanning element.
+   - For each table entry: paper side renders the captured LaTeX inside
+     `<pre>` plus the caption; project side renders the project file
+     (CSV/parquet → first ~20 rows as an HTML table; markdown → render
+     as `<pre>`; missing → red ❌ panel). Same row structure as figures.
+   - For each value entry: emit one row container
+     per value -- **same card layout as figures, not a `<table>`.**
+     The row has a `.row-head` (value name + single status badge),
+     a `.row-grid` of two `.cell`s (paper | project), and a trailing
+     `.value-note` with the σ delta. The paper cell shows the value
+     (with uncertainty and unit) and the `paper_quote` as a
+     `<blockquote>`. The project cell shows the value and the
+     `project_value_source` as a small `<code>` line. Compute a simple
+     status -- ✅ if both values exist and the project value lies within
+     ±1 paper-uncertainty of the paper value; ⚠️ if both exist but
+     disagree by more than that; ❌ if either is missing. If
+     `paper_uncertainty` is null, fall back to a 5%-tolerance
+     comparison: ✅ if `|prj − paper| ≤ max(0.05·|paper|, 0.05)`. Do
+     NOT do anything more sophisticated; you cannot run code. **Do not
+     render values as a single HTML `<table>`** -- the report's whole
+     point is side-by-side cards.
+   - Emit a single self-contained HTML file with inline CSS in the
+     **Vellum** aesthetic (see below): the `<body>` carries the
+     parchment background and grain, and **all content lives inside a
+     single page container that is the lighter `--surface` cream
+     card with soft drop shadows.** This is non-negotiable -- the cream
+     page card on top of the parchment body is the headline visual. Two
+     content columns (paper | project) per row, the project name in the
+     page heading, and a top-of-page summary line counting found / missing
+     for each non-empty section. **Skip any section whose manifest list
+     is empty** -- omit its header and content entirely; do not emit a
+     "no tables found" placeholder.
+   - Write the HTML to `.lightcone/comparison.html` and print the
+     absolute path on stdout.
+
+### Required HTML structure (figures and values)
+
+The helper MUST produce this exact shape for every figure / value row.
+Per-cell absolute badges, value-as-table, and missing `.row-head` are
+all forbidden -- they break the layout (overlapping the cell heading,
+losing the row-level status, breaking the visual rhythm with figures).
+
+```html
+
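+<div class="row">
+  <div class="row-head">
+    <div>
+      <span>fig:main_result</span><span>primary_metric_plot</span>
+    </div>
+    <span class="badge-ok">✅ matched</span>
+  </div>
+  <div class="row-grid">
+    <div class="cell">
+      <div>PAPER</div>
+      <img src="data:image/png;base64,...">
+      <div>Caption from paper.</div>
+    </div>
+    <div class="cell">
+      <div>PROJECT · results/baseline/...</div>
+      <img src="data:image/png;base64,...">
+      <div>output_id</div>
+    </div>
+  </div>
+
+  <div class="value-note">Δ = 0.03 &lt;unit&gt; (0.07σ)</div>
+</div>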
+``` + +Status states for the row badge: `badge-ok` (matched), `badge-warn` +(partial / off-target / no σ), `badge-miss` (missing on either side). +Exactly one badge per row. + +3. **Run the helper:** `python3 .lightcone/build_comparison.py` + from the project root. If `python3` is missing, try `python`. If + the helper imports anything beyond the standard library (e.g. + `pyarrow` to read parquet, or `pandas` to render tables), have it + gracefully fall back to "preview not available -- file exists at + ``" rather than failing. The helper must work with stdlib + alone for the figure path; the parquet / pandas previews are + nice-to-haves. + +4. After the helper runs, **read back** the HTML's first ~50 lines and + the absolute file size to verify it was produced and isn't trivially + small (>10 KB sanity check). Then report to the user the path and a + one-line summary: + + > Comparison HTML at `.lightcone/comparison.html` -- N figures + > (K matched, J missing), N tables (...), N values (...). + +## Vellum aesthetic + +The helper must style the page in the **Vellum** aesthetic: a +weathered-parchment look that reads like a printed scientific paper, +not a web app. The helper bakes all of this into inline `