Structured artifacts from `astra paper add`: figures, tables, and a findings ledger under `work/reference/`

## Background

The paper2astra → lightcone-cli skill migration ([lightcone-cli#86](https://github.com/LightconeResearch/lightcone-cli/pull/86)) preserves all functionality but surfaced an ingestion gap: the original `parse_paper.py` produced structured figure/table metadata that the new skill doesn't reproduce, and there's no input-side mechanism for the COMPARE phase to verify reproduction outputs against the paper's claimed numerical results.

This is **complementary to #81** (substrate convergence). #81 picks the parser; this issue specifies what comes out of it.

## Proposal

`astra paper add <DOI>` produces, under `work/reference/`:

```
paper.pdf           # original
paper.tex           # main source (when arXiv source available)
figures/            # files copied from LaTeX source where available
                    #   metadata.json: caption, label, page, source-path
                    #   Docling render-and-crop only as PDF-only fallback (per #81)
tables/             # \begin{table} blocks extracted by label, with caption
findings.yaml       # the paper's own numerical findings (see below)
code/               # cloned reference repo when available (Zenodo / GitHub link)
```
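The `tables/` step could be sketched as below: a minimal regex-based pass, assuming the main LaTeX source has already been read into a string. The names `extract_env_blocks`, `_braced`, and `_first_arg` are hypothetical and not part of any existing `astra` code. The sketch ignores `%` comments and escaped braces (`\{`), which a real implementation would need to handle.

```python
import re
from typing import Optional


def _braced(text: str, start: int) -> str:
    """Return the contents of the brace group that opens at text[start] == '{'."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1 : i]
    return text[start + 1 :]  # unbalanced braces: return the remainder


def _first_arg(block: str, command: str) -> Optional[str]:
    """Contents of the first \\command[opt]{...} in block, or None if absent."""
    m = re.search(r"\\" + command + r"\s*(?:\[[^\]]*\])?\s*\{", block)
    return _braced(block, m.end() - 1) if m else None


def extract_env_blocks(tex: str, env: str = "table") -> list:
    """Collect each \\begin{env}...\\end{env} block with its label and caption."""
    pattern = re.compile(
        r"\\begin\{" + env + r"\*?\}.*?\\end\{" + env + r"\*?\}", re.DOTALL
    )
    blocks = []
    for m in pattern.finditer(tex):
        block = m.group(0)
        blocks.append(
            {
                "label": _first_arg(block, "label"),
                "caption": _first_arg(block, "caption"),
                "source": block,  # raw LaTeX, kept verbatim for agents to read
            }
        )
    return blocks
```

The same function with `env="figure"` would recover figure captions and labels for `metadata.json`; resolving `\includegraphics` paths to files is left out of this sketch.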

Every consumer (paper2astra, standalone reproductions, commentary tools) treats these artifacts as the source of truth and does not re-derive them. paper2astra stops caring how the paper information was obtained.

## The findings artifact — input-side back-pressure

The most interesting piece. Today, COMPARE phases of reproductions rely on agents eyeballing figures and reading prose to decide whether their numerical results match the paper's. Without a structured surface to diff against, agents drift toward plausible-but-wrong values. (Nolan's `figure-comparison` skill in lightcone-cli#86 helps for figures; `check-sentence-by-sentence` helps for prose. There's no structured numerical surface yet.)

A structured ledger of the paper's published values would be a complementary forcing function. Example shape:

```yaml
findings:
  - claim: "Ω_m = 0.315 ± 0.007"
    evidence:
      - type: quote
        anchor: "abstract"
        text: "We find Ω_m = 0.315 ± 0.007..."
  - claim: "χ² = 1834.2 with 1750 dof"
    evidence:
      - type: quote
        anchor: "table:bao-fits"
        text: "..."
```
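One way to keep the ledger honest is a shape check run right after extraction. The following is a minimal sketch, assuming the YAML has already been loaded into a Python dict and that every evidence item carries `type`, `anchor`, and `text` as in the example above. The function name `validate_findings` and the exact rules are assumptions, not an agreed schema.

```python
REQUIRED_EVIDENCE_KEYS = ("type", "anchor", "text")


def validate_findings(doc: dict) -> list:
    """Return a list of human-readable problems; an empty list means the shape is valid."""
    findings = doc.get("findings")
    if not isinstance(findings, list):
        return ["top-level 'findings' must be a list"]

    problems = []
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"findings[{i}]: entry must be a mapping")
            continue
        claim = finding.get("claim")
        if not isinstance(claim, str) or not claim.strip():
            problems.append(f"findings[{i}]: missing or empty 'claim'")
        evidence = finding.get("evidence")
        if not isinstance(evidence, list) or not evidence:
            problems.append(f"findings[{i}]: 'evidence' must be a non-empty list")
            continue
        for j, item in enumerate(evidence):
            for key in REQUIRED_EVIDENCE_KEYS:
                if not isinstance(item, dict) or key not in item:
                    problems.append(f"findings[{i}].evidence[{j}]: missing '{key}'")
    return problems
```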

**ASTRA semantics:** these are the paper's own `findings:` (what *that* paper claims). When a reproduction's `astra.yaml` references them, it treats them as `prior_insights:` from its perspective. Same data, different roles — no tautology because the data lives in the ingestion artifact, not in the reproduction's spec.

COMPARE then iterates against a structured ledger:

```
for finding in findings.yaml:
    locate matching value in reproduction outputs
    diff and log to comparison-report.md
```
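The loop above could look roughly like this in Python. This is a sketch under several assumptions: claims follow the `NAME = VALUE ± ERROR` shape from the example, reproduction outputs arrive as a dict keyed by the same names, and "matching" means agreeing within the quoted error (or within a relative tolerance when no error is quoted). `parse_claim`, `compare`, `to_markdown`, and `rel_tol` are hypothetical names; the matching rule in particular is a design choice this issue does not settle.

```python
import re

# Assumed claim shape: "NAME = VALUE" with an optional "± ERROR".
CLAIM_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*=\s*(?P<value>[-+]?\d+(?:\.\d+)?)"
    r"(?:\s*±\s*(?P<error>\d+(?:\.\d+)?))?"
)


def parse_claim(claim):
    """Return (name, value, error_or_None), or None if the claim does not fit."""
    m = CLAIM_RE.match(claim)
    if m is None:
        return None
    error = m.group("error")
    return m.group("name"), float(m.group("value")), float(error) if error else None


def compare(findings, outputs, rel_tol=0.01):
    """Diff each finding against outputs; return one report row per finding."""
    rows = []
    for finding in findings:
        parsed = parse_claim(finding["claim"])
        if parsed is None:
            rows.append({"claim": finding["claim"], "status": "unparsed"})
            continue
        name, value, error = parsed
        if name not in outputs:
            rows.append({"claim": finding["claim"], "status": "missing"})
            continue
        diff = outputs[name] - value
        # Assumption: agree within the quoted error, else within rel_tol.
        allowed = error if error is not None else abs(value) * rel_tol
        status = "match" if abs(diff) <= allowed else "mismatch"
        rows.append({"claim": finding["claim"], "status": status, "diff": diff})
    return rows


def to_markdown(rows):
    """Render report rows as a Markdown table for comparison-report.md."""
    lines = ["| claim | status | diff |", "| --- | --- | --- |"]
    for row in rows:
        diff = f"{row['diff']:+.4g}" if "diff" in row else ""
        lines.append(f"| {row['claim']} | {row['status']} | {diff} |")
    return "\n".join(lines)
```

Writing the report is then a one-liner such as `Path("comparison-report.md").write_text(to_markdown(rows))`. Rows marked `unparsed` or `missing` are kept rather than dropped, so a reproduction cannot silently skip a claim.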

This mirrors the **output-side LaTeX-macro pattern** (`\newcommand{\Omegam}{0.315}` so the rendered paper can't drift from the analysis): an input-side analogue where the paper's claims are extracted into a structured ledger and the reproduction can't claim convergence without matching them.

## Two extraction paths

1. **Author-defined `\newcommand` macros** — regex over the LaTeX source. Free, scriptable. Realistic estimate: most papers don't define these; we'll build our own variable set per-paper.
2. **Inline values** like `$\Omega_m = 0.315 \pm 0.007$` — agent-driven during ACQUIRE. Imperfect but tractable; iterating on the prompt and validation rules will get most of the way.
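
For path 1, the regex pass might look like the sketch below. It assumes only zero-argument macros with a purely numeric body are of interest, and it skips anything with nested braces or an argument count such as `[1]`. `MACRO_RE`, `NUMERIC_RE`, and `numeric_macros` are hypothetical names. Path 2 is agent-driven and is not sketched here.

```python
import re

# Matches \newcommand{\name}{body} and \newcommand\name{body}
# (also the \renewcommand and \providecommand variants).
# Bodies containing braces are deliberately not matched.
MACRO_RE = re.compile(
    r"\\(?:newcommand|renewcommand|providecommand)\*?\s*"
    r"\{?\\(?P<name>[A-Za-z@]+)\}?\s*"
    r"\{(?P<body>[^{}]*)\}"
)
NUMERIC_RE = re.compile(r"^[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?$")


def numeric_macros(tex: str) -> dict:
    """Map macro name -> float for macros whose body is a bare number."""
    macros = {}
    for m in MACRO_RE.finditer(tex):
        body = m.group("body").strip()
        if NUMERIC_RE.match(body):
            macros[m.group("name")] = float(body)
    return macros
```

Macros whose body carries units or an uncertainty (for example `0.315 \pm 0.007`) are not bare numbers and would fall through to the agent-driven path.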

## Open questions

- **Format for findings.** ASTRA-shape (`{claim, evidence}` entries under `findings:`) gives downstream consistency: `astra paper add` only ever emits valid ASTRA YAML, and MySTRA renders paper findings with the same machinery as reproduction findings. A bespoke flat schema (`[{name, value, error, source, quote}]`) is more concise for COMPARE iteration. Agent-side back-pressure is roughly equivalent either way, so that consistency is the deciding argument for ASTRA-shape.
- **Figures from LaTeX vs render-and-crop.** When LaTeX source is available, copying figure files directly is cleaner than rendering pages. Render-and-crop survives only as a fallback for PDF-only inputs (depends on #81).
- **Tables — is metadata enough?** A `\begin{table}` block with caption, by label, is enough for an agent to read. No need for extracted values as separate JSON; the LaTeX source already has the values.
- **Phasing.** Phase 1: figures + tables + author-macros (all scriptable). Phase 2: inline findings extraction (agent-driven, iterative).

## Cross-references

- **[#81](https://github.com/LightconeResearch/astra/issues/81)** — substrate convergence. This issue depends on #81 for the input layer.
- **[lightcone-cli#86](https://github.com/LightconeResearch/lightcone-cli/pull/86)** — the migration that surfaced this gap; includes Nolan's `figure-comparison` and `check-sentence-by-sentence` skills as adjacent forcing functions on different surfaces.
- **[Paper2ASTRA#10](https://github.com/LightconeResearch/Paper2ASTRA/pull/10)** — the design doc that motivated the migration.

## Suggested labels

`area:paper-management`, `enhancement`, `discuss-before-doing`

— Claude on behalf of Cail
