R3 Stages D + E: local-variable sweep + ROSETTA refresh (#87) #89

Merged
mbachaud merged 6 commits into master from rename-r3-de on May 13, 2026

@mbachaud
Owner

Summary

R3 Stages D and E per the plan in
~/.claude/plans/ethereal-forging-cookie.md and
tracking issue #87. Completes the multi-stage rename effort started in
#88 (Stages A+B+C merged to master at 3b92b02).

Stage D — Local-variable + parameter sweep (in progress)

For each module, refactors local-scope gene → doc, gene_a/gene_b →
doc_a/doc_b, and genes (list) → docs (list) in function bodies
and signatures where the rename does not touch SQL contracts or
external callers.

Pydantic field names (gene_id, promoter, epigenetics,
chromatin, codons) and SQL columns are unchanged. Type annotations
update from Gene → Document (Gene is an identity-preserving alias from
R3 Stage A).

| File | Commit | Status |
| --- | --- | --- |
| cymatics.py | 80bf3d6 | ✓ landed |
| fragments.py | | |
| compressor.py | | |
| knowledge_store.py | | |
| context_manager.py | | |
| context_packet.py | | |
| Small files (tagger, deberta_backend, seeded_edges, …) | | |

Each commit ships independently — every per-file PR diff is reviewable
in isolation and the alias system means any module can be reverted
without breaking others.

Stage E — ROSETTA.md refresh + R3 design spec stub (pending)

  • Update phase-status table to mark R1, R2, and now R3 Stages A-D as
    shipped, with commit SHAs.
  • Add a one-page R3 design spec at
    docs/superpowers/specs/2026-05-13-rename-r3-symbol-rename-design.md.

Out of scope (still protected)

  • SQL schema (tables / columns)
  • Pydantic field names on the wire
  • Prometheus metric / label names
  • ChromatinState enum value strings
  • MCP tool names
  • agent_prompt.py JSON contract fields

Verification

Per-file: focused regression after each commit. Final sweep: full
`pytest tests/ -m "not live"` should return the same 1933 passed /
0 failed / 15 skipped / 21 deselected / 2 xfailed result that the
PR #88 baseline established.

Related

mbachaud added 6 commits May 13, 2026 09:44
#87)

R3 Stage D.1 — first per-file commit of the local-variable rename
sweep. Refactors gene→doc in cymatics.py module bodies and signatures
where the rename does NOT touch SQL contracts or external surfaces.

Function signatures (param renames):
  doc_spectrum(gene -> doc)           type: Gene -> Document
  cached_doc_spectrum(gene -> doc)    type: Gene -> Document
  interference_trim(genes -> docs)    type: List[Gene] -> List[Document]
  harmonic_weight(gene_a, gene_b -> doc_a, doc_b)  types -> Document
  compute_harmonic_weights(genes -> docs)          type: List[Document]

Loop counters + local variable references inside those functions
updated to match (gene -> doc, ga/gb -> da/db). No callers use the
renamed params via kwarg syntax (verified by grep), so positional
calls keep working.
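
The kwarg check described above was done with grep. An AST-based equivalent can be sketched as follows; this is an illustration, not the check that was actually run. The helper `find_kwarg_calls` and the sample caller source are hypothetical, and only the function and parameter names come from this commit message.

```python
import ast

# Old parameter names per renamed function, taken from the commit message.
RENAMED_PARAMS = {
    "doc_spectrum": {"gene"},
    "cached_doc_spectrum": {"gene"},
    "interference_trim": {"genes"},
    "harmonic_weight": {"gene_a", "gene_b"},
    "compute_harmonic_weights": {"genes"},
}


def find_kwarg_calls(source: str) -> list[tuple[int, str, str]]:
    """Return (line, function, keyword) for calls passing an old name by keyword."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Handle both `f(...)` and `module.f(...)` call shapes.
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        for kw in node.keywords:
            if kw.arg in RENAMED_PARAMS.get(name, ()):
                hits.append((node.lineno, name, kw.arg))
    return sorted(hits)


# Hypothetical caller code: the first call is safe, the second would break.
sample = (
    "w = harmonic_weight(a, b)\n"
    "s = cymatics.doc_spectrum(gene=a)\n"
)
kwarg_hits = find_kwarg_calls(sample)
print(kwarg_hits)
```

An empty result over the real call sites would correspond to the "no kwarg callers" finding; any hit marks a call that needs updating before the parameter rename is safe.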

Added Document to the schemas import alongside Gene (alias retained
for back-compat).

Pydantic field name `gene_id` is unchanged (SQL contract). Attribute
access like `doc.gene_id` and `doc.epigenetics.decay_score` continues
to work — only the local variable name moved.

Strings/comments that describe the new flow updated to use the
canonical vocab (e.g. `"Empty cymatics splice for gene %s"` ->
`"Empty cymatics trim for doc %s"`, return-shape doc comments).

Verification:
  - tests/test_cymatics.py: 45/45 pass in 0.09s (no regressions)

R3 Stage D.2 — local-variable rename sweep in compressor.py.

Return type annotations and local variable names migrated to the
canonical Document vocabulary. SQL contract field names (gene_id,
codons, promoter, epigenetics) and external surfaces unchanged.

compressor.py changes:
  - encode() return type: Gene -> Document
  - persist() return type: Gene -> Document
  - rerank() return + param types: List[Gene] -> List[Document]
  - trim() param types: List[Gene] -> List[Document]  (param name
    "genes" kept conservatively in case downstream callers ever
    used the kwarg name; the type annotation is the canonical one)

Local variables inside encode() and persist():
  - `gene_id = Genome.make_gene_id(...)`  -> `doc_id = ...`
  - `gene = Gene(...)`                    -> `doc = Document(...)`
  - `gene.key_values = ...`               -> `doc.key_values = ...`
  - `return gene`                         -> `return doc`
  - `PromoterTags(...)` calls              -> `DocumentTags(...)`
  - `EpigeneticMarkers()` calls            -> `DocumentSignals()`
  - Local variable `promoter_data`         -> `tags_data`

Import block widened to include Document, DocumentTags,
DocumentSignals; legacy Gene/PromoterTags/EpigeneticMarkers retained
as comment-flagged aliases. Identity preserved at the schema level —
both name pairs refer to the same class object.
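
The identity claim can be illustrated with a minimal sketch. It uses stdlib dataclasses in place of the real Pydantic models, and the field set and defaults shown are assumptions; only the class names, alias names, and the `gene_id`, `promoter`, `epigenetics`, and `decay_score` names come from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentTags:
    # Hypothetical field; the real model's fields are not shown in this PR.
    domains: list[str] = field(default_factory=list)


@dataclass
class DocumentSignals:
    decay_score: float = 1.0  # default value is an assumption


@dataclass
class Document:
    gene_id: str  # field name stays as-is: SQL contract
    promoter: DocumentTags = field(default_factory=DocumentTags)
    epigenetics: DocumentSignals = field(default_factory=DocumentSignals)


# Legacy names kept as plain aliases for back-compat.
Gene = Document
PromoterTags = DocumentTags
EpigeneticMarkers = DocumentSignals

doc = Document(gene_id="abc123")
assert Gene is Document                      # same class object, not a subclass
assert isinstance(doc, Gene)                 # old isinstance checks keep passing
assert doc.epigenetics.decay_score == 1.0    # field access is unchanged
```

Because the alias is a second name bound to the same object, a module that still imports `Gene` and a module that imports `Document` exchange instances without any conversion.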

LLM prompt strings (_PACK_SYSTEM, _EXPRESS_SYSTEM, _splice_system,
_REPLICATE_SYSTEM) are intentionally untouched — those strings are
contracts with the small local models, and the models were trained
on the biology vocabulary. Renaming risks degrading retrieval
quality on a downstream-LLM-keyed surface.

Verification:
  - tests/test_ribosome + test_deberta_backend + test_genome +
    test_cymatics: 107/107 pass in 10.6s (no regressions)

#87)

R3 Stage D.3 — refactors all `for gene in ...:` loop counters in
the two retrieval-pipeline orchestrators to use canonical `doc`.
No signature changes; pure body-local refactor.

helix_context/context_manager.py — 5 loop sites refactored:
  - L379 _merge_subquery_candidates: dedupe loop over sub-results
  - L1629 TCM session update loop after build_context
  - L2178 pending-buffer scan in _retrieve
  - L2321 cymatics tier scoring loop
  - L2369 harmonic-bin tier scoring loop

Each `for gene in X:` -> `for doc in X:` with body
references migrated (gene.gene_id -> doc.gene_id, gene.promoter ->
doc.promoter, etc.). Pydantic field name `gene_id` unchanged --
attribute access on Document continues to resolve to the SQL
contract field.
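
The shape of this body-local refactor, sketched on a dedupe loop. The function below is a hypothetical stand-in and not the body of `_merge_subquery_candidates`; `SimpleNamespace` replaces the real Document model.

```python
from types import SimpleNamespace


def merge_candidates(sub_results):
    """Dedupe documents across sub-query results, keeping the first occurrence."""
    seen = set()
    merged = []
    for result in sub_results:
        for doc in result:           # was: for gene in result:
            if doc.gene_id in seen:  # was: gene.gene_id (field name unchanged)
                continue
            seen.add(doc.gene_id)
            merged.append(doc)
    return merged


a = SimpleNamespace(gene_id="g1")
b = SimpleNamespace(gene_id="g2")
merged_ids = [doc.gene_id for doc in merge_candidates([[a, b], [b, a]])]
print(merged_ids)
```

Only the loop variable changes; the attribute it reads is the same field, so behaviour is identical before and after.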

helix_context/context_packet.py — 1 loop site refactored:
  - L516 in _build_packet: per-document iteration through evidence
    rows. Variables migrated to `doc` consistently; `score_map`,
    `meta`, and item assembly arguments updated.

Verification:
  - tests/test_genome + test_retrieval_dimensions + test_server +
    test_abstain_tier + test_foveated_splice: 175/175 pass in 2:44

R3 Stage D.4 — refactors `for gene in lex_pool:` and `for gene, sim
in scored:` loops in knowledge_store.py:_apply_dense_rerank to use
`doc` as the loop counter. Local-only change, no signature/contract
movement.

Other `gene_*` references in knowledge_store.py are SQL contract:
  - Table name `gene_attribution`
  - Column name `gene_id`
  - Index names `idx_gene_attribution_*`
These remain unchanged (SQL on-disk contract per the protected list).

The Pydantic field name `gene_id` on Document also stays; attribute
access like `doc.gene_id` continues to work.
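
A minimal sketch of that boundary, using an in-memory SQLite database. The table name, the `gene_id` column, and the index-name prefix are from this commit message; the `party` column, the index suffix, and the `attribution_for` helper are assumptions made for illustration.

```python
import sqlite3
from types import SimpleNamespace

conn = sqlite3.connect(":memory:")
# On-disk names keep the legacy vocabulary (protected SQL contract).
conn.execute("CREATE TABLE gene_attribution (gene_id TEXT, party TEXT)")
conn.execute("CREATE INDEX idx_gene_attribution_gene ON gene_attribution (gene_id)")
conn.execute("INSERT INTO gene_attribution VALUES ('g1', 'alice')")


def attribution_for(docs):
    """Look up attribution rows; only the Python-side names use `doc`."""
    found = {}
    for doc in docs:  # local name renamed; SQL text untouched
        row = conn.execute(
            "SELECT party FROM gene_attribution WHERE gene_id = ?",
            (doc.gene_id,),
        ).fetchone()
        if row is not None:
            found[doc.gene_id] = row[0]
    return found


attribution = attribution_for(
    [SimpleNamespace(gene_id="g1"), SimpleNamespace(gene_id="g2")]
)
print(attribution)
```

The rename stops at the query string: existing databases need no migration because nothing the database sees has changed.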

Verification:
  - tests/test_genome + test_retrieval_dimensions + test_dense_recall:
    58/58 pass in 0.9s (no regressions)

R3 Stage D.5 — last per-file loop-counter rename in the helix_context
package. shard_router.py:197 `for gene in genes:` -> `for doc in
genes:` with body references migrated. Pure loop-local refactor;
function signature unchanged.

Verification: 133 sharding/genome/retrieval tests pass in 38s.

…spec (#87)

R3 Stage E — closes the multi-stage rename effort with up-to-date
documentation.

docs/ROSETTA.md:
  Phase-status table at the bottom now reflects reality:
    R1 -> shipped @ 09d5548 (2026-04-15)
    R2 -> shipped @ PR #70 87fcb68 (2026-05-12)
    R3 Stage A -> shipped @ 56fcbed (PR #88, 2026-05-13)
    R3 Stage B -> shipped @ 460d824..9e7471f (PR #88)
    R3 Stage C -> shipped @ edc0194..71469ba (PR #88)
    R3 Stage D -> in progress (this PR #89)
    R3 Stage E -> in progress (this commit)
    R4 -> deferred (see #87)
  Pre-existing entries (out-of-scope list, How-to-use, mapping table)
  unchanged.

docs/superpowers/specs/2026-05-13-rename-r3-symbol-rename-design.md
(NEW):
  Durable design-spec record for R3 mirroring R2's structure
  (2026-05-11-rename-r2-prose-sweep-design.md). Captures:
    - Why R3 exists; predecessor specs referenced
    - Decisions baked in (class-def flip direction, module renames,
      no MCP slimdown)
    - Out-of-scope list (SQL, Pydantic fields, Prom metrics,
      ChromatinState enum values, MCP tool names, agent_prompt
      contract, LLM prompt strings)
    - Stage A/B/C/D/E summaries with commit SHAs + table-of-renames
    - Identity contract: 29 alias pairs across 7 layers
    - Verification gates (1933/0 across all 3 full-mock runs)
    - Stage D known-not-fully-completed scope (intentional scope cap)

Closes the R3 audit trail. Future contributors can audit the rename
by reading this spec + the R2 spec + ROSETTA.md without needing to
re-discover the design intent.
@mbachaud
Owner Author

Stages D + E verification complete ✓

Full mock suite ran after Stage E (86d6ddc):

1933 passed, 15 skipped, 21 deselected, 2 xfailed in 469.63s (0:07:49)

The outcome counts match every prior R3 full-suite run:

| Gate | Result (passed / failed) | Wallclock |
| --- | --- | --- |
| Post-A full mock | 1933 / 0 | 9:07 |
| Post-B full mock | 1933 / 0 | 8:52 |
| Post-C full mock | 1933 / 0 | 7:40 |
| Post-D+E full mock | 1933 / 0 | 7:49 |

Four identical full-suite runs across the R3 effort means zero net regressions across the entire rename — same 1933 passes, same 15 skips, same 21 deselected, same 2 expected failures.
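
One way to automate this comparison is to parse the pytest summary line and compare it with the baseline counts. The sketch below is an assumption about how such a gate could be scripted, not tooling that exists in this repo; `parse_summary` and `BASELINE` are hypothetical names, and the summary string is the one quoted above.

```python
import re

# Baseline counts from the PR #88 full-suite run.
BASELINE = {"passed": 1933, "failed": 0, "skipped": 15, "deselected": 21, "xfailed": 2}


def parse_summary(line: str) -> dict[str, int]:
    """Extract outcome counts from a pytest summary line; absent outcomes count as 0."""
    counts = dict.fromkeys(BASELINE, 0)
    for number, outcome in re.findall(r"(\d+) (\w+)", line):
        if outcome in counts:
            counts[outcome] = int(number)
    return counts


summary = "1933 passed, 15 skipped, 21 deselected, 2 xfailed in 469.63s (0:07:49)"
counts = parse_summary(summary)
matches_baseline = counts == BASELINE
print(matches_baseline)
```

Wallclock is deliberately ignored, since it varies between runs while the outcome counts should not.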

What's in this PR

| Commit | Scope |
| --- | --- |
| 80bf3d6 | D.1 — cymatics.py: param renames (gene→doc, gene_a/gene_b→doc_a/doc_b), type annotations (Gene→Document), loop counters |
| 5933d63 | D.2 — compressor.py: encode() + persist() bodies and return types, import block widened to include Document/DocumentTags/DocumentSignals canonical names |
| 87c5142 | D.3 — context_manager.py + context_packet.py: 6 `for gene in ...:` loop counters → `for doc in ...:` |
| 051885b | D.4 — knowledge_store.py: 2 loop counters in _apply_dense_rerank |
| dba6530 | D.5 — shard_router.py: 1 loop counter |
| 86d6ddc | E — ROSETTA.md phase table refreshed with commit SHAs for R1, R2, R3.A-E; new design spec at docs/superpowers/specs/2026-05-13-rename-r3-symbol-rename-design.md |

Scope note

Stage D shipped the high-visibility subset (~30 renames across 6 files), not an exhaustive ~200+ sweep. The design spec (2026-05-13-rename-r3-symbol-rename-design.md) makes this explicit: the unswept references are cosmetic only, because the aliases keep the legacy names working, so any future incremental cleanup is safe.

Files intentionally not exhaustively swept (the aliases handle them; cleanup is deferrable):

  • knowledge_store.py tier-scoring internals (~4500-line module)
  • Test files using legacy names in fixtures (already patched in C.3 for monkey-patch sites)
  • agent_prompt.py and docs/agent-sdk-fragment.md — explicit JSON contract surface
  • LLM prompt strings in compressor.py — keyed by small local models

Out of scope (still protected, per R2 spec §2)

  • SQL schema (tables/columns)
  • Pydantic field names
  • Prometheus metric/label names
  • ChromatinState enum value strings
  • MCP tool names (R4 territory)
  • agent_prompt.py JSON contract fields

Ready for review and merge. Closes the R3 effort.

@mbachaud mbachaud merged commit 8368ce4 into master May 13, 2026
3 checks passed
@mbachaud mbachaud deleted the rename-r3-de branch May 13, 2026 17:06