R3 Stages D + E: local-variable sweep + ROSETTA refresh (#87) #89
Merged
#87) R3 Stage D.1: first per-file commit of the local-variable rename sweep.
Refactors gene→doc in cymatics.py module bodies and signatures where the
rename does NOT touch SQL contracts or external surfaces.
Function signatures (param renames):
- doc_spectrum(gene -> doc) type: Gene -> Document
- cached_doc_spectrum(gene -> doc) type: Gene -> Document
- interference_trim(genes -> docs) type: List[Gene] -> List[Document]
- harmonic_weight(gene_a, gene_b -> doc_a, doc_b) types -> Document
- compute_harmonic_weights(genes -> docs) type: List[Document]
Loop counters + local variable references inside those functions updated
to match (gene -> doc, ga/gb -> da/db). No callers use the renamed params
via kwarg syntax (verified by grep), so positional calls keep working.
Added Document to the schemas import alongside Gene (alias retained for
back-compat). Pydantic field name `gene_id` is unchanged (SQL contract).
Attribute access like `doc.gene_id` and `doc.epigenetics.decay_score`
continues to work; only the local variable name moved.
Strings/comments that describe the new flow updated to use the canonical
vocab (e.g. `"Empty cymatics splice for gene %s"` ->
`"Empty cymatics trim for doc %s"`, return-shape doc comments).
Verification:
- tests/test_cymatics.py: 45/45 pass in 0.09s (no regressions)
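The kwarg check in D.1 matters because a parameter rename is invisible to positional callers but breaks any caller that passes the old name as a keyword. A minimal sketch of that difference, using hypothetical stand-ins for `doc_spectrum` before and after the rename (the bodies are placeholders, not the real cymatics code):

```python
def doc_spectrum_old(gene):
    """Pre-rename signature (placeholder body, for illustration only)."""
    return len(gene)


def doc_spectrum_new(doc):
    """Post-rename signature (same placeholder body)."""
    return len(doc)


# Positional calls behave identically before and after the rename.
positional_ok = doc_spectrum_old("abc") == doc_spectrum_new("abc")

# A keyword call that still uses the old parameter name raises TypeError.
try:
    doc_spectrum_new(gene="abc")
    kwarg_call_survived = True
except TypeError:
    kwarg_call_survived = False
```

This is why grepping for keyword usage of the old names is a reasonable gate before renaming a parameter.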
R3 Stage D.2 — local-variable rename sweep in compressor.py.
Return type annotations and local variable names migrated to the
canonical Document vocabulary. SQL contract field names (gene_id,
codons, promoter, epigenetics) and external surfaces unchanged.
compressor.py changes:
- encode() return type: Gene -> Document
- persist() return type: Gene -> Document
- rerank() return + param types: List[Gene] -> List[Document]
- trim() param types: List[Gene] -> List[Document] (param name
"genes" kept conservatively in case downstream callers ever
used the kwarg name; the type annotation is the canonical one)
Local variables inside encode() and persist():
- `gene_id = Genome.make_gene_id(...)` -> `doc_id = ...`
- `gene = Gene(...)` -> `doc = Document(...)`
- `gene.key_values = ...` -> `doc.key_values = ...`
- `return gene` -> `return doc`
- `PromoterTags(...)` calls -> `DocumentTags(...)`
- `EpigeneticMarkers()` calls -> `DocumentSignals()`
- Local variable `promoter_data` -> `tags_data`
Import block widened to include Document, DocumentTags,
DocumentSignals; legacy Gene/PromoterTags/EpigeneticMarkers retained
as comment-flagged aliases. Identity preserved at the schema level —
both name pairs refer to the same class object.
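As an illustration of what "same class object" means here, the sketch below assumes the alias is a plain module-level assignment. The dataclass is a stand-in for the real Pydantic model, and its single field is chosen only for the example:

```python
from dataclasses import dataclass


@dataclass
class Document:
    gene_id: str  # protected field name, kept because it mirrors the SQL column


# Back-compat alias (assumed form): a second name bound to the same class.
Gene = Document

doc = Document(gene_id="abc123")

same_class = Gene is Document            # identity, not just equality
old_isinstance_ok = isinstance(doc, Gene)  # legacy isinstance checks still pass
```

Because both names point at one class, code that still imports `Gene` keeps working alongside code that imports `Document`.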
LLM prompt strings (_PACK_SYSTEM, _EXPRESS_SYSTEM, _splice_system,
_REPLICATE_SYSTEM) are intentionally untouched — those strings are
contracts with the small local models, and the models were trained
on the biology vocabulary. Renaming risks degrading retrieval
quality on a downstream-LLM-keyed surface.
Verification:
- tests/test_ribosome + test_deberta_backend + test_genome +
test_cymatics: 107/107 pass in 10.6s (no regressions)
#87) R3 Stage D.3: refactors all `for gene in ...:` loop counters in the
two retrieval-pipeline orchestrators to use canonical `doc`. No signature
changes; pure body-local refactor.
helix_context/context_manager.py (5 loop sites refactored):
- L379 _merge_subquery_candidates: dedupe loop over sub-results
- L1629 TCM session update loop after build_context
- L2178 pending-buffer scan in _retrieve
- L2321 cymatics tier scoring loop
- L2369 harmonic-bin tier scoring loop
Each `for gene in X:` -> `for doc in X:` with body references migrated
(gene.gene_id -> doc.gene_id, gene.promoter -> doc.promoter, etc.).
Pydantic field name `gene_id` unchanged -- attribute access on Document
continues to resolve to the SQL contract field.
helix_context/context_packet.py (1 loop site refactored):
- L516 in _build_packet: per-document iteration through evidence rows.
  Variables migrated to `doc` consistently; `score_map`, `meta`, and item
  assembly arguments updated.
Verification:
- tests/test_genome + test_retrieval_dimensions + test_server +
  test_abstain_tier + test_foveated_splice: 175/175 pass in 2:44
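The shape of a D.3-style change can be sketched as follows. The function below is a hypothetical dedupe loop, not the real `_merge_subquery_candidates`; the `score` field and the "keep the best score" rule are assumptions made for the example. Only the loop variable changes, while the `gene_id` attribute keeps its name:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Document:
    gene_id: str  # protected field name, unchanged by the rename
    score: float  # hypothetical field, used only for this example


def merge_candidates(sub_results: Iterable[List[Document]]) -> List[Document]:
    """Hypothetical dedupe: keep the best-scoring entry per gene_id."""
    best: Dict[str, Document] = {}
    for results in sub_results:
        for doc in results:  # previously: for gene in results
            seen = best.get(doc.gene_id)
            if seen is None or doc.score > seen.score:
                best[doc.gene_id] = doc
    return list(best.values())


merged = merge_candidates([
    [Document("a", 0.2), Document("b", 0.9)],
    [Document("a", 0.7)],
])
```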
R3 Stage D.4 — refactors `for gene in lex_pool:` and `for gene, sim
in scored:` loops in knowledge_store.py:_apply_dense_rerank to use
`doc` as the loop counter. Local-only change, no signature/contract
movement.
Other `gene_*` references in knowledge_store.py are SQL contract:
- Table name `gene_attribution`
- Column name `gene_id`
- Index names `idx_gene_attribution_*`
These remain unchanged (SQL on-disk contract per the protected list).
The Pydantic field name `gene_id` on Document also stays; attribute
access like `doc.gene_id` continues to work.
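A small sketch of the split between on-disk names and Python-side names, using an in-memory SQLite database. The table and `gene_id` column names come from the list above; the `source` column, the index suffix, and the sample values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# On-disk names stay in the legacy vocabulary (hypothetical minimal schema).
conn.execute(
    "CREATE TABLE gene_attribution (gene_id TEXT PRIMARY KEY, source TEXT)"
)
conn.execute(
    "CREATE INDEX idx_gene_attribution_source ON gene_attribution (source)"
)

doc_id = "abc123"  # the Python-side local uses the new vocabulary
conn.execute(
    "INSERT INTO gene_attribution (gene_id, source) VALUES (?, ?)",
    (doc_id, "example"),
)

row = conn.execute(
    "SELECT gene_id FROM gene_attribution WHERE gene_id = ?", (doc_id,)
).fetchone()
conn.close()
```

Renaming the local from `gene_id` to `doc_id` changes nothing the database can observe, which is what makes this kind of sweep safe for existing data.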
Verification:
- tests/test_genome + test_retrieval_dimensions + test_dense_recall:
58/58 pass in 0.9s (no regressions)
R3 Stage D.5: last per-file loop-counter rename in the helix_context
package.
shard_router.py:197 `for gene in genes:` -> `for doc in genes:` with body
references migrated. Pure loop-local refactor; function signature
unchanged.
Verification:
- 133 sharding/genome/retrieval tests pass in 38s.
…spec (#87) R3 Stage E: closes the multi-stage rename effort with
up-to-date documentation.
docs/ROSETTA.md:
Phase-status table at the bottom now reflects reality:
- R1 -> shipped @ 09d5548 (2026-04-15)
- R2 -> shipped @ PR #70 87fcb68 (2026-05-12)
- R3 Stage A -> shipped @ 56fcbed (PR #88, 2026-05-13)
- R3 Stage B -> shipped @ 460d824..9e7471f (PR #88)
- R3 Stage C -> shipped @ edc0194..71469ba (PR #88)
- R3 Stage D -> in progress (this PR #89)
- R3 Stage E -> in progress (this commit)
- R4 -> deferred (see #87)
Pre-existing entries (out-of-scope list, How-to-use, mapping table)
unchanged.
docs/superpowers/specs/2026-05-13-rename-r3-symbol-rename-design.md (NEW):
Durable design-spec record for R3 mirroring R2's structure
(2026-05-11-rename-r2-prose-sweep-design.md). Captures:
- Why R3 exists; predecessor specs referenced
- Decisions baked in (class-def flip direction, module renames, no MCP
  slimdown)
- Out-of-scope list (SQL, Pydantic fields, Prom metrics, ChromatinState
  enum values, MCP tool names, agent_prompt contract, LLM prompt strings)
- Stage A/B/C/D/E summaries with commit SHAs + table-of-renames
- Identity contract: 29 alias pairs across 7 layers
- Verification gates (1933/0 across all 3 full-mock runs)
- Stage D known-not-fully-completed scope (intentional scope cap)
Closes the R3 audit trail. Future contributors can audit the rename by
reading this spec + the R2 spec + ROSETTA.md without needing to
re-discover the design intent.
Stages D + E verification complete ✓
Full mock suite ran after Stage E (…)
Byte-for-byte match against every prior R3 full-suite run:
Four identical full-suite runs across the R3 effort mean zero net regressions across the entire rename: same 1933 passes, same 15 skips, same 21 deselected, same 2 expected failures.
What's in this PR
Scope note
Stage D shipped the high-visibility subset (~30 renames across 6 files), not an exhaustive ~200+ sweep. The design spec (…)
Files intentionally not exhaustively swept (the aliases handle them; cleanup deferrable):
Out of scope (still protected, per R2 spec §2)
Ready for review and merge. Closes the R3 effort.
Summary
R3 Stages D and E per the plan in
~/.claude/plans/ethereal-forging-cookie.md and tracking issue #87.
Completes the multi-stage rename effort started in #88 (Stages A+B+C
merged to master at 3b92b02).
Stage D: Local-variable + parameter sweep (in progress)
For each module, refactors local-scope `gene` → `doc`,
`gene_a`/`gene_b` → `doc_a`/`doc_b`, and `genes` (list) → `docs` (list)
in function bodies and signatures where the rename does not touch SQL
contracts or external callers.
Pydantic field names (`gene_id`, `promoter`, `epigenetics`, `chromatin`,
`codons`) and SQL columns are unchanged. Type annotations update from
`Gene` → `Document` (identity-preserving alias from R3 Stage A).
- cymatics.py 80bf3d6
- fragments.py
- compressor.py
- knowledge_store.py
- context_manager.py
- context_packet.py
- (tagger, deberta_backend, seeded_edges, …)
Each commit ships independently: every per-file PR diff is reviewable
in isolation, and the alias system means any module can be reverted
without breaking others.
Stage E: ROSETTA.md refresh + R3 design spec stub (pending)
- …shipped, with commit SHAs.
- docs/superpowers/specs/2026-05-13-rename-r3-symbol-rename-design.md.
Out of scope (still protected)
- agent_prompt.py JSON contract fields
Verification
Per-file: focused regression after each commit. Final sweep: full
`pytest tests/ -m "not live"` should return the same 1933 / 0 / 15 / 21 / 2
result (passed / failed / skipped / deselected / expected failures) the
PR #88 baseline established.
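One way to make the baseline comparison mechanical is to parse the final pytest summary line and compare it against the expected counts. The sketch below is a hypothetical helper, not part of this PR. It assumes the "0" in the baseline is the failure count and that the two expected failures are reported by pytest as `xfailed`; the sample line is illustrative, not captured output:

```python
import re

# Baseline counts quoted in this PR, keyed by pytest's summary words.
BASELINE = {
    "passed": 1933,
    "failed": 0,
    "skipped": 15,
    "deselected": 21,
    "xfailed": 2,
}


def parse_summary(line: str) -> dict:
    """Extract '<count> <outcome>' pairs from a pytest summary line."""
    counts = {key: 0 for key in BASELINE}
    for number, outcome in re.findall(r"(\d+) (\w+)", line):
        if outcome in counts:
            counts[outcome] = int(number)
    return counts


# Illustrative summary line (assumed format and timing).
sample = "= 1933 passed, 15 skipped, 21 deselected, 2 xfailed in 164.00s ="
matches_baseline = parse_summary(sample) == BASELINE
```

Outcomes that pytest omits from the summary line default to zero here, so a clean run with no failures still compares equal to the baseline.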
Related
~/.claude/plans/ethereal-forging-cookie.md