
R3 Stage A+B: class-def flip + module renames (#87) #88

Merged: mbachaud merged 10 commits into master from rename-r3 (May 13, 2026)

@mbachaud (Owner)

Summary

R3 Stages A and B per the plan in
~/.claude/plans/ethereal-forging-cookie.md and
tracking issue #87. Closes Stage A and Stage B of the 5-stage R3
effort; Stages C / D / E remain.

All canonical software-vocabulary names are now the real class and
module definitions; legacy biology names remain as identity-preserving
aliases declared immediately after each class def, plus shim modules
at every old import path.

What this PR does

Stage A — class-def flip + alias inversion (56fcbed)

For each pair below, the canonical name is now the real class def
and the legacy name is a one-line module-level alias:

Module        Canonical (real def)    Legacy (alias)
schemas.py    LifecycleTier           ChromatinState
schemas.py    DocumentTags            PromoterTags
schemas.py    DocumentSignals         EpigeneticMarkers
schemas.py    Document                Gene
schemas.py    DocumentAttribution     GeneAttribution
genome.py     KnowledgeStore          Genome
ribosome.py   Compressor              Ribosome

Identity holds in both directions — Document is Gene and
Gene is Document both evaluate True. __name__ reports the canonical
(e.g. Gene.__name__ == 'Document'). aliases.py was inverted to
import the new canonical names from their home modules.

Stage B: module file moves + shim modules (460d824..9e7471f, 5 commits)

Old path                       New path
helix_context/ribosome.py      helix_context/compressor.py
helix_context/genome.py        helix_context/knowledge_store.py
helix_context/codons.py        helix_context/fragments.py
helix_context/replication.py   helix_context/persistence.py
helix_context/hgt.py           helix_context/cross_store_import.py

Each old path now contains a shim that walks dir(_new_module) and
re-exports every non-dunder attribute. This covers historical private
name leakage (_parse_json, _EXPRESS_SYSTEM, _splice_system,
_kv_keys_from_list) used by training scripts and tests.
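A sketch of the dir()-walk, written as a reusable helper so it can run outside the package. The helper name `reexport` and the synthetic `demo` module are illustrative assumptions; the PR only states that each shim walks `dir(_new_module)` and copies every non-dunder attribute. A plain `from new_module import *` would not be enough here, because without an `__all__` it skips single-underscore names such as `_parse_json`:

```python
import types


def reexport(new_module: types.ModuleType, namespace: dict) -> list:
    """Copy every non-dunder attribute of new_module into namespace.

    Unlike `from new_module import *`, this also copies single-underscore
    private names, which `import *` skips when the module has no __all__.
    """
    copied = []
    for name in dir(new_module):
        if name.startswith("__") and name.endswith("__"):
            continue  # skip dunders such as __name__, __doc__, __spec__
        namespace[name] = getattr(new_module, name)
        copied.append(name)
    return copied


# In a real shim file (e.g. helix_context/ribosome.py) this would be roughly:
#     import helix_context.compressor as _new_module
#     reexport(_new_module, globals())

# Demonstration against a synthetic module standing in for the new location.
demo = types.ModuleType("demo_compressor")


class Compressor:
    pass


demo.Compressor = Compressor
demo.Ribosome = Compressor      # legacy alias living in the new module
demo._parse_json = lambda s: s  # private name that `import *` would skip

shim_ns: dict = {}
exported = reexport(demo, shim_ns)
print(exported)
```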

Out of scope (still in R3, later stages)

  • Stage C — internal method renames (pack→encode,
    splice→trim, replicate→persist, re_rank→rerank,
    upsert_gene→upsert_doc, query_genes→query_docs, _express→_retrieve,
    plus cymatics helpers and the reflection-string edge case at
    context_manager.py:2336).
  • Stage D — local variable / loop-counter sweep (~200+ gene→doc).
  • Stage E: docs/ROSETTA.md phase-status refresh + R3 spec stub.

Out of scope (still protected, per R2 spec §2)

  • SQL schema (tables genes, gene_attribution, harmonic_links;
    columns gene_id, promoter, epigenetics, chromatin, codons)
  • Pydantic field names on the wire (would force a DB migration)
  • Prometheus metric/label names (dashboard contract)
  • ChromatinState enum value strings (OPEN / EUCHROMATIN /
    HETEROCHROMATIN) — serialized as TEXT, queryable
  • MCP tool names (R4 territory)
  • agent_prompt.py::HELIX_NO_MATCH_FRAGMENT JSON contract fields

Verification

Stage                          Test surface                                                  Result
Baseline (pre-A)               tests/test_genome + test_retrieval_dimensions + test_server   127/127 in 2:18
After Stage A                  (same focused set)                                            127/127 in 2:14
After Stage A                  full mock suite: pytest -m "not live"                         1933 passed / 0 failed / 15 skipped / 21 deselected / 2 xfailed in 9:07
After Stage B.1 (ribosome)     test_ribosome + test_genome + test_retrieval_dimensions       72/72 in 5s
After Stage B.2 (genome)       test_genome + test_retrieval_dimensions + test_server         127/127 in 2:01
After Stage B.3 (codons)       test_codons + test_genome                                     55/55 in 0.4s
After Stage B.4 (replication)  test_server + test_genome                                     114/114 in 2:06
After Stage B.5 (hgt)          test_health                                                   10/10 (+2 pre-existing xfailed)
After Stage B (full)           full mock suite                                               TODO before merge (running)

Pydantic JSON round-trip preserved both ways. Identity contract holds
across all 7 schema/module pairs. All 4 import paths to each class
(legacy module, canonical module, aliases.py, package-level
helix_context) resolve to the same class object.
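One way to express the "every import path resolves to the same class object" check as a reusable test helper. The `module:attr` path notation and the helper names are assumptions for illustration; the demo uses the stdlib `json` package, which re-exports `JSONDecodeError` from `json.decoder` much as a shim or package `__init__` re-exports a class:

```python
import importlib


def resolve(path: str):
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, _, attr = path.partition(":")
    return getattr(importlib.import_module(module_name), attr)


def all_same_object(paths: list) -> bool:
    """True only if every path yields the very same object (identity, not ==)."""
    objs = [resolve(p) for p in paths]
    return all(obj is objs[0] for obj in objs)


# For this PR the list would include, e.g. (needs helix_context importable):
#   ["helix_context.compressor:Compressor",
#    "helix_context.ribosome:Ribosome",
#    "helix_context:Ribosome"]
print(all_same_object(["json:JSONDecodeError", "json.decoder:JSONDecodeError"]))  # True
```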

How to review

  1. git log --oneline 56fcbed^..9e7471f: 6 commits, one logical change per
    stage / module.
  2. Each Stage B commit shows up as an add (A) plus a modify (M) rather
    than a single rename, because git's rename detection breaks when the
    old path is rewritten as a shim in the same commit. Use
    git log --follow -M50% <file> to traverse history.
  3. The shim files are ~25 lines each and structurally identical, so
    reviewing one closely covers the others.

Related

mbachaud added 6 commits May 13, 2026 00:37
R3 Stage A per the plan in
~/.claude/plans/ethereal-forging-cookie.md and tracking issue #87.

Flips the canonical/legacy direction for 7 class pairs so the
software-vocabulary names are now the *real* class definitions and
the legacy biology names are one-line aliases. Identity is preserved
in both directions (Document is Gene and Gene is Document both
evaluate True); only __name__ changes (now reports the canonical).

schemas.py:
  ChromatinState     -> LifecycleTier         (IntEnum; enum values stay)
  PromoterTags       -> DocumentTags
  EpigeneticMarkers  -> DocumentSignals
  Gene               -> Document
  GeneAttribution    -> DocumentAttribution

genome.py:
  Genome             -> KnowledgeStore

ribosome.py:
  Ribosome           -> Compressor

aliases.py:
  Inverts the import direction. Pre-R3: `Gene as Document` (Gene
  was real). Post-R3: imports `Document` directly (Document is
  real) plus `_RENAME_LOG` provenance dict. Both import surfaces
  continue to resolve to the same class object.

Out of scope per the R3 plan (stays untouched in Stage A):
  - Pydantic field names (gene_id, promoter, epigenetics, chromatin,
    codons, harmonic_links, gene_attribution) — SQL contract
  - SQL table/column names
  - Prometheus metric/label names
  - ChromatinState enum value strings (OPEN/EUCHROMATIN/
    HETEROCHROMATIN) — serialized as TEXT, queryable
  - MCP tool names (R4 territory)
  - agent_prompt.py contract field names
  - Method names (pack/splice/replicate/re_rank/upsert_gene/
    query_genes/_express) — Stage C
  - Module file names — Stage B
  - Local variables and parameters — Stage D

Verification:
  - Identity contract: 7/7 pairs `legacy is canonical` evaluate True
  - All __name__ attributes report canonical name (e.g. Gene.__name__
    == 'Document')
  - Pydantic JSON round-trip OK via both Gene + Document import
    surfaces; chromatin enum int values preserved (EUCHROMATIN == 1)
  - tests/test_genome.py + test_retrieval_dimensions.py +
    test_server.py: 127/127 pass (matches baseline byte-for-byte)
  - Zero brittle `__name__ ==` reflection patterns in the codebase
    (verified by grep pre-edit)
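A sketch of why the enum flip is safe for stored data: aliasing an `IntEnum` class leaves its member names and integer values untouched. Only `EUCHROMATIN == 1` is stated in the PR; the values for `OPEN` and `HETEROCHROMATIN` below are assumptions:

```python
from enum import IntEnum


class LifecycleTier(IntEnum):
    # EUCHROMATIN == 1 is from the PR; OPEN = 0 and HETEROCHROMATIN = 2 are assumed.
    OPEN = 0
    EUCHROMATIN = 1
    HETEROCHROMATIN = 2


ChromatinState = LifecycleTier  # legacy alias: the same enum class

tier = ChromatinState.EUCHROMATIN
assert tier is LifecycleTier.EUCHROMATIN
assert tier == 1 and tier.name == "EUCHROMATIN"  # stored values unchanged
# A member name read back from a TEXT column resolves through either class name.
assert ChromatinState["HETEROCHROMATIN"] is LifecycleTier.HETEROCHROMATIN
assert ChromatinState.__name__ == "LifecycleTier"
```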
R3 Stage B.1 per the plan in
~/.claude/plans/ethereal-forging-cookie.md and issue #87.

Renames helix_context/ribosome.py to helix_context/compressor.py so
the canonical filename matches the canonical class (`Compressor`,
established in R3 Stage A). The old path remains as a back-compat
shim that re-exports every module-level name (public + single-
underscore private) from the new location.

Files:
  - helix_context/ribosome.py  -> helix_context/compressor.py (git rename)
  - helix_context/ribosome.py   (new file — shim)

The shim uses a dir()-walk loop to re-export every non-dunder name,
which covers the historical private-name leakage that training and
test files rely on:
  - _parse_json, _EXPRESS_SYSTEM, _splice_system, _PACK_SYSTEM,
    _KV_EXTRACT_SYSTEM, _REPLICATE_SYSTEM
  - Plus all public classes: Compressor, Ribosome (alias), OllamaBackend,
    ClaudeBackend, DeBERTaRibosome, etc.

Verification:
  - Direct import works:        from helix_context.compressor import Compressor
  - Legacy import works:        from helix_context.ribosome import Ribosome
  - Package-level import works: from helix_context import Ribosome
  - All four import paths resolve to the same class object
  - Private name access works:  from helix_context.ribosome import _parse_json
  - tests/test_ribosome.py + test_genome.py + test_retrieval_dimensions.py:
    72/72 pass in 5s
R3 Stage B.2 per ~/.claude/plans/ethereal-forging-cookie.md and #87.

Renames the largest module in the codebase (4588 lines) so the
canonical filename matches the canonical class (``KnowledgeStore``,
established in R3 Stage A). The old path stays as a shim that
re-exports every module-level name (public + private).

Files:
  - helix_context/genome.py -> helix_context/knowledge_store.py (rename)
  - helix_context/genome.py  (new file — shim)

SQL contracts unchanged: the on-disk SQLite schema still references
tables/columns as ``genes``, ``gene_id``, ``gene_attribution``,
``harmonic_links``, ``chromatin``, ``promoter``, ``epigenetics``,
``codons``. Only the Python module filename and class identity moved.

Shim covers private-name leakage for ``_kv_keys_from_list`` used by
scripts/backfill_path_key_index.py.

Verification:
  - 3-path identity: Genome is KnowledgeStore is helix_context.Genome
  - path_tokens / file_tokens / _kv_keys_from_list all reachable
  - Genome.__name__ == 'KnowledgeStore' (canonical)
  - tests/test_genome.py + test_retrieval_dimensions.py + test_server.py:
    127/127 pass in 2:01
R3 Stage B.3 per the plan in ~/.claude/plans/ethereal-forging-cookie.md
and tracking issue #87.

Files:
  - helix_context/codons.py -> helix_context/fragments.py (rename)
  - helix_context/codons.py  (new shim — re-exports everything)

Class names (Codon / CodonChunker / CodonEncoder / RawStrand) and
the Pydantic field name (Gene.codons / Document.codons) are unchanged
— Stage C may rename the helper class identifiers later.

Verification:
  - Codon imported from codons, from fragments, and as helix_context.Codon
    all resolve to the same class
  - tests/test_codons.py + test_genome.py: 55/55 pass in 0.4s
R3 Stage B.4 per the plan in ~/.claude/plans/ethereal-forging-cookie.md
and tracking issue #87.

Files:
  - helix_context/replication.py -> helix_context/persistence.py (rename)
  - helix_context/replication.py  (new shim)

Class identifier (ReplicationManager) is unchanged — Stage C may
rename the method-level surface (replicate -> persist).

Verification: 3-path identity + tests/test_server.py + test_genome.py
114/114 pass in 2:06.
R3 Stage B.5 per the plan in ~/.claude/plans/ethereal-forging-cookie.md
and tracking issue #87. This completes Stage B (5 module renames).

Files:
  - helix_context/hgt.py -> helix_context/cross_store_import.py (rename)
  - helix_context/hgt.py  (new shim)

Per docs/ROSETTA.md, "HGT" (horizontal gene transfer) is the legacy
biology framing for what's just a cross-store document import
operation. Function-level surface (export_genome / import_genome /
genome_diff) is unchanged in Stage B; Stage C may rename to
export_documents / import_documents / store_diff.

Verification:
  - Imports work via both helix_context.hgt and
    helix_context.cross_store_import
  - tests/test_health.py: 10/10 pass (+ 2 xfailed pre-existing)
@mbachaud (Owner, Author)

Post-Stage-B verification complete ✓

Full mock suite ran after Stage B.5 (9e7471f):

1933 passed, 15 skipped, 21 deselected, 2 xfailed in 532.42s (0:08:52)

Identical result to the post-Stage-A full suite (1933 / 0 failed, 9:07 vs 8:52 wallclock). The 5 module renames + 7 class flips + alias inversion together produced zero test regressions.

Breakdown:

  • 1933 passed
  • 15 skipped (live-marker tests requiring Ollama)
  • 21 deselected (-m "not live" filter)
  • 2 xfailed (pre-existing expected failures, unrelated)

Ready for review.

mbachaud added 4 commits May 13, 2026 09:02
…icate/re_rank) (#87)

R3 Stage C.1 per the plan in ~/.claude/plans/ethereal-forging-cookie.md
and issue #87.

Renames the 4 main methods on the Compressor class (was Ribosome) to
canonical software vocabulary. Legacy method names remain valid as
intra-class aliases pointing at the same function objects -- not
wrappers, so latency histograms, identity checks, and call counters
behave exactly as before.

compressor.py:
  Method renames inside class Compressor:
    pack       -> encode    (signature unchanged)
    re_rank    -> rerank    (signature unchanged)
    splice     -> trim      (signature unchanged)
    replicate  -> persist   (signature unchanged)

  Section header comments updated to match canonical names. End of
  class body now carries a "Legacy method aliases" block:
    pack      = encode
    splice    = trim
    replicate = persist
    re_rank   = rerank
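A stripped-down sketch of the intra-class alias block. Method bodies are placeholders and only two of the four pairs are shown. Identity holds on the class attribute and on the underlying function; each attribute access on an instance creates a fresh bound method, so `c.pack is c.encode` is False in CPython even though both call the same function:

```python
class Compressor:
    """Placeholder bodies; only the aliasing pattern mirrors the PR."""

    def encode(self, text: str) -> str:
        return text.upper()

    def rerank(self, items: list) -> list:
        return sorted(items)

    # Legacy method aliases: the same function objects, not wrappers.
    pack = encode
    re_rank = rerank


Ribosome = Compressor  # class-level alias from Stage A

c = Ribosome()
assert Compressor.pack is Compressor.encode     # same function object
assert c.pack.__func__ is c.encode.__func__     # same function behind both bound methods
assert c.pack("abc") == c.encode("abc") == "ABC"
# A reflection check such as hasattr(obj, "rerank") succeeds under either name.
assert hasattr(c, "re_rank") and hasattr(c, "rerank")
```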

Internal caller updates (canonical names everywhere we own):
  context_manager.py:
    L709:  self.ribosome.pack(...)    -> self.ribosome.encode(...)
    L1880: self.ribosome.pack(...)    -> self.ribosome.encode(...)
    L2336: hasattr(self.ribosome, "re_rank") -> hasattr(..., "rerank")
    L2339: self.ribosome.re_rank(...) -> self.ribosome.rerank(...)
  deberta_backend.py:
    L284:  self._ollama.pack(...)     -> self._ollama.encode(...)
    L290:  self._ollama.replicate(...) -> self._ollama.persist(...)

Notes:
- The `self.ribosome` attribute name on HelixContextManager is
  unchanged in Stage C (Stage D variable sweep may rename it).
- DeBERTaRibosome.pack() and .replicate() methods themselves stay as-is
  (separate class; its own method-name story can be addressed later).
- The 4 internal helpers in DeBERTaRibosome that call into Compressor
  via self._ollama have been migrated to canonical names; the public
  method names (pack/replicate) on the DeBERTaRibosome class itself
  are preserved.

Reflection edge case at context_manager.py:2336 covered: both legacy
("re_rank") and canonical ("rerank") names resolve to the same
function object, so the hasattr check works either way; updating the
string to "rerank" matches the canonical method name.

Verification:
  - Compressor.pack is Compressor.encode -> True (and 3 other pairs)
  - tests/test_ribosome + test_deberta_backend + test_genome +
    test_retrieval_dimensions + test_server: 151/151 pass in 2:34
…ller updates (#87)

R3 Stage C.2 per the plan in ~/.claude/plans/ethereal-forging-cookie.md
and issue #87.

Renames the 5 main methods on KnowledgeStore (was Genome) to canonical
software vocabulary, with intra-class aliases keeping every legacy
caller working unchanged.

knowledge_store.py — method renames inside class KnowledgeStore:
  upsert_gene              -> upsert_doc
  query_genes              -> query_docs
  query_genes_ann          -> query_docs_ann
  query_genes_dense_recall -> query_docs_dense_recall
  get_gene                 -> get_doc

Each rename keeps:
  - method signature unchanged (param names + types stay -- param
    rename is Stage D territory if we want it)
  - method body identical to pre-rename

End of class body now carries a "Legacy method aliases" block. Each
alias is the *same function object* as the canonical method, not a
wrapper:
  upsert_gene              = upsert_doc
  query_genes              = query_docs
  query_genes_ann          = query_docs_ann
  query_genes_dense_recall = query_docs_dense_recall
  get_gene                 = get_doc

Internal caller migration (canonical names everywhere in helix_context):
  api.py              (2x get_gene -> get_doc)
  context_manager.py  (5x upsert_gene, 1x query_genes, 1x query_genes_ann)
  context_packet.py   (2x query_genes)
  cross_store_import.py (2x upsert_gene, 1x get_gene)
  expand.py           (2x get_gene)
  fusion.py           (1x docstring)
  knowledge_store.py  (1x upsert_gene, 2x query_genes, 2x dense_recall, 2x get_gene)
  persistence.py      (1x comment)
  registry.py         (1x upsert_gene)
  server.py           (5x get_gene)
  shard_router.py     (2x query_genes)
  sharding.py         (1x query_genes, 1x get_gene)

Total: 12 files, +52 / -37 lines.

External API parity preserved:
  - api.py::gene_get(gene_id) still works -- now calls get_doc(gid)
    internally, but the public API name is unchanged.
  - Test files (tests/*.py) NOT updated -- they call the legacy
    names which still resolve via aliases. Stage D may sweep these
    for consistency.
  - SQL table/column names (genes, gene_id, gene_attribution,
    harmonic_links, chromatin, promoter, epigenetics, codons)
    untouched -- on-disk contract is preserved.
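A toy sketch of that split between Python names and the on-disk contract, using an in-memory SQLite database. Only the `genes` table and `gene_id` column names come from the PR; the `content` column, method bodies, and schema are illustrative assumptions:

```python
import sqlite3


class KnowledgeStore:
    """Toy store: Python method names are canonical, SQL names are not."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS genes (gene_id TEXT PRIMARY KEY, content TEXT)"
        )

    def upsert_doc(self, gene_id: str, content: str) -> None:
        # Parameter keeps its legacy name; param renames are Stage D territory.
        self.conn.execute(
            "INSERT OR REPLACE INTO genes (gene_id, content) VALUES (?, ?)",
            (gene_id, content),
        )

    def get_doc(self, gene_id: str):
        row = self.conn.execute(
            "SELECT content FROM genes WHERE gene_id = ?", (gene_id,)
        ).fetchone()
        return row[0] if row else None

    # Legacy method aliases: the same function objects.
    upsert_gene = upsert_doc
    get_gene = get_doc


Genome = KnowledgeStore

store = Genome()
store.upsert_gene("g1", "hello")       # legacy name writes...
print(store.get_doc("g1"))             # ...canonical name reads the same row
```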

Verification:
  - 5/5 method pairs: legacy method is canonical method (identity)
  - tests/test_genome + test_retrieval_dimensions + test_server +
    test_ribosome + test_deberta_backend + test_health:
    161 passed, 4 deselected, 2 xfailed in 2:20
…ics, fragments) (#87)

R3 Stage C.3 per the plan in ~/.claude/plans/ethereal-forging-cookie.md
and issue #87. Completes the Stage C method/helper renames.

context_manager.py — method renames inside HelixContextManager:
  _express              -> _retrieve
  _make_parent_gene_id  -> _make_parent_doc_id
  _upsert_parent_gene   -> _upsert_parent_doc

cymatics.py — module-level function renames:
  gene_spectrum         -> doc_spectrum
  _cached_gene_spectrum -> _cached_doc_spectrum   (LRU-cached wrapper)
  cached_gene_spectrum  -> cached_doc_spectrum
  interference_splice   -> interference_trim
  Internal callers in cymatics.py updated to canonical names.

fragments.py — staticmethod rename inside CodonEncoder:
  codon_id -> fragment_id  (staticmethod descriptor aliased)

Each rename adds the legacy name as a one-line alias pointing at the
same function/method object — `is` identity holds, .cache_info() /
.cache_clear() on the LRU wrappers still work via either name.

Internal caller migrations (canonical name everywhere we own):
  context_manager.py — self._upsert_parent_gene, self._make_parent_gene_id,
    self._express (multiple sites) -> canonical equivalents
  server.py — helix._express -> helix._retrieve
  cymatics.py — internal calls to cached_gene_spectrum / _cached_gene_spectrum
    -> canonical names
  context_manager.py + server.py — imports of cached_gene_spectrum from
    cymatics module -> cached_doc_spectrum (still re-exported via alias)

Test updates (monkey-patch path):
  - tests/conftest.py — fixture patches both `g.upsert_doc` (canonical,
    what internal code calls) and `g.upsert_gene` (legacy alias)
  - tests/test_abstain_tier.py + test_foveated_splice.py — patch both
    manager._retrieve and manager._express
  - tests/test_dense_recall.py — patch both query_docs / query_genes and
    query_docs_dense_recall / query_genes_dense_recall; switch test call
    site to canonical query_docs_ann
  - tests/test_genome.py — fixture patches both upsert_doc / upsert_gene
  - tests/test_server.py — TestDebugIntrospectionEndpoints fixture
    patches both manager._retrieve and manager._express

Why test updates matter: aliases give name-level identity at the *class*
level. Instance-attribute monkey-patches only affect the patched name;
if internal code uses the canonical name, the patch is bypassed. Fix is
patch both names. (Tests calling the methods directly without
monkey-patching are unaffected — the alias resolves to the same function.)
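A minimal reproduction of the monkey-patch pitfall described above, with a hypothetical `Manager` class standing in for HelixContextManager and a hypothetical `answer()` method standing in for whatever internal code calls `_retrieve`:

```python
class Manager:
    def _retrieve(self, query: str) -> list:
        return ["real result"]

    _express = _retrieve  # legacy alias at class level

    def answer(self, query: str) -> list:
        # Internal code calls the canonical name.
        return self._retrieve(query)


m = Manager()
m._express = lambda q: ["patched"]        # instance patch on the legacy name only
assert m.answer("q") == ["real result"]   # patch bypassed

m._retrieve = lambda q: ["patched"]       # patch the canonical name as well
assert m.answer("q") == ["patched"]
```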

Reflection edge case (context_manager.py:2336 hasattr) is unaffected —
it was already fixed in C.1 to check the canonical name.

Verification:
  - 7/7 alias pairs: legacy is canonical (identity preserved)
  - tests/test_server::TestDebugIntrospectionEndpoints + test_dense_recall
    + test_abstain_tier + test_foveated_splice + test_cymatics:
    106/106 pass in 1:09
… get_doc (#87)

3 tests in tests/test_api_walk.py were configuring
`mgr.genome.get_gene.return_value = ...` (or .side_effect) but after
Stage C.2 api.py:gene_get calls the canonical `genome.get_doc()`.
On a MagicMock, the unconfigured .get_doc attribute returns a fresh
MagicMock instead of the test's expected None / fake gene.

Switch the mock configuration to the canonical name (.get_doc).
Identity-preserving aliases mean real KnowledgeStore instances still
accept both names, but MagicMock attribute lookup is per-name.
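A minimal reproduction with `unittest.mock.MagicMock`; the `gene_get` function here is a simplified stand-in for api.py's, not its actual body:

```python
from unittest.mock import MagicMock


def gene_get(store, gene_id):
    # Simplified stand-in: after Stage C.2 the API calls the canonical name.
    return store.get_doc(gene_id)


genome = MagicMock()
genome.get_gene.return_value = None  # pre-fix test setup: legacy name only

result = gene_get(genome, "unknown-id")
# get_doc was never configured, so MagicMock hands back a fresh child mock.
assert isinstance(result, MagicMock)

genome.get_doc.return_value = None   # the fix: configure the canonical name
assert gene_get(genome, "unknown-id") is None
```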

Tests fixed:
  - test_gene_get_delegates_to_genome
  - test_gene_get_returns_none_on_unknown_id
  - test_neighbors_sorts_by_similarity_desc

Also fixes a 4th mock configuration in the same file (line 266) for
consistency, even though that test was already passing.
@mbachaud (Owner, Author)

Stage C verification complete ✓

Full mock suite ran after Stage C.3 + test-mock fix (71469ba):

1933 passed, 15 skipped, 21 deselected, 2 xfailed in 460.63s (0:07:40)

Identical pass count across all three Stage verifications:

Gate                      Result     Wallclock
Post-Stage-A full mock    1933 / 0   9:07
Post-Stage-B full mock    1933 / 0   8:52
Post-Stage-C full mock    1933 / 0   7:40

Stage C added 17 method/helper renames + ~30 internal caller migrations + 7 test monkey-patch site updates, with zero net change in test outcomes.

What's now in the PR

  • Stage A (56fcbed) — 7 class-def flips + alias inversion
  • Stage B.1-B.5 — 5 module renames + shims (ribosome→compressor, genome→knowledge_store, codons→fragments, replication→persistence, hgt→cross_store_import)
  • Stage C.1 (edc0194) — Compressor methods: pack→encode, splice→trim, replicate→persist, re_rank→rerank
  • Stage C.2 (7b90491) — KnowledgeStore methods: upsert_gene→upsert_doc, query_genes*→query_docs*, get_gene→get_doc
  • Stage C.3 (7ee0b5c) — Internal helpers: _express→_retrieve, cymatics + fragments helpers
  • Stage C test-mock fix (71469ba) — test_api_walk.py MagicMock configs use canonical .get_doc

Identity contract

29 identity-preserving aliases total:

Layer                                  Count
Class (schemas + genome + ribosome)    7
Compressor methods                     4
KnowledgeStore methods                 5
context_manager helpers                3
cymatics helpers                       4
fragments helpers                      1
Module re-exports (shim)               5

No SQL schema change. No Pydantic field-name change. No MCP tool rename. No wire-format break.

Still in R3 (separate PRs)

  • Stage D — local variable / loop-counter sweep (~200+ gene→doc in module bodies)
  • Stage E: docs/ROSETTA.md phase-status refresh + R3 design spec stub

mbachaud merged commit 3b92b02 into master on May 13, 2026
3 checks passed
mbachaud deleted the rename-r3 branch on May 13, 2026 16:39
mbachaud added a commit that referenced this pull request May 13, 2026
…spec (#87)

R3 Stage E — closes the multi-stage rename effort with up-to-date
documentation.

docs/ROSETTA.md:
  Phase-status table at the bottom now reflects reality:
    R1 -> shipped @ 09d5548 (2026-04-15)
    R2 -> shipped @ PR #70 87fcb68 (2026-05-12)
    R3 Stage A -> shipped @ 56fcbed (PR #88, 2026-05-13)
    R3 Stage B -> shipped @ 460d824..9e7471f (PR #88)
    R3 Stage C -> shipped @ edc0194..71469ba (PR #88)
    R3 Stage D -> in progress (this PR #89)
    R3 Stage E -> in progress (this commit)
    R4 -> deferred (see #87)
  Pre-existing entries (out-of-scope list, How-to-use, mapping table)
  unchanged.

docs/superpowers/specs/2026-05-13-rename-r3-symbol-rename-design.md
(NEW):
  Durable design-spec record for R3 mirroring R2's structure
  (2026-05-11-rename-r2-prose-sweep-design.md). Captures:
    - Why R3 exists; predecessor specs referenced
    - Decisions baked in (class-def flip direction, module renames,
      no MCP slimdown)
    - Out-of-scope list (SQL, Pydantic fields, Prom metrics,
      ChromatinState enum values, MCP tool names, agent_prompt
      contract, LLM prompt strings)
    - Stage A/B/C/D/E summaries with commit SHAs + table-of-renames
    - Identity contract: 29 alias pairs across 7 layers
    - Verification gates (1933/0 across all 3 full-mock runs)
    - Stage D known-not-fully-completed scope (intentional scope cap)

Closes the R3 audit trail. Future contributors can audit the rename
by reading this spec + the R2 spec + ROSETTA.md without needing to
re-discover the design intent.