
docs(plans): add jepa_cross_modal_alignment.md (1.2+ research direction) #252

Merged
dennys246 merged 1 commit into main from docs/plans-jepa-cross-modal-alignment
May 15, 2026

Conversation

@dennys246
Owner

Summary

Plan-only PR. Adds docs/plans/jepa_cross_modal_alignment.md as a 1.2+ research direction, plus a README slot for the 1.1-track triangle (roy_5 / cross_modal_binding / JEPA).

No code changes. The plan stays DRAFT until roy_5_encoder_alignment_disambiguator.md Stage 3 (cradle-arc redesign) ships and produces sufficient paired training data. Stage 0 of the JEPA plan is a ~50 LOC data audit on Roy-5b's existing data; only after that PASSES does any code work begin.
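The ~50 LOC Stage 0 audit could look roughly like the sketch below. This is a hedged illustration only: `audit_paired_data`, the episode dict shape, the `provenance` field, and the `min_pairs` threshold are all assumptions, not the repository's actual API.

```python
# Hypothetical sketch of the Stage 0 data audit. The episode structure,
# field names, and threshold are assumptions for illustration.
def audit_paired_data(episodes, min_pairs=1000):
    """Count episodes that carry both a sensor embedding and a
    linguistic embedding with a substrate-provenance tag."""
    paired = [
        ep for ep in episodes
        if ep.get("sensor_embedding") is not None
        and ep.get("linguistic_embedding") is not None
        and ep.get("provenance") == "substrate"
    ]
    return {
        "total": len(episodes),
        "paired": len(paired),
        "sufficient": len(paired) >= min_pairs,
    }

report = audit_paired_data([
    {"sensor_embedding": [0.1], "linguistic_embedding": [0.2],
     "provenance": "substrate"},
    {"sensor_embedding": None, "linguistic_embedding": [0.2],
     "provenance": "substrate"},
])
print(report)  # {'total': 2, 'paired': 1, 'sufficient': False}
```

The point of gating on this: if Roy-5b's existing data yields too few substrate-tagged pairs, Stages 1-6 never start.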

Why this plan exists

Roy-5a-substrate-on (PR #251) surfaced a structural finding the plan it was meant to resolve did not model: SensorEncoder (384-dim) and LinguisticEncoder (768-dim) embed in different-dimensional spaces. Cross-modality cosine is mathematically undefined; any cosine-based cross-modal alignment is structurally impossible without a learned projection layer.
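The dimensional mismatch and the fix can be sketched in a few lines. The 384/768 dims come from the PR; the shared 128-dim space and the random (untrained) projection weights are stand-in assumptions for the learned projection the plan proposes.

```python
import math
import random

random.seed(0)

def project(vec, weights):
    """Linear projection: weights is a shared_dim x len(vec) matrix."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

SHARED = 128  # assumed shared-space width, not from the plan
W_sensor = [[random.gauss(0, 1) for _ in range(384)] for _ in range(SHARED)]
W_ling = [[random.gauss(0, 1) for _ in range(768)] for _ in range(SHARED)]

sensor_vec = [random.gauss(0, 1) for _ in range(384)]  # SensorEncoder output
ling_vec = [random.gauss(0, 1) for _ in range(768)]    # LinguisticEncoder output

# cosine(sensor_vec, ling_vec) is undefined across 384 vs 768 dims
# (Python's zip would even silently truncate rather than error).
# After projection, both vectors live in the same 128-dim space:
sim = cosine(project(sensor_vec, W_sensor), project(ling_vec, W_ling))
assert -1.0 <= sim <= 1.0
```

With random weights the similarity is meaningless; training the two projection heads on paired substrate data is exactly what the JEPA plan exists to specify.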

Three previously-disjoint sketches were each working around this gap without acknowledging each other:

| Plan | What it sketched | What it actually needs |
| --- | --- | --- |
| cross_modal_substrate_binding.md Stage 4a | Hebbian binding on raw cosine between EC nodes | A learned projection — Hebbian on different-dim vectors is impossible. This is what cancelled it via Roy-4. |
| roy_5_encoder_alignment_disambiguator.md Stage 4b | "Encoder replacement to 1.2+ research direction" | A bio-defensible learned projection that doesn't replace either encoder. JEPA is that. |
| grounded_language_acquisition.md Phase 2 | "Small MLP or tiny RNN" binding token sequences → EC node IDs | A one-modality JEPA where one head is a token-embedding lookup. Phase 2's sketch is structurally a small JEPA. |

Writing this plan now consolidates all three into one design, without committing to implementation. The risk of NOT writing it is that three months from now someone could plausibly start the Phase 2 MLP without seeing that it is the same problem cross_modal_substrate_binding Stage 4a couldn't solve and roy_5's Stage 4b deferred. Worst case: three half-implementations of the same alignment learner.

Load-bearing rules baked into the plan

  • No pretrained-model imports. Projection weights come from Roy-priming-derived training data only (no CLIP/ImageBind transfers).
  • No central hand-curated (sensor, word) lexicon. Per feedback_interim_contamination.md.
  • Contamination guard is a CI test, not a convention. A tests/unit/test_jepa_no_contamination.py (planned) constructs synthetic curated pairs and FAILS the build if the training loader accepts pairs without substrate-provenance tags.
  • No replacement for SensorEncoder or LinguisticEncoder. JEPA adds a projection ON TOP; existing encoders are unchanged.
  • use_projection=False default. EC's pattern_complete_or_separate gains an opt-in flag; existing call sites are unaffected.
  • Earliest possible landing is 1.2. No section of v1_refinement.md should depend on JEPA.
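The "CI test, not a convention" rule could take a shape like the following. This is a sketch of the planned tests/unit/test_jepa_no_contamination.py, not its actual contents; `TrainingLoader`, `ContaminationError`, and the `provenance` field are assumptions.

```python
# Hedged sketch of the planned contamination guard. The loader API
# and field names are assumptions for illustration.
class ContaminationError(ValueError):
    pass

class TrainingLoader:
    """Accepts only pairs carrying a substrate-provenance tag."""
    def accept(self, pair):
        if pair.get("provenance") != "substrate":
            raise ContaminationError("hand-curated pair rejected")
        return pair

def test_loader_rejects_curated_pairs():
    loader = TrainingLoader()
    curated = {"sensor": [0.1], "word": "cup"}  # synthetic curated pair, no tag
    try:
        loader.accept(curated)
    except ContaminationError:
        return  # guard held; build stays green
    raise AssertionError("loader accepted a curated pair; FAIL the build")

test_loader_rejects_curated_pairs()
```

Because the guard is an executable test, a future loader change that quietly starts accepting untagged pairs breaks CI instead of silently eroding the no-lexicon rule.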

Plan structure

7 stages (~1,250 LOC, 8-12 weeks once unblocked):

| Stage | Item | Prereq |
| --- | --- | --- |
| 0 | Data audit — confirm Stage 3 cradle redesign produces sufficient paired training data | Stage 3 of roy_5 shipped |
| 1 | Projection module + persistence | Stage 0 PASS |
| 2 | Training pipeline + contamination-detector test | Stage 1 |
| 3 | Encoder integration (additive embed_projected) | Stage 2 |
| 4 | EC integration — use_projection parameter | Stage 3 |
| 5 | Roy-5c validation iteration | Stage 4 |
| 6 | Hivemind shareability (conditional) | Stage 5 PASS + Hivemind 1.1 ships |
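Stage 4's opt-in contract can be illustrated abstractly. The real signature of pattern_complete_or_separate is not shown in this PR, so everything below is a hypothetical stand-in that only demonstrates the backward-compatibility rule: `use_projection=False` by default, so existing call sites behave exactly as before.

```python
# Hypothetical stand-in for EC's pattern_complete_or_separate,
# showing only the planned opt-in flag, not the real EC logic.
def pattern_complete_or_separate(cue, *, use_projection=False, projector=None):
    if use_projection:
        if projector is None:
            raise ValueError("use_projection=True requires a trained projector")
        cue = projector(cue)
    # ...existing EC behavior would continue unchanged on `cue`...
    return cue

# Existing call sites pass no new arguments and see no new behavior:
out = pattern_complete_or_separate([1.0, 2.0])
assert out == [1.0, 2.0]

# Opted-in call sites route the cue through the learned projection:
double = lambda v: [2 * x for x in v]  # toy projector
assert pattern_complete_or_separate([1.0], use_projection=True,
                                    projector=double) == [2.0]
```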

Out of scope

  • Implementation work. This is a plan-only PR. No code, no tests, no infrastructure.
  • JEPA Stage 0 data audit. Runs when Stage 3 of roy_5 ships — not now.
  • Changes to existing plans. README updated to acknowledge the new plan + clarify cross_modal_substrate_binding's resurrection conditions; the individual plans aren't edited.

Test plan

  • Plan reads coherently end-to-end (self-check).
  • No code changes — ruff check / pytest are trivially clean.
  • All cross-plan references resolve (verified during writing).
  • User review for scope + sequencing.

🤖 Generated with Claude Code

Roy-5a-substrate-on (PR #251) surfaced a structural finding the plan
it was meant to resolve did not model: SensorEncoder (384-dim) and
LinguisticEncoder (768-dim) embed in different-dimensional spaces.
Cross-modality cosine is mathematically undefined; any cosine-based
cross-modal alignment is structurally impossible without a learned
projection layer.

This plan consolidates three previously-disjoint sketches that were
each working around the same gap without acknowledging each other:

- cross_modal_substrate_binding.md Stage 4a — Hebbian binding on raw
  cosine. Cancelled by Roy-4 because cosine on different-dim vectors
  is undefined.
- roy_5_encoder_alignment_disambiguator.md Stage 4b — "encoder
  replacement to 1.2+ research direction." JEPA is the bio-defensible
  answer.
- grounded_language_acquisition.md Phase 2 — symbol-binding layer
  sketched as "small MLP, or a tiny RNN." Phase 2's sketch is
  structurally a one-modality JEPA.

The plan stays DRAFT until roy_5_encoder_alignment_disambiguator.md
Stage 3 (cradle-arc redesign) ships and produces sufficient paired
training data. Stage 0 of this plan is a ~50 LOC data audit on
Roy-5b's existing data; only after that PASSES does any code work
begin.

Load-bearing rules captured in the plan:
- No pretrained-model imports (no CLIP/ImageBind transfers).
- No central hand-curated (sensor, word) lexicon.
- Contamination guard is a CI test, not a convention.
- JEPA adds a projection on TOP of the existing encoders;
  SensorEncoder + LinguisticEncoder are unchanged.
- use_projection=False default; opt-in flag on EC.
- Earliest possible landing is 1.2.

README also updated with the 1.1 track triangle:
- roy_5_encoder_alignment_disambiguator.md → Stage 1 SHIPPED
- cross_modal_substrate_binding.md → CANCELLED by Roy-4
- jepa_cross_modal_alignment.md → DRAFT

Total ~1,250 LOC over 8-12 weeks once unblocked. On par with the
1.0 cleanup wave's scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dennys246 dennys246 merged commit cd51be5 into main May 15, 2026
5 checks passed
@dennys246 dennys246 deleted the docs/plans-jepa-cross-modal-alignment branch May 15, 2026 05:14
