docs(plans): add jepa_cross_modal_alignment.md (1.2+ research direction) #252
Merged
Conversation
Roy-5a-substrate-on (PR #251) surfaced a structural finding the plan it was meant to resolve did not model: SensorEncoder (384-dim) and LinguisticEncoder (768-dim) embed in different-dimensional spaces. Cross-modality cosine is mathematically undefined; any cosine-based cross-modal alignment is structurally impossible without a learned projection layer.

This plan consolidates three previously-disjoint sketches that were each working around the same gap without acknowledging each other:

- cross_modal_substrate_binding.md Stage 4a — Hebbian binding on raw cosine. Cancelled by Roy-4 because cosine on different-dim vectors is undefined.
- roy_5_encoder_alignment_disambiguator.md Stage 4b — "encoder replacement to 1.2+ research direction." JEPA is the bio-defensible answer.
- grounded_language_acquisition.md Phase 2 — symbol-binding layer sketched as "small MLP, or a tiny RNN." Phase 2's sketch is structurally a one-modality JEPA.

The plan stays DRAFT until roy_5_encoder_alignment_disambiguator.md Stage 3 (cradle-arc redesign) ships and produces sufficient paired training data. Stage 0 of this plan is a ~50 LOC data audit on Roy-5b's existing data; only after that PASSES does any code work begin.

Load-bearing rules captured in the plan:

- No pretrained-model imports (no CLIP/ImageBind transfers).
- No central hand-curated (sensor, word) lexicon.
- Contamination guard is a CI test, not a convention.
- JEPA adds a projection on TOP of the existing encoders; SensorEncoder + LinguisticEncoder are unchanged.
- use_projection=False default; opt-in flag on EC.
- Earliest possible landing is 1.2.

README also updated with the 1.1-track triangle:

- roy_5_encoder_alignment_disambiguator.md → Stage 1 SHIPPED
- cross_modal_substrate_binding.md → CANCELLED by Roy-4
- jepa_cross_modal_alignment.md → DRAFT

Total ~1,250 LOC over 8-12 weeks once unblocked. On par with the 1.0 cleanup wave's scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
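The dimensionality gap can be made concrete with a small sketch. Everything here is illustrative only: the 384/768 dims come from the PR, but the shared dimension (256), the random weight matrices (standing in for trained projection weights), and all function names are invented for this example, not the plan's API.

```python
import math
import random

random.seed(0)

sensor_vec = [random.gauss(0, 1) for _ in range(384)]      # SensorEncoder output (384-dim)
linguistic_vec = [random.gauss(0, 1) for _ in range(768)]  # LinguisticEncoder output (768-dim)

def dot(a, b):
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Cross-modal cosine on the raw embeddings is undefined:
try:
    cosine(sensor_vec, linguistic_vec)
except ValueError as e:
    print("raw cross-modal cosine fails:", e)

# A learned projection maps both modalities into one shared space,
# making cosine well-defined. Random weights stand in for trained ones;
# SHARED_DIM = 256 is a hypothetical choice, not from the plan.
SHARED_DIM = 256
W_sensor = [[random.gauss(0, 0.05) for _ in range(384)] for _ in range(SHARED_DIM)]
W_ling = [[random.gauss(0, 0.05) for _ in range(768)] for _ in range(SHARED_DIM)]

def project(W, v):
    return [dot(row, v) for row in W]

sim = cosine(project(W_sensor, sensor_vec), project(W_ling, linguistic_vec))
print(f"projected cross-modal cosine: {sim:.3f}")
```

With untrained weights the similarity is meaningless noise; the point is only that it becomes a well-defined scalar once both modalities live in one space, which is the structural gap the plan exists to close.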
Summary
Plan-only PR. Adds docs/plans/jepa_cross_modal_alignment.md as a 1.2+ research direction, plus a README slot for the 1.1-track triangle (roy_5 / cross_modal_binding / JEPA). No code changes.

The plan stays DRAFT until roy_5_encoder_alignment_disambiguator.md Stage 3 (cradle-arc redesign) ships and produces sufficient paired training data. Stage 0 of the JEPA plan is a ~50 LOC data audit on Roy-5b's existing data; only after that PASSES does any code work begin.

Why this plan exists
Roy-5a-substrate-on (PR #251) surfaced a structural finding the plan it was meant to resolve did not model: SensorEncoder (384-dim) and LinguisticEncoder (768-dim) embed in different-dimensional spaces. Cross-modality cosine is mathematically undefined; any cosine-based cross-modal alignment is structurally impossible without a learned projection layer.

Three previously-disjoint sketches were each working around this gap without acknowledging each other:
- cross_modal_substrate_binding.md Stage 4a
- roy_5_encoder_alignment_disambiguator.md Stage 4b
- grounded_language_acquisition.md Phase 2

Writing this plan now consolidates all three into one design without committing to implementation. The risk of NOT writing it is that three months from now someone could plausibly start the Phase 2 MLP without seeing it's the same problem cross_modal_substrate_binding Stage 4a couldn't solve and roy_5's Stage 4b deferred. Worst case: three half-implementations of the same alignment learner.

Load-bearing rules baked into the plan
- No pretrained-model imports (no CLIP/ImageBind transfers).
- No central hand-curated (sensor, word) lexicon, per feedback_interim_contamination.md.
- Contamination guard is a CI test, not a convention: tests/unit/test_jepa_no_contamination.py (planned) constructs synthetic curated pairs and FAILS the build if the training loader accepts pairs without substrate-provenance tags.
- use_projection=False default. EC's pattern_complete_or_separate gains an opt-in flag; existing call sites are unaffected.
- Nothing in v1_refinement.md should depend on JEPA; earliest possible landing is 1.2.

Plan structure
7 stages (~1,250 LOC, 8-12 weeks once unblocked):
- … (embed_projected)
- … use_projection parameter
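The opt-in surface the stages describe might look like the following sketch. The names pattern_complete_or_separate and use_projection come from the plan text, but the signature, types, and body are invented here purely to illustrate why a default-False keyword leaves existing call sites untouched.

```python
from typing import Sequence

def pattern_complete_or_separate(
    pattern: Sequence[float],
    use_projection: bool = False,  # opt-in flag from the plan; default preserves current behavior
):
    """Sketch only: the real EC signature and behavior are not shown in this PR."""
    if use_projection:
        # Hypothetical 1.2+ path: would route through the JEPA projection
        # head before pattern completion/separation.
        raise NotImplementedError("projection path lands no earlier than 1.2")
    # Default path: identical to today's behavior, so call sites that
    # never pass use_projection are unaffected by the new keyword.
    return list(pattern)

# An existing call site, unchanged and unaffected:
result = pattern_complete_or_separate([0.1, 0.2, 0.3])
```

The design point being sketched: adding a keyword argument with a behavior-preserving default is backward compatible, which is what lets the flag ship ahead of any trained projection.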
Test plan
ruff check / pytest are trivially clean.

🤖 Generated with Claude Code
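The planned contamination guard (tests/unit/test_jepa_no_contamination.py) could take roughly this shape. The loader name, pair schema, and substrate_provenance field are hypothetical stand-ins; only the required behavior, failing the build when the training loader accepts pairs without substrate-provenance tags, comes from the plan.

```python
# Sketch of the planned CI guard; load_training_pairs and the pair
# schema are assumptions, not the repo's actual API.

class ContaminationError(ValueError):
    """Raised when a (sensor, word) pair lacks a substrate-provenance tag."""

def load_training_pairs(pairs):
    """Accept pairs only if each carries a substrate-provenance tag."""
    for pair in pairs:
        if not pair.get("substrate_provenance"):
            raise ContaminationError(f"hand-curated pair rejected: {pair!r}")
    return list(pairs)

def test_loader_rejects_untagged_pairs():
    # Synthetic curated pair with no provenance tag: must fail the build.
    curated = [{"sensor": [0.0] * 384, "word": "red", "substrate_provenance": None}]
    try:
        load_training_pairs(curated)
    except ContaminationError:
        return  # guard fired, as the plan requires
    raise AssertionError("contamination guard failed to fire")

def test_loader_accepts_tagged_pairs():
    tagged = [{"sensor": [0.0] * 384, "word": "red",
               "substrate_provenance": "cradle-arc/run-007"}]  # hypothetical tag
    assert len(load_training_pairs(tagged)) == 1

test_loader_rejects_untagged_pairs()
test_loader_accepts_tagged_pairs()
```

Making the guard a test rather than a convention means any future loader that silently drops the provenance check turns CI red instead of quietly reintroducing a curated lexicon.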