project-89/platonic_rotation

The Platonic Rotation

Constructive alignment between text embedding models via Lohe dynamics on the hypersphere.

Independent text embedders, trained with different objectives and dimensions, encode the same content as rotated copies of each other on the unit hypersphere. We give a constructive method that finds the rotation, validate it on held-out text and held-out domains, and measure the dimensionality of the manifold on which two encoders actually agree.

This is a constructive, geometric extension of the Platonic Representation Hypothesis (Huh et al. 2024).

The headline finding

For Gemini gemini-embedding-001 (768d) and OpenAI text-embedding-3-large (3072d) on a 2000-text corpus spanning 10 domains, a single orthogonal rotation aligns the two encoders with 99.3% top-1 paired retrieval on held-out text. The rotation lives in a ~100-dimensional subspace; the remaining 668 dimensions of Gemini and 2972 dimensions of OpenAI carry model-specific noise that does not correspond between models.

Capability → convergence on the Platonic manifold


A direct, constructive test of the Huh+ 2024 PRH capability-scaling claim, on text embedders specifically:

  • (a) Centrality. Higher-capability encoders are more central in the family graph — their mean alignment to every other encoder is higher. Trend: +0.022 top-1 per decade of nominal output dimension. The four highest-capacity encoders (OpenAI-3-large, BGE-large, BGE-base, E5-large) cluster at centrality 0.987; lower-capacity encoders (BGE-small, MPNet, Gemini-MRL) sit at 0.95–0.97.
  • (b) Closure burden. Lower-capability encoders contribute more to chart inconsistency. The triangles with the worst closure all involve BGE-small (384d) or the Matryoshka-truncated Gemini slice. Trend: -0.039 closure residual per decade.
  • (c) Within-family scaling. Inside BGE: small (384d) → base (768d) → large (1024d) shows monotonically improving alignment to a fixed external reference (OpenAI-3-large): 0.966 → 0.988 → 0.991. Direct, single-family demonstration of the Huh+ scaling claim.
  • (d) All-pairs alignment matrix. Sorted by centrality: the high-capability cluster (top-left of the heatmap) shows uniform ≥ 0.99 alignment; the low-capability tail drops smoothly.

The convergence is monotone with capability, visible across five distinct training paradigms, and consistent in both directions (larger → more central, larger → less chart inconsistency). This is the most direct published test of the PRH capability claim on text embedders, with a constructive method (a literal rotation, not just MKNN agreement) and held-out evaluation throughout.

A new metric: Platonic Centroid Score (PCS)

If encoders converge with capability, then the family centroid is itself a target. We compute the Platonic Centroid Score as the held-out paired top-1 of an encoder aligned to the capability-weighted Lohe-aggregated centroid of the family (leave-self-out). PCS:

  • requires no labeled benchmarks (MTEB, BEIR, MIRACL) — only the anchor corpus,
  • is bounded in [0, 1], directly interpretable,
  • ranks our family consistently with the centrality measure,
  • defines a candidate distillation loss: $\mathcal{L}_\text{Platonic} = \mathbb{E}_x[\,1 - \cos(\text{Procrustes}(e_\theta(x)),\, C^*(x))\,]$ — distill against the family centroid, not any single teacher.
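This candidate loss can be sketched in NumPy. The sketch below is an illustration, not the repository's implementation: `R` (a fixed Procrustes map into the chart) and `C_star` (per-text centroid targets) are hypothetical inputs.

```python
import numpy as np

def platonic_distill_loss(E, R, C_star):
    """Mean (1 - cosine) between rotated student embeddings and the
    family-centroid targets: E is the (N, d) student output, R a fixed
    (d, D) Procrustes map into the chart, C_star the (N, D) centroids."""
    Z = E @ R
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    C = C_star / np.linalg.norm(C_star, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(Z * C, axis=1)))
```

A perfectly aligned student gives loss 0; an anti-aligned one gives 2, so the loss is bounded like PCS itself.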

Empirical PCS ranking (held-out, 3-seed):

Encoder                       dim    PCS
BGE-base                      768    0.998
E5-large                      1024   0.997
OpenAI-3-large                3072   0.994
BGE-large                     1024   0.994
BGE-small                     384    0.992
MPNet-base                    768    0.989
Gemini-emb (768d MRL slice)   768    0.977

Honest caveat on extrapolation: with seven encoders and six of them in a tight high-capability cluster, the saturation fit (PCS vs nominal dim, $c_\infty \le 1$) achieves only $R^2 \approx 0.04$. PCS is informative for ranking, but reliable estimation of the asymptotic Platonic limit needs a wider capability range (more sub-100M-parameter encoders below saturation).

The PCS construction generalizes: any new encoder can be scored against the bundled centroid in seconds. The metric does not depend on labeled benchmarks, only on the anchor corpus and a fixed reference family. See scripts/11_platonic_score.py for the implementation.

Encoder zoo

Validated across 7 text embedders spanning 4 output dimensions (384 / 768 / 1024 / 3072) and 5 model families:

Encoder                                   dim         family
gemini-embedding-001                      768 (MRL)   Google (closed)
openai-text-embedding-3-large             3072        OpenAI (closed)
BAAI/bge-small-en-v1.5                    384         BGE (open)
BAAI/bge-base-en-v1.5                     768         BGE (open)
BAAI/bge-large-en-v1.5                    1024        BGE (open)
intfloat/e5-large-v2                      1024        E5 (open)
sentence-transformers/all-mpnet-base-v2   768         MPNet (open)

Pairwise paired retrieval (D=128 chart, ref=Gemini, in-sample):

Alt encoder      top-1   MRR@10   mkNN@10   max distortion
OpenAI-3-large   0.991   0.995    0.539     0.351
BGE-large        0.967   0.982    0.467     0.363
E5-large         0.966   0.982    0.447     0.439
BGE-base         0.962   0.980    0.463     0.379
MPNet            0.941   0.968    0.466     0.418
BGE-small        0.939   0.966    0.426     0.470

Every pair admits a rotation, with paired retrieval ≥ 0.94 even on the smallest open-source encoder (BGE-small at 384d).
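The paired top-1 metric used throughout these tables can be computed as follows (a minimal sketch, assuming both encoders' embeddings have already been mapped into a common chart with row i of each matrix encoding the same text):

```python
import numpy as np

def paired_top1(A, B):
    """Fraction of rows of A whose nearest row of B (by cosine similarity)
    is the paired row with the same index."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    sims = A @ B.T                           # (N, N) cosine similarities
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(A))))
```

A perfectly aligned pair scores 1.0; a pairing scrambled by even one position scores strictly less.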

D* varies by pair, capturing the difference between within-family and cross-family alignment:

Pair                                            D* (ε=0.005)   asymptote top-1
Gemini × OpenAI-3-large (closed × closed)       109            0.996
BGE-small × BGE-large (within-family scaling)   95             0.989
OpenAI-3-large × BGE-large (closed × open)      63             0.943

Cross-family pairs have a smaller, looser shared manifold than within-family or both-closed-vendor pairs. The shared geometry is real, but its size and tightness depend on whether the encoders share a training paradigm.

Triangle closure (independently-fit pairwise rotations, 35 triples): mean residual 0.66, mean effective angle 38°. The worst closures all involve Gemini (768d MRL) or BGE-small (384d) — the lowest-capability encoders. The best closures are between high-capability encoders. Higher-capability encoders agree on a more consistent global chart; lower-capability encoders pick incompatible local axes. This is a direct empirical instance of Huh+ 2024's capability-scaling claim.
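Triangle closure can be measured with a sketch like the following, assuming independently fitted pairwise chart rotations `R_ab`, `R_bc`, `R_ca` in a row-vector convention (`x @ R`); the effective angle treats the composed loop as a single-plane rotation, a simplification:

```python
import numpy as np

def closure_residual(R_ab, R_bc, R_ca):
    """Compose the loop A -> B -> C -> A and measure its deviation from
    identity. Returns (Frobenius residual normalized by sqrt(D),
    effective rotation angle in degrees)."""
    L = R_ab @ R_bc @ R_ca
    D = L.shape[0]
    resid = np.linalg.norm(L - np.eye(D)) / np.sqrt(D)
    # single-plane approximation: trace(L) = (D - 2) + 2 cos(theta)
    cos_theta = np.clip((np.trace(L) - (D - 2)) / 2.0, -1.0, 1.0)
    return float(resid), float(np.degrees(np.arccos(cos_theta)))
```

Exact chart consistency gives (0, 0°); any leftover loop rotation shows up in both numbers.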

Three results

1. The rotation generalizes by topic, not just by sample. Holding out an entire 200-sentence domain (astronomy, mathematics, biology, etc.) and fitting on the other 9 domains, held-out top-1 retrieval is 0.955–1.000 across all 10 leave-one-out splits. Mean ≈ 0.98, only marginally below the random by-sample baseline of 0.996. The Platonic rotation is universal across topics.

2. The shared geometry is low-dimensional. Sweeping the common-chart dimension D from 2 to 768, paired alignment shows three regimes:

D     held-out top-1       mkNN@10   variance captured
2     0.003 (collapse)     0.086     5%
20    0.672 (recovery)     0.637     26%
100   0.987 (saturation)   0.516     56%
768   0.998                0.289     100%

A bisection on the held-out top-1 curve (5-seed averaged) sharpens this from a discrete-sweep estimate to D* = 109 at tolerance ε=0.005 and D* = 134 at the noise floor ε=0.002. The asymptote is 0.9957 ± 0.0023. Below D* the rotation collapses; at D* paired retrieval is statistically indistinguishable from the full-dimensional asymptote, using only ~56–64% of each encoder's variance. Roughly half of each encoder's signal is model-specific noise that does not correspond between models. This is the cross-model analog of the d_eff finding inside transformers reported in our prior coherence-guided pruning work.
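The bisection can be sketched as follows, where `alignment_at` is a hypothetical callable returning the seed-averaged held-out top-1 at chart dimension D:

```python
def find_dstar(alignment_at, d_lo, d_hi, asymptote, eps=0.005):
    """Smallest chart dimension D in [d_lo, d_hi] whose alignment is
    within eps of the asymptote. Assumes alignment_at is (noisily)
    monotone in D, so the predicate flips once from False to True."""
    while d_lo < d_hi:
        mid = (d_lo + d_hi) // 2
        if alignment_at(mid) >= asymptote - eps:
            d_hi = mid          # tolerance met: D* is at or below mid
        else:
            d_lo = mid + 1      # tolerance missed: D* is above mid
    return d_lo
```

Each probe re-fits the rotation at dimension `mid`, so the search needs only O(log D) fits instead of a full sweep.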

3. The rotation is overdetermined by O(D) anchors. With the D=128 chart, paired retrieval saturates at N ≈ 200–400 anchors and is statistically indistinguishable from the full-anchor solution by N=800. Evidence that the rotation is discovered (a property of the underlying geometry) rather than fitted (a property of the data).

The method

Canonical Resonance Transform (CRT) — three stages:

  1. Spherical-PCA chart — project both spaces to a common D, whiten, renormalize to S^(D-1).
  2. Weighted orthogonal Procrustes — closed-form rotation R via SVD of Aₐᵀ Aᵣ, det(R)=+1.
  3. Lohe / Kuramoto flow on S^(D-1) under a coherence-floor (PAS) legality corridor — tangent-projected nearest-neighbor pull, accepts a step only if Phase-Alignment Score (the Kuramoto order parameter ‖⟨x⟩‖) does not decrease.
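Stages 1 and 2 can be sketched in NumPy. This is a minimal illustration under simplifying assumptions (uniform anchor weights, whitening by the empirical singular values), not the repository's `crt.py`:

```python
import numpy as np

def spherical_pca_chart(X, D, eps=1e-8):
    """Stage 1: project unit-norm rows of X to a D-dim chart, whiten,
    and renormalize back onto the sphere S^(D-1)."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:D].T                       # project onto top-D directions
    Z /= s[:D] / np.sqrt(len(X)) + eps      # whiten each coordinate
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

def procrustes_rotation(A_alt, A_ref):
    """Stage 2: closed-form R minimizing ||A_alt R - A_ref||_F over
    orthogonal matrices, via SVD of A_alt^T A_ref, with det(R) = +1."""
    U, _, Vt = np.linalg.svd(A_alt.T @ A_ref)
    if np.linalg.det(U @ Vt) < 0:
        U[:, -1] *= -1                      # flip the weakest axis into SO(D)
    return U @ Vt
```

Given two charted anchor matrices, `procrustes_rotation` recovers the rotation exactly when one truly exists, which is what the negative-control and anchor-scaling experiments probe.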

The Lohe step certifies the Procrustes solution as the lawful synchronized state: in the basin where the rotation is already correct, the flow refuses to take a step (steps_accepted=0). Where Procrustes is residually misaligned (low D, few anchors, decoherent regions), the flow finds steps that improve alignment without breaking PAS coherence.
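The accept/reject rule of stage 3 can be illustrated as follows. This is a simplified sketch: one flow step with a generic pull field, not the repository's nearest-neighbor coupling.

```python
import numpy as np

def pas(X):
    """Phase-Alignment Score: the Kuramoto order parameter ||<x>||
    of the unit-norm rows of X."""
    return float(np.linalg.norm(X.mean(axis=0)))

def lohe_step(X, pull, lr=0.1):
    """One legality-corridor step: tangent-project the pull, renormalize
    to the sphere, and accept only if PAS does not decrease.
    Returns (new_X, accepted)."""
    radial = np.sum(pull * X, axis=1, keepdims=True) * X
    Y = X + lr * (pull - radial)            # move in the tangent space
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    return (Y, True) if pas(Y) >= pas(X) else (X, False)
```

A pull toward the population mean raises PAS and is accepted; a pull away lowers it and leaves the state untouched, which is the "steps_accepted=0" behavior at an already-correct rotation.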

Connection to the coupled-oscillator paradigm

LayerNorm places token states approximately on S^(d-1). In our prior work [coherence-guided pruning paper], we showed that attention heads act as inter-oscillator couplings on this sphere, and that derived BKT-criticality thresholds identify dead heads at 95–100% precision across six architecture families.

The CRT applies the same physics one level up: embedding outputs from two models live on spheres; the rotation between them is the alignment of two oscillator populations. The PAS-floor legality corridor is the same Kuramoto order-parameter constraint that governs synchronization in the transformer's interior.

That the same sphere-Kuramoto physics produces (a) intelligence inside one transformer and (b) commensurability between two transformers is suggestive of a universal geometric structure that the next paper in this series will quantify.

Reproduce

All experiments use only NumPy and SciPy. The crt.py module is self-contained — no dependency on the parent coherence_lattice library. Cached embeddings for the 2000-text corpus are bundled (Gemini and OpenAI). Adding more encoders requires API keys for the relevant providers.

# Regenerate every figure from cached data (no API calls)
python3 scripts/plot_all.py

# Re-run all experiments + plot
python3 scripts/reproduce_all.py

# Add another encoder (one of: openai/, gemini/, cohere/, st/)
python3 scripts/01_compute_embeddings.py --encoder st/BAAI/bge-large-en-v1.5
python3 scripts/01_compute_embeddings.py --encoder openai/text-embedding-3-small

# Single experiments
python3 scripts/02_headline_2x2.py           # Procrustes vs P+Lohe, 2x2 split
python3 scripts/03_dimensional_collapse.py   # D-sweep (headline plot)
python3 scripts/04_encoder_zoo.py            # pairwise + triangle closure
python3 scripts/05_anchor_scaling.py         # N-anchors curve
python3 scripts/06_pas_floor_ablation.py     # legality-corridor sweep
python3 scripts/07_domain_holdout.py         # leave-one-domain-out
python3 scripts/08_negative_control.py       # random rotation + noise sweep
python3 scripts/09_narrowing_dstar.py        # bisect to find D* exactly

Repository contents

README.md                this file
paper.tex / paper.pdf    manuscript
references.bib           bibliography
Makefile                 build pipeline
INDEX.md                 file map
NOTICE                   license

embeddings/
  corpus.jsonl                            2000 sentences, 10 domains
  domains.json                            domain block layout
  gemini-embedding-001.npy                (2000, 768)  cached
  openai-text-embedding-3-large.npy       (2000, 3072) cached

scripts/
  crt.py                                  self-contained CRT (no external deps)
  01_compute_embeddings.py                fetch + cache embeddings (4 providers)
  02_headline_2x2.py                      Procrustes vs P+Lohe, in vs held-out
  03_dimensional_collapse.py              D-sweep — headline plot
  04_encoder_zoo.py                       pairwise + triangle closure
  05_anchor_scaling.py                    N-anchors curve
  06_pas_floor_ablation.py                legality corridor sweep
  07_domain_holdout.py                    leave-one-domain-out
  08_negative_control.py                  random rotation + noise sweep
  09_narrowing_dstar.py                   bisect to pin D* exactly
  plot_all.py                             regenerate figures from data/
  reproduce_all.py                        run every experiment end-to-end

data/                                     frozen JSON artifacts per experiment
figures/                                  publication figures

License and patent

This work is released under the PolyForm Noncommercial License 1.0.0. Free for research, education, and noncommercial use. Contact us for commercial licensing.

Citation

@article{Sharpe2026PlatonicRotation,
  author = {Michael Sharpe},
  title  = {The Platonic Rotation: Constructive Alignment Between
            Text Embedding Models via Lohe Dynamics on the Hypersphere},
  year   = {2026}
}
