Official code repository for:
Kumar, A. (2026). Phonosemantic Grounding: Sanskrit as a Formalized Case of Motivated Sign Structure for Interpretable AI (Version 2.0). Zenodo. DOI: 10.5281/zenodo.19564026
Kumar, A. (2026). DDIN: Devavāṇī-Derived Interpretable Network — Sequential Neural ODEs for Phonosemantic Grounding. arXiv:2604.XXXXX.
Kumar, A. (2026). PhonosemanticMeta Benchmark: Evaluating AI Vector Grounding via the Phonosemantic Manifold. Kaggle Competition + arXiv.
This repository contains all experimental code, the phonosemantic embedding layer, and the figure generation script associated with the papers above.
The research program explores two complementary tracks:
- **Phonosemantic Grounding**: The paper proposes that the articulatory anatomy of speech production (the five loci of the vocal tract, manner of constriction, phonation type, and somatic resonance) constitutes a physically real coordinate system for AI semantic representations. Sanskrit, with its rigorously formalized phonological system, serves as the proof-of-concept case.
- **DDIN (Devavāṇī-Derived Interpretable Network)**: A parallel research track exploring whether sequential phoneme encoding through neural ODEs can extract semantic structure from Sanskrit verbal roots without backpropagation or dense connectivity. The "Receiver Model" (W = 0) architecture shows that semantics can emerge from heterogeneous neuron physics alone.
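To make the Receiver Model idea concrete, here is a minimal sketch of a zero-weight (W = 0) sequential ODE reservoir: each neuron has its own decay rate and input gain, and a root's embedding is the reservoir state after integrating the phoneme sequence. This is illustrative only; the parameter ranges, Euler integration scheme, and readout are assumptions, not the exact code in `ddin_exp21_sequential_ode.py`.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 128, 10  # neurons, phoneme feature dimension

# Heterogeneous "neuron physics": per-neuron decay rate and input projection.
# No recurrent weights (W = 0) and no backpropagation anywhere.
alpha = rng.uniform(0.1, 2.0, size=N)   # decay rates, one per neuron
beta = rng.normal(size=(N, D))          # fixed random input gains

def encode(phoneme_seq, dt=0.1, steps=10):
    """Integrate dh/dt = -alpha * h + beta @ x over a phoneme sequence."""
    h = np.zeros(N)
    for x in phoneme_seq:               # one D-dimensional vector per phoneme
        for _ in range(steps):          # simple Euler steps per phoneme
            h = h + dt * (-alpha * h + beta @ x)
    return h                            # final state = root embedding

root = rng.normal(size=(3, D))          # toy root of 3 phoneme vectors
emb = encode(root)
print(emb.shape)                        # (128,)
```

Because the decay rates differ per neuron, the final state retains order-sensitive traces of the sequence, which is the property the sequential experiments exploit.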
The main repository contains the core scripts for Phase 1–3 validation. The `/Experiments` folder contains all exploratory work from Phase 4 (Formant Grounding), Phase 5 (Neuromorphic SNN constraints), and Phases 5B–6 (Sequential ODE + Architecture Ceiling).
| File/Folder | What it does |
|---|---|
| `phonosemantic_embedding.py` | The core embedding layer. Replaces `nn.Embedding` with a 10-dimensional, physically grounded coordinate system derived from articulatory anatomy. Importable as a module. |
| `experiment_v3.py` | Canonical root-clustering experiment: 150 Sanskrit verbal roots, 5 phonological groups, 3 statistical tests (Wilcoxon, Mann-Whitney, multinomial classification). All semantic scores are derived independently from Monier-Williams dictionary definitions. Reproduces the main results in Section 9 of the paper. |
| `linear_probe.py` | Linear-probe comparison. Tests whether articulatory geometry carries semantic signal beyond phoneme identity alone: phonosemantic coordinates vs. a one-hot PCA baseline vs. random, at equal dimensionality. Reproduces the +14 pp result in Section 9.4. |
| `blind_clustering.py` | Blind clustering experiment. Automatic TF-IDF semantic clustering with a permutation test and no manual axis labels. Reproduces the null result reported in full in Section 9.5. |
| `make_figure.py` | Generates Figure 1: the phonosemantic manifold map (phonemes in locus × manner space with word-trajectory overlays). Requires matplotlib. |
| `phonosemantic_figure.png` | Figure 1 from the paper, pre-generated for convenience. |
| `/Experiments` | Phase 4 (Continuous Formant Grounding), Phase 5 (SNN simulations), and Phases 5B–6 (Sequential ODE breakthrough + architecture ceiling). |
| `generate_submission_v21.py` | Competition submission generator for Kaggle. Uses the v21 sequential ODE model. |
| `submission_v21.csv` | Competition predictions: 42% accuracy, ARI = 0.0538. |
```bash
git clone https://github.com/HmbleCreator/PhonoSemantics.git
cd PhonoSemantics
pip install -r requirements.txt
```

Run the scripts in this order; each builds on the root data defined in `experiment_v3.py`:

```bash
# 1. Main clustering experiment (reproduces Sections 9.1–9.3)
python experiment_v3.py

# 2. Linear probe (reproduces Section 9.4)
python linear_probe.py

# 3. Blind clustering with permutation test (reproduces Section 9.5)
python blind_clustering.py

# 4. Regenerate Figure 1
python make_figure.py
```

All scripts are self-contained and print results to stdout. No GPU required; everything runs on CPU in under 2 minutes total.
The `Experiments/` directory contains the complete chronological chain of the advanced empirical validation.
**Phase 4: Continuous Formant Grounding**

| Script | What it does |
|---|---|
| `ddin_exp15_formant_grounding.py` | Continuous-time formant grounding (Section 10). Employs real F1/F2 acoustic frequencies. |
| `ddin_exp16_weighted_formant.py` | Establishes the ARI = 0.0366 unsupervised baseline mapping. |
**Phase 5: Neuromorphic SNN Constraints**

| Script | What it does |
|---|---|
| `ddin_exp17_snn_pynn.py` | PyNN deployment. Identifies the Epileptiform Synchrony Limit (~500 Hz). |
| `ddin_exp18_snn_inhibition.py` | Tests whether lateral inhibition prevents the seizure state. |
| `ddin_exp19_snn_wta.py` | Global winner-take-all architecture. |
**Phases 5B–6: Sequential ODE + Architecture Ceiling**

| Script | Key Result |
|---|---|
| `ddin_exp21_sequential_ode.py` | ARI = 0.0591: breaks the static 0.037 ceiling (+61%) |
| `ddin_exp22_grpo.py` | GRPO with discrete ARI reward: flat gradient |
| `ddin_exp24_grpo_supervised.py` | Supervised centroid reward: no improvement |
| `ddin_exp28b_structured_w.py` | Structured W initialization: best ARI = 0.0592 |
| `ddin_exp29_two_layer.py` | Two-layer hierarchy: +0.017 delta |
| `ddin_exp30_contrastive.py` | Contrastive GRPO: ARI = 0.0690 |
| `ddin_exp31_supervised_centroid.py` | Supervised on Layer 2: ARI = 0.0690 |
To reproduce the key experiments:
```bash
cd Experiments
python ddin_exp21_sequential_ode.py   # Best model: ARI = 0.0591
python ddin_exp29_two_layer.py        # Two-layer baseline
```

(Note: Phase 5 experiments require PyNN and the Brian2 backend.)
```python
from phonosemantic_embedding import PhonosemantikEmbedding

# Initialize
embed = PhonosemantikEmbedding()

# Get the 10-D coordinate for a phonological group
vec = embed.get_group_vector('LABIAL')       # returns np.array of shape (10,)

# Get the centroid trajectory for a root
centroid = embed.get_root_centroid('LABIAL') # mean over phoneme vectors

# Full coordinate breakdown
print(embed.describe('THROAT'))
# → locus: [1,0,0,0,0,0], manner: 0.30, phonation: [0.5, 0.40], resonance: 1.0
```

Every phoneme maps to a 10-dimensional vector:
φ(p) = ( ℓ(p), α(p), β(p), ρ(p) )
| Dimension | Description | Size |
|---|---|---|
| D1 — Articulation locus | Where in the vocal tract (throat, palate, cerebral, dental, labial, nasal) | 6D |
| D2 — Articulation manner | Degree of closure (0 = full stop, 1 = fully open) | 1D |
| D3 — Phonation type | (voicing, breath force) | 2D |
| D4 — Somatic resonance | Primary body region of proprioceptive feedback (R1–R5, spinal axis) | 1D |
Total: 10 dimensions, fixed by articulatory anatomy. No statistical learning required for the coordinate system itself.
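As an illustration, the four blocks can be assembled like this. This is a sketch, not the repository's implementation: the `phi` helper and the locus ordering are hypothetical, and the THROAT values are taken from the `describe('THROAT')` example shown earlier.

```python
import numpy as np

# Hypothetical locus ordering; only the 6-D one-hot structure is from the paper.
LOCI = ['THROAT', 'PALATE', 'CEREBRAL', 'DENTAL', 'LABIAL', 'NASAL']

def phi(locus, manner, voicing, breath, resonance):
    """Assemble the 10-D coordinate phi(p) = (l, alpha, beta, rho)."""
    l = np.zeros(6)
    l[LOCI.index(locus)] = 1.0          # D1: 6-D one-hot articulation locus
    return np.concatenate([
        l,                              # D1 (6D)
        [manner],                       # D2: degree of closure, 0..1 (1D)
        [voicing, breath],              # D3: phonation type (2D)
        [resonance],                    # D4: somatic resonance, R1-R5 axis (1D)
    ])

# Values from the describe('THROAT') breakdown above:
v = phi('THROAT', manner=0.30, voicing=0.5, breath=0.40, resonance=1.0)
print(v.shape)  # (10,)
```

The point of the fixed layout is that every coordinate has an anatomical reading, so no statistical learning is needed to define the space itself.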
| Experiment | Result |
|---|---|
| Axis clustering (Test 2, between-group) | 5/5 groups significant, all p < 0.001 |
| Multinomial classification | 41.3% vs 20% chance, p ≈ 10⁻¹⁴ |
| Linear probe — phonosemantic geometry | 63.3% ± 10.5% |
| Linear probe — phoneme identity baseline | 49.3% ± 6.8% |
| Linear probe — geometry advantage | +14.0 percentage points, p < 0.001 |
| Blind TF-IDF clustering | ARI = 0.007, p = 0.143 (not significant — reported in full) |
| Phase 4: Acoustic Formant Grounding (Sec 10) | ARI = 0.0366 via unsupervised F1/F2 continuous clustering. |
| Phase 5: SNN Validation Limit (Sec 11) | Epileptiform Synchrony Limit identified at ~500 Hz when modeling temporal context extension. |
| Phase 5B: Sequential ODE (v21) | ARI = 0.0591 — sequential phoneme encoding via Neural ODE shatters static ceiling (+61% improvement). |
| Phase 5B-6: GRPO & Architecture (v22-v31) | All reward variants converge to ~0.06 ARI ceiling. Two-layer hierarchy adds +0.017 delta. |
| Kaggle Competition Submission | Accuracy: 42%, ARI: 0.0538 (v21 model) — submitted April 2026 |
Kumar, A. (2026). Phonosemantic Grounding: Sanskrit as a Formalized Case of Motivated Sign Structure for Interpretable AI (Version 2.0). Zenodo. DOI: 10.5281/zenodo.19564026
Core claim: The articulatory anatomy of speech production (locus, manner, phonation, resonance) constitutes a physically grounded coordinate system for AI semantic representations — no statistical learning required for the coordinate system itself.
Key results:
- 41.3% multinomial classification (vs 20% chance)
- +14pp linear probe advantage over phoneme identity baseline
- ARI = 0.0366 via F1/F2 formant clustering (Phase 4)
- Epileptiform Synchrony Limit at ~500 Hz (Phase 5 SNN)
Kumar, A. (2026). DDIN: Devavāṇī-Derived Interpretable Network — Sequential Neural ODEs for Phonosemantic Grounding. arXiv:2604.XXXXX.
Core claim: Sequential phoneme encoding through neural ODEs can extract semantic structure from Sanskrit verbal roots without backpropagation or dense recurrent connectivity.
Key results:
- ARI = 0.0591 (v21 sequential ODE) — +61% over static baseline
- Receiver Model validated: W=0 (no recurrent weights) achieves ~0.06 ARI
- Two-layer hierarchy adds +0.017 delta
- All optimization strategies (GRPO, supervised, contrastive) converge to ~0.06 ceiling
- Precise architecture ceiling measurement for this task
Functional claim (publication-ready):
A zero-weight, 128-neuron sequential ODE reservoir achieves ARI ~0.06 on unsupervised semantic clustering of Sanskrit roots — the practical ceiling for this architecture.
Kumar, A. (2026). PhonosemanticMeta Benchmark: Evaluating AI Vector Grounding via the Phonosemantic Manifold. Kaggle Competition + arXiv.
Core claim: The benchmark evaluates whether frontier AI systems possess intrinsic semantic grounding by testing them against a physically real coordinate system (Locus × Manner × Phonation × Resonance). By enforcing a strict Vector Grounding Score (VGS) threshold, it empirically exposes the "Grounding Gap" in modern LLMs.
The 8 Benchmark Tasks:
| Task | Name | What it tests | VGS Result |
|---|---|---|---|
| T1 | Axis Prediction | Semantic clustering of 150 Sanskrit roots into 5 phenomenological axes | Pass |
| T2 | Phonological Siblings | Same-locus root similarity vs cross-locus | Pass |
| T3 | Fabricated Roots | Novel root prediction from phoneme patterns | Pass |
| T4 | Cross-Locus Distance | Semantic distance across articulation loci | Fail (positional bias) |
| T5 | Rule Generalization | Paninian rule application to held-out roots | Pass |
| T6 | Trajectories | Mapping word trajectories to phenomenological arcs | Fail (95% overconfidence, all "Arc A") |
| T7 | Triplets | Harmonic coherence: anchor-to-locus matching | Fail (0% confidence collapse) |
| T8 | Phonation | Mapping breath force to motor-unit recruitment | Fail (0% confidence collapse) |
Key finding: Tasks 7 & 8 produce 0% confidence collapses in frontier models (Gemini 2.5 Flash). The model guesses the correct physical answer but reports zero internal certainty — proving the absence of bodily grounding in statistical embeddings.
VGS Metric: VGS = (Is_Correct) × (Confidence ≥ 70%)
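The VGS gate is a strict conjunction of correctness and calibrated confidence, which is why a confidence collapse scores zero even on correct answers. A minimal sketch (the function name and exact threshold handling are illustrative):

```python
def vgs(is_correct, confidence, threshold=0.70):
    """Vector Grounding Score: 1 only if the answer is correct AND confident."""
    return int(is_correct) * int(confidence >= threshold)

print(vgs(True, 0.95))   # 1 -> grounded answer
print(vgs(True, 0.00))   # 0 -> confidence collapse (Tasks 7 & 8)
print(vgs(False, 0.90))  # 0 -> confidently wrong
```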
The /Experiments folder now contains the complete Phase 5B-6 experimental chain documenting the transition from static embeddings to sequential Neural ODE processing.
| Script | Description | Key Result |
|---|---|---|
| `ddin_exp21_sequential_ode.py` | Sequential phoneme encoding through a 128-neuron ODE reservoir; W = 0 (Receiver Model). | ARI = 0.0591: breaks the static 0.037 ceiling |
| `ddin_exp22_grpo.py` | GRPO optimization on the α (decay) topology with a discrete ARI reward. | ARI = 0.0411: flat gradient (discrete metric) |
| `ddin_exp24_grpo_supervised.py` | Supervised centroid reward on α + β. | ARI = 0.0538: no improvement |
| `ddin_exp28b_structured_w.py` | Structured W initialization (Dhātu-like clusters). | Best: ARI = 0.0592 (10 clusters) |
| `ddin_exp29_two_layer.py` | Two-layer hierarchical architecture (Layer 1: phoneme encoder; Layer 2: semantic organizer). | ARI = 0.0492 (+0.017 delta) |
| `ddin_exp30_contrastive.py` | Contrastive GRPO on the two-layer architecture. | ARI = 0.0690 |
| `ddin_exp31_supervised_centroid.py` | Supervised centroid on Layer 2. | ARI = 0.0690 |
- Sequential ODE breakthrough (v21): ARI = 0.0591 — a +61% improvement over static embedding baseline
- The 0.06 ceiling: all reward variants (unsupervised, supervised, contrastive) converge to ~0.06 regardless of:
  - number of layers (1 or 2)
  - reward type
  - W structure (zero, random, structured)
- Two-layer adds signal: Exp 29 showed +0.017 delta from hierarchy
- The ceiling is a feature: It's a precise measurement of single-layer heterogeneous ODE reservoir capacity on this specific task — not a universal limitation
The Phase 5-6 investigation produces a complete, precise measurement:
- Exact ceiling mapped from every direction
- Architecture capabilities precisely quantified
- Hierarchical processing adds signal but is insufficient on its own
- A clean, negative-but-precise result is scientifically valuable
Functional claim (ready for publication):
A zero-weight, 128-neuron sequential ODE reservoir achieves ARI ~0.06 on unsupervised semantic clustering of Sanskrit roots — the practical ceiling for this architecture.
```bibtex
@article{kumar2026phonosemanticmeta,
  author        = {Kumar, Amit},
  title         = {PhonosemanticMeta Benchmark: Evaluating AI Vector Grounding
                   via the Phonosemantic Manifold},
  journal       = {Kaggle Competition + arXiv preprint},
  year          = {2026},
  eprint        = {2604.XXXXX},
  archivePrefix = {arXiv}
}

@article{kumar2026ddin,
  author        = {Kumar, Amit},
  title         = {DDIN: Devavāṇī-Derived Interpretable Network — Sequential Neural
                   ODEs for Phonosemantic Grounding},
  journal       = {arXiv preprint},
  year          = {2026},
  eprint        = {2604.XXXXX},
  archivePrefix = {arXiv}
}

@misc{kumar2026phonosemantic,
  author    = {Kumar, Amit},
  title     = {Phonosemantic Grounding: {Sanskrit} as a Formalized Case
               of Motivated Sign Structure for Interpretable {AI}},
  year      = {2026},
  publisher = {Zenodo},
  version   = {2.0},
  doi       = {10.5281/zenodo.19564026},
  url       = {https://doi.org/10.5281/zenodo.19564026}
}
```

Code: MIT License. Paper: CC BY 4.0.
Amit Kumar, Independent Researcher, Bihar, India. GitHub: [@HmbleCreator](https://github.com/HmbleCreator)