This repository contains three papers, all preserved and clearly labelled:
| Paper | Status | Where to read it |
|---|---|---|
| V1 (Feb 2026) | preserved unchanged for v1-statistic reproducibility | paper_v1_archived.md |
| V2 (May 2026, DOI 10.5281/zenodo.20023733) | preserved for v2 reproducibility | paper_v2.md / paper_v2.tex / paper_v2.pdf |
| V3 (May 2026) | current canonical paper | paper_v3.md |
For the full layout breakdown — which scripts belong to v1, which to v2/v3, what's shared — see PAPER_V1_VS_V2_LAYOUT.md.
- Author: Kameldip Singh Basra (kameldipbasra@gmail.com)
- Repository: https://github.com/kamb-code/Voynich
- Zenodo V3 (this release): 10.5281/zenodo.20072618 (concept DOI 10.5281/zenodo.18598229)
- Zenodo V2 (previous): 10.5281/zenodo.20023733
- V3 release date: 2026-05-07

The Voynich Manuscript (Beinecke MS 408, carbon-dated 1404–1438) is identified as a 15th-century Sri Lankan Elu-Sinhala pharmaceutical text — a working pharmacist's compressed reference recording Ayurvedic preparations in a bespoke abugida.
| Claim | Confidence |
|---|---|
| South Asian Indic substrate | ~97% |
| Sri Lankan provenance specifically | ~94% |
| Sinhala/Elu specifically (vs Pali sister) | ~91% |
| Pre-12c Elu chronolect | ~82% |
| Working-pharmacist register (vs literary canon) | ~98% |
| P(Sinhala identification wrong) | ~5-8% |
Decoder: V17 (scripts/v17_decoder.py), with daiiin → gena patch applied.
Strongest single evidence streams:
- Cross-corpus hostile-reviewer test: Sri Lankan medical 33.9-36.3% vs pan-Indic Sanskrit medical 10.2-10.7% under 50,000-token size-matched control (~3.4× ratio; pre-registered ≥1.5× criterion exceeded by 4× at the mean)
- Parallel-recipe template matching: f75r line 38 V17 output `q-keda q-keda q-keda q-keda q-keda lada` is structurally identical to Bodleian MS Sinh.a.2(R) `kalandayi kalandayi…` enumeration; Bonferroni-corrected P ≈ 10⁻⁷ over ~26,000 line × template comparisons
- External state-marker grounding: `leda` ("disease") attested 60× in K.D. Somadasa 1996 Wellcome catalogue (33 distinct disease-stems); `seda` ("fomentation") in DPD Pali + Caraka 126× sveda compounds
- Gaskell classifier replication: V17-decoded Voynich crosses to meaningful (P=58.9%); authentic Sinhala recipe text classifies as gibberish (P=43.7%); raw Voynich classifies as gibberish (P=24-34%, replicating Gaskell's published result)
- Vinaya / Samantapāsādikā falsification probes: 0 of 460,000+ Pali canonical/commentary tokens match VPNS state-markers; <3% type overlap on the medicines-specific subsection
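
The Bonferroni figure in the template-matching bullet can be unpacked with a little arithmetic. The sketch below is illustrative only: it assumes the standard Bonferroni adjustment (raw p-value multiplied by the number of comparisons, capped at 1) and takes the ~26,000 comparisons and P ≈ 10⁻⁷ from that bullet. The function names are hypothetical and not part of the repository, and how the raw per-comparison p-value was obtained is not stated here.

```python
def bonferroni(raw_p: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value: raw p times the comparison count, capped at 1."""
    return min(1.0, raw_p * n_comparisons)


def implied_raw_p(corrected_p: float, n_comparisons: int) -> float:
    """Raw per-comparison p-value that would yield the given corrected value."""
    return corrected_p / n_comparisons


# Figures quoted in the bullet list (both are approximate in the source).
N_COMPARISONS = 26_000   # "~26,000 line x template comparisons"
CORRECTED_P = 1e-7       # "Bonferroni-corrected P ~ 10^-7"

raw_p = implied_raw_p(CORRECTED_P, N_COMPARISONS)
print(f"implied raw p-value:       {raw_p:.2e}")
print(f"corrected again (check):   {bonferroni(raw_p, N_COMPARISONS):.2e}")
```

Under these assumptions, a corrected value near 10⁻⁷ corresponds to a raw per-comparison p-value on the order of 10⁻¹², which is the quantity a reviewer would need to check against the template-matching null model.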
release_v2/ ~63 MB total
├── README.md ← you are here
├── MANIFEST.md, REPRODUCTION.md, UPLOAD_INSTRUCTIONS.md ← release docs
├── LICENSE (CC-BY-4.0), CITATION.cff, .zenodo.json ← academic metadata
├── AUDIT_NOTES.md ← v1 audit (preserved)
├── smoke_test.py ← end-to-end validation
├── run_all.sh ← full validation rebuild
│
├── main.tex ← ★ CURRENT PAPER (LaTeX source, paper v2)
├── main.pdf ← ★ CURRENT PAPER (PDF, 22 pages, A4)
├── paper.md ← ★ CURRENT PAPER (markdown source)
├── paper_v2.tex / paper_v2.pdf ← same as main.tex/pdf, alternate names
├── paper_v1_archived.md ← Feb 2026 paper, preserved for v1-statistic reproducibility
├── references.bib ← bibliography (corrected Gaskell-Bowern citation)
│
├── scripts/ ← 32 Python scripts (decoders, analysis, tests, translation)
├── data/ ← EVA transcription, vocabularies, dictionaries (9 files)
├── translation/ ← V17 corpus DB + translation outputs (4 files)
├── supplementary/ ← 39 substantive analysis writeups + reviewer packages
├── references/medical_corpus/ ← cleaned comparison corpora (Sarartha, BM, Vinaya, Samantapāsādikā, chronicles, Niganduwa, 8 pan-Indic)
└── results/ ← validation outputs from run_all.sh
Top-level layout matches the existing Paper/ directory in https://github.com/kamb-code/Voynich for drop-in replacement.
Total: ~63 MB on disk; 142 files. See MANIFEST.md for complete inventory.
# Clone or download release
cd release_v2/
# Validate end-to-end (5 sec)
python3 smoke_test.py
# Expected: ✓ ALL SMOKE TESTS PASSED
# Run the canonical decoder
python3 -c "
import sys; sys.path.insert(0, 'scripts')
from v17_decoder import decode_v17
print(decode_v17('qokeedy')) # → q-keda
"
# Reproduce the hostile-reviewer cross-corpus test (size-matched figures)
python3 scripts/hostile_reviewer/cross_corpus_analysis.py
# Reproduce the Bowern-suite metrics
python3 scripts/bowern_suite_metrics.py
# Generate the V17 translation (~30 sec)
python3 scripts/translate_book_v17.py
# Compile the paper (xelatex required)
xelatex main.tex && xelatex main.tex

Full replication instructions in REPRODUCTION.md.
- V17 decoder canonical (resolves paper v1 §15 "primary open question": u-prefix anomaly 13.8% → 5.0%)
- Bowern-Gaskell engagement (§4.13): 6-metric Bowern-suite + replicated random-forest classifier; recipe-register confound demonstrated
- Hostile-reviewer cross-corpus test: 3.4× SL/pan-Indic ratio under size-matched control
- Two falsification probes passed: Vinaya Bhesajjakkhandhaka 0/195K + Samantapāsādikā 0/265K VPNS markers
- External state-marker grounding: `leda` (Somadasa 60×, 33 disease-stems) + `seda` (DPD + Caraka 126×)
- Compositional grounding: V17 X-leda 14 types matches Somadasa 33 disease-stems; V17 X-seda 9 types matches Caraka 9 sveda compounds
- Cūḷavaṃsa 37.146 primary-source documentation of Buddhadāsa's medical compendium (Sinhalese tradition identifies this with the Sārārtha Saṃgrahaya, our 36.3% size-matched-overlap source)
- Elu chronolect revalidation: 81% Elu-native, 0% post-12c Sanskrit loans in LOCKED vocabulary
- Bhesajjamañjūsā re-OCR: 27.5% (corrupted Devanagari OCR) → 33.9% (clean pdftotext, size-matched)
- q-/ch- as phonologically-conditioned allomorphs of one deictic morpheme (revised 2026-05-04 from "two morphemes — definite article + demonstrative")
- Register-specific grammar: formal BNF, slot-occupancy statistics, ~14% residue
- Polysemy disambiguation: section/collocation rules for ura/gara/kara/meda/etc.
- Scribal normalization: ~5,200 tokens get clearer reading; line-initial a-/g- strip eliminates ~30 phantom lexemes
- Parallel-recipe template matching: 10-15 V17 lines map at ≥70% slot-fill to attested templates; Match #1 (f75r L38) is structurally identical to Bodleian enumeration
- Botanist's review packet: all 112 herbal folios with Beinecke IIIF image URLs + tentative IDs + tailored questions
- Specialist outreach package: 17 named scholars across 3 disciplines + email templates
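
As a rough illustration of what a size-matched overlap comparison involves, the sketch below draws equal-sized token samples from two corpora and reports the share of decoded vocabulary types found in each. This is an assumption about the general shape of the method, not the code in `scripts/hostile_reviewer/cross_corpus_analysis.py`; the function name, the random-sampling strategy, and the toy tokens are all hypothetical.

```python
import random


def size_matched_overlap(vocab, corpus_tokens, size, seed=0):
    """Percent of vocab types that occur in a random sample of `size` corpus tokens.

    Sampling a fixed number of tokens from every corpus keeps a large corpus
    from scoring higher simply because it contains more text.
    """
    if len(corpus_tokens) < size:
        raise ValueError("corpus is smaller than the requested sample size")
    vocab_types = set(vocab)
    if not vocab_types:
        return 0.0
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample_types = set(rng.sample(corpus_tokens, size))
    return 100.0 * len(vocab_types & sample_types) / len(vocab_types)


# Toy data only: these tokens are placeholders, not real corpus contents.
decoded_vocab = ["leda", "seda", "keda", "gena"]
corpus_a = ["leda", "seda", "keda", "x1"] * 50
corpus_b = ["leda", "x2", "x3", "x4", "x5"] * 50

size = min(len(corpus_a), len(corpus_b))  # match both samples to the smaller corpus
overlap_a = size_matched_overlap(decoded_vocab, corpus_a, size)
overlap_b = size_matched_overlap(decoded_vocab, corpus_b, size)
print(f"corpus A overlap: {overlap_a:.1f}%")
print(f"corpus B overlap: {overlap_b:.1f}%")
if overlap_b > 0:
    print(f"A/B ratio: {overlap_a / overlap_b:.2f}x")
```

In the release itself the sample size is 50,000 tokens per corpus; the toy run above uses a much smaller size purely so the example is quick to execute.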
The paper is honest about what it does NOT yet establish:
- No Sinhala/Elu specialist linguistic validation of the decoded prose. Materials prepared (`SPECIALIST_OUTREACH_PACKAGE.md`); the author's decision on outreach is pending.
- No trained-eye botanical verification. Materials prepared (`BOTANIST_DOSSIER.md`).
- VPNS-specific kalpana taxonomy only partially validated: 2 of 18 state-markers (leda, seda) are externally grounded; the symmetric 12×2 extension across all base classes is project-extrapolation.
- Sister-language indistinguishability: Pali, Maharashtri, Konkani at typological level cannot be ruled out by lexical evidence alone; the Bhesajjamañjūsā 33.9% size-matched figure is comparable to Sārārtha 36.3%, and the substrate-labeling question (Sinhala/Elu working notation vs Sri Lankan Sinhala-Pali medical register) remains the genuine open question.
- The decoded text is recipe-register, not narrative prose. Lines read as compressed pharmacy notation (operators + preparation classes + state markers + targets), not English-translatable sentences. See `REGISTER_SAMPLES.md` for the correctly-framed examples.
@misc{basra2026voynich,
title = {A Candidate Decipherment of the Voynich Manuscript: Evidence for a Phonetic Transcription of Spoken Elu-Sinhala (V2)},
author = {Basra, Kameldip Singh},
year = {2026},
month = {May},
doi = {10.5281/zenodo.20023733},
url = {https://doi.org/10.5281/zenodo.20023733},
note = {Concept DOI 10.5281/zenodo.18598229 resolves to most recent version}
}

Beinecke Rare Book and Manuscript Library (digital access to MS 408); Stolfi, Takahashi, and the Voynich research community (foundational EVA transcription); Daniel Gaskell (open-source release of his random-forest classifier code, github.com/danielgaskell/voynich, made the §4.13 replication possible). The Buddhist temples of Sri Lanka, whose inscriptions provided the visual spark for this investigation. Anthropic Claude Opus as AI coding assistant.