Voynich Manuscript — Decipherment Repository (V3 current)

This repository contains three papers, all preserved and clearly labelled:

| Paper | Status | Where to read it |
| --- | --- | --- |
| V1 (Feb 2026) | preserved unchanged for v1-statistic reproducibility | `paper_v1_archived.md` |
| V2 (May 2026, DOI 10.5281/zenodo.20023733) | preserved for v2 reproducibility | `paper_v2.md` / `paper_v2.tex` / `paper_v2.pdf` |
| V3 (May 2026) | current canonical paper | `paper_v3.md` |

For the full layout breakdown (which scripts belong to v1, which to v2/v3, and what is shared), see PAPER_V1_VS_V2_LAYOUT.md.

Author: Kameldip Singh Basra (kameldipbasra@gmail.com)
Repository: https://github.com/kamb-code/Voynich
Zenodo V3 (this release): 10.5281/zenodo.20072618 (concept DOI 10.5281/zenodo.18598229)
Zenodo V2 (previous): 10.5281/zenodo.20023733
V3 release date: 2026-05-07

TL;DR

The Voynich Manuscript (Beinecke MS 408, carbon-dated 1404–1438) is identified as a 15th-century Sri Lankan Elu-Sinhala pharmaceutical text — a working pharmacist's compressed reference recording Ayurvedic preparations in a bespoke abugida.

| Claim | Confidence |
| --- | --- |
| South Asian Indic substrate | ~97% |
| Sri Lankan provenance specifically | ~94% |
| Sinhala/Elu specifically (vs Pali sister) | ~91% |
| Pre-12c Elu chronolect | ~82% |
| Working-pharmacist register (vs literary canon) | ~98% |
| P(Sinhala identification wrong) | ~5-8% |

Decoder: V17 (scripts/v17_decoder.py), with daiiin → gena patch applied.

Strongest single evidence streams:

Cross-corpus hostile-reviewer test: Sri Lankan medical 33.9-36.3% vs pan-Indic Sanskrit medical 10.2-10.7% under a 50,000-token size-matched control (~3.4× ratio, which clears the pre-registered ≥1.5× criterion by a factor of about 2.2 at the mean)
Parallel-recipe template matching: f75r line 38 V17 output q-keda q-keda q-keda q-keda q-keda lada is structurally identical to Bodleian MS Sinh.a.2(R) kalandayi kalandayi… enumeration; Bonferroni-corrected P ≈ 10⁻⁷ over ~26,000 line × template comparisons
External state-marker grounding: leda ("disease") is attested 60× in the K.D. Somadasa 1996 Wellcome catalogue (33 distinct disease-stems); seda ("fomentation") is attested in DPD Pali and 126× in Caraka sveda compounds
Gaskell classifier replication: V17-decoded Voynich crosses the threshold into the meaningful class (P=58.9%); authentic Sinhala recipe text is classified as gibberish (P=43.7%); raw Voynich is classified as gibberish (P=24-34%, replicating Gaskell's published result)
Vinaya / Samantapāsādikā falsification probes: 0 of 460,000+ Pali canonical/commentary tokens match VPNS state-markers; <3% type overlap on the medicines-specific subsection
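
The size-matched ratio in the cross-corpus item above can be rechecked by hand. The following is a minimal sketch that uses only the percentages quoted in this README. Taking the midpoint of each quoted range as "the mean" is an assumption, and the names are illustrative rather than taken from scripts/hostile_reviewer/cross_corpus_analysis.py.

```python
# Figures quoted in this README (size-matched, 50,000-token control).
SL_MEDICAL = (33.9, 36.3)  # Sri Lankan medical overlap range, percent
PAN_INDIC = (10.2, 10.7)   # pan-Indic Sanskrit medical overlap range, percent
CRITERION = 1.5            # pre-registered minimum ratio


def midpoint(bounds):
    """Midpoint of a (low, high) range; an assumed stand-in for the mean."""
    low, high = bounds
    return (low + high) / 2


ratio = midpoint(SL_MEDICAL) / midpoint(PAN_INDIC)
print(f"SL / pan-Indic ratio: {ratio:.2f}x")
print(f"meets >= {CRITERION}x criterion: {ratio >= CRITERION}")
```

The sketch only re-derives the headline ratio; the overlap percentages themselves come from the repository script and are not recomputed here.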

What's in this folder (matches existing GitHub Paper/ structure)

release_v2/                                ~63 MB total
├── README.md                              ← you are here
├── MANIFEST.md, REPRODUCTION.md, UPLOAD_INSTRUCTIONS.md  ← release docs
├── LICENSE (CC-BY-4.0), CITATION.cff, .zenodo.json       ← academic metadata
├── AUDIT_NOTES.md                         ← v1 audit (preserved)
├── smoke_test.py                          ← end-to-end validation
├── run_all.sh                             ← full validation rebuild
│
├── main.tex                               ← ★ CURRENT PAPER (LaTeX source, paper v2)
├── main.pdf                               ← ★ CURRENT PAPER (PDF, 22 pages, A4)
├── paper.md                               ← ★ CURRENT PAPER (markdown source)
├── paper_v2.tex / paper_v2.pdf            ← same as main.tex/pdf, alternate names
├── paper_v1_archived.md                   ← Feb 2026 paper, preserved for v1-statistic reproducibility
├── references.bib                         ← bibliography (corrected Gaskell-Bowern citation)
│
├── scripts/                               ← 32 Python scripts (decoders, analysis, tests, translation)
├── data/                                  ← EVA transcription, vocabularies, dictionaries (9 files)
├── translation/                           ← V17 corpus DB + translation outputs (4 files)
├── supplementary/                         ← 39 substantive analysis writeups + reviewer packages
├── references/medical_corpus/             ← cleaned comparison corpora (Sarartha, BM, Vinaya, Samantapāsādikā, chronicles, Niganduwa, 8 pan-Indic)
└── results/                               ← validation outputs from run_all.sh

Top-level layout matches the existing Paper/ directory in https://github.com/kamb-code/Voynich for drop-in replacement.

Total: ~63 MB on disk; 142 files. See MANIFEST.md for complete inventory.

Quick reproduction

# Clone or download release
cd release_v2/

# Validate end-to-end (5 sec)
python3 smoke_test.py
# Expected: ✓ ALL SMOKE TESTS PASSED

# Run the canonical decoder
python3 -c "
import sys; sys.path.insert(0, 'scripts')
from v17_decoder import decode_v17
print(decode_v17('qokeedy'))  # → q-keda
"

# Reproduce the hostile-reviewer cross-corpus test (size-matched figures)
python3 scripts/hostile_reviewer/cross_corpus_analysis.py

# Reproduce the Bowern-suite metrics
python3 scripts/bowern_suite_metrics.py

# Generate the V17 translation (~30 sec)
python3 scripts/translate_book_v17.py

# Compile the paper (xelatex required)
xelatex main.tex && xelatex main.tex

Full replication instructions in REPRODUCTION.md.
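
For decoding more than one word at a time, a thin wrapper around the decoder may be convenient. The sketch below assumes only what the snippet above shows, namely that decode_v17 takes one EVA word and returns its decoded form. decode_line, stand_in_decode, and STAND_IN_TABLE are hypothetical names, and the stand-in decoder exists only so the example runs outside the repository; its two entries are the two mappings this README states.

```python
from typing import Callable, Iterable

# Stand-in for scripts/v17_decoder.decode_v17 so the sketch runs on its own.
# The two entries are the only mappings stated in this README.
STAND_IN_TABLE = {"qokeedy": "q-keda", "daiiin": "gena"}


def stand_in_decode(word: str) -> str:
    """Hypothetical decoder: look the word up, or mark it as undecoded."""
    return STAND_IN_TABLE.get(word, f"<{word}?>")


def decode_line(words: Iterable[str], decode: Callable[[str], str]) -> str:
    """Decode each EVA word in order and rejoin the results with spaces."""
    return " ".join(decode(w) for w in words)


if __name__ == "__main__":
    eva_line = "qokeedy daiiin"
    print(decode_line(eva_line.split(), stand_in_decode))
```

Inside the repository, decode_v17 (imported as in the snippet above) could be passed in place of stand_in_decode, assuming it returns a string.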

What was added in v2 (since paper v1, February 2026)

V17 decoder canonical (resolves paper v1 §15 "primary open question": u-prefix anomaly 13.8% → 5.0%)
Bowern-Gaskell engagement (§4.13): 6-metric Bowern-suite + replicated random-forest classifier; recipe-register confound demonstrated
Hostile-reviewer cross-corpus test: 3.4× SL/pan-Indic ratio under size-matched control
Two falsification probes passed: Vinaya Bhesajjakkhandhaka 0/195K + Samantapāsādikā 0/265K VPNS markers
External state-marker grounding: leda (Somadasa 60×, 33 disease-stems) + seda (DPD + Caraka 126×)
Compositional grounding: V17 X-leda 14 types matches Somadasa 33 disease-stems; V17 X-seda 9 types matches Caraka 9 sveda compounds
Cūḷavaṃsa 37.146 primary-source documentation of Buddhadāsa's medical compendium (Sinhalese tradition identifies this with the Sārārtha Saṃgrahaya, our 36.3% size-matched-overlap source)
Elu chronolect revalidation: 81% Elu-native, 0% post-12c Sanskrit loans in LOCKED vocabulary
Bhesajjamañjūsā re-OCR: 27.5% (corrupted Devanagari OCR) → 33.9% (clean pdftotext, size-matched)
q-/ch- as phonologically-conditioned allomorphs of one deictic morpheme (revised 2026-05-04 from "two morphemes — definite article + demonstrative")
Register-specific grammar: formal BNF, slot-occupancy statistics, ~14% residue
Polysemy disambiguation: section/collocation rules for ura/gara/kara/meda/etc.
Scribal normalization: ~5,200 tokens get clearer reading; line-initial a-/g- strip eliminates ~30 phantom lexemes
Parallel-recipe template matching: 10-15 V17 lines map at ≥70% slot-fill to attested templates; Match #1 (f75r L38) is structurally identical to Bodleian enumeration
Botanist's review packet: all 112 herbal folios with Beinecke IIIF image URLs + tentative IDs + tailored questions
Specialist outreach package: 17 named scholars across 3 disciplines + email templates
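
The Bonferroni-corrected figure quoted in the TL;DR for the template-matching result can be unpacked with the standard correction, which multiplies a raw p-value by the number of comparisons and caps the result at 1. The sketch below is illustrative arithmetic on the two numbers given in this README (P ≈ 10⁻⁷, ~26,000 comparisons). The implied raw p-value is derived here, not reported by the source, and the function names are hypothetical.

```python
def bonferroni(p_raw: float, n_comparisons: int) -> float:
    """Standard Bonferroni correction: scale by the test count, cap at 1."""
    return min(1.0, p_raw * n_comparisons)


def implied_raw_p(p_corrected: float, n_comparisons: int) -> float:
    """Invert the correction (valid only while the cap at 1 is not hit)."""
    return p_corrected / n_comparisons


N_COMPARISONS = 26_000  # "~26,000 line x template comparisons" (README)
P_CORRECTED = 1e-7      # "P ≈ 10^-7" after correction (README)

p_raw = implied_raw_p(P_CORRECTED, N_COMPARISONS)
print(f"implied raw p-value: {p_raw:.2e}")
print(f"round trip: {bonferroni(p_raw, N_COMPARISONS):.1e}")
```

Because both inputs are approximate in the source, the result is an order-of-magnitude estimate only.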

Honest limitations

The paper does not yet establish the following:

No Sinhala/Elu specialist has linguistically validated the decoded prose. Materials are prepared (SPECIALIST_OUTREACH_PACKAGE.md); the author's decision on outreach is pending.
No trained-eye botanical verification has been done. Materials are prepared (BOTANIST_DOSSIER.md).
The VPNS-specific kalpana taxonomy is only partially validated: just 2 of 18 state-markers (leda, seda) are externally grounded, and the symmetric 12×2 extension across all base classes is a project extrapolation.
Sister-language indistinguishability: at the typological level, Pali, Maharashtri, and Konkani cannot be ruled out by lexical evidence alone. The Bhesajjamañjūsā size-matched figure (33.9%) is comparable to the Sārārtha figure (36.3%), so the substrate-labeling question (Sinhala/Elu working notation vs Sri Lankan Sinhala-Pali medical register) remains genuinely open.
The decoded text is recipe-register, not narrative prose. Lines read as compressed pharmacy notation (operators + preparation classes + state markers + targets), not as sentences that translate directly into English. See REGISTER_SAMPLES.md for correctly framed examples.

Citation

@misc{basra2026voynich,
  title  = {A Candidate Decipherment of the Voynich Manuscript: Evidence for a Phonetic Transcription of Spoken Elu-Sinhala (V2)},
  author = {Basra, Kameldip Singh},
  year   = {2026},
  month  = {May},
  doi    = {10.5281/zenodo.20023733},
  url    = {https://doi.org/10.5281/zenodo.20023733},
  note   = {Concept DOI 10.5281/zenodo.18598229 resolves to most recent version}
}

Acknowledgments

Beinecke Rare Book and Manuscript Library (digital access to MS 408); Stolfi, Takahashi, and the Voynich research community (foundational EVA transcription); Daniel Gaskell (open-source release of his random-forest classifier code, github.com/danielgaskell/voynich, which made the §4.13 replication possible); the Buddhist temples of Sri Lanka, whose inscriptions provided the visual spark for this investigation; and Anthropic Claude Opus, used as an AI coding assistant.
