CAPR: Computer Assisted Proto-language Reconstruction

CAPR is a Dockerized toolkit for comparative linguistics, combining:

Cognate board management — track cognate sets across related languages
FST development — build and test finite-state transducers for sound change modeling
Web interface — Svelte UI for data exploration and FST debugging

Project Components

This repository contains two parallel research pipelines sharing common infrastructure:

Component	Description	Status
Germanic/	Proto-Germanic → Old English sound changes	Active development (62 of 1057 rows mismatched)
Burmish/	Proto-Burmish reconstruction	Maintenance mode

Both use the same web application and FST tooling but have independent data, transducers, and documentation.

Quick Start

# Start the Docker stack (Flask API + Svelte UI)
docker compose up -d

# In another terminal, proxy through Caddy
caddy run --config Caddyfile.dev

# Open http://localhost:5002

The Docker stack mounts Germanic/ by default. To switch to Burmish, edit docker-compose.yml to mount Burmish/ paths instead.
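
After the two Quick Start commands it can take a few seconds before the UI answers on http://localhost:5002. The following is a minimal sketch of a readiness check, not part of the repository: the function name `wait_until_up` and its parameters are hypothetical, and only the URL comes from the steps above.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url, attempts=10, delay=1.0, fetch=urllib.request.urlopen):
    """Poll `url` until it is served or `attempts` runs out; return True on success."""
    for attempt in range(attempts):
        try:
            # urlopen raises on refused connections and on HTTP error statuses,
            # so leaving the `with` block normally means the page was served.
            with fetch(url, timeout=2):
                return True
        except (urllib.error.URLError, OSError):
            pass  # Stack not ready yet; retry after a pause.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_up("http://localhost:5002")` returns `True` once the proxied UI responds, and `False` if it never does within the given number of attempts. The `fetch` parameter exists only so the check can be exercised without a running stack.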

Repository Structure

capr-v3-working/
├── Germanic/               # Proto-Germanic → OE pipeline
│   ├── data/               # TSV wordlists (germanic-aligned-final.tsv)
│   ├── fsts/               # FST sources (germanic.txt)
│   ├── tools/              # Python analysis tools
│   └── docs/               # DEV_NOTES.md, debug snapshots
├── Burmish/                # Proto-Burmish pipeline
│   ├── data/               # Burmish wordlists
│   ├── fsts/               # burmish.txt
│   └── ...
├── app/                    # Shared web application
│   ├── backend/            # Flask API (compare_fst.py, refish.py, etc.)
│   └── frontend/           # Svelte cognate board UI
├── server/                 # Docker working directory (code copied here)
├── cognate-app/            # Svelte source (mirrored in app/frontend/)
├── docs/                   # Shared documentation
│   ├── references/         # Scholarly sources (PDFs, text extracts)
│   ├── runbook.md          # Operational checklist
│   └── README.md           # Documentation index
├── build/                  # Compiled .bin files (gitignored)
├── docker-compose.yml      # Development stack
└── Caddyfile.dev           # Reverse proxy config

Germanic Pipeline

The active focus is modeling Proto-Germanic → Old English sound changes via FST rules.
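
To give a feel for what an ordered cascade of sound change rules does, here is a toy sketch in plain Python using regular expressions. It is an analogy only: the real rules live in Germanic/fsts/germanic.txt and are compiled with foma, and the two rules below are simplified textbook changes chosen for illustration, not copied from that file. The names `TOY_RULES` and `apply_rules` are hypothetical.

```python
import re

# Illustrative rules only; NOT taken from Germanic/fsts/germanic.txt.
# Each pair is (regex pattern, replacement), applied in order like a rule cascade.
TOY_RULES = [
    (r"ai", "ā"),   # Proto-Germanic *ai > Old English ā
    (r"az$", ""),   # simplified loss of the word-final *-az ending
]


def apply_rules(proto_form, rules=TOY_RULES):
    """Return every intermediate stage, starting from the proto-form."""
    stages = [proto_form]
    for pattern, replacement in rules:
        # Each rule rewrites the output of the previous one.
        stages.append(re.sub(pattern, replacement, stages[-1]))
    return stages


print(apply_rules("stainaz"))
```

For the input `stainaz` the stages are `stainaz`, `stānaz`, `stān`, matching Old English stān 'stone'. Keeping the intermediate stages makes it easier to see which rule is responsible when a predicted form diverges from the attested one.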

Current status: 62 mismatches out of 1057 lexemes (995 match, about 94.1% accuracy)

Key files:

Germanic/fsts/germanic.txt — Main FST with all sound change rules
Germanic/data/germanic-aligned-final.tsv — Aligned proto-forms and OE targets
Germanic/docs/DEV_NOTES.md — Detailed research log and phonological decisions

Development workflow:

# Compile FST and check results
docker compose exec -T backend bash -c "cd /usr/app && foma -q -l fsts/germanic.txt -e quit"

# Run mismatch analysis
docker compose exec -T backend python3 tools/oe_mismatch_report.py
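
The accuracy figure quoted for the Germanic pipeline is simply the share of rows whose FST output equals the attested Old English form. The sketch below shows that bookkeeping under stated assumptions: it is not the logic of tools/oe_mismatch_report.py, the function `summarize` is hypothetical, and the predicted forms in `rows` are made up for illustration.

```python
def summarize(pairs):
    """Compare (predicted, attested) pairs and report mismatches and accuracy."""
    pairs = list(pairs)
    mismatches = [(pred, att) for pred, att in pairs if pred != att]
    total = len(pairs)
    accuracy = (total - len(mismatches)) / total if total else 0.0
    return {"total": total, "mismatches": mismatches, "accuracy": accuracy}


# Hypothetical predictions paired with attested Old English forms.
rows = [("stān", "stān"), ("hūs", "hūs"), ("dag", "dæg")]
report = summarize(rows)
print(f"{report['total']} rows, {len(report['mismatches'])} mismatched, "
      f"{report['accuracy']:.1%} accuracy")

# The same arithmetic applied to the counts reported in this README.
print(f"{(1057 - 62) / 1057:.1%}")
```

The last line reproduces the README figure: 995 matching rows out of 1057 is about 94.1%.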

See Germanic/README.md for detailed documentation.

Burmish Pipeline

Proto-Burmish reconstruction using LingPy/LingRex for cognate detection.

See Burmish/README.md for documentation.
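
As a rough picture of what a cognate set looks like as data, the sketch below groups wordlist rows by cognate ID. It assumes LingPy-style column names (DOCULECT, CONCEPT, IPA, COGID); the actual columns of the files in Burmish/data/ may differ, and the sample rows, language names, and forms are invented placeholders rather than real Burmish data. The function name `cognate_sets` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented sample data; column names follow the LingPy wordlist convention
# and are an assumption about, not a copy of, the files in Burmish/data/.
SAMPLE_TSV = (
    "ID\tDOCULECT\tCONCEPT\tIPA\tCOGID\n"
    "1\tLangA\thand\tlak\t1\n"
    "2\tLangB\thand\tlok\t1\n"
    "3\tLangA\tstone\tkjok\t2\n"
    "4\tLangC\thand\tmi\t3\n"
)


def cognate_sets(tsv_text):
    """Map each cognate ID to a {doculect: form} dictionary."""
    sets = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        sets[row["COGID"]][row["DOCULECT"]] = row["IPA"]
    return dict(sets)


for cogid, members in cognate_sets(SAMPLE_TSV).items():
    print(cogid, members)
```

Each entry corresponds to one row of a cognate board: the forms from different languages that are judged to descend from the same proto-form.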

Documentation

docs/README.md — Documentation index
SETUP.md — Full installation guide
USAGE.md — UI walkthrough
docs/runbook.md — Operational checklist
WORKFLOW.md — Development workflow

Citations

Gong, X. & N. Hill (2020). Materials for an Etymological Dictionary of Burmish. Zenodo. https://doi.org/10.5281/zenodo.4311182
List, J.-M. & R. Forkel (2022). LingRex. Zenodo.
Hulden, M. (2009). "Foma: a finite-state compiler and library." EACL.
