foccaciabot

Two things live here, and they're connected:

An interactive focaccia calculator — a quality-driven dashboard. You drive the qualities you want (open crumb, tang, flake, crust, richness, salt) and pick a style; an inverse model solves back to a full baker's-percentage formula and method, live.
A RAG reference corpus — cleaned notes distilled from three baking-science books, used to ground the calculator's coefficients (and any downstream focaccia assistant) in why a formula behaves the way it does — hydration, fermentation, gluten development, oven dynamics — rather than just recipes.

The calculator's dial-science is cited straight to the corpus: every coefficient in the model traces to a chunk (see CITES in src/focaccia-model.js).

The calculator

File	What it is
`src/focaccia-model.js`	The model — the meat. A two-layer system (recipe → latent dough state → sensory qualities) plus an identity-fixed inverse that solves qualities → recipe by gradient descent. Exports `qualities()`, `solveWithin()`, `solveConforming()`, `classify()`, `deviations()`, `QUALITY_AXES`, `IDENTITY_KEYS`.
`focaccia-build-sheet.jsx`	The React UI — a slider per quality axis, style picker, live recipe + step-by-step method, theming (light / dark / a GeoCities skin), and toppings.
`src/main.jsx` + `index.html`	Vite harness that mounts the build sheet for standalone local dev.

The model's central idea: a focaccia's identity (ferment schedule, durum-semola vs. plain wheat, two-pan deep bake) is never solved away — it's what makes a style what it is. The continuous levers (hydration, lamination folds, oils, salt) are tuned within that identity to hit your target qualities.

Run it

npm install
npm run dev        # Vite dev server on http://localhost:5173
npm run build      # production bundle → dist/

Used by the blog

This repo is the single source of truth for the focaccia widget on the 4AM365 blog. It's consumed as a git dependency (github:4AM365/foccaciabot, package name focaccia-widget) — the blog holds only a one-line re-export shim, not a copy. The blog bundles it with esbuild, aliasing React → preact/compat at its own build step, so the component imports plain react here and stays portable across both build systems.

To ship a change to the live widget, run the integration pipeline from this repo. It commits + pushes here, then in the blog pulls the new commit, rebuilds the widget bundle, syncs the model doc, and commits + pushes (which auto-deploys):

npm run publish:blog -- "your change description"
# equivalently: node scripts/publish.mjs "your change description"

docs/focaccia-model.md is the canonical model-equations explainer; the pipeline publishes a copy to the blog at /kitchen/focaccia-model. Keep this repo all-inclusive — the model code, the calculator, and the model docs all live here.

The reference corpus

RAG-ready notes distilled from three baking-science references, for grounding the calculator (and any focaccia assistant) in why a formula behaves the way it does.

What's tracked vs. ignored

Path	In git?	What it is
`.epub`, `.pdf`	❌ ignored	The purchased source books (binaries). Never committed.
`notes/`	✅ committed	Cleaned, per-section Markdown extracted from the books (human-reviewable).
`data/chunks.jsonl`	✅ committed	The embeddable RAG corpus — one chunk per line, with metadata.
`data/manifest.json`	✅ committed	Corpus stats + per-book index.
`scripts/`	✅ committed	The reproducible extraction + chunking pipeline.

The source books are licensed/purchased copies kept locally; only the derived notes are version-controlled.

Sources

Slug	Book	Lang	Role
`cauvain-technology-of-breadmaking`	Cauvain & Young, Technology of Breadmaking (Springer, 1998)	en	core — the academic reference; dough rheology, fermentation, baking.
`mcgee-on-food-and-cooking`	McGee, On Food and Cooking (1984)	en	high — Ch. 9 (Grains) & Ch. 10 (Cereal Doughs: Bread) are the bread core.
`bressanini-scienza-della-pasticceria`	Bressanini, La scienza della pasticceria (Gribaudo, 2014)	it	general — baking chemistry (gluten, leavening, flour).

Current corpus: 748 notes → ~1,737 chunks (~1.0M tokens), of which ~648 are flagged bread_relevant. Regenerate the exact numbers with the pipeline below.

Pipeline

pip install -r requirements.txt          # beautifulsoup4, lxml, PyMuPDF

python scripts/extract_epub.py           # EPUBs  -> notes/<slug>/*.md
python scripts/extract_pdf.py            # PDF    -> notes/<slug>/*.md
python scripts/build_corpus.py           # notes/ -> data/chunks.jsonl + manifest.json

notes/ is the source of truth; data/ is fully derived from it. Re-running is idempotent for a given set of source books.

How extraction works

EPUB (extract_epub.py): reads the spine + NCX table of contents directly (ebooklib chokes on these files' broken font manifests). Sectioning is TOC-driven via anchor IDs, because every page repeats the book title as an <h1>. Handles both nested TOCs (Bressanini) and flat TOCs where chapter titles are listed before a separate flat list of sub-sections (McGee), mapping sub-sections to chapters by spine position. Decodes cp1252-mislabeled-as-UTF-8 bytes so Italian/French accents survive.
PDF (extract_pdf.py): the book has no bookmarks, so chapters are detected from numbered title lines ("3 Functional ingredients") validated by monotonic numbering. Running headers/footers (page numbers, the ALL-CAPS running title) are stripped; printed page numbers are captured as inline markers so each chunk gets a precise page for citation. Front matter (title/copyright/contents OCR noise) is skipped.

Chunk schema (`data/chunks.jsonl`)

Each line is one JSON object:

{
  "id": "cauvain-...:0004:12:ab12cd34",  // stable: book:order:index:hash
  "book_slug": "cauvain-technology-of-breadmaking",
  "book": "Technology of Breadmaking",
  "author": "Stanley P. Cauvain & Linda S. Young",
  "year": 1998,
  "lang": "en",
  "relevance": "core",                    // book-level
  "bread_relevant": true,                 // chunk-level focaccia filter
  "chapter": "Chapter 4: Mixing and dough processing",
  "heading": "Chapter 4: Mixing and dough processing",
  "breadcrumb": ["Chapter 4: ...", "..."],
  "page": 98,                             // printed page (PDF) or null (EPUB)
  "citation": "Stanley P. Cauvain, Technology of Breadmaking (1998), Chapter 4..., p.98",
  "source_note": "notes/cauvain-.../0005-....md",
  "tokens_est": 587,
  "text": "An integral part of all breadmaking is..."
}

Chunks target ~800 tokens with ~100 tokens of overlap, packed on paragraph boundaries (oversized paragraphs are sentence-split). Token counts are estimated as chars/4 to avoid a tokenizer dependency.

Consuming the corpus

chunks.jsonl is embedder-agnostic. Typical next step: embed each text, store with the metadata, and at query time filter on bread_relevant (or weight by relevance) before vector search. The citation field is ready to surface in answers.

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
.claude		.claude
data		data
docs		docs
notes		notes
scripts		scripts
src		src
.gitignore		.gitignore
README.md		README.md
Untitled.png		Untitled.png
calc.png		calc.png
focaccia-build-sheet.jsx		focaccia-build-sheet.jsx
goldmember.png		goldmember.png
image-1781381393563.jpg		image-1781381393563.jpg
index.html		index.html
package-lock.json		package-lock.json
package.json		package.json
requirements.txt		requirements.txt
vite.config.js		vite.config.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

foccaciabot

The calculator

Run it

Used by the blog

The reference corpus

What's tracked vs. ignored

Sources

Pipeline

How extraction works

Chunk schema (`data/chunks.jsonl`)

Consuming the corpus

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

foccaciabot

The calculator

Run it

Used by the blog

The reference corpus

What's tracked vs. ignored

Sources

Pipeline

How extraction works

Chunk schema (data/chunks.jsonl)

Consuming the corpus

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Chunk schema (`data/chunks.jsonl`)

Packages