codex-builder

A meta-tool for constructing agent-traversable knowledge codices. It packages a multi-phase orchestration process into reference prompts that Claude Code reads and executes autonomously in target projects.

A codex is a structured knowledge base: source material (manuals, documentation, specs) transcribed into indexed, tagged markdown files with a manifest for surgical retrieval by AI agents.

When to use this

You have a large body of reference material (PDFs, documentation, specs) and want to make it efficiently queryable by AI agents. Instead of feeding entire documents into context windows, a codex gives agents a manifest to search, section-level line numbers to jump to, and a controlled vocabulary to filter by.
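As a rough illustration of the retrieval pattern described above, the sketch below filters manifest entries by topic and then reads only the matching line range from a file. The entry fields (`file`, `heading`, `topics`, `start_line`, `end_line`) and the helper names are assumptions made for this example; the actual manifest.yaml schema is defined by the templates and may differ.

```python
from pathlib import Path

# Hypothetical in-memory form of manifest entries; the real manifest.yaml
# may use different field names and nesting.
MANIFEST = [
    {"file": "guide.md", "heading": "Export settings", "topics": ["export"],
     "start_line": 5, "end_line": 7},
    {"file": "guide.md", "heading": "Import settings", "topics": ["import"],
     "start_line": 8, "end_line": 9},
]


def find_sections(manifest, topic):
    """Return the manifest entries tagged with the given topic."""
    return [entry for entry in manifest if topic in entry["topics"]]


def read_section(docs_dir, entry):
    """Read only the lines of one section (1-indexed, inclusive range)."""
    text = (Path(docs_dir) / entry["file"]).read_text(encoding="utf-8")
    lines = text.splitlines()
    return "\n".join(lines[entry["start_line"] - 1 : entry["end_line"]])
```

An agent following this pattern loads the small manifest first and pulls individual sections on demand, rather than placing whole documents in its context window.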

Prerequisites

Claude Code (this tool is designed to be driven by Claude Code)
Python 3.10+ (for scripts)
PyMuPDF (pip install PyMuPDF) — for PDF TOC extraction and verification
pyyaml (pip install pyyaml) — for YAML processing in scripts
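A quick way to confirm the prerequisites before starting is sketched below. It is not part of codex-builder; the helper name `missing_modules` is made up for this example. The only facts relied on are that PyMuPDF is imported as `fitz` and pyyaml as `yaml`.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 10):
        print("Python 3.10+ is required")
    # PyMuPDF is imported as 'fitz'; pyyaml is imported as 'yaml'.
    for name in missing_modules(["fitz", "yaml"]):
        print(f"missing module: {name}")
```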

Usage

codex-builder is a set of prompts, not a runtime. You clone it once, then run it from within your target project. The codex is generated entirely inside the target project — codex-builder itself stays untouched.

# 1. Clone codex-builder somewhere accessible
git clone https://github.com/ryan-voitiskis/codex-builder.git

# 2. Create or navigate to your target project
mkdir my-project && cd my-project

# 3. Start Claude Code and feed it the first prompt
claude

# Inside Claude Code, tell it to read the init prompt:
> Read /path/to/codex-builder/prompts/00-init.md and follow its instructions

Each phase prompt tells Claude Code what to do. Run them in order — each reads the outputs of previous phases. The generated codex lives entirely in your target project:

my-project/
├── docs/{corpus-name}/       # Transcribed markdown (the codex)
├── source-material/          # Raw PDFs, docs (gitignored)
├── codex-config.yaml         # Your codex settings
├── codex-state.yaml          # Progress tracker
├── manifest.yaml             # Section-level index for agent retrieval
└── validate.sh               # Validation script (copied from templates)

Workflow

Each phase reads the outputs of previous phases and produces inputs for the next.

00 init → 01 gather → 02 plan → 03 transcribe → 04 verify → 05 map → 06 review

Phase 00 — Initialization (`prompts/00-init.md`)

Guided conversation to set up the project. Gathers domain info, creates directory structure, generates codex-config.yaml.

Input: User answers
Output: codex-config.yaml, codex-state.yaml, directory structure

Phase 01 — Source Gathering (`prompts/01-source-gathering.md`)

Discovers and catalogs all source material. Helps fetch public resources, guides manual download of gated content.

Input: codex-config.yaml
Output: sources.yaml, populated source-material/

Phase 02 — Structure Planning (`prompts/02-structure-plan.md`)

Analyzes source structure (PDF TOCs, web sitemaps) and plans how to split material into transcribable chunks.

Input: sources.yaml, source PDFs
Output: chunking-plan.yaml
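To make the planning step more concrete, here is a minimal sketch of turning a PDF's bookmark table into page ranges. It is not the logic of `scripts/pdf-toc-extract.py` or of the phase prompt. `read_toc` uses PyMuPDF's `Document.get_toc()`, which returns rows of `[level, title, page]`; `plan_chunks` and the output field names are hypothetical, and splitting at one nesting level is an assumed strategy.

```python
def read_toc(pdf_path):
    """Return (toc, page_count) using PyMuPDF; toc rows are [level, title, page]."""
    import fitz  # PyMuPDF; imported lazily so plan_chunks works without it

    doc = fitz.open(pdf_path)
    try:
        return doc.get_toc(), doc.page_count
    finally:
        doc.close()


def plan_chunks(toc, page_count, level=1):
    """Turn TOC rows at one nesting level into contiguous 1-based page ranges."""
    starts = [(title, page) for lvl, title, page in toc if lvl == level]
    chunks = []
    for i, (title, first) in enumerate(starts):
        # A chunk ends where the next same-level entry begins, or at the last page.
        last = starts[i + 1][1] - 1 if i + 1 < len(starts) else page_count
        # Two entries can start on the same page; never let a range run backwards.
        chunks.append({"title": title, "first_page": first, "last_page": max(first, last)})
    return chunks
```

Keeping the PDF access separate from the range calculation means the planner can be exercised with a hand-written TOC when no PDF is available.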

Phase 03 — Transcription (`prompts/03-transcription.md`)

Orchestrates parallel subagents to transcribe each chunk into markdown with YAML frontmatter. Tracks progress for resumability.

Input: chunking-plan.yaml, source material
Output: docs/{corpus}/ with markdown files
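One plausible reading of how a batch size limits parallelism is that chunks are handed to subagents in fixed-size groups. The sketch below shows that idea only; the function name `batches` is hypothetical and the actual orchestration is whatever the phase prompt instructs Claude Code to do.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]
```

With five chunks and a batch size of two, this yields groups of two, two, and one.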

Phase 04 — Verification (`prompts/04-verification.md`)

Systematic accuracy checks against source material. Configurable depth: exhaustive, sampling, or hybrid.

Input: Transcribed docs, source material
Output: Confidence levels, fixes applied
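The three depth settings could translate into a document selection along the following lines. This is a sketch under stated assumptions: the meaning given to each depth (especially `hybrid`), the `sample_rate` and `priority` parameters, and the function name are all invented for illustration and are not taken from the verification prompt.

```python
import random


def select_for_verification(doc_ids, depth, sample_rate=0.2, priority=(), seed=0):
    """Pick which documents to check for a given depth setting.

    Assumed semantics (not taken from the prompts):
      exhaustive - every document
      sampling   - a reproducible random subset
      hybrid     - every priority document plus a random subset of the rest
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible

    def sample(pool):
        if not pool:
            return []
        count = max(1, round(len(pool) * sample_rate))
        return rng.sample(pool, count)

    if depth == "exhaustive":
        return list(doc_ids)
    if depth == "sampling":
        return sorted(sample(list(doc_ids)))
    if depth == "hybrid":
        must = [d for d in doc_ids if d in priority]
        rest = [d for d in doc_ids if d not in priority]
        return must + sorted(sample(rest))
    raise ValueError(f"unknown depth: {depth!r}")
```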

Phase 05 — Mapping (`prompts/05-mapping.md`)

Finalizes the controlled vocabulary (topics were tagged freely during transcription, now normalized). Generates the manifest with section-level entries. Adds cross-references.

Input: Verified docs
Output: manifest.yaml, normalized vocabulary, cross-references

Phase 06 — Final Review (`prompts/06-final-review.md`)

Last-pass validation, spot-checks, manifest integrity verification, and README generation for the codex itself.

Input: Mapped codex
Output: Validated codex, README, completion status

Project Structure

codex-builder/
├── prompts/                    # Phase prompt files — the core of the tool
│   ├── 00-init.md
│   ├── 01-source-gathering.md
│   ├── 02-structure-plan.md
│   ├── 03-transcription.md
│   ├── 04-verification.md
│   ├── 05-mapping.md
│   └── 06-final-review.md
├── templates/                  # Reusable templates copied into target projects
│   ├── codex-config.yaml       # Config template (single source of truth)
│   ├── manifest-template.yaml  # Empty manifest skeleton
│   └── validate.sh             # Bash validation script (no dependencies)
├── scripts/                    # Python helper scripts
│   ├── pdf-toc-extract.py      # PDF bookmark/TOC extraction
│   └── verify-transcription.py # Automated verification
├── examples/                   # Real-world example configs
│   └── rekordbox-codex-config.yaml
└── README.md

Customization

The entire process is driven by codex-config.yaml. Key customization points:

vocabulary — Document types, modes, and topics specific to your domain
content_types — Toggle rules for prose, tables, code, formulas, images
verification.depth — Trade off speed vs. confidence
transcription.batch_size — Control parallelism based on your resources
screenshot_notation — Format for image placeholders

Key Design Decisions

Topics evolve organically. During transcription (phase 03), agents tag freely with whatever terms fit. Phase 05 collects all tags, analyzes frequency/overlap, proposes a normalized vocabulary, and runs a fixup pass. This avoids premature taxonomy design.
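The collect, count, and fixup steps might look roughly like this. The sketch assumes that normalization only merges case and separator variants and that rarely used tags are set aside for review; the real phase 05 analysis is performed by the agent and may be far less mechanical. All function names and the `min_count` threshold are hypothetical.

```python
from collections import Counter
import re


def canonical(tag):
    """Collapse case and separator variants: 'Beat Grid' -> 'beat-grid'."""
    return re.sub(r"[\s_\-]+", "-", tag.strip().lower())


def propose_vocabulary(doc_tags, min_count=2):
    """Count canonical tags across documents and split them by frequency."""
    counts = Counter(canonical(t) for tags in doc_tags.values() for t in tags)
    keep = sorted(t for t, n in counts.items() if n >= min_count)
    review = sorted(t for t, n in counts.items() if n < min_count)
    return keep, review


def fixup(doc_tags):
    """Rewrite every document's tags to canonical form, dropping duplicates."""
    return {doc: sorted({canonical(t) for t in tags}) for doc, tags in doc_tags.items()}
```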

Section-level manifest entries. The manifest doesn't just list documents — it indexes every ## heading with line numbers. Agents can jump directly to relevant sections without reading entire files.
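Indexing headings by line number can be sketched as a single pass over a markdown file. The entry fields (`heading`, `start_line`, `end_line`) and the rule that a section runs until the next `##` heading are assumptions for this example, not the documented manifest format.

```python
FENCE = "`" * 3  # code-fence marker, built this way to keep the example readable


def index_sections(markdown_text):
    """Return one entry per '## ' heading with 1-indexed start and end lines."""
    lines = markdown_text.splitlines()
    sections = []
    in_fence = False
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # ignore '## ' that appears inside code fences
            continue
        if not in_fence and line.startswith("## "):
            if sections:
                # The previous section ends on the line before this heading.
                sections[-1]["end_line"] = number - 1
            sections.append(
                {"heading": line[3:].strip(), "start_line": number, "end_line": len(lines)}
            )
    return sections
```

Line numbers recorded this way go stale whenever a file is edited, so an index like this has to be regenerated after any change to the transcribed documents.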

Resumable progress tracking. codex-state.yaml tracks per-document status (pending → transcribed → verified → mapped → reviewed). If a phase is interrupted, it picks up where it left off.
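The status sequence lends itself to a simple ordered lookup. The sketch below assumes the state is a flat mapping from document name to status, which is a simplification; the real codex-state.yaml layout is not shown here and the helper names are hypothetical.

```python
STATUSES = ["pending", "transcribed", "verified", "mapped", "reviewed"]


def next_status(status):
    """Return the status that follows, or None once a document is reviewed."""
    position = STATUSES.index(status)
    return STATUSES[position + 1] if position + 1 < len(STATUSES) else None


def remaining(state, target):
    """List documents that have not yet reached the target status."""
    goal = STATUSES.index(target)
    return [doc for doc, status in state.items() if STATUSES.index(status) < goal]
```

A resumed phase would then process only the documents returned by `remaining` and skip everything already at or past its target status.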

No heavy dependencies for validation. validate.sh uses only bash, awk, and grep — it runs anywhere without Python or jq.
