LexBuild is an open-source toolchain for U.S. legal texts. It transforms official source XML into structured Markdown with rich metadata, optimized for LLMs, RAG pipelines, and semantic search.
The United States Code is the official codification of federal statutory law, organized into 54 titles. It is available as USLM XML from the Office of the Law Revision Counsel (OLRC).
The Code of Federal Regulations (CFR) is the official codification of federal administrative regulations, organized into 50 titles. The Electronic Code of Federal Regulations (eCFR) is a continuously updated editorial compilation incorporating changes as they appear in the Federal Register. eCFR XML is available from the ecfr.gov API (daily-updated) and GovInfo (bulk data).
Both formats are dense and deeply nested, making them difficult to work with directly.
LexBuild transforms this XML into per-section Markdown files with YAML frontmatter, predictable file paths, and content sized for typical embedding model context windows, making the full corpus of federal law and regulations accessible to LLMs, vector databases, and legal research tools.
| Source | Package | XML Format | Titles | Status |
|---|---|---|---|---|
| U.S. Code | @lexbuild/usc | USLM 1.0 | 54 | Stable |
| eCFR (Code of Federal Regulations) | @lexbuild/ecfr | GPO/SGML | 50 | Stable |
| Annual CFR (official edition) | @lexbuild/cfr | GPO/SGML | 50 | Planned |
| Federal Register | @lexbuild/fr | GPO/SGML variant | — | Planned |
| State statutes | @lexbuild/state-* | Varies | — | Exploratory |
| Source | Download From | Update Frequency | Notes |
|---|---|---|---|
| U.S. Code | uscode.house.gov (OLRC) | Multiple times/month | Release point auto-detected from OLRC download page |
| eCFR (default) | ecfr.gov API | Daily | Point-in-time support via --date flag |
| eCFR (fallback) | govinfo.gov | Irregular | Bulk XML, updates per-title as regulations change |
npx @lexbuild/cli download-usc --all
npx @lexbuild/cli convert-usc --all

npm install -g @lexbuild/cli
# or
pnpm add -g @lexbuild/cli

Requires Node.js >= 22 and pnpm >= 10.
git clone https://github.com/chris-c-thomas/LexBuild.git
cd LexBuild
pnpm install && pnpm turbo build

# Download and convert all 54 titles
lexbuild download-usc --all && lexbuild convert-usc --all
# Start small — a single title
lexbuild download-usc --titles 1 && lexbuild convert-usc --titles 1
# A range of titles
lexbuild download-usc --titles 1-5 && lexbuild convert-usc --titles 1-5

# Download and convert all 50 titles
lexbuild download-ecfr --all && lexbuild convert-ecfr --all
# A single title
lexbuild download-ecfr --titles 17 && lexbuild convert-ecfr --titles 17
# Point-in-time download (CFR as of a specific date)
lexbuild download-ecfr --all --date 2025-01-01

Fetch U.S. Code XML from the OLRC. Auto-detects the latest release point.
lexbuild download-usc --all # All 54 titles
lexbuild download-usc --titles 1-5,8,11 # Specific titles
lexbuild download-usc --all --release-point 119-73not60 # Pin a release

| Option | Default | Description |
|---|---|---|
| --titles <spec> | — | Title(s): 1, 1-5, 1-5,8,11 |
| --all | — | Download all 54 titles (single bulk zip) |
| -o, --output <dir> | ./downloads/usc/xml | Output directory |
| --release-point <id> | auto-detected | Pin a specific OLRC release point |
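
The `--titles` spec accepts single numbers, ranges, and comma-separated mixes of both. As a rough illustration of how such a spec expands, here is a minimal Python sketch. The function name `parse_title_spec` and its validation rules are assumptions made for this example; it is not LexBuild's actual parser.

```python
def parse_title_spec(spec: str, max_title: int = 54) -> list[int]:
    """Expand a spec such as "1-5,8,11" into a sorted list of title numbers.

    Hypothetical helper: the accepted grammar (numbers, ranges, commas) is
    taken from the option table; the error handling is an assumption.
    """
    titles: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in title spec: {spec!r}")
        if "-" in part:
            # A range such as "1-5" is inclusive on both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
        else:
            start = end = int(part)
        if start < 1 or end > max_title:
            raise ValueError(f"title out of range 1-{max_title}: {part!r}")
        titles.update(range(start, end + 1))
    return sorted(titles)


print(parse_title_spec("1-5,8,11"))  # [1, 2, 3, 4, 5, 8, 11]
```

For eCFR specs, pass `max_title=50` to match the 50 CFR titles.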
Convert downloaded USC XML to Markdown.
lexbuild convert-usc --all # All downloaded titles
lexbuild convert-usc --titles 1 -g chapter # Chapter-level output
lexbuild convert-usc --titles 26 --dry-run # Preview without writing
lexbuild convert-usc ./downloads/usc/xml/usc01.xml # Direct file path

| Option | Default | Description |
|---|---|---|
| --titles <spec> | — | Title(s) to convert |
| --all | — | Convert all titles in input directory |
| -i, --input-dir <dir> | ./downloads/usc/xml | Input XML directory |
| -o, --output <dir> | ./output | Output directory |
| -g, --granularity | section | section, chapter, or title |
| --link-style | plaintext | plaintext, canonical, or relative |
| --no-include-source-credits | — | Exclude source credits |
| --no-include-notes | — | Exclude all notes |
| --include-editorial-notes | — | Include editorial notes only |
| --include-statutory-notes | — | Include statutory notes only |
| --include-amendments | — | Include amendment notes only |
| --dry-run | — | Parse and report without writing |
| -v, --verbose | — | Verbose output |
Fetch eCFR XML. Defaults to the ecfr.gov API (daily-updated); govinfo bulk data available as fallback.
lexbuild download-ecfr --all # All 50 titles (eCFR API)
lexbuild download-ecfr --titles 1-5,17 # Specific titles
lexbuild download-ecfr --all --date 2025-01-01 # Point-in-time download
lexbuild download-ecfr --all --source govinfo # Govinfo bulk fallback

| Option | Default | Description |
|---|---|---|
| --titles <spec> | — | Title(s): 1, 1-5, 1-5,17 |
| --all | — | Download all 50 titles |
| -o, --output <dir> | ./downloads/ecfr/xml | Output directory |
| --source | ecfr-api | ecfr-api (daily-updated) or govinfo (bulk) |
| --date <YYYY-MM-DD> | current | Point-in-time date (ecfr-api only) |
Convert downloaded eCFR XML to Markdown.
lexbuild convert-ecfr --all # All downloaded titles
lexbuild convert-ecfr --titles 17 -g part # Part-level output
lexbuild convert-ecfr --all --dry-run # Preview without writing
lexbuild convert-ecfr ./downloads/ecfr/xml/ECFR-title17.xml # Direct file path

| Option | Default | Description |
|---|---|---|
| --titles <spec> | — | Title(s) to convert |
| --all | — | Convert all titles in input directory |
| -i, --input-dir <dir> | ./downloads/ecfr/xml | Input XML directory |
| -o, --output <dir> | ./output | Output directory |
| -g, --granularity | section | section, part, chapter, or title |
| --link-style | plaintext | plaintext, canonical, or relative |
| --no-include-source-credits | — | Exclude source credits |
| --no-include-notes | — | Exclude all notes |
| --include-editorial-notes | — | Include editorial/regulatory notes only |
| --include-statutory-notes | — | Include statutory notes only |
| --include-amendments | — | Include amendment notes only |
| --dry-run | — | Parse and report without writing |
| -v, --verbose | — | Verbose output |
U.S. Code (-g section, default):
output/usc/
title-01/
README.md
_meta.json
chapter-01/
_meta.json
section-1.md
section-2.md
eCFR (-g section, default):
output/ecfr/
title-17/
README.md
_meta.json
chapter-IV/
part-240/
_meta.json
section-240.10b-5.md
All granularity levels:
| Source | section | chapter/part | title |
|---|---|---|---|
| USC | title-01/chapter-01/section-1.md | title-01/chapter-01/chapter-01.md | title-01.md |
| eCFR | title-17/chapter-IV/part-240/section-240.10b-5.md | title-17/chapter-IV/part-240.md | title-17.md |
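
Because the paths are predictable, a downstream tool can compute where a file should live without scanning the output tree. The Python sketch below reproduces the patterns in the table. The function names `usc_path` and `ecfr_path` are hypothetical, and the two-digit zero padding is inferred from the examples (`title-01`, `chapter-01`) rather than from LexBuild's source. eCFR chapter-level output is left out because the table does not show its path.

```python
from pathlib import PurePosixPath


def usc_path(title: int, chapter: int, section: str, granularity: str = "section") -> str:
    """Relative output path for a U.S. Code unit (pattern inferred from the table)."""
    title_dir = f"title-{title:02d}"      # assumption: zero-padded to two digits
    chapter_dir = f"chapter-{chapter:02d}"
    if granularity == "section":
        return str(PurePosixPath(title_dir, chapter_dir, f"section-{section}.md"))
    if granularity == "chapter":
        return str(PurePosixPath(title_dir, chapter_dir, f"{chapter_dir}.md"))
    if granularity == "title":
        return f"{title_dir}.md"
    raise ValueError(f"unsupported granularity: {granularity!r}")


def ecfr_path(title: int, chapter: str, part: str, section: str, granularity: str = "section") -> str:
    """Relative output path for an eCFR unit (pattern inferred from the table)."""
    title_dir = f"title-{title:02d}"
    chapter_dir = f"chapter-{chapter}"    # eCFR chapters use Roman numerals, e.g. "IV"
    if granularity == "section":
        return str(PurePosixPath(title_dir, chapter_dir, f"part-{part}", f"section-{section}.md"))
    if granularity == "part":
        return str(PurePosixPath(title_dir, chapter_dir, f"part-{part}.md"))
    if granularity == "title":
        return f"{title_dir}.md"
    raise ValueError(f"unsupported granularity: {granularity!r}")


print(usc_path(1, 1, "1"))
print(ecfr_path(17, "IV", "240", "240.10b-5"))
```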
Every Markdown file includes YAML frontmatter with source-specific metadata:
U.S. Code:
---
identifier: "/us/usc/t1/s7"
source: "usc"
legal_status: "official_legal_evidence"
title: "1 USC § 7 - Marriage"
title_number: 1
title_name: "GENERAL PROVISIONS"
section_number: "7"
section_name: "Marriage"
chapter_number: 1
chapter_name: "RULES OF CONSTRUCTION"
positive_law: true
currency: "119-73"
last_updated: "2025-12-03"
format_version: "1.1.0"
generator: "lexbuild@1.9.3"
source_credit: "(Added Pub. L. 104-199, § 3(a), Sept. 21, 1996, ...)"
---

eCFR:
---
identifier: "/us/cfr/t17/s240.10b-5"
source: "ecfr"
legal_status: "authoritative_unofficial"
title: "17 CFR § 240.10b-5 - Employment of manipulative and deceptive devices"
title_number: 17
section_number: "240.10b-5"
positive_law: false
authority: "15 U.S.C. 78a et seq., ..."
cfr_part: "240"
---

The source field identifies where the content came from ("usc" or "ecfr"). The legal_status field indicates provenance: "official_legal_evidence" (positive law USC titles), "official_prima_facie" (non-positive law USC titles), or "authoritative_unofficial" (eCFR).
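
A consumer that wants to filter on these fields first has to split the frontmatter from the body. The sketch below does that with the Python standard library only. It handles just the flat `key: value` scalars that appear in the examples above (quoted strings, integers, booleans) and is not a general YAML parser; for real use, a proper YAML library such as PyYAML's `yaml.safe_load` would be the safer choice. The helper names `parse_frontmatter` and `is_official` are hypothetical.

```python
OFFICIAL_STATUSES = {"official_legal_evidence", "official_prima_facie"}


def parse_frontmatter(text: str) -> tuple[dict[str, object], str]:
    """Split a Markdown document into (metadata, body).

    Assumption: frontmatter is a block of flat `key: value` lines between
    two `---` lines, as in the examples in this README.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text
    meta: dict[str, object] = {}
    for line in lines[1:end]:
        key, sep, raw = line.partition(":")
        if not sep:
            continue  # skip anything that is not key: value
        raw = raw.strip()
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            meta[key.strip()] = raw[1:-1]      # quoted string (escapes not handled)
        elif raw in ("true", "false"):
            meta[key.strip()] = raw == "true"  # boolean
        elif raw.isdigit():
            meta[key.strip()] = int(raw)       # integer
        else:
            meta[key.strip()] = raw
    return meta, "\n".join(lines[end + 1:])


def is_official(meta: dict[str, object]) -> bool:
    """True for the two USC statuses, False for "authoritative_unofficial"."""
    return meta.get("legal_status") in OFFICIAL_STATUSES


SAMPLE = """---
identifier: "/us/usc/t1/s7"
source: "usc"
legal_status: "official_legal_evidence"
title_number: 1
positive_law: true
---
(section text)
"""

meta, body = parse_frontmatter(SAMPLE)
print(meta["identifier"], is_official(meta))
```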
Each directory includes a _meta.json sidecar file for programmatic access without parsing Markdown:
{
"format_version": "1.1.0",
"identifier": "/us/usc/t5",
"title_number": 5,
"title_name": "Government Organization and Employees",
"stats": {
"chapter_count": 63,
"section_count": 1162,
"total_tokens_estimate": 2207855
},
"chapters": [
{
"identifier": "/us/usc/t5/ptI/ch1",
"number": 1,
"name": "Organization",
"directory": "chapter-01",
"sections": [
{
"identifier": "/us/usc/t5/s101",
"number": "101",
"name": "Executive departments",
"file": "section-101.md",
"token_estimate": 4200,
"has_notes": true,
"status": "current"
}
]
}
]
}

| Corpus | Titles | Sections | Est. Tokens | Conversion Time |
|---|---|---|---|---|
| U.S. Code | 54 | ~60,000 | ~85M | ~20–30s |
| eCFR | 49 (excl. reserved) | ~227,000 | ~350M | ~60–90s |
| Combined | 103 | ~287,000 | ~435M | ~2 min |
SAX streaming keeps memory bounded for even the largest titles (100MB+ XML). Conversion is CPU-bound — no network I/O during the convert step.
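
To illustrate why streaming keeps memory bounded: a SAX parser delivers elements as events and never builds the whole document tree, so the handler holds only the state it chooses to keep. The sketch below uses Python's standard-library `xml.sax` as a stand-in. It is not LexBuild's parser, and the tiny XML document is synthetic.

```python
import io
import xml.sax


class SectionCounter(xml.sax.ContentHandler):
    """Counts elements with a given tag and tracks nesting depth.

    Only a few integers are kept in memory, no matter how large the input is.
    """

    def __init__(self, tag: str = "section") -> None:
        super().__init__()
        self.tag = tag
        self.count = 0
        self.depth = 0
        self.max_depth = 0

    def startElement(self, name, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if name == self.tag:
            self.count += 1

    def endElement(self, name):
        self.depth -= 1


def count_sections(stream, tag: str = "section") -> tuple[int, int]:
    """Stream-parse XML and return (matching element count, maximum depth)."""
    handler = SectionCounter(tag)
    xml.sax.parse(stream, handler)  # accepts a file path or a byte stream
    return handler.count, handler.max_depth


# Synthetic input; real USLM or eCFR files are far larger and more deeply nested.
sample = io.BytesIO(b"<title><chapter><section/><section/></chapter></title>")
print(count_sections(sample))  # (2, 3)
```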
LexBuild is a monorepo managed with pnpm workspaces and Turborepo.
lexbuild/
├── packages/
│ ├── core/ # @lexbuild/core — XML parsing, AST, Markdown rendering
│ ├── usc/ # @lexbuild/usc — U.S. Code converter and downloader
│ ├── ecfr/ # @lexbuild/ecfr — eCFR converter and downloader
│ └── cli/ # @lexbuild/cli — CLI binary
├── apps/
│ └── astro/ # LexBuild web app (lexbuild.dev)
├── fixtures/ # Test fixtures (synthetic XML + expected output snapshots)
├── reference/ # GPO/OLRC XML schema reference guides
├── turbo.json
└── pnpm-workspace.yaml
@lexbuild/cli
├── @lexbuild/usc
│ └── @lexbuild/core
├── @lexbuild/ecfr
│ └── @lexbuild/core
└── @lexbuild/core
apps/astro (no code deps — consumes output only)
Source packages are independent — @lexbuild/usc and @lexbuild/ecfr never import from each other. Future source packages follow the same pattern.
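
The independence rule can be stated as a check on the dependency graph. The following Python sketch encodes the tree shown above as a dictionary and reports any source package that depends on another source package. The function name `cross_source_imports` is hypothetical, and the graph is transcribed by hand from this README rather than read from the workspace's `package.json` files.

```python
# Dependency graph transcribed from the tree above (hand-written assumption).
DEPENDENCIES = {
    "@lexbuild/cli": {"@lexbuild/usc", "@lexbuild/ecfr", "@lexbuild/core"},
    "@lexbuild/usc": {"@lexbuild/core"},
    "@lexbuild/ecfr": {"@lexbuild/core"},
    "@lexbuild/core": set(),
}
SOURCE_PACKAGES = {"@lexbuild/usc", "@lexbuild/ecfr"}


def cross_source_imports(deps: dict[str, set[str]], sources: set[str]) -> list[tuple[str, str]]:
    """Return (package, dependency) pairs where one source package depends on another."""
    return sorted(
        (pkg, dep)
        for pkg in sources
        for dep in deps.get(pkg, set())
        if dep in sources and dep != pkg
    )


violations = cross_source_imports(DEPENDENCIES, SOURCE_PACKAGES)
print(violations)  # [] when the rule holds
```

A future source package would be added to both `DEPENDENCIES` and `SOURCE_PACKAGES`, and the same check would apply unchanged.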
All internal dependencies use pnpm's workspace:* protocol. Changesets manages lockstep versioning across all published packages.
| Package | Description |
|---|---|
| @lexbuild/cli | CLI binary: download and convert legal XML |
| @lexbuild/core | Shared XML parsing, AST, Markdown rendering |
| @lexbuild/usc | U.S. Code (USLM XML) converter and downloader |
| @lexbuild/ecfr | eCFR converter and downloader (ecfr.gov API + govinfo) |
Each package has its own README with full API documentation.
A server-rendered legal content browser built with Astro 6, React 19, Tailwind CSS 4, and shadcn/ui.
- 260,000+ section pages across U.S. Code and eCFR
- Four granularity levels — title, chapter, part (eCFR), section
- Syntax-highlighted source and rendered HTML preview
- Sidebar navigation with virtualized section lists
- Full-text search via Meilisearch
- Dark mode with system preference detection
- Zero client JS by default — interactive React islands only where needed
The web app consumes LexBuild's output (.md files and _meta.json sidecars) and has no code dependency on the conversion packages.
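
Any other consumer can work the same way. As a rough sketch of output-only consumption, the Python below walks a title-level `_meta.json` structure (same shape as the sidecar example earlier in this README) and lists each section's relative path, name, and token estimate. The function name `list_sections` is hypothetical, and the sample is a trimmed copy of that earlier example. This is not the web app's code, which is written for Astro.

```python
import json


def list_sections(meta: dict) -> list[tuple[str, str, int]]:
    """Return (relative path, name, token estimate) for every section in a title sidecar."""
    rows = []
    for chapter in meta.get("chapters", []):
        for section in chapter.get("sections", []):
            # Paths are relative to the title directory that holds _meta.json.
            path = f'{chapter["directory"]}/{section["file"]}'
            rows.append((path, section["name"], section["token_estimate"]))
    return rows


# Trimmed copy of the sidecar example; on disk this would be loaded with
# json.loads(Path(...).read_text()) from the title directory.
SAMPLE_META = json.loads("""
{
  "identifier": "/us/usc/t5",
  "chapters": [
    {
      "directory": "chapter-01",
      "sections": [
        {"name": "Executive departments", "file": "section-101.md", "token_estimate": 4200}
      ]
    }
  ]
}
""")

for path, name, tokens in list_sections(SAMPLE_META):
    print(path, name, tokens)
```

Summing the third field gives a per-title token total that can be used to budget embedding or indexing jobs.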
See apps/astro/README.md for setup and development instructions.
git clone https://github.com/chris-c-thomas/LexBuild.git
cd LexBuild
pnpm install
pnpm turbo build

pnpm turbo build # Build all packages
pnpm turbo test # Run all tests
pnpm turbo lint # Lint all packages
pnpm turbo typecheck # Type-check all packages
pnpm turbo dev # Watch mode

pnpm turbo build --filter=@lexbuild/core
pnpm turbo test --filter=@lexbuild/ecfr
# Run the CLI locally
node packages/cli/dist/index.js download-usc --titles 1
node packages/cli/dist/index.js convert-usc --titles 1
node packages/cli/dist/index.js download-ecfr --titles 17
node packages/cli/dist/index.js convert-ecfr --titles 17

# Build packages first
pnpm turbo build
# Download and convert some content
node packages/cli/dist/index.js download-usc --titles 1 && node packages/cli/dist/index.js convert-usc --titles 1
node packages/cli/dist/index.js download-ecfr --titles 1 && node packages/cli/dist/index.js convert-ecfr --titles 1
# Set up the web app
cd apps/astro
bash scripts/link-content.sh
npx tsx scripts/generate-nav.ts
pnpm dev

Contributions are welcome. Please see CONTRIBUTING.md.