Status: partial implementation.

- `sync-settings.js` shipped 2026-04-28.
- `sync-env-vars.js` shipped 2026-04-29.
- `sync-hooks.js` shipped 2026-04-29 (pass #1: lifecycle table) and 2026-05-08 (pass #2: handler-field tables, common-input fields, per-event input/output schemas).
- `sync-sub-agents.js` shipped 2026-05-07 (supported-frontmatter-fields table only).
- `sync-mcp.js` shipped 2026-05-07 (installation-scopes table only; transport types, `managed-mcp.json` semantics, and tool-search threshold values are not yet captured).
- `sync-permissions.js` shipped 2026-05-07 (permission-modes table only; rule-syntax, path-pattern, and managed-only-settings tables are not yet captured).
- `sync-keybindings.js` and `sync-cli-reference.js` shipped 2026-05-08.
- `sync-model-config.js` shipped 2026-05-22 (effort-levels table only; model aliases, model-support matrix, and extended-thinking/context controls are not yet captured).

Open work (new scripts, the `read_catalog` wire-up, and the open questions at the bottom of this doc) is tracked in `roadmap.md`.
Every Claude Code config surface (settings, env vars, hooks, MCP, …) has an authoritative upstream representation — usually a docs page, sometimes a JSON Schema. For each surface we want a small, idempotent script that pulls that upstream form and reshapes it into a flat JSON file the desktop app consumes via its planned read_catalog Tauri command.
This spec replaces a deliberately heavier earlier draft (5-phase harness, snapshot files, RSS-driven cadence, diff/report artifacts). That elaborate design isn't worth the carry cost while the project is pre-release. Git is a perfectly good diff tool; PR review is a perfectly good change report.
- One script per upstream source. Independent, runnable solo, easy to delete or replace.
- Stdlib only where possible. Node's built-in `fetch`, `node:test`, and `--experimental-test-coverage` cover the work. No new deps unless a parser genuinely needs one.
- The output is the contract. Each script produces `catalog/<source>.json` with a flat array of records under a small envelope (`{ source, fetchedAt, count, <records> }`).
- Prose fields are markdown. User-facing description fields (`description`, `purpose`, etc.) are GitHub-flavoured Markdown: `[text](url)` links, bare URLs (autolink literals), inline code, and emphasis are passed through verbatim from upstream and rendered as markdown by the inspector. Site-relative links (`[…](/en/…)`) resolve against the docs root in `source`. Sync scripts must not strip backticks or brackets from these fields, nor HTML-decode them.
- Idempotent. Re-running on unchanged upstream produces a one-line diff (`fetchedAt` only). Records are sorted by a stable key.
- Reshape on the way in, not on the way out. Flatten nested schemas to dotted-key rows and drop fields the consumer doesn't use. The committed catalog should be ergonomic for the UI even if the upstream form isn't.
- Provenance is part of the data. The `source` URL and `fetchedAt` timestamp ride with every catalog file.
- Git is the diff tool. No watermarks, no snapshots, no `verify.md` artifact. PR review surfaces drift.
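The envelope and idempotence rules above can be sketched as one shared helper. This is a minimal illustration, not code from the repo: `compareByKey`, `buildEnvelope`, and `serialiseEnvelope` are hypothetical names, and the records key is a parameter because it varies per catalog (for example `events` in `hooks.json`). Sorting uses plain code-unit comparison rather than `localeCompare`, so the order cannot shift with the ICU data of whichever machine runs the sync.

```javascript
// Hypothetical shared helpers (not the repo's actual code): sort records by a
// stable key and wrap them in the { source, fetchedAt, count, <records> } envelope.
function compareByKey(key) {
  // Code-unit comparison: deterministic on every machine, unlike localeCompare.
  return (a, b) => {
    const x = String(a[key]);
    const y = String(b[key]);
    return x < y ? -1 : x > y ? 1 : 0;
  };
}

function buildEnvelope({ source, records, recordsKey, sortKey, now = new Date() }) {
  const sorted = [...records].sort(compareByKey(sortKey)); // caller's array is left untouched
  return {
    source,
    fetchedAt: now.toISOString(),
    count: sorted.length,
    [recordsKey]: sorted,
  };
}

// Two-space indent plus a trailing newline keeps git diffs line-oriented.
function serialiseEnvelope(envelope) {
  return JSON.stringify(envelope, null, 2) + "\n";
}
```

A script would then finish with something like `fs.writeFileSync("catalog/env-vars.json", serialiseEnvelope(envelope))`; on unchanged upstream, only the `fetchedAt` line differs between runs.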
```
scripts/
├── sync-settings.js # implemented
├── sync-settings.test.js # 15 tests, ~93% branch coverage
├── sync-env-vars.js # implemented
├── sync-env-vars.test.js
├── sync-hooks.js # implemented
├── sync-hooks.test.js
├── sync-sub-agents.js # implemented
├── sync-sub-agents.test.js
├── sync-mcp.js # implemented
├── sync-mcp.test.js
├── sync-permissions.js # implemented
└── sync-permissions.test.js
catalog/
├── settings.json # written by sync-settings.js
├── env-vars.json # written by sync-env-vars.js
├── hooks.json # written by sync-hooks.js
├── sub-agents.json # written by sync-sub-agents.js
├── mcp.json # written by sync-mcp.js
└── permissions.json # written by sync-permissions.js
```
| | |
|---|---|
| Source | https://json.schemastore.org/claude-code-settings.json |
| Output | `catalog/settings.json` (~155 entries) |
| Script | `scripts/sync-settings.js` |
| Run | `npm run sync:settings` |
Why the JSON Schema, not settings.md: the schemastore document has explicit type, default, enum, description, and recursive properties for object-typed settings (permissions, sandbox, statusLine, …). The markdown page buries those in a 3-column Key | Description | Example table where defaults and enum values live in prose and types must be inferred from the example. The schema is a higher-fidelity source for the same data.
Output shape per record:
```json
{
  "key": "permissions.defaultMode",
  "type": "string",
  "enum": ["acceptEdits", "bypassPermissions", "default", "delegate", "dontAsk", "plan", "auto"],
  "description": "..."
}
```

Nested object schemas are flattened: `permissions` and `permissions.defaultMode` are sibling rows. A small allowlist of fields (`type`, `const`, `enum`, `default`, `minimum`, `maximum`, `pattern`, `examples`, `description`, `$ref`, `anyOf`/`oneOf`/`allOf`, plus a recursive summary of `items` for arrays) is preserved; everything else is dropped to keep upstream JSON Schema metadata churn out of the catalog.
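A sketch of the flattening rule, assuming child settings are nested under `properties` as in standard JSON Schema. `KEPT_FIELDS`, `pickFields`, and `flattenSchema` are hypothetical names; the allowlist is copied from the paragraph above. `anyOf`/`oneOf`/`allOf` values are kept as-is rather than recursively filtered, which is one reasonable choice among several.

```javascript
// Hypothetical flattener for the settings catalog. The allowlist mirrors the
// prose; every other keyword in the upstream schema is dropped.
const KEPT_FIELDS = [
  "type", "const", "enum", "default", "minimum", "maximum", "pattern",
  "examples", "description", "$ref", "anyOf", "oneOf", "allOf",
];

function pickFields(schema) {
  const out = {};
  for (const field of KEPT_FIELDS) {
    if (schema[field] !== undefined) out[field] = schema[field];
  }
  // Arrays keep a recursive, equally filtered summary of their item schema.
  if (schema.items && typeof schema.items === "object") {
    out.items = pickFields(schema.items);
  }
  return out;
}

// Depth-first walk over `properties`: one row per key with a dotted path,
// so `permissions` and `permissions.defaultMode` become sibling rows.
function flattenSchema(schema, prefix = "") {
  const rows = [];
  for (const [name, child] of Object.entries(schema.properties ?? {})) {
    const key = prefix ? `${prefix}.${name}` : name;
    rows.push({ key, ...pickFields(child) });
    if (child.properties) rows.push(...flattenSchema(child, key));
  }
  return rows;
}
```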
| | |
|---|---|
| Source | https://code.claude.com/docs/en/env-vars.md |
| Output | `catalog/env-vars.json` (~215 entries) |
| Script | `scripts/sync-env-vars.js` |
| Run | `npm run sync:env-vars` |
No JSON Schema sibling exists, so the script parses the markdown directly. The page is structurally simple: one 2-column table (Variable | Purpose) covering all variables. Defaults, ranges, and constraints are not in a dedicated column — they're embedded as prose inside Purpose (default: 600000, or 10 minutes; maximum: 2147483647).
Approach:
- `fetch()` the `.md` URL.
- Parse the markdown table. Hand-rolled is fine: the page is one table with backtick-wrapped names in column 1 and free-form prose in column 2. Reach for a markdown parser only if the page structure changes.
- For each row, extract `name`, `purpose`, and a best-effort `default` (regex `default: (\S+)` against the purpose text, with trailing punctuation trimmed so `default: 600000,` yields `600000`; record `null` when not present rather than guessing).
- Sort by `name`, wrap with the standard envelope, and write to `catalog/env-vars.json`.
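The hand-rolled parse and default extraction could look roughly like this. It is a sketch under two assumptions: cells contain no literal `|` characters, and the header row's first cell reads `Variable`. `parseEnvVarTable` and `extractDefault` are hypothetical names. The trailing-punctuation trim matters because `\S+` on its own would capture the comma in `default: 600000,`.

```javascript
// Hypothetical helpers for sync-env-vars.js: parse the single
// | Variable | Purpose | table and pull out a best-effort default.
function extractDefault(purpose) {
  const match = /default: (\S+)/.exec(purpose);
  // Trim trailing punctuation so "default: 600000," yields "600000".
  return match ? match[1].replace(/[.,;:)]+$/, "") : null;
}

function parseEnvVarTable(markdown) {
  const records = [];
  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("|")) continue; // not a table row
    const inner = trimmed.slice(1, trimmed.endsWith("|") ? -1 : undefined);
    const cells = inner.split("|").map((c) => c.trim());
    if (cells.length < 2) continue;
    const [rawName, purpose] = cells;
    // Skip the header row and the |:---|:---| delimiter row.
    if (/^variable$/i.test(rawName) || /^:?-+:?$/.test(rawName)) continue;
    records.push({
      name: rawName.replace(/`/g, ""), // backtick-wrapped in upstream
      purpose, // preserved verbatim
      default: extractDefault(purpose),
    });
  }
  return records;
}
```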
Output shape per record:
```json
{
  "name": "API_TIMEOUT_MS",
  "purpose": "Timeout for API requests in milliseconds — default: 600000, or 10 minutes; maximum: 2147483647",
  "default": "600000"
}
```

Pragmatic acceptance criteria: every row in the upstream table appears in the output; defaults are extracted when present; no entry is silently dropped. Lossy parsing (e.g. for vars whose purpose mentions multiple numbers) is acceptable as long as the raw `purpose` is preserved verbatim, so the consumer can re-parse if needed.
Test plan: mirror sync-settings.test.js. Pure functions (table parser, default extractor) get unit tests with small fixture strings; main() stays uncovered.
| | |
|---|---|
| Source | https://code.claude.com/docs/en/hooks.md |
| Output | `catalog/hooks.json` (~29 events + handler-field tables + `commonInput`) |
| Script | `scripts/sync-hooks.js` |
| Run | `npm run sync:hooks` |
| UI | drawer cross-reference (header `when`), Matcher Groups list (centre list + drawer + waterfall summary), HookDetailsModal (per-handler impl + per-event schema) |
Pass #1 (2026-04-29) captured only the lifecycle summary table at the top of the page (| Event | When it fires |). Pass #2 (2026-05-08) extended the parser to walk the document heading-aware (h2..h5) and pull three additional artifacts: handler-fields tables (common, command, http, mcp_tool, prompt_and_agent), ### Common input fields (8 shared fields), and per-event input/output schemas (each {inputFields, inputExample, outputFields}). The full envelope is {source, fetchedAt, count, events, handlers, commonInput}.
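Pass #2's heading-aware walk can be approximated by splitting the page into sections at each ATX heading of depth 2 to 5 and then looking sections up by title. This is a hypothetical sketch (`splitSections` and `findSection` are illustrative names, not the script's): it skips fenced code blocks so a `## ` line inside an example does not open a new section, and it does not handle setext (underlined) headings.

```javascript
// Hypothetical splitter: cut a markdown page into sections keyed by
// ATX headings of depth 2..5, ignoring lines inside fenced code blocks.
function splitSections(markdown) {
  const sections = [];
  let current = { depth: 0, title: "", lines: [] }; // text before the first heading
  let inFence = false;
  for (const line of markdown.split("\n")) {
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence;
    // Optional closing #s must be preceded by whitespace (so "C#" survives).
    const heading = inFence ? null : /^(#{2,5})\s+(.+?)(?:\s+#+)?\s*$/.exec(line);
    if (heading) {
      sections.push(current);
      current = { depth: heading[1].length, title: heading[2], lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  sections.push(current);
  return sections.map(({ depth, title, lines }) => ({ depth, title, body: lines.join("\n") }));
}

// Case-insensitive lookup, e.g. findSection(sections, "PreToolUse input").
function findSection(sections, title) {
  const want = title.toLowerCase();
  return sections.find((s) => s.title.toLowerCase() === want) ?? null;
}
```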
UI consumers landed 2026-05-16:
- The drawer header cross-reference for `hooks.<EventName>` rows uses the catalog `when` instead of the thinner settings-JSON-Schema description.
- The drawer's Matcher Groups list is row-scoped and parses the value via `src/lib/hooks.ts` (defensive against malformed user settings).
- The HookDetailsModal is the first consumer of `inputFields` / `outputFields` / `inputExample`.
Approach (pass #1, still the structural backbone):
- `fetch()` the `.md` URL.
- Locate the lifecycle table by header signature `| Event | When it fires |` (case-insensitive). The page has other tables whose first column header is "Event"; the second column disambiguates.
- For each row, extract `name` (backtick-stripped) and `when` (cadence prose, preserved verbatim).
- Pass #2 then walks the document looking for `### Hook handler fields`, `### Common input fields`, and the per-event `#### <Event> input` / `#### <Event> decision control` (or `#### <Event> output`) sections, enriching each event with its schema.
- Sort events by `name`, wrap with the standard envelope, and write to `catalog/hooks.json`.
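Locating a table by header signature, as the steps above describe, might look like the following. `splitRow` and `findTableBySignature` are hypothetical names; the sketch compares header cells case-insensitively, requires a GFM delimiter row on the next line, and returns the raw body cells for a per-surface row mapper. It splits naively on `|`, so escaped pipes inside cells are not handled here.

```javascript
// Hypothetical helpers: split a table row into trimmed cells, then find the
// first table whose header matches a signature such as "| Event | When it fires |".
function splitRow(line) {
  return line.trim().replace(/^\|/, "").replace(/\|$/, "").split("|").map((c) => c.trim());
}

function findTableBySignature(markdown, signature) {
  const want = splitRow(signature).map((c) => c.toLowerCase());
  const lines = markdown.split("\n");
  for (let i = 0; i + 1 < lines.length; i++) {
    const header = splitRow(lines[i]).map((c) => c.toLowerCase());
    const sameHeader = header.length === want.length && header.every((c, j) => c === want[j]);
    // A real table needs the |---|:---| delimiter row right after the header.
    if (!sameHeader || !/^\s*\|?\s*:?-+/.test(lines[i + 1])) continue;
    const rows = [];
    for (let k = i + 2; k < lines.length && lines[k].trim().startsWith("|"); k++) {
      rows.push(splitRow(lines[k]));
    }
    return rows; // body rows only, cells still backtick-wrapped
  }
  return null; // signature not found: a loud signal that upstream changed shape
}
```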
Output shape per event:
```json
{
  "name": "PreToolUse",
  "when": "Before a tool call executes. Can block it",
  "inputFields": [{ "field": "tool_name", "description": "…" }, ...],
  "inputExample": "{ \"session_id\": \"abc123\", … }",
  "outputFields": [{ "field": "decision", "description": "…" }, ...]
}
```

Pragmatic acceptance criteria: every row in the upstream lifecycle table appears in `events`; cadence prose is preserved verbatim; per-event schemas are captured when the upstream doc has them. Out of scope (deliberate): the per-tool nested `tool_input` tables under `#### PreToolUse input` (4-col `Field | Type | Example | Description` shape; pass #2 only captures the shared 2-col shape), the `### Matcher patterns` cross-reference, the `### JSON output` universal-fields table, exit-code-2 behavior prose, HTTP response handling, and async-hook config.
Test plan: mirror sync-env-vars.test.js. Pure functions (parseRow, parseTable, buildRecords, plus the heading-aware walker for pass #2 artifacts) get unit tests with small fixture strings; main() stays uncovered.
| | |
|---|---|
| Source | https://code.claude.com/docs/en/sub-agents.md |
| Output | `catalog/sub-agents.json` (~16 entries) |
| Script | `scripts/sync-sub-agents.js` |
| Run | `npm run sync:sub-agents` |
The page documents three things: built-in subagents (Explore / Plan / general-purpose / etc.), the YAML frontmatter that defines a custom subagent, and the operational rules around tool restrictions, hooks, and model selection. Of those, only the frontmatter table (`#### Supported frontmatter fields`, header `| Field | Required | Description |`) is a single canonical artifact; the built-in agents are in tabbed prose and the operational rules are scattered across `###` sections. The first cut captures the frontmatter table: every key the YAML frontmatter of a `~/.claude/agents/<name>.md` file can carry, with whether it's required and the prose description.
Approach:
- `fetch()` the `.md` URL.
- Locate the table by header signature `| Field | Required | Description |` (case-insensitive). The page has another 3-column table on the "Other" built-in subagents tab (`| Agent | Model | When Claude uses it |`); the header signature disambiguates.
- For each row, extract `name` (backtick-stripped), `required` (boolean: "Yes" → `true`, anything else → `false`; upstream uses exactly two values today), and `description` (prose, preserved verbatim including markdown links and inline code).
- Sort by `name`, wrap with the standard envelope, and write to `catalog/sub-agents.json`.
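The per-row mapping for this table is small enough to show whole. A sketch only: `toSubAgentRecord` and `buildSubAgentRecords` are hypothetical names, and each row is assumed to arrive already split into three trimmed cells (for example by a header-signature table locator).

```javascript
// Hypothetical row mapper for one | Field | Required | Description | body row.
function toSubAgentRecord([rawName, rawRequired, description]) {
  return {
    name: rawName.replace(/`/g, "").trim(),
    required: rawRequired.trim() === "Yes", // "Yes" → true, anything else → false
    description, // verbatim: markdown links and inline code are kept
  };
}

// Map every row, then sort by name with a locale-independent comparison.
function buildSubAgentRecords(rows) {
  return rows
    .map(toSubAgentRecord)
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}
```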
Output shape per record:
```json
{
  "name": "permissionMode",
  "required": false,
  "description": "[Permission mode](#permission-modes): `default`, `acceptEdits`, `auto`, `dontAsk`, `bypassPermissions`, or `plan`. Ignored for [plugin subagents](#choose-the-subagent-scope)"
}
```

Pragmatic acceptance criteria: every row in the upstream frontmatter table appears in the output; `name` and `description` (the only two fields upstream marks required) are flagged `required: true`. Built-in subagent identities, model-resolution order, and per-event hook semantics are out of scope for this cut and remain candidates for follow-up work.
Test plan: mirror sync-env-vars.test.js and sync-hooks.test.js. Pure functions (parseRow, parseTable, buildRecords) get unit tests with small fixture strings; main() stays uncovered.
| | |
|---|---|
| Source | https://code.claude.com/docs/en/mcp.md |
| Output | `catalog/mcp.json` (~3 entries) |
| Script | `scripts/sync-mcp.js` |
| Run | `npm run sync:mcp` |
The page is heterogeneous. It documents transport types (HTTP, stdio, and the deprecated SSE), per-scope CLI flows, OAuth credential handling, `managed-mcp.json` exclusive-control and allowlist/denylist semantics, tool-search deferral thresholds, and more. Almost all of that lives under prose-heavy `###`/`####` sections rather than in canonical tables; the few tables that exist are niche (two env vars passed to dynamic-header scripts; the five `MCP_TOOL_SEARCH_DEFER_LOAD` values). The single catalog-friendly artifact is the MCP installation scopes table at `## MCP installation scopes`: a 4-column reference for where the Local, Project, and User scopes live and what they share. That's the first cut.
Approach:
- `fetch()` the `.md` URL.
- Locate the table by header signature `| Scope | Loads in | Shared with team | Stored in |` (case-insensitive). The page's other 2-column tables don't share this signature.
- For each row, extract `name` (markdown link `[Local](#local-scope)` → `Local`, or backticks stripped from a bare `` `Local` ``), `loadsIn`, `shared` (preserved verbatim: qualifier prose like "Yes, via version control" is part of the data), and `storedIn` (preserved verbatim, including code spans on paths).
- Sort by `name`, wrap with the standard envelope, and write to `catalog/mcp.json`.
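Only the scope name is normalised; every other cell is data and stays verbatim. A hypothetical sketch of that cleanup (`scopeName` and `toScopeRecord` are illustrative names), assuming each row has already been split into its four trimmed cells:

```javascript
// Unwrap a markdown link ([Local](#local-scope) → Local) or strip backticks
// (`Local` → Local). Handles a code span inside a link text too.
function scopeName(cell) {
  const link = /^\[([^\]]+)\]\([^)]*\)$/.exec(cell.trim());
  return (link ? link[1] : cell).replace(/`/g, "").trim();
}

function toScopeRecord([name, loadsIn, shared, storedIn]) {
  // loadsIn, shared, and storedIn are passed through untouched.
  return { name: scopeName(name), loadsIn, shared, storedIn };
}
```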
Output shape per record:
```json
{
  "name": "Project",
  "loadsIn": "Current project only",
  "shared": "Yes, via version control",
  "storedIn": "`.mcp.json` in project root"
}
```

Pragmatic acceptance criteria: every row in the upstream scopes table appears in the output; cell prose is preserved verbatim except for the link wrapping on the name. Transport types, `managed-mcp.json` semantics, OAuth flows, and tool-search threshold values are out of scope for this cut and remain candidates for follow-up work.
Test plan: mirror sync-sub-agents.test.js. Pure functions (parseRow, parseTable, buildRecords) get unit tests with small fixture strings; main() stays uncovered.
| | |
|---|---|
| Source | https://code.claude.com/docs/en/permissions.md |
| Output | `catalog/permissions.json` (~6 entries) |
| Script | `scripts/sync-permissions.js` |
| Run | `npm run sync:permissions` |
The page is structurally rich. It documents the tiered tool-type taxonomy, the permission rule syntax (`Tool` / `Tool(specifier)`), wildcard semantics, tool-specific patterns (Bash, PowerShell, Read/Edit, WebFetch, MCP, Agent), the Read/Edit path-prefix table (4 patterns: `//path`, `~/path`, `/path`, `path`), the managed-only settings table (12 keys), and a working-directories table. Of those, the `## Permission modes` table (`| Mode | Description |`) is the single canonical artifact most directly consumable: it enumerates every value `permissions.defaultMode` accepts (`default`, `acceptEdits`, `plan`, `auto`, `dontAsk`, `bypassPermissions`) with prose richer than the short blurbs in the settings JSON Schema. That's the first cut.
Approach:
- `fetch()` the `.md` URL.
- Locate the table by header signature `| Mode | Description |` (case-insensitive). The page has other 2-col tables (`Rule | Effect` for rule examples, `Setting | Description` for managed-only settings) plus the 4-col `Pattern | Meaning | Example | Matches` for path syntax; the `Mode` header disambiguates.
- For each row, extract `name` (backtick-stripped) and `description` (prose preserved verbatim, including inline code spans).
- Sort by `name`, wrap with the standard envelope, and write to `catalog/permissions.json`.
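Because descriptions here keep their inline code verbatim, the row splitter should respect the GFM table rule that a literal pipe inside a cell, including one inside a code span, is written as `\|`. The sketch below (hypothetical `splitGfmRow` and `toModeRecord`) splits only on unescaped pipes and turns `\|` back into `|`, since the description is later rendered outside a table, where a backslash inside a code span would otherwise show.

```javascript
// Hypothetical GFM-aware row splitter: split on unescaped pipes only and
// unescape "\|" to "|" in the resulting cells.
function splitGfmRow(line) {
  let body = line.trim();
  if (body.startsWith("|")) body = body.slice(1);
  if (body.endsWith("|") && !body.endsWith("\\|")) body = body.slice(0, -1);
  const cells = [];
  let cell = "";
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch === "\\" && body[i + 1] === "|") {
      cell += "|";
      i++; // consume the escaped pipe
    } else if (ch === "|") {
      cells.push(cell.trim());
      cell = "";
    } else {
      cell += ch;
    }
  }
  cells.push(cell.trim());
  return cells;
}

function toModeRecord(line) {
  const [rawName, description] = splitGfmRow(line);
  return { name: rawName.replace(/`/g, ""), description };
}
```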
Output shape per record:
```json
{
  "name": "acceptEdits",
  "description": "Automatically accepts file edits and common filesystem commands (`mkdir`, `touch`, `mv`, `cp`, etc.) for paths in the working directory or `additionalDirectories`"
}
```

Pragmatic acceptance criteria: every row in the upstream modes table appears in the output; description prose is preserved verbatim. The path-pattern table, managed-only-settings table, rule-syntax tables, and tool-specific patterns are out of scope for this cut and remain candidates for follow-up work. The managed-only-settings table is the strongest candidate for a second pass, since it would let the app annotate settings catalog entries with "managed-only" provenance.
Test plan: mirror sync-mcp.test.js. Pure functions (parseRow, parseTable, buildRecords) get unit tests with small fixture strings; main() stays uncovered.
Each additional upstream source gets the same recipe: one script, one catalog file, one test file. Remaining candidates: a second sub-agents.md pass to capture built-in subagent identities, a second mcp.md pass to capture transport types and `managed-mcp.json` semantics, and a second permissions.md pass to capture the path-pattern and managed-only-settings tables. None are committed scope today.
- CI on cron: shipped. `.github/workflows/catalog-drift.yml` runs `npm run sync:settings`, `npm run sync:env-vars`, `npm run sync:hooks`, `npm run sync:sub-agents`, `npm run sync:mcp`, and `npm run sync:permissions` every Monday at 09:00 UTC and on `workflow_dispatch`. The detect step normalises out the always-changing `fetchedAt` field before deciding whether content drifted; if only the timestamp moved, the working tree is restored to `HEAD` and no PR is opened. Real drift opens (or updates) a single `chore/catalog-drift` PR via `peter-evans/create-pull-request@v8` (paired with `actions/checkout@v5` and `actions/setup-node@v5` for the 2026-06-02 Node 24 cutover). Required permissions: `contents: write` + `pull-requests: write`.
- Coverage thresholds. Pass `--test-coverage-lines` / `--test-coverage-branches` to fail the run below a target. Premature now; reasonable when there are several scripts.
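The source does not show how the workflow's detect step is implemented; the following is a Node sketch of the same idea, with hypothetical names `withoutFetchedAt` and `hasDrifted`. It compares a committed catalog with a freshly generated one after dropping `fetchedAt`, so a timestamp-only change reports no drift.

```javascript
// Hypothetical drift check: ignore the volatile fetchedAt, compare the rest.
function withoutFetchedAt(envelope) {
  const { fetchedAt: _ignored, ...rest } = envelope;
  return rest;
}

function hasDrifted(committedText, freshText) {
  const committed = withoutFetchedAt(JSON.parse(committedText));
  const fresh = withoutFetchedAt(JSON.parse(freshText));
  // Both sides come from the same serialiser, so key order is stable and
  // comparing the re-serialised objects as strings is sufficient.
  return JSON.stringify(committed) !== JSON.stringify(fresh);
}
```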
- Not a multi-phase harness with snapshots and watermarks. The earlier draft of this file proposed `docs/sync/snapshots/`, `manifest.json`, `watermark.txt`, `verify.md`, an RSS-driven trigger, and per-section JSON files merged into one. All cut. If we need any of that back, it's a real change request, not a quiet reversion to the old design.
- Not a write path to upstream or to `spec/inventory.md`. Catalog flows in one direction: upstream → script → `catalog/<source>.json` → app.
- Not a substitute for human review. The catalog is the current upstream truth; whether to surface a new field, hide a deprecated one, or annotate a quirk is a UI decision in the Tauri app, not a sync concern.
- `spec/inventory.md`'s future. Once `catalog/settings.json` and `catalog/env-vars.json` exist, the sections of `inventory.md` they cover are largely redundant. Decision deferred, but the inventory's hand-edited prose is not something this harness should try to regenerate.
- `$ref` resolution. `catalog/settings.json` preserves `$ref` strings (`#/$defs/permissionRule`) without expanding them. If a consumer needs the resolved schema (e.g. to validate a permission-rule string), we can either expand at sync time or expose `$defs` as a sibling block in the envelope.
- Schema staleness signal. `json.schemastore.org/claude-code-settings.json` has no embedded version or `updated` timestamp. If we want change detection beyond "did the file content differ," an ETag or content hash on fetch is the obvious move.
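If the content-hash or ETag route is taken, Node's stdlib covers both without new dependencies. A sketch under assumptions: `contentHash` and `fetchIfChanged` are hypothetical helpers, the previous ETag would have to be persisted somewhere between runs, and the server may not send an `ETag` header at all (in which case `etag` comes back `null` and only the hash is useful).

```javascript
const { createHash } = require("node:crypto");

// Fingerprint the raw upstream body (SHA-256, hex).
function contentHash(body) {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

// Conditional GET: send the previous ETag, treat 304 as "unchanged".
async function fetchIfChanged(url, previousEtag) {
  const headers = previousEtag ? { "If-None-Match": previousEtag } : {};
  const res = await fetch(url, { headers });
  if (res.status === 304) return { changed: false, etag: previousEtag };
  if (!res.ok) throw new Error(`${url}: HTTP ${res.status}`);
  const body = await res.text();
  return { changed: true, etag: res.headers.get("etag"), hash: contentHash(body), body };
}
```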