
fix(schema-pack): embed bundled pack YAMLs in the compiled binary (+ entity-slug orphan floor) #2267

Open
JiraiyaETH wants to merge 1 commit into garrytan:master from JiraiyaETH:jarvis/schema-pack-embed-fix-20260618


@JiraiyaETH


The bug (Layer 0 — the foundational outage)

Bundled schema-pack YAMLs are never embedded in bun build --compile binaries. The pack locators resolve paths via fileURLToPath(import.meta.url) + existsSync, which points at /$bunfs/root/... in a compiled binary where the YAML is not on disk. Symptoms in a deployed binary:

  • gbrain schema active → unknown schema pack: gbrain-base. Any active pack that extends: gbrain-base fails to resolve, because the bundled parent can't load and that failure takes down the whole resolution.
  • put_page silently degraded — the pack-load try/catch swallowed the failure, reverted to a hardcoded prefix table, and skipped type validation → accumulating page-type sprawl.
  • gbrain schema show gbrain-base and the lens packs (gbrain-creator/investor/engineer/everything) were unreachable from the CLI/MCP.
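The failing pre-fix pattern (`fileURLToPath(import.meta.url)` + `existsSync`) can be sketched as follows. This is a hypothetical reconstruction. The function name `legacyPackLocator` and the `packs/` directory layout are assumptions; only the URL-to-path-then-check-disk combination comes from the description above.

```typescript
import { existsSync } from "node:fs";
import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";

// Hypothetical reconstruction of the pre-fix locator: resolve the pack YAML
// relative to the calling module, then confirm it exists on disk.
export function legacyPackLocator(name: string, moduleUrl: string): string | null {
  // Under `bun run`, moduleUrl is a file:// URL inside the checkout, so a YAML
  // at the expected relative location is found. Under `bun build --compile`
  // it maps into /$bunfs/root/..., where a non-embedded YAML does not exist,
  // so this returns null.
  const moduleDir = dirname(fileURLToPath(moduleUrl));
  const candidate = join(moduleDir, "packs", `${name}.yaml`); // layout assumed
  return existsSync(candidate) ? candidate : null;
}

// Typical call site: legacyPackLocator("gbrain-base", import.meta.url)
```

Nothing in this function is wrong in isolation. The problem is that the path it computes only points at real bytes when the source tree ships alongside the executable.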

build:schema is not the cause (it regenerates Postgres DDL); a plain rebuild reproduces the bug. The fix is a source-level embed.

The fix

  • New central registry src/core/schema-pack/bundled-packs.ts imports all 7 bundled pack YAMLs via import … with { type: 'file' } (Bun embeds the bytes; the import evaluates to a readable path in both bun run and --compile). Mirrors the existing WASM-embed pattern (src/core/chunkers/code.ts + scripts/check-wasm-embedded.sh). Exports BUNDLED_PACK_PATHS, BUNDLED_PACK_LIST, BUNDLED_PACK_NAMES, bundledPackPath().
  • Routed every pack-load surface through it (the previous code had four divergent locators): load-active.ts defaultPackLocator, the CLI schema show/validate/list/use/fork (packPathByName + runList), MCP list_schema_packs + schema_lint named-pack, and the read-only mutability set in mutate.ts (was 3 names, now all 7).
  • Fail-closed put_page — a configured pack (resolution source ≠ default) that won't load now throws instead of silently degrading; a default-pack load failure loud-warns.
  • Functional regression guard scripts/check-packs-embedded.sh + scripts/packs-smoketest.ts: compiles a probe and loads every bundled pack through the compiled binary (not strings | grep). Wired into check:all, run-verify-parallel, and ci-local. Updated the BUNDLED_PACK_NAMES size test 3 → 7. Docs refreshed.
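A rough sketch of the registry's shape follows. It assumes the export types: a `Set` for `BUNDLED_PACK_NAMES` (suggested by the "size test") and `{ name, path }` entries for `BUNDLED_PACK_LIST`. The path strings are placeholders for Bun file imports, and `isReadOnlyPack` is a hypothetical stand-in for the mutability check in mutate.ts. Only six of the seven pack names appear in this PR, so the seventh is left out rather than guessed.

```typescript
// Sketch only. In the PR each value comes from a Bun file import such as
//   import basePath from "./packs/gbrain-base.yaml" with { type: "file" };
// (file location hypothetical). Plain strings stand in here so this runs
// outside Bun. The seventh bundled pack is not named in the PR and is omitted.
export const BUNDLED_PACK_PATHS: Readonly<Record<string, string>> = Object.freeze({
  "gbrain-base": "packs/gbrain-base.yaml",
  "gbrain-creator": "packs/gbrain-creator.yaml",
  "gbrain-investor": "packs/gbrain-investor.yaml",
  "gbrain-engineer": "packs/gbrain-engineer.yaml",
  "gbrain-everything": "packs/gbrain-everything.yaml",
  "gbrain-recommended": "packs/gbrain-recommended.yaml",
});

// Array form for listing surfaces (CLI `schema list`, MCP list_schema_packs).
export const BUNDLED_PACK_LIST: ReadonlyArray<{ name: string; path: string }> =
  Object.entries(BUNDLED_PACK_PATHS).map(([name, path]) => ({ name, path }));

export const BUNDLED_PACK_NAMES: ReadonlySet<string> = new Set(Object.keys(BUNDLED_PACK_PATHS));

// Single lookup every loader routes through; undefined means "not bundled".
export function bundledPackPath(name: string): string | undefined {
  // Own-property check so names like "toString" don't hit Object.prototype.
  return Object.prototype.hasOwnProperty.call(BUNDLED_PACK_PATHS, name)
    ? BUNDLED_PACK_PATHS[name]
    : undefined;
}

// Hypothetical helper: bundled packs are read-only, derived from one source
// instead of a separately maintained list of names.
export function isReadOnlyPack(name: string): boolean {
  return BUNDLED_PACK_NAMES.has(name);
}
```

Deriving every list from one map is what stops the count drift described above (the read-only set listing 3 names while 7 packs were bundled).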

Layer 1a — entity-slug floor (no literal-null/hyphen-flattened fact orphans)

  • extract.ts: a model-emitted entity of "null"/"none"/whitespace is coerced to JSON null (the fact stays unbound, never a null-slug orphan).
  • entities/resolve.ts: non-entity tokens → null binding (kept unbound via the existing legacy bucket, not dropped); the floor uses slugifyEntityPath, which preserves an explicit entity-prefix path (companies/Acme Co → companies/acme-co) but flattens non-prefix slashes (A/B Partners → a-b-partners) so it can't mint arbitrary nested pages past the stub guard.
  • Read path (recall + MCP list_facts): a null resolution returns no facts instead of querying the raw string and surfacing legacy orphan rows.
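Here is a minimal sketch of the entity-slug floor, assuming an ASCII-only slug rule. `coerceEntity` and `slugifySegment` are hypothetical names, and this `slugifyEntityPath` is an illustration rather than the PR's implementation. The `people/` and `companies/` prefixes and the two examples come from the bullets above. Returning `null` for an empty slug is an extra assumption.

```typescript
// Entity directories whose prefix is preserved. The PR hardcodes these two
// (deriving them from the active pack's path_prefixes is deferred).
const ENTITY_PREFIXES = ["people/", "companies/"];

// Tokens treated as "no entity". Assumption: extract.ts may match more.
const NON_ENTITY_TOKENS = new Set(["null", "none"]);

// Model output -> JSON null when it means "no entity", else the trimmed string.
export function coerceEntity(raw: unknown): string | null {
  if (typeof raw !== "string") return null;
  const trimmed = raw.trim();
  if (trimmed === "" || NON_ENTITY_TOKENS.has(trimmed.toLowerCase())) return null;
  return trimmed;
}

// ASCII-only slug: lowercase, each run of other characters (including "/")
// becomes one hyphen, and leading/trailing hyphens are dropped.
function slugifySegment(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Keep an explicit entity prefix as a real directory; flatten every other "/".
// Returns null when nothing sluggable remains (assumed behavior).
export function slugifyEntityPath(entity: string): string | null {
  const lower = entity.toLowerCase();
  const prefix = ENTITY_PREFIXES.find((p) => lower.startsWith(p)) ?? "";
  const rest = slugifySegment(entity.slice(prefix.length));
  return rest === "" ? null : prefix + rest;
}
```

In this sketch `companies/Foo/Bar` becomes `companies/foo-bar`, so a slash after the prefix still cannot create a nested page.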

Deferred (called out, not in this PR)

  • registry.ts resolvePack returns the child manifest unchanged for extends/borrow_from, so gbrain-recommended/gbrain-everything resolve with 0 page types. Not the outage (operational packs declare their types directly), but worth a follow-up to honor the lens-pack design.
  • Deriving the entity prefix set from the active pack's path_prefixes instead of the hardcoded people/+companies/.

Tests

typecheck clean · scripts/check-packs-embedded.sh passes (compiled-binary probe) · the schema-pack/facts/entity suites pass (246 tests). Verified live: a freshly compiled binary resolves the configured active pack and schema validate <lens-pack> reads its embedded /$bunfs/root/...yaml.

🤖 Generated with Claude Code

…ty-slug orphan floor

LAYER 0 — schema-pack compile bug (the foundational filing/taxonomy outage).
Bundled pack YAMLs were never embedded in `bun build --compile` binaries: the
locators resolved paths via `fileURLToPath(import.meta.url)+existsSync`, which
points at `/$bunfs/root/...` where the YAML isn't on disk. So `gbrain schema
active` => "unknown schema pack: gbrain-base" (the active pack extends
gbrain-base, so the bundled parent failing took the whole resolution down),
put_page silently degraded (skipped type validation → 32-vs-15 type sprawl),
and brain-taxonomist was neutered.
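The cascade described in the paragraph above can be reproduced with a toy resolver. `PackManifest`, `PackLoader`, and `resolveChain` are hypothetical names, and the real resolver in registry.ts is not shown in this PR. The sketch only demonstrates why one unloadable ancestor fails the whole chain, even when the active pack itself loads.

```typescript
// Hypothetical manifest shape: only the fields needed to show the failure.
interface PackManifest {
  name: string;
  extends?: string;
  pageTypes: string[];
}

// A loader returns undefined when the pack's YAML cannot be found.
type PackLoader = (name: string) => PackManifest | undefined;

// Walk the `extends:` chain from the active pack up to its root.
// A single missing ancestor aborts the entire resolution.
export function resolveChain(active: string, load: PackLoader): PackManifest[] {
  const chain: PackManifest[] = [];
  const seen = new Set<string>();
  let next: string | undefined = active;
  while (next !== undefined) {
    if (seen.has(next)) throw new Error(`schema pack cycle at: ${next}`);
    seen.add(next);
    const manifest: PackManifest | undefined = load(next);
    if (manifest === undefined) throw new Error(`unknown schema pack: ${next}`);
    chain.push(manifest);
    next = manifest.extends;
  }
  return chain;
}
```

With a loader that cannot read the bundled `gbrain-base`, resolving any child pack throws `unknown schema pack: gbrain-base`, which matches the symptom reported above.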

- New central registry src/core/schema-pack/bundled-packs.ts imports all 7
  bundled YAMLs via `import … with { type: 'file' }` (Bun embeds the bytes;
  the import is a readable path in both `bun run` and `--compile`). Mirrors the
  proven WASM-embed pattern.
- Route EVERY pack-load surface through it: load-active defaultPackLocator,
  CLI schema show/validate/list/use/fork (packPathByName + runList), MCP
  list_schema_packs + schema_lint named-pack, and the read-only mutability set
  (was 3 names, now all 7).
- put_page FAIL-CLOSED: a configured pack (resolution.source !== 'default')
  that won't load now throws instead of silently degrading; default-pack load
  failure loud-warns. (Stops the silent-degrade that accumulated the drift.)
- Functional regression guard scripts/check-packs-embedded.sh +
  packs-smoketest.ts (compiles a probe, loads all bundled packs through the
  compiled binary). Wired into check:all, run-verify-parallel (the CI gate),
  ci-local. Updated the BUNDLED_PACK_NAMES size test 3 → 7. Docs refreshed.
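The FAIL-CLOSED rule reduces to a branch on the resolution source. This is a sketch, not the PR's code. `loadPackForPut`, the `PackResolution` shape, and the `null` return that lets the caller fall back after a default-pack failure are all assumptions. Only the `source !== 'default'` test and the throw-versus-warn split come from the commit message.

```typescript
// Hypothetical: the PR only specifies that source === "default" marks the
// built-in fallback; any other value means the user configured a pack.
interface PackResolution {
  source: string;
  name: string;
}

// Load the pack for put_page. A configured pack fails closed (throws);
// the default pack fails loud (warns) and returns null so the caller can
// fall back. What the caller does after the warning is an assumption.
export function loadPackForPut<T>(
  resolution: PackResolution,
  load: (name: string) => T,
  warn: (msg: string) => void = console.warn,
): T | null {
  try {
    return load(resolution.name);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    if (resolution.source !== "default") {
      throw new Error(`schema pack "${resolution.name}" is configured but failed to load: ${reason}`);
    }
    warn(`default schema pack "${resolution.name}" failed to load (${reason}); put_page type validation is off`);
    return null;
  }
}
```

Throwing for configured packs turns a silent taxonomy drift into an immediate, visible write failure. That trade-off is what the commit message argues for.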

DEFERRED (noted): registry resolvePack returns the child manifest unchanged for
extends/borrow_from, so gbrain-recommended/everything resolve with 0 page types
— not the outage (jarvis-operational declares its types directly); reconciling
would change the live type set right before deploy.

LAYER 1a — entity-slug floor: no literal-null / hyphen-flattened orphans.
- extract.ts: a model-emitted `entity` of "null"/"none"/whitespace is coerced
  to JSON null (fact stays unbound, never a `null`-slug orphan).
- entities/resolve.ts: non-entity tokens → null binding (kept unbound via the
  existing backstop legacy bucket, not dropped); floor uses slugifyEntityPath
  which preserves an explicit ENTITY-prefix path (companies/Hermes Agent →
  companies/hermes-agent) instead of flattening to companies-hermes-agent, but
  flattens non-prefix slashes (A/B Partners → a-b-partners) so it can't mint
  arbitrary nested pages past the stub guard.
- Read path (recall + MCP list_facts): a null resolution returns no facts
  instead of querying the raw string and surfacing legacy orphan rows.
- Tests: junk→null, path-preserve vs flatten, backstop "null"→entity_slug NULL.
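The read-path change can be sketched as a guard in front of the lookup. `FactRow` and `listFactsForEntity` are hypothetical names, and the real recall / list_facts code queries a database rather than filtering an array. The point shown is that an unresolvable entity yields no rows instead of a raw-string match against legacy orphans.

```typescript
// Hypothetical row shape; legacy rows may carry the literal string "null".
interface FactRow {
  entitySlug: string | null;
  text: string;
}

// Read-path guard: if the requested entity does not resolve, return nothing
// instead of querying with the raw string (which would match legacy orphan
// rows such as entitySlug === "null").
export function listFactsForEntity(
  rawEntity: string,
  resolveEntity: (raw: string) => string | null,
  rows: readonly FactRow[],
): FactRow[] {
  const slug = resolveEntity(rawEntity);
  if (slug === null) return [];
  return rows.filter((row) => row.entitySlug === slug);
}
```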

DEFERRED (noted): deriving the entity prefix set from the active pack's
path_prefixes instead of the hardcoded people/+companies/ (matches the live
pack today).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>