Context
PR #100's Django Reinhardt cutover left 57/85 entities in wikidata_status=pending. Beyond the Whisper-hallucination class (tracked in #102), a meaningful fraction were correctly transcribed but mis-labelled by the extractor — for example, "Carnegie Hall" was extracted under entity_type city despite being a venue (which exists in musicbrainz.place).
Once the type is wrong, foreground MB resolution looks in the wrong table, finds nothing, the entity lands without an MBID, and the background Wikidata enrichment is also working with the wrong P31 class — so it can't recover. Better grounding at extraction time fixes this at the source for the entire pipeline.
The MB DB is already loaded locally and is itself the canonical truth we trust. We can use it both as a grounding signal in the extraction prompt and as a post-extraction validation gate.
Proposals
1. MB-aware type validation pass (highest leverage)
After the LLM extraction step but before resolution, run a cross-table MB lookup: for each (name, entity_type) extracted, check whether the name exists under the expected MB table or under a different MB table.
Three branches:

| Outcome | Action |
| --- | --- |
| Exact match in expected table | Pass through unchanged. |
| No match in expected table, exact match in exactly one other MB-backed table | Reclassify to that table's entity_type ("Carnegie Hall" → drop from city, attach as music_venue). |
| Match in multiple other tables, or no match anywhere | Either re-prompt the extractor with the conflict surfaced, or pass through unchanged for the resolver to handle. |
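The three branches can be sketched as one small pure function. This is a sketch, not the implementation: EXPECTED_TABLE and the mb_lookup callable are hypothetical stand-ins for the real type-to-table mapping and the Postgres exact-match query helper.

```python
# Hypothetical type -> MB table mapping; the real one would cover every
# entity type and live alongside the resolver's existing configuration.
EXPECTED_TABLE = {
    "city": "area",
    "music_venue": "place",
    "musician": "artist",
}

def validate_entity_type(name: str, entity_type: str, mb_lookup) -> tuple[str, str]:
    """Apply the three-branch check; mb_lookup(table, name) -> bool."""
    expected = EXPECTED_TABLE[entity_type]
    if mb_lookup(expected, name):
        return entity_type, "pass"  # branch 1: exact match in expected table
    other_types = [
        etype
        for etype, table in EXPECTED_TABLE.items()
        if table != expected and mb_lookup(table, name)
    ]
    if len(other_types) == 1:
        return other_types[0], "reclassified"  # branch 2: unique match elsewhere
    # branch 3: ambiguous or absent; leave for re-prompt / resolver
    return entity_type, "unresolved"
```

In production mb_lookup would be a thin wrapper around a parameterized query against the loaded MB schema; here it is kept injectable so the branch logic can be tested with in-memory fixtures.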
This costs only Postgres queries against the already-loaded MB DB. It also self-reinforces: the extractor's mistakes get corrected by the canonical source, and the corrected entities_json flows into resolution and embedding.
Where it lands: a new pipeline step between extract and resolve, or a section at the end of episodes/extractor.py.
Tests:
"Carnegie Hall as city" → reclassified to music_venue.
"Miles Davis as city" → no match in area, exact match in artist/Person → reclassified to musician.
"Liberschi" → no MB match anywhere → passed through (no info gained).
2. MB-anchored prompt examples (cheap, complementary)
episodes/initial_entity_types.yaml already has an examples field for each type (used inside the extraction system prompt). Today these are hand-written — small set, possibly stale.
Generate per-type examples directly from MB at app startup or via a one-shot management command:
music_venue — sample top-N from musicbrainz.place (jazz-relevant or just popular).
musician — sample from musicbrainz.artist filtered by type=Person.
etc.
The extractor sees realistic, MB-faithful examples instead of curated ones. The classification distribution shifts toward labels that actually exist in MB.
Where it lands: a new manage.py refresh_entity_type_examples that pulls a sample from each MB table and updates EntityType.examples. Run after every MB dump re-import.
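A minimal sketch of the sampling core such a command could wrap. The Django BaseCommand plumbing and the EntityType.examples update are omitted, and the table and column names are assumptions about the MB dump schema, not verified against it.

```python
# One sampling query per entity type. The SQL assumes a musicbrainz
# schema with place, artist, and artist_type tables; adjust to the
# actual dump layout before wiring into the management command.
SAMPLE_SQL = {
    "music_venue": (
        "SELECT name FROM musicbrainz.place ORDER BY random() LIMIT %s"
    ),
    "musician": (
        "SELECT a.name FROM musicbrainz.artist a "
        "JOIN musicbrainz.artist_type t ON a.type = t.id "
        "WHERE t.name = 'Person' ORDER BY random() LIMIT %s"
    ),
}

def sample_query(entity_type: str, limit: int = 20):
    """Return (sql, params) for drawing example names for one type."""
    return SAMPLE_SQL[entity_type], [limit]
```

The command's handle() would execute each query over the MB connection and write the resulting names into EntityType.examples, making the refresh idempotent by full replacement rather than append.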
3. Reasoning prompt for ambiguous classifications (medium)
For each extracted entity, ask the extractor to briefly justify why this name fits this type. Chain-of-thought catches mis-labels in non-thinking models. Adds output tokens; can be turned off later for cost.
Where it lands: extend RESOLUTION_RESPONSE_SCHEMA (and the corresponding extraction schema) with a justification: str field per entity, log it, and use it for debugging the next round of mis-labels.
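For illustration, the per-entity object in the extraction response schema could gain the field like this. The surrounding field names are assumptions about the current schema shape, not copied from the real RESOLUTION_RESPONSE_SCHEMA.

```python
# Per-entity fragment of the extraction response schema, extended with
# a one-sentence justification the extractor must emit per entity.
ENTITY_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "entity_type": {"type": "string"},
        "justification": {
            "type": "string",
            "description": "One sentence on why this name fits this entity_type.",
        },
    },
    "required": ["name", "entity_type", "justification"],
}
```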
4. Confidence scoring (cheap, complementary)
Have the extractor emit confidence: low|medium|high per entity. Drop low before resolution, or surface them in the admin for review. Pure prompt engineering + schema update.
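A sketch of the gate, assuming each extracted entity dict carries the new confidence key (the field name and the keep-on-missing policy are choices made here, not decided in the issue):

```python
# Enum for the schema addition plus the pre-resolution filter.
# Entities without a confidence value are kept, conservatively.
CONFIDENCE_FIELD = {"type": "string", "enum": ["low", "medium", "high"]}

def drop_low_confidence(entities: list[dict]) -> list[dict]:
    """Drop entities the extractor marked low before resolution."""
    return [e for e in entities if e.get("confidence") != "low"]
```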
5. (Skip for now) Per-type extraction passes
Running 14 separate narrow extraction calls ("find only musicians", "find only venues") would improve precision but blow up LLM costs ~14×. Not justifiable until simpler steps are exhausted.
Suggested implementation order
1. MB-aware type validation pass (#3 in the extraction priority list, biggest payoff). Wire it as a post-extraction filter that runs cross-table MB lookups via the existing episodes/musicbrainz.py plumbing. Add comprehensive tests with hand-rolled fixtures.
2. manage.py refresh_entity_type_examples: pull MB samples to populate EntityType.examples. One-shot, idempotent, run after MB re-imports.
3. Re-ingest the Django Reinhardt episode and measure: how many of the 57 pending entities resolve after these two changes? Compare against the baseline.
4. Add reasoning + confidence fields only if recall is still lagging.
Out of scope
Linked PR: #100.