
Wikidata enrichment enqueued with string field names instead of entity IDs #138

@rafacm

Description

Symptom

After processing an episode, two enrich_entity_wikidata workflows show as ERROR in the Episode admin's "View workflow steps" page (/admin/episodes/episode/<id>/dbos-steps/), with no step records:

episode-1-run-1-10  ERROR  wikidata_enrichment   2026-05-04 14:25
episode-1-run-1-11  ERROR  wikidata_enrichment   2026-05-04 14:25

The main pipeline workflow (episode-1-run-1) succeeds — the episode reaches ready. Only the downstream Wikidata enrichments fail.

The Django/uvicorn log shows two stack traces, both from Entity.objects.get(pk=entity_id) in episodes/enrichment.py:_fetch_entity:

ValueError: Field 'id' expected a number but got 'status'.
ValueError: Field 'id' expected a number but got 'episode_id'.
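The error originates in Django's integer primary-key coercion: an `AutoField` lookup ultimately calls `int()` on the value, so a field-name string blows up before any query runs. A minimal Django-free sketch of the failure mode (`coerce_pk` is illustrative, not Django code; only the message format mimics Django's):

```python
# Stand-in for Django's integer-pk coercion: int() rejects the string,
# producing the same shape of ValueError as the tracebacks above.
def coerce_pk(value):
    try:
        return int(value)
    except (TypeError, ValueError):
        raise ValueError(f"Field 'id' expected a number but got {value!r}.")
```

So any non-integer that reaches `Entity.objects.get(pk=entity_id)` fails this way; the interesting question is how the strings got enqueued at all.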

Smoking gun

The resolve_step admin row shows that the resolver returned literal field-name strings as entity IDs:

ResolveStepOutput(
  episode_id=1,
  step_name=Episode.Status.RESOLVING,
  entity_ids_to_enrich=('status', 'episode_id')   ← bug
)

Confirmed at the queue level by decoding dbos.workflow_status.inputs (base64 pickle):

# episode-1-run-1-10
{'args': ('status',), 'kwargs': {}}

# episode-1-run-1-11
{'args': ('episode_id',), 'kwargs': {}}

So the strings were already what the resolver enqueued — this is a bug inside resolver.resolve_entities(), not in the workflow plumbing or in DBOS replay/deserialization.

Steps to reproduce

  1. Fresh dev DB:
    uv run python manage.py dbreset --yes
    uv run python manage.py migrate
    uv run python manage.py load_entity_types
  2. Start an ASGI worker:
    uv run uvicorn ragtime.asgi:application --host 127.0.0.1 --port 8000
  3. From a separate terminal, submit any episode whose extracted text plausibly produces words like "status" or "episode" (the ARD URL we used reproduces it):
    uv run python manage.py submit_episode "https://www.ardsounds.de/episode/urn:ard:episode:fdcf93eef8395b35/"
  4. Wait for the pipeline to complete. Check /admin/episodes/episode/<id>/dbos-steps/ — two enrich_entity_wikidata workflows should appear with status ERROR.

Reproducible on main (the resolver/enrichment code paths are unchanged by recent PRs #129/#131).

Diagnostic queries

Run against the dev DB ($RAGTIME_DB_USER should be set to your configured database user):

1. Verify EntityType keys are clean — should be the 14 canonical jazz types, no status / episode_id:

docker exec ragtime-postgres-1 psql -U "$RAGTIME_DB_USER" -d ragtime -c "
  SELECT key, name, is_active FROM episodes_entitytype ORDER BY key;
"

2. Verify chunk.entities_json top-level keys are clean — same expectation:

docker exec ragtime-postgres-1 psql -U "$RAGTIME_DB_USER" -d ragtime -c "
  SELECT DISTINCT jsonb_object_keys(entities_json::jsonb) AS type_key
  FROM episodes_chunk
  WHERE episode_id = 1 AND entities_json IS NOT NULL
  ORDER BY type_key;
"

3. List entities actually persisted for the episode — does any have a name that matches the bad strings? Are PKs all integers as expected?

docker exec ragtime-postgres-1 psql -U "$RAGTIME_DB_USER" -d ragtime -c "
  SELECT e.id, e.name, et.key AS type_key, e.wikidata_status, e.wikidata_attempts
  FROM episodes_entity e
  JOIN episodes_entitytype et ON et.id = e.entity_type_id
  WHERE e.id IN (
    SELECT DISTINCT entity_id FROM episodes_entitymention WHERE episode_id = 1
  )
  ORDER BY e.id;
"

4. Inspect the failed enrichment workflows' pickled inputs — confirms what was on the wire:

docker exec ragtime-postgres-1 psql -U "$RAGTIME_DB_USER" -d ragtime -c "
  SELECT workflow_uuid, status, substring(inputs from 1 for 200) AS inputs_preview
  FROM dbos.workflow_status
  WHERE name LIKE '%enrich_entity_wikidata%'
  ORDER BY created_at DESC LIMIT 5;
"

The inputs column is base64-encoded pickle. Decode in Python:

import base64, pickle
pickle.loads(base64.b64decode("<inputs string>"))
# → {'args': ('status',), 'kwargs': {}}

5. Inspect the chunk JSON content for entity names (not just type keys) — the names that flow into unique_names in the resolver:

docker exec ragtime-postgres-1 psql -U "$RAGTIME_DB_USER" -d ragtime -c "
  SELECT index, jsonb_pretty(entities_json::jsonb)
  FROM episodes_chunk
  WHERE episode_id = 1 AND entities_json IS NOT NULL
  ORDER BY index LIMIT 1;
"

Code-level analysis

The only code path that populates entities_to_enrich is in episodes/resolver.py:

# resolver.py:274
entities_to_enrich: set[int] = set()

# resolver.py:284 — the only .add() call
def _maybe_enqueue(entity: Entity) -> None:
    if (...):
        entities_to_enrich.add(entity.pk)

_maybe_enqueue is called from three places, each of which assigns entity from either:

  • Entity.objects.get_or_create(...) (via _get_or_create_entity)
  • existing_by_id[matched_id]
  • existing_by_mbid[mbid]

All three should yield Entity instances with integer pk. The bug must therefore be one of:

  1. Some other call site we haven't found that mutates entities_to_enrich (e.g. .update() with an iterable of strings).
  2. An unexpected object passed to _maybe_enqueue that has .wikidata_id / .wikidata_status / .wikidata_attempts attributes (so the guard succeeds) but whose .pk returns a string. No model in the codebase obviously fits that shape.
  3. An LLM resolution response leaking — e.g. match["matched_entity_id"] returns the string "status", which then ends up assigned to entity somehow. The resolution schema constrains it to ["integer", "null"], but a non-strict provider might let strings through.
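If hypothesis 3 is in play, enforcing the ["integer", "null"] constraint in code at the point where the LLM response is ingested would surface it immediately. A sketch under that assumption (`parse_matched_entity_id` is hypothetical, not an existing resolver helper):

```python
# Hypothetical guard for hypothesis 3: a non-strict provider may return
# strings where the schema says ["integer", "null"], so check the type
# before the value can ever be treated as an entity ID.
def parse_matched_entity_id(raw):
    # bool is a subclass of int in Python, so exclude it explicitly.
    if raw is None or (isinstance(raw, int) and not isinstance(raw, bool)):
        return raw
    raise TypeError(f"matched_entity_id must be int or null, got {raw!r}")
```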

The two strings (status, episode_id) match Django Episode-model and EntityMention-FK field names respectively. That co-occurrence suggests something is iterating over a Django model's field-name introspection (_meta.fields, __dict__, etc.) rather than over actual entity IDs — but I haven't located that path. Worth scanning for any code that hands entity._meta or similar to the resolver.
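To narrow the call site without guessing, temporary instrumentation inside _maybe_enqueue could log a stack trace whenever a non-int pk slips in. A minimal sketch (logger name and wrapper are illustrative, not the real resolver code):

```python
import logging

log = logging.getLogger("resolver.debug")

def add_entity_pk(entities_to_enrich, entity):
    # Log offending values with a stack trace so the calling path is
    # identifiable, then add unconditionally to preserve current behavior.
    pk = entity.pk
    if not isinstance(pk, int):
        log.error(
            "non-int pk %r (%s) from %r",
            pk, type(pk).__name__, entity, stack_info=True,
        )
    entities_to_enrich.add(pk)
```

One run against the reproduction URL with this in place should point straight at whichever of the three call sites (or an unknown fourth) supplies the strings.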

Impact

  • Per occurrence: two Entity rows whose enrichment is permanently stuck — wikidata_status='pending', wikidata_attempts=1 after the failed run. manage.py enrich_entities will retry them, but it'll re-fetch with the same string IDs and fail the same way (the workflow re-uses the persisted bad args via DBOS workflow recovery semantics).
  • Per episode: the main pipeline succeeds and the episode reaches ready, so the user-facing impact is missing Wikidata IDs on a couple of entities. Search-time hydration in vector_store.search_chunks() simply returns None for those entities' Q-IDs.
  • Pre-existing: unchanged by recent PRs #129 (Make StepFailed pickles portable across processes, closes #110) and #131 (DBOS enqueue-only clients: fix dispatcher race in management commands). The bug likely affects every episode that exercises the affected resolver code path.

Acceptance criteria

  • Identify the call site that produces string values in entities_to_enrich. (Most likely a small targeted change once found.)
  • Add a unit test against resolver.resolve_entities() with a fixture that triggers the path — assert the returned list contains only integers (all(isinstance(x, int) for x in ids)).
  • Re-run the reproduction steps above; both enrich_entity_wikidata workflows succeed (or short-circuit cleanly per the resolver-level idempotency rules).
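The integer-only assertion from the second criterion can be factored into a small helper for the unit test. A sketch (FakeOutput stands in for the real ResolveStepOutput; the actual test would run resolver.resolve_entities() against an episode fixture instead):

```python
from dataclasses import dataclass

# Stand-in for ResolveStepOutput, just enough shape for the assertion.
@dataclass
class FakeOutput:
    entity_ids_to_enrich: tuple

def assert_only_int_ids(output):
    # Reject anything that isn't a plain int (bools are ints in Python,
    # so exclude them too); report the offenders in the failure message.
    bad = [x for x in output.entity_ids_to_enrich
           if not isinstance(x, int) or isinstance(x, bool)]
    assert not bad, f"non-integer entity IDs enqueued: {bad!r}"
```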

Out of scope

  • Driver-level hardening of _fetch_entity to log + skip on non-int input. Defense-in-depth, but the right fix is to stop the bad data at the source.
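For reference only (this issue keeps it out of scope), the log-and-skip guard would be roughly the following; the function name and logger are illustrative:

```python
import logging

log = logging.getLogger("enrichment")

def safe_entity_id(entity_id):
    # Return entity_id if it is a usable integer PK, else log and return
    # None; callers would skip enrichment on None instead of crashing the
    # workflow. Illustrative only — the real fix belongs in the resolver.
    if isinstance(entity_id, int) and not isinstance(entity_id, bool):
        return entity_id
    log.warning("skipping enrichment for non-int entity_id %r", entity_id)
    return None
```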
