diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0c13e58..8cf0f6a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,8 +6,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## 2026-05-04
+### Added
+
+- `Episode.show_name` `CharField` (max 255, blank default) populated by the fetch_details agent from `og:site_name`, `application-name`, RSS `<title>`, JSON-LD `isPartOf.name` / `partOfSeries.name`, or the visible publisher heading. Persisted only when the agent's extracted value is non-empty (a re-run that fails to extract leaves a previously-good value or admin edit in place). Migration `0025_episode_show_name`. No backfill — pre-prod data freedom per `feedback_reembed_ok_preprod.md`.
+- `EpisodeCandidate.published_at` (`date | None`) on the podcast-aggregator dataclass. Each aggregator now extracts a publication date from its native field (`pubdate` for fyyd, `releaseDate` for iTunes, `datePublished` epoch seconds for podcastindex) and logs + returns `None` on missing or malformed input rather than dropping the candidate. Surfaced through `DownloadDeps.published_at` and `IndexCandidate.published_at` so the download agent sees per-episode and per-candidate dates.
+
### Changed
+- Download agent's `_show_name(episode)` cascade now prefers `episode.show_name`, falling back to the URL host only as a defense-in-depth signal. The agent prompt is updated to detect hostname-shaped `Show` values (contains `.` and no spaces) and switch to a `(title, published_at)` match with ±1 day tolerance instead of requiring an exact `show_name` string match. Real broadcast titles still use the existing show-plus-title path. Closes #111 — [plan](doc/plans/2026-05-04-download-show-name-fix.md), [feature](doc/features/2026-05-04-download-show-name-fix.md), [planning session](doc/sessions/2026-05-04-download-show-name-fix-planning-session.md), [implementation session](doc/sessions/2026-05-04-download-show-name-fix-implementation-session.md)
- AGENTS.md and the Feature PR Documentation Bundle AI check now recognize **agent-orchestrated sessions**. When a parallel implementation agent is launched from a parent Claude Code session (e.g. under Conductor) and has no direct user-to-implementation-agent messages, the transcript may use `### Parent agent (orchestrator)` headings *instead of* `### User`, provided the parent-agent's launching prompt is reproduced verbatim. The transcript must declare the session as agent-orchestrated at the top of `## Detailed conversation`. Same verbatim rule applies — summarized parent prompts are still rejected. This is a policy clarification only; no code changes.
- AI checks workflow now renders non-applicable rules as gray "skipped" icons instead of green "pass". `list_ai_checks.py` evaluates each rule's `paths:` frontmatter against `git diff --name-only $BASE_REF...HEAD` and emits `applies: bool` per matrix include; `.github/workflows/ai-checks.yml` gates each `check` shard on `if: ${{ ... && matrix.applies }}`, so non-applicable shards skip at the GHA level — no runner spin-up, no model call, no token cost. Four rules gain `paths:` frontmatter (`pipeline-step-sync`, `asgi-wsgi-scott`, `qdrant-payload-slim`, `entity-creation-race-safety`); the other five remain semantic. The driver's verdict tool drops `skip` from its enum: semantic non-applicability is now `pass` with `summary: "Rule does not apply."` and a one-line `details` explanation. Closes #124 — [plan](doc/plans/2026-05-04-ai-checks-skipped.md), [feature](doc/features/2026-05-04-ai-checks-skipped.md), [planning session](doc/sessions/2026-05-04-ai-checks-skipped-planning-session.md), [implementation session](doc/sessions/2026-05-04-ai-checks-skipped-implementation-session.md)
- **BREAKING** — `StepFailed`-derived exceptions now pickle as `RuntimeError(message)` rather than as their typed subclass. Pre-fix workflow rows in the DBOS `dbos.workflow_status` table will not deserialize to readable text — the Episode admin's "View workflow steps" page will show a base64 preview only. Action required: reprocess affected episodes (which produces fresh, portable pickles) or clear the workflow_status table in dev environments. No production impact since this project is pre-prod. See PR #129 for details.
diff --git a/doc/README.md b/doc/README.md
index a02c54e..b7747f4 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -36,7 +36,7 @@ Three keyless tools live in [`episodes/agents/fetch_details_tools.py`](../episod
The agent emits a wrapped `FetchDetailsOutput { details, report, concise }`:
-- `details` — episode-level facts: `title`, `description`, `published_at`, `image_url`, `audio_url`, `audio_format` (closed `Literal`), `language` (ISO 639-1), `country` (ISO 3166-1 alpha-2), `guid`, `canonical_url`, `source_kind` (`canonical | aggregator | unknown`), `aggregator_provider`.
+- `details` — episode-level facts: `title`, `show_name` (broadcast / podcast title, e.g. "Zeitzeichen" — not the publisher's company name and not the URL hostname), `description`, `published_at`, `image_url`, `audio_url`, `audio_format` (closed `Literal`), `language` (ISO 639-1), `country` (ISO 3166-1 alpha-2), `guid`, `canonical_url`, `source_kind` (`canonical | aggregator | unknown`), `aggregator_provider`.
- `report` — structured trace: `attempted_sources`, `discovered_canonical_url`, `discovered_audio_url`, `cross_linked`, `extraction_confidence` (`high | medium | low`), `narrative` (2–4 sentences), `hints_for_next_step` (carried into the Download step).
- `concise` — `outcome` (5-value enum) + `summary` (≤140 chars).
@@ -52,7 +52,7 @@ Five outcomes drive the step's status transitions:
Discrimination among the three terminal outcomes happens on `FetchDetailsRun.outcome` — only one `Episode.Status.FAILED` value is used.
-Every run persists a `FetchDetailsRun` row carrying the structured output, the auto-captured tool-call trace (input / output excerpts / `ok` flag), the Pydantic AI usage dict, and the DBOS workflow ID. `Episode` columns are overwritten directly by the agent's authoritative output (no empty-field-only merge); a re-run via the admin `reprocess` action increments `run_index` and overwrites again.
+Every run persists a `FetchDetailsRun` row carrying the structured output, the auto-captured tool-call trace (input / output excerpts / `ok` flag), the Pydantic AI usage dict, and the DBOS workflow ID. `Episode` columns are overwritten directly by the agent's authoritative output (no empty-field-only merge); a re-run via the admin `reprocess` action increments `run_index` and overwrites again. The one exception is `show_name`, which is **additive only** — when a fresh run fails to extract a value, any previously-good `show_name` (whether from an earlier run or an admin edit) is preserved rather than cleared.
The step orchestrator ([`episodes/fetch_details_step.py`](../episodes/fetch_details_step.py)) is DBOS-agnostic: the `@DBOS.step()` wrapper in `episodes/workflows.py` reads `DBOS.workflow_id` and passes it in. The orchestrator records it onto `FetchDetailsRun.dbos_workflow_id` for cross-reference forensics.
diff --git a/doc/features/2026-05-04-download-show-name-fix.md b/doc/features/2026-05-04-download-show-name-fix.md
new file mode 100644
index 0000000..a450329
--- /dev/null
+++ b/doc/features/2026-05-04-download-show-name-fix.md
@@ -0,0 +1,127 @@
+# Download show_name + published_at fix
+
+**Date:** 2026-05-04
+
+## Problem
+
+The Download agent rejected every fyyd / podcastindex candidate for the
+canonical ARD Sounds test episode
+(`https://www.ardsounds.de/episode/urn:ard:episode:fdcf93eef8395b35/`)
+even though fyyd had a clean Akamai enclosure URL for it. Two factors
+combined:
+
+1. `Episode` had no `show_name` field. `episodes/downloader._show_name`
+ fell back to the URL host (`www.ardsounds.de`), which never matches
+ a real broadcast title (`Zeitzeichen`).
+2. The download agent's system prompt encouraged a strict show / title
+ match — so the LLM played it safe and rejected the candidate set.
+
+## Changes
+
+Three coordinated layers, single commit set.
+
+### Layer 1 — `Episode.show_name` source of truth
+
+- New `Episode.show_name = CharField(max_length=255, blank=True, default="")`.
+- Migration `0025_episode_show_name`. No backfill (pre-prod).
+- `EpisodeDetails` Pydantic schema in `episodes/agents/fetch_details.py`
+ gains `show_name: str | None = None`. The system prompt instructs the
+  agent to extract from (in order): `og:site_name`,
+  `application-name`, RSS `<title>`, JSON-LD
+ `isPartOf.name` / `partOfSeries.name`, then the visible publisher
+ heading. Hostnames and company names are explicitly out of scope.
+- `episodes/fetch_details_step.py` persists
+ `EpisodeDetails.show_name → Episode.show_name` only when the agent's
+ value is non-empty. A re-run that fails to extract a show name leaves
+ the previously-good value (or admin edit) in place.
+- `episodes/downloader._show_name(episode)` cascades:
+ `episode.show_name` → URL host → `""`. The URL host fallback stays as
+ a defense-in-depth so the agent always receives some `show_name`
+ context, but the agent prompt now treats hostname-shaped values as a
+ degraded signal.
+
+### Layer 2 — `published_at` as a tiebreaker
+
+- `EpisodeCandidate` (in `episodes/podcast_aggregators/base.py`) gains
+ `published_at: date | None = None`.
+- Each aggregator now extracts a publication date and logs + returns
+ `None` on missing / malformed input rather than dropping the candidate:
+ - **fyyd**: parses `item["pubdate"]` (string,
+ `"YYYY-MM-DD HH:MM:SS"` per fyyd docs; also handles ISO 8601 and
+ bare `YYYY-MM-DD`).
+ - **iTunes**: parses `item["releaseDate"]` (ISO 8601 datetime, e.g.
+ `"2024-08-30T04:00:00Z"`).
+ - **podcastindex**: converts `item["datePublished"]` (Unix epoch
+ seconds, ints or numeric strings) to UTC date.
+- `DownloadDeps` (in `episodes/agents/download_deps.py`) gains
+ `published_at: date | None = None`.
+- `IndexCandidate` in `episodes/agents/download_tools.py` gains
+ `published_at: date | None = None` so the agent sees per-candidate
+ dates inside `lookup_podcast_index` results.
+- `episodes/downloader.download_episode` passes `episode.published_at`
+ through to `run_download_agent` → `_run_agent_async` → `DownloadDeps`.
+
+### Layer 3 — hostname-aware download agent prompt
+
+`DOWNLOAD_SYSTEM_PROMPT` in `episodes/agents/download.py` now:
+
+- Surfaces the episode's `Published` date alongside title / show.
+- Tells the agent: when `Show` contains a `.` and no spaces (i.e. it
+ looks like a hostname), treat it as a degraded signal — do NOT
+ require exact string match on candidate `show_name`. Instead, prefer
+ matching candidates by `(title, published_at)`, with a window of ±1
+ day. When `Published` is unknown, fall back to title similarity alone.
+- Real show titles still use the existing show-plus-title match.
+
+## Key parameters
+
+- `Episode.show_name max_length = 255` — comfortable for very long
+ podcast titles (Apple Podcasts allows up to 255).
+- Date match window: ±1 day. Looser than exact match (covers timezone
+ drift between publisher and aggregator) but tighter than e.g. a
+ week (avoids matching an unrelated re-broadcast).
+
+## Verification
+
+Tests:
+
+```bash
+uv run python manage.py test
+# 384 tests pass.
+```
+
+Manual (post-deploy / pre-merge with reviewer — see PR body):
+
+1. Submit `https://www.ardsounds.de/episode/urn:ard:episode:fdcf93eef8395b35/`
+ via the admin "Submit episode" form.
+2. Watch fetch_details populate `Episode.show_name = "Zeitzeichen"` (or
+ similar).
+3. Confirm download step reaches `READY` via the index path with
+ `source="fyyd"`.
+4. `dbos workflow steps <workflow-id>` shows `download_step` succeeded.
+5. Smoke-test on one English episode (e.g. an iTunes-indexed show).
+6. Smoke-test on an episode where fetch_details fails to extract a
+ show name → confirm host-fallback path still works (agent reads
+ the hostname-shaped show_name, switches to `(title, published_at)`
+ match).
+
+## Files modified
+
+| Path | Summary |
+|------|---------|
+| `episodes/models.py` | Add `show_name` `CharField`. |
+| `episodes/migrations/0025_episode_show_name.py` | New migration. |
+| `episodes/agents/fetch_details.py` | Add `show_name` to `EpisodeDetails`; extend system prompt. |
+| `episodes/fetch_details_step.py` | Persist `show_name` only when non-empty. |
+| `episodes/downloader.py` | Cascade in `_show_name`; pass `published_at` through. |
+| `episodes/podcast_aggregators/base.py` | Add `published_at` to `EpisodeCandidate`. |
+| `episodes/podcast_aggregators/fyyd.py` | Parse `pubdate` → `date`. |
+| `episodes/podcast_aggregators/itunes.py` | Parse `releaseDate` → `date`. |
+| `episodes/podcast_aggregators/podcastindex.py` | Parse `datePublished` (epoch) → `date`. |
+| `episodes/agents/download_deps.py` | Add `published_at` to `DownloadDeps`. |
+| `episodes/agents/download_tools.py` | Add `published_at` to `IndexCandidate`; surface in tool output. |
+| `episodes/agents/download.py` | Hostname-aware prompt; plumb `published_at`. |
+| `episodes/tests/test_models.py` | `show_name` default + persistence. |
+| `episodes/tests/test_podcast_aggregators.py` | Pubdate parse / missing / malformed for all three aggregators. |
+| `episodes/tests/test_download.py` | `_show_name` cascade tests. |
+| `CHANGELOG.md` | Entry under `## 2026-05-04`. |
diff --git a/doc/plans/2026-05-04-download-show-name-fix.md b/doc/plans/2026-05-04-download-show-name-fix.md
new file mode 100644
index 0000000..3d6ee0c
--- /dev/null
+++ b/doc/plans/2026-05-04-download-show-name-fix.md
@@ -0,0 +1,124 @@
+# Download show_name + published_at fix (issue #111)
+
+**Date:** 2026-05-04
+
+## Summary
+
+Tighten the download agent's index-candidate matching for non-English shows
+whose extracted show name is empty by fixing the root cause (no `show_name`
+extraction in fetch-details) rather than papering over it in the agent
+prompt alone.
+
+## Problem
+
+Submitting `https://www.ardsounds.de/episode/urn:ard:episode:fdcf93eef8395b35/`
+fails the download step even though fyyd carries a clean enclosure URL for
+the episode. Two factors interact:
+
+1. **No `show_name` source of truth.** `Episode` has no `show_name` field
+ and `fetch_details` doesn't extract one. `episodes/downloader._show_name`
+ falls back to the URL host (`www.ardsounds.de`).
+2. **Strict match in the agent prompt.** With `show_name = "www.ardsounds.de"`
+ on the episode and `show_name = "Zeitzeichen"` on every fyyd candidate,
+ the agent rejects every candidate and the episode goes to `FAILED`.
+
+## Plan
+
+Three layers, bundled into one PR.
+
+### Layer 1 — real `show_name`
+
+- Add `Episode.show_name = CharField(max_length=255, blank=True, default="")`.
+ Generate Django migration. No backfill (project is pre-prod).
+- Extend `EpisodeDetails` schema in `episodes/agents/fetch_details.py` with
+ `show_name: str | None = None`. Update the system prompt to extract
+  `show_name` from `og:site_name`,
+  `application-name`, RSS `<title>`, JSON-LD
+ `isPartOf.name`, and the visible publisher / show heading.
+- Persist `EpisodeDetails.show_name → Episode.show_name` in
+ `episodes/fetch_details_step.py` only when the value is non-empty (don't
+ wipe a previously-good value or a user edit on a re-run that fails to
+ extract).
+- Update `episodes/downloader._show_name(episode)` to prefer
+ `episode.show_name`, with the URL host as a defense-in-depth fallback.
+
+### Layer 2 — date as a tiebreaker
+
+- Add `published_at: date | None = None` to
+ `episodes/podcast_aggregators/base.EpisodeCandidate`.
+- Plumb pubdate through each aggregator's `_candidate()`:
+  - **fyyd**: `item["pubdate"]` (space-separated string like
+    `"2024-08-30 04:00:00"`) → `date`.
+ - **iTunes**: `item["releaseDate"]` (ISO 8601 datetime) → `date`.
+ - **podcastindex**: `item["datePublished"]` (Unix epoch seconds) →
+ `date`.
+ - On parse failure: leave `published_at = None` and log a warning. Do
+ NOT drop the candidate.
+- Pass `episode.published_at` through `DownloadDeps` and the download
+ agent's prompt template.
+- Surface `published_at` on each `IndexCandidate` returned by
+ `lookup_podcast_index`.
+
+### Layer 3 — looser, hostname-aware prompt
+
+Update the download agent system prompt in `episodes/agents/download.py`:
+
+> `show_name` may be a publisher hostname rather than the broadcast title
+> (e.g. `www.ardsounds.de` instead of `Zeitzeichen`). When `show_name`
+> looks like a hostname (contains `.` and no spaces), do NOT require an
+> exact string match against the candidate's `show_name`. Instead, prefer
+> matching candidates by `(title, published_at)`. A candidate is a strong
+> match when its title closely matches the episode title and its
+> `published_at` is within ±1 day of the episode's `published_at`.
+
+## Decisions
+
+- **Bundle all three layers**: Layer 1 alone leaves the prompt strict;
+ Layer 3 alone papers over the missing field. Doing all three in one
+ commit gives the agent both real data and the right matching policy.
+- **Add new model field, no backfill**: `Episode.show_name` is a new
+ `CharField` with empty default. Pre-prod data per
+ `feedback_reembed_ok_preprod.md`.
+- **Manual verification in PR description**: requires a live ASGI server,
+ provider keys, and the canonical ARD episode. Not run by the
+ implementation agent — flagged in the PR body for the reviewer.
+
+## Test plan
+
+- Each aggregator's `_candidate()` populates `published_at` from a canned
+ payload.
+- Each aggregator handles missing / malformed pub dates gracefully
+ (returns `published_at=None`, doesn't drop the candidate).
+- `_show_name(episode)` returns `episode.show_name` when set, falls back
+ to URL host otherwise, returns `""` for a URL with no host.
+- `Episode.show_name` is blank by default and persists when set.
+- `manage.py makemigrations --check` passes (no further migrations
+ needed).
+- Full test suite passes via `uv run python manage.py test`.
+
+## Files touched
+
+- `episodes/models.py` — add `show_name` field.
+- `episodes/migrations/0025_episode_show_name.py` — new migration.
+- `episodes/agents/fetch_details.py` — `EpisodeDetails.show_name` +
+ prompt update.
+- `episodes/fetch_details_step.py` — persist `show_name` when non-empty.
+- `episodes/downloader.py` — `_show_name` cascade; pass `published_at`
+ to the agent.
+- `episodes/podcast_aggregators/base.py` — `EpisodeCandidate.published_at`.
+- `episodes/podcast_aggregators/{fyyd,itunes,podcastindex}.py` — parse
+  pubdate, log + return `None` on failure (candidate kept).
+- `episodes/agents/download_deps.py` — `DownloadDeps.published_at`.
+- `episodes/agents/download_tools.py` — `IndexCandidate.published_at`,
+ surface in `lookup_podcast_index` return value.
+- `episodes/agents/download.py` — extend prompt, plumb `published_at`
+ through `_run_agent_async` / `run_download_agent`.
+- `episodes/tests/test_models.py` — `show_name` blank default + persistence.
+- `episodes/tests/test_podcast_aggregators.py` — three pubdate parse
+ scenarios per aggregator.
+- `episodes/tests/test_download.py` — `_show_name` cascade tests.
+- `doc/plans/2026-05-04-download-show-name-fix.md` — this file.
+- `doc/features/2026-05-04-download-show-name-fix.md` — implementation doc.
+- `doc/sessions/2026-05-04-download-show-name-fix-planning-session.md`
+- `doc/sessions/2026-05-04-download-show-name-fix-implementation-session.md`
+- `CHANGELOG.md` — entry under `## 2026-05-04`.
diff --git a/doc/sessions/2026-05-04-download-show-name-fix-implementation-session.md b/doc/sessions/2026-05-04-download-show-name-fix-implementation-session.md
new file mode 100644
index 0000000..9bb4d42
--- /dev/null
+++ b/doc/sessions/2026-05-04-download-show-name-fix-implementation-session.md
@@ -0,0 +1,136 @@
+# Download show_name + published_at implementation session
+
+**Date:** 2026-05-04
+
+**Session ID:** unavailable
+
+## Summary
+
+Implementation session for the three-layer fix planned in
+`doc/plans/2026-05-04-download-show-name-fix.md`: real `show_name`
+source of truth, `published_at` plumbing through aggregators, and an
+updated download agent prompt that handles hostname-shaped show names.
+Carried out on branch `rafacm/download-show-name-fix`. All 384 tests
+pass.
+
+## Detailed conversation
+
+> **Agent-orchestrated session.** This implementation session was
+> launched by a parent Claude Code (Conductor) agent rather than
+> directly by the user — the parent authorized the parallel session
+> and handed the implementation agent a summarized brief, not a
+> user-authored prompt. Per AGENTS.md's "Agent-orchestrated sessions"
+> convention, the section below uses `### Parent agent (orchestrator)`
+> instead of `### User`. The parent-agent's verbatim launching prompt
+> could not be recovered from session logs and is therefore left
+> unavailable; the substance of the brief is summarized in prose
+> rather than fabricated as a quote.
+
+### Parent agent (orchestrator)
+
+*Verbatim launching prompt unavailable.* Substance of the brief
+relayed to the implementation agent: implement issue #111. Verify
+staleness of the headline lines first (`_show_name`,
+`Episode.show_name`, `EpisodeDetails`, `EpisodeCandidate`, the three
+aggregator `_candidate()` methods). Then carry out the three-layer
+plan: add `Episode.show_name` (no backfill) + extract via
+fetch_details; add `published_at` to `EpisodeCandidate` + plumb
+through fyyd / iTunes / podcastindex; relax the download agent prompt
+to handle hostname-shaped show names. Manual verification belongs in
+the PR body. Document and PR via the project's normal flow.
+
+### Assistant
+
+**Staleness check.** Read each headline file. Confirmed:
+- `episodes/downloader._show_name` still falls back to URL host.
+- `Episode.show_name` was absent (`published_at` already present).
+- `EpisodeDetails` did not extract `show_name`.
+- `EpisodeCandidate` had no `published_at` field.
+- None of fyyd / iTunes / podcastindex `_candidate()` parsed pubdate.
+
+Plan still applies in full.
+
+**Branch.** Created `rafacm/download-show-name-fix` off the worktree's
+main equivalent.
+
+**Layer 1 — `show_name`.**
+- Added `show_name = CharField(max_length=255, blank=True, default="")`
+ to `Episode`.
+- Generated `episodes/migrations/0025_episode_show_name.py` via
+ `uv run python manage.py makemigrations`.
+- Added `show_name: str | None = None` to `EpisodeDetails` in
+ `episodes/agents/fetch_details.py`. Updated the system prompt to
+ enumerate the extraction sources (`og:site_name`,
+ `application-name`, RSS ``, JSON-LD
+ `isPartOf.name` / `partOfSeries.name`, visible publisher heading)
+ and to forbid using the URL hostname or company name.
+- Updated `episodes/fetch_details_step.py:_apply_details` to persist
+ `show_name` only when the agent's value is non-empty (additive
+ semantics — never wipe a previously-good value or admin edit on a
+ partial re-run).
+- Updated `episodes/downloader._show_name` to prefer
+ `episode.show_name` and fall back to URL host.
+
+**Layer 2 — `published_at`.**
+- Added `published_at: date | None = None` to `EpisodeCandidate` in
+ `episodes/podcast_aggregators/base.py`.
+- fyyd: added `_parse_pubdate(raw)` accepting fyyd's documented
+ `"YYYY-MM-DD HH:MM:SS"` plus ISO 8601 / `T`-separated / bare-date
+ variants. Logs a warning and returns `None` on unparseable input —
+ never raises.
+- iTunes: added `_parse_release_date(raw)` using `datetime.fromisoformat`
+ after replacing trailing `Z` with `+00:00`.
+- podcastindex: added `_parse_date_published(raw)` accepting `int`,
+ `float`, or numeric string Unix epoch seconds. Rejects `bool`
+ (subclass of `int`) explicitly.
+- All three log and return `None` on missing / malformed input; no
+  parser drops the candidate.
+- Added `published_at: date | None = None` to `DownloadDeps`.
+- Added `published_at: date | None = None` to `IndexCandidate` in
+ `episodes/agents/download_tools.py` and propagated it from each
+ `EpisodeCandidate` in `lookup_podcast_index`.
+- Updated `episodes/agents/download._run_agent_async` and
+ `run_download_agent` signatures to accept `published_at: date | None`,
+ forwarded into `DownloadDeps`.
+- Updated `episodes/downloader.download_episode` to pass
+ `episode.published_at` to `run_download_agent`.
+
+**Layer 3 — prompt update.**
+- Surfaced `Published: {published_at}` in the agent context block.
+- Replaced the strict "if a candidate looks right" wording with a
+ detailed candidate-picking section that:
+ - Tells the agent to detect hostname-shaped `Show` values (contain
+ `.` and no spaces) and switch to `(title, published_at)` matching.
+ - Defines a "strong match" as ~equal title plus `published_at` ±1
+ day.
+ - Falls back to title similarity alone when `Published` is unknown.
+ - Keeps the existing real-show-title path for the well-behaved case.
+
+**Tests added.**
+- `episodes/tests/test_models.py` — `show_name` blank by default;
+ persists when set.
+- `episodes/tests/test_podcast_aggregators.py` — for each of fyyd /
+ iTunes / podcastindex: pubdate parsed correctly; missing pubdate
+ returns `None`; malformed pubdate returns `None` without dropping
+ the candidate.
+- `episodes/tests/test_download.py` — `ShowNameTests`: `_show_name`
+ returns `episode.show_name` when set; falls back to host; returns
+ `""` for a URL with no host.
+
+**Test run.** `uv run python manage.py test` — 384 tests passing.
+`uv run python manage.py makemigrations --check --dry-run` — no
+further migrations needed.
+
+**Documentation.** Wrote plan, feature, planning session, and this
+implementation session under `doc/`. Added a CHANGELOG entry under
+`## 2026-05-04`.
+
+**Manual verification.** Not run by the implementation agent —
+included as a checklist in the PR body per the user's instruction.
+The reviewer will exercise the canonical ARD Sounds URL on a live
+ASGI worker with provider keys configured.
+
+**PR.** Created against `main` with rebase merge strategy. Title:
+"Plumb show_name + published_at through download cascade (closes
+#111)". Body includes the manual verification checklist plus a
+short summary of the three layers.
diff --git a/doc/sessions/2026-05-04-download-show-name-fix-planning-session.md b/doc/sessions/2026-05-04-download-show-name-fix-planning-session.md
new file mode 100644
index 0000000..0a52834
--- /dev/null
+++ b/doc/sessions/2026-05-04-download-show-name-fix-planning-session.md
@@ -0,0 +1,84 @@
+# Download show_name + published_at planning session
+
+**Date:** 2026-05-04
+
+**Session ID:** unavailable
+
+## Summary
+
+Planning session that turned issue #111 into a three-layer plan:
+(1) extract a real `show_name` in fetch_details and persist it on the
+Episode; (2) plumb `published_at` from the aggregator candidates
+through to the download agent so it can match by `(title, published_at)`;
+(3) update the download agent's system prompt to recognize a
+hostname-shaped `show_name` as a degraded signal and prefer the
+date-based match. The user accepted bundling all three into one PR
+with no backfill on the new model field, since the project is pre-prod.
+
+## Detailed conversation
+
+> **Agent-orchestrated session.** This planning session was launched
+> by a parent Claude Code (Conductor) agent rather than directly by
+> the user. The implementation agent that produced this transcript
+> never received the user's verbatim wording — only a summarized brief
+> from the parent agent. Per AGENTS.md's "Agent-orchestrated sessions"
+> convention, the section below uses `### Parent agent (orchestrator)`
+> instead of `### User`. The parent-agent's verbatim launching prompt
+> could not be recovered from session logs and is therefore left
+> unavailable; the substance of the brief is summarized in prose
+> rather than fabricated as a quote.
+
+### Parent agent (orchestrator)
+
+*Verbatim launching prompt unavailable.* Substance of the brief
+relayed to the planning agent: surface issue #111 — the canonical ARD
+Sounds episode submitted in the PR #109 test plan goes to `FAILED`
+even though fyyd carries a clean enclosure URL for it. Two contributing
+factors: (a) `_show_name` falls back to URL host (`www.ardsounds.de`)
+and never matches a real broadcast title; (b) the download agent
+prompt encourages strict show matches. Produce a plan.
+
+### Assistant
+
+Worked through the failure mode in `episodes/downloader._show_name`
+and the download agent system prompt. Confirmed via PR #109 test logs
+that the agent rejected every candidate with the message
+*"Keiner der gefundenen Podcast-Episoden aus dem Index passt zum Titel
+oder der Show www.ardsounds.de"* ("none of the podcast episodes found
+in the index matches the title or the show www.ardsounds.de").
+Proposed three layers of fix:
+
+1. Real `show_name` source of truth: new `Episode.show_name` field;
+ `EpisodeDetails` extracts it from `og:site_name` / RSS title /
+ JSON-LD; `_show_name` prefers it over the URL host.
+2. `published_at` as a tiebreaker: add it to `EpisodeCandidate`,
+ plumb pubdate through fyyd / iTunes / podcastindex parsers, surface
+ it on `IndexCandidate` and in `DownloadDeps`.
+3. Looser, hostname-aware prompt: explicitly tell the agent that when
+ `show_name` looks like a hostname it should switch to a
+ `(title, published_at)` match with ±1 day.
+
+Considered alternatives:
+- *Layer 3 alone* (just relax the prompt) — rejected: fragile, the
+ LLM still has nothing to match on when the publisher's hostname is
+ the only signal it has.
+- *Layer 1 alone* (extract show_name, leave prompt strict) — rejected:
+ the prompt's exact-match wording would still bite when the publisher's
+ show name doesn't appear verbatim on aggregator pages.
+- *Backfill `show_name` for existing rows* — rejected per the user's
+ pre-prod-data-freedom feedback (`feedback_reembed_ok_preprod.md`):
+ full Postgres regen + re-ingest is the supported upgrade path.
+
+### Parent agent (orchestrator)
+
+*Verbatim relay unavailable.* Decisions communicated through the
+parent agent:
+- Bundle all three layers in one PR.
+- Add a new model field (no backfill).
+- Manual verification belongs in the PR description, not run by the
+ implementation agent.
+- Staleness against `main` is verified.
+
+### Assistant
+
+Captured the decisions, drafted this plan document, and queued the
+implementation agent to execute it on a fresh branch
+(`rafacm/download-show-name-fix`).
diff --git a/episodes/admin.py b/episodes/admin.py
index 089965d..f37220c 100644
--- a/episodes/admin.py
+++ b/episodes/admin.py
@@ -297,6 +297,7 @@ class EpisodeAdmin(admin.ModelAdmin):
METADATA_FIELDS = (
"title",
+ "show_name",
"description",
"published_at",
"image_url",
diff --git a/episodes/agents/download.py b/episodes/agents/download.py
index 7eb0e9d..9057487 100644
--- a/episodes/agents/download.py
+++ b/episodes/agents/download.py
@@ -21,6 +21,7 @@
import asyncio
import logging
import os
+from datetime import date
from django.conf import settings
from pydantic_ai import Agent
@@ -42,6 +43,7 @@
- Episode URL: {episode_url}
- Title: {title}
- Show: {show_name}
+- Published: {published_at}
- GUID hint: {guid}
- Known audio URL (failed wget): {audio_url}
- Language: {language}
@@ -51,10 +53,30 @@
1. Call `lookup_podcast_index` with the title/show/guid hints.
Podcast indexes (fyyd, podcastindex.org) often carry the
publisher's RSS-feed enclosure URL even when the publisher's
- page hides it behind interactive UI. If a candidate looks
- right, call `download_file` on its `audio_url`. On success,
- return success with `source` set to the candidate's
- `source_index` (e.g. "fyyd").
+ page hides it behind interactive UI.
+
+ Picking the right candidate:
+ - The episode's `Show` value above may be a publisher hostname
+ rather than the broadcast title — for example
+ `www.ardsounds.de` instead of `Zeitzeichen`. Detect this:
+ when `Show` contains a `.` and no spaces, treat it as a
+ hostname and do NOT require an exact string match against
+ the candidate's `show_name`.
+ - For hostname-shaped `Show` values, prefer matching candidates
+ by `(title, published_at)` instead. A candidate is a strong
+ match when its `title` is essentially the same as the
+ episode `Title` (allowing for trailing punctuation, suffixes
+ like `" | Podcast"`, etc.) AND its `published_at` is within
+ ±1 day of the episode `Published` value.
+ - For real show titles, an exact / fuzzy match on
+ `show_name` plus a similar episode `title` is enough.
+ - When `Published` is unknown, fall back to title similarity
+ alone — do not reject a clear title match just because the
+ date is missing on either side.
+
+ When you find a strong match, call `download_file` on its
+ `audio_url`. On success, return success with `source` set to
+ the candidate's `source_index` (e.g. "fyyd").
2. If no index candidates look right, navigate to the episode
page with `navigate_to_url`, then use `find_audio_links` to
@@ -130,10 +152,14 @@ def _build_agent() -> Agent[DownloadDeps, DownloadAgentResult]:
def _get_system_prompt(deps: DownloadDeps) -> str:
from ..languages import ISO_639_LANGUAGE_NAMES, ISO_639_RE
+ published_at_str = (
+ deps.published_at.isoformat() if deps.published_at else "(unknown)"
+ )
prompt = DOWNLOAD_SYSTEM_PROMPT.format(
episode_url=deps.episode_url,
title=deps.title or "(unknown)",
show_name=deps.show_name or "(unknown)",
+ published_at=published_at_str,
guid=deps.guid or "(none)",
audio_url=deps.audio_url or "(none)",
language=deps.language or "(unknown)",
@@ -157,6 +183,7 @@ async def _run_agent_async(
show_name: str,
guid: str,
language: str,
+ published_at: date | None = None,
) -> DownloadAgentResult:
import shutil
import tempfile
@@ -176,6 +203,7 @@ async def _run_agent_async(
download_dir=download_dir,
page=page,
screenshots=[],
+ published_at=published_at,
)
system_prompt = _get_system_prompt(deps)
@@ -339,6 +367,7 @@ def run_download_agent(
show_name: str = "",
guid: str = "",
language: str = "",
+ published_at: date | None = None,
) -> DownloadAgentResult:
"""Run the download agent synchronously (entry point from the step)."""
try:
@@ -355,13 +384,13 @@ def run_download_agent(
asyncio.run,
_run_agent_async(
episode_id, episode_url, audio_url,
- title, show_name, guid, language,
+ title, show_name, guid, language, published_at,
),
).result()
return asyncio.run(
_run_agent_async(
episode_id, episode_url, audio_url,
- title, show_name, guid, language,
+ title, show_name, guid, language, published_at,
)
)
finally:
diff --git a/episodes/agents/download_deps.py b/episodes/agents/download_deps.py
index c94d166..fe190c4 100644
--- a/episodes/agents/download_deps.py
+++ b/episodes/agents/download_deps.py
@@ -1,6 +1,7 @@
"""Dependencies and result model for the download agent."""
import dataclasses
+from datetime import date
from playwright.async_api import Page
from pydantic import BaseModel
@@ -20,6 +21,10 @@ class DownloadDeps:
download_dir: str
page: Page
screenshots: list[bytes]
+ # Episode publication date (when fetch_details extracted one). Used
+ # by the agent as a tiebreaker against ``IndexCandidate.published_at``
+ # when ``show_name`` is degraded (e.g. fell back to URL host).
+ published_at: date | None = None
class DownloadAgentResult(BaseModel):
diff --git a/episodes/agents/download_tools.py b/episodes/agents/download_tools.py
index d52cad0..11f1963 100644
--- a/episodes/agents/download_tools.py
+++ b/episodes/agents/download_tools.py
@@ -3,6 +3,7 @@
import asyncio
import logging
import os
+from datetime import date
from playwright.async_api import Error as PlaywrightError
from pydantic import BaseModel
@@ -411,13 +412,21 @@ def on_response(response):
class IndexCandidate(BaseModel):
- """One candidate from a podcast index, surfaced to the agent."""
+ """One candidate from a podcast index, surfaced to the agent.
+
+ ``published_at`` is the candidate's publication date when the
+ aggregator surfaces one (ISO 8601 ``YYYY-MM-DD`` in serialized
+ output). The agent should match candidates by
+ ``(title, published_at)`` when ``show_name`` is a hostname rather
+ than a real broadcast title.
+ """
audio_url: str
title: str = ""
show_name: str = ""
duration_seconds: int | None = None
source_index: str = ""
+ published_at: date | None = None
async def lookup_podcast_index(
@@ -461,6 +470,7 @@ async def lookup_podcast_index(
show_name=c.show_name,
duration_seconds=c.duration_seconds,
source_index=c.source_index,
+ published_at=c.published_at,
)
for c in candidates
]
diff --git a/episodes/agents/fetch_details.py b/episodes/agents/fetch_details.py
index 770d74a..4d50a24 100644
--- a/episodes/agents/fetch_details.py
+++ b/episodes/agents/fetch_details.py
@@ -43,8 +43,18 @@
to recover by searching aggregators based on the URL alone.
3. If the fetch succeeded, determine whether the page is a podcast
episode page at all. If it isn't, emit outcome=not_a_podcast_episode.
- 4. Extract metadata: title, description, language, country, image,
- published date, audio URL, audio format, GUID.
+ 4. Extract metadata: title, show_name, description, language, country,
+ image, published date, audio URL, audio format, GUID.
+ For show_name, look at (in order of preference):
+ - the og:site_name meta tag on the episode page
+ - other show/site-name meta tags
+ - the RSS/Atom channel title when fetching the feed URL
+ - JSON-LD structured data: PodcastEpisode → isPartOf.name (the
+ parent PodcastSeries) or partOfSeries.name
+ - The visible publisher / show title near the episode title
+ show_name is the broadcast / podcast title (e.g. "Zeitzeichen",
+ "This American Life") — NOT the publisher's company name and NOT
+ the URL hostname. Leave blank when you can't pinpoint it.
5. Classify source_kind (canonical | aggregator | unknown) and
aggregator_provider when applicable.
6. Cross-link ONLY when the submitted page is itself a podcast
@@ -153,6 +163,7 @@ class EpisodeDetails(BaseModel):
"""
title: str | None = None
+ show_name: str | None = None
description: str | None = None
published_at: date | None = None
image_url: str | None = None
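The JSON-LD path in the prompt (PodcastEpisode → `isPartOf.name` / `partOfSeries.name`) is mechanical enough to show directly. A self-contained sketch under assumed input (a list of raw `<script type="application/ld+json">` bodies); the helper name is illustrative, since the real extraction is done by the agent:

```python
import json


def show_name_from_json_ld(script_bodies: list[str]) -> str:
    """Pull a show name from JSON-LD blobs via isPartOf.name / partOfSeries.name."""
    for blob in script_bodies:
        try:
            data = json.loads(blob)
        except ValueError:
            continue  # malformed JSON-LD: skip, don't fail the page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            for key in ("isPartOf", "partOfSeries"):
                series = item.get(key)
                if isinstance(series, dict) and series.get("name"):
                    return str(series["name"])
    return ""  # leave blank when the series name can't be pinpointed
```

Returning `""` on failure matches the prompt's instruction to leave `show_name` blank rather than guess.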
diff --git a/episodes/downloader.py b/episodes/downloader.py
index 35be8ad..3a86ae6 100644
--- a/episodes/downloader.py
+++ b/episodes/downloader.py
@@ -98,7 +98,18 @@ def _save_audio(episode: Episode, src_path: str) -> int:
def _show_name(episode: Episode) -> str:
- """Best-effort show name (no Show model — fall back to URL host)."""
+ """Best-effort show name.
+
+ Prefers ``Episode.show_name`` (extracted by the fetch_details agent
+ from ``og:site_name`` / RSS ``<channel><title>`` / JSON-LD, etc.).
+ Falls back to the URL host so the download agent always has *some*
+ string to work with — but the download agent prompt knows that a
+ hostname-shaped show_name is a degraded signal and should match
+ candidates by ``(title, published_at)`` instead.
+ """
+ if episode.show_name:
+ return episode.show_name
+
from urllib.parse import urlparse
netloc = urlparse(episode.url).netloc
@@ -168,6 +179,7 @@ def download_episode(episode_id: int) -> None:
show_name=_show_name(episode),
guid=episode.guid or "",
language=episode.language or "",
+ published_at=episode.published_at,
)
if agent_result.success and agent_result.downloaded_file:
diff --git a/episodes/fetch_details_step.py b/episodes/fetch_details_step.py
index d382864..5470bfe 100644
--- a/episodes/fetch_details_step.py
+++ b/episodes/fetch_details_step.py
@@ -82,6 +82,15 @@ def _apply_details(episode: Episode, details) -> list[str]:
value = getattr(details, name)
setattr(episode, name, value if isinstance(value, str) else "")
+ # show_name is "additive only" — we never want a fresh agent run
+ # that fails to extract a show name to wipe out a previously-good
+ # value (or a user edit in the admin). Write only when non-empty.
+ touched_fields = list(string_fields)
+ show_name_value = getattr(details, "show_name", None)
+ if isinstance(show_name_value, str) and show_name_value.strip():
+ episode.show_name = show_name_value.strip()
+ touched_fields.append("show_name")
+
# Nullable date — write the value or clear it.
if isinstance(details.published_at, date):
episode.published_at = details.published_at
@@ -92,7 +101,7 @@ def _apply_details(episode: Episode, details) -> list[str]:
# always reflect the agent's classification.
episode.source_kind = details.source_kind or Episode.SourceKind.UNKNOWN
- return [*string_fields, "published_at", "source_kind"]
+ return [*touched_fields, "published_at", "source_kind"]
def _persist_run(
diff --git a/episodes/migrations/0025_episode_show_name.py b/episodes/migrations/0025_episode_show_name.py
new file mode 100644
index 0000000..07efa3d
--- /dev/null
+++ b/episodes/migrations/0025_episode_show_name.py
@@ -0,0 +1,18 @@
+# Generated by Django 5.2.12 on 2026-05-04 08:00
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+ dependencies = [
+ ('episodes', '0024_fetch_details_cross_linking'),
+ ]
+
+ operations = [
+ migrations.AddField(
+ model_name='episode',
+ name='show_name',
+ field=models.CharField(blank=True, default='', max_length=255),
+ ),
+ ]
diff --git a/episodes/models.py b/episodes/models.py
index 09e63cc..52583bb 100644
--- a/episodes/models.py
+++ b/episodes/models.py
@@ -31,6 +31,7 @@ class SourceKind(models.TextChoices):
# Metadata fields (populated by fetch_details step)
title = models.CharField(max_length=500, blank=True, default="")
+ show_name = models.CharField(max_length=255, blank=True, default="")
description = models.TextField(blank=True, default="")
published_at = models.DateField(null=True, blank=True)
image_url = models.URLField(max_length=2000, blank=True, default="")
diff --git a/episodes/podcast_aggregators/base.py b/episodes/podcast_aggregators/base.py
index 0a988fd..4ef6be0 100644
--- a/episodes/podcast_aggregators/base.py
+++ b/episodes/podcast_aggregators/base.py
@@ -4,6 +4,7 @@
import dataclasses
from abc import ABC, abstractmethod
+from datetime import date
@dataclasses.dataclass(frozen=True)
@@ -15,6 +16,13 @@ class EpisodeCandidate:
agent (and ``DownloadResult``) can record which aggregator hit.
The legacy field name is preserved to avoid churn in callers that
only consume the value as an opaque label.
+
+ ``published_at`` is the candidate's publication date (when the
+ aggregator surfaces one). The download agent uses it as a
+ tiebreaker when ``show_name`` is degraded — e.g. for non-English
+ publishers whose extracted ``show_name`` falls back to the URL
+ host. ``None`` means the aggregator did not expose a date or it
+ failed to parse.
"""
audio_url: str
@@ -27,6 +35,7 @@ class EpisodeCandidate:
# Fetch Details agent's cross-linking flow; ignored by the
# Download cascade. Empty when the aggregator doesn't expose one.
episode_page_url: str = ""
+ published_at: date | None = None
class PodcastAggregator(ABC):
diff --git a/episodes/podcast_aggregators/fyyd.py b/episodes/podcast_aggregators/fyyd.py
index 72a5830..4867909 100644
--- a/episodes/podcast_aggregators/fyyd.py
+++ b/episodes/podcast_aggregators/fyyd.py
@@ -7,6 +7,7 @@
from __future__ import annotations
import logging
+from datetime import date, datetime
from typing import Any
import httpx
@@ -82,10 +83,46 @@ def _candidate(self, item: dict) -> EpisodeCandidate | None:
duration = item.get("duration")
if not isinstance(duration, int):
duration = None
+ published_at = _parse_pubdate(item.get("pubdate"))
return EpisodeCandidate(
audio_url=audio_url,
title=item.get("title") or "",
show_name=show_name,
duration_seconds=duration,
source_index=self.name,
+ published_at=published_at,
)
+
+
+def _parse_pubdate(raw: Any) -> date | None:
+ """Parse fyyd's ``pubdate`` field to a ``date``.
+
+ fyyd documents the value as ISO 8601 (e.g. ``"2024-08-30 04:00:00"``).
+ Returns ``None`` on missing / unparseable input — never raises, so
+ a single broken row doesn't drop a candidate.
+ """
+ if not raw:
+ return None
+ if isinstance(raw, date) and not isinstance(raw, datetime):
+ return raw
+ if isinstance(raw, datetime):
+ return raw.date()
+ if not isinstance(raw, str):
+ logger.warning("fyyd pubdate has unexpected type %s: %r", type(raw).__name__, raw)
+ return None
+ text = raw.strip()
+ if not text:
+ return None
+ # Try a few likely shapes: "YYYY-MM-DD HH:MM:SS", ISO 8601 with T,
+ # bare "YYYY-MM-DD". %Y renders as four digits but is only two
+ # chars in the format string, hence the +2 slack when slicing.
+ for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
+ try:
+ return datetime.strptime(text[: len(fmt) + 2], fmt).date()
+ except ValueError:
+ continue
+ # Last resort: ``fromisoformat`` (handles offsets, fractional seconds, etc.)
+ try:
+ return datetime.fromisoformat(text.replace("Z", "+00:00")).date()
+ except ValueError:
+ logger.warning("fyyd pubdate could not be parsed: %r", raw)
+ return None
diff --git a/episodes/podcast_aggregators/itunes.py b/episodes/podcast_aggregators/itunes.py
index 5a16c89..bd353eb 100644
--- a/episodes/podcast_aggregators/itunes.py
+++ b/episodes/podcast_aggregators/itunes.py
@@ -18,6 +18,7 @@
from __future__ import annotations
import logging
+from datetime import date, datetime
from typing import Any
import httpx
@@ -82,6 +83,7 @@ def _candidate(self, item: dict) -> EpisodeCandidate | None:
duration_seconds: int | None = None
if isinstance(duration_ms, int) and duration_ms > 0:
duration_seconds = duration_ms // 1000
+ published_at = _parse_release_date(item.get("releaseDate"))
return EpisodeCandidate(
audio_url=audio_url,
title=item.get("trackName") or "",
@@ -89,4 +91,30 @@ def _candidate(self, item: dict) -> EpisodeCandidate | None:
duration_seconds=duration_seconds,
source_index=self.name,
episode_page_url=item.get("trackViewUrl") or "",
+ published_at=published_at,
)
+
+
+def _parse_release_date(raw: Any) -> date | None:
+ """Parse iTunes' ``releaseDate`` (ISO 8601 datetime) to a ``date``.
+
+ Example value: ``"2024-08-30T04:00:00Z"``. Returns ``None`` on
+ missing / unparseable input — never raises.
+ """
+ if not raw:
+ return None
+ if isinstance(raw, date) and not isinstance(raw, datetime):
+ return raw
+ if isinstance(raw, datetime):
+ return raw.date()
+ if not isinstance(raw, str):
+ logger.warning("iTunes releaseDate has unexpected type %s: %r", type(raw).__name__, raw)
+ return None
+ text = raw.strip()
+ if not text:
+ return None
+ try:
+ return datetime.fromisoformat(text.replace("Z", "+00:00")).date()
+ except ValueError:
+ logger.warning("iTunes releaseDate could not be parsed: %r", raw)
+ return None
diff --git a/episodes/podcast_aggregators/podcastindex.py b/episodes/podcast_aggregators/podcastindex.py
index 17f0076..eb72481 100644
--- a/episodes/podcast_aggregators/podcastindex.py
+++ b/episodes/podcast_aggregators/podcastindex.py
@@ -11,6 +11,7 @@
import hashlib
import logging
import time
+from datetime import date, datetime, timezone
from typing import Any
import httpx
@@ -107,6 +108,7 @@ def _candidates_from_payload(self, payload: Any, key: str) -> list[EpisodeCandid
duration = item.get("duration")
if not isinstance(duration, int):
duration = None
+ published_at = _parse_date_published(item.get("datePublished"))
results.append(
EpisodeCandidate(
audio_url=audio_url,
@@ -114,6 +116,35 @@ def _candidates_from_payload(self, payload: Any, key: str) -> list[EpisodeCandid
show_name=item.get("feedTitle") or item.get("title") or "",
duration_seconds=duration,
source_index=self.name,
+ published_at=published_at,
)
)
return results
+
+
+def _parse_date_published(raw: Any) -> date | None:
+ """Parse podcastindex's ``datePublished`` (Unix epoch seconds) to ``date``.
+
+ Returns ``None`` on missing / unparseable input — never raises.
+ """
+ if raw is None or raw == "":
+ return None
+ if isinstance(raw, bool): # bool is a subclass of int; reject explicitly.
+ logger.warning("podcastindex datePublished is bool: %r", raw)
+ return None
+ if isinstance(raw, (int, float)):
+ try:
+ return datetime.fromtimestamp(raw, tz=timezone.utc).date()
+ except (OverflowError, OSError, ValueError):
+ logger.warning("podcastindex datePublished out of range: %r", raw)
+ return None
+ if isinstance(raw, str):
+ try:
+ return datetime.fromtimestamp(int(raw), tz=timezone.utc).date()
+ except (TypeError, ValueError):
+ logger.warning("podcastindex datePublished not numeric: %r", raw)
+ return None
+ logger.warning(
+ "podcastindex datePublished has unexpected type %s: %r", type(raw).__name__, raw,
+ )
+ return None
diff --git a/episodes/tests/test_download.py b/episodes/tests/test_download.py
index b7d2cb2..4866392 100644
--- a/episodes/tests/test_download.py
+++ b/episodes/tests/test_download.py
@@ -175,3 +175,35 @@ def test_wrong_status_skips(self):
episode.refresh_from_db()
self.assertEqual(episode.status, Episode.Status.QUEUED)
+
+
+class ShowNameTests(TestCase):
+ """`_show_name(episode)` cascade: episode.show_name → URL host."""
+
+ def _create_episode(self, **kwargs):
+ with patch("episodes.signals.DBOS"):
+ return Episode.objects.create(**kwargs)
+
+ def test_returns_show_name_when_set(self):
+ from episodes.downloader import _show_name
+
+ episode = self._create_episode(
+ url="https://www.ardsounds.de/ep/x",
+ show_name="Zeitzeichen",
+ )
+ self.assertEqual(_show_name(episode), "Zeitzeichen")
+
+ def test_falls_back_to_host_when_show_name_blank(self):
+ from episodes.downloader import _show_name
+
+ episode = self._create_episode(
+ url="https://www.ardsounds.de/ep/y",
+ )
+ self.assertEqual(_show_name(episode), "www.ardsounds.de")
+
+ def test_returns_empty_when_url_has_no_host(self):
+ from episodes.downloader import _show_name
+
+ # URL without netloc (e.g. relative path stored erroneously) — return "".
+ episode = Episode(url="not-a-url", show_name="")
+ self.assertEqual(_show_name(episode), "")
diff --git a/episodes/tests/test_models.py b/episodes/tests/test_models.py
index 6f77e2c..201b59f 100644
--- a/episodes/tests/test_models.py
+++ b/episodes/tests/test_models.py
@@ -32,6 +32,7 @@ def test_new_statuses_exist(self, mock_async):
def test_metadata_fields_blank_by_default(self, mock_async):
episode = Episode.objects.create(url="https://example.com/ep/1")
self.assertEqual(episode.title, "")
+ self.assertEqual(episode.show_name, "")
self.assertEqual(episode.description, "")
self.assertIsNone(episode.published_at)
self.assertEqual(episode.image_url, "")
@@ -42,3 +43,11 @@ def test_metadata_fields_blank_by_default(self, mock_async):
self.assertEqual(episode.canonical_url, "")
self.assertEqual(episode.source_kind, Episode.SourceKind.UNKNOWN)
self.assertEqual(episode.aggregator_provider, "")
+
+ def test_show_name_can_be_set(self, mock_async):
+ episode = Episode.objects.create(
+ url="https://example.com/ep/1",
+ show_name="Zeitzeichen",
+ )
+ episode.refresh_from_db()
+ self.assertEqual(episode.show_name, "Zeitzeichen")
diff --git a/episodes/tests/test_podcast_aggregators.py b/episodes/tests/test_podcast_aggregators.py
index d47a0c8..8f7ced4 100644
--- a/episodes/tests/test_podcast_aggregators.py
+++ b/episodes/tests/test_podcast_aggregators.py
@@ -1,5 +1,6 @@
"""Tests for the podcast aggregator provider abstraction."""
+from datetime import date
from unittest.mock import patch
import httpx
@@ -67,6 +68,57 @@ def test_http_error_returns_empty(self):
get.side_effect = httpx.ConnectError("nope")
self.assertEqual(FyydAggregator().search(title="x"), [])
+ def test_pubdate_parsed_to_date(self):
+ payload = {
+ "data": [
+ {
+ "title": "Django Reinhardt",
+ "enclosure": "https://wdr.example/episode.mp3",
+ "pubdate": "2024-08-30 04:00:00",
+ "podcast": {"title": "Zeitzeichen"},
+ }
+ ]
+ }
+ with patch("episodes.podcast_aggregators.fyyd.httpx.get") as get:
+ get.return_value = _FakeResponse(payload)
+ results = FyydAggregator().search(
+ title="Django Reinhardt", show_name="Zeitzeichen"
+ )
+ self.assertEqual(len(results), 1)
+ self.assertEqual(results[0].published_at, date(2024, 8, 30))
+
+ def test_missing_pubdate_returns_none(self):
+ payload = {
+ "data": [
+ {
+ "title": "x",
+ "enclosure": "https://e/ep.mp3",
+ "podcast": {"title": "y"},
+ }
+ ]
+ }
+ with patch("episodes.podcast_aggregators.fyyd.httpx.get") as get:
+ get.return_value = _FakeResponse(payload)
+ results = FyydAggregator().search(title="x")
+ self.assertIsNone(results[0].published_at)
+
+ def test_malformed_pubdate_returns_none(self):
+ payload = {
+ "data": [
+ {
+ "title": "x",
+ "enclosure": "https://e/ep.mp3",
+ "pubdate": "not-a-date",
+ "podcast": {"title": "y"},
+ }
+ ]
+ }
+ with patch("episodes.podcast_aggregators.fyyd.httpx.get") as get:
+ get.return_value = _FakeResponse(payload)
+ results = FyydAggregator().search(title="x")
+ self.assertEqual(len(results), 1)
+ self.assertIsNone(results[0].published_at)
+
class ItunesAggregatorTests(SimpleTestCase):
def test_search_returns_candidates(self):
@@ -108,6 +160,54 @@ def test_http_error_returns_empty(self):
get.side_effect = httpx.ConnectError("nope")
self.assertEqual(ItunesAggregator().search(title="x"), [])
+ def test_release_date_parsed_to_date(self):
+ payload = {
+ "results": [
+ {
+ "trackName": "x",
+ "collectionName": "Show",
+ "episodeUrl": "https://e/ep.mp3",
+ "releaseDate": "2024-08-30T04:00:00Z",
+ }
+ ]
+ }
+ with patch("episodes.podcast_aggregators.itunes.httpx.get") as get:
+ get.return_value = _FakeResponse(payload)
+ results = ItunesAggregator().search(title="x", show_name="Show")
+ self.assertEqual(results[0].published_at, date(2024, 8, 30))
+
+ def test_missing_release_date_returns_none(self):
+ payload = {
+ "results": [
+ {
+ "trackName": "x",
+ "collectionName": "Show",
+ "episodeUrl": "https://e/ep.mp3",
+ }
+ ]
+ }
+ with patch("episodes.podcast_aggregators.itunes.httpx.get") as get:
+ get.return_value = _FakeResponse(payload)
+ results = ItunesAggregator().search(title="x", show_name="Show")
+ self.assertIsNone(results[0].published_at)
+
+ def test_malformed_release_date_returns_none(self):
+ payload = {
+ "results": [
+ {
+ "trackName": "x",
+ "collectionName": "Show",
+ "episodeUrl": "https://e/ep.mp3",
+ "releaseDate": "garbage",
+ }
+ ]
+ }
+ with patch("episodes.podcast_aggregators.itunes.httpx.get") as get:
+ get.return_value = _FakeResponse(payload)
+ results = ItunesAggregator().search(title="x", show_name="Show")
+ self.assertEqual(len(results), 1)
+ self.assertIsNone(results[0].published_at)
+
class PodcastIndexOrgTests(SimpleTestCase):
def test_guid_lookup_first(self):
@@ -137,6 +237,55 @@ def test_falls_back_to_term_search_when_guid_missing(self):
called_url = get.call_args[0][0]
self.assertIn("/search/byterm", called_url)
+ def test_date_published_epoch_parsed_to_date(self):
+ # 2024-08-30T04:00:00Z → 1724990400 epoch seconds.
+ payload = {
+ "feeds": [
+ {
+ "title": "x",
+ "feedTitle": "Show",
+ "enclosureUrl": "https://e/ep.mp3",
+ "datePublished": 1724990400,
+ }
+ ]
+ }
+ with patch("episodes.podcast_aggregators.podcastindex.httpx.get") as get:
+ get.return_value = _FakeResponse(payload)
+ results = PodcastIndexOrg("k", "s").search(title="x", show_name="Show")
+ self.assertEqual(results[0].published_at, date(2024, 8, 30))
+
+ def test_missing_date_published_returns_none(self):
+ payload = {
+ "feeds": [
+ {
+ "title": "x",
+ "feedTitle": "Show",
+ "enclosureUrl": "https://e/ep.mp3",
+ }
+ ]
+ }
+ with patch("episodes.podcast_aggregators.podcastindex.httpx.get") as get:
+ get.return_value = _FakeResponse(payload)
+ results = PodcastIndexOrg("k", "s").search(title="x", show_name="Show")
+ self.assertIsNone(results[0].published_at)
+
+ def test_malformed_date_published_returns_none(self):
+ payload = {
+ "feeds": [
+ {
+ "title": "x",
+ "feedTitle": "Show",
+ "enclosureUrl": "https://e/ep.mp3",
+ "datePublished": "garbage",
+ }
+ ]
+ }
+ with patch("episodes.podcast_aggregators.podcastindex.httpx.get") as get:
+ get.return_value = _FakeResponse(payload)
+ results = PodcastIndexOrg("k", "s").search(title="x", show_name="Show")
+ self.assertEqual(len(results), 1)
+ self.assertIsNone(results[0].published_at)
+
class FactoryTests(SimpleTestCase):
@override_settings(RAGTIME_PODCAST_AGGREGATORS="")