Summary
Evolve Archiver from the Phase 1–3a InfoItem ↔ InfoSpec model into the central registry + authoring service for the Cannabis Observer information layer. Watcher and the forthcoming Replicator become execution-side consumers; WordPress becomes a downstream display layer (separate design).
Design doc
docs/plans/2026-05-08-archiver-v2-architecture-design.md
Anchored decisions
- Wide registry boundary (Option A) — Archiver owns InfoItem, InfoSource, SourceRevision, RepSpec, InfoItemRepSpec + the two Item↔X join tables.
- Pre-production: clean schema cutover, no compat shim, breaking SDK changes acceptable.
- Page-once cascade via info_sources.parent_info_source_id (XOR on parent vs. url).
- Fingerprint algorithm locked to SHA-256, stored as sha256:<hex>.
- Provider credentials via alias indirection (RepSpec carries the alias name; Replicator deploy config maps alias → bucket + creds).
- content_cache_uri is authoritative-when-set; Watcher sweeper PATCHes registry on file deletion (best-effort), Replicator read-failure is the safety net.
- Authoring tools (/tools/*) own rep_fields resolution; RepSpec catalog + per-provider sub-schemas live in Archiver.
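
Two of the decisions above (the sha256:<hex> fingerprint format and the parent-vs-url XOR on info_sources) can be illustrated with a minimal Python sketch. This is not the Archiver implementation: the fingerprint helper and the InfoSource dataclass are hypothetical names introduced here, and only the field names parent_info_source_id and url come from this description. The real schema may well enforce the XOR as a database constraint instead.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def fingerprint(content: bytes) -> str:
    """Return a content fingerprint in the sha256:<hex> form (hypothetical helper)."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


@dataclass
class InfoSource:
    # Hypothetical in-memory shape; the real table has more columns.
    parent_info_source_id: Optional[int] = None
    url: Optional[str] = None

    def __post_init__(self) -> None:
        # XOR: exactly one of parent_info_source_id or url must be set.
        if (self.parent_info_source_id is None) == (self.url is None):
            raise ValueError("set exactly one of parent_info_source_id or url")


if __name__ == "__main__":
    root = InfoSource(url="https://example.org/agenda")  # page fetched directly
    child = InfoSource(parent_info_source_id=1)          # derived from a parent page
    print(fingerprint(b"example content"))
```

Under these assumptions, a source that sets both fields, or neither, is rejected at construction time, which mirrors the XOR rule stated in the bullet.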
Phased scope
- Phase 4 — Archiver v2: schema cutover, authoring tools v2, change-bus publisher, SDK v1.0.
- Phase 5 — Watcher refactor: source-cascade pipeline, SourceRevision outbox writer, temp cache + sweeper.
- Phase 6 — Replicator stand-up (sibling repo, port 8030). MVP providers: gcs, gdrive, ia.
Out of scope
- Phase 7 (WordPress cache integration) — separate design in WP repo.
- Phase 8 (Authoring CLI) — defer until operator demand justifies.
Open questions tracked in design doc
- Watcher Watch table reshape semantics (per-source vs per-item-with-multiple-sources).
- Hash-verify failure policy on re-fetch.
- info_item_source_revisions growth bound / retention.