Archiver v2 — central registry architecture (InfoSource/SourceRevision/RepSpec) #7

@gregoryfoster

Summary

Evolve Archiver from the Phase 1–3a InfoItem ↔ InfoSpec model into the central registry + authoring service for the Cannabis Observer information layer. Watcher and the forthcoming Replicator become execution-side consumers; WordPress becomes a downstream display layer (separate design).

Design doc

docs/plans/2026-05-08-archiver-v2-architecture-design.md

Anchored decisions

  • Wide registry boundary (Option A) — Archiver owns InfoItem, InfoSource, SourceRevision, RepSpec, InfoItemRepSpec + the two Item↔X join tables.
  • Pre-production: clean schema cutover, no compat shim, breaking SDK changes acceptable.
  • Page-once cascade via info_sources.parent_info_source_id (XOR on parent vs. url).
  • Fingerprint algorithm locked to SHA-256, stored as sha256:<hex>.
  • Provider credentials via alias indirection (RepSpec carries name; Replicator deploy config maps alias → bucket + creds).
  • content_cache_uri is authoritative when set; the Watcher sweeper PATCHes the registry on file deletion (best-effort), and a Replicator read failure is the safety net.
  • Authoring tools (/tools/*) own rep_fields resolution; RepSpec catalog + per-provider sub-schemas live in Archiver.
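
A few of the decisions above can be sketched in code. This is a minimal illustration only, not the Archiver implementation: the `fingerprint`, `parse_fingerprint`, and `resolve_alias` helpers, the `InfoSource` dataclass shape, and the deploy-config layout are all hypothetical. It also assumes "XOR on parent vs. url" means exactly one of the two fields is set on each source.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

FINGERPRINT_PREFIX = "sha256:"


def fingerprint(content: bytes) -> str:
    """Return a fingerprint in the sha256:<hex> form named in the issue."""
    return FINGERPRINT_PREFIX + hashlib.sha256(content).hexdigest()


def parse_fingerprint(value: str) -> str:
    """Validate a stored fingerprint and return its lowercase hex digest."""
    if not value.startswith(FINGERPRINT_PREFIX):
        raise ValueError(f"unsupported fingerprint algorithm: {value!r}")
    hex_part = value[len(FINGERPRINT_PREFIX):]
    if len(hex_part) != 64 or any(c not in "0123456789abcdef" for c in hex_part):
        raise ValueError(f"malformed SHA-256 digest: {value!r}")
    return hex_part


@dataclass(frozen=True)
class InfoSource:
    """Hypothetical in-memory shape of an info_sources row."""
    id: int
    url: Optional[str] = None
    parent_info_source_id: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed XOR rule: a source has a url or a parent, never both or neither.
        if (self.url is None) == (self.parent_info_source_id is None):
            raise ValueError(
                "exactly one of url or parent_info_source_id must be set"
            )


def resolve_alias(alias: str, deploy_config: dict) -> dict:
    """Map a RepSpec provider alias to its deploy-side settings.

    The config layout (alias -> provider, bucket, credentials reference)
    is an assumption made for this sketch.
    """
    try:
        return deploy_config[alias]
    except KeyError:
        raise KeyError(f"no deploy config for provider alias {alias!r}") from None
```

Under these assumptions, a root source would be built as `InfoSource(id=1, url="https://example.org/page")` and a cascaded child as `InfoSource(id=2, parent_info_source_id=1)`; supplying both fields, or neither, raises `ValueError`. Keeping the alias lookup on the Replicator side means the registry row never holds a bucket name or a credential.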

Phased scope

  • Phase 4 — Archiver v2: schema cutover, authoring tools v2, change-bus publisher, SDK v1.0.
  • Phase 5 — Watcher refactor: source-cascade pipeline, SourceRevision outbox writer, temp cache + sweeper.
  • Phase 6 — Replicator stand-up (sibling repo, port 8030). MVP providers: gcs, gdrive, ia.

Out of scope

  • Phase 7 (WordPress cache integration) — separate design in WP repo.
  • Phase 8 (Authoring CLI) — deferred until operator demand justifies it.

Open questions tracked in design doc

  • Watcher Watch table reshape semantics (per-source vs per-item-with-multiple-sources).
  • Hash-verify failure policy on re-fetch.
  • info_item_source_revisions growth bound / retention.
