
Multi-ontology-per-file: one garage_key tagged to multiple ontologies #508

Description

@aaronsb

Observation

Surfaced while implementing the :DocumentMeta serialization for #505 (PR #507). The dev graph contains 153 distinct (garage_key, s.document) combinations across only 120 files, i.e. some files (a single garage_key) carry Source chunks tagged to more than one ontology (s.document).
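The check behind those numbers amounts to grouping Source rows by garage_key and counting distinct s.document values. Below is a minimal sketch, assuming the rows have already been fetched from the graph as (garage_key, ontology) pairs. The function name find_multi_ontology_files and the sample keys are hypothetical, and the real query and field names may differ.

```python
from collections import defaultdict

def find_multi_ontology_files(rows):
    """Return {garage_key: sorted ontologies} for keys tagged to >1 ontology.

    rows: iterable of (garage_key, ontology) pairs, one per Source chunk
    (assumed shape, not the real query result).
    """
    ontologies_by_key = defaultdict(set)
    for garage_key, ontology in rows:
        ontologies_by_key[garage_key].add(ontology)
    return {
        key: sorted(onts)
        for key, onts in ontologies_by_key.items()
        if len(onts) > 1
    }

# Hypothetical sample rows: one file whose chunks span two ontologies.
rows = [
    ("sources/alpha/abc.md", "alpha"),
    ("sources/alpha/abc.md", "alpha"),
    ("sources/alpha/abc.md", "beta"),
    ("sources/alpha/def.md", "alpha"),
]
distinct_pairs = len(set(rows))                  # distinct (garage_key, ontology)
distinct_files = len({key for key, _ in rows})   # distinct garage_keys
print(distinct_pairs, distinct_files)            # 3 2
print(find_multi_ontology_files(rows))           # {'sources/alpha/abc.md': ['alpha', 'beta']}
```

Whenever distinct_pairs exceeds distinct_files, at least one file is tagged to multiple ontologies, which is the 153 vs 120 gap described above.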

Why it matters

document_id (== content_hash == sha256 of the document bytes) is content-derived and ontology-independent, whereas garage_key = sources/<ontology>/<hash[:32]>.<ext> embeds the ontology. Ingesting the same content into two ontologies therefore produces:

  • the same document_id (so only one :DocumentMeta after MERGE-on-id, where the last writer wins on ontology/garage_key), but
  • two different garage_keys.
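The key derivation and the MERGE-on-id collapse can be illustrated with a small model. This is a sketch under stated assumptions: the hash is taken to be the lowercase hex SHA-256 digest, "alpha", "beta" and "md" are hypothetical values, and MERGE is modeled as a dict upsert rather than a real Cypher call.

```python
import hashlib

def document_id(data: bytes) -> str:
    # Content-derived: the same bytes give the same id in every ontology.
    return hashlib.sha256(data).hexdigest()

def garage_key(data: bytes, ontology: str, ext: str) -> str:
    # sources/<ontology>/<hash[:32]>.<ext>: the ontology is part of the key.
    return f"sources/{ontology}/{document_id(data)[:32]}.{ext}"

data = b"identical file contents"
key_a = garage_key(data, "alpha", "md")
key_b = garage_key(data, "beta", "md")

# Model MERGE on document_id: one node; each write overwrites its properties.
nodes = {}
for ontology, key in (("alpha", key_a), ("beta", key_b)):
    nodes.setdefault(document_id(data), {}).update(ontology=ontology, garage_key=key)

print(key_a != key_b)                        # True: two garage_keys
print(len(nodes))                            # 1: one :DocumentMeta
print(nodes[document_id(data)]["ontology"])  # beta: last writer wins
```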

As a result, a single :DocumentMeta cannot faithfully represent a file that legitimately lives in multiple ontologies. Two known limitations follow from this (both documented in PR #507; neither affects the dominant full-DB restore):

  1. Scoped (per-ontology) export filters on d.ontology, so a multi-ontology document is carried only by the export of the single ontology recorded on its :DocumentMeta (api/lib/serialization/exporter.py:export_documents).
  2. Restore-time durability dedups per garage_key but MERGEs to one node per document_id, so distinct garage_keys for the same document collapse into one node and the last writer wins (api/app/lib/age_client/rehydration.py:rehydrate_document_layer).
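Both limitations can be reproduced in a few lines of plain Python. This is a hypothetical in-memory model, not the real export_documents or rehydrate_document_layer code: rehydrate and scoped_export are illustrative stand-ins that only mirror the behavior described in the list above.

```python
# Hypothetical in-memory model; not the real exporter.py / rehydration.py code.
ingestions = [  # (document_id, ontology, garage_key), one per ingestion
    ("doc1", "alpha", "sources/alpha/doc1.md"),
    ("doc1", "beta", "sources/beta/doc1.md"),
]

def rehydrate(entries):
    """Dedup per garage_key, then MERGE one node per document_id."""
    seen_keys = set()
    nodes = {}
    for doc_id, ontology, key in entries:
        if key in seen_keys:
            continue  # durability dedup: each garage_key is handled once
        seen_keys.add(key)
        # MERGE on document_id: a second distinct garage_key overwrites the first.
        nodes.setdefault(doc_id, {}).update(ontology=ontology, garage_key=key)
    return nodes

def scoped_export(nodes, ontology):
    """Per-ontology export: keep only nodes whose d.ontology matches."""
    return [doc_id for doc_id, d in nodes.items() if d["ontology"] == ontology]

nodes = rehydrate(ingestions)
print(nodes)                          # {'doc1': {'ontology': 'beta', 'garage_key': 'sources/beta/doc1.md'}}
print(scoped_export(nodes, "alpha"))  # []: alpha's export misses doc1
print(scoped_export(nodes, "beta"))   # ['doc1']
```

The alpha garage_key passes the per-key dedup but is then overwritten by the MERGE, so alpha's scoped export no longer sees the document at all.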

Questions to resolve

  • Is a single file legitimately belonging to multiple ontologies an intended capability, or is the cross-ontology tagging a bug in ingestion/annealing?
  • If intended: should :DocumentMeta be keyed on (content_hash, ontology) (matching the re-ingest dedup key) rather than document_id alone, so each (file, ontology) gets its own node? That would also fix the catalog's per-ontology document tier and both limitations above.
  • If a bug: where does the multi-tagging originate (annealing re-scoping? re-ingest into a new ontology)?
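If the answer is "intended", the effect of keying :DocumentMeta on (content_hash, ontology) can be sketched as follows. This illustrates the proposal in the questions above and is not existing code. merge_document_meta and scoped_export are hypothetical names, and content_hash is used interchangeably with document_id because the issue states they are equal.

```python
# Hypothetical sketch of the proposed keying, not existing code.
def merge_document_meta(nodes, content_hash, ontology, garage_key):
    # MERGE on (content_hash, ontology): one node per (file, ontology) pair.
    nodes.setdefault((content_hash, ontology), {}).update(garage_key=garage_key)
    return nodes

def scoped_export(nodes, ontology):
    # Per-ontology export now finds every file that lives in that ontology.
    return [content_hash for (content_hash, ont) in nodes if ont == ontology]

nodes = {}
merge_document_meta(nodes, "doc1", "alpha", "sources/alpha/doc1.md")
merge_document_meta(nodes, "doc1", "beta", "sources/beta/doc1.md")

print(len(nodes))                     # 2: both garage_keys survive
print(scoped_export(nodes, "alpha"))  # ['doc1']
print(scoped_export(nodes, "beta"))   # ['doc1']
```

Because the node key now includes the ontology, the per-garage_key dedup and the MERGE agree on granularity, which is why this keying would address both limitations at once.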

Context

Labels: bug
