No deletion semantics in memvid #5

@tashisleepy

Context

Tracked as a deferred item from the review of issue #3.

Current behavior

The memvid vector store at memvid/master.mv2 (and the per-client variants under memvid/per-client/) supports appends via the WAL (master.manifest.wal), but there is no way to delete a vector or mark it as tombstoned. Once content is indexed, it lives forever.

Why it matters

  1. Privacy: Client content removal requests cannot be honored cleanly
  2. Staleness: Outdated sources remain searchable indefinitely
  3. GDPR/compliance: Right to erasure cannot be implemented
  4. Testing: Cannot remove test data from a real memvid store

Expected behavior

The store needs deletion semantics. Design options:

Option A: Tombstoning

  • Add a deletion marker to the WAL
  • Search filters out tombstoned IDs
  • Compaction job periodically rebuilds store excluding tombstones
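
A minimal, purely in-memory sketch of Option A, assuming a WAL made of ("add", id) and ("delete", id) records. The class name TombstoneStore and this record format are hypothetical; they do not reflect the real master.manifest.wal layout or any memvid API, and a dot product stands in for the real similarity search.

```python
from dataclasses import dataclass, field


@dataclass
class TombstoneStore:
    """Toy append-only vector store whose WAL holds ("add" | "delete", id) records."""

    vectors: dict = field(default_factory=dict)  # id -> vector (list of floats)
    wal: list = field(default_factory=list)      # append-only log of (op, id)

    def append(self, vec_id, vector):
        self.vectors[vec_id] = vector
        self.wal.append(("add", vec_id))

    def delete(self, vec_id):
        # Deleting only appends a marker; the vector data is left in place.
        self.wal.append(("delete", vec_id))

    def tombstones(self):
        # Replay the WAL in order; a later "add" of the same ID revives it.
        dead = set()
        for op, vec_id in self.wal:
            if op == "delete":
                dead.add(vec_id)
            else:
                dead.discard(vec_id)
        return dead

    def search(self, query, k=3):
        dead = self.tombstones()
        live = [(i, v) for i, v in self.vectors.items() if i not in dead]
        # Dot product stands in for the real similarity measure.
        live.sort(key=lambda item: sum(a * b for a, b in zip(query, item[1])), reverse=True)
        return [i for i, _ in live[:k]]

    def compact(self):
        # Periodic job: drop tombstoned vectors and start a fresh WAL.
        dead = self.tombstones()
        self.vectors = {i: v for i, v in self.vectors.items() if i not in dead}
        self.wal = [("add", i) for i in self.vectors]


store = TombstoneStore()
store.append("a", [1.0, 0.0])
store.append("b", [0.9, 0.1])
store.delete("a")
print(store.search([1.0, 0.0]))  # ['b']
store.compact()
print(sorted(store.vectors))     # ['b']
```

Replaying the whole WAL on every search is only for clarity; a real implementation would more likely keep the tombstone set in memory and update it as records are appended.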

Option B: Rebuild

  • Delete requires full rebuild from sources
  • Simple but expensive
  • May be acceptable given typical scale
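
Option B can be sketched as follows, assuming the original source texts are retained somewhere (the issue does not say where) and that embed is whatever embedding function produced the store. rebuild_index, delete_by_rebuild and toy_embed are hypothetical names; the point is that every remaining chunk is re-embedded, so a single deletion costs as much as a full re-index.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def rebuild_index(
    sources: Dict[str, List[str]],
    embed: Callable[[str], Vector],
) -> List[Tuple[str, str, Vector]]:
    """Re-embed every chunk of every remaining source; cost grows with corpus size."""
    index = []
    for source_id, chunks in sources.items():
        for chunk in chunks:
            index.append((source_id, chunk, embed(chunk)))
    return index


def delete_by_rebuild(sources, source_id, embed):
    # Drop the source from the source of truth, then rebuild from scratch.
    remaining = {sid: chunks for sid, chunks in sources.items() if sid != source_id}
    return remaining, rebuild_index(remaining, embed)


def toy_embed(text: str) -> Vector:
    # Hypothetical stand-in for a real embedding model.
    return [float(len(text)), float(text.count(" "))]


sources = {"s1": ["hello world"], "s2": ["foo", "bar baz"]}
remaining, index = delete_by_rebuild(sources, "s1", toy_embed)
print([row[0] for row in index])  # ['s2', 's2']
```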

Option C: Hard delete with lock

  • Acquire master.mv2.lock
  • Rewrite store excluding target IDs
  • Update manifest
  • Release lock
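
A sketch of Option C, assuming a JSON-lines file as a stand-in for the store (the real master.mv2 format is not described in this issue) and a lock taken by atomically creating the .lock file with O_CREAT | O_EXCL. The names store_lock, atomic_write and hard_delete, and the manifest layout, are hypothetical.

```python
import json
import os
import time
from contextlib import contextmanager


@contextmanager
def store_lock(lock_path, timeout=10.0, poll=0.05):
    """Exclusive lock: creating the .lock file with O_EXCL fails if it already exists."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # release


def atomic_write(path, text):
    # Write a temp file, fsync it, then rename over the target; os.replace is
    # atomic on POSIX when both paths are on the same filesystem.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def hard_delete(store_path, manifest_path, target_ids):
    """Rewrite the store without target_ids and update the manifest; returns count removed."""
    targets = set(target_ids)
    with store_lock(store_path + ".lock"):
        with open(store_path) as f:
            records = [json.loads(line) for line in f if line.strip()]
        kept = [r for r in records if r["id"] not in targets]
        atomic_write(store_path, "".join(json.dumps(r) + "\n" for r in kept))
        manifest = {"count": len(kept), "ids": [r["id"] for r in kept]}
        atomic_write(manifest_path, json.dumps(manifest))
    return len(records) - len(kept)
```

Because the rewritten store is swapped in by rename, on POSIX a reader that already has the old file open keeps seeing the old contents until it closes it, which is one possible answer to the "delete during concurrent search" case. A writer that crashes while holding the lock leaves the .lock file behind, so a real implementation would also need stale-lock recovery, which this sketch omits.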

Acceptance criteria

  • Design doc with chosen approach committed
  • delete_source(source_id) function added to memvid module
  • Handles concurrent access via .lock file semantics
  • Per-client variant supported
  • Tests cover: single delete, batch delete, and delete during a concurrent search
  • Documentation updated in README
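
To make the acceptance criteria concrete, here is an in-memory sketch of a delete_source that accepts a single ID or a batch, plus an optional client ID for the per-client variant. SourceStore, its add and search methods, and keying per-client stores by client_id are all hypothetical; a threading.Lock only stands in for the cross-process .lock file, and substring matching stands in for vector search. Writers build a new dict and swap it in (copy-on-write), so a search that has already taken a snapshot is not disturbed by a concurrent delete.

```python
import threading
from typing import Dict, Iterable, List, Optional, Union


class SourceStore:
    """In-memory stand-in: chunks keyed by source_id, one store per client (None = master)."""

    def __init__(self):
        # An in-process lock stands in for the cross-process .lock file.
        self._lock = threading.Lock()
        self._stores: Dict[Optional[str], Dict[str, List[str]]] = {None: {}}

    def add(self, source_id: str, chunks: Iterable[str], client_id: Optional[str] = None) -> None:
        with self._lock:
            current = self._stores.get(client_id, {})
            # Copy-on-write: never mutate a dict a reader might be iterating.
            self._stores[client_id] = {**current, source_id: list(chunks)}

    def delete_source(self, source_id: Union[str, Iterable[str]],
                      client_id: Optional[str] = None) -> int:
        """Delete one source or a batch of sources; returns the number removed."""
        ids = {source_id} if isinstance(source_id, str) else set(source_id)
        with self._lock:
            current = self._stores.get(client_id, {})
            remaining = {sid: c for sid, c in current.items() if sid not in ids}
            self._stores[client_id] = remaining
        return len(current) - len(remaining)

    def search(self, needle: str, client_id: Optional[str] = None) -> List[str]:
        with self._lock:
            snapshot = self._stores.get(client_id, {})
        # Iterate outside the lock: writers swap in new dicts, so the snapshot is stable.
        # Substring matching stands in for vector similarity.
        return sorted(sid for sid, chunks in snapshot.items()
                      if any(needle in chunk for chunk in chunks))


s = SourceStore()
s.add("s1", ["x one"])
s.add("s2", ["x two"])
s.add("s1", ["x client"], client_id="acme")
print(s.delete_source("s1"))                # 1
print(s.delete_source(["s2", "missing"]))   # 1
print(s.search("x", client_id="acme"))      # ['s1']
```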

Priority

High if anyone requests data removal. Medium otherwise.

Related

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
