
Per-client archive atomicity gap #6

@tashisleepy

Description


Context

Tracked as a deferred item from the review of issue #3.

Current behavior

Per-client archives under memvid/per-client/{client}/ are updated by scripts (save-session.sh, register-tool.sh, etc.) with non-atomic file writes. If a write is interrupted (crash, signal, disk full), the archive can be left in a partial state:

  • Manifest WAL updated but data files not written
  • Data files written but manifest not updated
  • Multiple concurrent writers to the same client directory race on the manifest: the last write wins and earlier updates are silently lost
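
The first two partial states can be detected mechanically by comparing the manifest against the directory listing. Below is a minimal Python sketch, not the project's code. It assumes a hypothetical `manifest.json` whose `files` key lists data-file names, because the real manifest WAL format is not described in this issue. It also assumes that dotfiles (the `.lock` file and temp files) are not data. `check_client_archive` is an illustrative name.

```python
import json
import os


def check_client_archive(client_dir: str, manifest_name: str = "manifest.json") -> dict:
    """Report manifest/data mismatches for one client directory.

    Hypothetical layout: the manifest is a JSON object whose "files" key
    lists data-file names relative to client_dir.
    """
    manifest_path = os.path.join(client_dir, manifest_name)
    with open(manifest_path, "r", encoding="utf-8") as f:
        listed = set(json.load(f).get("files", []))

    # Everything on disk except the manifest itself and dotfiles (.lock, temp files).
    on_disk = {
        name
        for name in os.listdir(client_dir)
        if name != manifest_name and not name.startswith(".")
    }

    return {
        # Manifest updated but data file never written.
        "missing": sorted(listed - on_disk),
        # Data file written but manifest never updated.
        "orphaned": sorted(on_disk - listed),
    }
```

A non-empty `missing` list corresponds to the first bullet above, and a non-empty `orphaned` list corresponds to the second. Concurrent last-write-wins damage does not show up here, because the surviving manifest is internally consistent.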

Why it matters

  1. A corrupted client archive means permanent data loss or broken search
  2. There is no recovery path other than re-ingesting from the original sources
  3. Debugging requires manually comparing the WAL against the data files
  4. The multi-tenant use case breaks under concurrent load

Expected behavior

All write paths to per-client archives should be atomic:

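  1. Acquire the client-level .lock before any multi-file write, and hold it until the manifest is updated
  2. Write new data to a temp file in the same directory as the target
  3. Fsync the temp file
  4. Atomically rename it to the final path (POSIX rename() is atomic only when source and destination are on the same filesystem), then fsync the directory so the rename itself is durable
  5. Update the manifest WAL via the same pattern, after the data files are in place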
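
The protocol above can be sketched in Python, the language of bridge.py. This is a minimal, POSIX-only sketch and not the project's implementation. It makes the following assumptions:

* An advisory `fcntl.flock` on `<client>/.lock` serves as the client-level lock.
* A hypothetical JSON manifest (`manifest.json` with a `files` list, rewritten whole) stands in for the manifest WAL, whose real format is not specified here.
* `client_lock`, `atomic_write_bytes` and `commit` are illustrative names.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def client_lock(client_dir: str):
    """Hold an exclusive advisory lock on <client_dir>/.lock for the whole block."""
    fd = os.open(os.path.join(client_dir, ".lock"), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other writer holds it
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Replace `path` so readers see either the old or the new content, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so it is on the same filesystem and rename is atomic.
    # Note: mkstemp creates the file with mode 0600, and the renamed file keeps that mode.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # data is durable before it becomes visible
        os.replace(tmp_path, path)  # atomic rename over the final path
    except BaseException:
        # Remove the temp file so a failed write leaves no stray partial file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def commit(client_dir: str, name: str, data: bytes, manifest_name: str = "manifest.json") -> None:
    """Write one data file, then record it in the manifest, all under the client lock."""
    with client_lock(client_dir):
        atomic_write_bytes(os.path.join(client_dir, name), data)
        manifest_path = os.path.join(client_dir, manifest_name)
        try:
            with open(manifest_path, "r", encoding="utf-8") as f:
                manifest = json.load(f)
        except FileNotFoundError:
            manifest = {"files": []}
        if name not in manifest["files"]:
            manifest["files"].append(name)
        atomic_write_bytes(manifest_path, json.dumps(manifest).encode("utf-8"))
```

Writing data before the manifest means a crash can at worst leave an orphaned data file, never a manifest entry pointing at nothing. flock is advisory, so shell writers such as save-session.sh would need to take the same lock. On Linux, the util-linux flock(1) command does this, because it uses the same flock(2) call. Otherwise they bypass the protection.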

Acceptance criteria

  • Audit all write paths: bridge.py, save-session.sh, register-tool.sh, and any other script that writes under memvid/
  • Wrap multi-file writes in the atomic-rename pattern
  • Acquire the per-client .lock before writing and hold it for the whole write (checking for it is not enough)
  • Document the write protocol in memvid/README.md
  • Crash-recovery test: SIGKILL a writer mid-write and verify the archive is still recoverable
  • Concurrent-write test: two processes writing to the same client simultaneously, with no lost manifest updates
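
The concurrent-write test could look roughly like the sketch below. It is POSIX-only because it uses `fcntl` and the `fork` start method. It assumes the same hypothetical `manifest.json` layout as a stand-in for the real manifest WAL. `append_entry`, `writer` and `run_concurrent_write_test` are illustrative names, not existing functions in the repo. Each worker performs locked read-modify-write cycles on the manifest, and the test asserts that no entry from any worker was lost.

```python
import fcntl
import json
import multiprocessing
import os
import tempfile

MANIFEST = "manifest.json"  # hypothetical manifest file name


def append_entry(client_dir: str, entry: str) -> None:
    """Add one entry to the manifest while holding the per-client lock."""
    lock_fd = os.open(os.path.join(client_dir, ".lock"), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)  # serialize the read-modify-write cycle
        path = os.path.join(client_dir, MANIFEST)
        try:
            with open(path, encoding="utf-8") as f:
                files = json.load(f)["files"]
        except FileNotFoundError:
            files = []
        files.append(entry)
        # Replace the manifest atomically: temp file in the same directory, fsync, rename.
        fd, tmp_path = tempfile.mkstemp(dir=client_dir, prefix=".tmp-")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"files": files}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)
        os.close(lock_fd)


def writer(client_dir: str, worker: int, count: int) -> None:
    # Each worker adds `count` uniquely named entries.
    for i in range(count):
        append_entry(client_dir, f"w{worker}-{i}")


def run_concurrent_write_test(workers: int = 2, count: int = 50) -> int:
    """Run `workers` processes against one client dir and return the final entry count."""
    ctx = multiprocessing.get_context("fork")  # POSIX only; children do not re-import __main__
    with tempfile.TemporaryDirectory() as client_dir:
        procs = [ctx.Process(target=writer, args=(client_dir, w, count)) for w in range(workers)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        assert all(p.exitcode == 0 for p in procs), "a writer process failed"
        with open(os.path.join(client_dir, MANIFEST), encoding="utf-8") as f:
            files = json.load(f)["files"]
        # Every entry from every writer must survive; without the lock,
        # interleaved read-modify-write cycles would likely drop some.
        assert len(files) == len(set(files)) == workers * count
        return len(files)
```

With the lock in place, `run_concurrent_write_test(workers=2, count=50)` should return 100. Removing the `flock` calls is a quick way to confirm that the test actually catches lost updates.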

Priority

High for production multi-client use; medium for the current single-user workload.

Related
