
Speed up JSONL hydration #28

Merged: valiantone merged 3 commits into main from feat/fast-jsonl-hydration-embeddings on Jun 14, 2026


@valiantone valiantone commented Jun 8, 2026

Contributor

What

Speed up JSONL hydration and snapshot rehydration for portable HotMem mounts.

This PR combines #24 and #25:

  • Adds a batch insert path for hydrate so JSONL imports commit once instead of once per memory.
  • Adds SQLite-native duplicate skipping through INSERT OR IGNORE and a partial unique content_hash index where possible.
  • Preloads existing content hashes so duplicate records are skipped before embedding work.
  • Adds hydrate trace counters for parsed rows, loaded rows, skipped duplicates, bytes read, computed embeddings, reused embeddings, and total duration.
  • Adds embedding_b64 to snapshots by default so compatible snapshots can hydrate without recomputing embeddings.
  • Keeps legacy swap.jsonl files working when embeddings are missing, malformed, or incompatible.
  • Preserves ttl_seconds, created_at, metadata, and existing CLI/API behavior.
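
The batch path and duplicate skipping described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `memories` table, the `hydrate_batch` function, the use of SHA-256 for `content_hash`, and the shape of the returned counters are all assumptions made for the example.

```python
import hashlib
import json
import sqlite3


def hydrate_batch(conn: sqlite3.Connection, jsonl_lines: list[str]) -> dict[str, int]:
    """Load JSONL records in one transaction, skipping duplicate content.

    Hypothetical helper: table name, columns, and hashing scheme are assumed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "id INTEGER PRIMARY KEY, content TEXT NOT NULL, content_hash TEXT)"
    )
    # Partial unique index: only rows that carry a hash take part in dedupe.
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_memories_content_hash "
        "ON memories(content_hash) WHERE content_hash IS NOT NULL"
    )
    # Preload known hashes so duplicates are dropped before any costly work
    # (such as computing embeddings) would happen.
    seen = {
        row[0]
        for row in conn.execute(
            "SELECT content_hash FROM memories WHERE content_hash IS NOT NULL"
        )
    }

    rows: list[tuple[str, str]] = []
    parsed = skipped = 0
    for line in jsonl_lines:
        if not line.strip():
            continue  # ignore blank lines in the JSONL input
        record = json.loads(line)
        parsed += 1
        digest = hashlib.sha256(record["content"].encode("utf-8")).hexdigest()
        if digest in seen:
            skipped += 1
            continue
        seen.add(digest)
        rows.append((record["content"], digest))

    # One transaction for the whole batch: a single commit instead of one per row.
    # INSERT OR IGNORE is a second line of defense enforced by SQLite itself.
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO memories (content, content_hash) VALUES (?, ?)",
            rows,
        )
    return {"parsed": parsed, "loaded": len(rows), "skipped_duplicates": skipped}
```

Calling `hydrate_batch` twice with the same lines should load nothing the second time, since every hash is already present. The in-memory `seen` set avoids a query per row, while the unique index keeps the database consistent even if another writer inserts the same content concurrently.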

Why

HotMem's plug-and-play mount story depends on small serialized memory files hydrating quickly into a usable SQLite memory DB. The old path paid avoidable costs: per-row commits, per-row dedupe queries, and repeated embedding computation for known snapshots.
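
One way to avoid the repeated embedding computation mentioned above is to reuse a stored `embedding_b64` value when it decodes cleanly and fall back to recomputing otherwise. The sketch below assumes embeddings are serialized as little-endian float32 values before base64 encoding; the PR does not state the actual wire format, and the helper names (`encode_embedding`, `decode_embedding`, `embedding_for`) are hypothetical.

```python
import base64
import struct
from typing import Callable, Optional, Sequence


def encode_embedding(vector: Sequence[float]) -> str:
    """Pack floats as little-endian float32 and base64-encode them (assumed format)."""
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")


def decode_embedding(value: object, expected_dim: int) -> Optional[list[float]]:
    """Return the stored vector, or None if it is missing, malformed, or the wrong size."""
    if not isinstance(value, str):
        return None  # missing field or unexpected type: treat as a legacy record
    try:
        raw = base64.b64decode(value, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if len(raw) != expected_dim * 4:
        return None  # incompatible dimension, e.g. produced by a different model
    return list(struct.unpack(f"<{expected_dim}f", raw))


def embedding_for(
    record: dict,
    expected_dim: int,
    embed: Callable[[str], list[float]],
) -> tuple[list[float], bool]:
    """Reuse the snapshot embedding when valid; otherwise recompute it.

    The boolean reports whether the embedding was reused, which could feed
    "reused" versus "computed" trace counters.
    """
    stored = decode_embedding(record.get("embedding_b64"), expected_dim)
    if stored is not None:
        return stored, True
    return embed(record["content"]), False
```

Checking only the byte length is a deliberately simple compatibility test. A real snapshot would likely also need to record which embedding model produced the vectors, since two models can share a dimension.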

Closes #24.
Closes #25.

Checklist

  • Tests pass (uv run pytest)
  • Lint passes (uv run ruff check src/ tests/)
  • No new external dependencies added to core

@valiantone valiantone added this to the Phase 3: DX Polish milestone Jun 8, 2026
@valiantone valiantone added enhancement New feature or request phase:3-dx Phase 3: Developer Experience Polish labels Jun 8, 2026
@valiantone valiantone self-assigned this Jun 8, 2026
@valiantone valiantone requested a review from alphakenz June 8, 2026 19:25
@valiantone valiantone merged commit 8182813 into main Jun 14, 2026
4 checks passed
@valiantone valiantone deleted the feat/fast-jsonl-hydration-embeddings branch June 14, 2026 07:42


Development

Successfully merging this pull request may close these issues.

  • Feature: Rehydrate from stored embeddings
  • Feature: Faster JSONL hydration

2 participants