feat: task_message tiered storage (D1 metadata + R2 content + KV cache) #84
GenerQAQ wants to merge 13 commits into
D1 stores metadata only, R2 stores full message content per task, KV provides read-through cache. Targets reducing D1 from 8GB to <1GB.
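The read path described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `TaskMessageStore`: the `TextStore` interface, the `InMemoryStore` stand-in, and the `TieredMessageReader` class are hypothetical, and using the same `tm:${taskId}` key for both tiers is an assumption made for brevity.

```typescript
// Hypothetical minimal interface standing in for both the KV and R2 bindings.
// The real Cloudflare bindings expose richer APIs than this.
interface TextStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Assumed message shape, for illustration only.
interface StoredMessage {
  type: string;
  content: string;
  createdAt: number;
}

// In-memory stand-in so the sketch runs without any Cloudflare runtime.
class InMemoryStore implements TextStore {
  private data = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

class TieredMessageReader {
  constructor(private cache: TextStore, private blobs: TextStore) {}

  async readAll(taskId: string): Promise<StoredMessage[]> {
    const key = `tm:${taskId}`;

    // 1. Try the cache first. A cache failure must not fail the read.
    try {
      const cached = await this.cache.get(key);
      if (cached !== null) return JSON.parse(cached) as StoredMessage[];
    } catch {
      // Degrade gracefully: fall through to the blob store.
    }

    // 2. Fall back to the blob store, which holds the full content.
    const raw = await this.blobs.get(key);
    if (raw === null) return [];

    // 3. Populate the cache for the next reader, ignoring cache errors.
    try {
      await this.cache.put(key, raw);
    } catch {
      // Cache write failures are non-fatal.
    }
    return JSON.parse(raw) as StoredMessage[];
  }
}
```

The point of the two `try` blocks is that the cache tier is treated as optional: a read succeeds whenever the blob tier is reachable.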
24 tests covering: append, list, delete, filtering (since/excludeTypes), graceful KV degradation, data integrity across write/read cycles.
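As an illustration of the filtering behavior those tests cover, here is a small sketch. The option names `since` and `excludeTypes` come from the commit message; everything else is assumed, including the message shape, the `filterMessages` name, and the choice to treat `since` as an exclusive numeric timestamp.

```typescript
// Assumed minimal message shape for this sketch.
interface FilterableMessage {
  type: string;
  createdAt: number;
}

interface ListOptions {
  since?: number;          // assumption: exclusive lower bound on createdAt
  excludeTypes?: string[]; // message types to drop from the result
}

function filterMessages<T extends FilterableMessage>(
  messages: T[],
  opts: ListOptions = {},
): T[] {
  const excluded = new Set(opts.excludeTypes ?? []);
  return messages.filter(
    (m) =>
      (opts.since === undefined || m.createdAt > opts.since) &&
      !excluded.has(m.type),
  );
}
```

Because the full message list for a task lives in a single R2 object, filtering like this would happen in memory after the read rather than in a D1 query.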
Dual-write: D1 metadata + R2/KV full content. Read from store.
chat-init and task messages now read from R2/KV store.
This migration should only be applied AFTER the batch R2 migration has completed and R2 data integrity is verified.
Supports --dry-run and --offset for safe incremental migration. Run before applying migration 0030.
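A minimal sketch of how the two flags might be parsed is shown below. The flag names come from the commit message; the `parseMigrationArgs` function, the `MigrationArgs` shape, and the accepted `--offset N` / `--offset=N` forms are assumptions, not the script's actual code.

```typescript
interface MigrationArgs {
  dryRun: boolean; // when true, report what would be written without writing
  offset: number;  // number of rows to skip before resuming
}

function parseMigrationArgs(argv: string[]): MigrationArgs {
  let dryRun = false;
  let offset = 0;

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--dry-run") {
      dryRun = true;
    } else if (arg === "--offset" || arg.startsWith("--offset=")) {
      // Accept both "--offset 5000" and "--offset=5000".
      const raw = arg === "--offset" ? argv[++i] : arg.slice("--offset=".length);
      const parsed = Number(raw);
      if (raw === undefined || raw.trim() === "" || !Number.isInteger(parsed) || parsed < 0) {
        throw new Error(`Invalid --offset value: ${raw}`);
      }
      offset = parsed;
    }
    // Unknown arguments are ignored in this sketch.
  }
  return { dryRun, offset };
}
```

In a Node script this would typically be called as `parseMigrationArgs(process.argv.slice(2))`. Rejecting a malformed offset up front avoids silently restarting a long migration from row 0.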
All tests now mock TaskMessageStore class instead of D1 query functions.
Migration will be applied separately after batch R2 migration completes, since bump auto-executes migrations on deploy.
- taskMessageToResponse now handles both camelCase (Drizzle) and snake_case (TaskMessage interface from R2/KV store)
- Daemon GET endpoint verifies task belongs to workspace before returning messages, matching POST behavior
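One way to accept both shapes is to narrow on a field that exists in only one of them. The sketch below is illustrative only: the field names (`taskId`/`task_id`, `createdAt`/`created_at`), the response shape, and the `toMessageResponse` name are assumptions, not the actual `taskMessageToResponse` implementation.

```typescript
// Assumed shape of a row coming from Drizzle (camelCase).
interface CamelCaseMessage {
  id: string;
  taskId: string;
  type: string;
  content: string;
  createdAt: string;
}

// Assumed shape of a message coming from the R2/KV store (snake_case).
interface SnakeCaseMessage {
  id: string;
  task_id: string;
  type: string;
  content: string;
  created_at: string;
}

// Assumed API response shape.
interface MessageResponse {
  id: string;
  taskId: string;
  type: string;
  content: string;
  createdAt: string;
}

function toMessageResponse(msg: CamelCaseMessage | SnakeCaseMessage): MessageResponse {
  // The `in` check narrows the union, so each branch is fully type-checked.
  if ("taskId" in msg) {
    return { id: msg.id, taskId: msg.taskId, type: msg.type, content: msg.content, createdAt: msg.createdAt };
  }
  return { id: msg.id, taskId: msg.task_id, type: msg.type, content: msg.content, createdAt: msg.created_at };
}
```

Keeping the conversion in one function means route handlers do not need to know which storage tier a message came from.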
d521e48 to a2013f9
gusye1234 left a comment
Review Summary
Architecture is sound and well-executed — approve with minor suggestions.
The tiered storage approach (D1 metadata + R2 content + KV cache) correctly addresses the scaling problem. 51 tests is solid coverage.
Actionable Items (non-blocking):
- Fix: Hardcoded `D1_DATABASE_ID` in `migrate-task-messages-to-r2.ts`. Replace with an env var. Hardcoded UUIDs in source are a maintainability and security concern.
- Fix: Unused `cacheKeys.taskMessages` in `cache.ts`. The store uses its own inline `"tm:${taskId}"` prefix instead. Either wire it into the store or remove the addition to avoid confusion.
- Consider: `id: ""` in R2 content. Messages stored in R2 have an empty ID because D1 insert IDs are not captured from `createTaskMessage` results. If anything needs to correlate R2 messages back to D1 rows by ID, this will break. Document the assumption or propagate actual IDs.
- Race condition in `appendMessages`. The `readAll` + append + `put` sequence is not atomic. Acceptable if single-writer-per-task is a guaranteed invariant. Worth a comment documenting that assumption.
- Store instantiation repeated 5 times across route files. Consider a `getTaskMessageStore()` factory helper.
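The factory helper suggested in the last item could look something like the sketch below. Everything here is hypothetical: the `Env` binding names, the stand-in `TaskMessageStore` class, and its constructor signature are placeholders, since the real ones are not shown in this conversation.

```typescript
// Hypothetical environment bindings; the real binding names are not shown in the PR text.
interface Env {
  TASK_MESSAGES_BUCKET: object;
  CACHE_KV: object;
}

// Stand-in for the real TaskMessageStore, whose constructor may differ.
class TaskMessageStore {
  constructor(readonly bucket: object, readonly kv: object) {}
}

// Cache one store per env object so route handlers share a single instance.
// A WeakMap lets the store be garbage-collected together with its env.
const stores = new WeakMap<Env, TaskMessageStore>();

function getTaskMessageStore(env: Env): TaskMessageStore {
  let store = stores.get(env);
  if (!store) {
    store = new TaskMessageStore(env.TASK_MESSAGES_BUCKET, env.CACHE_KV);
    stores.set(env, store);
  }
  return store;
}
```

Route files would then call `getTaskMessageStore(env)` instead of repeating the constructor call, so a future change to the store's dependencies touches one place.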
None of these block merge.
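To make the race-condition item concrete: two concurrent read-append-put calls for the same task can both read the same old list, and the second `put` then overwrites the first. The sketch below shows one possible mitigation, a per-task promise chain that serializes appends. It is an assumption-laden illustration, not the PR's code: `BlobStore`, `MemoryBlobs`, `withTaskLock`, and `appendMessage` are hypothetical names, and the lock only covers calls within a single process or isolate. It would not protect against writers running in separate instances.

```typescript
// Hypothetical minimal blob interface standing in for the R2 binding.
interface BlobStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

class MemoryBlobs implements BlobStore {
  private data = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Tail of the pending work queue for each task ID.
const tails = new Map<string, Promise<void>>();

// Runs fn only after all previously queued work for the same task has settled.
function withTaskLock<T>(taskId: string, fn: () => Promise<T>): Promise<T> {
  const previous = tails.get(taskId) ?? Promise.resolve();
  const run = previous.then(fn);
  // The stored tail never rejects, so one failure does not block later callers.
  const tail: Promise<void> = run.then(() => {}, () => {});
  tails.set(taskId, tail);
  // Drop the entry once the queue for this task is idle.
  tail.then(() => {
    if (tails.get(taskId) === tail) tails.delete(taskId);
  });
  return run;
}

function appendMessage(store: BlobStore, taskId: string, message: string): Promise<void> {
  return withTaskLock(taskId, async () => {
    const key = `tm:${taskId}`;
    const current: string[] = JSON.parse((await store.get(key)) ?? "[]");
    await store.put(key, JSON.stringify([...current, message]));
  });
}
```

If single-writer-per-task already holds, as the review assumes, none of this is needed and a comment documenting the invariant is the cheaper choice.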
Why we need this PR?
D1 database is at 8GB with only 78 users. Root cause: the `task_message` table has 6.3M rows storing full tool call content/input/output in D1. Two users alone account for 6.3M rows (4.1M + 2.2M). This PR moves the heavy content to R2 with KV caching, keeping only lightweight metadata in D1.

Expected D1 reduction: 8GB → <1GB
What changed
- `TaskMessageStore` class (`src/web/src/lib/task-message-store.ts`): tiered read/write with KV → R2 fallback
- `alook-task-messages` (already created)
- `scripts/migrate-task-messages-to-r2.ts` with `--dry-run` and `--offset` support

Storage Architecture
Deployment Sequence
- `alook-task-messages` created
- `npx tsx scripts/migrate-task-messages-to-r2.ts --dry-run`, then without `--dry-run`

Migration 0030 (NOT in this PR, for follow-up)
Checklist
Impact Areas
- `@alook/shared`
- `@alook/web`
- `@alook/cli`
- `@alook/email-worker`
- `@alook/ws-do`