feat(whaleflow): add trace store schema migration #2816
Code Review
This pull request introduces a state-store v2 schema migration to support WhaleFlow trace tables, including workflow_runs, branch_runs, leaf_runs, control_node_runs, and teacher_candidates, along with corresponding integration tests. The review feedback highlights that SQLite does not enforce foreign key constraints by default, so PRAGMA foreign_keys = ON; must be enabled for the ON DELETE CASCADE and ON DELETE SET NULL actions to take effect. It also recommends removing a redundant index on leaf_runs and adding the missing indexes on the branch_run_id foreign keys in both leaf_runs and teacher_candidates to avoid full table scans during deletions.
    FOREIGN KEY(workflow_run_id) REFERENCES workflow_runs(id) ON DELETE CASCADE
);
SQLite does not enforce foreign key constraints by default. To ensure that ON DELETE CASCADE and ON DELETE SET NULL actions are actually executed for these new tables (and existing ones), PRAGMA foreign_keys = ON; must be enabled on every database connection opened by the application (typically inside the conn() helper). Without this, orphaned rows will accumulate silently when parent records are deleted.
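The per-connection behavior can be sketched with Python's standard `sqlite3` module. This is an illustration only, under stated assumptions: the project opens its connections from Rust, `open_conn` is a hypothetical stand-in for the `conn()` helper, the tables are trimmed, hypothetical versions of the ones in this migration, and the linked SQLite build is assumed to keep the stock default of enforcement off. With enforcement off, deleting a `workflow_runs` row leaves an orphaned `leaf_runs` row. With `PRAGMA foreign_keys = ON`, the same delete cascades, and deleting a `branch_runs` row nulls out `leaf_runs.branch_run_id`.

```python
import sqlite3

# Trimmed, hypothetical tables: just enough columns to exercise the
# ON DELETE CASCADE / ON DELETE SET NULL actions discussed in the review.
SCHEMA = """
CREATE TABLE workflow_runs (id TEXT PRIMARY KEY);
CREATE TABLE branch_runs (
    id TEXT PRIMARY KEY,
    workflow_run_id TEXT NOT NULL,
    FOREIGN KEY(workflow_run_id) REFERENCES workflow_runs(id) ON DELETE CASCADE
);
CREATE TABLE leaf_runs (
    id TEXT PRIMARY KEY,
    workflow_run_id TEXT NOT NULL,
    branch_run_id TEXT,
    FOREIGN KEY(workflow_run_id) REFERENCES workflow_runs(id) ON DELETE CASCADE,
    FOREIGN KEY(branch_run_id) REFERENCES branch_runs(id) ON DELETE SET NULL
);
"""


def open_conn(enforce_fks: bool) -> sqlite3.Connection:
    """Hypothetical stand-in for the application's conn() helper."""
    conn = sqlite3.connect(":memory:")
    if enforce_fks:
        # Per-connection setting: it is not persisted in the database file,
        # so it must be issued on every new connection, before any transaction.
        conn.execute("PRAGMA foreign_keys = ON")
    return conn


def seeded_db(enforce_fks: bool) -> sqlite3.Connection:
    """Create the trimmed schema and one workflow -> branch -> leaf chain."""
    conn = open_conn(enforce_fks)
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO workflow_runs VALUES ('wf1')")
    conn.execute("INSERT INTO branch_runs VALUES ('br1', 'wf1')")
    conn.execute("INSERT INTO leaf_runs VALUES ('leaf1', 'wf1', 'br1')")
    return conn


def leaves_left_after_workflow_delete(enforce_fks: bool) -> int:
    """Delete the parent workflow run and count the surviving leaf rows."""
    conn = seeded_db(enforce_fks)
    conn.execute("DELETE FROM workflow_runs WHERE id = 'wf1'")
    (count,) = conn.execute("SELECT COUNT(*) FROM leaf_runs").fetchone()
    conn.close()
    return count


def branch_ref_after_branch_delete():
    """Delete the branch run (enforcement on) and read the leaf's reference."""
    conn = seeded_db(enforce_fks=True)
    conn.execute("DELETE FROM branch_runs WHERE id = 'br1'")
    (ref,) = conn.execute(
        "SELECT branch_run_id FROM leaf_runs WHERE id = 'leaf1'"
    ).fetchone()
    conn.close()
    return ref


print(leaves_left_after_workflow_delete(False))  # → 1 (orphan survives)
print(leaves_left_after_workflow_delete(True))   # → 0 (cascade fired)
print(branch_ref_after_branch_delete())          # → None (SET NULL fired)
```

Because SQLite treats the pragma as a no-op while a transaction is open, it belongs immediately after the connection is created, before any migration or query runs.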
CREATE INDEX IF NOT EXISTS idx_leaf_runs_workflow_run_id
    ON leaf_runs(workflow_run_id);
CREATE INDEX IF NOT EXISTS idx_leaf_runs_replay_lookup
    ON leaf_runs(workflow_run_id, leaf_id, input_hash);
The index idx_leaf_runs_workflow_run_id on (workflow_run_id) is redundant because the composite index idx_leaf_runs_replay_lookup on (workflow_run_id, leaf_id, input_hash) already covers it (since workflow_run_id is the leftmost column). Removing it saves disk space and write overhead.
Additionally, there is a missing index on the foreign key branch_run_id. When a branch_run is deleted, SQLite must perform a full table scan on leaf_runs to execute the ON DELETE SET NULL action. Adding an index on branch_run_id prevents this performance bottleneck.
Suggested change:
CREATE INDEX IF NOT EXISTS idx_leaf_runs_replay_lookup
    ON leaf_runs(workflow_run_id, leaf_id, input_hash);
CREATE INDEX IF NOT EXISTS idx_leaf_runs_branch_run_id
    ON leaf_runs(branch_run_id);
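Both leaf_runs points can be checked with `EXPLAIN QUERY PLAN`, again sketched with Python's `sqlite3` on a trimmed, hypothetical `leaf_runs` table. The lookup SQLite performs internally for a foreign-key action is hard to observe from the parent `DELETE`'s plan, so the sketch probes the equivalent `WHERE branch_run_id = ?` lookup instead. Plan wording differs across SQLite versions (for example `SCAN leaf_runs` versus `SCAN TABLE leaf_runs`), so only the index names in the output are meaningful.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the fourth column in every SQLite version.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical leaf_runs with only the composite replay index.
conn.executescript("""
CREATE TABLE leaf_runs (
    id TEXT PRIMARY KEY,
    workflow_run_id TEXT NOT NULL,
    branch_run_id TEXT,
    leaf_id TEXT NOT NULL,
    input_hash TEXT NOT NULL
);
CREATE INDEX idx_leaf_runs_replay_lookup
    ON leaf_runs(workflow_run_id, leaf_id, input_hash);
""")

# workflow_run_id is the leftmost column, so the composite index serves it.
by_workflow = query_plan(conn, "SELECT id FROM leaf_runs WHERE workflow_run_id = 'wf1'")

# No index leads with branch_run_id yet, so this lookup scans the table.
by_branch_before = query_plan(conn, "SELECT id FROM leaf_runs WHERE branch_run_id = 'br1'")

conn.execute("CREATE INDEX idx_leaf_runs_branch_run_id ON leaf_runs(branch_run_id)")
by_branch_after = query_plan(conn, "SELECT id FROM leaf_runs WHERE branch_run_id = 'br1'")

print(by_workflow)
print(by_branch_before)
print(by_branch_after)
```

The same check works against the real migration: run it on a freshly migrated database, and a `SCAN` on a child-key lookup signals a missing index.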
CREATE INDEX IF NOT EXISTS idx_teacher_candidates_workflow_run_id
    ON teacher_candidates(workflow_run_id);
CREATE INDEX IF NOT EXISTS idx_teacher_candidates_control_node_run_id
    ON teacher_candidates(control_node_run_id);
There is a missing index on the foreign key branch_run_id in the teacher_candidates table. When a branch_run is deleted, SQLite has to perform a full table scan on teacher_candidates to execute the ON DELETE SET NULL action. Adding an index on branch_run_id avoids this performance issue.
Suggested change:
CREATE INDEX IF NOT EXISTS idx_teacher_candidates_workflow_run_id
    ON teacher_candidates(workflow_run_id);
CREATE INDEX IF NOT EXISTS idx_teacher_candidates_control_node_run_id
    ON teacher_candidates(control_node_run_id);
CREATE INDEX IF NOT EXISTS idx_teacher_candidates_branch_run_id
    ON teacher_candidates(branch_run_id);
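Because the same gap showed up in two tables, a schema test could flag any foreign key whose column does not lead an index. Here is a minimal Python `sqlite3` sketch under stated assumptions: `unindexed_foreign_keys` is a hypothetical helper, the tables are trimmed, hypothetical versions of the ones in this PR, only single-column foreign keys are handled, and a column counts as covered when it is the leftmost column of some index, including the implicit primary-key index. The project's own Rust tests would need an equivalent written against its SQLite binding.

```python
import sqlite3


def unindexed_foreign_keys(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return sorted (table, column) pairs for single-column foreign keys
    whose column is not the leftmost column of any index on that table."""
    missing = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # Collect the leading column of every index (index_list column 1 is the name).
        leading_cols = set()
        for idx in conn.execute(f"PRAGMA index_list('{table}')").fetchall():
            for seqno, _cid, col in conn.execute(f"PRAGMA index_info('{idx[1]}')"):
                if seqno == 0:
                    leading_cols.add(col)
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match);
        # rows sharing an id belong to one (possibly multi-column) foreign key.
        fk_cols: dict[int, list[str]] = {}
        for fk_id, _seq, _parent, from_col, *_rest in conn.execute(
                f"PRAGMA foreign_key_list('{table}')"):
            fk_cols.setdefault(fk_id, []).append(from_col)
        for cols in fk_cols.values():
            if len(cols) == 1 and cols[0] not in leading_cols:
                missing.append((table, cols[0]))
    return sorted(missing)


conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical schema mirroring the teacher_candidates indexes above.
conn.executescript("""
CREATE TABLE workflow_runs (id TEXT PRIMARY KEY);
CREATE TABLE branch_runs (id TEXT PRIMARY KEY);
CREATE TABLE control_node_runs (id TEXT PRIMARY KEY);
CREATE TABLE teacher_candidates (
    id TEXT PRIMARY KEY,
    workflow_run_id TEXT NOT NULL REFERENCES workflow_runs(id),
    branch_run_id TEXT REFERENCES branch_runs(id) ON DELETE SET NULL,
    control_node_run_id TEXT REFERENCES control_node_runs(id)
);
CREATE INDEX idx_teacher_candidates_workflow_run_id
    ON teacher_candidates(workflow_run_id);
CREATE INDEX idx_teacher_candidates_control_node_run_id
    ON teacher_candidates(control_node_run_id);
""")

before = unindexed_foreign_keys(conn)
conn.execute("CREATE INDEX idx_teacher_candidates_branch_run_id "
             "ON teacher_candidates(branch_run_id)")
after = unindexed_foreign_keys(conn)

print(before)  # → [('teacher_candidates', 'branch_run_id')]
print(after)   # → []
```

Checking the leftmost index column rather than exact index matches means the composite replay index on leaf_runs also counts as covering workflow_run_id, which is consistent with dropping the redundant single-column index.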
Summary
Stewardship notes
This is a narrow #2668 slice. It creates the persistence shape needed by future WhaleFlow execution/replay work, but does not add runtime writes, workflow execution, replay commands, or provider calls. The WhaleFlow draft/cost-tracking direction from #2482/#2486 remains credited in the changelog via @AdityaVG13.
Verification
cargo test -p codewhale-state --locked
./scripts/release/check-versions.sh
cmp -s CHANGELOG.md crates/tui/CHANGELOG.md && echo changelogs-match
git diff --check