feat(whaleflow): add trace store schema migration #2816

Merged
Hmbown merged 1 commit into codex/v0.9.0-stewardship from codex/v090-whaleflow-tracestore-schema
Jun 6, 2026

Conversation

Owner

@Hmbown Hmbown commented Jun 6, 2026

Summary

  • add a state-store v2 schema migration for WhaleFlow trace tables
  • cover workflow, branch, leaf, control-node, and teacher-candidate run persistence shapes
  • add tests for fresh schemas and existing v1 schemas migrating to the new tables
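
To illustrate the fresh-schema and v1-to-v2 cases in the last bullet, here is a minimal sketch using Python's built-in sqlite3 module. It is not the crate's code: the `sessions` v1 table, the trimmed column lists, and the use of `PRAGMA user_version` to track the schema version are all assumptions made for illustration. The PR does not say how the state store records its version.

```python
import sqlite3

# Hypothetical v1 schema: a stand-in for whatever the state store had before.
V1_SCHEMA = "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY);"

# Trimmed v2 additions: table names come from the PR, columns are assumed.
V2_SCHEMA = """
CREATE TABLE IF NOT EXISTS workflow_runs (id TEXT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS branch_runs (
    id TEXT PRIMARY KEY,
    workflow_run_id TEXT NOT NULL,
    FOREIGN KEY(workflow_run_id) REFERENCES workflow_runs(id) ON DELETE CASCADE
);
"""

def migrate(conn: sqlite3.Connection) -> None:
    """Bring a database at version 0 or 1 up to version 2 (assumed scheme)."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    if version < 1:
        conn.executescript(V1_SCHEMA)
    if version < 2:
        conn.executescript(V2_SCHEMA)
    conn.execute("PRAGMA user_version = 2")
    conn.commit()

def table_names(conn: sqlite3.Connection) -> set:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {row[0] for row in rows}

# Case 1: a fresh database gets both the v1 and v2 tables.
fresh = sqlite3.connect(":memory:")
migrate(fresh)

# Case 2: an existing v1 database only gains the v2 tables.
existing = sqlite3.connect(":memory:")
existing.executescript(V1_SCHEMA)
existing.execute("PRAGMA user_version = 1")
migrate(existing)
```

Both connections should end up with the same set of tables, which is the property the fresh-schema and v1-migration tests are described as covering.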

Stewardship notes

This is a narrow #2668 slice. It creates the persistence shape needed by future WhaleFlow execution/replay work, but does not add runtime writes, workflow execution, replay commands, or provider calls. The WhaleFlow draft/cost-tracking direction from #2482/#2486 remains credited in the changelog via @AdityaVG13.

Verification

  • cargo test -p codewhale-state --locked
  • ./scripts/release/check-versions.sh
  • cmp -s CHANGELOG.md crates/tui/CHANGELOG.md && echo changelogs-match
  • git diff --check

Contributor

@greptile-apps greptile-apps Bot left a comment

Hmbown has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Hmbown Hmbown merged commit a2cc6bd into codex/v0.9.0-stewardship Jun 6, 2026
2 checks passed
@Hmbown Hmbown deleted the codex/v090-whaleflow-tracestore-schema branch June 6, 2026 02:51
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a state-store v2 schema migration to support WhaleFlow trace tables, including workflow_runs, branch_runs, leaf_runs, control_node_runs, and teacher_candidates, along with corresponding integration tests. The review feedback highlights that SQLite does not enforce foreign key constraints by default, suggesting that PRAGMA foreign_keys = ON; should be enabled to ensure cascading deletes function correctly. Additionally, the reviewer recommended removing a redundant index on leaf_runs and adding missing indexes on the branch_run_id foreign keys in both leaf_runs and teacher_candidates to avoid full table scans during deletions.

Comment thread crates/state/src/lib.rs
Comment on lines +409 to +410
FOREIGN KEY(workflow_run_id) REFERENCES workflow_runs(id) ON DELETE CASCADE
);
Contributor

high

SQLite does not enforce foreign key constraints by default. To ensure that ON DELETE CASCADE and ON DELETE SET NULL actions are actually executed for these new tables (and existing ones), PRAGMA foreign_keys = ON; must be enabled on every database connection opened by the application (typically inside the conn() helper). Without this, orphaned rows will accumulate silently when parent records are deleted.
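
The behaviour described in this comment can be reproduced with a small sketch, written here with Python's built-in sqlite3 module rather than the crate's Rust code. The two-table schema is a cut-down assumption: only the FOREIGN KEY clause is taken from the diff, and `orphans_after_parent_delete` is a hypothetical helper name.

```python
import sqlite3

# Assumed minimal schema; only the FOREIGN KEY clause mirrors the diff.
SCHEMA = """
CREATE TABLE workflow_runs (id TEXT PRIMARY KEY);
CREATE TABLE branch_runs (
    id TEXT PRIMARY KEY,
    workflow_run_id TEXT NOT NULL,
    FOREIGN KEY(workflow_run_id) REFERENCES workflow_runs(id) ON DELETE CASCADE
);
"""

def orphans_after_parent_delete(enforce: bool) -> int:
    """Delete a parent row and count the child rows left behind."""
    conn = sqlite3.connect(":memory:")
    # The pragma is per connection and must run outside a transaction,
    # so it is set immediately after connecting.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO workflow_runs (id) VALUES ('wf-1')")
    conn.execute(
        "INSERT INTO branch_runs (id, workflow_run_id) VALUES ('br-1', 'wf-1')"
    )
    conn.execute("DELETE FROM workflow_runs WHERE id = 'wf-1'")
    remaining = conn.execute("SELECT COUNT(*) FROM branch_runs").fetchone()[0]
    conn.close()
    return remaining

print(orphans_after_parent_delete(enforce=False))  # child row survives
print(orphans_after_parent_delete(enforce=True))   # child row is cascaded away
```

With enforcement off, the child row is left pointing at a deleted parent. With it on, the cascade removes the row. Because the setting is per connection, it has to be applied wherever the application opens a connection.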

Comment thread crates/state/src/lib.rs
Comment on lines +431 to +434
CREATE INDEX IF NOT EXISTS idx_leaf_runs_workflow_run_id
ON leaf_runs(workflow_run_id);
CREATE INDEX IF NOT EXISTS idx_leaf_runs_replay_lookup
ON leaf_runs(workflow_run_id, leaf_id, input_hash);
Contributor

medium

The index idx_leaf_runs_workflow_run_id on (workflow_run_id) is redundant because the composite index idx_leaf_runs_replay_lookup on (workflow_run_id, leaf_id, input_hash) already covers it (since workflow_run_id is the leftmost column). Removing it saves disk space and write overhead.

Additionally, there is a missing index on the foreign key branch_run_id. When a branch_run is deleted, SQLite must perform a full table scan on leaf_runs to execute the ON DELETE SET NULL action. Adding an index on branch_run_id prevents this performance bottleneck.

Suggested change
- CREATE INDEX IF NOT EXISTS idx_leaf_runs_workflow_run_id
-     ON leaf_runs(workflow_run_id);
  CREATE INDEX IF NOT EXISTS idx_leaf_runs_replay_lookup
      ON leaf_runs(workflow_run_id, leaf_id, input_hash);
+ CREATE INDEX IF NOT EXISTS idx_leaf_runs_branch_run_id
+     ON leaf_runs(branch_run_id);
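
Both claims in this comment can be checked with EXPLAIN QUERY PLAN. The sketch below uses Python's built-in sqlite3 module. The leaf_runs column list is an assumption (only the indexed columns and branch_run_id are known from the diff), and a plain SELECT with the same WHERE clause stands in for the lookup that the ON DELETE SET NULL action performs.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the detail text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leaf_runs (
    id TEXT PRIMARY KEY,
    workflow_run_id TEXT NOT NULL,
    branch_run_id TEXT,
    leaf_id TEXT NOT NULL,
    input_hash TEXT NOT NULL
);
-- Only the composite index; the single-column workflow_run_id index is omitted.
CREATE INDEX idx_leaf_runs_replay_lookup
    ON leaf_runs(workflow_run_id, leaf_id, input_hash);
""")

BY_WORKFLOW = "SELECT * FROM leaf_runs WHERE workflow_run_id = 'wf-1'"
BY_BRANCH = "SELECT * FROM leaf_runs WHERE branch_run_id = 'br-1'"

# The leftmost column of the composite index serves workflow_run_id lookups.
by_workflow = plan(conn, BY_WORKFLOW)

# No index leads with branch_run_id yet, so this is a table scan.
by_branch_before = plan(conn, BY_BRANCH)

conn.execute(
    "CREATE INDEX idx_leaf_runs_branch_run_id ON leaf_runs(branch_run_id)"
)
by_branch_after = plan(conn, BY_BRANCH)

print(by_workflow)
print(by_branch_before)
print(by_branch_after)
```

The first plan should name idx_leaf_runs_replay_lookup, the second should be a SCAN, and the third should name the new branch_run_id index. The exact wording of the plan text varies between SQLite versions.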

Comment thread crates/state/src/lib.rs
Comment on lines +467 to +470
CREATE INDEX IF NOT EXISTS idx_teacher_candidates_workflow_run_id
ON teacher_candidates(workflow_run_id);
CREATE INDEX IF NOT EXISTS idx_teacher_candidates_control_node_run_id
ON teacher_candidates(control_node_run_id);
Contributor

medium

There is a missing index on the foreign key branch_run_id in the teacher_candidates table. When a branch_run is deleted, SQLite has to perform a full table scan on teacher_candidates to execute the ON DELETE SET NULL action. Adding an index on branch_run_id avoids this performance issue.

Suggested change
  CREATE INDEX IF NOT EXISTS idx_teacher_candidates_workflow_run_id
      ON teacher_candidates(workflow_run_id);
  CREATE INDEX IF NOT EXISTS idx_teacher_candidates_control_node_run_id
      ON teacher_candidates(control_node_run_id);
+ CREATE INDEX IF NOT EXISTS idx_teacher_candidates_branch_run_id
+     ON teacher_candidates(branch_run_id);
