feat(query): support experimental table branch #19551
zhyass wants to merge 6 commits into databendlabs:main
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d9a61930b6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
b1a380e to 8690b60
e5e2a5c to 62b0be6
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 62b0be6c3f
2d9b217 to ff2d089
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff2d089ca7
ff2d089 to 4ee7b06
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ee7b06209
4e11f1f to a099884
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a09988423b
a099884 to e2da87d
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e2da87d535
e2da87d to 440dfd8
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 440dfd8b24
440dfd8 to 5963f95
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5963f9502a
5963f95 to e17cb4a
Codex Review: Didn't find any major issues. Keep it up!
7453adc to 7432d49
f9d2561 to 423a7c8
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 423a7c82fa
Codex Review: Didn't find any major issues. More of your lovely PRs please.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a7435232a
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1ad47a3c8b
1ad47a3 to 52781f8
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 52781f8eea
fix fix fix fix fix
52781f8 to 8cbe3b1
Codex Review: Didn't find any major issues. Keep it up!
drmingdrmer left a comment
@drmingdrmer reviewed all commit messages and made 3 comments.
Reviewable status: 0 of 98 files reviewed, 3 unresolved discussions (waiting on dantengsky, zhang2014, and zhyass).
src/meta/api/src/api_impl/ref_api.rs line 348 at r14 (raw file):
            )),
        ));
    }
I wonder if there is a better approach here. I don't think creating a dropped table is an appropriate way to implement locking.
Code quote:
    async fn create_table_branch(
        &self,
        req: CreateTableBranchReq,
    ) -> Result<CreateTableBranchReply, KVAppError> {
        debug!(req :? =(&req); "RefApi: {}", func_name!());
        if !req.as_dropped && req.table_meta.drop_on.is_some() {
            return Err(KVAppError::AppError(AppError::CreateTableWithDropTime(
                CreateTableWithDropTime::new(&req.branch_name),
            )));
        }
        if req.as_dropped && req.table_meta.drop_on.is_none() {
            return Err(KVAppError::AppError(
                AppError::CreateAsDropTableWithoutDropTime(CreateAsDropTableWithoutDropTime::new(
                    &req.branch_name,
                )),
            ));
        }
src/meta/api/src/api_impl/ref_api.rs line 356 at r14 (raw file):
        db_id: req.db_id,
        table_name: req.name_ident.table_name.clone(),
    };
Because table-id is a globally unique id for a table, it would be better to get the table by (db-id, table_name) outside the transaction retry loop. There is no need to re-fetch it on every retry.
Code quote:
    let key_table = DBIdTableName {
        db_id: req.db_id,
        table_name: req.name_ident.table_name.clone(),
    };
src/meta/api/src/api_impl/ref_api.rs line 370 at r14 (raw file):
        key_source_table_id.to_string_key(),
        key_branch.to_string_key(),
    ];
If this isn't on the hot path, there is no need to collect the keys and use a multi-get to fetch all of the values.
Just use the normal get_pb() for simplicity.
create_table() does so because it encounters a very high conflict rate and has to minimize the delay between transaction retries.
Code quote:
    let keys = vec![
        key_table.to_string_key(),
        key_source_table_id.to_string_key(),
        key_branch.to_string_key(),
    ];
1fe74f7 to d2e876d
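For readers following the two review suggestions above (resolve the table id once outside the retry loop, and use plain single-key gets when the path is not hot), here is a minimal sketch of the shape being proposed. It is not the code in ref_api.rs: `ToyKv`, `create_branch_hoisted`, and the key strings are hypothetical stand-ins, and the conflict behavior is simulated with a counter.

```rust
use std::collections::HashMap;

/// Toy stand-in for the meta-service KV store (hypothetical, not the real API).
pub struct ToyKv {
    pub data: HashMap<String, u64>,
    /// Counts single-key reads so the cost of the strategy is visible.
    pub reads: usize,
    /// Number of upcoming transaction attempts that will report a conflict.
    pub conflicts_left: usize,
}

impl ToyKv {
    /// Plain single-key read, playing the role of a simple get.
    pub fn get(&mut self, key: &str) -> Option<u64> {
        self.reads += 1;
        self.data.get(key).copied()
    }

    /// Pretend to run a transaction; it fails while simulated conflicts remain.
    pub fn try_commit(&mut self) -> bool {
        if self.conflicts_left > 0 {
            self.conflicts_left -= 1;
            false
        } else {
            true
        }
    }
}

/// Resolve the name -> table-id mapping once, then retry only the part
/// that can actually conflict.
pub fn create_branch_hoisted(kv: &mut ToyKv, name_key: &str, branch_key: &str) -> Option<u64> {
    // Looked up once, outside the retry loop.
    let table_id = kv.get(name_key)?;
    for _ in 0..10 {
        // Re-read per attempt with a plain get; no key batching.
        let _existing_branch = kv.get(branch_key);
        if kv.try_commit() {
            return Some(table_id);
        }
    }
    None
}
```

With two simulated conflicts, this performs one name lookup plus three branch reads, instead of re-reading the name mapping on every attempt.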
🤖 CI Job Analysis
📊 Summary
❌ NO RETRY NEEDED
All failures appear to be code/test issues requiring manual fixes.
🔍 Job Details
🤖 About
Automated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1fe74f7996
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
d2e876d to ff6ec17
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff6ec17a2b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/meta/api/src/api_impl/ref_api.rs (Outdated)
    let txn = TxnRequest::new(conditions, vec![txn_put_pb(
        &key_staged_branch,
        &staged_branch,
    )?]);
Record staged branch reachability during phase-1
Phase-1 currently persists only __fd_staged_branch and does not write any GC-visible branch root metadata. In the current vacuum flow (do_vacuum2), staged entries are only consulted for timeout cleanup and fresh staged branches are not included in any protection set, so a concurrent vacuum can advance LVT and reclaim source snapshots/segments between phase-1 and commit_table_branch_meta. That allows commit to succeed while the new branch references files that were already deleted. Please persist a placeholder that vacuum treats as reachable (or explicitly protect staged refs) before returning from phase-1.
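One way to picture the protection this comment asks for is a vacuum-side "keep" set that also counts fresh staged entries. The sketch below is only an illustration under assumptions: `StagedRef`, `protected_snapshots`, and the timeout parameter are hypothetical names, and the real vacuum flow in do_vacuum2 may compute reachability quite differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical staged-branch record: the source snapshot the pending branch
/// will reference, and the time at which it was staged.
pub struct StagedRef {
    pub snapshot_id: u64,
    pub staged_at: u64,
}

/// Snapshots that vacuum must not reclaim: roots of committed branches plus
/// snapshots pinned by staged entries that have not timed out yet.
pub fn protected_snapshots(
    committed_roots: &[u64],
    staged: &[StagedRef],
    now: u64,
    staged_timeout: u64,
) -> BTreeSet<u64> {
    let mut keep: BTreeSet<u64> = committed_roots.iter().copied().collect();
    for s in staged {
        // A fresh staged entry protects its source snapshot; an expired one
        // is treated as an orphan and protects nothing.
        if now.saturating_sub(s.staged_at) < staged_timeout {
            keep.insert(s.snapshot_id);
        }
    }
    keep
}
```

Under this model a branch staged shortly before vacuum runs keeps its source snapshot alive until either the commit phase publishes it or the staged entry times out.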
drmingdrmer left a comment
@drmingdrmer made 2 comments and resolved 1 discussion.
Reviewable status: 0 of 98 files reviewed, 5 unresolved discussions (waiting on dantengsky, zhang2014, and zhyass).
src/meta/api/src/api_impl/ref_api.rs line 372 at r16 (raw file):
        }
    };
    let key_staged_branch = StagedBranchIdent::new(req.table_id, branch_id);
What is "staged"? What does it mean here? Please add a comment explaining its purpose.
Also, if the staged branch key contains the branch_id we just created, key_staged_branch would never already be present, because no one else knows about this branch_id, unless a create-branch transaction has completed and is then retried. If that case is forbidden, there should be another mechanism to prevent it.
src/meta/api/src/api_impl/ref_api.rs line 421 at r16 (raw file):
    return Ok(CreateTableBranchReply {
        branch_id,
        auto_increment_start_vals,
It's a little weird to return auto-increment values. Why would the caller need them?
Code quote:
    let mut auto_increment_start_vals = BTreeMap::new();
    for table_field in seq_source_table_meta.data.schema.fields() {
        let Some(auto_increment_expr) = table_field.auto_increment_expr() else {
            continue;
        };
        let source_ai_key =
            AutoIncrementKey::new(req.source_table_id, table_field.column_id());
        let source_ai_ident =
            AutoIncrementStorageIdent::new_generic(&req.tenant, source_ai_key);
        let start_value = match self.get_pb(&source_ai_ident).await? {
            Some(seq_v) => seq_v.data.into_inner().0,
            None => auto_increment_expr.start,
        };
        auto_increment_start_vals.insert(table_field.column_id(), start_value);
    }
    return Ok(CreateTableBranchReply {
        branch_id,
        auto_increment_start_vals,
ff6ec17 to f2ee314
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
This PR implements table branches for FUSE tables, including branch creation, branch-qualified reads and writes, branch lifecycle metadata, and branch-aware garbage collection. It also extends vacuum2 and virtual-column vacuum so historical files referenced by active branches or tags remain protected until they are no longer reachable.
A branch is a lightweight, writable fork of a table's snapshot history — it shares the base table's storage (segments/blocks) via copy-on-write semantics and maintains its own independent snapshot chain.
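To make the copy-on-write idea concrete, here is a small in-memory sketch. It assumes a simplified model in which a snapshot is just a list of shared segment pointers; `Segment`, `Snapshot`, `fork`, and `append` are hypothetical names for illustration and do not mirror the FUSE engine's actual types.

```rust
use std::sync::Arc;

/// Immutable unit of stored data, shared between snapshots.
#[derive(Debug)]
pub struct Segment {
    pub rows: Vec<i64>,
}

/// A snapshot is a list of pointers to immutable segments.
#[derive(Clone, Debug)]
pub struct Snapshot {
    pub segments: Vec<Arc<Segment>>,
}

impl Snapshot {
    /// Forking copies only the pointer list, never the segment data.
    pub fn fork(&self) -> Snapshot {
        self.clone()
    }

    /// Appending yields a new snapshot; existing segments are shared as-is.
    pub fn append(&self, rows: Vec<i64>) -> Snapshot {
        let mut segments = self.segments.clone();
        segments.push(Arc::new(Segment { rows }));
        Snapshot { segments }
    }

    /// Total number of rows visible in this snapshot.
    pub fn row_count(&self) -> usize {
        self.segments.iter().map(|s| s.rows.len()).sum()
    }
}
```

Writing to the fork leaves the base snapshot untouched, while both continue to point at the same original segment.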
Syntax
Implementation Details
KV Metadata Model
Branches are persisted as explicit KV entries in meta-service:
Key: __fd_table_branch/<table_id>/<branch_name>, value: TableBranch { expire_at, branch_id }
Key: __fd_dropped_branch/<table_id>/<branch_name>/<branch_id>, value: DroppedBranchMeta { drop_on, expire_at }
Key: TableId { branch_id }, value: TableMeta
Each branch gets its own table_id (allocated via fetch_id) and its own storage prefix (<db_id>/<branch_table_id>/). The branch's TableMeta.options records OPT_KEY_BASE_TABLE_ID and OPT_KEY_REFERENCED_BRANCH_IDS for cross-table GC protection.
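As a rough illustration of the key layout described above, the helpers below build the key strings and the storage prefix from their parts. The function names are hypothetical, and the sketch assumes plain string concatenation; the real meta-service key encoding may escape or encode components differently.

```rust
/// Key for an active branch entry (hypothetical helper; layout taken from
/// the key pattern in the description above).
pub fn table_branch_key(table_id: u64, branch_name: &str) -> String {
    format!("__fd_table_branch/{table_id}/{branch_name}")
}

/// Key for a dropped branch entry, which also carries the branch id.
pub fn dropped_branch_key(table_id: u64, branch_name: &str, branch_id: u64) -> String {
    format!("__fd_dropped_branch/{table_id}/{branch_name}/{branch_id}")
}

/// Storage prefix owned by a branch: <db_id>/<branch_table_id>/.
pub fn branch_storage_prefix(db_id: u64, branch_table_id: u64) -> String {
    format!("{db_id}/{branch_table_id}/")
}
```

Because the dropped-branch key includes the branch id, a name can be dropped and recreated without the two generations colliding in this layout.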
Two-Phase Branch Creation
Snapshot-based branch creation uses a staged commit protocol to ensure atomicity:
If Phase 2 fails, the orphan is eventually cleaned up by vacuum after the retention period.
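The staged commit flow can be sketched with an in-memory model. This is an assumption-laden illustration, not the protocol as implemented: `ToyMeta`, `stage`, `commit`, and `sweep_orphans` are hypothetical names, time is a bare integer, and the real implementation runs these steps as meta-service transactions.

```rust
use std::collections::BTreeMap;

#[derive(Default)]
pub struct ToyMeta {
    /// (table_id, branch_id) -> staged_at; written in phase 1.
    pub staged: BTreeMap<(u64, u64), u64>,
    /// (table_id, branch_name) -> branch_id; visible branches, written in phase 2.
    pub branches: BTreeMap<(u64, String), u64>,
}

impl ToyMeta {
    /// Phase 1: record the intent. Nothing is visible to readers yet.
    pub fn stage(&mut self, table_id: u64, branch_id: u64, now: u64) {
        self.staged.insert((table_id, branch_id), now);
    }

    /// Phase 2: publish the branch only if its staged entry still exists and
    /// the name is free, removing the staged entry in the same step.
    pub fn commit(&mut self, table_id: u64, branch_id: u64, name: &str) -> bool {
        let key = (table_id, name.to_string());
        if !self.staged.contains_key(&(table_id, branch_id)) || self.branches.contains_key(&key) {
            return false;
        }
        self.staged.remove(&(table_id, branch_id));
        self.branches.insert(key, branch_id);
        true
    }

    /// Vacuum-side cleanup: drop staged entries older than `retention`
    /// and return how many were removed.
    pub fn sweep_orphans(&mut self, now: u64, retention: u64) -> usize {
        let before = self.staged.len();
        self.staged
            .retain(|_, staged_at| now.saturating_sub(*staged_at) < retention);
        before - self.staged.len()
    }
}
```

A staged entry whose commit never succeeds stays invisible to readers and is removed by the sweep once it is older than the retention window.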
Ref API
New meta-service APIs in ref_api.rs:
Vacuum2 Semantics
fuse_vacuum2() is now branch-aware. It:
This is especially important for branch chains such as base -> b1 -> b2, where data introduced on b1 must remain alive while b2 still references it.
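The branch-chain case can be illustrated with a simple set computation. This sketch assumes each ref (base table or branch) exposes the set of files its live snapshots reference; `RefState` and `reclaimable` are hypothetical names, and the actual fuse_vacuum2() logic is not shown in this description.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical view of one ref (base table or branch) as seen by vacuum.
pub struct RefState {
    /// False once the ref is dropped or expired.
    pub active: bool,
    /// Files referenced by the ref's live snapshots.
    pub files: BTreeSet<&'static str>,
}

/// A file is reclaimable only if no active ref still references it.
pub fn reclaimable(
    all_files: &BTreeSet<&'static str>,
    refs: &BTreeMap<&'static str, RefState>,
) -> BTreeSet<&'static str> {
    let mut live = BTreeSet::new();
    for r in refs.values().filter(|r| r.active) {
        live.extend(r.files.iter().copied());
    }
    all_files.difference(&live).copied().collect()
}
```

In a base -> b1 -> b2 chain where b1 has been dropped, a file written on b1 stays protected as long as b2 still lists it, and only files unique to b1 become reclaimable.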
Virtual Column Vacuum
Virtual-column cleanup is also extended to understand branch/tag reachability. Historical virtual-column files referenced only by an active branch or tag are preserved, and expired/dropped references stop protecting them once they become unreachable.
This keeps branch and tag semantics consistent across both snapshot data files and derived virtual-column artifacts.
Tests
Type of change