Feature/query by row #745

Open
hongzhi-gao wants to merge 12 commits into apache:develop from hongzhi-gao:feature/query-by-row

Conversation


@hongzhi-gao hongzhi-gao commented Mar 17, 2026

Summary

Add queryByRow(paths/table, offset, limit) for both the tree and table models. Results are equivalent to "run the full query, skip the first offset rows, and take at most limit rows," but offset/limit are pushed down so that Chunk/Page-level skipping avoids decoding unnecessary data where possible.
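The equivalence contract above can be pinned down with a small reference function. This is an illustration only, not the actual TsFile API: it reproduces the "skip offset, take at most limit" semantics over a plain vector, which is what queryByRow must match row-for-row.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reference semantics for queryByRow(offset, limit): the result equals the
// full result set with the first `offset` rows skipped and at most `limit`
// rows taken. (Hypothetical sketch; rows are modeled as plain ints.)
std::vector<int> query_by_row_reference(const std::vector<int>& all_rows,
                                        std::size_t offset, std::size_t limit) {
    std::vector<int> out;
    if (offset >= all_rows.size()) return out;  // nothing left after the skip
    std::size_t end = std::min(all_rows.size(), offset + limit);
    out.assign(all_rows.begin() + offset, all_rows.begin() + end);
    return out;
}
```

The pushdown described in the Changes section must produce exactly this output while decoding fewer Chunks/Pages.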

Changes

Tree model

  • API: TsFileReader::queryByRow(path_list, offset, limit) / TsFileTreeReader::queryByRow(devices, measurements, offset, limit).
  • Pushdown: for a single path, set_row_range(offset, limit) is applied on the SSI, so Chunks/Pages are skipped by row count; for multiple paths, offset/limit are applied in the merge loop, and min_time_hint is used to skip stale Chunks.
  • Tests: Correctness (no offset/limit, offset only, limit only, offset+limit, boundaries, multi-path merge) + QueryByRowFasterThanManualNext (timing: queryByRow faster than full query + manual next, 5% tolerance).
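The count-based Chunk/Page skip in the single-path case can be sketched as follows. Names are illustrative, not the real TsFile internals: given per-chunk row counts and the remaining offset, whole chunks are skipped without decoding while the offset still covers them, and only the first partially needed chunk is decoded.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of count-based chunk skipping. Returns the index of the
// first chunk that must actually be decoded, plus the number of rows still to
// skip inside that chunk.
std::pair<std::size_t, std::size_t> first_chunk_to_decode(
        const std::vector<std::size_t>& chunk_row_counts,
        std::size_t remaining_offset) {
    std::size_t i = 0;
    // Skip whole chunks while the offset covers them entirely; their pages
    // never need to be decompressed or decoded.
    while (i < chunk_row_counts.size() &&
           remaining_offset >= chunk_row_counts[i]) {
        remaining_offset -= chunk_row_counts[i];
        ++i;
    }
    return {i, remaining_offset};
}
```

The same counting argument applies one level down, at Page granularity, inside the first decoded chunk.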

Table model

  • API: TsFileReader::queryByRow(table_name, column_names, offset, limit).
  • Pushdown:
    • Device: skip a whole device when remaining_offset >= device_row_count. (In this codebase, "dense" means that within one device, every queried column has the same number of rows and the same timestamps.)
    • SSI: when dense, set_row_range(offset, limit) on each column’s SSI → Chunk/Page skip by count.
    • TsBlock: when sparse or not fully consumed at SSI, offset/limit applied in merge loop.
  • Tests: Correctness (single/multi device, offset/limit, boundaries, equivalence with manual skip, SSI-level pushdown) + QueryByRowFasterThanManualNext (same timing check as tree).
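The device-level skip for dense devices can be sketched like this. The struct and function names are hypothetical: because a dense device's row count is known up front (all queried columns share timestamps), a device entirely covered by the remaining offset is skipped without opening any of its column chunks.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a device entry in the table-model scan.
struct DeviceInfo {
    std::string id;
    std::size_t row_count;  // valid for dense devices: all columns align
};

// Returns the devices that must actually be scanned; remaining_offset is
// reduced by the row counts of the devices skipped wholesale.
std::vector<std::string> devices_to_open(const std::vector<DeviceInfo>& devices,
                                         std::size_t& remaining_offset) {
    std::vector<std::string> opened;
    for (const auto& d : devices) {
        if (remaining_offset >= d.row_count) {
            remaining_offset -= d.row_count;  // skip the whole device
            continue;
        }
        opened.push_back(d.id);  // offset lands inside this device
    }
    return opened;
}
```

For sparse devices the row count cannot be trusted this way, which is why (per the bullets above) the sparse path falls back to applying offset/limit in the TsBlock merge loop.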

Review focus

  • Semantics: queryByRow(offset, limit) matches "full query + skip offset + take limit" (verified by the existing equivalence tests).
  • Performance: the new timing tests require queryByRow to be no slower than manual next within a 5% tolerance (minimum of 5 runs), confirming that the pushdown is effective in practice.
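The timing check above can be sketched as a small harness. This is a hedged illustration of the methodology described (min of 5 runs, 5% tolerance), not the actual test code; the function names are made up.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>

// Best-of-N timing: taking the minimum of several runs reduces noise from
// cold caches and scheduler jitter. (Hypothetical helper.)
double min_elapsed_us(const std::function<void()>& fn, int runs = 5) {
    double best = 1e18;
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
        best = std::min(best, us);
    }
    return best;
}

// The assertion shape used by the timing tests: pushdown must be no slower
// than the manual-skip baseline within the given tolerance.
bool pushdown_not_slower(double pushdown_us, double manual_us,
                         double tolerance = 0.05) {
    return pushdown_us <= manual_us * (1.0 + tolerance);
}
```

A tolerance is needed because on small files the two paths can be within measurement noise of each other even when the pushdown works.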

@hongzhi-gao hongzhi-gao deleted the feature/query-by-row branch March 17, 2026 09:23
@hongzhi-gao hongzhi-gao restored the feature/query-by-row branch March 17, 2026 09:24
@hongzhi-gao hongzhi-gao reopened this Mar 17, 2026
for (size_t i = 0; i < lower_case_column_names.size(); ++i) {
    auto ind = table_schema->find_column_index(lower_case_column_names[i]);
    if (ind < 0) {
        delete time_filter;
Contributor

Why is time_filter deleted here?

Contributor Author

I removed the delete there, since this layer should not own/free time_filter in that path.

Contributor

Double-check this, instead of removing the delete, more deletes are even added.


bool TsFileSeriesScanIterator::should_skip_chunk_by_time(
        ChunkMeta* cm, int64_t min_time_hint) {
    if (min_time_hint < 0 || cm->statistic_ == nullptr) {
Contributor

Beware of negative timestamps.

Contributor Author

I switched to INT64_MIN sentinel handling here as well, so negative timestamps are not treated as an invalid hint.
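The sentinel fix discussed here can be sketched as follows. This is an illustrative standalone version, not the actual TsFileSeriesScanIterator code: the point is that `min_time_hint < 0` wrongly disables the skip for legitimate negative timestamps, whereas an INT64_MIN sentinel only disables it when no hint was set.

```cpp
#include <cstdint>

// Hypothetical sentinel meaning "no hint was provided"; any other value,
// including negative timestamps, is a real lower bound.
constexpr int64_t kNoMinTimeHint = INT64_MIN;

// Skip a chunk whose newest row is still older than the hint. With the old
// `min_time_hint < 0` check, a hint of -50 would never skip anything.
bool should_skip_chunk_by_time(int64_t chunk_end_time, int64_t min_time_hint) {
    if (min_time_hint == kNoMinTimeHint) return false;  // no hint: keep chunk
    return chunk_end_time < min_time_hint;
}
```

The trade-off is that a file whose data genuinely starts at INT64_MIN cannot use the hint, which is harmless: the fallback is simply not skipping.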

Contributor

@jt2594838 jt2594838 left a comment

May also add in C and Python.

@hongzhi-gao
Contributor Author

May also add in C and Python.

Done
