Skip to content

feat: Apache Arrow / Polars integration for QWP-WS#150

Open
kafka1991 wants to merge 22 commits into
mainfrom
arrow_polars
Open

feat: Apache Arrow / Polars integration for QWP-WS#150
kafka1991 wants to merge 22 commits into
mainfrom
arrow_polars

Conversation

@kafka1991

@kafka1991 kafka1991 commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds Apache Arrow + Polars integration across both directions of the QuestDB Wire Protocol (WebSocket transport):

  • Egress (server → client): Cursor::as_record_batch_reader() adapter that yields arrow::RecordBatch from streaming query results; optional Polars bridge via Arrow C Data Interface.
  • Ingress (client → server): Buffer::append_arrow() writes an entire RecordBatch into a QWP-WS buffer via a column-major dense bulk path — primitive / string / binary columns use a single memcpy per column instead of the row-by-row HashMap lookup that the existing per-cell API takes.

Public surface

Rust (questdb-rs):

  • Buffer::append_arrow(table, batch) — ingress; server stamps each row's timestamp on arrival (same semantics as at_now()).
  • Buffer::append_arrow_at_column(table, batch, ts_column) — ingress; sources the per-row designated timestamp from a named Timestamp(_) column inside the batch (must have no null rows).
  • Cursor::as_record_batch_reader() -> CursorRecordBatchReader — egress, streaming Iterator<Item = Result<RecordBatch, ArrowError>>.
  • Cursor::fetch_all_arrow() -> (SchemaRef, Vec<RecordBatch>) — egress, eager drain.
  • Cursor::next_polars() / iter_polars() / fetch_all_polars() — polars sub-feature.
  • polars::dataframe_to_batches(df, max_rows) — chunked DataFrame → RecordBatch iterator for ingress.

C ABI (gated on QUESTDB_ENABLE_ARROW):

  • line_reader_cursor_next_arrow_batch(...) — egress via Arrow C Data Interface
  • line_sender_buffer_append_arrow(...) / line_sender_buffer_append_arrow_at_column(...) — ingress via Arrow C Data Interface
  • line_sender_buffer_new_qwp_ws() — construct the QWP/WebSocket columnar buffer the Arrow ingest path expects.
  • New error codes: schema_drift, no_schema, arrow_export, arrow_unsupported_column_kind, arrow_ingest

Dense bulk path

QWP-WS buffer gains 7 Arrow-fed column variants (ArrowFixed / ArrowVarLen / ArrowBool / ArrowSymbol / ArrowDecimal / ArrowGeohash / ArrowArray) that store wire-format-ready bytes directly. append_arrow is column-major (one HashMap lookup per column, not per cell) and per-kind correctness follows the QWP sparse-null vs sentinel convention: i32::MIN / f64::NAN for non-nullable signed integer / float kinds; non-null + bitmap for kinds that support it. UInt8 widens to Int32 and UInt64 is rejected when value > i64::MAX, avoiding silent sentinel collisions.

For batches of N=10K rows × M=20 primitive columns with no nulls, this collapses the work from ~200K per-cell HashMap operations to 20 memcpys. With nulls present, each column's QWP null bitmap is built by byte-stride OR-with-NOT of the Arrow validity buffer when source and destination boundaries align; per-row fallback only when bit-offsets are unaligned.

Feature gating

  • questdb-rs/Cargo.toml: arrow + polars features (opt-in, excluded from almost-all-features)
  • questdb-rs-ffi/Cargo.toml: arrow feature mirrors
  • CMakeLists.txt: QUESTDB_ENABLE_ARROW=OFF by default; auto-flipped to ON when QUESTDB_TESTS_AND_EXAMPLES=ON so tests / examples can exercise the Arrow path without explicit opt-in.

FFI safety

  • Schema depth pre-validated (cap 64) before arrow::ffi::from_ffi recurses through children.
  • Top-level Struct with root null bits → arrow_ingest error, not an assert! abort.
  • row_count capped at 16 M and per-column value_data() at 1 GiB on the FFI entry; bounds every Vec::reserve against allocator-OOM aborts under the FFI crate's panic = "abort" profile.
  • Cursor honours its terminal_error replay contract on the Arrow path: a transport error captured on a previous call replays on every subsequent _next_arrow_batch, no silent Ok(None) truncation.

Tests

  • Rust unit tests: 80+ tests in questdb-rs/src/egress/arrow/tests.rs + src/ingress/arrow.rs covering each Arrow → QWP kind, multi-batch dense extension, sliced array (primitive / utf8 / bool with non-zero offset), Decimal256 round-trip, mid-stream schema drift (tentative-ndim upgrade vs firm-firm drift), UInt64 sentinel rejection, geohash precision width check.
  • C ABI (cpp_test/test_arrow_c.c): enum constants, struct layouts, NULL-safety, per-type dispatch with strict-success assertions (no soft-error masking).
  • C++ doctest: cpp_test/test_arrow_egress.cpp + cpp_test/test_arrow_ingress.cpp
  • System tests (live QuestDB):
    • system_test/arrow_egress_fuzz.py — egress fuzz
    • system_test/arrow_ingress_fuzz.py — ingress fuzz
    • system_test/arrow_round_trip_fuzz.py — write Arrow → flush → SQL readback → row-level compare
    • system_test/arrow_alignment_fuzz.py — alignment regression
  • CI (ci/run_all_tests.py): wires the three new cpp_test binaries + runs cargo tests with arrow,polars features

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds opt-in QUESTDB Arrow build feature, C/C++ public FFI for Arrow export/import, Rust ingress/egress Arrow conversion and QWP/WS bulk support, optional Polars bridge, Python ctypes system tests and fuzz harnesses, C/C++ and Rust unit/integration tests, examples, and CI updates to install/run Arrow/Polars tests.

Changes

Apache Arrow C Data Interface Integration

Layer / File(s) Summary
All modifications (single checkpoint)
CMakeLists.txt, include/*, questdb-rs-ffi/*, questdb-rs/src/*, cpp_test/*, system_test/*, examples/*, ci/*
Adds build flag QUESTDB_ENABLE_ARROW, public C/C++ Arrow FFI (egress/ingress enums, Arrow structs, exported functions), Rust FFI wiring and feature flags, egress conversion (DecodedBatch → RecordBatch), ingress append (RecordBatch → QWP/WS bulk), Polars bridge, Python ctypes bindings and many system/fuzz/unit tests, examples, and CI pipeline updates.

Sequence Diagram(s)

sequenceDiagram
  participant PyArrow as PyArrow (test)
  participant ArrowFFI as line_sender_buffer_append_arrow
  participant QwpBuf as QwpWsColumnarBuffer
  participant QuestDB as QuestDB Table
  PyArrow->>ArrowFFI: export RecordBatch (ArrowArray/ArrowSchema)
  ArrowFFI->>QwpBuf: classify fields, append Arrow bulk context
  QwpBuf->>QwpBuf: validate/merge null bitmap and payloads
  QwpBuf->>QuestDB: commit batch rows to table
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~30–90 minutes

Possibly related PRs

Suggested reviewers

  • bluestreak01

Poem

"I'm a rabbit with a tiny crate,
I hop through Arrow, pack and wait,
From Rust to C and back again,
Batches snug as a carrot den. 🥕"

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch arrow_polars

@kafka1991 kafka1991 marked this pull request as draft May 26, 2026 10:06
@kafka1991 kafka1991 changed the title feat(arrow): support arrow/polars on sender and reader feat: Apache Arrow / Polars integration for QWP-WS May 26, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🧹 Nitpick comments (4)
CMakeLists.txt (2)

373-375: 💤 Low value

Fix misleading comment: no fatal_error gate exists.

The comment claims "The fatal_error gate above forces QUESTDB_ENABLE_ARROW=ON" but lines 89-92 use set(QUESTDB_ENABLE_ARROW ON), not FATAL_ERROR.

📝 Proposed fix for comment accuracy
-    # Apache Arrow C Data Interface tests. The fatal_error gate above
-    # forces QUESTDB_ENABLE_ARROW=ON when tests are enabled, so these
-    # always build alongside the rest of the suite.
+    # Apache Arrow C Data Interface tests. The logic at lines 89-92
+    # ensures QUESTDB_ENABLE_ARROW=ON when tests are enabled, so these
+    # always build alongside the rest of the suite.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CMakeLists.txt` around lines 373 - 375, The comment is misleading because
there is no FATAL_ERROR gate; the code actually uses set(QUESTDB_ENABLE_ARROW
ON) earlier (see the QUESTDB_ENABLE_ARROW variable and its set call), so update
the comment to remove the reference to a "fatal_error gate" and instead state
that QUESTDB_ENABLE_ARROW is explicitly set to ON when tests are enabled (or
refer to the set(QUESTDB_ENABLE_ARROW ON) operation); keep the rest about Apache
Arrow C Data Interface tests building alongside the suite.

89-92: ⚡ Quick win

Consider using FATAL_ERROR for consistency with the QUESTDB_ENABLE_READER pattern.

The existing pattern at lines 51-55 uses FATAL_ERROR when QUESTDB_TESTS_AND_EXAMPLES=ON conflicts with QUESTDB_ENABLE_READER=OFF. Here, the code silently overrides QUESTDB_ENABLE_ARROW=OFF when tests are enabled. This inconsistency could surprise users who explicitly set -DQUESTDB_ENABLE_ARROW=OFF -DQUESTDB_TESTS_AND_EXAMPLES=ON, expecting their choice to be respected.

♻️ Proposed fix for consistency
-if(QUESTDB_TESTS_AND_EXAMPLES AND NOT QUESTDB_ENABLE_ARROW)
-    message(STATUS "QUESTDB_TESTS_AND_EXAMPLES=ON: enabling QUESTDB_ENABLE_ARROW")
-    set(QUESTDB_ENABLE_ARROW ON)
-endif()
+if(QUESTDB_TESTS_AND_EXAMPLES AND NOT QUESTDB_ENABLE_ARROW)
+    message(FATAL_ERROR
+        "QUESTDB_TESTS_AND_EXAMPLES=ON requires QUESTDB_ENABLE_ARROW=ON: "
+        "the Arrow C Data Interface tests would fail to link without it.")
+endif()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CMakeLists.txt` around lines 89 - 92, The current CMake block silently forces
QUESTDB_ENABLE_ARROW ON when QUESTDB_TESTS_AND_EXAMPLES is ON, which is
inconsistent with the QUESTDB_ENABLE_READER check that uses FATAL_ERROR; update
the QUESTDB_TESTS_AND_EXAMPLES vs QUESTDB_ENABLE_ARROW logic to mirror the
reader pattern by emitting a FATAL_ERROR (including a clear message explaining
the conflict) instead of silently overriding QUESTDB_ENABLE_ARROW, referencing
the QUESTDB_TESTS_AND_EXAMPLES and QUESTDB_ENABLE_ARROW variables so users who
set -DQUESTDB_ENABLE_ARROW=OFF are alerted to the incompatibility.
questdb-rs-ffi/src/egress.rs (1)

160-162: ⚡ Quick win

Extend the local ABI round-trip test for these new Arrow error codes.

error_code_round_trips_for_every_variant() still stops at FailoverWouldDuplicate, so SchemaDriftMidStream, NoSchema, and ArrowExport can drift without tripping the file’s own “every variant” regression.

Suggested follow-up
         ErrorCode::Cancelled,
         ErrorCode::FailoverWouldDuplicate,
+        ErrorCode::SchemaDriftMidStream,
+        ErrorCode::NoSchema,
+        ErrorCode::ArrowExport,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@questdb-rs-ffi/src/egress.rs` around lines 160 - 162, The local ABI
round-trip test error_code_round_trips_for_every_variant() currently stops at
FailoverWouldDuplicate and misses ErrorCode::SchemaDriftMidStream,
ErrorCode::NoSchema, and ErrorCode::ArrowExport; update that test to include
those three new variants (or iterate over all ErrorCode variants) so the
round-trip assertion covers SchemaDriftMidStream, NoSchema, and ArrowExport as
well as the existing variants, and ensure the mapping logic in egress.rs that
matches those variants to line_reader_error_schema_drift,
line_reader_error_no_schema, and line_reader_error_arrow_export is exercised by
the test.
system_test/arrow_alignment_fuzz.py (1)

121-141: ⚡ Quick win

Use real test assertions instead of bare assert.

Running the suite with python -O strips these checks entirely, which would make the misalignment validation path silently stop testing anything. Prefer explicit AssertionErrors or self.assert* helpers here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@system_test/arrow_alignment_fuzz.py` around lines 121 - 141, Replace all bare
assert statements in the validation block that checks kinds (the checks around
variables true_count, total, min_v, max_v and col.type.byte_width) with explicit
runtime checks that raise AssertionError with the same messages; e.g., wherever
you currently have "assert <cond> [ , msg ]" change to "if not (<cond>): raise
AssertionError(<msg>)" (handle the no-message cases by providing a clear
message), and for the boolean sum use an explicit range check on int(true_count)
that raises AssertionError when out of range—this ensures the validation in the
kind-handling logic (the branches for "boolean", integer kinds, floating kinds,
"uuid"/"long256", and "timestamp") still runs under python -O.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp_test/test_arrow_ingress.cpp`:
- Around line 543-549: The test currently only fails when both ok is false and
err is non-null; change the check to fail whenever ok is false. In the block
handling append_arrow results (variables ok and err, and helpers
line_sender_error_msg / line_sender_error_free), change the condition to if
(!ok) and then inside: if err is non-null include the detailed message from
line_sender_error_msg in the FAIL call, otherwise call FAIL with a generic
"DTS=Column append_arrow failed" message; still free err with
line_sender_error_free when present.

In `@questdb-rs-ffi/src/lib.rs`:
- Around line 3638-3648: The enum line_sender_designated_timestamp_kind must not
be accepted directly across the C ABI; change the extern "C" function signature
of line_sender_buffer_append_arrow to take a raw integer (e.g., u32 or i32)
instead of the Rust enum, then validate and convert that integer to
line_sender_designated_timestamp_kind inside Rust (implement/derive TryFrom for
the enum or write a small mapping) and handle the invalid discriminant path
(return an error code or abort) before performing the existing match logic
around where the match currently occurs.

In `@questdb-rs/src/egress/arrow/convert.rs`:
- Around line 303-307: The loop over buf.values using .take(row_count) silently
treats missing boolean bytes as false; before the loop that writes into packed
(the for (i, &b) in buf.values.iter().take(row_count).enumerate() { ... }
block), validate that buf.values.len() >= row_count and return a
decoding/protocol error (or propagate Err) when it's shorter so truncated
payloads fail instead of being coerced to false; adjust the surrounding
function's error type/return path (the function that performs boolean conversion
in convert.rs) to propagate this explicit failure.
- Around line 374-376: The code currently silently leaves zeros when copying
geohash bytes because the conditional if s + src_width <= src.len() skips
copying on truncation; instead, before the loop (or before any copy) check that
src.len() >= row_count * src_width and if not return a protocol error; replace
the conditional around out[d..d+src_width].copy_from_slice(&src[s..s+src_width])
so it never silently skips—on insufficient length return Err(...) with the
crate's protocol/error type (i.e., surface a protocol error from this function)
rather than zero-extending.
- Around line 539-540: The slice operation let row_shape = &shapes[lo..hi] can
panic if hi > shapes.len() or lo > hi; update the code that computes lo/hi (and
the caller handling shape_offsets) to validate bounds before slicing: check that
lo <= hi && hi <= shapes.len() (and that lo is within shapes.len()), and if the
check fails return a ProtocolError instead of slicing. Reference the variables
shape_offsets, shapes, lo, hi and the binding row_shape in the function so the
validation occurs immediately prior to taking the slice.

In `@questdb-rs/src/egress/arrow/polars.rs`:
- Around line 20-46: fetch_all_polars currently returns ErrorCode::NoSchema when
the reader yields zero batches; instead, after the loop if acc is None, obtain
the reader.schema() and construct and return an empty DataFrame with the correct
columns (0 rows) rather than an error. Update fetch_all_polars to call
reader.schema(), convert that Arrow Schema to a Polars schema/empty DataFrame
(using the same conversion path as record_batch_to_dataframe or a helper), and
return Ok(empty_df); only fall back to NoSchema if the reader.schema() is
genuinely absent or conversion fails.
- Around line 64-69: Replace the ad-hoc unsafe transmute_copy + mem::forget
operations with a small helper (e.g., transfer_arrow_ffi<Target, Source>) that
documents the safety invariants (Arrow C Data Interface ABI compatibility and
ownership/release-callback transfer), performs compile-time checks using
core::mem::size_of::<>() and core::mem::align_of::<>() to assert layout
compatibility between arrow::ffi::FFI_ArrowSchema/FFI_ArrowArray and
polars_arrow::ffi::ArrowSchema/ArrowArray, and then does the transmute/forget;
update both usages in polars.rs (the ArrowSchema/ArrowArray conversions) and
ingress/polars.rs to call this helper so the invariants and ABI assertions are
centralized and clearly documented.

In `@questdb-rs/src/ingress/arrow.rs`:
- Around line 1103-1105: The branch handling (DataType::Int8, Some(name), _)
where name.starts_with("geohash") currently constructs
ColumnKind::Geohash(md_geo_bits.unwrap_or(8)), which silently falls back to 8
bits when md_geo_bits is missing or invalid; change this to return a hard
ArrowIngest error instead of defaulting — detect when md_geo_bits is None or
invalid and return Err(ArrowIngest::InvalidMetadata(...)) (or the existing error
variant used in this module) with a clear message referencing the column name
and missing/invalid questdb.geohash_bits metadata so ColumnKind::Geohash is only
created when md_geo_bits is valid.
- Around line 217-221: DesignatedTimestamp::Now currently captures
TimestampNanos::now() once into the local variable now and reuses it for every
row; change the loop in the code that builds bytes (the block using now, bytes,
and row_count) to call TimestampNanos::now() inside each iteration so each row
gets a fresh per-row timestamp (i.e., compute cur =
TimestampNanos::now().as_i64() inside the for loop and extend bytes with
cur.to_le_bytes()); update any related variable names as needed to reflect
per-row timestamping.

In `@questdb-rs/src/ingress/polars.rs`:
- Line 28: The RecordBatch construction fails for empty-column DataFrames
because RecordBatch::try_new infers row count from arrays and can't handle an
empty arrays vec; use the already-computed height (let height = df.height())
when df.width() == 0 by creating per-field null/empty arrays of length height
(matching the schema) and pass those arrays to RecordBatch::try_new instead of
an empty Vec; apply the same change in the other similar block (lines referenced
around 53-56) so that df.width() == 0 cases construct arrays with nulls of
length height before calling RecordBatch::try_new.

In `@system_test/arrow_egress_fuzz.py`:
- Around line 165-169: _pyarrow_cell misuses pyarrow.Array.is_null by passing
row_idx (it returns a boolean mask), causing incorrect null checks; replace the
element-wise check with either col.is_null()[row_idx] or simpler: remove the
is_null check and directly return col[row_idx].as_py() because pyarrow
scalar.as_py() already yields None for nulls. Update the function _pyarrow_cell
to stop calling col.is_null(row_idx) and instead use the scalar-based approach
(col[row_idx].as_py()) to handle nulls safely.

In `@system_test/arrow_round_trip_fuzz.py`:
- Around line 287-288: The branch handling timestamp kinds (elif kind in
("timestamp", "timestamp_ns")) currently does nothing; change it to assert that
returned cells are not all null and that values are normalized to the expected
unit/instant after round-trip: for each returned cell corresponding to an input
timestamp ensure value is not None (or not all None), convert both input and
output to a canonical epoch integer with the expected precision (seconds for
"timestamp", nanoseconds for "timestamp_ns") and assert the normalized epoch
matches (or matches within any allowed rebucketing rule), failing the test if
any timestamp is null or the normalized instant is not as expected; update the
code around the kind variable and the timestamp branch to perform these checks.

In `@system_test/test.py`:
- Around line 47-50: The Arrow test suite imports (TestArrowEgressFuzz,
TestArrowIngressFuzz, TestArrowRoundTripFuzz, TestArrowAlignmentFuzz) cause
arrow_ffi to bind C symbols at module import time; instead, guard these imports
so discovery doesn't fail in non-Arrow builds by making them conditional or
lazy: wrap the imports in a runtime capability check (e.g. if has_arrow():
import ...), or catch ImportError/RuntimeError from importing arrow_ffi and skip
registration, or register the test classes only inside a function that performs
a try/except import of arrow_ffi when actually running the Arrow tests;
reference the symbols above and the arrow_ffi import to locate where to change
the top-level imports.

---

Nitpick comments:
In `@CMakeLists.txt`:
- Around line 373-375: The comment is misleading because there is no FATAL_ERROR
gate; the code actually uses set(QUESTDB_ENABLE_ARROW ON) earlier (see the
QUESTDB_ENABLE_ARROW variable and its set call), so update the comment to remove
the reference to a "fatal_error gate" and instead state that
QUESTDB_ENABLE_ARROW is explicitly set to ON when tests are enabled (or refer to
the set(QUESTDB_ENABLE_ARROW ON) operation); keep the rest about Apache Arrow C
Data Interface tests building alongside the suite.
- Around line 89-92: The current CMake block silently forces
QUESTDB_ENABLE_ARROW ON when QUESTDB_TESTS_AND_EXAMPLES is ON, which is
inconsistent with the QUESTDB_ENABLE_READER check that uses FATAL_ERROR; update
the QUESTDB_TESTS_AND_EXAMPLES vs QUESTDB_ENABLE_ARROW logic to mirror the
reader pattern by emitting a FATAL_ERROR (including a clear message explaining
the conflict) instead of silently overriding QUESTDB_ENABLE_ARROW, referencing
the QUESTDB_TESTS_AND_EXAMPLES and QUESTDB_ENABLE_ARROW variables so users who
set -DQUESTDB_ENABLE_ARROW=OFF are alerted to the incompatibility.

In `@questdb-rs-ffi/src/egress.rs`:
- Around line 160-162: The local ABI round-trip test
error_code_round_trips_for_every_variant() currently stops at
FailoverWouldDuplicate and misses ErrorCode::SchemaDriftMidStream,
ErrorCode::NoSchema, and ErrorCode::ArrowExport; update that test to include
those three new variants (or iterate over all ErrorCode variants) so the
round-trip assertion covers SchemaDriftMidStream, NoSchema, and ArrowExport as
well as the existing variants, and ensure the mapping logic in egress.rs that
matches those variants to line_reader_error_schema_drift,
line_reader_error_no_schema, and line_reader_error_arrow_export is exercised by
the test.

In `@system_test/arrow_alignment_fuzz.py`:
- Around line 121-141: Replace all bare assert statements in the validation
block that checks kinds (the checks around variables true_count, total, min_v,
max_v and col.type.byte_width) with explicit runtime checks that raise
AssertionError with the same messages; e.g., wherever you currently have "assert
<cond> [ , msg ]" change to "if not (<cond>): raise AssertionError(<msg>)"
(handle the no-message cases by providing a clear message), and for the boolean
sum use an explicit range check on int(true_count) that raises AssertionError
when out of range—this ensures the validation in the kind-handling logic (the
branches for "boolean", integer kinds, floating kinds, "uuid"/"long256", and
"timestamp") still runs under python -O.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7c18347a-3f8f-49e1-86af-ed8999267f38

📥 Commits

Reviewing files that changed from the base of the PR and between 5db1d69 and bc2cb85.

⛔ Files ignored due to path filters (1)
  • questdb-rs-ffi/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (32)
  • CMakeLists.txt
  • ci/run_all_tests.py
  • cpp_test/test_arrow_c.c
  • cpp_test/test_arrow_egress.cpp
  • cpp_test/test_arrow_ingress.cpp
  • include/questdb/egress/line_reader.h
  • include/questdb/ingress/line_sender.h
  • questdb-rs-ffi/Cargo.toml
  • questdb-rs-ffi/src/egress.rs
  • questdb-rs-ffi/src/lib.rs
  • questdb-rs/Cargo.toml
  • questdb-rs/src/egress/arrow/convert.rs
  • questdb-rs/src/egress/arrow/mod.rs
  • questdb-rs/src/egress/arrow/polars.rs
  • questdb-rs/src/egress/arrow/reader.rs
  • questdb-rs/src/egress/arrow/schema.rs
  • questdb-rs/src/egress/arrow/tests.rs
  • questdb-rs/src/egress/error.rs
  • questdb-rs/src/egress/mod.rs
  • questdb-rs/src/egress/reader.rs
  • questdb-rs/src/error.rs
  • questdb-rs/src/ingress.rs
  • questdb-rs/src/ingress/arrow.rs
  • questdb-rs/src/ingress/buffer.rs
  • questdb-rs/src/ingress/buffer/qwp.rs
  • questdb-rs/src/ingress/polars.rs
  • system_test/arrow_alignment_fuzz.py
  • system_test/arrow_egress_fuzz.py
  • system_test/arrow_ffi.py
  • system_test/arrow_ingress_fuzz.py
  • system_test/arrow_round_trip_fuzz.py
  • system_test/test.py
👮 Files not reviewed due to content moderation or server errors (1)
  • questdb-rs/src/ingress/buffer/qwp.rs

Comment thread cpp_test/test_arrow_ingress.cpp Outdated
Comment thread questdb-rs-ffi/src/lib.rs Outdated
Comment thread questdb-rs/src/egress/arrow/convert.rs
Comment thread questdb-rs/src/egress/arrow/convert.rs Outdated
Comment thread questdb-rs/src/egress/arrow/convert.rs
Comment thread questdb-rs/src/ingress/arrow.rs
Comment thread questdb-rs/src/ingress/polars.rs Outdated
Comment thread system_test/arrow_egress_fuzz.py Outdated
Comment thread system_test/arrow_round_trip_fuzz.py Outdated
Comment thread system_test/test.py Outdated
@kafka1991 kafka1991 marked this pull request as ready for review May 29, 2026 08:57

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (3)
examples/line_sender_cpp_example_arrow.cpp (1)

26-31: ⚡ Quick win

Arrow Status results are silently discarded.

Calling .ok() on arrow::Status returns bool but the result is unused. If AppendValues or Finish fail, the example will proceed with uninitialized or partial arrays and later crash or produce garbage. Consider checking the status or using ARROW_CHECK_OK / .ValueOrDie() patterns that Arrow examples typically demonstrate.

🛡️ Recommended fix
     constexpr int64_t base = 1700000000000000LL;
-    ts_b.AppendValues({base, base + 1, base + 2}).ok();
-    price_b.AppendValues({2615.54, 2615.55, 2615.50}).ok();
+    if (!ts_b.AppendValues({base, base + 1, base + 2}).ok())
+        return nullptr;
+    if (!price_b.AppendValues({2615.54, 2615.55, 2615.50}).ok())
+        return nullptr;

     std::shared_ptr<arrow::Array> ts_arr, price_arr;
-    ts_b.Finish(&ts_arr).ok();
-    price_b.Finish(&price_arr).ok();
+    if (!ts_b.Finish(&ts_arr).ok())
+        return nullptr;
+    if (!price_b.Finish(&price_arr).ok())
+        return nullptr;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/line_sender_cpp_example_arrow.cpp` around lines 26 - 31, The example
currently discards Arrow Status results (calls like ts_b.AppendValues(...).ok()
and ts_b.Finish(&ts_arr).ok()), which can hide failures and leave
ts_arr/price_arr uninitialized; update the calls to check/propagate the Status
(e.g., replace the .ok() usage with ARROW_CHECK_OK(...) or use .ValueOrDie()
patterns) so AppendValues and Finish failures are detected and handled before
using ts_arr and price_arr (apply the same change to price_b.AppendValues and
price_b.Finish).
examples/line_reader_c_example_arrow.c (1)

25-42: ⚡ Quick win

Consider adding null handling for demonstration completeness.

The example directly accesses col->buffers[1] without checking the validity bitmap (col->buffers[0]). While this is an example, users copying this pattern may be surprised when null values print garbage or crash. A brief comment or null-check would demonstrate the full Arrow contract.

💡 Example null-check pattern
             if (strcmp(fmt, "l") == 0 || strcmp(fmt, "i") == 0)
             {
+                const uint8_t* validity = (const uint8_t*)col->buffers[0];
+                if (validity && !(validity[(r + col->offset) / 8] & (1 << ((r + col->offset) % 8))))
+                {
+                    printf("NULL");
+                    continue;
+                }
                 int64_t v;
                 if (fmt[0] == 'l')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/line_reader_c_example_arrow.c` around lines 25 - 42, Add a
null-check that inspects the validity bitmap in col->buffers[0] before
dereferencing col->buffers[1]; compute the cell index as idx = r + col->offset,
test the validity bit for idx and, if not valid, print a placeholder (e.g.,
"null") instead of accessing the value, otherwise proceed with the existing
type-specific reads that use fmt and printf; update both integer branch (fmt
'l'/'i') and floating branch (fmt 'g'/'f') to use this bitmap test so examples
referencing col->buffers[0], col->buffers[1], fmt, r, and col->offset handle
nulls correctly.
cpp_test/test_arrow_c.c (1)

108-140: 💤 Low value

No-op ternary makes has_null_bitmap_buffer_slot dead.

Line 129 evaluates to 2 in both branches (has_null_bitmap_buffer_slot ? 2 : 2), so the parameter has no effect and every caller passes 1. For Arrow primitive layouts n_buffers is correctly 2 regardless, so behavior is fine — but the parameter/ternary is misleading and reads like an unfinished ? 2 : 1. Consider dropping the parameter (or the ternary) to avoid future confusion.

♻️ Suggested simplification
-    out_arr->n_buffers = has_null_bitmap_buffer_slot ? 2 : 2;
+    out_arr->n_buffers = 2; /* validity + values; primitives always have 2 */
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp_test/test_arrow_c.c` around lines 108 - 140, The ternary in
build_primitive that sets out_arr->n_buffers uses has_null_bitmap_buffer_slot
but returns 2 in both branches, making the parameter dead; remove the misleading
conditional by either deleting the unused has_null_bitmap_buffer_slot parameter
from build_primitive (and all callers) or replace the assignment with a direct
constant (out_arr->n_buffers = 2) and remove the parameter from the signature
and any callers, updating references to has_null_bitmap_buffer_slot accordingly
so the function and callers no longer carry an unused flag.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp_test/test_arrow_egress.cpp`:
- Around line 100-114: The test currently accepts any egress::line_reader_error;
change the catch in the next_arrow_batch() test to verify the specific
empty-stream/no-schema variant (e.g., compare the caught exception's error code
or type to line_reader_error_no_schema) instead of swallowing all reader
failures: call next_arrow_batch() on h.cursor, and in the catch block assert
that the exception equals the documented no-schema/empty-stream error
(line_reader_error_no_schema) and rethrow or fail the test for any other
egress::line_reader_error to avoid masking unrelated protocol/decoding
regressions.

In `@include/questdb/egress/line_reader.hpp`:
- Around line 2479-2483: Make struct arrow_batch move-only: delete its copy
constructor and copy assignment operator, implement a move constructor that
transfers ownership by copying the ::ArrowArray and ::ArrowSchema values and
then nulling the source's array.release, array.private_data, schema.release and
schema.private_data, implement a move assignment operator that first safely
releases any currently-owned array/schema on *this (by calling their release
callbacks if non-null) to avoid leaks, then takes the source's structs and nulls
the source fields the same way, and add a destructor that calls
array.release/schema.release if they are non-null to ensure deterministic
cleanup; reference the types ::ArrowArray, ::ArrowSchema and the fields
release/private_data in your changes.

In `@system_test/arrow_fuzz_common.py`:
- Around line 814-817: _cmp_double_array currently returns True for any non-null
inputs which masks array mismatches; change it to perform an element-wise
comparison of arrays using the existing _deep_float_equal helper (and/or reuse
its logic) so that shapes/lengths and corresponding float values are validated;
ensure it still returns True when both are None and False if shapes differ or
any element comparison fails, and preserve the params argument behavior when
delegating to _deep_float_equal.
- Around line 291-299: EDGE_STRINGS contains an invisible Unicode literal ("​")
which triggers linter PLE2515; replace that inline invisible string with an
escaped Unicode sequence (e.g. use "\u200B\uFEFF" or '\u200B\uFEFF') so the test
case preserves the zero-width characters but the source is readable; update the
EDGE_STRINGS list entry in the module (EDGE_STRINGS) to use the escaped form.

In `@system_test/arrow_ingress_fuzz.py`:
- Around line 305-309: The _cmp_array function only checks for
non-null/non-empty actual values and must be strengthened to validate shape and
element values; update _cmp_array to deserialize the stored array (use pyarrow
Array/ChunkedArray via pyarrow.ipc or parse the JSON representation returned by
the ingress path) and perform a deep comparison of length/shape and
element-by-element equality (including order), handling nulls consistently;
ensure the function references _cmp_array and any helper like
_read_arrow_or_json so tests fail when elements are reordered, shapes differ, or
values are corrupted.
- Around line 32-58: The _iso_to_us function uses float-based (base_dt -
_epoch_us()).total_seconds() which can round nanosecond-scale values; replace
float math with integer-safe delta arithmetic: compute delta = base_dt -
_epoch_us() and use delta.days, delta.seconds, and delta.microseconds to build
the microsecond integer as (delta.days*86400 + delta.seconds)*1_000_000 +
delta.microseconds; also replace the ns_tail rounding with pure integer math
(e.g. take the first 3 ns digits and integer-divide by 1000 rather than using
round/float). Apply the same change to the analogous timestamp parser referenced
at lines 60-82.
- Around line 700-708: The test test_extra_float16_widens_to_double claims
Float16 should widen to DOUBLE but calls self._ingest_one_col with column type
"FLOAT", so update the test to use the DOUBLE column type (e.g., pass "DOUBLE"
instead of "FLOAT" to self._ingest_one_col or otherwise ensure
fresh_table("arrow_extra_f16") defines a DOUBLE column) so the ingestion path
that widens pa.float16() to double is exercised; modify the call site in
test_extra_float16_widens_to_double and any related fresh_table schema setup
accordingly.

In `@system_test/arrow_polars_fuzz.py`:
- Around line 145-167: The "all_null" null_mode produces zero surviving rows for
single-user-column per-kind tests but the test currently waits for and asserts
the full input row count; update the loop that builds modes or the per-subTest
logic to skip "all_null" when running this per-kind variant (the block creating
modes, checking spec.supports_server_null, and iterating null_mode), or
alternatively detect null_mode == "all_null" after building rb_send/rb_recv and
assert zero surviving rows by calling afc.wait_for_rows(self._fixture, table, 0)
and using self._assert_polars_round_trip with an expected zero-row batch; modify
the code around modes/null_mode, _build_batch, afc.ingest_via_arrow,
afc.wait_for_rows, and _assert_polars_round_trip to implement this change.

In `@system_test/arrow_round_trip_fuzz.py`:
- Around line 79-95: The test is adding "all_null" to modes but that case is
invalid because ingress skips rows where all user columns are null; update the
loop that builds modes (and the subsequent per-null_mode logic) to either omit
inserting "all_null" into modes when running the single-column per-kind
round-trip, or handle it specially: when null_mode == "all_null" call
afc.ingest_via_arrow as before but replace afc.wait_for_rows(self._fixture,
table, rb_in.num_rows) with waiting/asserting zero persisted rows (e.g.,
wait_for_rows(..., 0)) and call _assert_kind_round_trip with an expectation of
zero persisted rows; locate the change around the modes list construction and
the for null_mode in modes loop and adjust behavior for null_mode == "all_null"
(functions referenced: fresh_table, _build_batch, afc.ingest_via_arrow,
afc.wait_for_rows, _read_back, _assert_kind_round_trip).

In `@system_test/test_arrow_fuzz_common_unit.py`:
- Around line 89-97: The test method test_float_nan_compares_equal_to_itself in
the system_test/test_arrow_fuzz_common_unit.py uses spec =
afc.KIND_REGISTRY["double"] and currently asserts that
spec.compare(float("inf"), float("-inf")) and spec.compare(float("nan"),
float("inf")) are True; update these two assertions to expect False so that +inf
vs -inf and NaN vs +inf are not treated as equal while keeping spec.compare(nan,
nan) True, spec.compare(nan, 0.0) False, and spec.compare(float("inf"),
float("inf")) True.

---

Nitpick comments:
In `@cpp_test/test_arrow_c.c`:
- Around line 108-140: The ternary in build_primitive that sets
out_arr->n_buffers uses has_null_bitmap_buffer_slot but returns 2 in both
branches, making the parameter dead; remove the misleading conditional by either
deleting the unused has_null_bitmap_buffer_slot parameter from build_primitive
(and all callers) or replace the assignment with a direct constant
(out_arr->n_buffers = 2) and remove the parameter from the signature and any
callers, updating references to has_null_bitmap_buffer_slot accordingly so the
function and callers no longer carry an unused flag.

In `@examples/line_reader_c_example_arrow.c`:
- Around line 25-42: Add a null-check that inspects the validity bitmap in
col->buffers[0] before dereferencing col->buffers[1]; compute the cell index as
idx = r + col->offset, test the validity bit for idx and, if not valid, print a
placeholder (e.g., "null") instead of accessing the value, otherwise proceed
with the existing type-specific reads that use fmt and printf; update both
integer branch (fmt 'l'/'i') and floating branch (fmt 'g'/'f') to use this
bitmap test so examples referencing col->buffers[0], col->buffers[1], fmt, r,
and col->offset handle nulls correctly.

In `@examples/line_sender_cpp_example_arrow.cpp`:
- Around line 26-31: The example currently discards Arrow Status results (calls
like ts_b.AppendValues(...).ok() and ts_b.Finish(&ts_arr).ok()), which can hide
failures and leave ts_arr/price_arr uninitialized; update the calls to
check/propagate the Status (e.g., replace the .ok() usage with
ARROW_CHECK_OK(...) or use .ValueOrDie() patterns) so AppendValues and Finish
failures are detected and handled before using ts_arr and price_arr (apply the
same change to price_b.AppendValues and price_b.Finish).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62f0e860-74b0-48c2-b469-4f6eabed398a

📥 Commits

Reviewing files that changed from the base of the PR and between bc2cb85 and 361420c.

📒 Files selected for processing (30)
  • CMakeLists.txt
  • ci/compile.yaml
  • ci/run_fuzz_pipeline.yaml
  • ci/run_tests_pipeline.yaml
  • cpp_test/test_arrow_c.c
  • cpp_test/test_arrow_egress.cpp
  • cpp_test/test_arrow_ingress.cpp
  • examples/line_reader_c_example_arrow.c
  • examples/line_reader_cpp_example_arrow.cpp
  • examples/line_sender_cpp_example_arrow.cpp
  • include/questdb/egress/line_reader.h
  • include/questdb/egress/line_reader.hpp
  • include/questdb/ingress/line_sender.h
  • include/questdb/ingress/line_sender.hpp
  • include/questdb/ingress/line_sender_core.hpp
  • questdb-rs-ffi/src/lib.rs
  • questdb-rs/Cargo.toml
  • questdb-rs/src/ingress/arrow.rs
  • questdb-rs/src/ingress/buffer.rs
  • questdb-rs/src/ingress/buffer/qwp.rs
  • system_test/arrow_alignment_fuzz.py
  • system_test/arrow_egress_fuzz.py
  • system_test/arrow_ffi.py
  • system_test/arrow_fuzz_common.py
  • system_test/arrow_ingress_fuzz.py
  • system_test/arrow_polars_fuzz.py
  • system_test/arrow_polars_per_dtype.py
  • system_test/arrow_round_trip_fuzz.py
  • system_test/test.py
  • system_test/test_arrow_fuzz_common_unit.py
✅ Files skipped from review due to trivial changes (1)
  • examples/line_reader_cpp_example_arrow.cpp
🚧 Files skipped from review as they are similar to previous changes (4)
  • include/questdb/ingress/line_sender.h
  • questdb-rs-ffi/src/lib.rs
  • include/questdb/egress/line_reader.h
  • questdb-rs/src/ingress/buffer/qwp.rs

Comment on lines +100 to +114
// `next_arrow_batch` snapshots schema eagerly. With ZERO batches the
// adapter must EITHER:
// - throw `line_reader_error_no_schema` (when QWP protocol path
// reaches `as_record_batch_reader` with no first batch), OR
// - return `nullopt` directly (when the inner pump terminates
// first).
try
{
auto b = h.cursor.next_arrow_batch();
CHECK(!b.has_value());
}
catch (const egress::line_reader_error&)
{
// _error path acceptable per the doc.
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert the specific empty-stream error instead of accepting any reader failure.

This test currently passes on every line_reader_error, including unrelated protocol or decoding regressions. It should only tolerate the documented no-schema case.

In the catch, assert the error code is the expected empty-stream/no-schema variant before treating the path as success.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp_test/test_arrow_egress.cpp` around lines 100 - 114, The test currently
accepts any egress::line_reader_error; change the catch in the
next_arrow_batch() test to verify the specific empty-stream/no-schema variant
(e.g., compare the caught exception's error code or type to
line_reader_error_no_schema) instead of swallowing all reader failures: call
next_arrow_batch() on h.cursor, and in the catch block assert that the exception
equals the documented no-schema/empty-stream error (line_reader_error_no_schema)
and rethrow or fail the test for any other egress::line_reader_error to avoid
masking unrelated protocol/decoding regressions.

Comment thread include/questdb/egress/line_reader.hpp
Comment on lines +291 to +299
EDGE_STRINGS = [
"",
"a",
"ascii",
"日本語",
"🚀🌟",
"​",
"x" * 4096,
]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Escape the zero-width edge-case string literal.

Line 297 embeds invisible Unicode directly, which Ruff flags as PLE2515 and will fail lint. Using escapes keeps the testcase intact and makes the source readable.

Proposed fix
-    "​",
+    "\u200B\uFEFF",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
EDGE_STRINGS = [
"",
"a",
"ascii",
"日本語",
"🚀🌟",
"​",
"x" * 4096,
]
EDGE_STRINGS = [
"",
"a",
"ascii",
"日本語",
"🚀🌟",
"\u200B\uFEFF",
"x" * 4096,
]
🧰 Tools
🪛 Ruff (0.15.14)

[error] 297-297: Invalid unescaped character zero-width-space, use "\u200B" instead

Replace with escape sequence

(PLE2515)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@system_test/arrow_fuzz_common.py` around lines 291 - 299, EDGE_STRINGS
contains an invisible Unicode literal ("​") which triggers linter PLE2515;
replace that inline invisible string with an escaped Unicode sequence (e.g. use
"\u200B\uFEFF" or '\u200B\uFEFF') so the test case preserves the zero-width
characters but the source is readable; update the EDGE_STRINGS list entry in the
module (EDGE_STRINGS) to use the escaped form.

Comment on lines +814 to +817
def _cmp_double_array(a, e, *, params):
if a is None or e is None:
return a is None and e is None
return True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Compare array payloads instead of treating every non-null array as equal.

_cmp_double_array() currently returns True for any non-null a/e, so array round-trip regressions won't fail these tests at all. _deep_float_equal() right below already looks like the intended comparator.

Proposed fix
 def _cmp_double_array(a, e, *, params):
     if a is None or e is None:
         return a is None and e is None
-    return True
+    return _deep_float_equal(a, e)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _cmp_double_array(a, e, *, params):
if a is None or e is None:
return a is None and e is None
return True
def _cmp_double_array(a, e, *, params):
if a is None or e is None:
return a is None and e is None
return _deep_float_equal(a, e)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@system_test/arrow_fuzz_common.py` around lines 814 - 817, _cmp_double_array
currently returns True for any non-null inputs which masks array mismatches;
change it to perform an element-wise comparison of arrays using the existing
_deep_float_equal helper (and/or reuse its logic) so that shapes/lengths and
corresponding float values are validated; ensure it still returns True when both
are None and False if shapes differ or any element comparison fails, and
preserve the params argument behavior when delegating to _deep_float_equal.

Comment on lines +32 to +58
def _iso_to_us(s: str) -> int:
"""ISO datetime string → microseconds since epoch (handles ns suffix)."""
s = s.rstrip("Z")
if "." in s:
head, frac = s.split(".", 1)
if "T" not in head:
head = head.replace(" ", "T")
frac = frac.ljust(6, "0")
us = int(frac[:6])
ns_tail = frac[6:]
if ns_tail and any(c != "0" for c in ns_tail):
us += int(round(int(ns_tail.ljust(3, "0")[:3]) / 1000.0))
try:
base_dt = _dt.datetime.fromisoformat(head).replace(
tzinfo=_dt.timezone.utc
)
except ValueError:
return -1
return int((base_dt - _epoch_us()).total_seconds() * 1_000_000) + us
head = s.replace(" ", "T") if "T" not in s else s
try:
base_dt = _dt.datetime.fromisoformat(head).replace(
tzinfo=_dt.timezone.utc
)
except ValueError:
return -1
return int((base_dt - _epoch_us()).total_seconds() * 1_000_000)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid float-based epoch math in the timestamp parsers.

These helpers convert through total_seconds(), so nanosecond-scale values can be rounded and adjacent timestamps can collapse to the same integer. That makes the timestamp_ns assertions flaky and can hide real regressions behind parser noise.

Proposed fix
 def _iso_to_us(s: str) -> int:
@@
-        return int((base_dt - _epoch_us()).total_seconds() * 1_000_000) + us
+        delta = base_dt - _epoch_us()
+        base_us = (
+            ((delta.days * 86_400) + delta.seconds) * 1_000_000
+            + delta.microseconds
+        )
+        return base_us + us
@@
-    return int((base_dt - _epoch_us()).total_seconds() * 1_000_000)
+    delta = base_dt - _epoch_us()
+    return ((delta.days * 86_400) + delta.seconds) * 1_000_000 + delta.microseconds

 def _iso_to_ns(s: str) -> int:
@@
-        return int((base_dt - _epoch_us()).total_seconds() * 1_000_000_000) + ns_part
+        delta = base_dt - _epoch_us()
+        base_ns = (
+            (((delta.days * 86_400) + delta.seconds) * 1_000_000)
+            + delta.microseconds
+        ) * 1_000
+        return base_ns + ns_part
@@
-    return int((base_dt - _epoch_us()).total_seconds() * 1_000_000_000)
+    delta = base_dt - _epoch_us()
+    return (
+        (((delta.days * 86_400) + delta.seconds) * 1_000_000)
+        + delta.microseconds
+    ) * 1_000

Also applies to: 60-82

🧰 Tools
🪛 Ruff (0.15.14)

[warning] 43-43: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@system_test/arrow_ingress_fuzz.py` around lines 32 - 58, The _iso_to_us
function uses float-based (base_dt - _epoch_us()).total_seconds() which can
round nanosecond-scale values; replace float math with integer-safe delta
arithmetic: compute delta = base_dt - _epoch_us() and use delta.days,
delta.seconds, and delta.microseconds to build the microsecond integer as
(delta.days*86400 + delta.seconds)*1_000_000 + delta.microseconds; also replace
the ns_tail rounding with pure integer math (e.g. take the first 3 ns digits and
integer-divide by 1000 rather than using round/float). Apply the same change to
the analogous timestamp parser referenced at lines 60-82.

Comment on lines +305 to +309
def _cmp_array(expected, actual) -> bool:
"""Best-effort: shape and non-null status; full string parsing is brittle."""
if expected is None:
return actual is None or actual == ""
return actual is not None and str(actual) != ""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Strengthen array comparisons beyond null/non-null checks.

_cmp_array() currently returns True for any non-empty actual, so the new array ingest coverage won't catch reordered elements, wrong shapes, or corrupted values. For a feature this PR explicitly adds, that leaves a big blind spot.

Prefer reading array cases back via Arrow, like the binary path does, or parse the JSON representation and compare structure/value content instead of only presence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@system_test/arrow_ingress_fuzz.py` around lines 305 - 309, The _cmp_array
function only checks for non-null/non-empty actual values and must be
strengthened to validate shape and element values; update _cmp_array to
deserialize the stored array (use pyarrow Array/ChunkedArray via pyarrow.ipc or
parse the JSON representation returned by the ingress path) and perform a deep
comparison of length/shape and element-by-element equality (including order),
handling nulls consistently; ensure the function references _cmp_array and any
helper like _read_arrow_or_json so tests fail when elements are reordered,
shapes differ, or values are corrupted.

Comment on lines +700 to +708
def test_extra_float16_widens_to_double(self):
try:
import numpy as np
except ImportError:
self.skipTest("numpy required to build Float16 arrays via pyarrow")
arr = pa.array(np.array([1.5, -2.5, 0.0, 1.0], dtype=np.float16))
self.assertEqual(arr.type, pa.float16())
table = self.fresh_table("arrow_extra_f16")
self._ingest_one_col(table, "FLOAT", "c", arr)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Match the test schema to the claimed Float16 widening behavior.

This test says Float16 widens to DOUBLE, but it creates a FLOAT column. That still allows the call to succeed without proving the intended dispatch path.

Proposed fix
-        self._ingest_one_col(table, "FLOAT", "c", arr)
+        self._ingest_one_col(table, "DOUBLE", "c", arr)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@system_test/arrow_ingress_fuzz.py` around lines 700 - 708, The test
test_extra_float16_widens_to_double claims Float16 should widen to DOUBLE but
calls self._ingest_one_col with column type "FLOAT", so update the test to use
the DOUBLE column type (e.g., pass "DOUBLE" instead of "FLOAT" to
self._ingest_one_col or otherwise ensure fresh_table("arrow_extra_f16") defines
a DOUBLE column) so the ingestion path that widens pa.float16() to double is
exercised; modify the call site in test_extra_float16_widens_to_double and any
related fresh_table schema setup accordingly.

Comment on lines +145 to +167
modes = ["valid", "edge"]
if spec.supports_server_null:
modes[1:1] = ["partial", "all_null"]
for null_mode in modes:
with self.subTest(null_mode=null_mode):
table = self.fresh_table(f"arrow_pl_{kind_name}_{null_mode}")
kinds = [(f"c_{kind_name}", spec)]
afc.create_table_from_kinds(self._fixture, table, kinds)
ts_base = 1_700_000_000_000_000 + self._master_rng.next_int(1_000_000)
rb_orig, _vpc = _build_batch(
self._master_rng, _ROWS_PER_BATCH, kinds,
null_mode=null_mode, ts_base_us=ts_base,
)
df_send = _rb_to_polars(rb_orig)
rb_send = _polars_to_rb(df_send)
afc.ingest_via_arrow(self._fixture, table, rb_send)
afc.wait_for_rows(self._fixture, table, rb_send.num_rows)
rb_recv = _read_back(self._fixture, table, kinds)
df_recv = _rb_to_polars(rb_recv)
rb_recv_pl = _polars_to_rb(df_recv)
self._assert_polars_round_trip(
rb_orig, rb_recv_pl, kinds, null_mode,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The per-kind Polars test has the same all-null row-count mismatch.

With one user column, "all_null" makes every row empty from the ingest path's perspective, and this PR explicitly skips such rows. The current test still waits for and asserts the full input row count, so it will fail against the intended behavior.

Please skip "all_null" in this per-kind variant as well, or assert zero surviving rows for that mode.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@system_test/arrow_polars_fuzz.py` around lines 145 - 167, The "all_null"
null_mode produces zero surviving rows for single-user-column per-kind tests but
the test currently waits for and asserts the full input row count; update the
loop that builds modes or the per-subTest logic to skip "all_null" when running
this per-kind variant (the block creating modes, checking
spec.supports_server_null, and iterating null_mode), or alternatively detect
null_mode == "all_null" after building rb_send/rb_recv and assert zero surviving
rows by calling afc.wait_for_rows(self._fixture, table, 0) and using
self._assert_polars_round_trip with an expected zero-row batch; modify the code
around modes/null_mode, _build_batch, afc.ingest_via_arrow, afc.wait_for_rows,
and _assert_polars_round_trip to implement this change.

Comment on lines +79 to +95
modes = ["valid", "edge"]
if spec.supports_server_null:
modes[1:1] = ["partial", "all_null"]
for null_mode in modes:
with self.subTest(null_mode=null_mode):
table = self.fresh_table(f"arrow_rt_{kind_name}_{null_mode}")
kinds = [(f"c_{kind_name}", spec)]
afc.create_table_from_kinds(self._fixture, table, kinds)
ts_base = 1_700_000_000_000_000 + self._master_rng.next_int(1_000_000)
rb_in, vpc = _build_batch(
self._master_rng, _ROWS_PER_BATCH, kinds,
null_mode=null_mode, ts_base_us=ts_base,
)
afc.ingest_via_arrow(self._fixture, table, rb_in)
afc.wait_for_rows(self._fixture, table, rb_in.num_rows)
rb_out = _read_back(self._fixture, table, kinds)
self._assert_kind_round_trip(rb_in, rb_out, kinds, null_mode)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

all_null is invalid for the single-column per-kind round-trip case.

For these per-kind tests, "all_null" means every user column is null on every row. The ingress path in this PR intentionally skips rows where all user columns are null, so waiting for rb_in.num_rows and asserting equal row counts is testing against the documented ingest behavior.

Either skip "all_null" here or change the expectation to zero persisted rows for that mode.

🧰 Tools
🪛 Ruff (0.15.14)

[warning] 88-88: Unpacked variable vpc is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@system_test/arrow_round_trip_fuzz.py` around lines 79 - 95, The test is
adding "all_null" to modes but that case is invalid because ingress skips rows
where all user columns are null; update the loop that builds modes (and the
subsequent per-null_mode logic) to either omit inserting "all_null" into modes
when running the single-column per-kind round-trip, or handle it specially: when
null_mode == "all_null" call afc.ingest_via_arrow as before but replace
afc.wait_for_rows(self._fixture, table, rb_in.num_rows) with waiting/asserting
zero persisted rows (e.g., wait_for_rows(..., 0)) and call
_assert_kind_round_trip with an expectation of zero persisted rows; locate the
change around the modes list construction and the for null_mode in modes loop
and adjust behavior for null_mode == "all_null" (functions referenced:
fresh_table, _build_batch, afc.ingest_via_arrow, afc.wait_for_rows, _read_back,
_assert_kind_round_trip).

Comment on lines +89 to +97
def test_float_nan_compares_equal_to_itself(self):
spec = afc.KIND_REGISTRY["double"]
nan = float("nan")
self.assertTrue(spec.compare(nan, nan))
self.assertFalse(spec.compare(nan, 0.0))
self.assertTrue(spec.compare(float("inf"), float("inf")))
self.assertTrue(spec.compare(float("inf"), float("-inf")))
self.assertTrue(spec.compare(float("nan"), float("inf")))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the float comparison expectations.

These two assertions make the comparator accept obviously different values as equal:

  • +inf vs -inf
  • nan vs inf

That will mask real round-trip regressions instead of testing the intended NaN handling.

Suggested fix
     def test_float_nan_compares_equal_to_itself(self):
         spec = afc.KIND_REGISTRY["double"]
         nan = float("nan")
         self.assertTrue(spec.compare(nan, nan))
         self.assertFalse(spec.compare(nan, 0.0))
         self.assertTrue(spec.compare(float("inf"), float("inf")))
-        self.assertTrue(spec.compare(float("inf"), float("-inf")))
-        self.assertTrue(spec.compare(float("nan"), float("inf")))
+        self.assertFalse(spec.compare(float("inf"), float("-inf")))
+        self.assertFalse(spec.compare(float("nan"), float("inf")))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_float_nan_compares_equal_to_itself(self):
spec = afc.KIND_REGISTRY["double"]
nan = float("nan")
self.assertTrue(spec.compare(nan, nan))
self.assertFalse(spec.compare(nan, 0.0))
self.assertTrue(spec.compare(float("inf"), float("inf")))
self.assertTrue(spec.compare(float("inf"), float("-inf")))
self.assertTrue(spec.compare(float("nan"), float("inf")))
def test_float_nan_compares_equal_to_itself(self):
spec = afc.KIND_REGISTRY["double"]
nan = float("nan")
self.assertTrue(spec.compare(nan, nan))
self.assertFalse(spec.compare(nan, 0.0))
self.assertTrue(spec.compare(float("inf"), float("inf")))
self.assertFalse(spec.compare(float("inf"), float("-inf")))
self.assertFalse(spec.compare(float("nan"), float("inf")))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@system_test/test_arrow_fuzz_common_unit.py` around lines 89 - 97, The test
method test_float_nan_compares_equal_to_itself in the
system_test/test_arrow_fuzz_common_unit.py uses spec =
afc.KIND_REGISTRY["double"] and currently asserts that
spec.compare(float("inf"), float("-inf")) and spec.compare(float("nan"),
float("inf")) are True; update these two assertions to expect False so that +inf
vs -inf and NaN vs +inf are not treated as equal while keeping spec.compare(nan,
nan) True, spec.compare(nan, 0.0) False, and spec.compare(float("inf"),
float("inf")) True.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
cpp_test/test_arrow_c.c (1)

403-415: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Strengthen the new DTS-path assertions.

The default branch currently treats append_arrow(...)=false as a pass after cleanup, and the missing-column branch accepts false even when err_out stays null. That lets regressions in the new dispatch coverage pass silently.

Suggested fix
         bool ok = line_sender_buffer_append_arrow(
             buf, make_table("dts_default"), &arr, &sch, &err);
+        CHECK(ok, "default append should succeed");
         if (!ok)
         {
             CHECK(err != NULL, "err_out populated on failure");
             if (err)
                 line_sender_error_free(err);
@@
         bool ok = line_sender_buffer_append_arrow_at_column(
             buf, make_table("dts_at_col"), &arr, &sch, ts_col, &err);
         CHECK(!ok, "missing ts column → false");
+        CHECK(err != NULL, "missing ts column should populate err_out");
         if (err)
         {
             CHECK(line_sender_error_get_code(err) == line_sender_error_arrow_ingest,
                   "missing ts column → arrow_ingest");

Also applies to: 429-437

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp_test/test_arrow_c.c` around lines 403 - 415, When
line_sender_buffer_append_arrow returns false (variable ok) ensure the test
fails if err was not populated: update the failure branches around
line_sender_buffer_append_arrow calls so that on !ok you assert CHECK(err !=
NULL, "err_out populated on failure") (and treat a NULL err as a test failure),
then free err with line_sender_error_free(err) if set; apply the same change to
the second occurrence (lines 429-437 region). Keep existing cleanup for
arr.release and sch.release but do not accept false/NULL-err combos as a silent
pass for line_sender_buffer_append_arrow.
questdb-rs/src/ingress/polars.rs (1)

88-95: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle zero-column DataFrames explicitly.

dataframe_to_batches() now exposes dataframe_to_record_batch() for any DataFrame, but the downstream builder still relies on RecordBatch::try_new(schema, arrays). When arrays is empty, a valid DataFrame with df.width() == 0 and df.height() > 0 can still be rejected because the row count is not carried through explicitly.

🧩 Minimal fix at the construction site
 pub fn dataframe_to_record_batch(df: DataFrame) -> Result<RecordBatch> {
+    let height = df.height();
     let compat = CompatLevel::newest();
     let mut fields: Vec<Field> = Vec::with_capacity(df.width());
     let mut arrays: Vec<ArrayRef> = Vec::with_capacity(df.width());
     for column in df.into_columns() {
         let name = column.name().as_str().to_string();
@@
     }
     let schema = Arc::new(ArrowSchema::new(fields));
-    RecordBatch::try_new(schema, arrays)
+    if arrays.is_empty() {
+        RecordBatch::try_new_with_options(
+            schema,
+            arrays,
+            &arrow_array::RecordBatchOptions::new().with_row_count(Some(height)),
+        )
+    } else {
+        RecordBatch::try_new(schema, arrays)
+    }
         .map_err(|e| fmt!(ArrowIngest, "RecordBatch::try_new failed: {}", e))
 }
In arrow-array 58.0.0, what does `RecordBatch::try_new` do when the schema and column array list are both empty? For a non-zero row count, is `RecordBatch::try_new_with_options(...with_row_count(Some(height)))` required?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@questdb-rs/src/ingress/polars.rs` around lines 88 - 95, dataframe_to_batches
must explicitly handle zero-column DataFrames: detect when df.width() == 0 and
df.height() > 0 and build the RecordBatch using
RecordBatch::try_new_with_options (or the equivalent API) so the row count is
supplied (e.g., with_row_count(Some(df.height()))), instead of calling
RecordBatch::try_new with an empty arrays vector; update the code paths in
dataframe_to_batches and the call site that uses dataframe_to_record_batch to
branch on zero columns and create a schema with no fields plus call
try_new_with_options(schema, Vec::new(), Some(df.height())) to preserve row
count for Arrow 58+ semantics.
🧹 Nitpick comments (1)
system_test/arrow_ingress_fuzz.py (1)

517-526: 💤 Low value

Unused variable kinds should be prefixed with underscore.

The kinds variable from _build_small_batch() is unpacked but never used. Prefix it with _ to indicate it's intentionally ignored.

♻️ Suggested fix
     def test_dts_default(self):
-        rb, kinds = self._build_small_batch()
+        rb, _kinds = self._build_small_batch()
         no_ts_fields = [f for f in rb.schema if f.name != "ts"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@system_test/arrow_ingress_fuzz.py` around lines 517 - 526, The variable
`kinds` returned from _build_small_batch() is unused; update the unpacking call
in the block where rb, kinds = self._build_small_batch() to use a deliberately
ignored name (e.g., `_kinds` or `_`) so the intent is clear and linter warnings
are avoided; keep the rest of the logic that uses rb, rb_no_ts, and
afc.ingest_via_arrow unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/questdb/egress/line_reader.h`:
- Around line 1824-1826: Comment documents the wrong numeric discriminant for
line_reader_error_schema_drift; update the comment to show the actual enum value
(22) or remove the numeric annotation to avoid mismatch with the enum definition
(the value 24 belongs to line_reader_error_arrow_export). Edit the comment block
referencing line_reader_error_schema_drift so it matches the enum definition in
this header (or omit the “(= N)” part) to keep public API docs consistent.

In `@questdb-rs-ffi/src/egress.rs`:
- Around line 3980-3987: The pinned schema (c.arrow_schema_pin) must be cleared
when next_arrow_batch_inner signals a SchemaDriftMidStream so callers can
recover; modify the handling around the call to Cursor::next_arrow_batch_inner
(invoked via c.cursor_for_mut() / next_arrow_batch_inner) to detect the
SchemaDriftMidStream error outcome and set c.arrow_schema_pin = None before
returning the error (or propagating it), leaving the existing behavior of
setting the pin on Ok(Some(rb)) unchanged.

In `@questdb-rs/src/error.rs`:
- Around line 38-40: The change added #[non_exhaustive] to the public enum
ErrorCode which is a breaking change for downstream crates that match
Error::code exhaustively; remove the #[non_exhaustive] attribute from the
ErrorCode declaration (or alternatively make ErrorCode private and expose a
stable accessor type) so consumers can continue to exhaustively match it,
ensuring the enum remains exhaustive for public use (update the enum declaration
that starts with pub enum ErrorCode and any references like Error::code
accordingly).

---

Duplicate comments:
In `@cpp_test/test_arrow_c.c`:
- Around line 403-415: When line_sender_buffer_append_arrow returns false
(variable ok) ensure the test fails if err was not populated: update the failure
branches around line_sender_buffer_append_arrow calls so that on !ok you assert
CHECK(err != NULL, "err_out populated on failure") (and treat a NULL err as a
test failure), then free err with line_sender_error_free(err) if set; apply the
same change to the second occurrence (lines 429-437 region). Keep existing
cleanup for arr.release and sch.release but do not accept false/NULL-err combos
as a silent pass for line_sender_buffer_append_arrow.

In `@questdb-rs/src/ingress/polars.rs`:
- Around line 88-95: dataframe_to_batches must explicitly handle zero-column
DataFrames: detect when df.width() == 0 and df.height() > 0 and build the
RecordBatch using RecordBatch::try_new_with_options (or the equivalent API) so
the row count is supplied (e.g., with_row_count(Some(df.height()))), instead of
calling RecordBatch::try_new with an empty arrays vector; update the code paths
in dataframe_to_batches and the call site that uses dataframe_to_record_batch to
branch on zero columns and create a schema with no fields plus call
try_new_with_options(schema, Vec::new(), Some(df.height())) to preserve row
count for Arrow 58+ semantics.

---

Nitpick comments:
In `@system_test/arrow_ingress_fuzz.py`:
- Around line 517-526: The variable `kinds` returned from _build_small_batch()
is unused; update the unpacking call in the block where rb, kinds =
self._build_small_batch() to use a deliberately ignored name (e.g., `_kinds` or
`_`) so the intent is clear and linter warnings are avoided; keep the rest of
the logic that uses rb, rb_no_ts, and afc.ingest_via_arrow unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3ff246a6-0c38-46e6-8874-21a424574b1f

📥 Commits

Reviewing files that changed from the base of the PR and between 361420c and 832878e.

📒 Files selected for processing (22)
  • CMakeLists.txt
  • cpp_test/test_arrow_c.c
  • cpp_test/test_arrow_ingress.cpp
  • include/questdb/egress/line_reader.h
  • include/questdb/egress/line_reader.hpp
  • include/questdb/ingress/line_sender.h
  • include/questdb/ingress/line_sender.hpp
  • questdb-rs-ffi/src/egress.rs
  • questdb-rs-ffi/src/lib.rs
  • questdb-rs/src/egress/arrow/convert.rs
  • questdb-rs/src/egress/arrow/polars.rs
  • questdb-rs/src/egress/arrow/tests.rs
  • questdb-rs/src/error.rs
  • questdb-rs/src/ingress.rs
  • questdb-rs/src/ingress/arrow.rs
  • questdb-rs/src/ingress/buffer/qwp.rs
  • questdb-rs/src/ingress/polars.rs
  • system_test/arrow_ffi.py
  • system_test/arrow_fuzz_common.py
  • system_test/arrow_ingress_fuzz.py
  • system_test/arrow_polars_per_dtype.py
  • system_test/test.py
💤 Files with no reviewable changes (1)
  • questdb-rs/src/ingress.rs
🚧 Files skipped from review as they are similar to previous changes (6)
  • system_test/test.py
  • CMakeLists.txt
  • questdb-rs/src/egress/arrow/polars.rs
  • include/questdb/egress/line_reader.hpp
  • questdb-rs/src/ingress/buffer/qwp.rs
  • system_test/arrow_fuzz_common.py

Comment thread include/questdb/egress/line_reader.h Outdated
Comment thread questdb-rs-ffi/src/egress.rs Outdated
Comment thread questdb-rs/src/error.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
questdb-rs/src/egress/arrow/polars.rs (1)

122-210: ⚡ Quick win

Consider adding a nullable-column round-trip test.

The current tests cover non-nullable Int64/Float64/Utf8 and the zero-row case, but not null values flowing through the Arrow C Data Interface bridge. Null handling is a common source of regressions in to_ffi/import_array_from_c round-trips, so a small test with a nullable column containing at least one None would harden this path cheaply.

💚 Example test to add
#[test]
fn record_batch_to_dataframe_preserves_nulls() {
    let mut ii = Int64Builder::new();
    ii.append_value(1);
    ii.append_null();
    ii.append_value(3);
    let schema = Arc::new(ArrowSchema::new(vec![Field::new("i", DataType::Int64, true)]));
    let rb = RecordBatch::try_new(schema, vec![Arc::new(ii.finish()) as ArrayRef]).unwrap();
    let df = record_batch_to_dataframe(rb).unwrap();
    let i64s = df.columns()[0].as_materialized_series().i64().unwrap();
    assert_eq!(i64s.get(0), Some(1));
    assert_eq!(i64s.get(1), None);
    assert_eq!(i64s.get(2), Some(3));
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@questdb-rs/src/egress/arrow/polars.rs` around lines 122 - 210, Add a new test
that verifies nullable columns round-trip through record_batch_to_dataframe:
create an Int64Builder, append 1, append_null(), append 3, construct a schema
with Field::new("i", DataType::Int64, true), build a RecordBatch and call
record_batch_to_dataframe(rb).unwrap(), then get
df.columns()[0].as_materialized_series().i64().unwrap() and assert
get(0)==Some(1), get(1)==None, get(2)==Some(3). This uses
record_batch_to_dataframe, Int64Builder, Field::new, DataType::Int64 and
as_materialized_series().i64().unwrap() to locate the relevant code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@questdb-rs/src/egress/arrow/polars.rs`:
- Around line 122-210: Add a new test that verifies nullable columns round-trip
through record_batch_to_dataframe: create an Int64Builder, append 1,
append_null(), append 3, construct a schema with Field::new("i",
DataType::Int64, true), build a RecordBatch and call
record_batch_to_dataframe(rb).unwrap(), then get
df.columns()[0].as_materialized_series().i64().unwrap() and assert
get(0)==Some(1), get(1)==None, get(2)==Some(3). This uses
record_batch_to_dataframe, Int64Builder, Field::new, DataType::Int64 and
as_materialized_series().i64().unwrap() to locate the relevant code paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0ae845be-9330-421d-a86a-d34e112fca46

📥 Commits

Reviewing files that changed from the base of the PR and between 832878e and 4fd1c67.

📒 Files selected for processing (8)
  • include/questdb/egress/line_reader.h
  • include/questdb/egress/line_reader.hpp
  • questdb-rs-ffi/src/egress.rs
  • questdb-rs-ffi/src/lib.rs
  • questdb-rs/src/egress/arrow/convert.rs
  • questdb-rs/src/egress/arrow/polars.rs
  • questdb-rs/src/ingress/arrow.rs
  • questdb-rs/src/ingress/polars.rs
🚧 Files skipped from review as they are similar to previous changes (6)
  • include/questdb/egress/line_reader.hpp
  • include/questdb/egress/line_reader.h
  • questdb-rs/src/egress/arrow/convert.rs
  • questdb-rs/src/ingress/polars.rs
  • questdb-rs-ffi/src/egress.rs
  • questdb-rs-ffi/src/lib.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@system_test/test.py`:
- Around line 154-161: The current check in skip_if_unsupported_qwp_ws_fixture
only skips when fixture._root_dir exists and is not 'repo', but
QuestDbExternalFixture instances created via run_with_existing() lack _root_dir
so they don't get skipped; change the conditional in
skip_if_unsupported_qwp_ws_fixture to treat a missing _root_dir as a reason to
skip as well (e.g. replace the (root_dir is not None and root_dir.name != 'repo'
and is_unsupported_qwp_ws_fixture_error(error)) check with one that evaluates to
true when (root_dir is None or root_dir.name != 'repo') and
is_unsupported_qwp_ws_fixture_error(error)), keeping the same raised
unittest.SkipTest behavior so external fixtures (QuestDbExternalFixture) are
skipped on unsupported QWP/WS errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: baa5ea7a-77d5-42a1-aeb7-9855eb88a285

📥 Commits

Reviewing files that changed from the base of the PR and between 4fd1c67 and 53c77b2.

📒 Files selected for processing (3)
  • questdb-rs/src/ingress/polars.rs
  • system_test/arrow_fuzz_common.py
  • system_test/test.py

Comment thread system_test/test.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DO NOT MERGE This PR should NOT be merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant