Add Arrow Native (ADBC) Server Protocol for High-Performance Data Access

_Is your feature request related to a problem? Please describe._

  Analytics backends and data science tools increasingly demand high-performance binary data transfer protocols. The current REST HTTP API, while flexible and widely compatible, introduces significant overhead for data-intensive workloads:

  - JSON serialization/deserialization adds latency
  - Text-based protocols are inefficient for large result sets
  - No standard binary protocol means each client must implement custom optimizations
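  To make the first two points concrete, here is a minimal, stdlib-only Python sketch. It encodes the same 20K rows twice: once as row-oriented JSON and once as fixed-width columnar binary buffers. It is an illustration under simplifying assumptions, not Arrow IPC. It only mimics the way Arrow stores a non-null `int64` or `float64` column as one contiguous values buffer, and the column names `id` and `amount` are made up.

```python
import json
import random
import struct

# Illustrative only: this is NOT Arrow IPC. It mimics how Arrow lays out a
# non-null int64 / float64 column as one contiguous little-endian buffer.

N_ROWS = 20_000  # hypothetical result-set size

rng = random.Random(42)
ids = list(range(N_ROWS))
amounts = [round(rng.uniform(0, 10_000), 2) for _ in range(N_ROWS)]


def encode_json_rows(ids, amounts):
    """Row-oriented JSON, similar in spirit to a REST API payload."""
    rows = [{"id": i, "amount": a} for i, a in zip(ids, amounts)]
    return json.dumps(rows).encode("utf-8")


def encode_columnar(ids, amounts):
    """Two fixed-width little-endian column buffers: int64 ids, float64 amounts."""
    n = len(ids)
    return struct.pack(f"<{n}q", *ids) + struct.pack(f"<{n}d", *amounts)


def decode_columnar(buf, n):
    """One unpack per column; no number is parsed from text."""
    ids_out = list(struct.unpack_from(f"<{n}q", buf, 0))
    amounts_out = list(struct.unpack_from(f"<{n}d", buf, 8 * n))
    return ids_out, amounts_out


json_payload = encode_json_rows(ids, amounts)
binary_payload = encode_columnar(ids, amounts)

print(f"JSON rows:      {len(json_payload):>8} bytes")
print(f"Binary columns: {len(binary_payload):>8} bytes")
```

  With these two columns the JSON payload comes out at about twice the size of the binary one. That is before counting the cost of parsing every value back out of text on the client.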

  Modern analytics ecosystems (Python/pandas, R, Julia, Elixir/Livebook) are converging on Arrow as the standard in-memory columnar format, and ADBC (Arrow Database Connectivity) as the standard database access API. Users expect databases and semantic layers to support these standards natively.

_Describe the solution you'd like_

  Add an Arrow Native server to CubeSQL that:

  1. Speaks the Arrow IPC protocol on a dedicated port (default: 8120)
  2. Returns Arrow RecordBatches directly, with no JSON serialization overhead
  3. Works with this Elixir [ADBC client](https://github.com/borodark/adbc/pull/2) and with Python clients
  4. Optionally caches query results for repeated queries

  This enables 8-15x faster data transfer compared to the REST API for typical analytics workloads.
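  The request does not specify how the optional result cache would work. One possible shape is an LRU cache with a per-entry TTL that stores already-serialized result payloads keyed by query text. It is sketched here in Python purely for illustration: CubeSQL itself is written in Rust, and the class and method names below are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class QueryResultCache:
    """Hypothetical LRU cache with a per-entry TTL, keyed by query text.

    Values are opaque bytes (e.g. an already-serialized Arrow IPC stream),
    so a cache hit can be sent to the client without re-encoding.
    """

    def __init__(self, max_entries: int = 128, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._entries: OrderedDict = OrderedDict()  # key -> (stored_at, payload)
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    @staticmethod
    def key_for(sql: str) -> str:
        # Only strip leading/trailing whitespace: collapsing inner whitespace
        # could merge queries whose string literals differ.
        return sql.strip()

    def get(self, sql: str) -> Optional[bytes]:
        key = self.key_for(sql)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, payload = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired entry
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return payload

    def put(self, sql: str, payload: bytes) -> None:
        key = self.key_for(sql)
        self._entries[key] = (self._clock(), payload)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used

    def get_or_compute(self, sql: str, run_query: Callable[[str], bytes]) -> bytes:
        cached = self.get(sql)
        if cached is not None:
            return cached
        payload = run_query(sql)
        self.put(sql, payload)
        return payload
```

  A production cache would also need the user's security context in the key, because Cube can return different rows for the same SQL depending on who runs it. It would likely need invalidation tied to data refreshes rather than a fixed TTL alone.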

_Describe alternatives you've considered_

  1. Arrow Flight SQL: a more complex protocol that requires gRPC. Serving Arrow IPC directly to an ADBC client is simpler and sufficient for CubeSQL's use case.
  2. Optimizing the REST API: JSON always carries serialization overhead. Binary protocols are fundamentally faster for columnar data.
  3. A custom binary protocol: this would require custom clients. ADBC is an emerging standard with growing ecosystem support.

_Additional context_

  The ADBC ecosystem is maturing rapidly:

  - Elixir/Livebook: The [livebook-dev/adbc](https://github.com/livebook-dev/adbc) library provides ADBC bindings for the Elixir ecosystem. A working CubeSQL client extension is available at https://github.com/borodark/adbc/pull/2.
  - Real-world usage: The power_of_three library ([PR #5](https://github.com/borodark/power_of_three/pull/5)) demonstrates ADBC integration with Cube, showing 8-15x performance improvements over REST in production-like scenarios.
  - Python/pandas: ADBC is becoming the recommended way to fetch data into DataFrames, replacing older approaches.

  Having options is good, especially when one option is significantly faster. BI tools connecting via the PostgreSQL protocol continue to work, and so do clients calling the REST API. Users who need maximum performance now have an additional path: ADBC on port 8120.

  Performance comparison (cached, 20K rows):
  - REST HTTP API:  2133ms
  - Arrow Native:      8ms  (266x faster)
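  Timing a single cold run can easily skew numbers like these. A minimal Python harness for reproducing such a comparison is sketched below. The fetch callables are placeholders you would supply yourself (for example, one that calls the REST API and one that reads through the ADBC client); `fetch_via_rest` and `fetch_via_adbc` in the comments are hypothetical names.

```python
import statistics
import time
from typing import Callable


def median_ms(fetch: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Median wall-clock time of fetch() in milliseconds.

    Warm-up calls run first and are discarded, so a "cached" comparison
    measures cache hits rather than the first cold query.
    """
    for _ in range(warmup):
        fetch()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if candidate_ms <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_ms / candidate_ms


# Hypothetical usage, with fetch functions you supply:
#   rest_ms = median_ms(lambda: fetch_via_rest(QUERY))
#   adbc_ms = median_ms(lambda: fetch_via_adbc(QUERY))
#   print(f"{speedup(rest_ms, adbc_ms):.1f}x")

# Checking the figures reported in this issue: 2133 ms vs 8 ms.
print(f"{speedup(2133, 8):.1f}x")  # prints 266.6x
```

  Reporting the median of several warm runs keeps a single slow outlier, such as a garbage-collection pause or a network hiccup, from dominating the ratio.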



Add Arrow Native (ADBC) Server Protocol for High-Performance Data Access #10296
