Is your feature request related to a problem? Please describe.
Analytics backends and data science tools increasingly demand high-performance, binary data transfer protocols. The current REST HTTP API, while flexible and widely compatible, introduces significant overhead for data-intensive workloads:
- JSON serialization/deserialization adds latency
- Text-based protocols are inefficient for large result sets
- No standard binary protocol means each client must implement custom optimizations
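The size difference between row-oriented JSON and a packed columnar layout can be illustrated with a minimal standard-library sketch. This is not Arrow IPC and not CubeSQL code: the two-column schema (`id`, `amount`), the function names, and the ad hoc binary layout are all assumptions made for illustration only.

```python
import json
import struct
from array import array


def encode_rows_json(ids, amounts):
    """Encode the result set as a JSON array of row objects (REST-style)."""
    rows = [{"id": i, "amount": a} for i, a in zip(ids, amounts)]
    return json.dumps(rows).encode("utf-8")


def decode_rows_json(payload):
    """Parse the JSON payload back into two Python lists."""
    rows = json.loads(payload.decode("utf-8"))
    return [r["id"] for r in rows], [r["amount"] for r in rows]


def encode_columns_binary(ids, amounts):
    """Encode the same data as two packed columns (hypothetical layout).

    Layout: uint32 row count (little-endian), then an int64 column,
    then a float64 column. Column bytes use the machine's native byte order.
    """
    id_col = array("q", ids)          # 8-byte signed integers
    amount_col = array("d", amounts)  # 8-byte floats
    return struct.pack("<I", len(ids)) + id_col.tobytes() + amount_col.tobytes()


def decode_columns_binary(payload):
    """Rebuild both columns straight from the byte buffer, no text parsing."""
    (n,) = struct.unpack_from("<I", payload, 0)
    id_col = array("q")
    id_col.frombytes(payload[4:4 + 8 * n])
    amount_col = array("d")
    amount_col.frombytes(payload[4 + 8 * n:4 + 16 * n])
    return list(id_col), list(amount_col)


if __name__ == "__main__":
    ids = list(range(20_000))
    amounts = [i * 1.5 for i in ids]
    print("JSON bytes:  ", len(encode_rows_json(ids, amounts)))
    print("binary bytes:", len(encode_columns_binary(ids, amounts)))
```

The binary payload has a fixed cost of 16 bytes per row here, while the JSON payload repeats every field name in every row and has to be parsed character by character. Arrow adds schema metadata, null handling, and zero-copy reads on top of the same basic idea.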
Modern analytics ecosystems (Python/pandas, R, Julia, Elixir/Livebook) are converging on Arrow as the standard in-memory columnar format, and ADBC (Arrow Database Connectivity) as the standard database access API. Users expect databases and semantic layers to support these standards natively.
Describe the solution you'd like
Add an Arrow Native server to CubeSQL that:
- Speaks Arrow IPC protocol on a dedicated port (default: 8120)
- Returns Arrow RecordBatches directly - no JSON serialization overhead
- Works with an ADBC client (Python)
- Optionally caches query results for repeated queries
This enables 8-15x faster data transfer compared to the REST API for typical analytics workloads.
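As a rough sketch of what the optional result cache could look like, the following keeps serialized result payloads keyed by query text, with a size limit and a time-to-live. The class name `QueryResultCache`, its parameters, and the eviction policy are hypothetical; the request does not specify how caching should be implemented, and a real version would live in CubeSQL rather than in Python.

```python
import time
from collections import OrderedDict


class QueryResultCache:
    """Small LRU cache with a time-to-live, keyed by SQL text (illustrative)."""

    def __init__(self, max_entries=128, ttl_seconds=60.0, clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries = OrderedDict()  # key -> (stored_at, payload)

    @staticmethod
    def _key(sql):
        # Only trim surrounding whitespace; deeper normalization could
        # change the meaning of string literals inside the query.
        return sql.strip()

    def get(self, sql):
        """Return the cached payload, or None on a miss or expired entry."""
        key = self._key(sql)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, payload = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return payload

    def put(self, sql, payload):
        """Store a serialized result, evicting the least recently used entry."""
        key = self._key(sql)
        self._entries[key] = (self._clock(), payload)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)


if __name__ == "__main__":
    cache = QueryResultCache(max_entries=2, ttl_seconds=30.0)
    cache.put("SELECT 1", b"serialized-batch-bytes")
    print(cache.get("SELECT 1") is not None)
```

Storing the already-serialized payload means a cache hit skips both query execution and serialization, which is the case the cached benchmark below measures.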
Describe alternatives you've considered
- Arrow Flight SQL - More complex protocol, requires gRPC. ADBC is simpler and sufficient for CubeSQL's use case.
- Optimizing REST API - JSON will always have serialization overhead. Binary protocols are fundamentally faster for columnar data.
- Custom binary protocol - Would require custom clients. ADBC is an emerging standard with growing ecosystem support.
Additional context
The ADBC ecosystem is maturing rapidly:
Having options is good, especially when one option is significantly faster. BI tools that connect via the PostgreSQL protocol keep working. Clients that call the REST API keep working. But users who need maximum performance now have a path: ADBC on port 8120.
Performance comparison (cached, 20K rows):
- REST HTTP API: 2133ms
- Arrow Native: 8ms (266x faster)