Releases: databricks/zerobus-sdk-py
Release v0.3.0
Major Changes
- Rust-Backed Implementation: Complete rewrite of the Python SDK as a thin wrapper around the Databricks Zerobus Rust SDK
  - All core logic (gRPC, authentication, recovery, stream management) now handled by native Rust code
  - Python bindings built using PyO3 and maturin
  - Significant performance improvements: 2-5x throughput, lower latency, reduced memory footprint
  - Single source of truth: Python SDK automatically inherits all Rust SDK improvements
- Architecture: Native Rust core with PyO3 bindings and full type stubs (`_zerobus_core.pyi`)
- Build System: Migrated from setuptools to maturin for Rust/Python integration
- Benefits: Native performance, Rust's memory safety guarantees, easier maintenance, consistent behavior across all SDK languages
New Features and Improvements
- Configurable Logging: Added support for the `RUST_LOG` environment variable to control log levels
  - Users can now set `RUST_LOG=debug` or `RUST_LOG=trace` for detailed diagnostics
  - Default level is `info` when not specified
  - Supports granular control: `RUST_LOG=zerobus_sdk=trace,tokio=info`
- Flexible Record Serialization: `ingest_record()` now accepts multiple input types, giving clients control over serialization:
  - JSON mode: Accepts both `dict` (SDK serializes) and `str` (pre-serialized JSON string)
  - Protobuf mode: Accepts both `Message` objects (SDK serializes) and `bytes` (pre-serialized)
  - This allows clients to optimize serialization separately or use custom serialization logic while maintaining backward compatibility
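As a minimal sketch of the logging configuration described above, the filter string can be composed in Python and exported before the SDK is loaded. The helper `build_rust_log` is hypothetical and not part of the SDK, and setting the variable before importing the SDK is an assumption about when the native core reads `RUST_LOG`.

```python
import os


def build_rust_log(default_level="info", **target_levels):
    """Compose a RUST_LOG filter such as 'info,zerobus_sdk=trace'.

    Hypothetical helper (not part of the SDK). A bare level sets the
    default; 'target=level' entries override it for specific crates.
    """
    directives = [default_level]
    directives += [f"{target}={level}" for target, level in target_levels.items()]
    return ",".join(directives)


# Assumption: the native core reads RUST_LOG when it initializes, so the
# variable is set before the SDK is imported. setdefault() keeps any value
# already exported in the shell.
os.environ.setdefault(
    "RUST_LOG",
    build_rust_log("info", zerobus_sdk="trace", tokio="info"),
)
```

Exporting the variable in the shell (`RUST_LOG=debug python app.py`) achieves the same thing without any code changes.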
Bug Fixes
Documentation
- Updated README with new Delta type mappings (TIMESTAMP_NTZ, VARIANT)
- Updated `ingest_record()` API documentation to show all accepted record types
- Added inline examples demonstrating both serialization approaches (SDK-controlled vs. client-controlled)
- Updated examples README with clear explanations of serialization options
Internal Changes
- Implemented `get_unacked_records()` and `get_unacked_batches()`: Return actual unacknowledged records/batches (as bytes) for recovery and monitoring
  - `get_unacked_records()` returns `List[bytes]` of unacknowledged record payloads
  - `get_unacked_batches()` returns `List[List[bytes]]` where each batch contains record payloads
  - Available in both sync and async APIs
  - Useful for implementing custom retry logic or monitoring stream health
- Added `env-filter` feature to `tracing-subscriber` dependency for `RUST_LOG` support
- generate_proto tool: Added support for TIMESTAMP_NTZ and VARIANT data types
  - TIMESTAMP_NTZ maps to int64 (timestamp without timezone, microseconds since epoch)
  - VARIANT maps to string (unshredded, JSON string format)
- generate_proto tool: Added comprehensive unit tests for all pure functions (84 tests covering type parsing, type mapping, field validation, and proto file generation)
- Enhanced `ingest_record()` type validation to accept wider range of input types
- Added test coverage for both high-level objects (dict/Message) and pre-serialized data (str/bytes)
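To illustrate the TIMESTAMP_NTZ and VARIANT mappings listed above, here is a small sketch of how client code might prepare values for those columns. The helper names and the `event_time` / `payload` field names are hypothetical; only the target encodings (int64 microseconds since epoch, JSON string) come from these notes.

```python
import json
from datetime import datetime, timedelta

# Naive epoch: TIMESTAMP_NTZ carries no timezone, so no tzinfo is attached.
_EPOCH = datetime(1970, 1, 1)


def timestamp_ntz_micros(value: datetime) -> int:
    """Encode a naive datetime as int64 microseconds since epoch."""
    if value.tzinfo is not None:
        raise ValueError("TIMESTAMP_NTZ expects a naive datetime")
    # Integer division of timedeltas avoids float rounding.
    return (value - _EPOCH) // timedelta(microseconds=1)


def variant_json(value) -> str:
    """Encode a Python value as a compact JSON string for a VARIANT column."""
    return json.dumps(value, separators=(",", ":"))


# Hypothetical record with one TIMESTAMP_NTZ and one VARIANT field.
row = {
    "event_time": timestamp_ntz_micros(datetime(2024, 1, 1, 12, 0, 0)),
    "payload": variant_json({"device": "sensor-1", "reading": 21.5}),
}
```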
Breaking Changes
- BREAKING: Host endpoints now require `https://` scheme
  - Impact: `SERVER_ENDPOINT` and `UNITY_CATALOG_ENDPOINT` must include `https://` prefix
  - Migration: Update endpoint URLs to include `https://`
  - Old: `SERVER_ENDPOINT = "your-shard-id.zerobus.region.cloud.databricks.com"`
  - New: `SERVER_ENDPOINT = "https://your-shard-id.zerobus.region.cloud.databricks.com"`
- BREAKING: Removed `create_stream_with_headers_provider()` method
  - Migration: Use `create_stream()` with the `headers_provider` parameter instead
  - Old: `sdk.create_stream_with_headers_provider(custom_provider, table_properties, options)`
  - New: `sdk.create_stream(client_id, client_secret, table_properties, options, headers_provider=custom_provider)`
- BREAKING: Removed `StreamState` enum
  - Reason: Internal state management now handled by Rust SDK
  - Impact: `get_state()` method no longer returns a meaningful state enum
  - Migration: Not typically used in primary workflows; remove any code that depends on `StreamState`
- Changed: `get_unacked_records()` implementation (backward compatible)
  - Old: Returned `Iterator` that yielded record payloads from the Python wrapper's internal queue
  - New: Returns `Iterator[bytes]` that yields unacknowledged record payloads directly from the Rust SDK
  - Migration: No migration needed - iteration pattern remains the same: `for record in stream.get_unacked_records():`
  - Benefit: Direct access to Rust SDK's unacked records; more accurate representation of what hasn't been acknowledged by the server
  - Note: Still returns an iterator for backward compatibility and memory efficiency
- BREAKING: Changed `ack_callback` signature in `StreamConfigurationOptions`
  - Old: Callback received detailed acknowledgment response object
  - New: Callback receives single `offset: int` parameter
  - Migration: Update callback signature from `def on_ack(self, response)` to `def on_ack(self, offset: int)`
  - Impact: Simplified API; offset is the primary acknowledgment information needed
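Two of the migrations above are mechanical enough to sketch. The `ensure_https` helper and the `OffsetTracker` class are hypothetical names for illustration; only the `https://` requirement and the `on_ack(self, offset: int)` shape come from these notes. Rejecting other schemes is a design choice of this sketch, not documented SDK behavior.

```python
def ensure_https(endpoint: str) -> str:
    """Prefix a bare host with https://; leave qualified https URLs alone."""
    if endpoint.startswith("https://"):
        return endpoint
    if "://" in endpoint:
        # Sketch-level choice: fail loudly instead of rewriting another scheme.
        raise ValueError(f"unsupported scheme in endpoint: {endpoint!r}")
    return "https://" + endpoint


class OffsetTracker:
    """Hypothetical callback object using the new on_ack(offset) signature."""

    def __init__(self) -> None:
        self.last_acked = -1  # nothing acknowledged yet

    def on_ack(self, offset: int) -> None:
        # The callback now receives only the offset, not a response object.
        self.last_acked = max(self.last_acked, offset)


SERVER_ENDPOINT = ensure_https("your-shard-id.zerobus.region.cloud.databricks.com")
```

An instance of such a class would then be supplied as the `ack_callback` in `StreamConfigurationOptions`; how exactly it is passed is not shown in these notes, so check the SDK README for the constructor arguments.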
Deprecations
- DEPRECATED: `ingest_record()` method (both sync and async)
  - Reason: Offers significantly lower throughput compared to `ingest_record_offset()` and `ingest_record_nowait()`
  - Migration:
    - For sync API: Use `ingest_record_offset()` for offset tracking or `ingest_record_nowait()` for maximum throughput
    - For async API: Use `ingest_record_offset()` with batched `asyncio.gather()` pattern or `ingest_record_nowait()` for maximum throughput
  - Performance Impact: New methods are 2-40x faster depending on record size
  - Note: Method remains available for backward compatibility but will be removed in a future major version
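The batched `asyncio.gather()` pattern mentioned for the async API could look roughly like the sketch below. `FakeAsyncStream` is a stand-in so the example runs without the SDK; it assumes that the async `ingest_record_offset()` is awaitable and returns an integer offset, which these notes imply but do not spell out. The `ingest_in_batches` helper and the batch size are illustrative.

```python
import asyncio
from typing import Iterable, List


class FakeAsyncStream:
    """Stand-in for an async SDK stream; NOT the real API surface."""

    def __init__(self) -> None:
        self._next_offset = 0

    async def ingest_record_offset(self, record) -> int:
        # Assumption: the real method is awaitable and returns an offset.
        await asyncio.sleep(0)  # yield to the event loop, like real I/O would
        offset = self._next_offset
        self._next_offset += 1
        return offset


async def ingest_in_batches(stream, records: Iterable, batch_size: int = 100) -> List[int]:
    """Await ingestion in groups of batch_size instead of one record at a time."""
    offsets: List[int] = []
    pending = []
    for record in records:
        pending.append(stream.ingest_record_offset(record))
        if len(pending) == batch_size:
            offsets.extend(await asyncio.gather(*pending))
            pending = []
    if pending:  # flush the final partial batch
        offsets.extend(await asyncio.gather(*pending))
    return offsets


records = [{"id": i} for i in range(250)]
offsets = asyncio.run(ingest_in_batches(FakeAsyncStream(), records, batch_size=100))
```

Bounding each `gather()` call to a fixed batch keeps the number of in-flight coroutines predictable; the right batch size depends on record size and should be measured.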
API Changes
- Added optional `headers_provider` parameter to `create_stream()` methods
  - Defaults to internal OAuth 2.0 Client Credentials authentication when not provided
- Widened `ingest_record()` type signature to accept:
  - JSON mode: `Union[dict, str]` (previously `str` only)
  - Protobuf mode: `Union[Message, bytes]` (previously `Message` only)
- All changes except removal of `create_stream_with_headers_provider()` are backward compatible
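For client-controlled serialization, the widened signature means a record can be serialized once up front and passed along as `str` or `bytes`. The helpers below are a hypothetical client-side sketch, not the SDK's internal validation logic. `FakeMessage` stands in for a generated protobuf class, which exposes `SerializeToString()`.

```python
import json
from typing import Any, Union


def to_json_payload(record: Union[dict, str]) -> str:
    """Return a JSON string, serializing only when given a dict."""
    if isinstance(record, str):
        return record  # already serialized by the caller
    if isinstance(record, dict):
        return json.dumps(record)
    raise TypeError(f"expected dict or str, got {type(record).__name__}")


def to_proto_payload(record: Any) -> bytes:
    """Return protobuf bytes, serializing only when given a message object."""
    if isinstance(record, (bytes, bytearray)):
        return bytes(record)  # already serialized by the caller
    serialize = getattr(record, "SerializeToString", None)
    if callable(serialize):
        return serialize()
    raise TypeError(f"expected Message or bytes, got {type(record).__name__}")


class FakeMessage:
    """Stand-in for a generated protobuf message (illustration only)."""

    def SerializeToString(self) -> bytes:
        return b"\x08\x01"


json_payload = to_json_payload({"id": 1})
proto_payload = to_proto_payload(FakeMessage())
```

The resulting `json_payload` or `proto_payload` is what would be handed to `ingest_record()` in the matching mode.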
Release v0.2.0
Minor release of the Databricks Zerobus SDK for Python.
Release v0.1.0
Initial release of the Databricks Zerobus Ingest SDK for Python.