Skip to content
This repository was archived by the owner on Mar 6, 2026. It is now read-only.

Releases: databricks/zerobus-sdk-py

Release v0.3.0

19 Feb 09:40
8955b7f

Choose a tag to compare

Release v0.3.0

Major Changes

  • Rust-Backed Implementation: Complete rewrite of the Python SDK as a thin wrapper around the Databricks Zerobus Rust SDK
    • All core logic (gRPC, authentication, recovery, stream management) now handled by native Rust code
    • Python bindings built using PyO3 and maturin
    • Significant performance improvements: 2-5x throughput, lower latency, reduced memory footprint
    • Single source of truth: Python SDK automatically inherits all Rust SDK improvements
    • Architecture: Native Rust core with PyO3 bindings and full type stubs (_zerobus_core.pyi)
    • Build System: Migrated from setuptools to maturin for Rust/Python integration
    • Benefits: Native performance, Rust's memory safety guarantees, easier maintenance, consistent behavior across all SDK languages

New Features and Improvements

  • Configurable Logging: Added support for RUST_LOG environment variable to control log levels
    • Users can now set RUST_LOG=debug or RUST_LOG=trace for detailed diagnostics
    • Default level is info when not specified
    • Supports granular control: RUST_LOG=zerobus_sdk=trace,tokio=info
  • Flexible Record Serialization: ingest_record() now accepts multiple input types, giving clients control over serialization:
    • JSON mode: Accepts both dict (SDK serializes) and str (pre-serialized JSON string)
    • Protobuf mode: Accepts both Message objects (SDK serializes) and bytes (pre-serialized)
    • This allows clients to optimize serialization separately or use custom serialization logic while maintaining backward compatibility

Bug Fixes

Documentation

  • Updated README with new Delta type mappings (TIMESTAMP_NTZ, VARIANT)
  • Updated ingest_record() API documentation to show all accepted record types
  • Added inline examples demonstrating both serialization approaches (SDK-controlled vs. client-controlled)
  • Updated examples README with clear explanations of serialization options

Internal Changes

  • Implemented get_unacked_records() and get_unacked_batches(): Return actual unacknowledged records/batches (as bytes) for recovery and monitoring

    • get_unacked_records() returns List[bytes] of unacknowledged record payloads
    • get_unacked_batches() returns List[List[bytes]] where each batch contains record payloads
    • Available in both sync and async APIs
    • Useful for implementing custom retry logic or monitoring stream health
  • Added env-filter feature to tracing-subscriber dependency for RUST_LOG support

  • generate_proto tool: Added support for TIMESTAMP_NTZ and VARIANT data types

    • TIMESTAMP_NTZ maps to int64 (timestamp without timezone, microseconds since epoch)
    • VARIANT maps to string (unshredded, JSON string format)
  • generate_proto tool: Added comprehensive unit tests for all pure functions (84 tests covering type parsing, type mapping, field validation, and proto file generation)

  • Enhanced ingest_record() type validation to accept wider range of input types

  • Added test coverage for both high-level objects (dict/Message) and pre-serialized data (str/bytes)

Breaking Changes

  • BREAKING: Host endpoints now require https:// scheme

    • Impact: SERVER_ENDPOINT and UNITY_CATALOG_ENDPOINT must include https:// prefix
    • Migration: Update endpoint URLs to include https://
    • Old: SERVER_ENDPOINT = "your-shard-id.zerobus.region.cloud.databricks.com"
    • New: SERVER_ENDPOINT = "https://your-shard-id.zerobus.region.cloud.databricks.com"
  • BREAKING: Removed create_stream_with_headers_provider() method

    • Migration: Use create_stream() with the headers_provider parameter instead
    • Old: sdk.create_stream_with_headers_provider(custom_provider, table_properties, options)
    • New: sdk.create_stream(client_id, client_secret, table_properties, options, headers_provider=custom_provider)
  • BREAKING: Removed StreamState enum

    • Reason: Internal state management now handled by Rust SDK
    • Impact: get_state() method no longer returns a meaningful state enum
    • Migration: Not typically used in primary workflows; remove any code that depends on StreamState
  • Changed: get_unacked_records() implementation (backward compatible)

    • Old: Returned Iterator that yielded record payloads from the Python wrapper's internal queue
    • New: Returns Iterator[bytes] that yields unacknowledged record payloads directly from the Rust SDK
    • Migration: No migration needed - iteration pattern remains the same: for record in stream.get_unacked_records():
    • Benefit: Direct access to Rust SDK's unacked records; more accurate representation of what hasn't been acknowledged by the server
    • Note: Still returns an iterator for backward compatibility and memory efficiency
  • BREAKING: Changed ack_callback signature in StreamConfigurationOptions

    • Old: Callback received detailed acknowledgment response object
    • New: Callback receives single offset: int parameter
    • Migration: Update callback signature from def on_ack(self, response) to def on_ack(self, offset: int)
    • Impact: Simplified API; offset is the primary acknowledgment information needed

Deprecations

  • DEPRECATED: ingest_record() method (both sync and async)
    • Reason: Offers significantly lower throughput compared to ingest_record_offset() and ingest_record_nowait()
    • Migration:
      • For sync API: Use ingest_record_offset() for offset tracking or ingest_record_nowait() for maximum throughput
      • For async API: Use ingest_record_offset() with batched asyncio.gather() pattern or ingest_record_nowait() for maximum throughput
    • Performance Impact: New methods are 2-40x faster depending on record size
    • Note: Method remains available for backward compatibility but will be removed in a future major version

API Changes

  • Added optional headers_provider parameter to create_stream() methods
    • Defaults to internal OAuth 2.0 Client Credentials authentication when not provided
  • Widened ingest_record() type signature to accept:
    • JSON mode: Union[dict, str] (previously str only)
    • Protobuf mode: Union[Message, bytes] (previously Message only)
  • All changes except removal of create_stream_with_headers_provider() are backward compatible

Release v0.2.0

05 Nov 15:58
123ad79

Choose a tag to compare

Minor release of the Databricks Zerobus SDK for Python.

Release v0.1.0

21 Oct 11:32
9b0adc0

Choose a tag to compare

Initial release of the Databricks Zerobus Ingest SDK for Python.