Skip to content

[All] Arrow ingestion can return retryable errors while retaining batches for recovery #236

@danilonajkov-db

Description

@danilonajkov-db

SDK

All

SDK Version

newest

Operating System

No response

Description

When Arrow Flight ingestion receives a retryable server error such as RESOURCE_EXHAUSTED, recovery starts, but new ingest_batch() calls can return an error during the reconnect window.

This is ambiguous because ingest_batch() adds the batch to pending_batches before checking whether the sender is available. If batch_tx is temporarily None, the caller receives an error, but the batch may still be retained and replayed during recovery. The caller cannot know whether it should retry the batch, which risks duplicate ingestion.

Reproduction

Create an Arrow Flight stream with recovery enabled and a small recovery_backoff_ms / recovery_timeout_ms so the recovery window is easy to hit.

Configure the server or mock Flight server to return a retryable error, for example gRPC RESOURCE_EXHAUSTED, after receiving a batch.

Call ingest_batch(batch1) to trigger the server error.

While recovery is in progress, keep calling ingest_batch(batchN) from the client.

Observe that some ingest_batch() calls return an error instead of blocking or returning an offset.

Also note that the failed ingest_batch() call may have already inserted the batch into Arrow pending_batches before returning the error, so recovery may replay a batch the caller believes was rejected.

Expected behavior

ingest_batch() should have clear enqueue semantics during retryable recovery, like proto. Accept and retain the batch, return an offset, and let recovery replay it.

Proto ingestion has an explicit pause path for graceful stream closure, but Arrow currently has no equivalent pause/recovery state, so retryable Flight errors can surface directly to callers during recovery.

Is it a regression?

No

Logs

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions