Skip to content

Expose ExecutionPlan statistics across the FFI boundary#22157

Open
mailmindlin wants to merge 11 commits into
apache:mainfrom
mailmindlin:feature/ffi-statistics
Open

Expose ExecutionPlan statistics across the FFI boundary#22157
mailmindlin wants to merge 11 commits into
apache:mainfrom
mailmindlin:feature/ffi-statistics

Conversation

@mailmindlin
Copy link
Copy Markdown

Which issue does this PR close?

Rationale for this change

ExecutionPlan::partition_statistics and TableProvider::statistics are not currently transported across the DataFusion FFI boundary, so foreign plans and providers always report Statistics::new_unknown / None. This blocks optimizer rules that depend on statistics (e.g. join reordering, partition pruning) from working with out-of-process plugins, which defeats the point of exposing those hooks to plugin authors.

Statistics contains Precision<ScalarValue> for column min/max/sum. ScalarValue is a large enum that's impractical to mirror in #[repr(C)], so I reuse the existing datafusion_proto_common::Statistics prost encoding — the same pattern this crate already uses for filter expressions.

What changes are included in this PR?

  • New datafusion_ffi::statistics module with [de]serialize_statistics helpers wrapping thedatafusion_proto_common::Statistics round-trip.
  • New partition_statistics field on FFI_ExecutionPlan and corresponding ExecutionPlan::partition_statistics impl on ForeignExecutionPlan
  • New statistics field on FFI_TableProvider and corresponding TableProvider::statistics impl on ForeignTableProvider. Since the trait returns Option<Statistics>, the implementation cannot propagate decode errors, it logs a log::warn! and triggers a debug_assert!.

This PR is expected to be merged after #22136 so it includes those changes.

Are these changes tested?

Yes:

  • Unit tests in statistics.rs cover three round-trip cases: Statistics::new_unknown, fully-exact statistics with ScalarValue::Int32/Int64/Utf8 min/max/sum, and mixed Precision::Exact/Inexact/Absent values.
  • A new round-trip integration test in execution_plan.rs exercises ForeignExecutionPlan::partition_statistics with both None and Some(idx) partitions, against a plan with no statistics (returns Statistics::new_unknown) and a plan with concrete statistics.
  • A new round-trip integration test in table_provider.rs uses a thin TableWithStats wrapper over MemTable to verify both the None path and the concrete Statistics path through ForeignTableProvider::statistics.

Are there any user-facing changes?

This is a breaking ABI change for the datafusion-ffi crate:

  • FFI_ExecutionPlan gains a partition_statistics field.
  • FFI_TableProvider gains a statistics field.

Plugins compiled against earlier versions of datafusion-ffi will need to be recompiled. There are no breaking changes to the Rust trait surface or to Statistics itself; downstream ExecutionPlan / TableProvider implementations require no changes.

@github-actions github-actions Bot added physical-expr Changes to the physical-expr crates ffi Changes to the ffi crate labels May 13, 2026
@github-actions
Copy link
Copy Markdown

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-ffi v53.1.0 (current)
       Built [  65.038s] (current)
     Parsing datafusion-ffi v53.1.0 (current)
      Parsed [   0.060s] (current)
    Building datafusion-ffi v53.1.0 (baseline)
       Built [  59.992s] (baseline)
     Parsing datafusion-ffi v53.1.0 (baseline)
      Parsed [   0.056s] (baseline)
    Checking datafusion-ffi v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.259s] 222 checks: 220 pass, 1 fail, 1 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field FFI_ExecutionPlan.metrics in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:76
  field FFI_ExecutionPlan.partition_statistics in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:82

--- warning repr_c_plain_struct_fields_reordered: struct fields reordered in repr(C) struct ---

Description:
A public repr(C) struct had its fields reordered. This can change the struct's memory layout, possibly breaking FFI use cases that depend on field position and order.
        ref: https://doc.rust-lang.org/reference/type-layout.html#reprc-structs
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/repr_c_plain_struct_fields_reordered.ron

Failed in:
  FFI_ExecutionPlan.clone moved from position 7 to 9, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:89
  FFI_ExecutionPlan.release moved from position 8 to 10, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:92
  FFI_ExecutionPlan.private_data moved from position 9 to 11, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:96
  FFI_ExecutionPlan.library_marker_id moved from position 10 to 12, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:101
  FFI_TableProvider.logical_codec moved from position 6 to 7, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/table_provider.rs:145
  FFI_TableProvider.version moved from position 9 to 10, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/table_provider.rs:155
  FFI_TableProvider.library_marker_id moved from position 11 to 12, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/table_provider.rs:164

     Summary semver requires new major version: 1 major and 0 minor checks failed
     Warning produced 1 major and 0 minor level warnings
    Finished [ 127.127s] datafusion-ffi
    Building datafusion-physical-expr-common v53.1.0 (current)
       Built [  20.866s] (current)
     Parsing datafusion-physical-expr-common v53.1.0 (current)
      Parsed [   0.020s] (current)
    Building datafusion-physical-expr-common v53.1.0 (baseline)
       Built [  20.631s] (baseline)
     Parsing datafusion-physical-expr-common v53.1.0 (baseline)
      Parsed [   0.021s] (baseline)
    Checking datafusion-physical-expr-common v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.200s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  42.563s] datafusion-physical-expr-common

@github-actions github-actions Bot added the auto detected api change Auto detected API change label May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change ffi Changes to the ffi crate physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose ExecutionPlan statistics across the FFI boundary

1 participant