Commit fe1dd57

alamb and claude authored
deprecate: mark Statistics V2 framework (PR #14699) as deprecated (#22071)
## Which issue does this PR close?

- Closes #14896
- Related to #21120

## Rationale for this change

I think having two sets of statistics functions is confusing, and we should mark the copy that is not used as deprecated.

A bit over a year ago, thanks to @Fly-Style and @ozankabak, we merged a PR into DataFusion with a "V2" statistics framework:

- #14699

The work to migrate the code to use this new framework is tracked in a follow-on ticket:

- #14896

Sadly, no progress seems to have been made on this migration in over a year. PR #14699 was merged on 2025-02-24, ~15 months ago. Since then, the only commits touching `datafusion/expr-common/src/statistics.rs` and `datafusion/physical-expr/src/statistics/` have been mechanical: no operator or planner has been taught to call `evaluate_statistics` / `propagate_statistics` or to construct a `Distribution` outside of the framework's own tests. In practice it has never been wired into the optimizer or any execution operator.

Recently, thanks to @asolimando, we have started down a different path toward a more extensible system:

- #21120

That issue explicitly describes the V2 distribution-based API as "significantly more complex to implement and adopt" and proposes that distribution-based estimation, if useful, be plugged in later as a custom analyzer rather than as a `PhysicalExpr` trait surface.

Rather than continue carrying an unused public framework that we don't intend to build on, let's deprecate it so downstream users aren't confused.

## What changes are included in this PR?

This PR adds `#[deprecated(since = "54.0.0", ...)]` attributes to the public abstractions introduced in #1469

There are no behavior changes; the V2 code paths still compile and run, so any out-of-tree consumer that has already adopted them sees a deprecation warning rather than a breakage.

## Are these changes tested?

No new tests; the existing tests for the deprecated items continue to pass.

## Are there any user-facing changes?

The public API items listed above are now marked `#[deprecated]`. Downstream code that uses them will see a compiler warning pointing to #21120, but will continue to compile and run unchanged. The deprecated items will be removed in a future release.

Partly generated

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
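To illustrate the "warning rather than a breakage" point, here is a minimal sketch of how a `#[deprecated]` attribute behaves from a downstream caller's point of view. The names `legacy_selectivity` and `downstream_caller` are hypothetical stand-ins, not DataFusion APIs, and the `note` text is made up for the example.

```rust
// Hypothetical stand-in for a public item that has been marked deprecated.
// Nothing here is a real DataFusion API.
#[deprecated(since = "54.0.0", note = "illustrative note; see the tracking issue")]
pub fn legacy_selectivity(p_left: f64, p_right: f64) -> f64 {
    // The attribute only attaches a lint; the body behaves exactly as before.
    p_left * p_right
}

// A downstream caller that has not migrated yet. Without the `allow`,
// rustc would emit a `deprecated` warning at the call site, but the code
// would still compile and return the same value.
#[allow(deprecated)]
pub fn downstream_caller() -> f64 {
    legacy_selectivity(0.5, 0.25)
}

fn main() {
    println!("selectivity = {}", downstream_caller());
}
```

Removing the `#[allow(deprecated)]` line turns the call into a compiler warning that carries the `note` text, which is the user-facing change described above.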
1 parent 1f2b020 commit fe1dd57

9 files changed

Lines changed: 121 additions & 1 deletion

datafusion/expr-common/src/statistics.rs

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,16 @@
1515
// specific language governing permissions and limitations
1616
// under the License.
1717

18+
//! Probabilistic distributions for expression-level statistics (unused).
19+
//!
20+
//! Note: All public items in this module are **deprecated** as of `54.0.0`.
21+
//!
22+
//! See <https://github.com/apache/datafusion/pull/22071> for details.
23+
24+
// The whole module is deprecated; suppress warnings from intra-module uses
25+
// of the deprecated types so the module continues to compile.
26+
#![allow(deprecated)]
27+
1828
use std::f64::consts::LN_2;
1929

2030
use crate::interval_arithmetic::{Interval, apply_operator};
@@ -37,6 +47,10 @@ use datafusion_common::{
3747
/// is the main unit of calculus when evaluating expressions in a statistical
3848
/// context. Notions like column and table statistics are built on top of this
3949
/// object and the operations it supports.
50+
#[deprecated(
51+
since = "54.0.0",
52+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
53+
)]
4054
#[derive(Clone, Debug, PartialEq)]
4155
pub enum Distribution {
4256
Uniform(UniformDistribution),
@@ -214,6 +228,10 @@ impl Distribution {
214228
///
215229
/// <https://en.wikipedia.org/wiki/Continuous_uniform_distribution>
216230
/// <https://en.wikipedia.org/wiki/Prior_probability#Improper_priors>
231+
#[deprecated(
232+
since = "54.0.0",
233+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
234+
)]
217235
#[derive(Clone, Debug, PartialEq)]
218236
pub struct UniformDistribution {
219237
interval: Interval,
@@ -236,6 +254,10 @@ pub struct UniformDistribution {
236254
/// For more information, see:
237255
///
238256
/// <https://en.wikipedia.org/wiki/Exponential_distribution>
257+
#[deprecated(
258+
since = "54.0.0",
259+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
260+
)]
239261
#[derive(Clone, Debug, PartialEq)]
240262
pub struct ExponentialDistribution {
241263
rate: ScalarValue,
@@ -249,6 +271,10 @@ pub struct ExponentialDistribution {
249271
/// For a more in-depth discussion, see:
250272
///
251273
/// <https://en.wikipedia.org/wiki/Normal_distribution>
274+
#[deprecated(
275+
since = "54.0.0",
276+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
277+
)]
252278
#[derive(Clone, Debug, PartialEq)]
253279
pub struct GaussianDistribution {
254280
mean: ScalarValue,
@@ -259,6 +285,10 @@ pub struct GaussianDistribution {
259285
/// the success probability is unknown. For a more in-depth discussion, see:
260286
///
261287
/// <https://en.wikipedia.org/wiki/Bernoulli_distribution>
288+
#[deprecated(
289+
since = "54.0.0",
290+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
291+
)]
262292
#[derive(Clone, Debug, PartialEq)]
263293
pub struct BernoulliDistribution {
264294
p: ScalarValue,
@@ -268,6 +298,10 @@ pub struct BernoulliDistribution {
268298
/// approximated via some summary statistics. For a more in-depth discussion, see:
269299
///
270300
/// <https://en.wikipedia.org/wiki/Summary_statistics>
301+
#[deprecated(
302+
since = "54.0.0",
303+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
304+
)]
271305
#[derive(Clone, Debug, PartialEq)]
272306
pub struct GenericDistribution {
273307
mean: ScalarValue,
@@ -594,6 +628,10 @@ impl GenericDistribution {
594628
/// This function takes a logical operator and two Bernoulli distributions,
595629
/// and it returns a new Bernoulli distribution that represents the result of
596630
/// the operation. Currently, only `AND` and `OR` operations are supported.
631+
#[deprecated(
632+
since = "54.0.0",
633+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
634+
)]
597635
pub fn combine_bernoullis(
598636
op: &Operator,
599637
left: &BernoulliDistribution,
@@ -649,6 +687,10 @@ pub fn combine_bernoullis(
649687
/// see:
650688
///
651689
/// <https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>
690+
#[deprecated(
691+
since = "54.0.0",
692+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
693+
)]
652694
pub fn combine_gaussians(
653695
op: &Operator,
654696
left: &GaussianDistribution,
@@ -673,6 +715,10 @@ pub fn combine_gaussians(
673715
/// Expects `op` to be a comparison operator, with `left` and `right` having
674716
/// numeric distributions. The resulting distribution has the `Float64` data
675717
/// type.
718+
#[deprecated(
719+
since = "54.0.0",
720+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
721+
)]
676722
pub fn create_bernoulli_from_comparison(
677723
op: &Operator,
678724
left: &Distribution,
@@ -751,6 +797,10 @@ pub fn create_bernoulli_from_comparison(
751797
/// given binary operation on two unknown quantities represented by their
752798
/// [`Distribution`] objects. The function computes the mean, median and
753799
/// variance if possible.
800+
#[deprecated(
801+
since = "54.0.0",
802+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
803+
)]
754804
pub fn new_generic_from_binary_op(
755805
op: &Operator,
756806
left: &Distribution,
@@ -766,6 +816,10 @@ pub fn new_generic_from_binary_op(
766816

767817
/// Computes the mean value for the result of the given binary operation on
768818
/// two unknown quantities represented by their [`Distribution`] objects.
819+
#[deprecated(
820+
since = "54.0.0",
821+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
822+
)]
769823
pub fn compute_mean(
770824
op: &Operator,
771825
left: &Distribution,
@@ -798,6 +852,10 @@ pub fn compute_mean(
798852
/// the median is calculable only for addition and subtraction operations on:
799853
/// - [`Uniform`] and [`Uniform`] distributions, and
800854
/// - [`Gaussian`] and [`Gaussian`] distributions.
855+
#[deprecated(
856+
since = "54.0.0",
857+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
858+
)]
801859
pub fn compute_median(
802860
op: &Operator,
803861
left: &Distribution,
@@ -835,6 +893,10 @@ pub fn compute_median(
835893

836894
/// Computes the variance value for the result of the given binary operation on
837895
/// two unknown quantities represented by their [`Distribution`] objects.
896+
#[deprecated(
897+
since = "54.0.0",
898+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
899+
)]
838900
pub fn compute_variance(
839901
op: &Operator,
840902
left: &Distribution,
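The module-level `#![allow(deprecated)]` in this file is needed because the deprecated types refer to each other (enum variants wrap the deprecated structs, and `impl` blocks name the deprecated types). A minimal sketch of that pattern, using hypothetical simplified types (`Uniform`, `Dist`, `uniform_mean`) rather than the real DataFusion definitions:

```rust
mod stats {
    // Inner attribute: silences the `deprecated` lint for every item in
    // this module, so intra-module uses of the deprecated types below
    // do not produce warnings.
    #![allow(deprecated)]

    // Hypothetical, simplified stand-ins for illustration only.
    #[deprecated(since = "54.0.0", note = "illustrative only")]
    pub struct Uniform {
        pub lo: f64,
        pub hi: f64,
    }

    #[deprecated(since = "54.0.0", note = "illustrative only")]
    pub enum Dist {
        // This variant mentions the deprecated `Uniform` struct.
        Uniform(Uniform),
    }

    // The impl header and the match arm both name deprecated items.
    impl Dist {
        pub fn mean(&self) -> f64 {
            match self {
                Dist::Uniform(u) => (u.lo + u.hi) / 2.0,
            }
        }
    }
}

// Code outside the module is not covered by the inner attribute, so it
// needs its own opt-out wherever it touches the deprecated types.
#[allow(deprecated)]
pub fn uniform_mean(lo: f64, hi: f64) -> f64 {
    stats::Dist::Uniform(stats::Uniform { lo, hi }).mean()
}

fn main() {
    println!("mean = {}", uniform_mean(2.0, 4.0));
}
```

The inner attribute keeps the suppression scoped to the module that defines the deprecated items, while every external use site still has to opt out explicitly.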

datafusion/ffi/src/expr/distribution.rs

Lines changed: 7 additions & 0 deletions
@@ -15,6 +15,13 @@
1515
// specific language governing permissions and limitations
1616
// under the License.
1717

18+
//! FFI types for the deprecated Statistics V2 [`Distribution`] framework.
19+
//!
20+
//! These FFI types mirror the deprecated probabilistic distribution types.
21+
//! See <https://github.com/apache/datafusion/pull/22071> for details.
22+
23+
#![allow(deprecated)]
24+
1825
use datafusion_common::DataFusionError;
1926
use datafusion_expr::statistics::{
2027
BernoulliDistribution, Distribution, ExponentialDistribution, GaussianDistribution,

datafusion/ffi/src/physical_expr/mod.rs

Lines changed: 7 additions & 0 deletions
@@ -31,6 +31,7 @@ use datafusion_common::{Result, ffi_datafusion_err};
3131
use datafusion_expr::ColumnarValue;
3232
use datafusion_expr::interval_arithmetic::Interval;
3333
use datafusion_expr::sort_properties::ExprProperties;
34+
#[expect(deprecated)]
3435
use datafusion_expr::statistics::Distribution;
3536
use datafusion_physical_expr::PhysicalExpr;
3637
use datafusion_physical_expr_common::physical_expr::fmt_sql;
@@ -295,6 +296,7 @@ unsafe extern "C" fn propagate_constraints_fn_wrapper(
295296
FFI_Result::Ok(result.into())
296297
}
297298

299+
#[expect(deprecated)]
298300
unsafe extern "C" fn evaluate_statistics_fn_wrapper(
299301
expr: &FFI_PhysicalExpr,
300302
children: SVec<FFI_Distribution>,
@@ -313,6 +315,7 @@ unsafe extern "C" fn evaluate_statistics_fn_wrapper(
313315
)
314316
}
315317

318+
#[expect(deprecated)]
316319
unsafe extern "C" fn propagate_statistics_fn_wrapper(
317320
expr: &FFI_PhysicalExpr,
318321
parent: FFI_Distribution,
@@ -630,6 +633,7 @@ impl PhysicalExpr for ForeignPhysicalExpr {
630633
}
631634
}
632635

636+
#[expect(deprecated)]
633637
fn evaluate_statistics(&self, children: &[&Distribution]) -> Result<Distribution> {
634638
unsafe {
635639
let children = children
@@ -643,6 +647,7 @@ impl PhysicalExpr for ForeignPhysicalExpr {
643647
}
644648
}
645649

650+
#[expect(deprecated)]
646651
fn propagate_statistics(
647652
&self,
648653
parent: &Distribution,
@@ -739,6 +744,7 @@ mod tests {
739744
use datafusion_common::tree_node::DynTreeNode;
740745
use datafusion_common::{DataFusionError, ScalarValue};
741746
use datafusion_expr::interval_arithmetic::Interval;
747+
#[expect(deprecated)]
742748
use datafusion_expr::statistics::Distribution;
743749
use datafusion_physical_expr::expressions::{Column, NegativeExpr, NotExpr};
744750
use datafusion_physical_expr_common::physical_expr::{PhysicalExpr, fmt_sql};
@@ -879,6 +885,7 @@ mod tests {
879885
}
880886

881887
#[test]
888+
#[expect(deprecated)]
882889
fn ffi_physical_expr_statistics() -> Result<(), DataFusionError> {
883890
let (negative_expr, foreign_neg) = create_test_negative_expr();
884891
let interval =

datafusion/physical-expr-common/src/physical_expr.rs

Lines changed: 11 additions & 0 deletions
@@ -37,6 +37,7 @@ use datafusion_expr_common::columnar_value::ColumnarValue;
3737
use datafusion_expr_common::interval_arithmetic::Interval;
3838
use datafusion_expr_common::placement::ExpressionPlacement;
3939
use datafusion_expr_common::sort_properties::ExprProperties;
40+
#[expect(deprecated)]
4041
use datafusion_expr_common::statistics::Distribution;
4142

4243
use itertools::izip;
@@ -250,6 +251,11 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug + DynEq + DynHash {
250251
/// statistics accordingly. The default implementation simply creates an
251252
/// unknown output distribution by combining input ranges. This logic loses
252253
/// distribution information, but is a safe default.
254+
#[deprecated(
255+
since = "54.0.0",
256+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
257+
)]
258+
#[expect(deprecated)]
253259
fn evaluate_statistics(&self, children: &[&Distribution]) -> Result<Distribution> {
254260
let children_ranges = children
255261
.iter()
@@ -298,6 +304,11 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug + DynEq + DynHash {
298304
/// default implementation simply creates an unknown distribution if it can
299305
/// narrow the range by propagating ranges. This logic loses distribution
300306
/// information, but is a safe default.
307+
#[deprecated(
308+
since = "54.0.0",
309+
note = "Part of the unused Statistics V2 framework; see https://github.com/apache/datafusion/pull/22071"
310+
)]
311+
#[expect(deprecated)]
301312
fn propagate_statistics(
302313
&self,
303314
parent: &Distribution,
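The combination used in this file, a `#[deprecated]` trait method whose signature mentions a deprecated type, plus `#[expect(deprecated)]` on implementations, can be sketched as follows. The types `Dist`, `Expr`, and `Negate` are hypothetical simplifications, not the real `Distribution` or `PhysicalExpr`; `#[expect]` requires Rust 1.81 or newer.

```rust
// Hypothetical, simplified stand-ins; not the DataFusion definitions.
#[deprecated(since = "54.0.0", note = "illustrative only")]
pub struct Dist {
    pub mean: f64,
}

pub trait Expr {
    /// Deprecated hook with a provided (default) body.
    #[deprecated(since = "54.0.0", note = "illustrative only")]
    #[allow(deprecated)] // the signature and body mention the deprecated `Dist`
    fn evaluate_stats(&self, child: &Dist) -> Dist {
        Dist { mean: child.mean }
    }
}

pub struct Negate;

impl Expr for Negate {
    // The override's signature names the deprecated `Dist`, so the lint
    // fires here. `expect` suppresses it and additionally reports an
    // unfulfilled expectation if the lint ever stops firing.
    #[expect(deprecated)]
    fn evaluate_stats(&self, child: &Dist) -> Dist {
        Dist { mean: -child.mean }
    }
}

// Callers of the deprecated method need an opt-out as well.
#[expect(deprecated)]
pub fn negated_mean(mean: f64) -> f64 {
    Negate.evaluate_stats(&Dist { mean }).mean
}

fn main() {
    println!("negated mean = {}", negated_mean(2.5));
}
```

Compared with `allow`, `expect` has the advantage that once the deprecated items are removed or un-deprecated, the compiler flags the now-unnecessary attributes, which makes later cleanup easier to find.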

datafusion/physical-expr/src/expressions/binary.rs

Lines changed: 7 additions & 0 deletions
@@ -36,7 +36,9 @@ use datafusion_common::{Result, ScalarValue, internal_err, not_impl_err};
3636
use datafusion_expr::binary::BinaryTypeCoercer;
3737
use datafusion_expr::interval_arithmetic::{Interval, apply_operator};
3838
use datafusion_expr::sort_properties::ExprProperties;
39+
#[expect(deprecated)]
3940
use datafusion_expr::statistics::Distribution::{Bernoulli, Gaussian};
41+
#[expect(deprecated)]
4042
use datafusion_expr::statistics::{
4143
Distribution, combine_bernoullis, combine_gaussians,
4244
create_bernoulli_from_comparison, new_generic_from_binary_op,
@@ -501,6 +503,7 @@ impl PhysicalExpr for BinaryExpr {
501503
}
502504
}
503505

506+
#[expect(deprecated)]
504507
fn evaluate_statistics(&self, children: &[&Distribution]) -> Result<Distribution> {
505508
let (left, right) = (children[0], children[1]);
506509

@@ -4673,6 +4676,7 @@ mod tests {
46734676

46744677
/// Test for Uniform-Uniform, Unknown-Uniform, Uniform-Unknown and Unknown-Unknown evaluation.
46754678
#[test]
4679+
#[expect(deprecated)]
46764680
fn test_evaluate_statistics_combination_of_range_holders() -> Result<()> {
46774681
let schema = &Schema::new(vec![Field::new("a", DataType::Float64, false)]);
46784682
let a = Arc::new(Column::new("a", 0)) as _;
@@ -4740,6 +4744,7 @@ mod tests {
47404744
}
47414745

47424746
#[test]
4747+
#[expect(deprecated)]
47434748
fn test_evaluate_statistics_bernoulli() -> Result<()> {
47444749
let schema = &Schema::new(vec![
47454750
Field::new("a", DataType::Int64, false),
@@ -4775,6 +4780,7 @@ mod tests {
47754780
}
47764781

47774782
#[test]
4783+
#[expect(deprecated)]
47784784
fn test_propagate_statistics_combination_of_range_holders_arithmetic() -> Result<()> {
47794785
let schema = &Schema::new(vec![Field::new("a", DataType::Float64, false)]);
47804786
let a = Arc::new(Column::new("a", 0)) as _;
@@ -4844,6 +4850,7 @@ mod tests {
48444850
}
48454851

48464852
#[test]
4853+
#[expect(deprecated)]
48474854
fn test_propagate_statistics_combination_of_range_holders_comparison() -> Result<()> {
48484855
let schema = &Schema::new(vec![Field::new("a", DataType::Float64, false)]);
48494856
let a = Arc::new(Column::new("a", 0)) as _;

datafusion/physical-expr/src/expressions/negative.rs

Lines changed: 4 additions & 0 deletions
@@ -31,6 +31,7 @@ use arrow::{
3131
use datafusion_common::{Result, internal_err, plan_err};
3232
use datafusion_expr::interval_arithmetic::Interval;
3333
use datafusion_expr::sort_properties::ExprProperties;
34+
#[expect(deprecated)]
3435
use datafusion_expr::statistics::Distribution::{
3536
self, Bernoulli, Exponential, Gaussian, Generic, Uniform,
3637
};
@@ -134,6 +135,7 @@ impl PhysicalExpr for NegativeExpr {
134135
.map(|result| vec![result]))
135136
}
136137

138+
#[expect(deprecated)]
137139
fn evaluate_statistics(&self, children: &[&Distribution]) -> Result<Distribution> {
138140
match children[0] {
139141
Uniform(u) => Distribution::new_uniform(u.range().arithmetic_negate()?),
@@ -258,6 +260,7 @@ mod tests {
258260
}
259261

260262
#[test]
263+
#[expect(deprecated)]
261264
fn test_evaluate_statistics() -> Result<()> {
262265
let negative_expr = NegativeExpr::new(Arc::new(Column::new("a", 0)));
263266

@@ -337,6 +340,7 @@ mod tests {
337340
}
338341

339342
#[test]
343+
#[expect(deprecated)]
340344
fn test_propagate_statistics_range_holders() -> Result<()> {
341345
let negative_expr = NegativeExpr::new(Arc::new(Column::new("a", 0)));
342346
let original_child_interval = Interval::make(Some(-2), Some(3))?;

datafusion/physical-expr/src/expressions/not.rs

Lines changed: 4 additions & 0 deletions
@@ -28,6 +28,7 @@ use arrow::record_batch::RecordBatch;
2828
use datafusion_common::{Result, ScalarValue, cast::as_boolean_array, internal_err};
2929
use datafusion_expr::ColumnarValue;
3030
use datafusion_expr::interval_arithmetic::Interval;
31+
#[expect(deprecated)]
3132
use datafusion_expr::statistics::Distribution::{self, Bernoulli};
3233

3334
/// Not expression
@@ -126,6 +127,7 @@ impl PhysicalExpr for NotExpr {
126127
.map(|result| vec![result]))
127128
}
128129

130+
#[expect(deprecated)]
129131
fn evaluate_statistics(&self, children: &[&Distribution]) -> Result<Distribution> {
130132
match children[0] {
131133
Bernoulli(b) => {
@@ -141,6 +143,7 @@ impl PhysicalExpr for NotExpr {
141143
}
142144
}
143145

146+
#[expect(deprecated)]
144147
fn propagate_statistics(
145148
&self,
146149
parent: &Distribution,
@@ -253,6 +256,7 @@ mod tests {
253256
}
254257

255258
#[test]
259+
#[expect(deprecated)]
256260
fn test_evaluate_statistics() -> Result<()> {
257261
let _schema = &Schema::new(vec![Field::new("a", DataType::Boolean, false)]);
258262
let a = Arc::new(Column::new("a", 0)) as _;

datafusion/physical-expr/src/statistics/mod.rs

Lines changed: 4 additions & 1 deletion
@@ -15,6 +15,9 @@
1515
// specific language governing permissions and limitations
1616
// under the License.
1717

18-
//! Statistics and constraint propagation library
18+
//! Statistics and constraint propagation library.
19+
//!
20+
//! All items exported from this module are **deprecated**;
21+
//! see <https://github.com/apache/datafusion/pull/22071> for details.
1922
2023
pub mod stats_solver;
