@cshuo
Collaborator

@cshuo cshuo commented Dec 23, 2025

…mns without column stats

Describe the issue this Pull Request addresses

Improve data skipping with column stats when the filter references columns that have no column stats. Fixes #17598.

Summary and Changelog

  1. Build ColumnStatsProbe from the filtered predicates, keeping only those that reference columns covered by the column stats index.
  2. Add HoodieFlinkIndexClient to support persisting the column stats index definition for the Flink write client.
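
The predicate filtering in item 1 can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the `Predicate` class and the `retainIndexed` method are hypothetical stand-ins, and the sketch assumes a predicate is kept only when all of its referenced columns are indexed. It also assumes the pushed-down predicates are combined conjunctively, in which case dropping one only makes skipping less aggressive and does not change query results.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class IndexedPredicateFilter {

  /** Hypothetical stand-in for a pushed-down filter expression. */
  public static final class Predicate {
    final String text;
    final List<String> referencedColumns;

    Predicate(String text, String... referencedColumns) {
      this.text = text;
      this.referencedColumns = Arrays.asList(referencedColumns);
    }

    @Override
    public String toString() {
      return text;
    }
  }

  /**
   * Keeps only predicates whose referenced columns are all indexed.
   * Assumption of this sketch: "all", not "any", columns must be covered.
   */
  public static List<Predicate> retainIndexed(List<Predicate> predicates, Set<String> indexedCols) {
    return predicates.stream()
        .filter(p -> indexedCols.containsAll(p.referencedColumns))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Only "id" and "ts" have column stats in this made-up example.
    Set<String> indexedCols = new HashSet<>(Arrays.asList("id", "ts"));
    List<Predicate> predicates = Arrays.asList(
        new Predicate("id > 10", "id"),
        new Predicate("name = 'a'", "name"),
        new Predicate("ts < id", "ts", "id"));
    // The predicate on "name" is dropped before any stats-based pruning.
    System.out.println(retainIndexed(predicates, indexedCols)); // prints [id > 10, ts < id]
  }
}
```

The predicate on `name` is still evaluated when the data is read; it is only excluded from the stats-based file pruning step.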

Impact

Improves batch query performance for the Flink reader when column stats are enabled.

Risk Level

Low.

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@cshuo cshuo force-pushed the improve_columns_stats_prune branch from bada983 to 0bde0f9 Compare December 23, 2025 12:55
@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Dec 23, 2025
@cshuo cshuo force-pushed the improve_columns_stats_prune branch from 0bde0f9 to 2ff626d Compare December 24, 2025 03:09
@hudi-bot
Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

}
List<String> indexedCols = HoodieTableMetadataUtil.getValidIndexedColumns(indexDefinition, tableSchema, metaClient.getTableConfig());
return expressions.stream().filter(expr -> {
  String[] refs = referencedColumns(Collections.singletonList(expr));
Contributor

The column name is case-sensitive, right?

Collaborator Author

Yes, column names are case-sensitive in Flink's default SQL dialect, and there is currently no config to change this behavior.
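
To make the consequence concrete, here is a small sketch of what case-sensitive matching means for the indexed-column check. The `isIndexed` helper and the column names are hypothetical and not part of this PR; the sketch only assumes that referenced columns are compared against indexed columns by exact string match.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CaseSensitiveColumnMatch {

  /** Exact, case-sensitive membership check against the indexed column names. */
  public static boolean isIndexed(Set<String> indexedCols, String referencedCol) {
    return indexedCols.contains(referencedCol);
  }

  public static void main(String[] args) {
    Set<String> indexedCols = new HashSet<>(Arrays.asList("id", "ts"));
    System.out.println(isIndexed(indexedCols, "id")); // prints true
    System.out.println(isIndexed(indexedCols, "ID")); // prints false: different case, no match
  }
}
```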

Labels

size:M PR with lines of changes in (100, 300]

Development

Successfully merging this pull request may close these issues.

Improving ColumnStatsProbe when filter include columns without column stats index

3 participants