s3Cluster with use_hive_partitioning does not prune hive partitions (SIZES_OF_COLUMNS_DOESNT_MATCH / read amplification) #1855

@CarlosFelipeOR

Type of problem

Bug report - something's broken

Describe the situation

s3Cluster with use_hive_partitioning=1 does not prune hive partitions when a WHERE filter is applied on a partition column that exists only in the object path. The plain s3() function prunes correctly (reads 1 partition), but s3Cluster():

  • Fails with Code: 9 SIZES_OF_COLUMNS_DOESNT_MATCH when the partition column exists only in the path.
  • Reads all partitions (no pruning, read amplification) when the partition column also exists in the file content, filtering only after reading.

Discovered via the /s3/minio/part 3/hive partitioning regression scenarios (hive_partitioning_cluster_*), whose timing assertion fails due to the read amplification.

This issue:

  • Was failing on 25.8.22.20001.altinityantalya and is now also failing on 26.3.10.20001.altinityantalya
  • Is NOT analyzer-dependent (reproduced with allow_experimental_analyzer=0 and allow_experimental_analyzer=1)
  • Is environment-independent (AWS m8g ARM and Hetzner ARM both read all partitions)
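
For anyone re-checking the analyzer point, one way to toggle the setting per query is sketched below. The report does not say how the setting was applied in the original runs, so treating it as a query-level setting is an assumption; the query itself is the failing one from the steps below.

```sql
-- Assumption: the analyzer is switched per query; the original runs may have
-- set it at the profile or session level instead.
SELECT count(*) FROM s3Cluster('cluster','http://minio:9000/bucket/date=2000-01-*/data.csv',
  'user','pass','CSVWithNames','d UInt64')
WHERE date='2000-01-01'
SETTINGS use_hive_partitioning=1, allow_experimental_analyzer=0;

-- Same query with the analyzer enabled.
SELECT count(*) FROM s3Cluster('cluster','http://minio:9000/bucket/date=2000-01-*/data.csv',
  'user','pass','CSVWithNames','d UInt64')
WHERE date='2000-01-01'
SETTINGS use_hive_partitioning=1, allow_experimental_analyzer=1;
```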

How to reproduce the behavior

Environment

  • Version: 26.3.10.20001.altinityantalya (also 25.8.22.20001.altinityantalya)
  • Cluster: 3 shards, S3/MinIO backend

Steps

  1. Create 25 hive partitions where the CSV content has only d UInt64; the date column
    exists only in the path:
-- repeat for X = 01..25
INSERT INTO FUNCTION s3('http://minio:9000/bucket/date=2000-01-<X>/data.csv',
  'user','pass','CSVWithNames','d UInt64')
SELECT number FROM numbers(1000000) SETTINGS s3_truncate_on_insert=1;
  2. s3() prunes correctly:
SELECT count(*) FROM s3('http://minio:9000/bucket/date=2000-01-*/data.csv',
  'user','pass','CSVWithNames','d UInt64')
WHERE date='2000-01-01' SETTINGS use_hive_partitioning=1;
-- => 1000000  (reads only the date=2000-01-01 object)
  3. s3Cluster() without WHERE works:
SELECT count(*) FROM s3Cluster('cluster','http://minio:9000/bucket/date=2000-01-*/data.csv',
  'user','pass','CSVWithNames','d UInt64') SETTINGS use_hive_partitioning=1;
-- => 25000000
  4. s3Cluster() with WHERE on the hive partition column fails with Code: 9:
SELECT count(*) FROM s3Cluster('cluster','http://minio:9000/bucket/date=2000-01-*/data.csv',
  'user','pass','CSVWithNames','d UInt64')
WHERE date='2000-01-01' SETTINGS use_hive_partitioning=1;

Expected behavior

Returns 1000000, reading only the date=2000-01-01 partition (same as s3()).

Actual behavior

Code: 9. DB::Exception: Size of filter (25) doesn't match size of column (0): While executing Remote.

When the partition column also exists in the file content, no crash occurs, but the query reads all 25 partitions (25,000,000 rows) instead of pruning to 1,000,000.
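
The read amplification can also be checked without relying on timing, by comparing read_rows in system.query_log. The following is a minimal sketch, not the method used by the regression scenarios. Assumptions: the tag 'hive_prune_check' is an arbitrary label introduced here, query logging is enabled on every node, and log_comment is propagated to the shard-side queries. If it is not propagated, filter by initial_query_id instead.

```sql
-- Run the query under test with an arbitrary tag so it can be found in the log.
SELECT count(*) FROM s3Cluster('cluster','http://minio:9000/bucket/date=2000-01-*/data.csv',
  'user','pass','CSVWithNames','d UInt64')
WHERE date='2000-01-01'
SETTINGS use_hive_partitioning=1, log_comment='hive_prune_check';

-- Make sure the log entries are written on every node before reading them.
SYSTEM FLUSH LOGS ON CLUSTER 'cluster';

-- One row per node that took part. A query that failed is logged as
-- ExceptionWhileProcessing rather than QueryFinish.
SELECT hostName() AS host, is_initial_query, type, read_rows, read_bytes
FROM clusterAllReplicas('cluster', system.query_log)
WHERE log_comment = 'hive_prune_check'
  AND type IN ('QueryFinish', 'ExceptionWhileProcessing')
ORDER BY event_time DESC;
```

With pruning in place, the shard-side read_rows would be expected to sum to roughly 1,000,000 rather than 25,000,000. The initiator row may also count rows received from the shards, so the is_initial_query = 0 rows are the ones to compare.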
