Hudi 1.1 and ICEBERG nested partitioned filter data validation fails. #775

@vinishjail97

Description

Search before asking

  • I had searched in the issues and found no similar issues.

Please describe the bug 🐞

➜  test_table_4da7fea6_f8d5_4571_bc53_1cfecdabebfb_v1 ls -ltr
total 0
drwxr-xr-x@  8 vinishreddy  staff  256 22 Dec 17:36 WARN
drwxr-xr-x@  8 vinishreddy  staff  256 22 Dec 17:36 INFO
drwxr-xr-x@  8 vinishreddy  staff  256 22 Dec 17:36 ERROR
drwxr-xr-x@  8 vinishreddy  staff  256 22 Dec 17:36 __HIVE_DEFAULT_PARTITION__
drwxr-xr-x@ 20 vinishreddy  staff  640 22 Dec 17:36 metadata

...

avro-tools tojson .hoodie/timeline/20251223013640337_20251223013644463.commit | jq -r '.partitionToWriteStats.map[][].path.string'

25/12/22 17:42:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
__HIVE_DEFAULT_PARTITION__/ffb2f34c-47b9-4579-9698-c43fdc33dc68-3_0-36-40_20251223013640337.parquet
ERROR/ffb2f34c-47b9-4579-9698-c43fdc33dc68-2_0-36-40_20251223013640337.parquet
INFO/ffb2f34c-47b9-4579-9698-c43fdc33dc68-1_0-36-40_20251223013640337.parquet
WARN/ffb2f34c-47b9-4579-9698-c43fdc33dc68-0_0-36-40_20251223013640337.parquet
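To cross-check the listing above, the same extraction can be sketched in Python against the JSON that `avro-tools tojson` prints. This is a minimal sketch, not part of the failing test: the nesting (`partitionToWriteStats` -> `map` -> list of stats -> `path` -> `string`) is inferred from the jq filter, the partition-name keys in `sample` are an assumption, and `write_paths` and `partition_dir` are hypothetical helper names.

```python
from pathlib import PurePosixPath


def write_paths(commit: dict) -> list[str]:
    """Mirror the jq filter .partitionToWriteStats.map[][].path.string."""
    stats_by_partition = commit["partitionToWriteStats"]["map"]
    return [
        stat["path"]["string"]
        for stats in stats_by_partition.values()
        for stat in stats
    ]


def partition_dir(path: str) -> str:
    """Return the leading directory of a relative data file path."""
    return PurePosixPath(path).parts[0]


# Two of the paths from the commit output above; the map keys are assumed.
sample = {
    "partitionToWriteStats": {
        "map": {
            "INFO": [{"path": {"string": "INFO/ffb2f34c-47b9-4579-9698-c43fdc33dc68-1_0-36-40_20251223013640337.parquet"}}],
            "WARN": [{"path": {"string": "WARN/ffb2f34c-47b9-4579-9698-c43fdc33dc68-0_0-36-40_20251223013640337.parquet"}}],
        }
    }
}

for p in write_paths(sample):
    # Each file should carry the partition value of the directory it lives in.
    print(partition_dir(p), p)
```

Comparing the directory of each file against the `level` value that the target table reports for rows in that file would show which files are mapped to the wrong partition value.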

With the filter level=INFO, the expected rows read from the HUDI source do not match the actual rows read from the ICEBERG target.

Assertion failure with the filter level=INFO:
org.opentest4j.AssertionFailedError: Datasets are not equivalent when reading from Spark. Source: HUDI, Target: ICEBERG ==> 
Expected :[{"key":"073d4b89-b136-4764-ab85-f076e1bac6ec","ts":1766453800328,"level":"INFO","double_field":0.8905820524458916,"float_field":0.28693622,"int_field":-721922711,"long_field":-1510732180141577976,"boolean_field":true,"string_field":"PRNXgshrMN","byt ...

Actual   :[{"key":"073d4b89-b136-4764-ab85-f076e1bac6ec","ts":1766453800328,"level":"WARN","double_field":0.8905820524458916,"float_field":0.28693622,"int_field":-721922711,"long_field":-1510732180141577976,"boolean_field":true,"string_field":"PRNXgshrMN","byt ...
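The two records are long, so the mismatch is easy to miss. A small field-by-field diff makes it explicit. This sketch uses only the fields visible before the truncation (`key`, `ts`, `level`); `differing_fields` is a hypothetical helper, not something from the test suite.

```python
def differing_fields(expected: dict, actual: dict) -> dict:
    """Map each field whose value differs to an (expected, actual) pair."""
    keys = expected.keys() | actual.keys()
    return {
        k: (expected.get(k), actual.get(k))
        for k in sorted(keys)
        if expected.get(k) != actual.get(k)
    }


# First record of each side, limited to fields shown in the failure message.
expected = {"key": "073d4b89-b136-4764-ab85-f076e1bac6ec", "ts": 1766453800328, "level": "INFO"}
actual = {"key": "073d4b89-b136-4764-ab85-f076e1bac6ec", "ts": 1766453800328, "level": "WARN"}

print(differing_fields(expected, actual))  # {'level': ('INFO', 'WARN')}
```

For the same `key`, only the partition column `level` differs in the visible portion of the record: the ICEBERG read returns WARN where the HUDI read returns INFO.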

Labels: bug