Hudi 1.1 MDT col-stats generation is failing for array and map types. #773

@vinishjail97

Search before asking

  • I had searched in the issues and found no similar issues.

Please describe the bug 🐞

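In Hudi 1.1, commits fail when metadata table (MDT) column stats are enabled for tables whose schema contains array or map types. The trace below shows the failure surfacing in HoodieTableMetadataUtil.convertMetadataToColumnStatsRecords; the root cause is Schema.getField being called on a map schema from HoodieAvroUtils.getSchemaForField.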

org.apache.hudi.exception.HoodieException: Failed to generate column stats records for metadata table

	at org.apache.hudi.metadata.HoodieTableMetadataUtil.convertMetadataToColumnStatsRecords(HoodieTableMetadataUtil.java:1594)
	at org.apache.hudi.metadata.HoodieMetadataWriteUtils.convertMetadataToRecords(HoodieMetadataWriteUtils.java:387)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter$BatchMetadataConversionFunction.convertMetadata(HoodieBackedTableMetadataWriter.java:1460)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:1165)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1399)
	at org.apache.hudi.client.BaseHoodieClient.writeTableMetadata(BaseHoodieClient.java:285)
	at org.apache.hudi.client.BaseHoodieWriteClient.writeToMetadataTable(BaseHoodieWriteClient.java:339)
	at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:320)
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:276)
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:244)
	at org.apache.hudi.client.HoodieJavaWriteClient.commit(HoodieJavaWriteClient.java:97)
	at org.apache.hudi.client.HoodieJavaWriteClient.commit(HoodieJavaWriteClient.java:52)
	at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:226)
	at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:221)
	at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:211)
	at org.apache.xtable.TestJavaHudiTable.insertRecordsWithCommitAlreadyStarted(TestJavaHudiTable.java:198)
	at org.apache.xtable.TestAbstractHudiTable.insertRecords(TestAbstractHudiTable.java:272)
	at org.apache.xtable.hudi.TestHudiFileStatsExtractor.columnStatsWithMetadataTable(TestHudiFileStatsExtractor.java:145)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: org.apache.avro.AvroRuntimeException: Not a record: {"type":"map","values":{"type":"record","name":"Nested","namespace":"test.nested_record","fields":[{"name":"nested_int","type":"int","default":0}]}}
	at org.apache.avro.Schema.getField(Schema.java:275)
	at org.apache.hudi.avro.HoodieAvroUtils.getSchemaForField(HoodieAvroUtils.java:1652)
	at org.apache.hudi.avro.HoodieAvroUtils.getSchemaForField(HoodieAvroUtils.java:1656)
	at org.apache.hudi.avro.HoodieAvroUtils.getSchemaForField(HoodieAvroUtils.java:1642)
	at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getColumnsToIndexWithoutRequiredMetaFields$48(HoodieTableMetadataUtil.java:1696)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at org.apache.hudi.metadata.HoodieTableMetadataUtil.getColumnsToIndexWithoutRequiredMetaFields(HoodieTableMetadataUtil.java:1698)
	at org.apache.hudi.metadata.HoodieTableMetadataUtil.getColumnsToIndex(HoodieTableMetadataUtil.java:1655)
	at org.apache.hudi.metadata.HoodieTableMetadataUtil.getColumnsToIndex(HoodieTableMetadataUtil.java:1615)
	at org.apache.hudi.metadata.HoodieTableMetadataUtil.convertMetadataToColumnStatsRecords(HoodieTableMetadataUtil.java:1583)
	... 24 more
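
The failure mode in the trace above can be sketched in isolation. The following is a simplified, stdlib-only model and not Hudi or Avro code: the `Node` and `Kind` types, both resolver methods, and the `map_col` field name are hypothetical stand-ins. It assumes that the lookup walks a dotted field path and asks each intermediate schema for a named field, which only a record can answer. The guarded variant returns an empty result when the path runs into a map or array instead of throwing; whether skipping such columns is the right fix for Hudi is an open question for the maintainers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class NestedFieldLookup {
    public enum Kind { RECORD, MAP, ARRAY, PRIMITIVE }

    /** Toy schema node: a stand-in for an Avro Schema, not the real class. */
    public static final class Node {
        public final Kind kind;
        private final Map<String, Node> fields = new LinkedHashMap<>(); // used only by RECORD

        public Node(Kind kind) { this.kind = kind; }

        public Node field(String name, Node type) {
            fields.put(name, type);
            return this;
        }

        /** Like the call in the trace, only a record can be asked for a named field. */
        public Node getField(String name) {
            if (kind != Kind.RECORD) {
                throw new IllegalStateException("Not a record: " + kind);
            }
            return fields.get(name);
        }
    }

    /** Unguarded walk: calls getField at every step, so a map or array in the path throws. */
    public static Node resolveUnguarded(Node root, String path) {
        Node current = root;
        for (String part : path.split("\\.")) {
            current = current.getField(part);
        }
        return current;
    }

    /** Guarded walk: stops with an empty result once the current node is not a record. */
    public static Optional<Node> resolveGuarded(Node root, String path) {
        Node current = root;
        for (String part : path.split("\\.")) {
            if (current == null || current.kind != Kind.RECORD) {
                return Optional.empty();
            }
            current = current.fields.get(part);
        }
        return Optional.ofNullable(current);
    }

    public static void main(String[] args) {
        // Hypothetical table schema: one primitive column and one map column.
        Node root = new Node(Kind.RECORD)
                .field("id", new Node(Kind.PRIMITIVE))
                .field("map_col", new Node(Kind.MAP));

        try {
            resolveUnguarded(root, "map_col.nested_int");
        } catch (IllegalStateException e) {
            System.out.println("unguarded lookup failed: " + e.getMessage());
        }
        System.out.println("guarded lookup found a field: "
                + resolveGuarded(root, "map_col.nested_int").isPresent());
    }
}
```

With the real Avro API, the equivalent guard would check the schema type before calling `getField`, and a caller could then decide whether to descend into a map's value type or an array's element type, or to exclude the column from stats generation.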

Are you willing to submit PR?

  • I am willing to submit a PR!
  • I am willing to submit a PR but need help getting started!

Labels: bug (Something isn't working)
