@kamronis
Contributor

Describe the issue this Pull Request addresses

This closes #17679
Move the optimized method for shallow-copying a record to a new schema into utils, and use it where applicable.

Summary and Changelog

Searched all calls to HoodieAvroUtils.rewriteRecordWithNewSchema and found one class containing dead code and one class where the shallow-copy function can be applied when prepending metadata fields.
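The shallow-projection idea can be sketched in plain Java. This is a minimal model, not the Hudi code: `SimpleSchema`, `SimpleRecord` and `projectShallow` are hypothetical stand-ins for Avro's `Schema`, `IndexedRecord` and the utility method. It assumes fields are matched by name, that target fields absent from the source (such as prepended metadata fields) are left null, and that values are copied by reference rather than rebuilt.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an Avro Schema: ordered field names plus a
// name-to-position index. Not the Avro or Hudi API.
final class SimpleSchema {
  private final List<String> fieldNames;
  private final Map<String, Integer> positions = new HashMap<>();

  SimpleSchema(String... names) {
    this.fieldNames = Arrays.asList(names);
    for (int i = 0; i < names.length; i++) {
      positions.put(names[i], i);
    }
  }

  List<String> fields() {
    return fieldNames;
  }

  // Position of the named field, or -1 if this schema has no such field.
  int positionOf(String name) {
    return positions.getOrDefault(name, -1);
  }
}

// Hypothetical stand-in for an IndexedRecord: values stored by position.
final class SimpleRecord {
  final SimpleSchema schema;
  final Object[] values;

  SimpleRecord(SimpleSchema schema, Object... values) {
    this.schema = schema;
    this.values = values;
  }
}

final class ShallowProjection {
  // Builds a record of the target schema by matching fields by name.
  // Values are copied by reference (shallow): nested objects are shared, not cloned.
  // Target fields missing from the source, such as prepended metadata fields, stay null.
  static SimpleRecord projectShallow(SimpleRecord record, SimpleSchema target) {
    List<String> targetFields = target.fields();
    Object[] out = new Object[targetFields.size()];
    for (int i = 0; i < out.length; i++) {
      int sourcePos = record.schema.positionOf(targetFields.get(i));
      out[i] = sourcePos < 0 ? null : record.values[sourcePos];
    }
    return new SimpleRecord(target, out);
  }
}
```

Sharing values instead of cloning them is what makes a shallow projection cheaper than a rewrite that rebuilds nested values; the trade-off is that a mutable nested value is visible through both the source and the projected record.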

Impact

Performance improvement

Risk Level

None

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Dec 23, 2025
@kamronis
Contributor Author

@hudi-bot run azure

@wombatu-kun
Contributor

@hudi-bot run azure

@kamronis
Contributor Author

@danny0405 Please take a look

} else {
  GenericData.Record rec = new GenericData.Record(targetSchema);
  for (Schema.Field field : targetSchema.getFields()) {
    if (record.hasField(field.name())) {
Collaborator

record.hasField and record.get both search for the field in the record's internal schema, so each present field is looked up twice. Suggested change:

Field sourceField = record.getSchema().getField(field.name());
if (sourceField == null) {
    rec.put(field.pos(), null);
} else {
    rec.put(field.pos(), record.get(sourceField.pos()));
}
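The reviewer's point can be made concrete with a counting model. In this hypothetical sketch, `CountingSchema.lookup` stands in for the by-name field search that `hasField`, `get(String)` and `Schema.getField` each perform; it is not Avro's implementation. The original loop resolves a present field twice, while the suggested loop resolves it once and then reads by position.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical schema whose name-to-position lookups are counted,
// used only to compare the two loops discussed in the review.
final class CountingSchema {
  private final Map<String, Integer> positions = new HashMap<>();
  int lookups = 0;

  CountingSchema(List<String> names) {
    for (int i = 0; i < names.size(); i++) {
      positions.put(names.get(i), i);
    }
  }

  // Returns the field position, or null if absent (like Schema.getField returning null).
  Integer lookup(String name) {
    lookups++;
    return positions.get(name);
  }
}

final class LookupComparison {
  // Original pattern: a hasField-style check, then a get-by-name, i.e. two lookups per present field.
  static Object[] projectTwoLookups(CountingSchema source, Object[] values, List<String> target) {
    Object[] out = new Object[target.size()];
    for (int i = 0; i < target.size(); i++) {
      String name = target.get(i);
      if (source.lookup(name) != null) {      // like record.hasField(name)
        out[i] = values[source.lookup(name)]; // like record.get(name)
      }
    }
    return out;
  }

  // Suggested pattern: resolve the source field once, then read by position.
  static Object[] projectOneLookup(CountingSchema source, Object[] values, List<String> target) {
    Object[] out = new Object[target.size()];
    for (int i = 0; i < target.size(); i++) {
      Integer pos = source.lookup(target.get(i)); // like record.getSchema().getField(name)
      out[i] = (pos == null) ? null : values[pos];
    }
    return out;
  }
}
```

For a target with one missing and two present fields, the first loop performs five lookups and the second three, while both produce the same projected values.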

Contributor Author

Fixed

@kamronis kamronis force-pushed the master branch 2 times, most recently from 82f4c1c to 7a08b71 on December 26, 2025 03:36
@kamronis
Contributor Author

@danny0405 @cshuo
Please take a look. CI keeps failing, but everything passes locally for me.

* the schemas are identical in field count.
*/

public static GenericRecord projectRecordToNewSchemaShallow(GenericRecord record, Schema targetSchema) {
Collaborator

It's safer to keep the parameter type as IndexedRecord, as rewriteRecordWithNewSchema does, to avoid unnecessary type coercion in HoodieAvroRecord.
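The reasoning behind keeping the broader parameter type can be illustrated with hypothetical interfaces: `PositionalRecord` mirrors the shape of Avro's `IndexedRecord`, and `NamedRecord` mirrors `GenericRecord`, which extends it. This is an assumption-labeled sketch, not the Hudi signature. A method declared on the narrower type forces callers that only hold the broader type to cast, and that cast fails at runtime for implementations that do not provide the narrower interface.

```java
// Hypothetical interface mirroring the shape of Avro's IndexedRecord (positional access).
interface PositionalRecord {
  Object get(int pos);
}

// Hypothetical interface mirroring GenericRecord, which adds by-name access.
interface NamedRecord extends PositionalRecord {
  Object get(String name);
}

// A hypothetical record type that only supports positional access.
final class ArrayRecord implements PositionalRecord {
  private final Object[] values;

  ArrayRecord(Object... values) {
    this.values = values;
  }

  @Override
  public Object get(int pos) {
    return values[pos];
  }
}

final class ParameterTypes {
  // Accepting the broader interface works for every implementation without a cast.
  static Object firstFieldBroad(PositionalRecord record) {
    return record.get(0);
  }

  // Accepting the narrower interface forces callers holding a PositionalRecord to cast.
  static Object firstFieldNarrow(NamedRecord record) {
    return record.get(0);
  }

  // The cast compiles but throws ClassCastException for implementations such as ArrayRecord.
  static Object callNarrowFrom(PositionalRecord record) {
    return firstFieldNarrow((NamedRecord) record);
  }
}
```

Both methods only need positional access, so the broader type costs nothing and removes the cast from every call site.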

Contributor Author

Fixed

@hudi-bot
Collaborator

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@cshuo
Collaborator

cshuo commented Dec 29, 2025

Thanks for contributing, LGTM.

@wombatu-kun wombatu-kun merged commit d640774 into apache:master Dec 30, 2025
69 of 72 checks passed

Labels

size:M PR with lines of changes in (100, 300]

Development

Successfully merging this pull request may close these issues.

Check all the usages of HoodieAvroUtils.rewriteRecordWithNewSchema to make it performant

4 participants