-
Notifications
You must be signed in to change notification settings - Fork 2.5k
refactor: Replace HoodieHadoopStorage instantiation with HoodieStorageFactory #17661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Replace HoodieHadoopStorage instantiation with HoodieStorageFactory #17661
Conversation
4c50af2 to
5c00a52
Compare
| * @return {@link HoodieStorage} instance | ||
| * @throws HoodieException if unable to create the storage instance | ||
| */ | ||
| public static HoodieStorage getStorage(StorageConfiguration<?> conf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove HoodieStorageUtils and directly use HoodieStorageFactory for construction? Move all instantiation logic into HoodieStorageFactory.
|
@CTTY could you review this PR? |
|
Hi @KiteSoar @yihua , I'm having second thoughts on this change now. Initially I was thinking of something like: And then we can slowly migrate the entry point of storage from just It makes more sense to me right now that we don't add a new I still want to stop people from using the Please let me know what you think! |
Using existing |
We should revisit the places to instantiate the |
Thanks for the review @CTTY @yihua. |
5c00a52 to
6fe4b19
Compare
yihua
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CTTY
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @KiteSoar , I just did another round of review. The approach looks good in general while I have some concerns about introducing a new util method that accepts Object[] params to instantiate storage classes. We should assume that a storage implementation only has constructor that takes {StoragePath.class, StorageConfiguration.class}
hudi-common/src/main/java/org/apache/hudi/storage/HoodieStorageUtils.java
Outdated
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/SchemaEvolutionContext.java
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/SchemaEvolutionContext.java
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java
Outdated
Show resolved
Hide resolved
| @@ -91,7 +93,10 @@ void testFetchNextBatchFromSource(Boolean useRowWriter, Boolean hasTransformer, | |||
| Boolean isNullTargetSchema, Boolean hasErrorTable, Boolean shouldTryWriteToErrorTable) { | |||
| //basic deltastreamer inputs | |||
| HoodieSparkEngineContext hoodieSparkEngineContext = mock(HoodieSparkEngineContext.class); | |||
| HoodieStorage storage = new HoodieHadoopStorage(mock(FileSystem.class)); | |||
| HoodieStorage storage = HoodieStorageUtils.getStorage( | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we can also use HoodieStorageUtils to get storage instance in tests like this(not with the FileSystem). But I'm not sure how feasible it is. If it's quite tricky then we can just use HoodieHadoopStorage constructor for tests
...t-common/src/main/java/org/apache/hudi/index/bucket/partition/PartitionBucketIndexUtils.java
Outdated
Show resolved
Hide resolved
Hi @CTTY, |
CTTY
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I'm ok with using HoodieHadoopStorage in tests. Thanks for the deep dive!
We can merge after the CI turns green
CTTY
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm looks like there are some build issues, could you fix it?
I’ve fixed it. Thanks for your help with my first Hudi contribution! |
CTTY
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI is green, let's go!
Describe the issue this Pull Request addresses
This PR refactors the codebase to use HoodieStorageFactory instead of direct HoodieHadoopStorage instantiation, improving code maintainability and extensibility.
closes #17549
Summary and Changelog
Summary:
Users and developers will benefit from a more maintainable and extensible storage layer architecture. The factory pattern allows for easier switching between different storage implementations in the future.
Changelog:
new HoodieHadoopStorage()instantiations with HoodieStorageFactory.getStorage() callsImpact
Public API Changes: None - This is an internal refactoring only.
User-Facing Changes: None - No behavioral changes for end users.
Performance Impact: Negligible - The factory pattern adds minimal overhead (one additional method call).
Risk Level
Low
Mitigation:
Documentation Update
none - This is an internal code refactoring with no user-facing changes, configuration updates, or new features requiring documentation.
Contributor's checklist