Add support for use assertDataFrameDataEquals with map types #473
iburquijo wants to merge 10 commits into
Thanks for working on this @iburquijo

@CodeRabbit review

Pull request overview
This PR adds support for comparing DataFrames containing Map types in the assertDataFrameDataEquals method by converting Maps to arrays of structs before comparison. This is necessary because MapType is not orderable in Spark.
Key Changes:
- Introduced `convertMapToArrayStruct` method that recursively converts Map types to arrays of structs using SQL expressions
- Modified `assertDataFrameDataEquals` to apply map conversion before comparison
- Added comprehensive tests for the new functionality including schema validation and data comparison
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| core/src/main/2.4/scala/com/holdenkarau/spark/testing/DataFrameSuiteBase.scala | Added private convertMapToArrayStruct method and integrated it into assertDataFrameDataEquals to handle Map types by converting them to arrays of structs |
| core/src/test/2.4/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala | Added test case classes with nested Maps and two test cases: one verifying schema conversion and another validating end-to-end comparison with Maps |
80de222 to 4e863ef
8e71da1 to a033b69
@holdenk I've already reviewed and implemented the changes proposed by Copilot. Thank you, and sorry for my late review.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@iburquijo I think this largely looks good. The one catch is that I've updated the main branch so we support DataFrame equality tests on Spark Connect, so there are some conflicts with the RDD code path. If you've got the time to update this, that would be awesome; if not, no worries, I can take a crack at it.

@holdenk Sure, no problem. I'll take a look and resolve the conflicts!

@holdenk I've already resolved the conflicts!

References issue #444
These changes propose converting Map types in the DataFrame schema to arrays of structs so that two DataFrames can be compared, because MapType is not orderable.
I added the private method convertMapToArrayStruct, which is compatible with all current Spark versions because it uses SQL expressions instead of the Spark API. (We could move to the Spark API in versions greater than 3.0.0.)
I also added two new tests.
All unit tests pass on my laptop with different Java versions, and in GitHub Actions.
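
For readers who have not seen the PR diff, the idea behind the conversion can be sketched roughly as follows. This is not the code from this PR: `mapToSortedEntries` is a hypothetical helper written for illustration, it handles only a single top-level map column (no recursion into nested maps), and it assumes Spark 2.4 or later so that the SQL functions `arrays_zip`, `map_keys`, `map_values` and `sort_array` are available.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object MapComparisonSketch {
  // Hypothetical helper, NOT the PR's convertMapToArrayStruct: replaces one
  // top-level map column with a sorted array of (key, value) structs.
  // Arrays of structs are orderable when their fields are, unlike MapType.
  def mapToSortedEntries(df: DataFrame, column: String): DataFrame =
    df.withColumn(
      column,
      expr(s"sort_array(arrays_zip(map_keys(`$column`), map_values(`$column`)))"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-comparison-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same logical content, different key insertion order.
    val left = Seq((1, Map("a" -> 1, "b" -> 2))).toDF("id", "attrs")
    val right = Seq((1, Map("b" -> 2, "a" -> 1))).toDF("id", "attrs")

    // Convert both sides first, then compare with set operations.
    val l = mapToSortedEntries(left, "attrs")
    val r = mapToSortedEntries(right, "attrs")
    val differences = l.except(r).count() + r.except(l).count()
    println(s"rows that differ: $differences")

    spark.stop()
  }
}
```

Sorting the zipped entries makes the result independent of the order in which keys were inserted, so two maps with the same content should compare as equal after conversion. Using `expr` with SQL function names rather than the typed column functions mirrors the compatibility argument made in the description above.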