Add support for using assertDataFrameDataEquals with map types #473

Open
iburquijo wants to merge 10 commits into holdenk:main from iburquijo:map_dataframe_equals

Conversation

@iburquijo

References issue #444

This change proposes converting Map types in the DataFrame schema to arrays of structs so that two DataFrames can be compared. This is needed because MapType is not orderable.

I added the private method convertMapToArrayStruct, which is compatible with all current Spark versions because it uses SQL expressions instead of the Spark API. (We could move to the Spark API in versions greater than 3.0.0.)

I also added two new tests.

  • The first checks that the method properly converts the types into arrays of structs.
  • The second checks that assertDataFrameDataEquals works properly: it compares the data and does not raise an exception when map types are present in the DataFrame.

All unit tests pass on my laptop with different Java versions, and in GitHub Actions.
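
As a rough illustration of the conversion described in this comment (not the code in this PR), the sketch below rewrites each top-level map column as a sorted array of key/value structs through a SQL expression. The object and method names are hypothetical. It assumes a Spark version whose SQL dialect provides `map_entries`, simple column names, and orderable key and value types. Sorting the entries is an assumption added here so that two maps with the same content compare equal regardless of entry order. Nested maps are not handled.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, expr}
import org.apache.spark.sql.types.MapType

object MapToArraySketch {
  // Hypothetical helper, not the PR's convertMapToArrayStruct:
  // only top-level map columns are converted.
  def mapsToSortedEntries(df: DataFrame): DataFrame = {
    val projected: Array[Column] = df.schema.fields.map { field =>
      field.dataType match {
        case _: MapType =>
          // map_entries yields array<struct<key, value>>; sort_array makes
          // the entry order deterministic (assumes orderable keys/values).
          expr(s"sort_array(map_entries(`${field.name}`))").as(field.name)
        case _ =>
          col(field.name)
      }
    }
    df.select(projected: _*)
  }
}
```

After this projection the map columns are ordinary arrays of structs, so the rows can be sorted and compared like any other orderable data.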

@holdenk
Owner

holdenk commented Dec 22, 2025

Thanks for working on this @iburquijo

@holdenk
Owner

holdenk commented Dec 22, 2025

@CodeRabbit review


Copilot AI left a comment

Pull request overview

This PR adds support for comparing DataFrames containing Map types in the assertDataFrameDataEquals method by converting Maps to arrays of structs before comparison. This is necessary because MapType is not orderable in Spark.

Key Changes:

  • Introduced convertMapToArrayStruct method that recursively converts Map types to arrays of structs using SQL expressions
  • Modified assertDataFrameDataEquals to apply map conversion before comparison
  • Added comprehensive tests for the new functionality including schema validation and data comparison
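
The recursive part of the change can be pictured at the schema level. The sketch below is an illustration only, with hypothetical names, and is not the PR's implementation: it computes the data type one would expect after every MapType, at any nesting depth, has been replaced by an array of key/value structs. The nullability flags follow what `map_entries` reports for its result type, which is an assumption about how the real method behaves.

```scala
import org.apache.spark.sql.types._

object MapSchemaSketch {
  // Hypothetical helper: expected type once all maps become
  // array<struct<key, value>>, including maps nested in structs or arrays.
  def withoutMaps(dataType: DataType): DataType = dataType match {
    case MapType(keyType, valueType, valueContainsNull) =>
      ArrayType(
        StructType(Seq(
          StructField("key", withoutMaps(keyType), nullable = false),
          StructField("value", withoutMaps(valueType), nullable = valueContainsNull))),
        containsNull = false)
    case ArrayType(elementType, containsNull) =>
      ArrayType(withoutMaps(elementType), containsNull)
    case StructType(fields) =>
      StructType(fields.map(f => f.copy(dataType = withoutMaps(f.dataType))))
    case other =>
      other
  }
}
```

A schema-validation test in this spirit could compare `withoutMaps(df.schema)` against the schema of the converted DataFrame.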

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| core/src/main/2.4/scala/com/holdenkarau/spark/testing/DataFrameSuiteBase.scala | Added private convertMapToArrayStruct method and integrated it into assertDataFrameDataEquals to handle Map types by converting them to arrays of structs |
| core/src/test/2.4/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala | Added test case classes with nested Maps and two test cases: one verifying schema conversion and another validating end-to-end comparison with Maps |

Comment thread core/src/test/2.4/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala Outdated
Comment thread core/src/main/2.4/scala/com/holdenkarau/spark/testing/DataFrameSuiteBase.scala Outdated
Comment thread core/src/test/2.4/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala Outdated
Comment thread core/src/main/2.4/scala/com/holdenkarau/spark/testing/DataFrameSuiteBase.scala Outdated
Comment thread core/src/test/2.4/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala Outdated
Comment thread core/src/main/2.4/scala/com/holdenkarau/spark/testing/DataFrameSuiteBase.scala Outdated
Comment thread core/src/test/2.4/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala Outdated
iburquijo force-pushed the map_dataframe_equals branch from 80de222 to 4e863ef on February 4, 2026 13:18
iburquijo force-pushed the map_dataframe_equals branch from 8e71da1 to a033b69 on February 4, 2026 13:24
@iburquijo
Author

@holdenk I have already reviewed and implemented the changes proposed by Copilot. Thank you, and sorry for my late reply.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@holdenk
Owner

holdenk commented Apr 20, 2026

@iburquijo I think this largely looks good. The one trick is that I've updated the main branch so that we support DataFrame equality tests on Spark Connect, so there are some conflicts with the RDD code path. If you've got the time to update this, that would be awesome; if not, no worries, I can take a crack at it.

@iburquijo
Author

@holdenk sure no problem. I'll take a look and resolve the conflicts!!

@iburquijo
Author

@holdenk I've already resolved the conflicts!
