Enhancement suggestion for assertSmallDataFrameEquality
Hi:
When `assertSmallDataFrameEquality` fails, the error message prints the top 5 rows of each `Dataset`. This makes it extremely difficult to tell what exactly is different, and therefore it’s hard to know what to correct.
I found some code on Stack Overflow that would diff the `Dataset`s and show the differences. In my case, I added some code based on this to run `show`s on the differences and was able to determine what they were, although the output was tough to spot amongst all the Spark job output.
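For readers who want to try the same idea, here is a minimal sketch of that kind of diff. It is not the Stack Overflow code referred to above and not part of spark-fast-tests; the object name, the helper `onlyIn`, and the sample data are made up for illustration. It assumes both frames share a schema and uses Spark's `except`, which has set semantics, so duplicate rows are collapsed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameDiffSketch {
  // Hypothetical helper: rows present in `left` but absent from `right`.
  // `except` compares whole rows and drops duplicates.
  def onlyIn(left: DataFrame, right: DataFrame): DataFrame = left.except(right)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("df-diff-sketch")
      .getOrCreate()
    // Quiet the Spark job output so the diff is easier to spot.
    spark.sparkContext.setLogLevel("ERROR")
    import spark.implicits._

    // Illustrative data: the two frames differ only in the row with id 2.
    val actual   = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
    val expected = Seq((1, "a"), (2, "x"), (3, "c")).toDF("id", "value")

    println("Rows only in actual:")
    onlyIn(actual, expected).show(truncate = false)
    println("Rows only in expected:")
    onlyIn(expected, actual).show(truncate = false)

    spark.stop()
  }
}
```

If duplicate rows matter for the comparison, `exceptAll` (available from Spark 2.4) keeps them.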
Perhaps some version of this, maybe one that highlights the differences in a different color within the pretty-printed `Dataset`, could be integrated so it’s much easier to focus in on what’s different?
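To make the suggestion concrete, here is a rough sketch of color highlighting in plain Scala, with no Spark dependency. It is an illustration only, not how spark-fast-tests implements anything. It assumes rows have already been collected into sequences and are compared by position; the names `HighlightDiffSketch` and `render` are hypothetical.

```scala
object HighlightDiffSketch {
  // ANSI escape codes: switch the terminal color to red, then reset it.
  private val Red   = "\u001b[31m"
  private val Reset = "\u001b[0m"

  private def fmt(row: Seq[Any]): String = row.mkString("[", ", ", "]")

  // Pairs rows by position and renders one "actual | expected" line per pair.
  // Lines whose two sides differ are wrapped in red; a missing row is shown as [].
  def render(actual: Seq[Seq[Any]], expected: Seq[Seq[Any]]): Seq[String] =
    actual.zipAll(expected, Seq.empty[Any], Seq.empty[Any]).map { case (a, e) =>
      val line = s"${fmt(a)} | ${fmt(e)}"
      if (a == e) line else s"$Red$line$Reset"
    }

  def main(args: Array[String]): Unit = {
    val actual: Seq[Seq[Any]]   = Seq(Seq(1, "a"), Seq(2, "b"))
    val expected: Seq[Seq[Any]] = Seq(Seq(1, "a"), Seq(2, "x"))
    render(actual, expected).foreach(println)
  }
}
```

The color only shows up in terminals that interpret ANSI escape codes; elsewhere the raw escape characters are printed.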
Thanks, Ken
Issue Analytics
- State:
- Created 5 years ago
- Reactions:2
- Comments:6 (3 by maintainers)
@MrPowers: `jitpack.io` as a resolver worked for me. I’ve pulled in the new release and run some tests with deliberately mismatched data to see them come back as red in the list. This definitely makes it easier to tell which row(s) to look at.
Thanks, Ken
@khampson - Can you try accessing the latest release via JitPack? Here’s the code that should work:
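The snippet itself is missing from this copy of the thread. As a stand-in, the following is a sketch of what a JitPack setup in `build.sbt` usually looks like, not the original code. The coordinates are an assumption based on JitPack's `com.github.<user> % <repo> % <tag>` convention, and `VERSION_TAG` is a placeholder, since the release tag discussed here is not given.

```scala
// build.sbt (sketch under the assumptions stated above)

// Tell sbt to resolve artifacts from JitPack as well as the default repositories.
resolvers += "jitpack" at "https://jitpack.io"

// "VERSION_TAG" is a placeholder: substitute the Git tag of the release you want.
// The trailing % "test" keeps the library on the test classpath only.
libraryDependencies += "com.github.MrPowers" % "spark-fast-tests" % "VERSION_TAG" % "test"
```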
I need to figure out how to upload this to Maven as well. I am going to drop Spark Packages support because that project has been broken for a long time and doesn’t allow users to specify spark-fast-tests as a test dependency (the `% "test"` part is important in the code snippet above!).
Let me know if this works for you!