Inequality when comparing two empty numpy arrays

Comparing two empty numpy arrays currently returns False, so the report shows differences where there are none.

This is due to the way numpy compares empty arrays. Running bool(np.array([]) == np.array([])) returns False and emits this warning:

The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
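
To make the behaviour above concrete, here is a minimal sketch, not taken from the original issue. It shows that the element-wise comparison of two empty arrays is itself an empty boolean array, and that np.array_equal treats the two arrays as equal. The variable names are illustrative only, and the snippet avoids calling bool() on the empty result because, as the warning says, that is expected to become an error.

```python
import numpy as np

a = np.array([])
b = np.array([])

# Element-wise comparison of two empty arrays yields an empty boolean array,
# so there is no single truth value to extract from it.
elementwise = a == b
print(elementwise.shape)  # (0,)

# np.array_equal compares shape and contents and returns a plain bool.
print(np.array_equal(a, b))  # True

# The check suggested by the warning text: test emptiness via .size.
print(a.size > 0)  # False
```
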

Reproduce this bug with:

import numpy as np
import pandas as pd
import datacompy

# Two identical frames whose "some_col" holds empty numpy arrays
df1 = pd.DataFrame({"some_col": [np.array([]) for _ in range(10)], "id": list(range(10))})
df2 = pd.DataFrame({"some_col": [np.array([]) for _ in range(10)], "id": list(range(10))})

pdcompare = datacompy.Compare(df1, df2, join_columns="id")
print(pdcompare.report())

Output:

DataComPy Comparison
--------------------

DataFrame Summary
-----------------

  DataFrame  Columns  Rows
0       df1        2    10
1       df2        2    10

Column Summary
--------------

Number of columns in common: 2
Number of columns in df1 but not in df2: 0
Number of columns in df2 but not in df1: 0

Row Summary
-----------

Matched on: id
Any duplicates on match values: No
Absolute Tolerance: 0
Relative Tolerance: 0
Number of rows in common: 10
Number of rows in df1 but not in df2: 0
Number of rows in df2 but not in df1: 0

Number of rows with some compared columns unequal: 10
Number of rows with all compared columns equal: 0

Column Comparison
-----------------

Number of columns compared with some values unequal: 1
Number of columns compared with all values equal: 1
Total number of values which compare unequal: 10

Columns with Unequal Values or Types
------------------------------------

     Column df1 dtype df2 dtype  # Unequal  Max Diff  # Null Diff
0  some_col    object    object         10         0            0

Sample Rows with Unequal Values
-------------------------------

   id some_col (df1) some_col (df2)
9   9             []             []
0   0             []             []
3   3             []             []
7   7             []             []
5   5             []             []
1   1             []             []
4   4             []             []
2   2             []             []
8   8             []             []
6   6             []             []
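
Until this is handled inside datacompy, one possible workaround is to compare array-valued columns yourself with np.array_equal. The sketch below is an assumption-laden illustration, not datacompy's implementation: array_column_equal is a hypothetical helper, and it assumes both columns are already aligned row by row (for example, after sorting both frames on the join column).

```python
import numpy as np
import pandas as pd


def array_column_equal(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: row-wise equality for columns holding numpy arrays.

    Assumes `left` and `right` have the same length and are aligned by position.
    """
    return pd.Series(
        [np.array_equal(l, r) for l, r in zip(left, right)],
        index=left.index,
    )


df1 = pd.DataFrame({"some_col": [np.array([]) for _ in range(10)], "id": list(range(10))})
df2 = pd.DataFrame({"some_col": [np.array([]) for _ in range(10)], "id": list(range(10))})

# Empty arrays compare equal row by row, so no mismatches are reported.
matches = array_column_equal(df1["some_col"], df2["some_col"])
print(int((~matches).sum()))  # number of rows that differ
```
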

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 26 (16 by maintainers)

Top GitHub Comments

fdosani commented, Apr 13, 2020 (1 reaction)

Hey @simonwongwong hope all is well! I’ll take a closer look at this sometime this week. Thanks for bringing this up and opening up the issue.

fdosani commented, Sep 20, 2021 (0 reactions)

That is perfectly fine, that is something I can lean into.
