
API: how strict should the equals() method be?


While adding an equals() method to ExtensionArray (https://github.com/pandas-dev/pandas/issues/27081, https://github.com/pandas-dev/pandas/pull/30652), some questions came up about how strict the method should be:

  1. Other array-likes are not equivalent, even if all values are equal?
  2. But subclasses are equivalent, when all values are equal?
  3. Objects with a different dtype (e.g. int8 vs int16) are not equivalent, even if all values are equal?

Right now, pandas seems somewhat inconsistent about how strict it is regarding the data type.

Series is strict about the dtype, while Index is not:

>>> pd.Series([1, 2, 3], dtype="int64").equals(pd.Series([1, 2, 3], dtype="int32"))
False

>>> pd.Index([1, 2, 3], dtype="int64").equals(pd.Index([1, 2, 3], dtype="int32"))
True

For Index, this gives True not only for different integer dtypes as above, but also for float vs. int and object vs. int (both of which give False for Series):

>>> pd.Index([1, 2, 3], dtype="int64").equals(pd.Index([1, 2, 3], dtype="object"))
True

>>> pd.Index([1, 2, 3], dtype="int64").equals(pd.Index([1, 2, 3], dtype="float64"))
True
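If you need Series-like dtype strictness for an Index today, one option is to combine `equals()` with an explicit dtype comparison. This is a minimal sketch. `dtype_strict_equals` is a hypothetical helper, not a pandas API.

```python
import pandas as pd


def dtype_strict_equals(left, right):
    """Hypothetical helper: .equals() value check plus an exact dtype match.

    Works for both Series and Index, so Index comparisons become as strict
    about dtype as Series comparisons already are.
    """
    return left.equals(right) and left.dtype == right.dtype


ints = pd.Index([1, 2, 3], dtype="int64")
floats = pd.Index([1, 2, 3], dtype="float64")

print(ints.equals(floats))                # Index.equals does not compare dtype
print(dtype_strict_equals(ints, floats))  # False: int64 vs float64
```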

Index and Series are consistent in not considering other array-likes equal:

import numpy as np
import pandas as pd

# all of these return False
pd.Series([1, 2, 3]).equals(np.array([1, 2, 3]))
pd.Index([1, 2, 3]).equals(np.array([1, 2, 3]))
pd.Series([1, 2, 3]).equals(pd.Index([1, 2, 3]))
pd.Index([1, 2, 3]).equals(pd.Series([1, 2, 3]))
pd.Series([1, 2, 3]).equals([1, 2, 3])
pd.Index([1, 2, 3]).equals([1, 2, 3])
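Because `equals()` rejects other array-likes outright, a values-only comparison needs an explicit conversion first. This is a minimal sketch that assumes only the positional values matter, so index labels, names, container type and dtype are ignored. `values_equal` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def values_equal(left, right):
    """Hypothetical helper: compare only the values, position by position.

    np.asarray() turns a Series, Index, ndarray or list into a NumPy array,
    so container type, index labels, names and dtype are all ignored.
    Note: with np.array_equal's defaults, NaN != NaN (unlike .equals()).
    """
    return np.array_equal(np.asarray(left), np.asarray(right))


print(values_equal(pd.Series([1, 2, 3]), np.array([1, 2, 3])))  # True
print(values_equal(pd.Index([1, 2, 3]), [1, 2, 3]))             # True
print(values_equal(pd.Series([1, 2, 3]), [1, 2, 4]))            # False
```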

Both Index and Series also seem to allow subclasses:

class MySeries(pd.Series): 
    pass 

>>> pd.Series([1, 2, 3]).equals(MySeries([1, 2, 3]))
True
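As far as I can tell, `Series.equals()` only checks that one object is an instance of the other's type (in either direction), which is why a subclass passes. If exact-type strictness were preferred instead, it might look like the following sketch. `exact_type_equals` is hypothetical.

```python
import pandas as pd


class MySeries(pd.Series):
    pass


def exact_type_equals(left, right):
    """Hypothetical helper: like .equals(), but subclasses are not accepted."""
    return type(left) is type(right) and left.equals(right)


base = pd.Series([1, 2, 3])
sub = MySeries([1, 2, 3])

print(base.equals(sub))              # True: subclasses are accepted
print(exact_type_equals(base, sub))  # False: Series vs MySeries
```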

So in the end, I think the main discussion point is: should the dtype be exactly the same, or should only the values be equal?

DataFrame shares its implementation with Series and so follows the same behaviour (except that DataFrame has some additional rules about how column names need to compare equal).
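For comparison with the dtype question, the public testing helpers already make dtype strictness an explicit choice through the `check_dtype` parameter of `pandas.testing.assert_series_equal`. The sketch below shows both settings on the int64 vs int32 example from above.

```python
import pandas as pd
from pandas.testing import assert_series_equal

left = pd.Series([1, 2, 3], dtype="int64")
right = pd.Series([1, 2, 3], dtype="int32")

# Values-only comparison: the dtype difference is tolerated.
assert_series_equal(left, right, check_dtype=False)

# Default (check_dtype=True): the int64 vs int32 mismatch raises.
try:
    assert_series_equal(left, right)
except AssertionError as exc:
    print("dtype mismatch detected:", type(exc).__name__)
```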

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

1 reaction
attack68 commented, Sep 2, 2020

Just to add to this, from #36065: the equality assertions in the testing module also behave inconsistently:

import pandas as pd
import pandas._testing as tm

i1 = pd.date_range("2008-01-01", periods=1000, freq="12h")
i2 = pd.date_range("2008-01-01", periods=1000, freq="12h")
i2.freq = None

i1.equals(i2)  # True
tm.assert_index_equal(i1, i2)  # passes

df = pd.DataFrame({'dates': i1})
df2 = pd.DataFrame({'dates': i2})

df.equals(df2)  # True
tm.assert_frame_equal(df, df2)  # passes

df.index = i1
df2.index = i2

df.equals(df2)  # True
tm.assert_frame_equal(df, df2)  # FAILS: lidx.freq != ridx.freq

The reason is that assert_series_equal, after checking the index, also checks the frequency of the index. A very easy change would be to move the freq check into assert_index_equal (controlled by an argument) and remove it from assert_series_equal.
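The change proposed above could look roughly like the following wrapper. This is a sketch only: `assert_index_equal_with_freq` and its `check_freq` argument are hypothetical names, and only `pandas.testing.assert_index_equal` is a real pandas function.

```python
import pandas as pd
from pandas.testing import assert_index_equal


def assert_index_equal_with_freq(left, right, check_freq=True, **kwargs):
    """Hypothetical wrapper: assert_index_equal plus an optional freq check.

    Extra keyword arguments are passed through to assert_index_equal.
    """
    assert_index_equal(left, right, **kwargs)
    if check_freq:
        # Only datetime-like indexes have a .freq attribute; default to None.
        lfreq = getattr(left, "freq", None)
        rfreq = getattr(right, "freq", None)
        assert lfreq == rfreq, f"freq differs: {lfreq!r} != {rfreq!r}"


i1 = pd.date_range("2008-01-01", periods=5, freq="D")
i2 = pd.DatetimeIndex(list(i1))  # same values, but freq is None

assert_index_equal_with_freq(i1, i2, check_freq=False)  # values only: passes
```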

1 reaction
dsaxton commented, May 3, 2020

It feels to me like, when calling equals on some container, you’d generally be interested in equality of the entire container and not just the values; otherwise you’d use == (maybe modulo some differences around missing values). You might argue that even this behavior is questionable:

>>> pd.Series([1, 2, 3], name="abc").equals(pd.Series([1, 2, 3], name="xyz"))
True
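Treating `equals()` as whole-container equality in this sense would mean also comparing metadata such as `name`. This is a minimal sketch. `equals_with_name` is hypothetical and adds only the name check.

```python
import pandas as pd


def equals_with_name(left, right):
    """Hypothetical helper: Series.equals() plus a check that names match."""
    return left.equals(right) and left.name == right.name


a = pd.Series([1, 2, 3], name="abc")
b = pd.Series([1, 2, 3], name="xyz")

print(a.equals(b))             # True: names are not compared
print(equals_with_name(a, b))  # False: "abc" != "xyz"
```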