API: how strict should the equals() method be?
While adding an equals() method to ExtensionArray (https://github.com/pandas-dev/pandas/issues/27081, https://github.com/pandas-dev/pandas/pull/30652), some questions came up about how strict the method should be:
- Other array-likes are not equivalent, even if all their values are equal?
- But subclasses are equivalent, if all their values are equal?
- Objects with a different dtype (e.g. int8 vs int16) are not equivalent, even if all values are equal?
Right now, pandas seems to be somewhat inconsistent about how strict it is regarding the data type.
Series is strict about the dtype, while Index is not:
```python
>>> import pandas as pd
>>> pd.Series([1, 2, 3], dtype="int64").equals(pd.Series([1, 2, 3], dtype="int32"))
False
>>> pd.Index([1, 2, 3], dtype="int64").equals(pd.Index([1, 2, 3], dtype="int32"))
True
```
For Index, this gives True not only for different integer dtypes as above, but also for float vs. int and object vs. int (both of these examples give False for Series):
```python
>>> pd.Index([1, 2, 3], dtype="int64").equals(pd.Index([1, 2, 3], dtype="object"))
True
>>> pd.Index([1, 2, 3], dtype="int64").equals(pd.Index([1, 2, 3], dtype="float64"))
True
```
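The two policies being compared can be illustrated without pandas. The snippet below is a toy model, not pandas code: it uses the standard-library `array` type, whose typecode stands in for a dtype, and the hypothetical helpers `values_equal` and `strict_equal` stand for the current Index-like and Series-like behaviour respectively.

```python
from array import array


def values_equal(left, right):
    """Lenient policy (Index-like): only the element values are compared."""
    return len(left) == len(right) and all(a == b for a, b in zip(left, right))


def strict_equal(left, right):
    """Strict policy (Series-like): the 'dtype' (here the typecode) must match too."""
    return left.typecode == right.typecode and values_equal(left, right)


# Hypothetical stand-ins for int8, int16 (typically) and float64 data with equal values.
int8 = array("b", [1, 2, 3])
int16 = array("h", [1, 2, 3])
float64 = array("d", [1.0, 2.0, 3.0])

print(values_equal(int8, int16), strict_equal(int8, int16))      # True False
print(values_equal(int8, float64), strict_equal(int8, float64))  # True False
```

Either policy is simple to implement; the open question is which one `equals()` should follow.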
Index and Series are consistent in never considering other array-likes equal:
```python
import numpy as np
import pandas as pd

# all of these return False
pd.Series([1, 2, 3]).equals(np.array([1, 2, 3]))
pd.Index([1, 2, 3]).equals(np.array([1, 2, 3]))
pd.Series([1, 2, 3]).equals(pd.Index([1, 2, 3]))
pd.Index([1, 2, 3]).equals(pd.Series([1, 2, 3]))
pd.Series([1, 2, 3]).equals([1, 2, 3])
pd.Index([1, 2, 3]).equals([1, 2, 3])
```
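The rule that a container never equals a different kind of array-like, even with identical values, amounts to a type check that runs before any values are compared. A minimal sketch using a hypothetical `ToyIndex` class (an illustration, not the pandas implementation):

```python
class ToyIndex:
    """Hypothetical stand-in for an Index-like container."""

    def __init__(self, values):
        self._values = list(values)

    def equals(self, other):
        # Reject anything that is not a ToyIndex (or a subclass) up front,
        # so lists, tuples and other array-likes are never equal.
        if not isinstance(other, ToyIndex):
            return False
        return self._values == other._values


idx = ToyIndex([1, 2, 3])
print(idx.equals(ToyIndex([1, 2, 3])))  # True
print(idx.equals([1, 2, 3]))            # False: plain list
print(idx.equals((1, 2, 3)))            # False: tuple
```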
Both Index and Series also seem to allow subclasses:
```python
>>> class MySeries(pd.Series):
...     pass
...
>>> pd.Series([1, 2, 3]).equals(MySeries([1, 2, 3]))
True
```
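Whether subclasses should count as equivalent comes down to which type check is used. The sketch below contrasts two hypothetical predicates: `compatible_types` accepts a subclass in either direction (the permissive behaviour shown above), while `same_type` requires exactly the same class.

```python
def compatible_types(left, right):
    """Permissive check: either object may be an instance of the other's class."""
    return isinstance(right, type(left)) or isinstance(left, type(right))


def same_type(left, right):
    """Strict check: both objects must have exactly the same class."""
    return type(left) is type(right)


class Base:
    pass


class Sub(Base):
    pass


print(compatible_types(Base(), Sub()), same_type(Base(), Sub()))  # True False
print(compatible_types(Base(), [1]), same_type(Base(), [1]))      # False False
```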
So in the end, I think the main discussion point is: should the dtype be exactly the same, or should only the values be equal?
DataFrame shares its implementation with Series, so it follows the same behaviour (except that DataFrame has some additional rules about how the column names need to compare equal).
Top GitHub Comments
Just to add to this: as shown in #36065, the equality assertions in the testing module also behave inconsistently:
The reason is that assert_series_equal, after checking the index, also checks the frequency (freq) of the index. A very easy change would be to add the freq check to assert_index_equal behind an argument and remove it from assert_series_equal.
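The proposal in the comment above, moving the freq check into the index assertion behind an argument and having the series assertion delegate to it, could look roughly like the following. This is a hypothetical, stdlib-only sketch: `ToyIndex`, `ToySeries`, `toy_assert_index_equal`, `toy_assert_series_equal` and their `check_freq` argument are illustrative names, not the `pandas.testing` API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyIndex:
    values: tuple
    freq: Optional[str] = None  # e.g. "D" for a daily range, None if unknown


@dataclass
class ToySeries:
    values: tuple
    index: ToyIndex


def toy_assert_index_equal(left, right, check_freq=True):
    if left.values != right.values:
        raise AssertionError("index values differ")
    # The freq comparison now lives here, behind an argument...
    if check_freq and left.freq != right.freq:
        raise AssertionError(f"index freq differs: {left.freq!r} != {right.freq!r}")


def toy_assert_series_equal(left, right, check_freq=True):
    # ...and the series assertion only delegates to it, so both agree.
    toy_assert_index_equal(left.index, right.index, check_freq=check_freq)
    if left.values != right.values:
        raise AssertionError("series values differ")


daily = ToyIndex((1, 2, 3), freq="D")
no_freq = ToyIndex((1, 2, 3))
toy_assert_index_equal(daily, no_freq, check_freq=False)  # passes
try:
    toy_assert_index_equal(daily, no_freq)
except AssertionError as exc:
    print(exc)  # index freq differs: 'D' != None
```

With the check in one place, the index and series assertions cannot disagree about freq.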
It feels to me like, when calling equals on a container, you'd generally be interested in the equality of the entire container, not just the values; otherwise you'd use == (maybe modulo some differences around missing values). You might argue that even this behavior is questionable: