assert_frame_equal not differentiating NaN and None within object dtype

Code Sample, a copy-pastable example if possible

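import numpy as np
import pandas as pd
import pandas.testing as tm  # public module that exposes assert_frame_equal
df1 = pd.DataFrame([['foo', 'bar', 'baz'], [None, None, None]])
df2 = pd.DataFrame([['foo', 'bar', 'baz'], [np.nan, np.nan, np.nan]])
tm.assert_frame_equal(df1, df2)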

Problem description

No AssertionError is raised in this case, even though I would not expect None and np.nan to be considered equal values. (I found this while testing #18450.)

Note that if you simply compared a DataFrame containing only np.nan with another containing only None, you would get an AssertionError because the dtypes differ (float64 vs object). Here the missing values are mixed into object-dtype columns, so I think that differentiation gets lost.
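
The dtype point above can be checked with a small sketch. This is only an illustration, not part of the original report: the variable names are made up here, and `dtype=object` is passed explicitly in the mixed case as an assumption, so that the result does not depend on how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# All-missing frames: the two sentinels lead to different inferred dtypes.
only_nan = pd.DataFrame([[np.nan, np.nan, np.nan]])
only_none = pd.DataFrame([[None, None, None]])
print(only_nan.dtypes.unique())
print(only_none.dtypes.unique())

# Mixed with strings (object dtype forced explicitly, an assumption of this sketch):
# both frames now report the same dtype, so a dtype check cannot tell them apart.
with_none = pd.DataFrame([["foo", "bar", "baz"], [None, None, None]], dtype=object)
with_nan = pd.DataFrame([["foo", "bar", "baz"], [np.nan, np.nan, np.nan]], dtype=object)
print(with_none.dtypes.equals(with_nan.dtypes))

# The cells themselves still hold distinct Python objects.
print(with_none.iloc[1, 0] is None)
print(isinstance(with_nan.iloc[1, 0], float))
```

The last two lines look at the raw cell objects, which is the distinction the report expects the assertion to respect.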

Expected Output

AssertionError: DataFrame.iloc[1, :] are different

Output of pd.show_versions()

INSTALLED VERSIONS

commit: d64995a4fcc3269dff4366988230563b8aeffb9f
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0.dev0+205.gd64995a4f
pytest: 3.2.5
pip: 9.0.1
setuptools: 36.4.0
Cython: 0.26
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

jreback commented, Nov 24, 2017 (1 reaction)

From the example above: this result is actually wrong, and we should check this strictly.

In [18]: df1.equals(df2)
Out[18]: True
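
Until a strict check exists, one possible workaround is to compare which kind of missing-value sentinel each cell holds. The following is a minimal sketch only; `missing_kind` and `same_missing_kinds` are hypothetical helper names invented for this illustration, not pandas functions, and `pd.NA` requires pandas 1.0 or later.

```python
import numpy as np
import pandas as pd


def missing_kind(value):
    """Label a cell by the missing-value sentinel it holds (hypothetical helper)."""
    if value is None:
        return "None"
    if value is pd.NA:
        return "NA"
    if isinstance(value, float) and np.isnan(value):
        return "NaN"
    return "value"


def same_missing_kinds(left, right):
    """Return True when both frames use the same sentinel in every cell."""
    if left.shape != right.shape:
        return False
    left_kinds = [[missing_kind(v) for v in row] for row in left.to_numpy(dtype=object)]
    right_kinds = [[missing_kind(v) for v in row] for row in right.to_numpy(dtype=object)]
    return left_kinds == right_kinds


# dtype=object is passed explicitly so the sentinels are stored as given.
df1 = pd.DataFrame([["foo", "bar", "baz"], [None, None, None]], dtype=object)
df2 = pd.DataFrame([["foo", "bar", "baz"], [np.nan, np.nan, np.nan]], dtype=object)

print(df1.equals(df2))               # the thread above reports True for this comparison
print(same_missing_kinds(df1, df2))  # the sentinels differ in row 1
```

The helper only inspects sentinels, so it is meant to be used alongside a regular value comparison, not as a replacement for it.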

TomAugspurger commented, Oct 9, 2020 (0 reactions)

I think this also affects pd.NA.

In [45]: a = pd.Series([1, np.nan], dtype=object)

In [46]: b = pd.Series([1, pd.NA], dtype=object)

In [47]: pd.testing.assert_series_equal(a, b)

A keyword to control this would be great. Based on the recent experience with check_freq, we should perhaps not make things strict by default. But we could do non-strict + a warning by default, with the option to override in the function or via a global option.

# Sketch of the proposal; "testing.check_na" is a suggested option, not an existing one.
def assert_series_equal(..., check_na=None):
    if check_na is None:
        check_na = pandas.get_option("testing.check_na")
    if check_na is None:
        warnings.warn("default to false, changing to strict in the future")
        check_na = False
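
The keyword flow proposed above can be tried out in a self-contained form. This is a rough sketch under stated assumptions, not pandas code: `DEFAULT_CHECK_NA` is a module-level stand-in for the suggested global option, `assert_values_equal` is a hypothetical function written for this illustration, and the comparison is done element by element without calling the pandas testing functions.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical stand-in for the proposed global option (no such pandas option is assumed).
DEFAULT_CHECK_NA = None


def _na_kind(value):
    """Return a label for the missing-value sentinel, or None for a regular value."""
    if value is None:
        return "None"
    if value is pd.NA:
        return "NA"
    if isinstance(value, float) and np.isnan(value):
        return "NaN"
    return None


def assert_values_equal(left, right, check_na=None):
    """Compare two Series element-wise; check_na=True also compares sentinel kinds."""
    if check_na is None:
        check_na = DEFAULT_CHECK_NA
    if check_na is None:
        warnings.warn(
            "check_na defaults to False; it may become strict in the future",
            FutureWarning,
            stacklevel=2,
        )
        check_na = False
    if len(left) != len(right):
        raise AssertionError("lengths differ")
    for pos, (lval, rval) in enumerate(zip(left, right)):
        lkind, rkind = _na_kind(lval), _na_kind(rval)
        if lkind is None and rkind is None:
            if lval != rval:
                raise AssertionError(f"values differ at position {pos}")
        elif lkind is None or rkind is None:
            raise AssertionError(f"missing vs. present value at position {pos}")
        elif check_na and lkind != rkind:
            raise AssertionError(f"{lkind} != {rkind} at position {pos}")


a = pd.Series([1, np.nan], dtype=object)
b = pd.Series([1, pd.NA], dtype=object)

assert_values_equal(a, b, check_na=False)  # non-strict: any two missing values match
try:
    assert_values_equal(a, b, check_na=True)
except AssertionError as exc:
    print("strict check failed:", exc)
```

Calling the function without `check_na` takes the non-strict path and emits the warning, which mirrors the "non-strict + a warning by default" idea in the comment.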
