Should DataFrame.merge match NaN with NaN?

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, np.nan]})
df2 = pd.DataFrame({'c': [6, 7, 8, 9], 'd': [4, np.nan, np.nan, 5]})
df1.merge(df2, how='left', left_on='b', right_on='d')

Problem description

df1:

|   | a | b   |
|---|---|-----|
| 0 | 1 | 4.0 |
| 1 | 2 | 5.0 |
| 2 | 3 | NaN |

df2:

|   | c | d   |
|---|---|-----|
| 0 | 6 | 4.0 |
| 1 | 7 | NaN |
| 2 | 8 | NaN |
| 3 | 9 | 5.0 |

Current output:

|   | a | b   | c | d   |
|---|---|-----|---|-----|
| 0 | 1 | 4.0 | 6 | 4.0 |
| 1 | 2 | 5.0 | 9 | 5.0 |
| 2 | 3 | NaN | 7 | NaN |
| 3 | 3 | NaN | 8 | NaN |

Expected Output

|   | a | b   | c | d   |
|---|---|-----|---|-----|
| 0 | 1 | 4.0 | 6 | 4.0 |
| 1 | 2 | 5.0 | 9 | 5.0 |

What’s happening is that the NaN in df1.b is matching the NaNs in df2.d.

I don’t see a situation in which this would be desirable behavior, but if such a situation exists, surely the opposite is also conceivable, and so there should be some documented option in DataFrame.merge which accomplishes this.
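Until such an option exists, one possible workaround is to drop the NaN keys before merging, so there is nothing left for them to match. The following is only a sketch using the frames from the code sample above; the variable names `default`, `no_nan_match`, and `matched_only` are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, np.nan]})
df2 = pd.DataFrame({'c': [6, 7, 8, 9], 'd': [4, np.nan, np.nan, 5]})

# Behaviour reported in this issue: the NaN key in df1.b is paired
# with every NaN key in df2.d.
default = df1.merge(df2, how='left', left_on='b', right_on='d')

# Workaround (an assumption about what is wanted, not an official option):
# remove rows with a NaN key from the right frame first. The left join
# still keeps row a=3, but its c and d columns are left unmatched (NaN).
no_nan_match = df1.merge(
    df2.dropna(subset=['d']), how='left', left_on='b', right_on='d'
)

# To keep only rows with a real key match, as in the expected output above,
# drop NaN keys on both sides and use an inner join.
matched_only = df1.dropna(subset=['b']).merge(
    df2.dropna(subset=['d']), how='inner', left_on='b', right_on='d'
)

print(default)
print(no_nan_match)
print(matched_only)
```

Note that the expected output shown above has only two rows, which corresponds to `matched_only`; a left join that merely refuses to match NaN would still keep the third row of df1, as `no_nan_match` does.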

What do you think?

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 39.0.1
Cython: None
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 5
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

4 reactions
jorisvandenbossche commented, Aug 24, 2018

Agreed, I also would not expect NAs to match here. It might be good to explore a bit whether we have always been doing that, and whether we do this consistently within pandas (in which case we should certainly do some kind of deprecation if we want to change this).

0 reactions
mroeschke commented, Jul 21, 2021

Closing as duplicate of https://github.com/pandas-dev/pandas/issues/32306 with a more recent discussion on the future policy we want.
