
Right/outer merge behaviour on left column and right index is unexpected


Code Sample, a copy-pastable example if possible

import pandas as pd
big_index = [123, 124, 125, 126, 127, 128, 129, 130]
big_dat = {'year': pd.Series([2000, 2000, 2000, 2001, 2002, 2002, 2002, 2004], index=big_index),
          'other': pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], index=big_index)}
big_df = pd.DataFrame(big_dat)

year_index = [2003, 2000, 2001, 2002]
year_dat = {'a': pd.Series([1, 2, 3, 4], index=year_index),
            'b': pd.Series([5, 6, 7, 8], index=year_index)}
year_df = pd.DataFrame(year_dat)

merged_right = pd.merge(
        big_df,
        year_df,
        how='right',
        left_on='year',
        right_index=True
        )
merged_outer = pd.merge(
        big_df,
        year_df,
        how='outer',
        left_on='year',
        right_index=True
        )

merged_right
Out[5]: 
    other  year  a  b
123     a  2000  2  6
124     b  2000  2  6
125     c  2000  2  6
126     d  2001  3  7
127     e  2002  4  8
128     f  2002  4  8
129     g  2002  4  8
130   NaN  2003  1  5

merged_outer
Out[6]: 
    other  year    a    b
123     a  2000  2.0  6.0
124     b  2000  2.0  6.0
125     c  2000  2.0  6.0
126     d  2001  3.0  7.0
127     e  2002  4.0  8.0
128     f  2002  4.0  8.0
129     g  2002  4.0  8.0
130     h  2004  NaN  NaN
130   NaN  2003  1.0  5.0

Problem description

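Both merged_outer and merged_right contain the row year 2003, a 1, b 5 (the only row of year_df with no match in big_df), and in both results that row is labelled with index 130. In merged_outer, the label 130 therefore appears twice.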
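One quick way to check what a given pandas version does with the unmatched row is to rebuild the frames from the report, run the outer merge, and look the 2003 row up by its data rather than by its label. This is an inspection sketch only: the label it prints depends on the installed pandas version, so the code prints it instead of assuming one.

```python
import pandas as pd

# Same data as the report above.
big_df = pd.DataFrame(
    {"year": [2000, 2000, 2000, 2001, 2002, 2002, 2002, 2004],
     "other": list("abcdefgh")},
    index=[123, 124, 125, 126, 127, 128, 129, 130],
)
year_df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]},
    index=[2003, 2000, 2001, 2002],
)

merged_outer = pd.merge(big_df, year_df, how="outer",
                        left_on="year", right_index=True)

# year_df's 2003 row has no partner in big_df; find it by its data (a == 1),
# not by index label, because the label is exactly what is in question.
unmatched = merged_outer[merged_outer["a"] == 1]
print("label of the unmatched 2003 row:", unmatched.index[0])
print("result index has duplicates:", merged_outer.index.duplicated().any())
```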

Expected Output

Either a NaN index entry for the unmatched row or an error/warning. There’s no obvious reason to associate that data with index 130.
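A workaround that yields roughly what is asked for here (a missing label for the unmatched row, plus a flag saying where each row came from) is to avoid mixing a column key with an index key: move both indexes into ordinary columns, merge column-on-column, and pass indicator=True. This is a sketch of a workaround, not a change to merge itself; the column name orig_index is a hypothetical choice.

```python
import pandas as pd

big_df = pd.DataFrame(
    {"year": [2000, 2000, 2000, 2001, 2002, 2002, 2002, 2004],
     "other": list("abcdefgh")},
    index=[123, 124, 125, 126, 127, 128, 129, 130],
)
year_df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]},
    index=[2003, 2000, 2001, 2002],
)

# Keep big_df's labels as data (hypothetical column name "orig_index")
# and turn year_df's index into a regular "year" column.
left = big_df.rename_axis("orig_index").reset_index()
right = year_df.rename_axis("year").reset_index()

# Column-on-column merge: the result gets a fresh default 0..n-1 index, and
# indicator=True adds a "_merge" column saying where each row came from.
merged_outer = pd.merge(left, right, how="outer", on="year", indicator=True)

# The unmatched 2003 row now has NaN in orig_index instead of a borrowed label.
print(merged_outer[merged_outer["_merge"] == "right_only"])
```

The original big_df labels survive as the orig_index column, so nothing is lost by giving up the index-based join.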

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 13 (11 by maintainers)

Top GitHub Comments

gfyoung commented, Aug 15, 2017 (1 reaction)

Ah, that’s a little better to read now! I do agree: I too am a little perplexed why there would be such a row, as I don’t see where it would come from in either join.

phofl commented, May 28, 2020 (0 reactions)

Ok thanks.

I have a small follow-up question about the initial example. I would expect the index of the result to be the index of the right DataFrame (year_df), because this is a right join. It seems wrong to get the left index after a right join.
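If the goal is a right join whose index comes from the right frame, one sketch under the same assumption as a column-based workaround (both indexes moved into columns, hypothetical orig_index column) is to merge on the year column with how='right' and then promote year back to the index. A label from year_df then repeats once per matching row of big_df, so the resulting index is not unique; the 2004 row from big_df is dropped, and the 2003 row appears with NaN in orig_index.

```python
import pandas as pd

big_df = pd.DataFrame(
    {"year": [2000, 2000, 2000, 2001, 2002, 2002, 2002, 2004],
     "other": list("abcdefgh")},
    index=[123, 124, 125, 126, 127, 128, 129, 130],
)
year_df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]},
    index=[2003, 2000, 2001, 2002],
)

# Keep big_df's labels as data (hypothetical name) and expose year_df's keys.
left = big_df.rename_axis("orig_index").reset_index()
right = year_df.rename_axis("year").reset_index()

# Right join on the key column, then make the right frame's key the index.
merged_right = pd.merge(left, right, how="right", on="year").set_index("year")
print(merged_right)
```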

