DataFrame.join left_index right_index inverted
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
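# Two single-row frames that share the value 'X' in column 'C' but carry different index labels.
df_left = pd.DataFrame(data=['X'], columns=['C'], index=[22])
df_right = pd.DataFrame(data=['X'], columns=['C'], index=[999])
# Merge on column 'C' while also passing left_index=True.
merge = pd.merge(df_left, df_right, on=['C'], left_index=True)
print(merge.index)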
Problem description
The code above prints the index of the merged DataFrame, and its only key is 999. As I understand the documentation, when left_index=True the keys from the left DataFrame should be used as join keys.
My output: Int64Index([999], dtype='int64')
Expected output: Int64Index([22], dtype='int64')
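For comparison, one way to get the expected index without relying on left_index at all is to carry the left index through the merge as an ordinary column. This is only a sketch of a possible workaround, not something proposed in the issue; the intermediate column name 'index' is what reset_index produces for an unnamed index.

```python
import pandas as pd

df_left = pd.DataFrame(data=['X'], columns=['C'], index=[22])
df_right = pd.DataFrame(data=['X'], columns=['C'], index=[999])

# Move the left index into an ordinary column so it survives the merge,
# join on the shared column 'C', then restore it as the index.
merged = (
    df_left.reset_index()      # unnamed index becomes a column called 'index'
    .merge(df_right, on='C')   # plain column merge, no index flags involved
    .set_index('index')
)
merged.index.name = None       # drop the temporary name again

print(merged.index.tolist())
```

Because no index flag is passed to merge, the result does not depend on how left_index and on interact.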
pandas: 0.23.3
pytest: None
pip: 18.0
setuptools: 20.7.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.5
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- State:
- Created 5 years ago
- Comments: 10 (5 by maintainers)
Top GitHub Comments
I think we're not quite on the same page here.
The on argument allows us to specify the merge columns to use in both dataframes via a single argument; in this application we're using on instead of left_on and/or right_on.
In this situation, if left_index or right_index is included (but left_on and right_on are excluded), the behaviour mentioned in this issue occurs. Here's an example:
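The code for this example is not present in this copy of the comment. A minimal sketch consistent with the description that follows might look like the code below. Only the frame names (ages, heights), the result names and the index labels 1, 3, 91 and 93 come from the text; the column names, the other values and the helper merge_on_name are assumptions made for illustration. Depending on the pandas version, combining on with an index flag may either produce the index described in this issue or be rejected with an error, so the sketch tolerates both.

```python
import pandas as pd

# Hypothetical data: Ash and Charlie appear in both frames, Bo and Dana in one each.
ages = pd.DataFrame(
    {"name": ["Ash", "Bo", "Charlie"], "age": [31, 27, 45]},
    index=[1, 2, 3],
)
heights = pd.DataFrame(
    {"name": ["Ash", "Dana", "Charlie"], "height": [170, 165, 182]},
    index=[91, 92, 93],
)


def merge_on_name(**index_flags):
    """Merge on 'name' with the given index flag; return None if pandas refuses."""
    try:
        return pd.merge(ages, heights, on="name", **index_flags)
    except ValueError as exc:  # pandas' MergeError is a ValueError subclass
        print("merge rejected:", exc)
        return None


common_records_left_index = merge_on_name(left_index=True)
common_records_right_index = merge_on_name(right_index=True)

for label, result in [
    ("left_index=True", common_records_left_index),
    ("right_index=True", common_records_right_index),
]:
    if result is not None:
        print(label, "->", result.index.tolist())
```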
Here, we should end up with two dataframes which both contain the combined age and height data for Ash and Charlie (as they’re the only records with both an age and a height provided), with index values as follows:
- common_records_left_index should have index keys of 1 and 3 (preserved from the ages dataframe)
- common_records_right_index should have index keys of 91 and 93 (preserved from the heights dataframe)
However, the opposite is true: left_index=True preserves the keys from the right dataframe during the merge, and right_index=True preserves the keys from the left dataframe.
Would take a PR for a test that replicates the OP.
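As a starting point for such a test, the original report can be replayed and its outcome classified. The following is a rough sketch rather than code from the pandas test suite; the function name classify_left_index_merge and its return strings are hypothetical, and it treats a raised ValueError as a third possible outcome, on the assumption that some pandas versions refuse the on plus left_index combination.

```python
import pandas as pd


def classify_left_index_merge():
    """Replay the original report and describe which index the result carries."""
    df_left = pd.DataFrame(data=['X'], columns=['C'], index=[22])
    df_right = pd.DataFrame(data=['X'], columns=['C'], index=[999])
    try:
        merged = pd.merge(df_left, df_right, on=['C'], left_index=True)
    except ValueError:  # pandas' MergeError is a ValueError subclass
        return "rejected"
    labels = merged.index.tolist()
    if labels == [22]:
        return "left index preserved"
    if labels == [999]:
        return "right index preserved (inverted)"
    return "other: {}".format(labels)


print(classify_left_index_merge())
```

A real regression test would assert a single expected outcome once the intended behaviour is agreed on.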