Join result takes the index order of the other (right) DataFrame instead of that of the calling (left) DataFrame
Code Sample, a copy-pastable example if possible
import pandas as pd
df1 = pd.DataFrame({'a': [0, 10, 20]})
df2 = pd.DataFrame({'b': [200, 100]}, index=[2,1])
print(df1.join(df2, how='inner'))
print(df2.join(df1, how='inner'))
print(df1.join(df2, how='inner', sort=True))
Problem description
Contrary to what is stated in the documentation of DataFrame.join(), with the default sort=False the returned DataFrame preserves the index order of the other (right) DataFrame instead of the index order of the calling (left) DataFrame.
In addition, the sort=True argument does not work.
Expected Output
The returned DataFrame should preserve the index order of the calling (left) DataFrame.
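Until the fix lands, the expected ordering can be recovered by hand. The following is only a sketch of one possible workaround, not the fix proposed in this issue: it assumes both indexes are unique and rebuilds the left-ordered labels with Index.isin() rather than relying on Index.intersection(). The variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 10, 20]})
df2 = pd.DataFrame({'b': [200, 100]}, index=[2, 1])

joined = df1.join(df2, how='inner')

# Labels present in both frames, taken in the order of the left index.
# Assumes unique index labels on both sides.
left_order = df1.index[df1.index.isin(df2.index)]

# Reorder the join result so that it follows the left index.
result = joined.loc[left_order]
print(result)
```

Because the reordering is explicit, the result follows the left index regardless of the order in which join() returned the rows.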
Output of pd.show_versions()
pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0.post20161106
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: 0.2.1
Issue Analytics
- Created 7 years ago
- Comments: 11 (7 by maintainers)
Top GitHub Comments
The bug is not at the DataFrame level but at the Index level.
Swapping the arguments in the call to merge, as proposed by @sleepdeprivation, would introduce a new bug: lsuffix and rsuffix would be swapped. Moreover, the order of the columns in the result would also be exchanged (first the columns from the right DataFrame, then the columns from the left one).
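The suffix problem can be seen at the DataFrame level by swapping caller and argument by hand. This is a small illustration only; the frames and suffix strings are made up for the example and do not come from the pandas source.

```python
import pandas as pd

left = pd.DataFrame({'x': [1, 2, 3]})
right = pd.DataFrame({'x': [30, 20]}, index=[2, 1])

# Intended call: the '_left' suffix is attached to the caller's column.
normal = left.join(right, how='inner', lsuffix='_left', rsuffix='_right')

# Swapped call: the same suffixes now label the wrong frames,
# so 'x_left' holds the values that came from `right`.
swapped = right.join(left, how='inner', lsuffix='_left', rsuffix='_right')

print(normal.loc[1, 'x_left'])   # value from `left`
print(swapped.loc[1, 'x_left'])  # value from `right`
```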
The real bug is at the Index level, concretely in the Index.intersection() method. get_indexer must be called on the right index with the left index as its argument, so that it returns an indexer which transforms the caller index (right) into the passed index (left), and not the other way around. This is the desired output: an indexer into the right index which picks its elements in such a way that they are aligned with the left index.
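The direction of the get_indexer call can be checked on the two indexes from the code sample. This is only a sketch of the idea using public Index methods; it is not the actual implementation of Index.intersection().

```python
import pandas as pd

left = pd.Index([0, 1, 2])
right = pd.Index([2, 1])

# For each label of `left`, its position in `right` (-1 if absent).
indexer = right.get_indexer(left)
print(indexer.tolist())  # [-1, 1, 0]

# Dropping the misses and taking from `right` yields left order.
left_ordered = right.take(indexer[indexer != -1])
print(left_ordered.tolist())  # [1, 2]

# The opposite direction yields the common labels in right order.
reverse = left.get_indexer(right)
right_ordered = left.take(reverse[reverse != -1])
print(right_ordered.tolist())  # [2, 1]
```

Calling get_indexer on the right index with the left index as its argument therefore produces the common labels in the order of the left index, which is what the join documentation promises.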
I have implemented/updated some tests in my PR.
pandas/tests/tools/test_join.py
(there is a single sorted test now)