pandas 0.23 fails on column<->index boolean merge
Code Sample, a copy-pastable example if possible
import pandas as pd
df1 = pd.DataFrame({'x': pd.Series([True, False], dtype=bool)})
df2 = df1.set_index('x')
df1.merge(df2, left_on='x', right_index=True)
Problem description
On Pandas 0.23, I get the following Exception:
venv/local/lib/python2.7/site-packages/pandas/core/frame.pyc in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
6377 right_on=right_on, left_index=left_index,
6378 right_index=right_index, sort=sort, suffixes=suffixes,
-> 6379 copy=copy, indicator=indicator, validate=validate)
6380
6381 def round(self, decimals=0, *args, **kwargs):
venv/local/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
58 right_index=right_index, sort=sort, suffixes=suffixes,
59 copy=copy, indicator=indicator,
---> 60 validate=validate)
61 return op.get_result()
62
venv/local/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
552 # validate the merge keys dtypes. We may need to coerce
553 # to avoid incompat dtypes
--> 554 self._maybe_coerce_merge_keys()
555
556 # If argument passed to validate,
venv/local/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in _maybe_coerce_merge_keys(self)
976 # incompatible dtypes GH 9780, GH 15800
977 elif is_numeric_dtype(lk) and not is_numeric_dtype(rk):
--> 978 raise ValueError(msg)
979 elif not is_numeric_dtype(lk) and is_numeric_dtype(rk):
980 raise ValueError(msg)
ValueError: You are trying to merge on bool and object columns. If you wish to proceed you should use pd.concat
Expected Output
x
0 True
1 False
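A possible workaround, sketched here and not taken from the original report: move the right-hand index back into a regular column with `reset_index()` and merge on the column name, so that both merge keys are ordinary columns. The names `right` and `result` are illustrative, and whether this avoids the error on 0.23.0 specifically is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({'x': pd.Series([True, False], dtype=bool)})
df2 = df1.set_index('x')

# Assumed workaround: turn the index back into a regular column first,
# so the merge compares column vs. column instead of column vs. index.
right = df2.reset_index()
result = df1.merge(right, on='x')
print(result)
```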
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.49-moby
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.23.0
pytest: 3.4.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: 1.6.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- State:
- Created 5 years ago
- Comments:10 (10 by maintainers)

@crepererum Thanks for figuring that out. So the underlying cause (which now surfaces because of that change) is that boolean values in the index are stored as an object array (we don’t have a BooleanIndex), so it thinks the types are not compatible (object vs bool).
So we will have to ‘infer’ the dtype of the index to cover those cases.
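As a rough illustration of what "inferring" could mean here, `pd.api.types.infer_dtype` inspects the values themselves instead of trusting the declared dtype. This is only a sketch of the idea, not necessarily how the fix was implemented in pandas; the variable names are illustrative.

```python
import pandas as pd

# An index that holds booleans but is declared with object dtype,
# mirroring the situation described above.
idx = pd.Index([True, False], dtype=object)

# infer_dtype looks at the actual values rather than the declared dtype.
inferred = pd.api.types.infer_dtype(idx)
print(idx.dtype, inferred)
```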
The problem here is: the check introduced in this commit does the correct thing. The index dtype is wrong (it’s object, not bool), which can also be shown by a simple example (identical result for 0.22.0 and 0.23.0).
Or in other words: the index dtype is wrong in both versions; the check that was introduced in between just makes the problem visible.
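A minimal sketch for checking this yourself, reusing the frames from the report. According to the discussion above, the index dtype is object on 0.22.0 and 0.23.0; newer pandas releases may report something different, so no output is asserted here.

```python
import pandas as pd

df1 = pd.DataFrame({'x': pd.Series([True, False], dtype=bool)})
df2 = df1.set_index('x')

column_dtype = df1['x'].dtype   # dtype of the boolean column
index_dtype = df2.index.dtype   # dtype after moving the column into the index

# Per the discussion above, the index dtype is object on 0.22.0/0.23.0,
# while the column stays bool; other versions may behave differently.
print(column_dtype, index_dtype)
```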