pandas 0.23 fails on column<->index boolean merge

Code Sample, a copy-pastable example if possible

import pandas as pd

df1 = pd.DataFrame({'x': pd.Series([True, False], dtype=bool)})
df2 = df1.set_index('x')

df1.merge(df2, left_on='x', right_index=True)

Problem description

On pandas 0.23, running the sample above raises the following exception:

venv/local/lib/python2.7/site-packages/pandas/core/frame.pyc in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   6377                      right_on=right_on, left_index=left_index,
   6378                      right_index=right_index, sort=sort, suffixes=suffixes,
-> 6379                      copy=copy, indicator=indicator, validate=validate)
   6380
   6381     def round(self, decimals=0, *args, **kwargs):

venv/local/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     58                          right_index=right_index, sort=sort, suffixes=suffixes,
     59                          copy=copy, indicator=indicator,
---> 60                          validate=validate)
     61     return op.get_result()
     62

venv/local/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    552         # validate the merge keys dtypes. We may need to coerce
    553         # to avoid incompat dtypes
--> 554         self._maybe_coerce_merge_keys()
    555
    556         # If argument passed to validate,

venv/local/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in _maybe_coerce_merge_keys(self)
    976             # incompatible dtypes GH 9780, GH 15800
    977             elif is_numeric_dtype(lk) and not is_numeric_dtype(rk):
--> 978                 raise ValueError(msg)
    979             elif not is_numeric_dtype(lk) and is_numeric_dtype(rk):
    980                 raise ValueError(msg)

ValueError: You are trying to merge on bool and object columns. If you wish to proceed you should use pd.concat

Expected Output

       x
0   True
1  False

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.49-moby
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.0
pytest: 3.4.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: 1.6.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

jorisvandenbossche commented, May 23, 2018

@crepererum Thanks for figuring that out. So the underlying cause (which now surfaces because of that change) is that boolean values in the index are stored as an object array (we don't have a BooleanIndex), so it thinks the types are not compatible (object vs bool).

So we will have to ‘infer’ the dtype of the index to cover those cases.
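As a rough illustration of what "inferring" the dtype could mean here, the sketch below compares the declared dtype of an object index with the dtype inferred from its values. It is not taken from the thread and is not necessarily how the maintainers fixed the issue. The use of `pandas.api.types.infer_dtype` and the explicit `dtype=object` are assumptions made so that the example does not depend on how a given pandas version stores boolean indexes.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Assumption: build the index with an explicit object dtype to mimic how
# pandas 0.22/0.23 stored boolean index values, regardless of the installed version.
idx = pd.Index([True, False], dtype=object)

declared = idx.dtype               # what a plain dtype comparison sees
inferred = infer_dtype(idx.values) # what the values themselves look like

print(declared, inferred)
```

A check based on the inferred value rather than the declared dtype would treat this index as boolean and therefore as compatible with a bool column.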

crepererum commented, May 23, 2018

The problem here is: the check introduced in this commit does the correct thing. The index dtype is wrong (it's object, not bool), which can also be shown by this simple example (identical result for 0.22.0 and 0.23.0):

>>> pd.Index([True, False], dtype=bool)
Index([True, False], dtype='object')

Or in other words: the index dtype is wrong in both versions, the check that was introduced in-between just makes the problem visible.
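Given that explanation, one possible user-side workaround is to cast the left key column to object so that both merge keys have the same dtype the boolean index already has. This is only a sketch under that assumption. It is not suggested anywhere in the thread, and it has not been checked against 0.23.0 specifically.

```python
import pandas as pd

df1 = pd.DataFrame({'x': pd.Series([True, False], dtype=bool)})
df2 = df1.set_index('x')

# Assumption: casting the left key to object lines it up with an index
# that stores booleans as object, so the dtype check no longer sees bool vs object.
left = df1.astype({'x': object})

result = left.merge(df2, left_on='x', right_index=True)
print(result)
```

The trade-off is that the merged `x` column comes back as object rather than bool, so a cast back with `astype(bool)` may be needed afterwards.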
