Unexpected results when filtering with .isin (some fields contain Python data structures)
Code Sample, a copy-pastable example if possible
```python
import pandas as pd

data = [
    {'id': 1, 'content': [{'values': 3}]},   # list value (unhashable)
    {'id': 2, 'content': u'whats going on'},
    {'id': 3, 'content': u'whaaaaaaaaat'},
    {'id': 4, 'content': [{'values': 4}]}    # list value (unhashable)
]

if __name__ == '__main__':
    df = pd.DataFrame.from_dict(data)
    v = [u'whats going on', u'whaaaaaat']
    print(df[df.content.isin(v)])  # succeeds on pandas 0.22 / Python 2.7
    v = [u'whats going on', u'what']
    print(df[df.content.isin(v)])  # raises TypeError: unhashable type: 'list'
```
Problem description
The first `print` executes successfully, filtering to the single row `'id': 2, 'content': u'whats going on'`. However, the second filter throws an error, even though the only difference is the length of one of the elements in the list `v`.
Output for the code snippet above:
```
          content  id
1  whats going on   2
/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/indexes/range.py:473: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  return max(0, -(-(self._stop - self._start) // self._step))
Traceback (most recent call last):
  File "test_pandas.py", line 15, in <module>
    print df[df.content.isin(v)]
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 2804, in isin
    return self._constructor(result, index=self.index).__finalize__(self)
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 264, in __init__
    raise_cast_failure=True)
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 3269, in _sanitize_array
    if len(subarr) != len(index) and len(subarr) == 1:
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/indexes/range.py", line 473, in __len__
    return max(0, -(-(self._stop - self._start) // self._step))
TypeError: unhashable type: 'list'
```
```
pandas: 0.22.0
pytest: 2.9.2
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.14.2
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```
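The last line of the traceback points at the likely root cause. For object-dtype data, pandas appears to implement `isin` with a hash table built from the candidate values, so every element being tested has to be hashed to look it up; the `content` values in rows 1 and 4 are lists, which Python cannot hash. The sketch below reproduces that behaviour with a plain Python `set` standing in for pandas's internal table. It is an analogy, not pandas's actual code path; `hash_membership` is a hypothetical helper, and the sketch assumes Python 3 (so the text rows are `str`).

```python
# Values of the `content` column from the report (rows with id 1-4).
contents = [
    [{'values': 3}],
    u'whats going on',
    u'whaaaaaaaaat',
    [{'values': 4}],
]
# A set, like a hash table, requires every probe value to be hashable.
lookup = set([u'whats going on', u'what'])


def hash_membership(values, candidates):
    """Hypothetical helper: hash-based membership test for each value."""
    return [value in candidates for value in values]


# The string rows on their own work fine.
strings_only = [item for item in contents if isinstance(item, str)]
print(hash_membership(strings_only, lookup))  # [True, False]

# Including the list-valued rows fails with the same kind of TypeError
# as the traceback: the first list cannot be hashed for the lookup.
try:
    hash_membership(contents, lookup)
except TypeError as exc:
    print("TypeError:", exc)
```

Filtering out the list-valued rows first (as `strings_only` does) avoids the error, at the cost of never matching those rows.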
Issue Analytics
- Created 5 years ago
- Comments: 12 (5 by maintainers)
Using pandas 1.2.4 with Python 3.9.2, it’s still not possible to use `Series.isin()`/`DataFrame.isin()` with unhashable types, and I don’t think the documentation suggests this restriction exists. I think most people would expect `isin()` to behave like applying the Python keyword `in` elementwise, which is not the case: `pd.Series([1, [2]]).isin([1])` throws the above-mentioned error, while `pd.Series([1, [2]]).apply(lambda x: x in [1])` works and returns the expected result. In my opinion, the latter should be a fallback option, or the documentation should state that only hashable types are allowed for `isin()`.

Simpler test case:

(so unrelated to indexing, or `DataFrame`).
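The fallback proposed in this comment can be sketched as a small wrapper: try the built-in `Series.isin()` first and, if it raises `TypeError` because some elements are unhashable, fall back to the elementwise `in` check via `Series.apply()`. This is a sketch under stated assumptions (Python 3, a reasonably recent pandas); `isin_with_fallback` is a hypothetical helper, not part of the pandas API.

```python
import pandas as pd


def isin_with_fallback(series, values):
    """Hypothetical helper, not a pandas API: elementwise membership that
    tolerates unhashable elements such as lists."""
    values = list(values)
    try:
        # Fast path: pandas' own (hash-based) isin.
        return series.isin(values)
    except TypeError:
        # Slow path: Python's `in` keyword applied to each element,
        # as the comment suggests; list elements simply compare unequal.
        return series.apply(lambda x: x in values).astype(bool)


s = pd.Series([1, [2]])
print(isin_with_fallback(s, [1]).tolist())  # [True, False]

# The data from the original report.
df = pd.DataFrame([
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': u'whats going on'},
    {'id': 3, 'content': u'whaaaaaaaaat'},
    {'id': 4, 'content': [{'values': 4}]},
])
print(df[isin_with_fallback(df['content'], [u'whats going on', u'what'])])
```

Catching `TypeError` keeps the fast hash-based path whenever every element is hashable. The fallback is a Python-level loop, so expect it to be considerably slower on large Series.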