Unexpected results when filtering with .isin (some fields contain Python data structures)
Code Sample, a copy-pastable example if possible
```python
import pandas as pd

data = [
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': u'whats going on'},
    {'id': 3, 'content': u'whaaaaaaaaat'},
    {'id': 4, 'content': [{'values': 4}]},
]

if __name__ == '__main__':
    df = pd.DataFrame.from_dict(data)
    v = [u'whats going on', u'whaaaaaat']
    print(df[df.content.isin(v)])  # reported to work: returns the row with id 2
    v = [u'whats going on', u'what']
    print(df[df.content.isin(v)])  # reported to raise TypeError: unhashable type: 'list'
```
Problem description
The first print statement executes successfully, filtering to the single row with 'id': 2, 'content': u'whats going on'. However, the second filter raises a TypeError, even though the only difference is the length of one of the elements in the list v.
Output for the code snippet above:
```
          content  id
1  whats going on   2
/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/indexes/range.py:473: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  return max(0, -(-(self._stop - self._start) // self._step))
Traceback (most recent call last):
  File "test_pandas.py", line 15, in <module>
    print df[df.content.isin(v)]
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 2804, in isin
    return self._constructor(result, index=self.index).__finalize__(self)
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 264, in __init__
    raise_cast_failure=True)
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 3269, in _sanitize_array
    if len(subarr) != len(index) and len(subarr) == 1:
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/indexes/range.py", line 473, in __len__
    return max(0, -(-(self._stop - self._start) // self._step))
TypeError: unhashable type: 'list'
```
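The final `TypeError: unhashable type: 'list'` suggests that, for object-dtype columns, `isin` looks each cell up by hash rather than comparing it with `==`. That is an inference from the error message, not something the traceback shows directly. Plain Python reproduces the distinction: hashing the list stored in the row with id 1 fails, while the `in` operator on a plain list falls back to equality and simply returns `False`. The helper `is_hashable` is a hypothetical name used only for this illustration.

```python
# Values taken from the reproduction above.
v = [u'whats going on', u'what']
row_content = [{'values': 3}]  # the 'content' cell of the row with id 1


def is_hashable(obj):
    """Return True if obj can be used as a set member or dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable(row_content))        # False: lists are unhashable
print(is_hashable(u'whats going on'))  # True: strings are hashable
print(row_content in v)                # False: `in` on a list compares with ==

try:
    # A hash-based lookup (similar in spirit to what the error implies isin does)
    row_content in set(v)
except TypeError as exc:
    print(exc)  # unhashable type: 'list'
```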
```
pandas: 0.22.0
pytest: 2.9.2
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.14.2
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```
Using pandas 1.2.4 with Python 3.9.2, it's still not possible to use `Series.isin()`/`DataFrame.isin()` with unhashable types, which I don't think is to be expected from the documentation. I think most people would expect `isin()` to behave like applying the Python keyword `in` elementwise, which is not the case: `pd.Series([1, [2]]).isin([1])` throws the above-mentioned error, while `pd.Series([1, [2]]).apply(lambda x: x in [1])` works and returns the expected result. In my opinion, the latter should be a fallback option, or the documentation should state that only hashable types are allowed for `isin()`.

Simpler test case:
(so unrelated to indexing, or `DataFrame`).
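The fallback proposed in the comment (hash-based matching where possible, elementwise `in` otherwise) can be sketched in plain Python. The helper name `isin_with_fallback` is hypothetical and is not part of pandas; it only illustrates the semantics being asked for, under the assumption that equality via `==` is the desired notion of membership.

```python
def isin_with_fallback(items, values):
    """Return a list of booleans: True where an item equals some element of values.

    Hashable items are checked against a set (fast path). Unhashable items,
    such as lists or dicts, fall back to the elementwise `in` operator,
    which compares with ==.
    """
    values = list(values)
    try:
        lookup = set(values)
    except TypeError:
        # values itself contains unhashable elements, so there is no fast path
        lookup = None

    result = []
    for item in items:
        if lookup is not None:
            try:
                result.append(item in lookup)
                continue
            except TypeError:
                pass  # item is unhashable; use the slow path below
        result.append(item in values)
    return result


print(isin_with_fallback([1, [2]], [1]))  # [True, False]
```

With pandas itself, the same elementwise semantics are already available through `Series.apply`, as the comment notes, for example `df[df.content.apply(lambda x: x in v)]` for the original reproduction, at the cost of a Python-level loop over every cell.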