Unexpected results when filtering with .isin (some fields contain python datastructures)

Code Sample, a copy-pastable example if possible

import pandas as pd

data = [
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': u'whats going on'},
    {'id': 3, 'content': u'whaaaaaaaaat'},
    {'id': 4, 'content': [{'values': 4}]}
]

if __name__ == '__main__':
    df = pd.DataFrame.from_dict(data)
    v = [u'whats going on', u'whaaaaaat']
    print df[df.content.isin(v)]
    v = [u'whats going on', u'what']
    print df[df.content.isin(v)]

Problem description

The first print statement executes successfully, filtering to the single row with 'id': 2 and 'content': u'whats going on'. The second filter, however, throws an error, even though the only difference is the length of one of the elements in the list v.

Output for the code snippet above:

          content  id
1  whats going on   2
/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/indexes/range.py:473: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  return max(0, -(-(self._stop - self._start) // self._step))
Traceback (most recent call last):
  File "test_pandas.py", line 15, in <module>
    print df[df.content.isin(v)]
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 2804, in isin
    return self._constructor(result, index=self.index).__finalize__(self)
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 264, in __init__
    raise_cast_failure=True)
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 3269, in _sanitize_array
    if len(subarr) != len(index) and len(subarr) == 1:
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/indexes/range.py", line 473, in __len__
    return max(0, -(-(self._stop - self._start) // self._step))
TypeError: unhashable type: 'list'
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: 2.9.2
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.14.2
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

panda-byte commented, Apr 19, 2021 (4 reactions)

Using pandas 1.2.4 with Python 3.9.2, it’s still not possible to use Series.isin() / DataFrame.isin() with unhashable types, which I don’t think is to be expected from the documentation.

I think most people would expect isin() to behave like applying the Python keyword in elementwise, which is not the case: pd.Series([1, [2]]).isin([1]) throws the above-mentioned error, while pd.Series([1, [2]]).apply(lambda x: x in [1]) works and returns the expected result. In my opinion, the latter should be a fallback option, or the documentation should state that only hashable types are allowed for isin().
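
As an illustration of the elementwise fallback described in this comment, here is a minimal sketch in Python 3. The helper name isin_elementwise is hypothetical and not part of pandas; it applies the `in` operator to each element, so it relies on equality comparisons rather than hashing and will be slower than Series.isin() on large inputs.

```python
import pandas as pd


def isin_elementwise(series, values):
    """Hypothetical helper: membership test via `in`, one element at a time.

    Uses == comparisons instead of hashing, so unhashable elements
    such as lists or dicts do not raise a TypeError.
    """
    values = list(values)
    return series.apply(lambda item: item in values)


# Same data as in the original report.
data = [
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': 'whats going on'},
    {'id': 3, 'content': 'whaaaaaaaaat'},
    {'id': 4, 'content': [{'values': 4}]},
]
df = pd.DataFrame(data)

# The value list that triggered the error in the report.
mask = isin_elementwise(df['content'], ['whats going on', 'what'])
print(df[mask])
```

With the data from the report, only the row with id 2 is selected. The cost grows with len(series) * len(values), so this is best treated as a workaround for small or mixed-type columns.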

toobaz commented, Jun 29, 2019 (3 reactions)

Simpler test case:

pd.Series([0, [1, 2]]).isin(['a', 'b'])

(so unrelated to indexing, or DataFrame).
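
One possible workaround for a Series like the one in this test case is to run isin() only on the hashable elements and treat everything else as a non-match. The sketch below is an assumption about how that could be done, not pandas behavior; is_hashable and isin_hashable_only are hypothetical helper names.

```python
import pandas as pd


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def isin_hashable_only(series, values):
    """Hypothetical helper: call isin() on hashable elements only.

    Unhashable elements (lists, dicts, ...) are reported as False.
    """
    hashable = series.map(is_hashable)
    result = pd.Series(False, index=series.index)
    result.loc[hashable] = series.loc[hashable].isin(values)
    return result


s = pd.Series([0, [1, 2], 'a'])
mask = isin_hashable_only(s, ['a', 'b'])
print(mask.tolist())
```

This keeps the hash-based lookup for ordinary values and only pays the per-element hash() check once per row. It assumes the lookup values themselves are hashable.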
