DataFrame query method - numexpr safety check fails
Code Sample, a copy-pastable example if possible
# Your code here
import pandas as pd
df = pd.DataFrame({'a': ['1','2','3'], 'b': [4,5,6]})
df.query("a.astype('int') < 2")
raises TypeError: unhashable type: 'numpy.ndarray'
Problem description
Background
When using numexpr, Pandas has an internal function, _check_ne_builtin_clash, for detecting when a variable used in a method like query clashes with a numexpr built-in.
Here’s an example of the function raising an error as intended…
df = pd.DataFrame({'abs': [1,2,3]})
df.query("abs > 2")
# Raises NumExprClobberingError: Variables ... overlap with builtins: ('abs')
Mostly, the names it protects against are math functions like sin, cos, sum, etc.
Why my original example fails
The trouble with my original code is that _check_ne_builtin_clash checks the names on both sides of the BinaryExpr AST node corresponding to "a.astype('int') < 2".
It does this by putting them into a frozenset.
However, the LHS ends up being a Constant node with the name array([1,2,3]), which is an ndarray and therefore not hashable.
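The failure can be reproduced outside of Pandas. The snippet below is a minimal sketch, not the Pandas code itself: lhs_name and rhs_name are hypothetical stand-ins for the two names the check collects, and it only shows that building a frozenset from an ndarray raises the same TypeError.

```python
import numpy as np

# Hypothetical stand-ins for the "names" on each side of the comparison.
lhs_name = np.array([1, 2, 3])  # what a.astype('int') has already evaluated to
rhs_name = 2

try:
    # frozenset must hash every element, and ndarray defines no hash.
    names = frozenset([lhs_name, rhs_name])
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
```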
Solution
It seems like the helper function _check_ne_builtin_clash should consider any name that is unhashable safe, since it can’t conflict with the function names being searched for. If this seems like a reasonable behavior, let me know and I will submit a PR!
code for function:
code for var names it looks for:
https://github.com/pandas-dev/pandas/blob/master/pandas/core/computation/ops.py#L20-L26
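As a rough illustration of the proposal, the sketch below filters out unhashable names before intersecting with the reserved names. Everything in it is an assumption made for illustration: NE_BUILTINS is a small made-up subset rather than the real list in ops.py, and is_hashable and find_builtin_clashes are hypothetical helpers, not Pandas functions. A plain list stands in for the ndarray so the example needs only the standard library.

```python
# Hypothetical subset of numexpr's reserved names, for illustration only.
NE_BUILTINS = frozenset({"sin", "cos", "abs", "sum"})


def is_hashable(obj):
    """Return True if obj can be placed in a set or frozenset."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def find_builtin_clashes(names, builtins=NE_BUILTINS):
    """Return the names that clash with builtins, skipping unhashable ones.

    An unhashable name (such as an array) can never equal a builtin's
    string name, so it is treated as safe and ignored.
    """
    hashable_names = frozenset(n for n in names if is_hashable(n))
    return hashable_names & builtins


# A list stands in for the unhashable ndarray from the failing query.
clashes = find_builtin_clashes(["abs", [1, 2, 3], 2])
print(sorted(clashes))  # ['abs']
```

With this filtering, a query whose names are all unhashable or non-clashing yields an empty result, so no error would be raised.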
Expected Output
> df.query("a.astype('int') < 2")
a b
0 1 4
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 3.2.1
pip: 9.0.1
setuptools: 40.0.0
Cython: 0.24
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.4.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 4.2.2
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 5 years ago
- Reactions: 5
- Comments: 9 (5 by maintainers)
Top GitHub Comments
If anyone is having trouble with the unhashable type error when using Pandas query, you can add the engine="python" argument if performance isn’t a problem.
You can also use the old-style masking instead.
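Both workarounds might look like the sketch below, which reuses the DataFrame from the original report. It assumes that engine="python" bypasses the numexpr name check, as the comment above reports. The variable names via_query and via_mask are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['1', '2', '3'], 'b': [4, 5, 6]})

# Workaround 1: evaluate the expression with the Python engine instead of numexpr.
via_query = df.query("a.astype('int') < 2", engine="python")

# Workaround 2: old-style boolean masking, which does not go through query at all.
via_mask = df[df['a'].astype('int') < 2]

print(via_query)
print(via_mask)
```

Both should select the same single row, the one where a is '1' and b is 4.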
If anyone is having trouble with the unhashable type error when using the Pandas query, you can upgrade to pandas 1.4 (which requires Python 3.8 or later). Running pip install pandas==1.4.3 fixes the problem for me.