DataFrame query method - numexpr safety check fails

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame({'a': ['1','2','3'], 'b': [4,5,6]})
df.query("a.astype('int') < 2")

raises TypeError: unhashable type: 'numpy.ndarray'

Problem description

Background

When using numexpr, pandas has an internal function, _check_ne_builtin_clash, for detecting when a variable used in a method like query clashes with a numexpr built-in.

Here’s an example of the function raising an error as intended…

df = pd.DataFrame({'abs': [1,2,3]})
df.query("abs > 2")
# Raises NumExprClobberingError: Variables ... overlap with builtins: ('abs')

Mostly, the names it protects against are math functions like sin, cos, sum, etc.

Why my original example fails

The trouble with my original code is that _check_ne_builtin_clash checks the names on both sides of the BinaryExpr AST node corresponding to "a.astype('int') < 2". It does this by putting them into a frozenset. However, the LHS ends up being a Constant node whose name is array([1,2,3]). That name is an ndarray, which is not hashable, so building the frozenset raises the TypeError.
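The failure can be reproduced outside pandas. The snippet below is only a sketch of the mechanism described above: lhs_name and rhs_name are hypothetical stand-ins for the two operand names, not the values pandas actually passes around internally.

```python
import numpy as np

# Hypothetical stand-ins for the "names" of the two operands.
lhs_name = np.array([1, 2, 3])  # result of a.astype('int'), an ndarray
rhs_name = 2                    # the scalar on the right-hand side

try:
    # Building a frozenset hashes every element; an ndarray cannot be hashed.
    frozenset([lhs_name, rhs_name])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message is the same "unhashable type" error that query surfaces.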

Solution

It seems like the helper function _check_ne_builtin_clash should consider any name that is unhashable safe, since it can’t conflict with the function names being searched for. If this seems like a reasonable behavior, let me know and I will submit a PR!

code for function:

https://github.com/pandas-dev/pandas/blob/b82253590a66b4a35ed682bca244f668f16c3e0b/pandas/core/computation/engines.py#L23-L38

code for var names it looks for:

https://github.com/pandas-dev/pandas/blob/master/pandas/core/computation/ops.py#L20-L26

Expected Output

> df.query("a.astype('int') < 2")
   a  b
0  1  4

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.2.1
pip: 9.0.1
setuptools: 40.0.0
Cython: 0.24
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.4.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 4.2.2
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

State:
Created 5 years ago
Reactions: 5
Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction

untanglereality commented, Sep 9, 2021

If anyone is running into the unhashable type error when using pandas query, you can pass the engine="python" argument, provided performance isn’t a concern.

Example:

orders.query("item_name.str.contains('Chicken')", engine="python")

You can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine. Source: DataFrame.query documentation

You can also use the old-style masking instead.

orders[orders.item_name.str.contains('Chicken')]
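Applied to the DataFrame from the original report, the two workarounds might look like the sketch below. The variable names via_query and via_mask are illustrative only, and the snippet assumes a pandas installation where the python engine is available.

```python
import pandas as pd

# The DataFrame from the original report: column 'a' holds strings.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": [4, 5, 6]})

# Workaround 1: keep query(), but evaluate with the python engine,
# which bypasses the numexpr name-clash check.
via_query = df.query("a.astype('int') < 2", engine="python")

# Workaround 2: plain boolean masking, no query() involved.
via_mask = df[df["a"].astype(int) < 2]

print(via_query)
print(via_mask)
```

Both approaches select the same single row, matching the expected output given in the report.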

0 reactions

jiagengliu commented, Jul 31, 2022

If anyone is running into the unhashable type error when using pandas query, you can upgrade to pandas 1.4 (which requires Python 3.8 or later).

pip install pandas==1.4.3 fixes the problem for me.
