
DataFrame query method - numexpr safety check fails


Code Sample, a copy-pastable example if possible

import pandas as pd

# Column "a" holds strings, so it is cast to int inside the query string
df = pd.DataFrame({'a': ['1','2','3'], 'b': [4,5,6]})
df.query("a.astype('int') < 2")

This raises TypeError: unhashable type: 'numpy.ndarray'

Problem description

Background

When using numexpr, pandas has an internal function, _check_ne_builtin_clash, that detects when a variable name used in a method like query clashes with a numexpr built-in.

Here's an example of the function raising an error as intended:

df = pd.DataFrame({'abs': [1,2,3]})
df.query("abs > 2")
# Raises NumExprClobberingError: Variables ... overlap with builtins: ('abs')

Mostly, the names it protects against are math functions like sin, cos, sum, and so on.
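To make the intended behavior concrete, here is a minimal sketch of what such a clash check does: collect the referenced names into a set, intersect it with the protected names, and raise on any overlap. This is not the pandas source. ILLUSTRATIVE_BUILTINS is a hypothetical subset built only from the names mentioned above, and ClobberingError is a hypothetical stand-in for NumExprClobberingError.

```python
# Minimal sketch of a builtin-clash check; NOT pandas' actual code.
# ILLUSTRATIVE_BUILTINS is a hypothetical subset of the protected names.
ILLUSTRATIVE_BUILTINS = frozenset({"abs", "sin", "cos", "sum"})


class ClobberingError(NameError):
    """Hypothetical stand-in for pandas' NumExprClobberingError."""


def check_builtin_clash(names):
    """Raise if any referenced name shadows a protected built-in."""
    # Building the frozenset hashes every name.
    overlap = frozenset(names) & ILLUSTRATIVE_BUILTINS
    if overlap:
        s = ", ".join(repr(name) for name in sorted(overlap))
        raise ClobberingError(f"Variables ({s}) overlap with builtins")


check_builtin_clash(["a", "b"])  # no clash: returns None
try:
    check_builtin_clash(["abs"])
except ClobberingError as exc:
    print(exc)  # Variables ('abs') overlap with builtins
```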

Why my original example fails

The trouble with my original code is that _check_ne_builtin_clash checks the names of both sides of the BinaryExpr AST node corresponding to "a.astype('int') < 2", and it does this by putting them into a frozenset. However, the LHS ends up being a Constant node whose name is array([1,2,3]). That is an ndarray, which is not hashable, so building the frozenset fails.
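The failure itself is ordinary Python behavior: constructing a frozenset hashes every element, so a single unhashable element aborts the whole construction. In this sketch the operand names are hypothetical stand-ins, and a list replaces the ndarray so that only the standard library is needed. NumPy arrays are unhashable too, which is why the reported error names 'numpy.ndarray'.

```python
# Hypothetical stand-ins for the two operand "names" of
# "a.astype('int') < 2": the LHS name is an evaluated array, not a string.
# A list is used instead of a numpy array; both are unhashable.
lhs_name = [1, 2, 3]
rhs_name = 2

raised = False
message = ""
try:
    names = frozenset([lhs_name, rhs_name])
except TypeError as exc:
    # frozenset() must hash each element, and the list cannot be hashed
    raised = True
    message = str(exc)
    print(f"TypeError: {message}")
```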

Solution

It seems like the helper function _check_ne_builtin_clash should treat any unhashable name as safe, since it cannot match any of the function names being searched for. If this seems like reasonable behavior, let me know and I will submit a PR!
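The proposed behavior could look roughly like the sketch below: unhashable names are dropped before the set is built, because an array can never equal a built-in's string name. This is a hedged illustration of the idea, not the patch that was eventually merged into pandas. The function name, ILLUSTRATIVE_BUILTINS, and ClobberingError are all hypothetical. Hashability is probed by calling hash() rather than isinstance(obj, Hashable), because the latter reports True for a tuple that contains a list, which still fails to hash.

```python
# Hypothetical subset of protected names, for illustration only.
ILLUSTRATIVE_BUILTINS = frozenset({"abs", "sin", "cos", "sum"})


class ClobberingError(NameError):
    """Hypothetical stand-in for pandas' NumExprClobberingError."""


def _is_hashable(obj):
    """Return True if obj can actually be hashed."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def check_builtin_clash_skipping_unhashable(names):
    """Raise on a builtin clash, treating unhashable names as safe.

    An unhashable name (e.g. the array produced by a.astype('int'))
    cannot equal any built-in's string name, so it is skipped.
    """
    hashable_names = frozenset(n for n in names if _is_hashable(n))
    overlap = hashable_names & ILLUSTRATIVE_BUILTINS
    if overlap:
        s = ", ".join(repr(name) for name in sorted(overlap))
        raise ClobberingError(f"Variables ({s}) overlap with builtins")


# Array-like LHS (a list stands in for the ndarray): no TypeError, no clash
check_builtin_clash_skipping_unhashable([[1, 2, 3], 2])
```

A genuine clash such as ["abs"] still raises, so the protection described earlier is preserved.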

Code for the function:

https://github.com/pandas-dev/pandas/blob/b82253590a66b4a35ed682bca244f668f16c3e0b/pandas/core/computation/engines.py#L23-L38

Code for the variable names it checks against:

https://github.com/pandas-dev/pandas/blob/master/pandas/core/computation/ops.py#L20-L26

Expected Output

> df.query("a.astype('int') < 2")
   a  b
0  1  4

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.2.1
pip: 9.0.1
setuptools: 40.0.0
Cython: 0.24
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.4.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 4.2.2
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 5
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

untanglereality commented, Sep 9, 2021 (1 reaction)

If you run into the unhashable type error when using pandas query, you can pass the engine="python" argument, provided performance isn't a concern.

Example:

orders.query("item_name.str.contains('Chicken')", engine="python")

"You can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine." (Source: DataFrame.query documentation)

You can also use the old-style masking instead.

orders[orders.item_name.str.contains('Chicken')]
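Applied to the frame from the original report, the two workarounds above might look like the sketch below. It reuses the report's DataFrame and assumes only the engine parameter of DataFrame.query quoted above; both variants should select just the first row, matching the Expected Output section.

```python
import pandas as pd

# Same frame as the original report: column "a" holds strings.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": [4, 5, 6]})

# Workaround 1: evaluate the query with the pure-Python engine,
# which does not go through the numexpr builtin-clash check.
via_python_engine = df.query("a.astype('int') < 2", engine="python")

# Workaround 2: plain boolean masking, no query string at all.
via_mask = df[df["a"].astype(int) < 2]

print(via_python_engine)
print(via_mask)
```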
jiagengliu commented, Jul 31, 2022 (0 reactions)

If you run into the unhashable type error when using pandas query, you can upgrade to pandas 1.4 (which requires Python 3.8 or newer).

pip install pandas==1.4.3 fixes the problem for me.
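If you need to decide at runtime whether to fall back to engine="python", a small version check is one option. This is a hedged sketch: has_query_fix is a hypothetical helper, and the (1, 4) threshold comes from the comment above, not from the pandas changelog. It compares only the leading major.minor numbers, so suffixes such as "rc0" or "+dev" are ignored.

```python
import re


def has_query_fix(version, minimum=(1, 4)):
    """Return True if a pandas version string is at least `minimum`.

    Hypothetical helper; the 1.4 threshold is taken from the
    comment above rather than from the pandas release notes.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


print(has_query_fix("0.23.4"))  # False
print(has_query_fix("1.4.3"))   # True
```

In practice you would pass pd.__version__ and pick the query engine based on the result.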
