BUG: Runtime warning with groupby/tail when `None` appears in group column

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
df = pandas.DataFrame(
    [
        ["a", 1],
        ["a", 2],
        [None, 0],
        ["b", 2],
        ["b", 3],

    ],
    columns=["x", "y"],
)
df.groupby("x", dropna=False).tail(1)   # Succeeds
df.groupby("x", dropna=True).tail(1)   # Produces warning

Issue Description

👋 In pandas main, a groupby/tail on a DataFrame which contains nulls in the grouped column produces a RuntimeWarning suggesting a mishandled case in indexing logic:

/lib/python3.8/site-packages/pandas/core/groupby/indexing.py:217: RuntimeWarning: invalid value encountered in remainder
  mask &= offset_array % step == 0

If I set dropna=False, the snippet succeeds without producing a warning.

Based on the warning location and git blame, it may be related to https://github.com/pandas-dev/pandas/pull/42947

Expected Behavior

The above groupby should succeed without producing a warning.
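One way to check this expectation programmatically is to record warnings around the call and look for a RuntimeWarning. The sketch below uses only the standard-library warnings module; the helper name runtime_warnings_from and the stand-in function noisy are hypothetical and not part of pandas. In practice you would pass something like lambda: df.groupby("x", dropna=True).tail(1) instead of noisy.

```python
import warnings


def runtime_warnings_from(fn):
    """Call fn() and return the messages of any RuntimeWarning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones Python would normally show once.
        warnings.simplefilter("always")
        fn()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, RuntimeWarning)
    ]


def noisy():
    # Hypothetical stand-in for the groupby/tail call from the report.
    warnings.warn("invalid value encountered in remainder", RuntimeWarning)


print(runtime_warnings_from(noisy))
print(runtime_warnings_from(lambda: None))
```

An empty list from the helper means the call completed without a RuntimeWarning, which is the behavior expected here.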

Installed Versions

Pandas main, as well as 1.4.

Issue Analytics

State:
Created a year ago
Comments: 6 (3 by maintainers)

Top GitHub Comments

mroeschke commented, Jul 5, 2022

Closing, as it seems this was a numpy issue.

ian-r-rose commented, Apr 22, 2022

Thanks @rhshadrach! Upgrading numpy is a fine workaround from my perspective.
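If upgrading numpy is not immediately possible, a narrowly scoped filter can keep this one message out of the logs in the meantime. This is a minimal sketch using only the standard-library warnings module, not something proposed in the thread; the context manager name ignore_remainder_warning is hypothetical. It hides only RuntimeWarnings whose message starts with the text from the report and leaves all other warnings visible.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def ignore_remainder_warning():
    """Temporarily ignore the 'invalid value encountered in remainder' warning."""
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the text.
        warnings.filterwarnings(
            "ignore",
            message="invalid value encountered in remainder",
            category=RuntimeWarning,
        )
        yield


# Usage, assuming df is the DataFrame from the reproducible example:
# with ignore_remainder_warning():
#     df.groupby("x", dropna=True).tail(1)
```

The filter is restored when the block exits, so the suppression does not leak into the rest of the program.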
