
BUG: Runtime warning with groupby/tail when `None` appears in group column

See original GitHub issue

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
df = pandas.DataFrame(
    [
        ["a", 1],
        ["a", 2],
        [None, 0],
        ["b", 2],
        ["b", 3],
    ],
    columns=["x", "y"],
)
df.groupby("x", dropna=False).tail(1)  # Succeeds
df.groupby("x", dropna=True).tail(1)   # Produces warning

Issue Description

👋 In pandas main, a groupby/tail on a DataFrame which contains nulls in the grouped column produces a RuntimeWarning suggesting a mishandled case in indexing logic:

/lib/python3.8/site-packages/pandas/core/groupby/indexing.py:217: RuntimeWarning: invalid value encountered in remainder
  mask &= offset_array % step == 0

If I set dropna=False, the snippet succeeds without producing a warning.

Based on the warning location and git blame, it may be related to https://github.com/pandas-dev/pandas/pull/42947
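The warning comes from a modulo over an offset array. The NumPy-only sketch below assumes (the report does not say this) that rows belonging to the dropped `None` group carry NaN positions in that array. `offset_array` and `step` mirror the names in the warning, but the values are made up for illustration. It also shows the `numpy.errstate` approach of scoping floating-point error handling to the single expression that can see NaN.

```python
import numpy as np

# Hypothetical stand-ins for the names in the warning: per-row positions within
# each group, with NaN marking the row whose group key (None) was dropped.
offset_array = np.array([0.0, 1.0, np.nan, 0.0, 1.0])
step = 1

# Limit the "invalid" floating-point handling to this one expression.
# NaN % step is NaN, and NaN == 0 is False, so the dropped row is excluded.
with np.errstate(invalid="ignore"):
    mask = offset_array % step == 0

print(mask)
```

Because the comparison with NaN is already False, the mask itself comes out as intended for the dropped row; only the floating-point "invalid" flag raised along the way produces the warning.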

Expected Behavior

The above groupby should succeed without producing a warning.
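The expected behavior can be phrased as a regression check. The sketch below rebuilds the DataFrame from the reproducible example, records any RuntimeWarning emitted by each `tail(1)` call, and returns the result alongside those warnings. The helper names `make_frame` and `tail_with_warnings` are hypothetical, and whether a warning actually appears depends on the installed pandas and NumPy versions.

```python
import warnings

import pandas as pd


def make_frame() -> pd.DataFrame:
    # Same data as the reproducible example: one row has None in the group column "x".
    return pd.DataFrame(
        [["a", 1], ["a", 2], [None, 0], ["b", 2], ["b", 3]],
        columns=["x", "y"],
    )


def tail_with_warnings(dropna: bool):
    """Run groupby(...).tail(1) and collect any RuntimeWarnings it emits."""
    df = make_frame()
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones the default filters show only once.
        warnings.simplefilter("always")
        result = df.groupby("x", dropna=dropna).tail(1)
    runtime = [w for w in caught if issubclass(w.category, RuntimeWarning)]
    return result, runtime


for dropna in (False, True):
    result, runtime = tail_with_warnings(dropna)
    print(f"dropna={dropna}: rows {list(result.index)}, RuntimeWarnings: {len(runtime)}")
```

On a recent pandas, each call keeps the last row of every group it retains: rows 1, 2 and 4 with `dropna=False`, rows 1 and 4 with `dropna=True`. The bug reported here is the extra RuntimeWarning in the second case.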

Installed Versions

Pandas main, as well as 1.4.
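Because the thread later attributes the warning to NumPy rather than pandas, it helps to report both library versions when reproducing. A minimal sketch; the helper name `relevant_versions` is hypothetical.

```python
import numpy as np
import pandas as pd


def relevant_versions() -> dict:
    # The warning turned out to depend on NumPy, so report both libraries together.
    return {"pandas": pd.__version__, "numpy": np.__version__}


print(relevant_versions())
# pd.show_versions() prints a fuller report (Python, OS, optional dependencies).
```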

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
mroeschke commented, Jul 5, 2022

Closing as it seems this was a numpy issue.

0 reactions
ian-r-rose commented, Apr 22, 2022

Thanks @rhshadrach! Upgrading numpy is a fine workaround from my perspective.
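If upgrading NumPy is not immediately possible, one stopgap (our assumption, not something proposed in the thread) is to silence only this specific message around the affected call instead of disabling RuntimeWarning globally. `warnings.catch_warnings()` restores the previous filters when the block exits.

```python
import warnings

import pandas as pd

df = pd.DataFrame(
    [["a", 1], ["a", 2], [None, 0], ["b", 2], ["b", 3]],
    columns=["x", "y"],
)

with warnings.catch_warnings():
    # Ignore only the message from the report; other RuntimeWarnings still surface.
    # `message` is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message="invalid value encountered in remainder",
        category=RuntimeWarning,
    )
    last_rows = df.groupby("x", dropna=True).tail(1)

print(last_rows)
```

Scoping the filter this narrowly keeps unrelated floating-point warnings visible, so it can be removed once NumPy is upgraded.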


Top Results From Across the Web

powerful Python data analysis toolkit - pandas
read_csv has improved support for duplicate column names . ... Use numpy.errstate around the source of the RuntimeWarning to control how ...
warnings — Warning control — Python 3.11.1 documentation
The warnings filter controls whether warnings are ignored, displayed, or turned into errors (raising an exception). ... Since the Warning class is derived...
Spark SQL — PySpark 3.2.0 documentation
A distributed collection of data grouped into named columns. ... based on ascending order of the column, and null values appear after non-null...
Python 3.6 project structure leads to RuntimeWarning
Reading through the bug report, it appears that there is a double-import which happens, and this RuntimeWarning is correct in alerting the user....
Source code for pandas.core.categorical
RuntimeWarning, stacklevel=2) if (len(values) and is_integer_dtype(values) and (codes == -1).all()): warn("None of the categories were found in values.
