BUG: Runtime warning with groupby/tail when `None` appears in group column
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas

df = pandas.DataFrame(
    [
        ["a", 1],
        ["a", 2],
        [None, 0],
        ["b", 2],
        ["b", 3],
    ],
    columns=["x", "y"],
)
df.groupby("x", dropna=False).tail(1)  # Succeeds
df.groupby("x", dropna=True).tail(1)  # Produces warning
Issue Description
👋 In pandas main, a groupby/tail on a DataFrame that contains nulls in the grouped column produces a RuntimeWarning, suggesting a mishandled case in the indexing logic:
/lib/python3.8/site-packages/pandas/core/groupby/indexing.py:217: RuntimeWarning: invalid value encountered in remainder
mask &= offset_array % step == 0
If I set dropna=False, the snippet succeeds without producing a warning.
Based on the warning location and git blame, it may be related to https://github.com/pandas-dev/pandas/pull/42947
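The warning points at the line `mask &= offset_array % step == 0`. A NumPy-only sketch of that expression, under the assumption (not confirmed in this issue) that rows belonging to the dropped `None` group carry NaN offsets: NaN remainders compare unequal to 0, so those rows are masked out either way, and `np.errstate` keeps the "invalid value" warning local. The offset values and `step = 1` are illustrative, not taken from pandas internals.

```python
import numpy as np

# Hypothetical per-row offsets; NaN stands in for the row whose group key was dropped.
offset_array = np.array([1.0, 0.0, np.nan, 1.0, 0.0])
step = 1
mask = np.ones(len(offset_array), dtype=bool)

# NaN % step yields NaN; on some NumPy builds this raises the
# "invalid value encountered in remainder" warning. errstate silences it here only.
with np.errstate(invalid="ignore"):
    mask &= offset_array % step == 0

# NaN == 0 is False, so the NaN row is excluded from the mask.
print(mask.tolist())
```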
Expected Behavior
The above groupby should succeed without producing a warning.
Installed Versions
pandas main, as well as 1.4.
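Since the thread below concludes that the warning came from NumPy rather than pandas, a report like this one is easier to triage when it records both library versions. A minimal sketch; `pd.show_versions()` is the pandas helper for a fuller environment report.

```python
import numpy as np
import pandas as pd

# The behaviour depends on the NumPy build as well as pandas, so record both.
versions = {"pandas": pd.__version__, "numpy": np.__version__}
print(versions)

# For a complete report (OS, Python, optional dependencies), call:
# pd.show_versions()
```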
Issue Analytics
- Created a year ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Closing, as it seems this was a numpy issue.
Thanks @rhshadrach! Upgrading numpy is a fine workaround from my perspective.
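Upgrading numpy is the workaround accepted in the thread. Where an upgrade is not immediately possible, one stopgap is to scope NumPy's floating-point error handling around the affected call. This is a sketch that assumes the message originates from NumPy's "invalid value" check, as its wording suggests; note that it hides every invalid-value warning inside the block, so keep the block narrow.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["a", 1], ["a", 2], [None, 0], ["b", 2], ["b", 3]],
    columns=["x", "y"],
)

# Silence NumPy "invalid value" floating-point warnings for this call only.
with np.errstate(invalid="ignore"):
    last_rows = df.groupby("x", dropna=True).tail(1)

print(last_rows)
```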