result from groupby / nlargest with data frame with one row does not include the groupby key in the resulting index

See original GitHub issue

Code Sample, a copy-pastable example if possible

In [1]: df = pandas.DataFrame([["Dog", 1], ["Dog", 2]], columns=["animal", "value"])

In [2]: df.groupby("animal").value.nlargest(5)
Out[2]: 
animal   
Dog     1    2
        0    1
Name: value, dtype: int64
In [3]: df = pandas.DataFrame([["Dog", 1]], columns=["animal", "value"])
In [4]: df.groupby("animal").value.nlargest(5)
Out[4]: 
0    1
Name: value, dtype: int64

Problem description

Expected Output

In [3]: df = pandas.DataFrame([["Dog", 1]], columns=["animal", "value"])
In [4]: df.groupby("animal").value.nlargest(5)
Out[4]: 
animal
Dog      0    1
Name: value, dtype: int64
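
One way to get the expected shape regardless of group size is to attach the group key to the index explicitly. The sketch below is a workaround under stated assumptions, not the fix applied in pandas itself: the helper name `nlargest_with_key` is hypothetical, and it relies on `Series.nlargest` being called once per group, with `pd.concat` adding the group key as the outer index level.

```python
import pandas as pd


def nlargest_with_key(df, key, value, n):
    """Hypothetical helper: per-group nlargest that always keeps the group key."""
    # Run Series.nlargest on each group separately.
    pieces = {name: group[value].nlargest(n) for name, group in df.groupby(key)}
    # Concatenating a dict uses its keys as the outer index level,
    # so the group key is present even when a group has a single row.
    return pd.concat(pieces, names=[key])


single = pd.DataFrame([["Dog", 1]], columns=["animal", "value"])
result = nlargest_with_key(single, "animal", "value", 5)
print(result)
```

Because the outer level is built by `pd.concat` rather than by the groupby machinery, the result should not depend on whether the output happens to equal the input.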

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: None
statsmodels: None
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: None
pandas_datareader: None

Issue Analytics

State:
Created 6 years ago
Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction

phofl commented, Mar 30, 2020

Hi, @jreback @mroeschke

I looked into this a bit and found the reason why the group key is sometimes dropped from the index and sometimes not. For functions like nlargest and apply (but not head), the key is always dropped if the output DataFrame equals the input DataFrame (including the sort order). The code checks whether the index was changed and, depending on the result, executes a different code path. While trying to fix this, I ran into some issues with existing unit tests.

If we do something like:

df = pd.DataFrame({"key": ["b"] * 10, "value": 2})

actual = df.groupby("key")["value"].cumprod()

Should the resulting Series have the key b in the index, or is

0       2
1       4
2       8
3      16
4      32
5      64
6     128
7     256
8     512
9    1024
Name: value, dtype: int64

the desired output?
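
To see what a given pandas version actually returns for this case, the snippet can be run and its index inspected. This is only a check, not a statement of the intended behavior; it assumes that cumulative operations such as `cumprod` return one value per input row, so the result can be compared against the original row index.

```python
import pandas as pd

df = pd.DataFrame({"key": ["b"] * 10, "value": 2})
actual = df.groupby("key")["value"].cumprod()

# One value per input row: powers of two from 2 to 1024.
print(actual.tolist())

# True if the result kept the original row index (no group key level added).
print(actual.index.equals(df.index))

# Number of index levels; 1 means the key "b" is not part of the index.
print(actual.index.nlevels)
```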
