result from groupby / nlargest with data frame with one row does not include the groupby key in the resulting index

See original GitHub issue

Code Sample, a copy-pastable example if possible

In [1]: df = pandas.DataFrame([["Dog", 1], ["Dog", 2]], columns=["animal", "value"])

In [2]: df.groupby("animal").value.nlargest(5)
Out[2]: 
animal   
Dog     1    2
        0    1
Name: value, dtype: int64
In [3]: df = pandas.DataFrame([["Dog", 1]], columns=["animal", "value"])
In [4]: df.groupby("animal").value.nlargest(5)
Out[4]: 
0    1
Name: value, dtype: int64

Problem description

Expected Output

In [3]: df = pandas.DataFrame([["Dog", 1]], columns=["animal", "value"])
In [4]: df.groupby("animal").value.nlargest(5)
Out[4]: 
animal
Dog      0    1
Name: value, dtype: int64
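
One possible workaround, not taken from the issue thread, is to build the two-level index explicitly so that the result has the same shape no matter how many rows each group has. The sketch below assumes a hypothetical helper named nlargest_with_key; it calls nlargest on each group separately and lets pd.concat add the group key as the outer index level.

```python
import pandas as pd


def nlargest_with_key(df, key, column, n):
    """Hypothetical helper: per-group nlargest with the group key always in the index."""
    grouped = df.groupby(key)[column]
    # Passing a dict to pd.concat turns the dict keys into the outer index level.
    pieces = {name: group.nlargest(n) for name, group in grouped}
    return pd.concat(pieces, names=[key])


one_row = pd.DataFrame([["Dog", 1]], columns=["animal", "value"])
two_rows = pd.DataFrame([["Dog", 1], ["Dog", 2]], columns=["animal", "value"])

single = nlargest_with_key(one_row, "animal", "value", 5)
double = nlargest_with_key(two_rows, "animal", "value", 5)

print(single)
print(double)
```

This loops over the groups in Python, so it is likely slower than the built-in groupby path on large inputs; it is meant only to show the expected index shape.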

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: None
statsmodels: None
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction
phofl commented, Mar 30, 2020

Hi, @jreback @mroeschke

I looked into this a bit and found the reason why the column is sometimes dropped and sometimes not. For functions like nlargest and apply (but not head), the column is always dropped if the output DataFrame equals the input DataFrame (including the sort order). The code checks whether the Index was changed and, depending on the result, runs different code. While trying to fix this, I ran into some issues with existing unit tests.

If we do something like:

df = pd.DataFrame({"key": ["b"] * 10, "value": 2})

actual = df.groupby("key")["value"].cumprod()

Should the resulting Series have the group key (b) in the Index, or is

0    2
1    4
2    8
3    16
4    32
5    64
6    128
7    256
8    512
9    1024

Name: value, dtype: int64

the desired output?

Similar question: If we execute

base_df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 2], "B": [np.nan] * 8})

expected = pd.DataFrame({"B": [np.nan] * 8})
result = base_df.groupby("A").cummax()

should the output DataFrame look like

B
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN

or should the column A be part of the Index?

I would really appreciate an answer about the expected output format of these two functions. Depending on that, I may have found a way to fix this issue and the issues related to it (#29129, for example). If the columns should not be part of the Index, the solution is more complex.

Thanks very much.

0 reactions
rhshadrach commented, Jul 18, 2021

@phofl - transformers should return the index of the original DataFrame.
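
As a rough check of that rule, the two transform examples from the comment above can be run and their indexes compared with the input. This is a minimal sketch that reuses the frames from the thread; the names cumprod_result and cummax_result are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# First example from the thread: cumulative product within a single group.
df = pd.DataFrame({"key": ["b"] * 10, "value": 2})
cumprod_result = df.groupby("key")["value"].cumprod()

# Second example from the thread: cumulative max over an all-NaN column.
base_df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 2], "B": [np.nan] * 8})
cummax_result = base_df.groupby("A").cummax()

# Transformations keep the index of the input; the group key is not added to it.
print(cumprod_result.index.equals(df.index))
print(cummax_result.index.equals(base_df.index))
print(list(cummax_result.columns))
```

In both cases the result is indexed like the original frame, which matches the outputs shown in the questions above.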
