result from groupby / nlargest with data frame with one row does not include the groupby key in the resulting index
Code Sample, a copy-pastable example if possible
In [1]: df = pandas.DataFrame([["Dog", 1], ["Dog", 2]], columns=["animal", "value"])
In [2]: df.groupby("animal").value.nlargest(5)
Out[2]:
animal
Dog     1    2
        0    1
Name: value, dtype: int64
In [3]: df = pandas.DataFrame([["Dog", 1]], columns=["animal", "value"])
In [4]: df.groupby("animal").value.nlargest(5)
Out[4]:
0 1
Name: value, dtype: int64
Problem description
Expected Output
In [3]: df = pandas.DataFrame([["Dog", 1]], columns=["animal", "value"])
In [4]: df.groupby("animal").value.nlargest(5)
Out[4]:
animal
Dog 0 1
Name: value, dtype: int64
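As a workaround, the expected shape can be built explicitly instead of relying on what `nlargest` returns for a given group layout. This is only a minimal sketch: the helper name `nlargest_with_key` is hypothetical (not part of pandas), and it swaps `nlargest` for a sort followed by `groupby(...).head(n)`, so the order of tied values may differ from `nlargest`.

```python
import pandas as pd


def nlargest_with_key(df, key, value, n):
    """Hypothetical helper: top-n values per group, always indexed by (key, original index)."""
    # Sort by group key, then by value descending, so head(n) keeps the n largest per group.
    ordered = df.sort_values([key, value], ascending=[True, False])
    top = ordered.groupby(key).head(n)
    # Build the two-level index explicitly so the group key is never dropped.
    index = pd.MultiIndex.from_arrays([top[key], top.index], names=[key, None])
    return pd.Series(top[value].to_numpy(), index=index, name=value)


one_row = pd.DataFrame([["Dog", 1]], columns=["animal", "value"])
two_rows = pd.DataFrame([["Dog", 1], ["Dog", 2]], columns=["animal", "value"])

res_one = nlargest_with_key(one_row, "animal", "value", 5)
res_two = nlargest_with_key(two_rows, "animal", "value", 5)
print(res_one)
print(res_two)
```

With this helper, both the one-row and the two-row frame produce a Series whose first index level is `animal`, which is the shape shown under Expected Output.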
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: None
statsmodels: None
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: None
pandas_datareader: None
Issue Analytics
- State:
- Created 6 years ago
- Comments:7 (6 by maintainers)
Top GitHub Comments
Hi, @jreback @mroeschke
I looked into this a bit and found the reason why the column is sometimes dropped and sometimes not. For functions like nlargest and apply (but not head), the column is always dropped if the input DataFrame equals the output DataFrame (including the sort order!). The code checks whether the Index was changed and, depending on the result, executes a different code path. While trying to fix this, I ran into some issues with existing unit tests.

If we do something like:
``df = pd.DataFrame({"key": ["b"] * 10, "value": 2})``
should the resulting Series have the column b in the Index, or is the output without it the desired one?

Similar question: if we execute
``base_df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 2], "B": [np.nan] * 8})``
should the output Series omit column A, or should column A be part of the Index?

I would really appreciate an answer about the output format of these two functions. Depending on that, I may have found a way to fix this issue and the related ones (#29129, for example). If the columns should not be part of the Index, the solution is more complex.

Thanks very much.
@phofl - transformers should return the index of the original DataFrame.
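To illustrate the point about transformers, here is a small sketch using `base_df` from the comment above. The function called in the original comment is not shown, so `cumsum` is used here purely as an assumed example of a groupby transformer; the check is only that the result keeps the index of the original DataFrame and does not gain the group key as an extra level.

```python
import numpy as np
import pandas as pd

base_df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 2], "B": [np.nan] * 8})

# cumsum is an assumed stand-in for the transformer discussed in the thread.
out = base_df.groupby("A")["B"].cumsum()

# A transformer returns one value per input row, aligned to the original index.
same_index = out.index.equals(base_df.index)
print(same_index)
print(out.index.nlevels)
```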