Make parameter keep=False keep duplicates for nlargest/nsmallest

Code Sample, a copy-pastable example if possible

>>> s = pd.Series([10,9,8,7,7,7,6])
>>> s.nlargest(4)
0    10
1     9
2     8
3     7
dtype: int64

Problem description

The docstrings list False as one of the possible values for the keep argument, but pandas raises a ValueError when keep=False is actually passed.
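A minimal sketch of how the reported failure can be reproduced, reusing the Series from the code sample above. The exact error message may differ between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

s = pd.Series([10, 9, 8, 7, 7, 7, 6])

error = None
try:
    # keep=False was listed in the docstring but is rejected at call time
    s.nlargest(4, keep=False)
except ValueError as exc:
    error = exc
    print(f"ValueError: {exc}")
```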

Expected Output

It would be nice to have nlargest work like this.

>>> s.nlargest(4, keep=False)
0    10
1     9
2     8
3     7
4     7
5     7
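The requested behavior can be sketched in plain Python, independent of pandas: take the n largest values, then also keep every value tied with the n-th largest. The function name `nlargest_keep_ties` is hypothetical and is not part of pandas.

```python
def nlargest_keep_ties(values, n):
    """Return (position, value) pairs for the n largest values,
    plus any further values tied with the n-th largest."""
    if n <= 0 or not values:
        return []
    # sorted() is stable, so tied values keep their original relative order
    ranked = sorted(enumerate(values), key=lambda pair: pair[1], reverse=True)
    # Value of the n-th largest element (or the smallest, if n exceeds the length)
    cutoff = ranked[min(n, len(ranked)) - 1][1]
    return [pair for pair in ranked if pair[1] >= cutoff]


result = nlargest_keep_ties([10, 9, 8, 7, 7, 7, 6], 4)
print(result)
```

With the data from the issue, asking for 4 values yields six pairs, because positions 3, 4 and 5 all hold the cutoff value 7.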

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.3.0.post

Issue Analytics

State:
Created 6 years ago
Comments: 12 (8 by maintainers)

Top GitHub Comments

3 reactions

tdpetrou commented, Dec 6, 2017

@gfyoung This doesn’t make sense to me. The keep option only matters when there are ties for the last value. Arguably, the most useful behavior is to keep all of the ties, which is what keep=False would do. This option still exists in the docstrings in 0.21.

Also, the duplicated method has the same three options, so this would match it. Additionally, it was just recently removed without discussion.

edit: A better value for the parameter would be keep='all'
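For reference, later pandas releases do accept keep='all' for nlargest and nsmallest (to my knowledge since 0.24; it is not available in the 0.20.2 version reported in this issue). A short sketch with the Series from the issue:

```python
import pandas as pd

s = pd.Series([10, 9, 8, 7, 7, 7, 6])

# keep="all" returns more than n rows when there are ties at the cutoff value
top = s.nlargest(4, keep="all")
print(top)
```

Here the result has six entries rather than four, because three elements share the cutoff value 7.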

0 reactions

artinthetrees commented, Jan 28, 2021

It would also be useful to have a keep option of “none”!

s = pd.Series([10,9,8,7,7,7,6])
s.nlargest(4)
0    10
1     9
2     8
dtype: int64
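pandas has no keep="none" option, so the following is only one possible reading of the suggestion: when ties at the cutoff would push the result past n rows, drop all of the tied values. The helper name `nlargest_drop_ties` is hypothetical, and the sketch assumes a pandas version that supports keep="all".

```python
import pandas as pd


def nlargest_drop_ties(s, n):
    """Hypothetical 'none' behavior: drop cutoff ties that do not all fit in n rows."""
    top = s.nlargest(n, keep="all")
    if len(top) > n:
        # The cutoff value is the smallest one selected; remove every copy of it
        top = top[top > top.min()]
    return top


s = pd.Series([10, 9, 8, 7, 7, 7, 6])
result = nlargest_drop_ties(s, 4)
print(result)
```

For the Series in the comment this leaves only 10, 9 and 8, matching the output shown there.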
