BUG: .nlargest with unsigned integers

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

pd.Series(np.array([0, 0, 0, 100, 1000, 10000, 100], dtype='uint32')).nlargest(5)
0        0
1        0
2        0
5    10000
4     1000

Problem description

nlargest ranks 0 above positive values. This affects both uint32 and uint64, and possibly other dtypes.

Expected Output

5    10000
4     1000
3      100
6      100
0        0
dtype: uint32
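
Until a fix lands, one possible workaround is to sort in descending order and take the head instead of calling nlargest. This is only a sketch: it assumes sort_values is not affected by the same problem on the affected version, which has not been verified here, and the order of tied values (the two 100s) may differ from what nlargest would return.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([0, 0, 0, 100, 1000, 10000, 100], dtype="uint32"))

# Sort descending and keep the first five rows instead of calling nlargest.
# Tie order between equal values is not guaranteed to match nlargest.
top5 = s.sort_values(ascending=False).head(5)
print(top5)
```

Casting to a wider signed dtype first (for example s.astype("int64")) is another option for uint32 data, but it is not safe for uint64 values above the int64 maximum.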

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: 0.7.9.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Top GitHub Comments

jschendel commented, Jun 11, 2018

I suspect the issue is with this block of code: https://github.com/pandas-dev/pandas/blob/480790531ffcc4329f280ddf6877d028d08e969f/pandas/core/algorithms.py#L1136-L1138

Specifically for uint data, I don’t think -arr behaves as intended:

In [2]: arr = np.array([0, 0, 0, 100, 1000, 10000, 100], dtype='uint64')

In [3]: -arr
Out[3]:
array([                   0,                    0,                    0,
       18446744073709551516, 18446744073709550616, 18446744073709541616,
       18446744073709551516], dtype=uint64)
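
To make the effect of that wraparound concrete, here is a small NumPy-only sketch. It is not the pandas implementation or its eventual fix; it only illustrates why sorting on -arr puts the zeros first, and shows one order-reversing transform (bitwise NOT, which maps x to max - x for unsigned integers) that does not wrap. The variable names are made up for the illustration.

```python
import numpy as np

arr = np.array([0, 0, 0, 100, 1000, 10000, 100], dtype="uint64")

# Unary minus wraps modulo 2**64: 0 stays 0, so the zeros sort first
# and end up being treated as the "largest" values.
negated = -arr
order_bad = np.argsort(negated, kind="stable")[:5]

# Bitwise NOT on an unsigned integer gives (2**64 - 1) - x, which
# reverses the ordering without any wraparound.
flipped = ~arr
order_ok = np.argsort(flipped, kind="stable")[:5]

print(arr[order_bad])  # values picked when sorting on -arr
print(arr[order_ok])   # values picked when sorting on ~arr
```

With the stable sort used here, the first selection reproduces the indices from the bug report (0, 1, 2, 5, 4), while the second gives the expected ones (5, 4, 3, 6, 0).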

gfyoung commented, Jun 11, 2018

Sigh…that’s symptomatic of the same overflow issue presented with uint. Good catch!