Jupyter notebook kernel crashes on groupby when one row of the key column is NaN
Code Sample, a copy-pastable example if possible
df.groupby('City').count()['Bib'].sort_values(ascending=False).head(20)
Problem description
The data set has 26630 rows, and one row has NaN in df["City"]. Running the groupby above crashes the Jupyter notebook kernel.
Expected Output
A list of the top 20 cities is expected here.
df[df['City'].notnull()].groupby('City').count()['Bib'].sort_values(ascending=False).head(20)
produces the expected output.
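For comparison, here is a minimal sketch of the two expressions on a small synthetic frame. The column names `Bib` and `City` come from the report; the rows themselves are made up for illustration and are not the Boston results data. Since pandas `groupby` excludes NaN keys by default, both expressions would be expected to return the same counts on a working installation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the race results; the real data set is linked in the issue.
df = pd.DataFrame({
    "Bib": [1, 2, 3, 4, 5, 6],
    "City": ["Boston", "Boston", "Cambridge", np.nan, "Boston", "Cambridge"],
})

# Original expression from the report (the NaN row is still present).
top_all = df.groupby("City").count()["Bib"].sort_values(ascending=False).head(20)

# Workaround from the report: drop rows with a missing City before grouping.
top_filtered = (
    df[df["City"].notnull()]
    .groupby("City")
    .count()["Bib"]
    .sort_values(ascending=False)
    .head(20)
)

print(top_all)
print("Same result:", top_all.equals(top_filtered))
```

If the first expression crashes while the second succeeds on this small frame, that would narrow the problem to the NaN key rather than to the size of the data set.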
https://github.com/rojour/boston_results
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: 0.2.1
Issue Analytics
- Created 6 years ago
- Comments: 10 (7 by maintainers)
Top GitHub Comments
With the same versions, but on Windows, I cannot reproduce this (EDIT: same on Linux).
It appears that this issue is not reproducible, and our test suite doesn’t support Jupyter kernels. Unless we receive more specific information to reproduce the issue, I'm going to close this.
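One way to gather more specific information is to run the same expression as a plain Python script, outside Jupyter, with the standard library's `faulthandler` enabled, so that a hard crash leaves a traceback behind. This is only a sketch: the log path and the synthetic frame are hypothetical, and the real data set would need to be loaded in place of the stand-in rows.

```python
import faulthandler
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "groupby_crash.log")

# The file must stay open for as long as faulthandler is enabled.
log_file = open(log_path, "w")

# On a fatal signal such as a segfault, dump the Python traceback to the log file.
faulthandler.enable(file=log_file)

# Hypothetical stand-in rows; replace with the actual data to reproduce the report.
df = pd.DataFrame({
    "Bib": [1, 2, 3],
    "City": ["Boston", np.nan, "Boston"],
})

result = df.groupby("City").count()["Bib"].sort_values(ascending=False).head(20)
print(result)

# Version details that are useful to include alongside any traceback.
print("pandas", pd.__version__, "numpy", np.__version__)
```

If the script crashes, the contents of the log file together with the printed versions would give maintainers something concrete to work from.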