Regression in 0.24: TypeError exception when using dropna on dataframe with categorical index
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
dd = pd.DataFrame(np.arange(10))
dd['x2'] = dd[0] * dd[0]
dd['q'] = pd.qcut(dd['x2'], 5)
dd.set_index('q', inplace=True)
dd.dropna()
Problem description
The call to dropna raised the following exception:
TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'
This seems to happen only with a categorical index for which the intervals are not all of the same length.
There was no issue in version 0.23.4, and the issue is still present on master.
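To make the "intervals are not all of the same length" observation concrete, the sketch below inspects the bins that pd.qcut produces for the squared column in the code sample. The variable names (x2, q, lengths) are illustrative only. It just shows that the interval widths differ; it does not by itself explain why dropna fails.

```python
import numpy as np
import pandas as pd

# Same data as the 'x2' column in the report: squares of 0..9.
x2 = pd.Series(np.arange(10) ** 2)

# Quantile-based binning: each bin holds roughly the same number of
# points, so on skewed data the bin widths are unequal.
q = pd.qcut(x2, 5)

# The categories form an IntervalIndex; .length gives each interval's width.
lengths = q.cat.categories.length
print(q.cat.categories)
print(lengths)
```

Because the squared values are spread unevenly, the quantile edges fall at irregular positions, so the categorical index built from q contains intervals of different widths.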
Expected Output
No exception should be raised.
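A minimal sketch of the expected behaviour, written as a check one could run against a given pandas build. It assumes the frame from the code sample. The reported TypeError would surface at the dropna call on affected versions, and the shape comparison is an illustrative assertion, not taken from the pandas test suite.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the report without inplace mutation.
dd = pd.DataFrame(np.arange(10))
dd["x2"] = dd[0] * dd[0]
dd["q"] = pd.qcut(dd["x2"], 5)
dd = dd.set_index("q")

# On affected versions (0.24.x per the report) this line raises TypeError.
result = dd.dropna()

# The frame has no missing values, so nothing should be dropped.
assert result.shape == dd.shape
print(result.shape)
```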
Output of pd.show_versions()
INSTALLED VERSIONS
commit: 25ff4729243d69bada4eaf1eeeebc7ec41418977
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.15-300.fc29.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8

pandas: 0.25.0.dev0+44.g25ff47292
pytest: None
pip: 19.0.1
setuptools: 40.6.3
Cython: 0.29.4
numpy: 1.16.1
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
Issue Analytics
- Created 5 years ago
- Comments:10 (8 by maintainers)
This will be fixed by #27100.
A more concise equivalent example for testing purposes:
I’m running into the same error for this sequence:
'0.24.2'