Regression in 0.24: TypeError exception when using dropna on dataframe with categorical index


Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

dd = pd.DataFrame(np.arange(10))
dd['x2'] = dd[0] * dd[0]
dd['q'] = pd.qcut(dd['x2'], 5)
dd.set_index('q', inplace=True)
dd.dropna()

Problem description

The call to dropna raised the following exception:

TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'

This seems to happen only with a categorical index for which the intervals are not all of the same length.

There was no issue in version 0.23.4, and the issue is not yet fixed on master.

Expected Output

No exception should be raised.
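The expectation above can be turned into a small check that reports whether the installed pandas is affected. This is only a sketch: it wraps the reproduction from the code sample in a helper, and the helper name `dropna_survives_categorical_interval_index` is hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def dropna_survives_categorical_interval_index():
    """Return True if dropna() works on the frame from the report, False if it raises TypeError."""
    # Same construction as the code sample: bins of unequal width from qcut.
    dd = pd.DataFrame(np.arange(10))
    dd["x2"] = dd[0] * dd[0]
    dd["q"] = pd.qcut(dd["x2"], 5)
    dd = dd.set_index("q")
    try:
        result = dd.dropna()
    except TypeError:
        # The behaviour described in this issue.
        return False
    # No value is missing, so no row should have been dropped.
    return result.shape == dd.shape


print(pd.__version__, dropna_survives_categorical_interval_index())
```

The helper returns a boolean instead of raising, so it can be run across several pandas versions to see where the behaviour changes.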

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 25ff4729243d69bada4eaf1eeeebc7ec41418977
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.15-300.fc29.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8

pandas: 0.25.0.dev0+44.g25ff47292
pytest: None
pip: 19.0.1
setuptools: 40.6.3
Cython: 0.29.4
numpy: 1.16.1
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

1 reaction
jschendel commented, Jun 28, 2019

This will be fixed by #27100.

A more concise equivalent example for testing purposes:

In [1]: import pandas as pd; pd.__version__                                                                             
Out[1]: '0.24.2'

In [2]: idx = pd.CategoricalIndex(pd.IntervalIndex.from_breaks([0, 2.78, 3.14, 6.28]))                                  

In [3]: df = pd.DataFrame({'A': list('abc')}, index=idx)                                                                

In [4]: df                                                                                                              
Out[4]: 
              A
(0.0, 2.78]   a
(2.78, 3.14]  b
(3.14, 6.28]  c

In [5]: df.dropna()                                                                                                     
---------------------------------------------------------------------------
TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'
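For anyone who cannot upgrade yet, one possible workaround is to build the row mask yourself and hand it to `.loc` as a plain NumPy array rather than as a Series that carries the categorical index. This is a sketch only and an assumption on my part: it has not been verified against 0.24.x, and the helper name `dropna_rows_with_array_mask` is made up for illustration. It mirrors the default `dropna()` behaviour (drop rows containing any missing value).

```python
import pandas as pd

idx = pd.CategoricalIndex(pd.IntervalIndex.from_breaks([0, 2.78, 3.14, 6.28]))
df = pd.DataFrame({"A": list("abc")}, index=idx)


def dropna_rows_with_array_mask(frame):
    """Keep rows without missing values, using a NumPy boolean mask instead of a Series."""
    # .values strips the index, so the indexer is a bare ndarray of booleans.
    keep = frame.notna().all(axis=1).values
    return frame.loc[keep]


cleaned = dropna_rows_with_array_mask(df)
print(cleaned)
```

The idea is that a bare array has no pandas index attached, so indexing with it should not need to consult the categorical index for attribute lookups. Upgrading to a release that includes the fix remains the safer option.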

0 reactions
alexdevmotion commented, Apr 18, 2019

I’m running into the same error for this sequence:

import pandas as pd
display(pd.__version__)
binned_series_1 = pd.qcut([1,2,3], 2)
binned_series_2 = pd.qcut([4,5,6], 2)
ct = pd.crosstab(binned_series_1, binned_series_2)

'0.24.2'


TypeError                                 Traceback (most recent call last)
<ipython-input-1-4f1271f5d374> in <module>()
      2 binned_series_1 = pd.qcut([1,2,3], 2)
      3 binned_series_2 = pd.qcut([4,5,6], 2)
----> 4 ct = pd.crosstab(binned_series_1, binned_series_2)

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\reshape\pivot.py in crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize)
    519 table = df.pivot_table('dummy', index=rownames, columns=colnames,
    520 margins=margins, margins_name=margins_name,
--> 521 dropna=dropna, **kwargs)
    522
    523 # Post-process

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
   5757 aggfunc=aggfunc, fill_value=fill_value,
   5758 margins=margins, dropna=dropna,
-> 5759 margins_name=margins_name)
   5760
   5761 def stack(self, level=-1, dropna=True):

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\reshape\pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
    145 # GH 15193 Make sure empty columns are removed if dropna=True
    146 if isinstance(table, ABCDataFrame) and dropna:
--> 147 table = table.dropna(how='all', axis=1)
    148
    149 return table

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\frame.py in dropna(self, axis, how, thresh, subset, inplace)
   4596 raise TypeError('must specify how or thresh')
   4597
-> 4598 result = self.loc(axis=axis)[mask]
   4599
   4600 if inplace:

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1498
   1499 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1500 return self._getitem_axis(maybe_callable, axis=axis)
   1501
   1502 def _is_scalar_access(self, key):

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1857 axis = self.axis or 0
   1858
-> 1859 if is_iterator(key):
   1860 key = list(key)
   1861

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\dtypes\inference.py in is_iterator(obj)
    155 # Python 3 generators have
    156 # __next__ instead of next
--> 157 return hasattr(obj, '__next__')
    158
    159

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5063 return object.__getattribute__(self, name)
   5064 else:
-> 5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066 return self[name]
   5067 return object.__getattribute__(self, name)

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\indexes\base.py in _can_hold_identifiers_and_holds_name(self, name)
   3983 """
   3984 if self.is_object() or self.is_categorical():
-> 3985 return name in self
   3986 return False
   3987

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\indexes\category.py in __contains__(self, key)
    325 return self.hasnans
    326
--> 327 return contains(self, key, container=self._engine)
    328
    329 @Appender(_index_shared_docs['contains'] % _index_doc_kwargs)

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\arrays\categorical.py in contains(cat, key, container)
    186 # can't be in container either.
    187 try:
--> 188 loc = cat.categories.get_loc(key)
    189 except KeyError:
    190 return False

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\indexes\interval.py in get_loc(self, key, method)
    768 key = self._maybe_cast_slice_bound(key, 'left', None)
    769
--> 770 start, stop = self._find_non_overlapping_monotonic_bounds(key)
    771
    772 if start is None or stop is None:

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\indexes\interval.py in _find_non_overlapping_monotonic_bounds(self, key)
    715 # scalar or index-like
    716
--> 717 start = self._searchsorted_monotonic(key, 'left')
    718 stop = self._searchsorted_monotonic(key, 'right')
    719 return start, stop

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\indexes\interval.py in _searchsorted_monotonic(self, label, side, exclude_label)
    679 label = _get_prev_label(label)
    680
--> 681 return sub_idx._searchsorted_monotonic(label, side)
    682
    683 def _get_loc_only_exact_matches(self, key):

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\indexes\base.py in _searchsorted_monotonic(self, label, side)
   4754 def _searchsorted_monotonic(self, label, side='left'):
   4755 if self.is_monotonic_increasing:
-> 4756 return self.searchsorted(label, side=side)
   4757 elif self.is_monotonic_decreasing:
   4758 # np.searchsorted expects ascending sort order, have to reverse

~\AppData\Local\conda\conda\envs\main\lib\site-packages\pandas\core\base.py in searchsorted(self, value, side, sorter)
   1499 def searchsorted(self, value, side='left', sorter=None):
   1500 # needs coercion on the key (DatetimeIndex does already)
-> 1501 return self._values.searchsorted(value, side=side, sorter=sorter)
   1502
   1503 def drop_duplicates(self, keep='first', inplace=False):

TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'
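Reading the traceback from the bottom up, the failure does not seem to come from the missing-value logic itself. `is_iterator` calls `hasattr(obj, '__next__')` on the boolean mask, which makes pandas ask whether the string `'__next__'` is a label in the categorical interval index, and that lookup ends in a float `searchsorted` with a string key. Under that reading, a plain membership test on the index should be enough to trigger the same path. The sketch below probes it; the helper name `string_membership` is hypothetical.

```python
import pandas as pd

idx = pd.CategoricalIndex(pd.IntervalIndex.from_breaks([0, 2.78, 3.14, 6.28]))


def string_membership(index, name="__next__"):
    """Return the result of `name in index`, or a description of the TypeError it raises."""
    try:
        return name in index
    except TypeError as exc:
        # Reached on versions where the lookup does not treat a string key as "not found".
        return "TypeError: {}".format(exc)


print(pd.__version__, string_membership(idx))
```

On a version where membership tests treat an incomparable key as absent, the probe should simply report `False`, which is what `hasattr` needs in order to conclude that the attribute does not exist.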


Top Results From Across the Web

What's new in 1.4.0 (January 22, 2022) - Pandas
Now the renaming checks if a.1 already exists when changing the name of the second column and jumps this index. The second column...

What's new in 0.24.0 (January 25, 2019) - Pandas
pandas has gained the ability to hold integer dtypes with missing values. This long requested feature is enabled through the use of extension...

What's new in 1.2.0 (December 26, 2020) - Pandas
When aggregating using concat() or the DataFrame constructor, pandas will now attempt to preserve index and column names whenever possible (GH35847).

What's new in 0.25.0 (July 18, 2019) - Pandas
Bug in DataFrame.dropna() when the DataFrame has a CategoricalIndex containing Interval objects incorrectly raised a TypeError (GH25087) ...

DataFrame.dropna - pandas 0.24.2 documentation
0, or 'index' : Drop rows which contain missing values. 1, or 'columns' : Drop columns which contain missing value. Deprecated since version...
