astype errors with Categorical
import pandas as pd
cat = pd.Categorical(['CA', 'AL'])
df = pd.DataFrame([['CA', 'CA'], ['AL', 'CA']], index=['foo', 'bar'], columns=range(2))
df[1].astype(cat)
With pandas 0.20.1, this raises:
if dtype == CategoricalDtype():
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This doesn't appear to be quite the intended usage. .astype("category", categories=cat) also fails, though .astype("category", categories=cat.categories) is OK.
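For reference, a minimal sketch of the same conversion using an explicit dtype object. It assumes pandas 0.21 or later, where CategoricalDtype accepts a list of categories (newer than the 0.20.1 used in this report). The name state_dtype is only illustrative.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat = pd.Categorical(['CA', 'AL'])
df = pd.DataFrame([['CA', 'CA'], ['AL', 'CA']], index=['foo', 'bar'], columns=range(2))

# Build a dtype from the pre-existing categories instead of passing the
# Categorical itself (assumption: pandas >= 0.21).
state_dtype = CategoricalDtype(categories=cat.categories)
converted = df[1].astype(state_dtype)

# The dtype already attached to the Categorical can be reused as well.
converted_again = df[1].astype(cat.dtype)
```

Later pandas releases dropped the categories keyword of astype, so the dtype-object form is likely the more portable of the two.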
I suspect this is related to similar errors in trying to identify which columns of a DataFrame are categorical (possible repeat of #16659):
df.dtypes[colname] == 'category'
evaluates as True for categorical columns and raises TypeError: data type "category" not understood for np.float64 columns.
df.dtypes == pd.Categorical
raises TypeError: Could not compare <type 'type'> type with Series
Also related: #15078
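One way to sidestep the comparisons above is to test the type of the dtype object rather than comparing it with a string. This is a sketch, assuming pandas 0.21 or later for pandas.api.types.CategoricalDtype. The helper is_categorical_column and the sample frame are hypothetical.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical frame with one categorical and one np.float64 column.
df = pd.DataFrame({
    "state": pd.Categorical(["CA", "AL"]),
    "value": np.array([1.0, 2.0]),
})

def is_categorical_column(frame, colname):
    """Return True if the column's dtype is a pandas categorical dtype."""
    # isinstance never triggers NumPy's dtype comparison, so it does not
    # raise for NumPy-backed columns.
    return isinstance(frame.dtypes[colname], CategoricalDtype)

categorical_columns = [c for c in df.columns if is_categorical_column(df, c)]
```

Here categorical_columns contains only "state", and the check on "value" returns False instead of raising.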
Issue Analytics
- Created 6 years ago
- Comments:5 (4 by maintainers)
Top GitHub Comments
A simple solution that always works for me:
str(df.dtypes[colname]) == 'category'
You can use select_dtypes. The problem with df.dtypes == 'category' is that it fails (as you have noted) for other dtypes (NumPy dtypes), which is beyond pandas to fix, since it follows from how NumPy specifies dtype comparison.
You mean a pre-existing set of categories? Then you can use
.astype("category", categories=cat.categories)
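For the column-identification question, the select_dtypes suggestion can be sketched as follows. The sample frame and variable names are hypothetical, and "category" is the selector string documented for categorical columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a categorical column with a float column.
df = pd.DataFrame({
    "state": pd.Categorical(["CA", "AL"]),
    "value": np.array([1.0, 2.0]),
})

# select_dtypes does the dtype matching internally, so no direct
# comparison between a NumPy dtype and the string 'category' is needed.
categorical_cols = df.select_dtypes(include=["category"]).columns.tolist()
other_cols = df.select_dtypes(exclude=["category"]).columns.tolist()
```

This returns column names rather than a boolean per column, which is usually what is wanted when filtering a frame.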