astype errors with Categorical

import pandas as pd
cat = pd.Categorical(['CA', 'AL'])
df = pd.DataFrame([['CA', 'CA'], ['AL', 'CA']], index=['foo', 'bar'], columns=range(2))
df[1].astype(cat)

With pandas 0.20.1, this raises:

    if dtype == CategoricalDtype():
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Passing a Categorical instance to astype may not be quite the intended usage. However, .astype("category", categories=cat) also fails, whereas .astype("category", categories=cat.categories) works.

I suspect this is related to similar errors in trying to identify which columns of a DataFrame are categorical (possible repeat of #16659):

df.dtypes[colname] == 'category' evaluates as True for categorical columns and raises TypeError: data type "category" not understood for np.float64 columns.

df.dtypes == pd.Categorical raises TypeError: Could not compare <type 'type'> type with Series

Also related: #15078

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

11 reactions
rafiqhasan commented, Jun 30, 2018

A simple solution that always works for me:

str(df.dtypes[colname]) == 'category'
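As a minimal sketch of why the string comparison is safe: converting the dtype to a string first avoids the NumPy dtype comparison that raises for non-categorical columns in the issue above. The frame, its column names, and the helper `is_category_column` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: one categorical column, one float64 column.
df = pd.DataFrame({
    "state": pd.Categorical(["CA", "AL"]),
    "value": np.array([1.0, 2.0]),
})


def is_category_column(frame, colname):
    """Return True if the named column has the categorical dtype.

    Comparing the dtype's string form sidesteps comparing a NumPy dtype
    directly against the string 'category'.
    """
    return str(frame.dtypes[colname]) == "category"


state_is_categorical = is_category_column(df, "state")
value_is_categorical = is_category_column(df, "value")
```

Here the check succeeds for `state` and fails cleanly for `value` instead of raising.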

0 reactions
jorisvandenbossche commented, Jun 15, 2017

Is there a canonical way to pick out categorical columns within a DataFrame?

You can use select_dtypes

The problem with df.dtypes == 'category' is that it fails (as you have noted) for other dtypes (NumPy data types). That is beyond what pandas can fix, because it follows from how NumPy dtype comparison is specified.
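A short sketch of the select_dtypes approach mentioned above, assuming a reasonably recent pandas. The example frame and its column names are made up for illustration. The second variant, an isinstance check against pd.CategoricalDtype, is an alternative not mentioned in the thread.

```python
import pandas as pd

# Hypothetical frame mixing categorical, float, and object columns.
df = pd.DataFrame({
    "state": pd.Categorical(["CA", "AL"]),
    "value": [1.0, 2.0],
    "name": ["foo", "bar"],
})

# select_dtypes filters columns by dtype without comparing dtypes by hand.
categorical_columns = df.select_dtypes(include="category").columns.tolist()

# Alternative: test each column's dtype object with isinstance.
categorical_by_type = [
    col for col, dtype in df.dtypes.items()
    if isinstance(dtype, pd.CategoricalDtype)
]
```

Both lists contain only the categorical column, and neither approach raises for the float or object columns.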

Is there a way to pin a pre-existing Category to a Series?

You mean a pre-existing set of categories? Then you can use .astype("category", categories=cat.categories).
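The categories keyword shown above applies to pandas 0.20.x. Later pandas releases no longer accept it in astype and expect a CategoricalDtype instead. The following is a sketch of that newer form, reusing cat and df from the original report. The names pinned_dtype and converted are chosen here for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat = pd.Categorical(["CA", "AL"])
df = pd.DataFrame(
    [["CA", "CA"], ["AL", "CA"]], index=["foo", "bar"], columns=range(2)
)

# Build a dtype from the pre-existing categories and cast with it.
pinned_dtype = CategoricalDtype(categories=cat.categories)
converted = df[1].astype(pinned_dtype)

# The Categorical already carries an equivalent dtype, which can be reused.
converted_via_cat_dtype = df[1].astype(cat.dtype)
```

Either way, the resulting Series keeps both categories from cat even though column 1 only contains 'CA'.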


Top Results From Across the Web

pandas astype categories not working - Stack Overflow
You now need to pass it in via CategoricalDtype, as the astype method no longer accepts them: from pandas.api.types import CategoricalDtype df...
pandas.DataFrame.astype - pandas 1.5.2 documentation
dtype or Python type to cast one or more of the DataFrame's columns to column-specific types.
Using pandas categories properly is tricky... here's why
Pandas category errors, categories turn to strings, category merges change datatypes, empty groups when groupby on cats.
Python | Pandas DataFrame.astype() - GeeksforGeeks
Syntax: DataFrame.astype(dtype, copy=True, errors='raise', **kwargs) ... Change the Name column to categorical type and Age column to int64 ...
Using The Pandas Category Data Type
Introduction to pandas categorical data type and how to use it. ... a loop to convert all the columns we care about using...
