Behavior of Series.values when dtype is "category" is surprising
Say I make a category-type Series:
s = pd.Series(["a", "b", "c"], dtype="category")
I want to pass this to a function that expects a numpy array, so I use the values
property. According to the documentation:
s.values?
Type: property
String form: <property object at 0x106d84050>
Docstring:
Return Series as ndarray
Returns
-------
arr : numpy.ndarray
However:
s.values
[a, b, c]
Categories (3, object): [a < b < c]
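For anyone hitting the same thing, here is a minimal sketch of ways to get an actual numpy array out of a category-type Series. It assumes a pandas version that has Series.to_numpy() (0.24 or later). The variable names are only illustrative, and this is not presented as the fix the maintainers settled on.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "c"], dtype="category")

# .values hands back the pandas Categorical, not a numpy array.
print(type(s.values).__name__)

# np.asarray converts the categorical into an ndarray of the labels.
labels = np.asarray(s)

# Series.to_numpy() performs the same conversion explicitly.
labels_alt = s.to_numpy()

# If the downstream function wants integers, use the category codes instead.
codes = s.cat.codes.to_numpy()
```

Which form to pass along depends on whether the receiving function needs the labels themselves or just an integer encoding of them.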
Issue Analytics
- State:
- Created 9 years ago
- Comments:10 (10 by maintainers)
The docstring of .values has been updated in the meantime (https://github.com/pandas-dev/pandas/pull/10477/files#diff-150fd3c5a732ae915ec47bc54a933c41R328), so closing this. The different output type is of course still confusing, but that is not something we are going to solve before pandas 2.

I'll clarify: I meant unadvertised. It's in the public API.