BUG: unstack doesn't preserve categorical dtype
Code Sample, a copy-pastable example if possible
idx = pd.MultiIndex.from_product([['A'], [0, 1]])
df = pd.DataFrame({'cat': pd.Categorical(['a', 'b'])}, index=idx)
df
Out[19]:
    cat
A 0   a
  1   b
df.unstack()
Out[21]:
  cat
    0  1
A   a  b
Categorical dtype is lost:
df.unstack().dtypes
Out[22]:
cat  0    object
     1    object
dtype: object
But it works ok for a Series:
df['cat'].unstack()
Out[24]:
   0  1
A  a  b
df['cat'].unstack().dtypes
Out[25]:
0    category
1    category
dtype: object
Expected Output
I believe df.unstack() should preserve categorical dtypes, as df['cat'].unstack() does.
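The comparison above can be condensed into one script that checks both code paths side by side. This is only a sketch: the helper name `categorical_flags` is made up for illustration, and what it prints for the DataFrame path depends on the installed pandas version (on the version reported here, the DataFrame path loses the dtype).

```python
import pandas as pd

# Same data as in the report: a two-level index and one categorical column.
idx = pd.MultiIndex.from_product([["A"], [0, 1]])
df = pd.DataFrame({"cat": pd.Categorical(["a", "b"])}, index=idx)


def categorical_flags(dtypes):
    """Hypothetical helper: for each column dtype, report whether it is categorical."""
    return [isinstance(dt, pd.CategoricalDtype) for dt in dtypes]


# DataFrame path (the one reported as broken) and Series path (reported as working).
frame_flags = categorical_flags(df.unstack().dtypes)
series_flags = categorical_flags(df["cat"].unstack().dtypes)

print("DataFrame.unstack keeps category:", frame_flags)
print("Series.unstack keeps category:   ", series_flags)
```

If the two lists differ, the DataFrame path is still dropping the categorical dtype.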
output of pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: 5d791cc7d955c0b074ad602eb03fa32bd3e17503
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.20-1
machine: x86_64
processor: Intel(R)_Core(TM)_i5-2520M_CPU_@_2.50GHz
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.18.1+368.g5d791cc
nose: 1.3.7
pip: 8.1.2
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
...
This is related to #13743 (and PR #13854), though a bit more difficult to fix.
Issue Analytics
- State:
- Created 7 years ago
- Comments:10 (5 by maintainers)
Sorry, I don't have time for a call. But the output should be as noted in https://github.com/pandas-dev/pandas/issues/14018#issuecomment-650831541
Sure @Abhispy007. pandas/tests/frame/tests_reshape.py would be a good place for this test.
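For reference, a regression test for this report might look roughly like the sketch below. The function name is hypothetical, and the assertions only cover what the report itself states (dtype, categories, and values survive `df.unstack()`); the exact expected output the maintainers asked for is in the linked comment and is not reproduced here.

```python
import pandas as pd


def test_unstack_preserves_categorical_dtype():
    # Hypothetical test name; the data is taken from the report above.
    idx = pd.MultiIndex.from_product([["A"], [0, 1]])
    df = pd.DataFrame({"cat": pd.Categorical(["a", "b"])}, index=idx)

    result = df.unstack()

    # Every unstacked column should still be categorical with the original categories.
    for col in result.columns:
        assert isinstance(result[col].dtype, pd.CategoricalDtype)
        assert list(result[col].cat.categories) == ["a", "b"]

    # The values should land in the columns shown in the report's output.
    assert result.loc["A", ("cat", 0)] == "a"
    assert result.loc["A", ("cat", 1)] == "b"
```

The test is expected to fail on a pandas build that still has the bug and to pass once `DataFrame.unstack` keeps the dtype.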