
BUG: unstack doesn't preserve categorical dtype


Code Sample, a copy-pastable example if possible

import pandas as pd

idx = pd.MultiIndex.from_product([['A'], [0, 1]])
df = pd.DataFrame({'cat': pd.Categorical(['a', 'b'])}, index=idx)
df
Out[19]: 
    cat
A 0   a
  1   b

df.unstack()
Out[21]: 
  cat   
    0  1
A   a  b

Categorical dtype is lost:

df.unstack().dtypes
Out[22]: 
cat  0    object
     1    object
dtype: object

But it works ok for a Series:

df['cat'].unstack()
Out[24]: 
   0  1
A  a  b
df['cat'].unstack().dtypes
Out[25]: 
0    category
1    category
dtype: object
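Because Series.unstack() keeps the categorical dtype, one possible workaround is to unstack each column separately and reassemble the frame with pd.concat. This is a sketch based on the behaviour shown above, not an official pandas recommendation. It assumes every column shares the same MultiIndex, and the variable name result is illustrative.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['A'], [0, 1]])
df = pd.DataFrame({'cat': pd.Categorical(['a', 'b'])}, index=idx)

# Series.unstack() keeps the categorical dtype, so unstack each column on
# its own and glue the pieces back together. The dict keys become the outer
# level of the resulting column MultiIndex, matching the df.unstack() layout.
result = pd.concat({col: df[col].unstack() for col in df.columns}, axis=1)
print(result.dtypes)
```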

Expected Output

I believe df.unstack() should preserve categorical dtypes, as df['cat'].unstack() does.

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 5d791cc7d955c0b074ad602eb03fa32bd3e17503
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.20-1
machine: x86_64
processor: Intel(R)_Core(TM)_i5-2520M_CPU_@_2.50GHz
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.18.1+368.g5d791cc
nose: 1.3.7
pip: 8.1.2
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
...

This is related to #13743 (and PR #13854), though a bit more difficult to fix.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
mroeschke commented, Aug 5, 2020

Sorry, I don't have time for a call. But the output should be as noted in https://github.com/pandas-dev/pandas/issues/14018#issuecomment-650831541

1 reaction
mroeschke commented, Aug 4, 2020

Sure @Abhispy007. pandas/tests/frame/tests_reshape.py would be a good place for this test.
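A regression test along the lines suggested might look like the sketch below. It assumes pytest-style discovery, the test name is hypothetical, and the expected frame is written from the behaviour requested in the report (each unstacked column keeping categories ['a', 'b']) rather than copied from the linked comment.

```python
import pandas as pd


def test_unstack_preserves_categorical_dtype():
    # Same frame as in the report: one categorical column on a 2-level index.
    idx = pd.MultiIndex.from_product([["A"], [0, 1]])
    df = pd.DataFrame({"cat": pd.Categorical(["a", "b"])}, index=idx)

    result = df.unstack()

    # Each unstacked column should keep the original categories instead of
    # falling back to object dtype.
    expected = pd.DataFrame(
        {
            ("cat", 0): pd.Categorical(["a"], categories=["a", "b"]),
            ("cat", 1): pd.Categorical(["b"], categories=["a", "b"]),
        },
        index=["A"],
    )
    pd.testing.assert_frame_equal(result, expected)
```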


