BUG: unstack doesn't preserve categorical dtype

Code Sample, a copy-pastable example if possible

import pandas as pd
idx = pd.MultiIndex.from_product([['A'], [0, 1]])
df = pd.DataFrame({'cat': pd.Categorical(['a', 'b'])}, index=idx)
df
Out[19]: 
    cat
A 0   a
  1   b

df.unstack()
Out[21]: 
  cat   
    0  1
A   a  b

Categorical dtype is lost:

df.unstack().dtypes
Out[22]: 
cat  0    object
     1    object
dtype: object

But it works ok for a Series:

df['cat'].unstack()
Out[24]: 
   0  1
A  a  b
df['cat'].unstack().dtypes
Out[25]: 
0    category
1    category
dtype: object
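The report shows that Series.unstack() keeps the categorical dtype. On affected versions, one possible workaround is to unstack column by column through that Series path and reassemble the pieces. This is a minimal sketch that assumes every column is unstacked on the default (innermost) index level. The variable name wide is illustrative only.

```python
import pandas as pd

# Frame from the report: one categorical column on a two-level MultiIndex.
idx = pd.MultiIndex.from_product([['A'], [0, 1]])
df = pd.DataFrame({'cat': pd.Categorical(['a', 'b'])}, index=idx)

# Unstack each column as a Series, which is the path the report says keeps the dtype.
# Passing a dict to pd.concat makes its keys the outer column level, which mirrors
# the (column, level_value) column layout that df.unstack() produces.
wide = pd.concat({name: df[name].unstack() for name in df.columns}, axis=1)

print(wide.dtypes)
```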

Expected Output

I believe df.unstack() should preserve categorical dtypes as df['cat'].unstack() does.
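On versions affected by this bug, a simple workaround is to cast the unstacked columns back to their original dtypes. This is a minimal sketch. It assumes the default unstack on the last index level, so each resulting column label is an (original_column, level_value) tuple. On versions where the dtype already survives the unstack, the cast does nothing.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['A'], [0, 1]])
df = pd.DataFrame({'cat': pd.Categorical(['a', 'b'])}, index=idx)

wide = df.unstack()

# Each unstacked column is labelled (original_column, level_value).
# Look up the original column's dtype and cast the new column back to it.
for col in wide.columns:
    wide[col] = wide[col].astype(df.dtypes[col[0]])

print(wide.dtypes)
```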

output of `pd.show_versions()`

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 5d791cc7d955c0b074ad602eb03fa32bd3e17503
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.20-1
machine: x86_64
processor: Intel(R)_Core(TM)_i5-2520M_CPU_@_2.50GHz
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.18.1+368.g5d791cc
nose: 1.3.7
pip: 8.1.2
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
...

This is related to #13743 (and PR #13854), though a bit more difficult to fix.

Issue Analytics

State:
Created 7 years ago
Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction

mroeschke commented, Aug 5, 2020

Sorry don’t have time for a call. But the output should be as noted in https://github.com/pandas-dev/pandas/issues/14018#issuecomment-650831541

1 reaction

mroeschke commented, Aug 4, 2020

Sure @Abhispy007. pandas/tests/frame/tests_reshape.py would be a good place for this test.
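Here is a minimal sketch of what such a regression test might look like, written in pytest style. The test name is hypothetical. The expected frame encodes the behavior requested in the report (df.unstack() keeps the categorical dtype, as df['cat'].unstack() does). It is not copied from the linked comment. The comparison uses the public pd.testing.assert_frame_equal helper, which checks categorical dtypes by default.

```python
import pandas as pd


def test_unstack_preserves_categorical_dtype():
    # Input frame from the original report.
    idx = pd.MultiIndex.from_product([['A'], [0, 1]])
    df = pd.DataFrame({'cat': pd.Categorical(['a', 'b'])}, index=idx)

    result = df.unstack()

    # Tuple keys become a two-level column MultiIndex: ('cat', 0), ('cat', 1).
    # Each column keeps the full original category set ['a', 'b'].
    expected = pd.DataFrame(
        {
            ('cat', 0): pd.Categorical(['a'], categories=['a', 'b']),
            ('cat', 1): pd.Categorical(['b'], categories=['a', 'b']),
        },
        index=['A'],
    )
    pd.testing.assert_frame_equal(result, expected)
```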
