BUG: DataFrame.where with category dtype
Code Sample (it is copy-pastable)
import pandas as pd, numpy as np
df = pd.DataFrame(np.arange(2*3).reshape(2,3), columns=list('abc'))
mask = np.random.rand(*df.shape) < 0.5
df.where(mask)
# Output is correct:
#      a   b    c
# 0  NaN NaN  2.0
# 1  3.0 NaN  NaN
df.a = df.a.astype('category')
df.b = df.b.astype('category')
df.c = df.c.astype('category')
df.where(mask)
# ValueError: Wrong number of items passed 2, placement implies 1
# Expected output: the same as before, but now with dtype `category`.
df.a.where(mask[:,0])
# 0 NaN
# 1 3.0
# Name: a, dtype: float64
# should stay in dtype category
df.a.where(mask[:,0], other=None)
# 0 None
# 1 3
# Name: a, dtype: object
# Expected output: should stay in dtype category
Problem description
df.where should work with all dtypes; the documentation doesn't say it works only for some of them. Also, NaNs are already handled correctly as missing data in a pd.Series of dtype 'category', so one should be able to assign NaNs to them. The same goes for converting the dtype.
While writing this report I found that doing it column-by-column works correctly, so I’ll use that as a workaround.
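For anyone hitting the same error, here is a minimal sketch of that column-by-column workaround. The helper name `where_by_column` is made up for illustration and is not part of pandas, and the mask is fixed instead of random so the result is reproducible. Since `Series.where` returned float64 in the example above, the sketch casts each masked column back to its original categorical dtype; on versions where the dtype is already preserved, that cast should do nothing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 3).reshape(2, 3), columns=list("abc"))
df = df.astype("category")

# Fixed mask (instead of np.random.rand) so the example is reproducible.
mask = np.array([[False, False, True],
                 [True, False, False]])


def where_by_column(frame, mask):
    """Hypothetical helper: apply Series.where one column at a time."""
    columns = {}
    for i, name in enumerate(frame.columns):
        original = frame[name]
        masked = original.where(mask[:, i])
        # Some versions upcast the result (float64 in the report above),
        # so restore the categorical dtype when the input column had one.
        if isinstance(original.dtype, pd.api.types.CategoricalDtype):
            masked = masked.astype(original.dtype)
        columns[name] = masked
    return pd.DataFrame(columns, index=frame.index)


result = where_by_column(df, mask)
print(result)
print(result.dtypes)
```

This only avoids the ValueError from the DataFrame-level call; it does not fix `df.where` itself.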
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-81-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Ubuntu lsb_release -a:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
Issue Analytics
- Created 6 years ago
- Comments: 7 (7 by maintainers)
Top GitHub Comments
@ganevgv: I would try opening a PR with this test, but add print statements to confirm whether the dtype is actually changing. It might actually be a platform thing where the dtype is already int32.
Can you make a separate issue about the astype (and remove it from the top of this one)?