Serialize/deserialize a Categorical whose values are taken from an enum
Code Sample, a copy-pastable example if possible
should run as standalone
# Your code here
import pandas as pd
from enum import Enum, IntEnum, auto
import argparse
class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()
csv_filename = "test.csv"
dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)
df = pd.DataFrame({ "tcpdest": [ConnectionRoles.Server] }, dtype=dtype_role)
print(df.info())
print(df)
df.to_csv(csv_filename)
loaded = pd.read_csv(csv_filename, dtype= {"tcpdest": dtype_role})
print(loaded.info())
print(loaded)
which outputs
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
tcpdest 1 non-null category
dtypes: category(1)
memory usage: 177.0 bytes
None
tcpdest
0 ConnectionRoles.Server
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 2 columns):
Unnamed: 0 1 non-null int64
tcpdest 0 non-null category
dtypes: category(1), int64(1)
memory usage: 185.0 bytes
None
Unnamed: 0 tcpdest
0 0 NaN
The value ConnectionRoles.Server became NaN through the serialization/deserialization process.
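A likely explanation, sketched below under the assumption that to_csv writes the text form of each member: the file only contains a string, and a string never compares equal to an Enum member, so it matches none of the categories. The in-memory buffer is used here only to avoid touching the disk.

```python
import io
from enum import Enum, auto

import pandas as pd


class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()


df = pd.DataFrame({"tcpdest": [ConnectionRoles.Server]})

# The CSV holds a textual form of the member, not the member itself.
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading it back without any dtype shows what the parser actually sees.
raw = pd.read_csv(io.StringIO(csv_text))
cell = raw.loc[0, "tcpdest"]

# A plain string is not equal to any Enum member, so it cannot match
# a category built from list(ConnectionRoles).
print(type(cell), cell == ConnectionRoles.Server)
```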
Problem description
I want to be able to serialize (to_csv) and then read back (read_csv) a CategoricalDtype column that takes its values from a Python Enum (or IntEnum).
Actually, the dtype and enum I use in my project (unlike the toy example) are:
dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)
class ConnectionRoles(Enum):
    """
    Used to filter datasets and keep packets flowing in only one direction !
    Parser should accept --destination Client --destination Server if you want both.
    """
    Client = auto()
    Server = auto()

    def __str__(self):
        # Note that defining __str__ is required to get ArgumentParser's help output to include
        # the human readable (values) of Color
        return self.name

    @staticmethod
    def from_string(s):
        try:
            return ConnectionRoles[s]
        except KeyError:
            raise ValueError()

    def __next__(self):
        if self.value == 0:
            return ConnectionRoles.Server
        else:
            return ConnectionRoles.Client
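A possible workaround, as a minimal sketch rather than a fix in pandas itself: write the member names to the CSV, use the from_string helper above as a read_csv converter, and only then cast to the categorical dtype. Storing the names explicitly and casting after reading are assumptions made for this example; the class is trimmed to the parts the round trip needs.

```python
import io
from enum import Enum, auto

import pandas as pd


class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()

    @staticmethod
    def from_string(s):
        try:
            return ConnectionRoles[s]
        except KeyError:
            raise ValueError(f"unknown role: {s!r}")


dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)
roles = pd.Series([ConnectionRoles.Server, ConnectionRoles.Client], dtype=dtype_role)
df = pd.DataFrame({"tcpdest": roles})

# Serialize the member names so the file content is unambiguous text.
out = df.assign(tcpdest=[role.name for role in df["tcpdest"]])
buffer = io.StringIO()
out.to_csv(buffer, index=False)
buffer.seek(0)

# The converter turns each cell back into an Enum member while parsing;
# the categorical dtype is applied afterwards.
loaded = pd.read_csv(buffer, converters={"tcpdest": ConnectionRoles.from_string})
loaded["tcpdest"] = loaded["tcpdest"].astype(dtype_role)
print(loaded)
```

Writing with index=False also avoids the extra "Unnamed: 0" column seen in the output above.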
I’ve searched the tracker, and the most relevant (though different) issues might be:
- https://github.com/pandas-dev/pandas/issues/20498
- my past issue https://github.com/pandas-dev/pandas/issues/22262
Expected Output
Output of pd.show_versions()
I am using v0.23.4 with a patch from master to fix some bug.
INSTALLED VERSIONS
commit: None python: 3.7.2.final.0 python-bits: 64 OS: Linux OS-release: 4.19.0 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_FR.UTF-8 LOCALE: fr_FR.UTF-8
pandas: 0+unknown pytest: None pip: 18.1 setuptools: 40.6.3 Cython: None numpy: 1.16.0 scipy: 1.2.0 pyarrow: None xarray: None IPython: None sphinx: None patsy: None dateutil: 2.7.5 pytz: 2018.7 blosc: None bottleneck: 1.2.1 tables: 3.4.4 numexpr: 2.6.9 feather: None matplotlib: 3.0.2 openpyxl: 2.5.12 xlrd: 1.1.0 xlwt: 1.3.0 xlsxwriter: None lxml.etree: 4.2.6 bs4: 4.6.3 html5lib: 1.0.1 sqlalchemy: 1.2.14 pymysql: None psycopg2: None jinja2: None s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None gcsfs: None
This also affects to_parquet, which doesn’t have a converters parameter.
I am not sure if I fully understand this issue here. Is it the same problem described here? https://stackoverflow.com/q/68591255/4865723
Do you want to specify a column's dtype as an ordered Categorical while doing read_csv()?
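For writers such as to_parquet, where no converters hook is available, one format-independent option is to convert the column before writing and after reading. The sketch below is only an illustration: encode_enum_column and decode_enum_column are hypothetical helper names, not pandas APIs, and the actual file write is left out because it depends on an optional engine being installed.

```python
from enum import Enum, auto

import pandas as pd


class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()


def encode_enum_column(frame, column):
    """Hypothetical helper: replace Enum members with their names (plain strings)."""
    names = [v.name if isinstance(v, Enum) else None for v in frame[column]]
    return frame.assign(**{column: names})


def decode_enum_column(frame, column, enum_cls, ordered=True):
    """Hypothetical helper: rebuild an Enum-backed Categorical from stored names."""
    dtype = pd.api.types.CategoricalDtype(categories=list(enum_cls), ordered=ordered)
    members = [enum_cls[v] if isinstance(v, str) else None for v in frame[column]]
    restored = pd.Series(members, index=frame.index, dtype=dtype)
    return frame.assign(**{column: restored})


df = pd.DataFrame({"tcpdest": [ConnectionRoles.Server, None, ConnectionRoles.Client]})

# The encoded frame holds only strings and missing values, so it can be
# passed to whichever writer is in use.
encoded = encode_enum_column(df, "tcpdest")

# After reading the file back, the same column is decoded into Enum members.
decoded = decode_enum_column(encoded, "tcpdest", ConnectionRoles)
print(decoded)
```

Missing values are kept as missing in both directions, so the round trip should not silently turn valid roles into NaN.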