Serialize/deserialize a Categorical whose values are taken from an enum

See original GitHub issue

Code Sample, a copy-pastable example if possible

It should run standalone.

import pandas as pd
from enum import Enum, auto

class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()

csv_filename = "test.csv"

# Ordered categorical whose categories are the enum members themselves
dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)

df = pd.DataFrame({"tcpdest": [ConnectionRoles.Server]}, dtype=dtype_role)
print(df.info())
print(df)
df.to_csv(csv_filename)

# Read the file back, requesting the same dtype for the column
loaded = pd.read_csv(csv_filename, dtype={"tcpdest": dtype_role})
print(loaded.info())
print(loaded)

which outputs

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
tcpdest    1 non-null category
dtypes: category(1)
memory usage: 177.0 bytes
None
                  tcpdest
0  ConnectionRoles.Server
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 2 columns):
Unnamed: 0    1 non-null int64
tcpdest       0 non-null category
dtypes: category(1), int64(1)
memory usage: 185.0 bytes
None
   Unnamed: 0 tcpdest
0           0     NaN

The value ConnectionRoles.Server became NaN during the serialization/deserialization round trip.

Problem description

I want to be able to serialize (to_csv) and then read back (read_csv) a column with a CategoricalDtype whose categories are the members of a Python Enum (or IntEnum).
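One possible workaround, sketched here under stated assumptions and not a fix in pandas itself: write the member names to the CSV explicitly, then use the `converters` parameter of `read_csv` to map each name back to its enum member before restoring the categorical dtype. The in-memory `buffer` stands in for the CSV file, and the helper lambdas are illustrative only.

```python
import io
from enum import Enum, auto

import pandas as pd


class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()


dtype_role = pd.api.types.CategoricalDtype(
    categories=list(ConnectionRoles), ordered=True
)

df = pd.DataFrame(
    {
        "tcpdest": pd.Series(
            [ConnectionRoles.Server, ConnectionRoles.Client], dtype=dtype_role
        )
    }
)

# Write the member *names* ("Server") instead of relying on str(member).
out = df.assign(tcpdest=df["tcpdest"].astype(object).map(lambda role: role.name))
buffer = io.StringIO()  # stands in for the CSV file
out.to_csv(buffer, index=False)
buffer.seek(0)

# Map each name back to its enum member while parsing, then restore the dtype.
loaded = pd.read_csv(
    buffer, converters={"tcpdest": lambda name: ConnectionRoles[name]}
)
loaded["tcpdest"] = loaded["tcpdest"].astype(dtype_role)

print(loaded["tcpdest"].tolist())
```

Note that the converter raises KeyError on an empty cell, so a column with missing values would need a converter that handles that case.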

The dtype and enum I actually use in my project (unlike the toy example) are:

dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)


class ConnectionRoles(Enum):
    """
    Used to filter datasets and keep packets flowing in only one direction !
    Parser should accept --destination Client --destination Server if you want both.
    """
    Client = auto()
    Server = auto()

    def __str__(self):
        # Defining __str__ is required for ArgumentParser's help output to show
        # the human-readable member names.
        return self.name

    @staticmethod
    def from_string(s):
        try:
            return ConnectionRoles[s]
        except KeyError:
            raise ValueError()

    def __next__(self):
        # auto() numbers members from 1, so compare members rather than testing value == 0
        if self is ConnectionRoles.Client:
            return ConnectionRoles.Server
        else:
            return ConnectionRoles.Client

I’ve searched the tracker, and the most relevant issues (though still different) might be:

Expected Output

Output of pd.show_versions()

I am using v0.23.4 with a patch from master to fix some bug.

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.0
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0+unknown
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.16.0
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: 4.2.6
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
astrojuanlu commented, Feb 10, 2020

This also affects to_parquet, which doesn’t have a converters parameter.
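For writers without a `converters` hook, one format-agnostic approach is to swap the enum categories for their names before writing and swap them back after reading. The sketch below shows only that conversion, using `Series.cat.rename_categories`. It does not call `to_parquet`, and whether a given writer preserves the categorical dtype is an assumption to check separately.

```python
from enum import Enum, auto

import pandas as pd


class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()


dtype_role = pd.api.types.CategoricalDtype(
    categories=list(ConnectionRoles), ordered=True
)

roles = pd.Series(
    [ConnectionRoles.Server, ConnectionRoles.Client, ConnectionRoles.Server],
    dtype=dtype_role,
)

# Before writing: replace enum members with their names.
# The codes and the category order are left untouched.
as_names = roles.cat.rename_categories(lambda role: role.name)

# After reading: replace the names with the enum members again.
restored = as_names.cat.rename_categories(lambda name: ConnectionRoles[name])

print(list(as_names.cat.categories))
print(restored.tolist())
```

If the reader returns plain strings instead of a categorical column, the names would first have to be cast to a categorical with the same category order before the final renaming step.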

0 reactions
buhtz commented, Aug 12, 2021

I am not sure if I fully understand this issue here. Is it the same problem described here? https://stackoverflow.com/q/68591255/4865723

Do you want to specify a column's dtype as an ordered Categorical while calling read_csv()?
