Serialization issues with dataclasses and IntEnum
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Mint 19
- Ray installed from (source or binary): binary
- Ray version: 0.6.2
- Python version: 3.6.5
Problems
We encountered multiple serialization issues. The first involved a complex object that referred to dataclasses (classes defined with the @dataclass decorator). We could pickle and unpickle this object manually, but a "mappingproxy" error appeared when we passed it through Ray. As a workaround we had to "flatten" the object into a dictionary of tuples, which took considerable work and makes our repo branches diverge significantly.
Even a minimal definition such as:
@dataclass
class Foo: pass
followed by a call to Foo() in the _setup of the trainable class failed.
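One plausible source of the mappingproxy error, offered as an assumption rather than a confirmed diagnosis of what Ray does internally: every dataclass field descriptor stores its metadata in a read-only mappingproxy, and the standard pickler cannot serialize that type. A serializer that pickles the class definition itself (rather than a reference to it) would therefore reach these objects. The minimal sketch below uses only the standard library and does not involve Ray; the field x is added purely for illustration.

```python
import pickle
import types
from dataclasses import dataclass, fields


@dataclass
class Foo:
    x: int = 0  # illustrative field, not part of the original report


# Each dataclass field descriptor carries a read-only `metadata` mapping.
metadata = fields(Foo)[0].metadata
is_proxy = isinstance(metadata, types.MappingProxyType)

# The standard pickler has no way to reduce a mappingproxy object.
try:
    pickle.dumps(metadata)
    error = None
except TypeError as exc:
    error = exc

print(is_proxy, type(error).__name__)
```

Pickling a Foo instance by hand does not hit this path, because pickle stores only a reference to the class plus the instance state, which is consistent with manual pickling working while Ray fails.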
The second issue is related to the IntEnum type from the enum package. Any IntEnum instance passed to the trainable class caused a failure. We tried writing a __reduce__ method, but it didn't help. We could pickle our IntEnum instances manually but could not use them with Ray. We have temporarily replaced them with dictionaries, but again this is not sustainable in our code base.
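A possible stopgap that is lighter than replacing the enums with dictionaries is to send only the plain integer value across the process boundary and rebuild the member on the receiving side. The sketch below has not been verified against Ray 0.6.2; Color, to_wire, and from_wire are hypothetical names used only for illustration.

```python
from enum import IntEnum


class Color(IntEnum):  # hypothetical enum standing in for the real one
    RED = 1
    GREEN = 2


def to_wire(member: IntEnum) -> int:
    """Strip the enum type so only a built-in int is serialized."""
    return int(member)


def from_wire(value: int) -> Color:
    """Rebuild the enum member from its integer value on the worker side."""
    return Color(value)


payload = to_wire(Color.GREEN)
restored = from_wire(payload)
print(type(payload).__name__, restored.name)
```

Because the lookup by value returns the existing member, identity checks such as restored is Color.GREEN keep working after the round trip.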
Source code / logs
Issue 1: Mapping Proxy Error
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/pickle.py", line 496, in save
    rv = reduce(self.proto)
TypeError: can't pickle mappingproxy objects
Issue 2: IntEnum (note: ray.get(ray.put()) did not work)
Traceback (most recent call last):
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 378, in _process_events
    result = self.trial_executor.fetch_result(trial)
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 228, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/worker.py", line 2211, in get
    raise value
ray.worker.RayTaskError: ray_MainArgs:train() (pid=21123, host=jessica-Z370P-D3)
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ray/utils.py", line 437, in _wrapper
    return orig_attr(*args, **kwargs)
  File "pyarrow/_plasma.pyx", line 531, in pyarrow._plasma.PlasmaClient.get
  File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
  File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
  File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
  File "pyarrow/serialization.pxi", line 171, in pyarrow.lib.SerializationContext._deserialize_callback
  File "/home/jessica/anaconda3/envs/my-rdkit-env/lib/python3.6/enum.py", line 135, in __new__
    enum_members = {k: classdict[k] for k in classdict._member_names}
AttributeError: 'dict' object has no attribute '_member_names'
Thank you!
Top GitHub Comments
I created a PR to fix the dataclass pickling issue upstream: https://github.com/cloudpipe/cloudpickle/pull/245
This is fixed now.