Training with ray DaskEngine to_parquet fails with TypeError: cannot pickle 'pickle5.PickleBuffer' object
Reproducible by training any model, e.g. examples/titanic/simple_model_training.py
If I uninstall ray or otherwise force Ludwig to use PandasEngine, training works.
With ray and dask installed, Ludwig uses DaskEngine and fails with error:
TypeError: cannot pickle 'pickle5.PickleBuffer' object
Stack trace:
File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/api.py", line 424, in train
preprocessed_data = self.preprocess(
File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/api.py", line 1268, in preprocess
preprocessed_data = preprocess_for_training(
File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/data/preprocessing.py", line 1433, in preprocess_for_training
processed = cache.put(*processed)
File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/data/cache/manager.py", line 46, in put
training_set = self.dataset_manager.save(
File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/data/dataset/ray.py", line 100, in save
self.backend.df_engine.to_parquet(dataset, cache_path)
File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/data/dataframe/dask.py", line 84, in to_parquet
df.to_parquet(
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/dask/dataframe/core.py", line 4127, in to_parquet
return to_parquet(self, path, *args, **kwargs)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/dask/dataframe/io/parquet/core.py", line 671, in to_parquet
out = out.compute(**compute_kwargs)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/dask/base.py", line 283, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/dask/base.py", line 565, in compute
results = schedule(dsk, keys, **kwargs)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/util/dask/scheduler.py", line 127, in ray_dask_get
result = ray_get_unpack(object_refs, progress_bar_actor=pb_actor)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/util/dask/scheduler.py", line 420, in ray_get_unpack
computed_result = get_result(object_refs)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/util/dask/scheduler.py", line 408, in get_result
return ray.get(object_refs)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/worker.py", line 1713, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::dask:to-parquet-7442830550491c434ff968e79a5f5f70 (pid=40584, ip=127.0.0.1)
At least one of the input arguments for this task could not be computed:
ray.exceptions.RayTaskError: ray::dask:('to-parquet-7442830550491c434ff968e79a5f5f70', 0) (pid=40584, ip=127.0.0.1)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/serialization.py", line 361, in serialize
return self._serialize_to_msgpack(value)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/serialization.py", line 341, in _serialize_to_msgpack
self._serialize_to_pickle5(metadata, python_objects)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/serialization.py", line 301, in _serialize_to_pickle5
raise e
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/serialization.py", line 297, in _serialize_to_pickle5
inband = pickle.dumps(
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 620, in dump
return Pickler.dump(self, obj)
TypeError: cannot pickle 'pickle5.PickleBuffer' object
Issue Analytics
- Created 2 years ago
- Comments:7
Top GitHub Comments
Would you happen to know what engine dask is using to read parquet (fastparquet vs pyarrow)?
I ran into errors with pyarrow 5.0.0 and ludwig, but that was resolved after upgrading to pyarrow 6.0.1
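To answer the engine and version question above, one option is to print the installed versions of the packages involved. This is only a sketch: the package list is an assumption based on the names that appear in this thread and the stack trace, and the helper names `installed_version` and `report` are hypothetical. It reports what is installed, not which parquet engine dask actually selects at runtime.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("pyarrow", "fastparquet", "dask", "ray", "pickle5")):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Print one line per package so the output can be pasted into the issue.
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

If pyarrow shows up as 5.0.0, the comment above suggests trying 6.0.1 or later; whether that also resolves the pickle5 error reported here is not confirmed in this thread.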
Caused by: https://github.com/ray-project/ray/issues/22562
Fixed by: https://github.com/ludwig-ai/ludwig/pull/1763