Training with ray DaskEngine to_parquet fails with TypeError: cannot pickle 'pickle5.PickleBuffer' object

Reproducible by training any model, e.g. examples/titanic/simple_model_training.py.

If I uninstall ray or otherwise force Ludwig to use PandasEngine, training works.

With ray and dask installed, Ludwig uses DaskEngine and fails with the error: TypeError: cannot pickle 'pickle5.PickleBuffer' object

Stack trace:

  File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/api.py", line 424, in train
    preprocessed_data = self.preprocess(
  File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/api.py", line 1268, in preprocess
    preprocessed_data = preprocess_for_training(
  File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/data/preprocessing.py", line 1433, in preprocess_for_training
    processed = cache.put(*processed)
  File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/data/cache/manager.py", line 46, in put
    training_set = self.dataset_manager.save(
  File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/data/dataset/ray.py", line 100, in save
    self.backend.df_engine.to_parquet(dataset, cache_path)
  File "/Users/daniel/Desktop/github/dantreiman-ludwig/ludwig/data/dataframe/dask.py", line 84, in to_parquet
    df.to_parquet(
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/dask/dataframe/core.py", line 4127, in to_parquet
    return to_parquet(self, path, *args, **kwargs)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/dask/dataframe/io/parquet/core.py", line 671, in to_parquet
    out = out.compute(**compute_kwargs)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/dask/base.py", line 283, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/dask/base.py", line 565, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/util/dask/scheduler.py", line 127, in ray_dask_get
    result = ray_get_unpack(object_refs, progress_bar_actor=pb_actor)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/util/dask/scheduler.py", line 420, in ray_get_unpack
    computed_result = get_result(object_refs)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/util/dask/scheduler.py", line 408, in get_result
    return ray.get(object_refs)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/worker.py", line 1713, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::dask:to-parquet-7442830550491c434ff968e79a5f5f70 (pid=40584, ip=127.0.0.1)
  At least one of the input arguments for this task could not be computed:
ray.exceptions.RayTaskError: ray::dask:('to-parquet-7442830550491c434ff968e79a5f5f70', 0) (pid=40584, ip=127.0.0.1)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/serialization.py", line 361, in serialize
    return self._serialize_to_msgpack(value)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/serialization.py", line 341, in _serialize_to_msgpack
    self._serialize_to_pickle5(metadata, python_objects)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/serialization.py", line 301, in _serialize_to_pickle5
    raise e
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/serialization.py", line 297, in _serialize_to_pickle5
    inband = pickle.dumps(
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/Users/daniel/mambaforge/envs/ludwig39-dev/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 620, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'pickle5.PickleBuffer' object
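The last frame names `pickle5.PickleBuffer`, the buffer type from the `pickle5` backport package, even though Python 3.9 already ships `pickle.PickleBuffer` in the standard library. One plausible reading, which is an assumption and not something confirmed in this thread, is that a backport buffer object reached Ray's cloudpickle-based serializer, which on Python 3.9 presumably builds on the standard-library pickler; that pickler has a dedicated handler only for its own `pickle.PickleBuffer`. The sketch below (Python 3.8+, standard library only, no Ray or dask) shows the two relevant paths: the stdlib buffer round-trips under protocol 5 but is refused under protocol 4, and an object forced onto the generic `object.__reduce_ex__` fallback raises the same `TypeError: cannot pickle '...' object` shape seen in the trace. The helper names are hypothetical.

```python
import pickle
from typing import Optional


def pickle_buffer_roundtrip(payload: bytes, protocol: int) -> Optional[bytes]:
    """Pickle a stdlib pickle.PickleBuffer in-band and load it back.

    Returns the recovered bytes, or None if the pickler refuses the buffer.
    """
    buf = pickle.PickleBuffer(payload)
    try:
        data = pickle.dumps(buf, protocol=protocol)
    except (pickle.PicklingError, TypeError):
        # The stdlib pickler handles PickleBuffer only with protocol 5 or higher.
        return None
    # With no buffer_callback the data is written in-band; a read-only
    # buffer (such as one wrapping bytes) comes back as a plain bytes object.
    return bytes(pickle.loads(data))


def generic_reduce_error(obj: object) -> str:
    """Return the message raised by the generic object.__reduce_ex__ fallback.

    A pickler with no dedicated handler for a type ends up on this path
    (assumed here to be what happens to the pickle5 backport's PickleBuffer
    on Python 3.9). Extension types that carry extra C-level state and
    define no pickling support of their own are rejected with a TypeError.
    """
    try:
        object.__reduce_ex__(obj, 5)
    except TypeError as exc:
        return str(exc)
    return ""


print(pickle_buffer_roundtrip(b"abc", protocol=5))  # b'abc'
print(pickle_buffer_roundtrip(b"abc", protocol=4))  # None
print(generic_reduce_error(pickle.PickleBuffer(b"abc")))
```

Calling `object.__reduce_ex__` directly simply forces the generic path for the stdlib type, so the error message can be observed without installing `pickle5`.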

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
hungcs commented, Jan 25, 2022

Would you happen to know what engine dask is using to read parquet (fastparquet vs pyarrow)?

I ran into errors with pyarrow 5.0.0 and Ludwig, but that was resolved after upgrading to pyarrow 6.0.1.
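The comment asks which parquet engine dask is using and reports that upgrading pyarrow from 5.0.0 to 6.0.1 fixed a similar error. A quick way to answer that is to print the installed versions of the packages on the failing code path. This is a minimal sketch assuming Python 3.8+ (for `importlib.metadata`); the package list, the helper names `installed_versions`, `release_tuple` and `older_than`, and the naive version comparison are illustrative choices, and 6.0.1 is only the version reported in the comment, not a documented minimum.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Packages on the failing code path: "pickle5" is the backport named in the
# error message; "pyarrow" and "fastparquet" are dask's two parquet engines.
PACKAGES = ("ludwig", "ray", "dask", "pyarrow", "fastparquet", "pickle5")


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release components, e.g. '6.0.1' -> (6, 0, 1).

    Deliberately simplified: pre/post-release tags are ignored. For full
    PEP 440 ordering, packaging.version.Version is the usual tool.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def older_than(installed: str, minimum: str) -> bool:
    """True if the installed release components sort before the minimum's."""
    return release_tuple(installed) < release_tuple(minimum)


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12} {version or 'not installed'}")
    pyarrow_version = installed_versions(["pyarrow"])["pyarrow"]
    if pyarrow_version and older_than(pyarrow_version, "6.0.1"):
        print("pyarrow is older than 6.0.1, the version reported to work above")
```

Pasting this output into the issue answers the engine question directly, since it shows whether pyarrow, fastparquet, or both are available to dask.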
