`_pickle.PicklingError: logger cannot be pickled` in multiprocessing `map`
See original GitHub issue

Describe the bug
When I map with multiple processes, this error occurs. The .name of the logger that fails to pickle in the final line is datasets.fingerprint.
File "~/project/dataset.py", line 204, in <dictcomp>
split: dataset.map(
File ".../site-packages/datasets/arrow_dataset.py", line 2489, in map
transformed_shards[index] = async_result.get()
File ".../site-packages/multiprocess/pool.py", line 771, in get
raise self._value
File ".../site-packages/multiprocess/pool.py", line 537, in _handle_tasks
put(task)
File ".../site-packages/multiprocess/connection.py", line 214, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File ".../site-packages/multiprocess/reduction.py", line 54, in dumps
cls(buf, protocol, *args, **kwds).dump(obj)
File ".../site-packages/dill/_dill.py", line 620, in dump
StockPickler.dump(self, obj)
File ".../pickle.py", line 487, in dump
self.save(obj)
File ".../pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File ".../pickle.py", line 902, in save_tuple
save(element)
File ".../pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File ".../site-packages/dill/_dill.py", line 1963, in save_function
_save_with_postproc(pickler, (_create_function, (
File ".../site-packages/dill/_dill.py", line 1140, in _save_with_postproc
pickler.save_reduce(*reduction, obj=obj)
File ".../pickle.py", line 717, in save_reduce
save(state)
File ".../pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File ".../pickle.py", line 887, in save_tuple
save(element)
File ".../pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File ".../site-packages/dill/_dill.py", line 1251, in save_module_dict
StockPickler.save_dict(pickler, obj)
File ".../pickle.py", line 972, in save_dict
self._batch_setitems(obj.items())
File ".../pickle.py", line 998, in _batch_setitems
save(v)
File ".../pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File ".../site-packages/dill/_dill.py", line 1963, in save_function
_save_with_postproc(pickler, (_create_function, (
File ".../site-packages/dill/_dill.py", line 1140, in _save_with_postproc
pickler.save_reduce(*reduction, obj=obj)
File ".../pickle.py", line 717, in save_reduce
save(state)
File ".../pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File ".../pickle.py", line 887, in save_tuple
save(element)
File ".../pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File ".../site-packages/dill/_dill.py", line 1251, in save_module_dict
StockPickler.save_dict(pickler, obj)
File ".../pickle.py", line 972, in save_dict
self._batch_setitems(obj.items())
File ".../pickle.py", line 998, in _batch_setitems
save(v)
File ".../pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File ".../site-packages/dill/_dill.py", line 1963, in save_function
_save_with_postproc(pickler, (_create_function, (
File ".../site-packages/dill/_dill.py", line 1154, in _save_with_postproc
pickler._batch_setitems(iter(source.items()))
File ".../pickle.py", line 998, in _batch_setitems
save(v)
File ".../pickle.py", line 578, in save
rv = reduce(self.proto)
File ".../logging/__init__.py", line 1774, in __reduce__
raise pickle.PicklingError('logger cannot be pickled')
_pickle.PicklingError: logger cannot be pickled
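For reference, the final frames point at logging.Logger.__reduce__, which only allows pickling a logger that logging.getLogger(name) can recover; any Logger instance that fails that identity check raises exactly this error. A minimal stdlib sketch (not from the issue; the logger names are illustrative):

```python
import logging
import pickle

# A logger obtained via getLogger() pickles fine: __reduce__ returns
# (getLogger, (name,)), so the receiving process re-fetches it by name.
ok = logging.getLogger("datasets.fingerprint")
restored = pickle.loads(pickle.dumps(ok))
print(restored is ok)  # True: the same registered logger comes back

# A Logger that getLogger(name) does NOT return -- here one constructed
# directly, bypassing the manager registry -- fails the identity check.
orphan = logging.Logger("orphan")
try:
    pickle.dumps(orphan)
except pickle.PicklingError as exc:
    print(exc)  # logger cannot be pickled
```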
Steps to reproduce the bug
Sorry I failed to have a minimal reproducible example, but the offending line on my end is
dataset.map(
    lambda examples: self.tokenize(examples),  # this doesn't matter; lambda e: [1] * len(...) also breaks. In fact I'm pretty sure it breaks before executing this lambda
    batched=True,
    num_proc=4,
)
This does work when num_proc=1, so it's likely a multiprocessing thing.
Expected results
map succeeds.
Actual results
The error trace above.
Environment info
- datasets version: 1.16.1 and 2.5.1 both fail
- Platform: Ubuntu 20.04.4 LTS
- Python version: 3.10.4
- PyArrow version: 9.0.0
Issue Analytics
- Created: a year ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
I did some binary search and it seems like the root cause is either multiprocess or dill. Python 3.10 is fine. Specifically:
- multiprocess==0.70.12.2, dill==0.3.4: works
- multiprocess==0.70.12.2, dill==0.3.5.1: doesn't work
- multiprocess==0.70.13, dill==0.3.5.1: doesn't work
- multiprocess==0.70.13, dill==0.3.4: can't test, since multiprocess==0.70.13 requires dill>=0.3.5.1
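A sketch of the corresponding pin (assuming a pip-style requirements file; the versions are the last pair reported to work above):

```text
# requirements.txt -- last combination reported to work
multiprocess==0.70.12.2
dill==0.3.4
```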
I will pin their versions on my end. I don't have enough knowledge of how Python multiprocessing works to debug this, but ideally there could be a fix. It's also possible that I'm doing something wrong in my code, but again, the .name of the logger that failed to pickle is datasets.fingerprint, which I'm not using directly.

Ok I see, not sure why it triggers this error though. In logging.py the code is
https://github.com/python/cpython/blob/c9da063e32725a66495e4047b8a5ed13e72d9e8e/Lib/logging/__init__.py#L1769-L1775
and on my side it works on 3.10 with dill 0.3.5.1 and multiprocess 0.70.13.
Could you try to run this code?
Are you in an environment where the loggers are instantiated differently? Can you check the source code of logging.Logger.__reduce__ in ".../logging/__init__.py", line 1774?
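One way to answer that question (a stdlib sketch, not from the thread) is to print the __reduce__ source of the logging module actually loaded in the failing environment:

```python
import inspect
import logging

# Print Logger.__reduce__ as shipped in this environment: it permits
# pickling only when getLogger(self.name) returns this very instance.
print(inspect.getsource(logging.Logger.__reduce__))
```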