Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError

I’m using a custom dataset and JSON to train the network. After a few iterations over the dataset, train.py crashes with the following error:

Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError

However, if I set --num_workers=0 or --num_workers=1, training works properly.

This is how the traceback looks:

Traceback (most recent call last):
  File "/home/amuresan/anaconda3/envs/pytorch1.3-gpu/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/amuresan/anaconda3/envs/pytorch1.3-gpu/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
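For readers unfamiliar with the wording of this message, here is a minimal sketch of how pickle can end up raising it. It is not a diagnosis of this issue: `ShadowMemoryError` and `try_pickle` are hypothetical names made up for illustration, and the sketch only shows that pickle looks a class up by module and name and refuses to serialize it when the object found there is a different one.

```python
import pickle

# Hypothetical stand-in: a class that claims to be builtins.MemoryError
# (same name, same module) but is a distinct object from the real one.
ShadowMemoryError = type("MemoryError", (Exception,), {"__module__": "builtins"})


def try_pickle(obj):
    """Return the PicklingError raised while pickling obj, or None on success."""
    try:
        pickle.dumps(obj)
        return None
    except pickle.PicklingError as exc:
        return exc


# pickle resolves builtins.MemoryError, finds the real class, and
# rejects the look-alike because the two objects are not identical.
error = try_pickle(ShadowMemoryError)
print(type(error).__name__, error)

# The genuine built-in class pickles without complaint.
print(try_pickle(MemoryError))
```

Why the worker process in this issue ends up with a MemoryError class that fails that identity check is not established by the thread; the sketch only explains what the message literally means.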

I’m using torch 1.3.1 with CUDA 10.2. I’ve tried multiple versions of pytorch and the results are the same.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
che85 commented, Apr 30, 2020

I am experiencing the same issue with CUDA 10.2 and torch 1.4. The machine is definitely not running out of memory. The maximum memory consumption I saw before the exceptions occurred was 60 GiB out of 128 GiB.

(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
(pid=62601)     obj = _ForkingPickler.dumps(obj)
(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
(pid=62601)     cls(buf, protocol).dump(obj)
(pid=62601) _pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
(pid=62601) Traceback (most recent call last):
(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
(pid=62601)     obj = _ForkingPickler.dumps(obj)
(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
(pid=62601)     cls(buf, protocol).dump(obj)
(pid=62601) _pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
(pid=62607) Traceback (most recent call last):
(pid=62607)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
(pid=62607)     obj = _ForkingPickler.dumps(obj)
(pid=62607)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
(pid=62607)     cls(buf, protocol).dump(obj)
(pid=62607) _pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
(pid=62601) Train Epoch: 1 [10/120 (8%)] Loss: 0.831871
(pid=62607) Train Epoch: 1 [20/120 (17%)] Loss: 0.883739
(pid=62601) Train Epoch: 1 [20/120 (17%)] Loss: 0.754329
(pid=62607) Train Epoch: 1 [30/120 (25%)] Loss: 0.769951
(pid=62607) ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62607)  ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62607)  ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62607)  ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62607)  
(pid=62601) ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62601)  ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62601)  
(pid=62601) Traceback (most recent call last):
(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
(pid=62601)     obj = _ForkingPickler.dumps(obj)
(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
(pid=62601)     cls(buf, protocol).dump(obj)
(pid=62601) _pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
(pid=62607) Traceback (most recent call last):
(pid=62607)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
(pid=62607)     obj = _ForkingPickler.dumps(obj)
(pid=62607)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
(pid=62607)     cls(buf, protocol).dump(obj)
(pid=62607) _pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError

0 reactions
megaserg commented, Aug 14, 2020

@Traderain Check your /dev/shm size, and increase if needed 😃
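
One way to follow this suggestion from Python is sketched below. It assumes a Linux system where shared memory is mounted at /dev/shm; the function name `shared_memory_report` is made up for illustration, and `shutil.disk_usage` is used simply because it reports the size of whatever filesystem backs a path.

```python
import os
import shutil


def shared_memory_report(path="/dev/shm"):
    """Return (total_gib, free_gib) for the filesystem at path, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.free / gib


if __name__ == "__main__":
    report = shared_memory_report()
    if report is None:
        print("/dev/shm not found on this system")
    else:
        total_gib, free_gib = report
        print(f"/dev/shm: {total_gib:.2f} GiB total, {free_gib:.2f} GiB free")
```

If the reported total is small, that would be consistent with the "insufficient shared memory (shm)" errors in the log above. Inside a Docker container the default is typically 64 MB, and it can be raised with the `--shm-size` option of `docker run`; whether that applies here depends on how the training job is launched, which the thread does not say.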
