Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError

I’m using a custom dataset and JSON to train the network. After a few iterations over the dataset, train.py crashes with the following error:

Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError

However, if I set --num_workers=0 or --num_workers=1, training works properly.

This is how the traceback looks:

Traceback (most recent call last):
  File "/home/amuresan/anaconda3/envs/pytorch1.3-gpu/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/amuresan/anaconda3/envs/pytorch1.3-gpu/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError

I’m using torch 1.3.1 with CUDA 10.2. I’ve tried multiple versions of PyTorch and the results are the same.
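
For readers puzzled by the wording of the message itself, here is a minimal sketch of the general pickle mechanism that produces "it's not the same object as ...". Pickle stores classes by reference (module plus name) and refuses when the name it looks up resolves to a different object. The class ShadowMemoryError and both helper functions are hypothetical, written only for illustration; this is not a diagnosis of where the mismatched MemoryError in this issue comes from.

```python
import pickle


def make_shadow_class():
    """Build a hypothetical exception class that claims to be builtins.MemoryError."""
    class ShadowMemoryError(Exception):
        pass

    # Pretend to live in builtins under the name MemoryError (assumption for the demo).
    ShadowMemoryError.__module__ = "builtins"
    ShadowMemoryError.__name__ = "MemoryError"
    ShadowMemoryError.__qualname__ = "MemoryError"
    return ShadowMemoryError


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the PicklingError text."""
    try:
        pickle.dumps(obj)
        return None
    except pickle.PicklingError as exc:
        return str(exc)


if __name__ == "__main__":
    # The genuine builtin pickles without complaint.
    print(try_pickle(MemoryError("real")))
    # The look-alike fails: builtins.MemoryError is a different object.
    print(try_pickle(make_shadow_class()("fake")))
```

With --num_workers greater than 1, batches and exceptions travel between processes through multiprocessing queues, which is where the traceback above shows the pickling step being hit.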

Issue Analytics

State:
Created 4 years ago
Comments:6 (1 by maintainers)

Top GitHub Comments

che85 commented, Apr 30, 2020

I am experiencing the same issue with CUDA 10.2 and torch 1.4. The machine is definitely not running out of memory: the highest memory consumption I saw before the exceptions occurred was 60 GiB / 128 GiB.

(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
(pid=62601)     obj = _ForkingPickler.dumps(obj)
(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
(pid=62601)     cls(buf, protocol).dump(obj)
(pid=62601) _pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
(pid=62601) Traceback (most recent call last):
(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
(pid=62601)     obj = _ForkingPickler.dumps(obj)
(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
(pid=62601)     cls(buf, protocol).dump(obj)
(pid=62601) _pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
(pid=62607) Traceback (most recent call last):
(pid=62607)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
(pid=62607)     obj = _ForkingPickler.dumps(obj)
(pid=62607)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
(pid=62607)     cls(buf, protocol).dump(obj)
(pid=62607) _pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
(pid=62601) Train Epoch: 1 [10/120 (8%)] Loss: 0.831871
(pid=62607) Train Epoch: 1 [20/120 (17%)] Loss: 0.883739
(pid=62601) Train Epoch: 1 [20/120 (17%)] Loss: 0.754329
(pid=62607) Train Epoch: 1 [30/120 (25%)] Loss: 0.769951
(pid=62607) ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62607)  ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62607)  ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62607)  ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62607)  
(pid=62601) ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62601)  ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(pid=62601)  
(pid=62601) Traceback (most recent call last):
(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
(pid=62601)     obj = _ForkingPickler.dumps(obj)
(pid=62601)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
(pid=62601)     cls(buf, protocol).dump(obj)
(pid=62601) _pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
(pid=62607) Traceback (most recent call last):
(pid=62607)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
(pid=62607)     obj = _ForkingPickler.dumps(obj)
(pid=62607)   File "/home/herzc/.conda/envs/torch1.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
(pid=62607)     cls(buf, protocol).dump(obj)
(pid=62607) _pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError

megaserg commented, Aug 14, 2020

@Traderain Check your /dev/shm size, and increase if needed 😃
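
One way to follow this suggestion from inside the training environment is sketched below. It reads the size of the shared-memory mount with the standard library only. The function name shm_usage and the default path are assumptions for illustration; on systems where /dev/shm does not exist, the function returns None.

```python
import os
import shutil

GIB = 1024 ** 3


def shm_usage(path="/dev/shm"):
    """Return total/used/free space of the mount at path in GiB, or None if it is missing."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return {
        "total_gib": usage.total / GIB,
        "used_gib": usage.used / GIB,
        "free_gib": usage.free / GIB,
    }


if __name__ == "__main__":
    info = shm_usage()
    if info is None:
        print("/dev/shm not found on this system")
    else:
        print("shm total: {total_gib:.2f} GiB, used: {used_gib:.2f} GiB, "
              "free: {free_gib:.2f} GiB".format(**info))
```

If the reported total is small compared with the batches the workers hand back, that would be consistent with the "insufficient shared memory (shm)" errors in the log above. How to enlarge it depends on the setup: inside Docker the --shm-size option of docker run controls it, and on a plain Linux host the tmpfs can typically be remounted with a larger size= option.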