Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
I’m using a custom dataset and JSON to train the network.
After a few iterations over the dataset, train.py crashes with the following error:
Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
However, if I set --num_workers=0 or --num_workers=1, the training works properly.
This is how the traceback looks:
Traceback (most recent call last):
  File "/home/amuresan/anaconda3/envs/pytorch1.3-gpu/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/amuresan/anaconda3/envs/pytorch1.3-gpu/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
I’m using torch 1.3.1 with CUDA 10.2. I’ve tried multiple versions of PyTorch and the result is the same.
Top GitHub Comments
I am experiencing the same issue with CUDA 10.2 and torch 1.4. The machine is definitely not running out of memory. The maximum memory consumption I saw before the exceptions occurred was 60 GiB out of 128 GiB.
@Traderain Check your /dev/shm size, and increase it if needed 😃
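For anyone landing here, a minimal sketch of how the /dev/shm check suggested above could be done from Python. It assumes a Linux machine where /dev/shm is the shared-memory mount that the DataLoader worker processes use to hand batches back to the main process. The helper name shm_usage_gib is made up for this example and is not part of PyTorch.

```python
import shutil


def shm_usage_gib(path="/dev/shm"):
    """Return (total, used, free) for the filesystem at `path`, in GiB.

    `path` defaults to /dev/shm, which is assumed to be the shared-memory
    mount on a typical Linux system.
    """
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.used / gib, usage.free / gib


if __name__ == "__main__":
    total, used, free = shm_usage_gib()
    print(f"/dev/shm: total={total:.2f} GiB, used={used:.2f} GiB, free={free:.2f} GiB")
```

If the free space reported here is small while training runs with several workers, that would be consistent with the crash appearing only for larger --num_workers values even though main memory is far from full. How to enlarge the mount depends on the setup (for example a remount with a larger size option on a plain Linux host, or the --shm-size option when running under Docker), so treat this as a diagnostic step rather than a confirmed fix.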