Runtime error during phase 0 training
Hi @dianchen96 and @bradyz,
I am at phase 0 of training an image agent and hit a runtime error that looks related to a PyTorch bug under Python 3.5. Training works once I set num_workers=0,
but I am wondering whether you know of another workaround that does not sacrifice training speed. Thanks!
Please find the error messages below.
(lbc) peiyunh@ubuntu:~/code/lbc/training$ CUDA_VISIBLE_DEVICES=0 PYTHONPATH="/home/peiyunh/software/CARLA_0.9.6/PythonAPI" python train_image_phase0.py --log_dir ../ckpts/image_phase0 --pretrained --teacher_path ../ckpts/priveleged/model-128.th --dataset_dir ../data
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
augment with None
Finished loading ../data/train. Length: 167789
augment with None
Finished loading ../data/val. Length: 52600
Loading ResNet weights from : https://download.pytorch.org/models/resnet34-333f7ec4.pth
Epoch: 0%| | 0/3 [00:00<?, ?it/s]
| 0/10 [00:00<?, ?it/s]
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/peiyunh/miniconda3/envs/lbc/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/peiyunh/miniconda3/envs/lbc/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/peiyunh/miniconda3/envs/lbc/lib/python3.5/multiprocessing/resource_sharer.py", line 139, in _serve
    signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
  File "/home/peiyunh/miniconda3/envs/lbc/lib/python3.5/signal.py", line 60, in pthread_sigmask
    sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
(The identical Thread-4 traceback is printed seven more times.)
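The traceback fails inside `signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))`, which `multiprocessing/resource_sharer.py` calls when data-loading workers are used. As a rough diagnostic, that same call can be probed up front and the worker count lowered only when the interpreter rejects it. This is a minimal sketch using the standard library only; the helper names `sigmask_probe_ok` and `choose_num_workers` are hypothetical and not part of this repository, and this is not the maintainers' fix.

```python
import signal
import warnings


def sigmask_probe_ok():
    """Return True if blocking range(1, NSIG) works on this interpreter."""
    if not hasattr(signal, "pthread_sigmask"):
        # Platforms without pthread_sigmask (e.g. Windows) cannot hit this call.
        return True
    try:
        with warnings.catch_warnings():
            # Assumption: some newer Python versions may warn about invalid
            # signal numbers here instead of raising, so warnings are silenced.
            warnings.simplefilter("ignore")
            old_mask = signal.pthread_sigmask(
                signal.SIG_BLOCK, range(1, signal.NSIG)
            )
    except ValueError:
        # Same failure as in the traceback: "signal number 32 out of range".
        return False
    # Restore the signal mask the calling thread had before the probe.
    signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return True


def choose_num_workers(requested, probe=sigmask_probe_ok):
    """Fall back to 0 workers only when the probe reports the failure."""
    return requested if probe() else 0


if __name__ == "__main__":
    print("num_workers to use:", choose_num_workers(4))
```

The returned value could then be passed as `num_workers` when the data loader is built. If the probe fails, running the same scripts in an environment with a different Python version may be worth trying, though that is untested against this codebase.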
Issue Analytics
- State:
- Created 3 years ago
- Comments:10 (5 by maintainers)
Top GitHub Comments
We have released our birdview and phase 2 checkpoints. We do not benchmark the phase 0 model, as its sole purpose is to make sure the gradients for phase 1 do not go NaN (due to the reprojection). For phase 1 model performance, you can refer to the one reported in index.md.
Great to know. Will try that.
Do you by any chance plan to release a checkpoint for each phase? I am very interested in reproducing the performance and running diagnostics on the intermediate models. Having a reference would really help me make sure I am on the right track.