Runtime error during phase 0 training

Hi @dianchen96 and @bradyz

I am at phase 0 of training an image agent and hit a runtime error that looks related to a PyTorch bug with Python 3.5. Training runs once I set num_workers=0, but I am wondering if you know another workaround that does not sacrifice training speed. Thanks!

Please find the error messages below.

(lbc) peiyunh@ubuntu:~/code/lbc/training$ CUDA_VISIBLE_DEVICES=0 PYTHONPATH="/home/peiyunh/software/CARLA_0.9.6/PythonAPI" python train_image_phase0.py --log_dir ../ckpts/image_phase0 --pretrained --teacher_path ../ckpts/priveleged/model-128.th --dataset_dir ../data
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
augment with  None
Finished loading ../data/train. Length: 167789
augment with  None
Finished loading ../data/val. Length: 52600
Loading ResNet weights from : https://download.pytorch.org/models/resnet34-333f7ec4.pth
Epoch:   0%|          | 0/3 [00:00<?, ?it/s]
          | 0/10 [00:00<?, ?it/s]
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/peiyunh/miniconda3/envs/lbc/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/peiyunh/miniconda3/envs/lbc/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/peiyunh/miniconda3/envs/lbc/lib/python3.5/multiprocessing/resource_sharer.py", line 139, in _serve
    signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
  File "/home/peiyunh/miniconda3/envs/lbc/lib/python3.5/signal.py", line 60, in pthread_sigmask
    sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range

[The same traceback is printed seven more times.]
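
For readers hitting the same traceback: the call that fails is `signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))` in `multiprocessing/resource_sharer.py`. Below is a minimal sketch, assuming a Unix platform, of how one might probe whether the running interpreter rejects that call and fall back to `num_workers=0` only when it does. The helper names `sigmask_accepts_full_range` and `pick_num_workers` are hypothetical; they are not part of this repository or of PyTorch.

```python
import signal


def sigmask_accepts_full_range():
    """Probe whether blocking range(1, NSIG) works on this interpreter (Unix only)."""
    # Passing an empty list reads the current mask without changing it.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    try:
        # Same call that raises ValueError in the traceback above.
        signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
        return True
    except ValueError:
        return False
    finally:
        # Restore the original mask whether or not the probe succeeded.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def pick_num_workers(requested):
    """Hypothetical helper: keep the requested count unless the probe fails."""
    return requested if sigmask_accepts_full_range() else 0


if __name__ == "__main__":
    print("full-range sigmask ok:", sigmask_accepts_full_range())
    print("num_workers to use:", pick_num_workers(4))
```

The returned value could then be passed as the `num_workers` argument of `torch.utils.data.DataLoader`. If the probe fails, moving the environment to a Python release where this call no longer raises may remove the need for the fallback; which release that is should be checked against the CPython changelog.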

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
dotchen commented, Jul 20, 2020

We have released our birdview and phase 2 checkpoints. We do not benchmark the phase 0 model, as its sole purpose is to make sure the gradients for phase 1 do not go NaN (due to the reprojection). For phase 1 model performance, you can refer to the one in index.md.

0 reactions
peiyunh commented, Jul 20, 2020

Great to know. Will try that.

Do you by any chance plan to release a checkpoint for each phase? I am very interested in reproducing the performance and running diagnostics on the intermediate models. Having a reference would really help me make sure I am on the right track.
