died with <Signals.SIGSEGV: 11>.

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version.

Describe the bug

I cloned the project in July and used it to train a Faster R-CNN model. Now I want to move the project to another machine, which means the environment is different. I downloaded the latest version of the project, copied the data over, and started training, but the error below occurs and does not show where the problem is. Please help me with that. The environment differences are:

  • GPU: V100 -> P40
  • CUDA: 9.1 -> 9.0
  • torch: 1.1.0 -> 1.1.0 (no change)
  • Python: 3.7.3 -> 3.6.6

I ran “python setup.py develop” every time I installed a freshly cloned copy of the project, but the error remains the same.

Reproduction

  1. What command or script did you run?
tools/dist_train ...
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
  3. What dataset did you use?

Environment

  • OS: centos7
  • PyTorch version 1.1.0 installed before
  • GPU model p40
  • CUDA and CUDNN version 9.0
  • [optional] Other information that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

2019-09-21 17:51:55,877 - INFO - workflow: [('train', 1)], max: 15 epochs
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/python3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/usr/local/python3/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', 'tools/train.py', '--local_rank=0', 'configs/pascal_voc/faster_rcnn_terrorism_cate13_badcase.py', '--launcher', 'pytorch', '--validate', '--work_dir', 'work_dirs/faster_rcnn_terrorism_cate13_badcase_0920']' died with <Signals.SIGSEGV: 11>.
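The traceback above only reports that the worker process died with SIGSEGV, not where. One way to narrow that down is Python's standard `faulthandler` module, which prints the Python-level stack when the interpreter receives a fatal signal. The sketch below is an illustration only, not part of the original issue: the helper name `run_with_faulthandler` is hypothetical, and the child deliberately crashes by reading address 0 through `ctypes`, standing in for a crashing compiled extension.

```python
import signal
import subprocess
import sys

# Child program that crashes on purpose: reading address 0 via ctypes
# triggers a segmentation fault (a stand-in for a crashing C/CUDA extension).
CRASHING_CODE = "import ctypes; ctypes.string_at(0)"


def run_with_faulthandler(code):
    """Run `code` in a child interpreter with faulthandler enabled.

    Hypothetical helper for illustration; returns the CompletedProcess.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


if __name__ == "__main__":
    result = run_with_faulthandler(CRASHING_CODE)
    # A negative return code means the child was killed by that signal number.
    if result.returncode < 0:
        print("child died with", signal.Signals(-result.returncode).name)
    # faulthandler writes the Python stack of the crashing thread to stderr.
    print(result.stderr)
```

For a real training run, the same effect can likely be had by setting the environment variable `PYTHONFAULTHANDLER=1` before starting the launcher, since worker processes inherit the environment. The dump shows the last Python frame before the crash, which usually points at the compiled module involved.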

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
ZwwWayne commented, Nov 17, 2019

Hi @qpfhuan, Signals.SIGSEGV: 11 seems to be a common error when the code is compiled against one set of libraries or environment and run against another. I found a similar issue in other projects such as nvvl; this might give you a hint to check the libraries used for compiling and for running in order to fix the bug.
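To act on that hint, it can help to record what the running interpreter actually sees and compare it between the old and new machines. The following is a minimal sketch, assuming PyTorch may or may not be importable; `collect_env` is a hypothetical helper name, not something provided by the project.

```python
import platform
import sys


def collect_env():
    """Return a dict describing the interpreter and, if importable, PyTorch."""
    info = {
        "python_executable": sys.executable,
        "python_version": platform.python_version(),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed for this interpreter.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["torch_file"] = torch.__file__
    # CUDA version PyTorch was built with (None for CPU-only builds).
    info["torch_built_with_cuda"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running this with the same interpreter that the training command uses (the traceback shows `/usr/bin/python` for the worker) makes it easier to spot a mismatch, for example extensions built by “python setup.py develop” under a different Python or CUDA installation than the one used at run time.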

0 reactions
DeannaJ commented, May 20, 2021

Hi, have you fixed this problem? I had no idea how to fix it. I re-created a new virtual environment, and that solved the problem.

Top Results From Across the Web

Command died with <Signals.SIGSEGV: 11> - Accelerate
I trained my model on HPC with one node and 8 GPUs, but my program crashed with a fatal error, which seems nothing...

Killed by signal 11(SIGSEGV) and/or 6(SIGABRT) [closed]
In Visual Studio 2012 it runs fine, but if I compile it with G++ (yes, for reasons above me, I have to use...

Command 'openmc -s 8' died with <Signals.SIGSEGV: 11 ...
Hello! OpenMC group , For my case, when the “openmc.run()” running for some time ,then thow the error like : I want help...

[SOLVED] What is mean by "program terminated with signal ...
Signal 11 (segmentation fault) means that the program accessed an unassigned memory location. It is usually a bug in the code. For example ......

Full Text Bug Listing - Red Hat Bugzilla
[abrt] blueberry: run(): subprocess.py:524:run:subprocess.CalledProcessError: Command '['bt-device', '--info=28:33:34:63:92:01']' died with <Signals.SIGSEGV: 11> ...
