test_inception_v3_eval and squeezenet1_0 segmentation fault on latest main

🐛 Describe the bug

After the PyTorch nightly release of 20211014, the main branch of TorchVision started failing with segmentation faults:

test/test_models.py::test_inception_v3_eval Fatal Python error: Segmentation fault

Thread 0x00007f4edb7bf700 (most recent call first):

Current thread 0x00007f50c837e740 (most recent call first):
  File "/root/project/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1106 in _call_impl
  File "/root/project/test/test_models.py", line 106 in assert_export_import_module
  File "/root/project/test/test_models.py", line 144 in _check_jit_scriptable
  File "/root/project/test/test_models.py", line 352 in test_inception_v3_eval
  File "/root/project/env/lib/python3.6/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/root/project/env/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/root/project/env/lib/python3.6/site-packages/_pytest/python.py", line 1641 in runtest
  File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
  File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/root/project/env/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 255 in <lambda>
  File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 311 in from_call
  File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 255 in call_runtest_hook
  File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 215 in call_and_report
  File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
  File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/root/project/env/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/root/project/env/lib/python3.6/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/root/project/env/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/root/project/env/lib/python3.6/site-packages/_pytest/main.py", line 323 in _main
  File "/root/project/env/lib/python3.6/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/root/project/env/lib/python3.6/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/root/project/env/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/root/project/env/lib/python3.6/site-packages/_pytest/config/__init__.py", line 163 in main
  File "/root/project/env/lib/python3.6/site-packages/_pytest/config/__init__.py", line 185 in console_main
  File "/root/project/env/bin/pytest", line 11 in <module>
.circleci/unittest/linux/scripts/run_test.sh: line 10:  1357 Segmentation fault      (core dumped) 
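
The "Fatal Python error: Segmentation fault" header followed by per-thread frames is the format printed by Python's standard faulthandler module. For anyone who wants to see how such a dump is produced, here is a minimal sketch that has nothing to do with TorchVision: it crashes a child interpreter on purpose and captures the dump. The child program and the function name run_crashing_child are made up for illustration, and the sketch assumes a Linux-like platform where reading address 0 raises SIGSEGV.

```python
import subprocess
import sys

# Child program: turn on faulthandler, then deliberately read from address 0.
# ctypes.string_at(0) is the crash trigger shown in the faulthandler docs.
CHILD_SOURCE = """
import ctypes
import faulthandler

faulthandler.enable()

def crash():
    ctypes.string_at(0)

crash()
"""


def run_crashing_child():
    """Run the crashing child and return (returncode, stderr text)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    code, err = run_crashing_child()
    print("child exit code:", code)
    print(err)
```

The crash happens in a subprocess so the parent survives to print the dump; the frames listed are Python frames only, which is why the native crash site inside libtorch is not visible in the report above.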

Skipping the above test, we see that more models hit segmentation faults:

test/test_models.py::test_classification_model[cuda-squeezenet1_0] Fatal Python error: Segmentation fault

Thread 0x00007ff6acfd1700 (most recent call first):
<no Python frame>

Current thread 0x00007ff7fcc98740 (most recent call first):
  File "/home/circleci/project/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1106 in _call_impl
  File "/home/circleci/project/test/test_models.py", line 107 in assert_export_import_module
  File "/home/circleci/project/test/test_models.py", line 145 in _check_jit_scriptable
  File "/home/circleci/project/test/test_models.py", line 468 in test_classification_model
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/python.py", line 1641 in runtest
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 255 in <lambda>
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 311 in from_call
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 254 in call_runtest_hook
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 215 in call_and_report
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/main.py", line 323 in _main
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/config/__init__.py", line 162 in main
  File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/config/__init__.py", line 185 in console_main
  File "/home/circleci/project/env/bin/pytest", line 11 in <module>
.circleci/unittest/linux/scripts/run_test.sh: line 10:    58 Segmentation fault      
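
Most of each dump is pytest and pluggy plumbing. A small sketch of how one might strip it and keep only the frames from the project and from torch is shown below; the helper name interesting_frames and the list of path markers are assumptions made for this illustration, and the sample is a shortened copy of the first dump.

```python
import re

# Matches one faulthandler frame line: File "<path>", line <n> in <function>
FRAME_RE = re.compile(
    r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+) in (?P<func>\S+)'
)

# Path fragments treated as test-runner plumbing (an assumption for this sketch).
RUNNER_MARKERS = ("/_pytest/", "/pluggy/", "/bin/pytest")


def interesting_frames(dump):
    """Return (path, line, function) for frames outside the test runner."""
    frames = []
    for raw in dump.splitlines():
        match = FRAME_RE.match(raw)
        if match is None:
            continue  # thread headers, blank lines, shell messages
        path = match.group("path")
        if any(marker in path for marker in RUNNER_MARKERS):
            continue
        frames.append((path, int(match.group("line")), match.group("func")))
    return frames


SAMPLE = '''\
Current thread 0x00007f50c837e740 (most recent call first):
  File "/root/project/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1106 in _call_impl
  File "/root/project/test/test_models.py", line 106 in assert_export_import_module
  File "/root/project/test/test_models.py", line 144 in _check_jit_scriptable
  File "/root/project/test/test_models.py", line 352 in test_inception_v3_eval
  File "/root/project/env/lib/python3.6/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/root/project/env/bin/pytest", line 11 in <module>
'''

if __name__ == "__main__":
    for path, line, func in interesting_frames(SAMPLE):
        print(f"{func} ({path}:{line})")
```

Applied to either dump, this leaves the same innermost chain: _call_impl in torch/nn/modules/module.py, reached from assert_export_import_module via _check_jit_scriptable in test/test_models.py.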

The failures appear on all platforms and Python versions.

Versions

Latest main: 8fe72d131d6d2862b9db1efb3ffa2a6ded15efc8
PyTorch nightly: 20211014

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
malfet commented, Oct 14, 2021

Likely caused by https://github.com/pytorch/pytorch/pull/66273, which has already been reverted on the PyTorch main trunk but is still present in the nightly.
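
Since the suspect change is tied to one specific nightly, it can help to check which build is actually installed. The sketch below assumes nightly version strings carry a ".devYYYYMMDD" segment (for example "1.11.0.dev20211014+cu111", an illustrative string, not one taken from the report); the function names are hypothetical. It only flags the exact date named in the issue, because the thread does not say which later nightly picked up the revert.

```python
import re
from datetime import date

# Nightly build named in the report.
AFFECTED_NIGHTLY = date(2021, 10, 14)


def nightly_date(version):
    """Return the build date from a '.devYYYYMMDD' version string, else None."""
    match = re.search(r"\.dev(\d{4})(\d{2})(\d{2})", version)
    if match is None:
        return None  # release builds carry no dev date
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def is_affected_nightly(version):
    """True only for the exact nightly date named in the report."""
    return nightly_date(version) == AFFECTED_NIGHTLY
```

In a real environment the argument would be torch.__version__; passing the string explicitly keeps the sketch runnable without PyTorch installed.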

0 reactions
jjsjann123 commented, Oct 15, 2021

I tried running the torchvision tests in the new PR, but things seem to be working fine. The backtrace posted does look concerning.

Dumb question: is there any knob to change the executor? I’m running the tests with the command below, but it doesn’t seem to dump any log from the profiling executor (which is the executor that uses the code path in the backtrace).

PYTORCH_JIT_LOG_LEVEL="profiling_graph_executor_impl" pytest test_models.py -k test_inception_v3_eval
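
For anyone scripting the same experiment, a rough Python equivalent of that shell invocation might look like the sketch below. The function name jit_logging_command and the disable_capture parameter are invented for this example. The only addition to the original command is pytest's -s flag, which turns off output capturing; pytest captures stdout and stderr by default, so this is one thing worth ruling out when an expected log does not show up, not a confirmed explanation.

```python
import os
import sys


def jit_logging_command(test_filter,
                        log_level="profiling_graph_executor_impl",
                        disable_capture=True):
    """Build (argv, env) for a pytest run with PYTORCH_JIT_LOG_LEVEL set."""
    # Copy the current environment so the parent process is left untouched.
    env = dict(os.environ)
    env["PYTORCH_JIT_LOG_LEVEL"] = log_level

    argv = [sys.executable, "-m", "pytest", "test_models.py", "-k", test_filter]
    if disable_capture:
        argv.append("-s")  # let anything written to stderr reach the terminal
    return argv, env
```

The returned pair can be handed to subprocess.run(argv, env=env) from the directory that contains test_models.py.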

I also tried running the same command as the failing CI, with no luck there either (no log and no repro):

pytest --cov=torchvision --junitxml=test-results/junit.xml -v --durations 20 -k test_inception_v3_eval test --ignore=test/test_datasets_download.py