test_inception_v3_eval and squeezenet1_0 segmentation fault on latest main
🐛 Describe the bug
After the PyTorch nightly release of 20211014, tests on the TorchVision main branch started failing with segmentation faults:
test/test_models.py::test_inception_v3_eval Fatal Python error: Segmentation fault
Thread 0x00007f4edb7bf700 (most recent call first):
Current thread 0x00007f50c837e740 (most recent call first):
File "/root/project/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1106 in _call_impl
File "/root/project/test/test_models.py", line 106 in assert_export_import_module
File "/root/project/test/test_models.py", line 144 in _check_jit_scriptable
File "/root/project/test/test_models.py", line 352 in test_inception_v3_eval
File "/root/project/env/lib/python3.6/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/root/project/env/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
File "/root/project/env/lib/python3.6/site-packages/_pytest/python.py", line 1641 in runtest
File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/root/project/env/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 255 in <lambda>
File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 311 in from_call
File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 255 in call_runtest_hook
File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 215 in call_and_report
File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 126 in runtestprotocol
File "/root/project/env/lib/python3.6/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/root/project/env/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
File "/root/project/env/lib/python3.6/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/root/project/env/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
File "/root/project/env/lib/python3.6/site-packages/_pytest/main.py", line 323 in _main
File "/root/project/env/lib/python3.6/site-packages/_pytest/main.py", line 269 in wrap_session
File "/root/project/env/lib/python3.6/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
File "/root/project/env/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
File "/root/project/env/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/root/project/env/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
File "/root/project/env/lib/python3.6/site-packages/_pytest/config/__init__.py", line 163 in main
File "/root/project/env/lib/python3.6/site-packages/_pytest/config/__init__.py", line 185 in console_main
File "/root/project/env/bin/pytest", line 11 in <module>
.circleci/unittest/linux/scripts/run_test.sh: line 10: 1357 Segmentation fault (core dumped)
When the above test is skipped, more models fail with a segmentation fault:
test/test_models.py::test_classification_model[cuda-squeezenet1_0] Fatal Python error: Segmentation fault
Thread 0x00007ff6acfd1700 (most recent call first):
<no Python frame>
Current thread 0x00007ff7fcc98740 (most recent call first):
File "/home/circleci/project/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1106 in _call_impl
File "/home/circleci/project/test/test_models.py", line 107 in assert_export_import_module
File "/home/circleci/project/test/test_models.py", line 145 in _check_jit_scriptable
File "/home/circleci/project/test/test_models.py", line 468 in test_classification_model
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/python.py", line 1641 in runtest
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 255 in <lambda>
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 311 in from_call
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 254 in call_runtest_hook
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 215 in call_and_report
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/main.py", line 323 in _main
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/main.py", line 269 in wrap_session
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/circleci/project/env/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/config/__init__.py", line 162 in main
File "/home/circleci/project/env/lib/python3.8/site-packages/_pytest/config/__init__.py", line 185 in console_main
File "/home/circleci/project/env/bin/pytest", line 11 in <module>
.circleci/unittest/linux/scripts/run_test.sh: line 10: 58 Segmentation fault
The failures appear on all platforms and Python versions.
Versions
Latest main: 8fe72d131d6d2862b9db1efb3ffa2a6ded15efc8
PyTorch nightly: 20211014
Top GitHub Comments
Likely caused by https://github.com/pytorch/pytorch/pull/66273, which has already been reverted on the PyTorch main trunk but is still present in the nightly.
I tried to run the torchvision tests in the new PR, but things seem to be working fine. The backtrace posted does look concerning.
Dumb question: is there any knob to change the executor? I’m running the tests with this (vvv), but it doesn’t seem to dump logs from the profiling executor (which is the executor that uses the code path in the backtrace).
I also tried to run the same command as the failing CI job, with no luck there either (no log and no repro).
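For anyone trying to reproduce the crash locally, one option is to run the suspect code in a child process with faulthandler enabled, so that a segfault does not take down the parent and the "Fatal Python error" dump is captured for inspection. The following is only a sketch under stated assumptions: it is POSIX-only, the helper name run_isolated is hypothetical, and the crash is simulated by sending SIGSEGV to the child rather than by running the actual TorchVision test.

```python
import subprocess
import sys

# Code executed in the child process. The os.kill call is a stand-in
# (assumption) for the real crashing call, e.g. running a scripted model.
CHILD_CODE = """
import faulthandler
import os
import signal

faulthandler.enable()  # dump Python tracebacks on fatal signals such as SIGSEGV
os.kill(os.getpid(), signal.SIGSEGV)  # simulate the segmentation fault
"""


def run_isolated(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter and capture its stdout and stderr."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )


result = run_isolated(CHILD_CODE)

# A negative return code means the child was terminated by a signal.
print("return code:", result.returncode)
# The faulthandler dump goes to stderr, in the same format as the CI logs above.
print(result.stderr)
```

In a real investigation the simulated crash would be replaced with the failing test body, and the captured stderr could be compared between PyTorch builds.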