TorchVision main branch fails with `slow_conv2d: grad_weight must be contiguous`
🐛 Describe the bug
It looks like a breakage caused by the 20220512 PyTorch Core Nightly.
Several CI jobs fail with:
test_classification_model[cpu-convnext_base]
Traceback (most recent call last):
  File "/root/project/test/test_models.py", line 628, in test_classification_model
    _check_input_backprop(model, x)
  File "/root/project/test/test_models.py", line 181, in _check_input_backprop
    out[0].sum().backward()
  File "/root/project/env/lib/python3.10/site-packages/torch/_tensor.py", line 399, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/project/env/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: slow_conv2d: grad_weight must be contiguous
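The failing test essentially does a forward pass and then backpropagates a scalar loss to the input. Below is a minimal repro sketch approximating that, assuming torchvision's `convnext_base` constructor and a standard 224x224 input (the input shape and loss are assumptions, not copied from the test):

```python
import torch
import torchvision

# Rough approximation of what test_classification_model / _check_input_backprop do
# for convnext_base: forward pass, then backprop a scalar loss to the input.
model = torchvision.models.convnext_base().eval()
x = torch.randn(1, 3, 224, 224, requires_grad=True)

out = model(x)
# On the broken nightly this backward call raised:
#   RuntimeError: slow_conv2d: grad_weight must be contiguous
out.sum().backward()
```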
So far I’ve identified the following PRs as potential causes:
Given that this network uses LayerNorm, changes from the following PRs are also likely to have caused it:
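For context on why LayerNorm changes are plausible suspects: ConvNeXt applies LayerNorm to a permuted, channels-last view of the activations between convolutions, so non-contiguous tensors (and their gradients) flow through the convolution backward. A rough sketch of that pattern, not torchvision's exact block implementation:

```python
import torch
import torch.nn as nn

class PermutedLayerNormBlock(nn.Module):
    """Sketch of the ConvNeXt-style pattern: depthwise conv, then LayerNorm
    applied over the channel dimension in a permuted (NHWC) view."""

    def __init__(self, dim: int = 96):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)   # NCHW -> NHWC; the permuted view is non-contiguous
        x = self.norm(x)
        x = x.permute(0, 3, 1, 2)   # back to NCHW for the next convolution
        return x
```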
Versions
Latest main branch 03bb324576d9fe86a15a7b86a43638a838234fbd
I will check whether I need to add more test cases in PyTorch (the existing test cases failed to catch this flaw).
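As a rough illustration of what such a test could exercise (a hypothetical sketch, not the actual PyTorch test-suite change), a regression test might compare conv2d gradients computed with a non-contiguous `grad_output` against a contiguous baseline:

```python
import torch
import torch.nn.functional as F

# Hypothetical regression-test sketch: conv2d gradients should match a
# contiguous baseline even when the incoming grad_output is non-contiguous,
# as can happen downstream of permute-based (channels-last) layouts.
def test_conv2d_grads_with_noncontiguous_grad_output():
    x = torch.randn(2, 3, 8, 8, dtype=torch.double, requires_grad=True)
    w = torch.randn(4, 3, 3, 3, dtype=torch.double, requires_grad=True)
    out = F.conv2d(x, w, padding=1)

    # Build a non-contiguous grad_output by permuting a contiguous tensor.
    grad_out = torch.randn(2, 8, 8, 4, dtype=torch.double).permute(0, 3, 1, 2)
    assert not grad_out.is_contiguous()

    gx, gw = torch.autograd.grad(out, (x, w), grad_out, retain_graph=True)
    gx_ref, gw_ref = torch.autograd.grad(out, (x, w), grad_out.contiguous())

    torch.testing.assert_close(gx, gx_ref)
    torch.testing.assert_close(gw, gw_ref)
```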
Fixed on latest nightly