RuntimeError in PyTorch 1.6
I am using PyTorch 1.6.0, CUDA 10.2, and the PyTorch-Encoding master branch.
Traceback (most recent call last):
  File "train_SSL.py", line 612, in <module>
    main()
  File "train_SSL.py", line 438, in main
    pred = F.interpolate((model(images)), size=input_shape, mode='bilinear', align_corners=True)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/10T_1/project/model/deeplabv2.py", line 207, in forward
    x = self.bn1(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch_encoding-1.2.2b20200908-py3.6-linux-x86_64.egg/encoding/nn/syncbn.py", line 202, in forward
    self.activation, self.slope).view(input_shape)
RuntimeError: Some elements marked as dirty during the forward method were not returned as output. The inputs that are modified inplace must all be outputs of the Function.
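
The error is PyTorch enforcing a torch.autograd.Function contract, checked strictly since around 1.5/1.6: every tensor passed to ctx.mark_dirty() in forward(), i.e. every input modified in place, must also be returned as an output. Judging by the fix described in the comments below, the dirty tensors here are the running_mean and running_var buffers that syncbatchnorm updates in place without returning. The toy Function below is not the PyTorch-Encoding code, just a minimal runnable sketch of the rule:

import torch

class ScaleAndTrack(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, running_stat):
        running_stat.mul_(0.9).add_(0.1 * x.mean())  # in-place buffer update
        ctx.mark_dirty(running_stat)                 # declares the in-place write
        ctx.mark_non_differentiable(running_stat)    # the buffer gets no gradient
        # Returning only `x * 2.0` here reproduces the RuntimeError above;
        # the dirty tensor must be part of the output as well.
        return x * 2.0, running_stat

    @staticmethod
    def backward(ctx, grad_out, _grad_stat):  # one slot per forward output
        return 2.0 * grad_out, None           # one grad per forward input

x = torch.randn(4, requires_grad=True)
stat = torch.zeros(())
y, stat = ScaleAndTrack.apply(x, stat)
y.sum().backward()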
I think I have fixed the problem, thanks to @zhangbin0917's advice (my torch==1.7.0):
1. In …\site-packages\encoding\nn\syncbn.py, at about line 200, change

   return syncbatchnorm(…).view(input_shape)

   to

   x, _, _ = syncbatchnorm(…)
   x = x.view(input_shape)
   return x

2. In …\site-packages\encoding\functions\syncbn.py, at about line 102, change

   ctx.save_for_backward(x, _ex, _exs, gamma, beta)
   return y

   to

   ctx.save_for_backward(x, _ex, _exs, gamma, beta)
   ctx.mark_non_differentiable(running_mean, running_var)
   return y, running_mean, running_var

3. In …\site-packages\encoding\functions\syncbn.py, at about line 109, change

   def backward(ctx, dz)

   to

   def backward(ctx, dz, _druning_mean, _druning_var)

A consolidated sketch of what the patched code looks like follows these steps.
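
Taken together, the three edits give the Function the shape sketched below. This is a self-contained stand-in, not the library's real code: the statistics are computed on a single GPU with no synchronization, MOMENTUM and EPS are made-up constants, and the gradient math is deliberately simplified. Only the pattern (buffers returned as extra non-differentiable outputs, and a backward that accepts one gradient slot per output) mirrors the patch; parameter names follow the snippets above, including the _druning_mean/_druning_var spelling.

import torch
from torch.autograd import Function

MOMENTUM, EPS = 0.1, 1e-5  # placeholder constants, not the library's values

class syncbatchnorm(Function):
    @staticmethod
    def forward(ctx, x, gamma, beta, running_mean, running_var):
        _ex = x.mean(0)                    # E[x]   (no cross-GPU reduction here)
        _exs = (x * x).mean(0)             # E[x^2]
        var = _exs - _ex * _ex
        running_mean.mul_(1 - MOMENTUM).add_(MOMENTUM * _ex)   # in-place updates,
        running_var.mul_(1 - MOMENTUM).add_(MOMENTUM * var)    # as in the library
        y = gamma * (x - _ex) / torch.sqrt(var + EPS) + beta
        ctx.save_for_backward(x, _ex, _exs, gamma, beta)
        ctx.mark_dirty(running_mean, running_var)  # already present; this is what trips the 1.6 check
        # Edit 2: the in-place-updated buffers become extra, non-differentiable outputs.
        ctx.mark_non_differentiable(running_mean, running_var)
        return y, running_mean, running_var

    @staticmethod
    def backward(ctx, dz, _druning_mean, _druning_var):  # edit 3: one slot per output
        x, _ex, _exs, gamma, beta = ctx.saved_tensors
        std = torch.sqrt(_exs - _ex * _ex + EPS)
        xhat = (x - _ex) / std
        dx = dz * gamma / std              # simplified: drops the dE[x]/dE[x^2] terms
        dgamma = (dz * xhat).sum(0)
        dbeta = dz.sum(0)
        return dx, dgamma, dbeta, None, None

# Edit 1 on the caller's side: unpack the tuple instead of calling .view() on it.
x = torch.randn(8, 4, requires_grad=True)
gamma = torch.ones(4, requires_grad=True)
beta = torch.zeros(4, requires_grad=True)
y, _, _ = syncbatchnorm.apply(x, gamma, beta, torch.zeros(4), torch.ones(4))
y.sum().backward()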
I found that this issue also appeared in inplace_abn issue #166. They solved the problem in a commit, and I have applied the same fix to syncbn; it works.