RuntimeError in PyTorch 1.6

I am using PyTorch 1.6.0, CUDA 10.2, and the master branch of Pytorch_encoding.

Traceback (most recent call last):
  File "train_SSL.py", line 612, in <module>
    main()
  File "train_SSL.py", line 438, in main
    pred = F.interpolate((model(images)), size=input_shape, mode='bilinear', align_corners=True)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/10T_1/project/model/deeplabv2.py", line 207, in forward
    x = self.bn1(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch_encoding-1.2.2b20200908-py3.6-linux-x86_64.egg/encoding/nn/syncbn.py", line 202, in forward
    self.activation, self.slope).view(input_shape)
RuntimeError: Some elements marked as dirty during the forward method were not returned as output. The inputs that are modified inplace must all be outputs of the Function.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

14 reactions
Zhaoguanhua commented, Dec 22, 2020

I think I have fixed the problem, thanks to @zhangbin0917's advice. My setup uses torch==1.7.0.

1. In …\site-packages\encoding\nn\syncbn.py, at about line 200, change

       return syncbatchnorm(…).view(input_shape)

   to

       x, _, _ = syncbatchnorm(…)
       x = x.view(input_shape)
       return x

2. In …\site-packages\encoding\functions\syncbn.py, at about line 102, change

       ctx.save_for_backward(x, _ex, _exs, gamma, beta)
       return y

   to

       ctx.save_for_backward(x, _ex, _exs, gamma, beta)
       ctx.mark_non_differentiable(running_mean, running_var)
       return y, running_mean, running_var

3. In …\site-packages\encoding\functions\syncbn.py, at about line 109, change

       def backward(ctx, dz)

   to

       def backward(ctx, dz, _druning_mean, _druning_var)
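For anyone who wants to see the pattern outside of PyTorch-Encoding, here is a minimal sketch of the same idea. It is not the library's code: the class ScaleWithRunningStats, its doubling "normalization", and the tensor shapes are all made up for illustration. It only shows the three pieces the steps above rely on: every tensor passed to ctx.mark_dirty is also returned from forward, the returned buffers are marked non-differentiable, and backward accepts one incoming gradient per forward output.

```python
import torch
from torch.autograd import Function


class ScaleWithRunningStats(Function):
    """Toy stand-in for a sync-BN style Function (hypothetical example)."""

    @staticmethod
    def forward(ctx, x, running_mean, running_var, momentum):
        # Update the running buffers in place, as a batch-norm forward would.
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        running_mean.mul_(1 - momentum).add_(momentum * mean)
        running_var.mul_(1 - momentum).add_(momentum * var)

        # Tell autograd which inputs were modified in place ...
        ctx.mark_dirty(running_mean, running_var)
        # ... and that no gradient should flow through them.
        ctx.mark_non_differentiable(running_mean, running_var)

        y = x * 2.0  # placeholder for the real normalization
        # Every tensor marked dirty must also be returned as an output.
        return y, running_mean, running_var

    @staticmethod
    def backward(ctx, dy, _d_running_mean, _d_running_var):
        # One incoming gradient per forward output; the buffer gradients are unused.
        # One returned gradient per forward input (x, running_mean, running_var, momentum).
        return dy * 2.0, None, None, None


# Usage: the caller unpacks three outputs and keeps only the first,
# mirroring the change in step 1.
x = torch.ones(4, 3, requires_grad=True)
running_mean = torch.zeros(3)
running_var = torch.ones(3)
y, _, _ = ScaleWithRunningStats.apply(x, running_mean, running_var, 0.1)
y.sum().backward()
print(x.grad)
print(running_mean)
```

If forward returned only y here while still calling ctx.mark_dirty on the buffers, PyTorch 1.6 and later would be expected to raise the same "Some elements marked as dirty during the forward method were not returned as output" error shown in the traceback.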

5 reactions
zhangbin0917 commented, Sep 14, 2020

I found that this issue also appeared in inplace_abn issue #166. They solved the problem in a commit. I tried the same solution in syncbn, and it works.


