
copy_if failed to synchronize: device-side assert triggered


❓ Questions and Help

I was training a customized module in fbnet and encountered the following error:

/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [40,0,0], thread: [104,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [40,0,0], thread: [105,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [40,0,0], thread: [106,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [40,0,0], thread: [107,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "tools/train_net.py", line 186, in <module>
    main()
  File "tools/train_net.py", line 179, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 85, in train
    arguments,
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 67, in do_train
    loss_dict = model(images, targets)
  File "/data/ryancheng/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ryancheng/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/_initialize.py", line 194, in new_fwd
    **applier(kwargs, input_caster))
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/data/ryancheng/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 159, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 175, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "/data/ryancheng/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 140, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 115, in forward_for_single_feature_map
    boxlist = remove_small_boxes(boxlist, self.min_size)
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 46, in remove_small_boxes
    (ws >= min_size) & (hs >= min_size)
RuntimeError: copy_if failed to synchronize: device-side assert triggered

Any idea on what this error is caused by? Thanks in advance!
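The four assertion lines above come from an indexing kernel (`IndexKernel.cu`), not from the width/height comparison in `remove_small_boxes`. CUDA kernels run asynchronously, so an assert raised in one kernel is usually reported at the next operation that synchronizes with the GPU. Here that is the `copy_if` step, likely the `nonzero()` call applied to that mask, which means the Python traceback can point at the wrong line. A common way to locate the real failing line is to set the `CUDA_LAUNCH_BLOCKING=1` environment variable so kernel launches become synchronous. The minimal sketch below re-runs a script in a fresh interpreter with that variable set. The helper names are hypothetical, and setting the variable directly in the shell before `python tools/train_net.py ...` is equivalent.

```python
import os
import subprocess
import sys


def blocking_cuda_env(base_env=None):
    """Return a copy of base_env (default: os.environ) with CUDA_LAUNCH_BLOCKING=1.

    With this variable set before CUDA is initialized, kernel launches are
    synchronous, so a device-side assert is reported at the Python line that
    launched the failing kernel instead of at a later synchronization point.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_with_blocking_launches(script_args):
    """Run `python <script_args...>` in a child process with blocking launches.

    The variable has to be in place before torch touches the GPU, which is
    why this starts a new interpreter instead of changing the current one.
    """
    return subprocess.run([sys.executable, *script_args], env=blocking_cuda_env())
```

For example, `rerun_with_blocking_launches(["tools/train_net.py", ...])` with the arguments you normally pass. Expect training to be slower in this mode, so it is only a debugging setting. Running the same batch on the CPU is another option: an out-of-bounds index there raises an ordinary Python exception naming the offending index.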

Issue Analytics

Created 4 years ago
Comments: 7

Top GitHub Comments

6 reactions

jiushishuai88 commented, Oct 9, 2019

I encountered this issue when I set the wrong number of classes. It was solved by correcting the output num_class.
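Since the fix in this comment was correcting the class count, one cheap check is to verify on the CPU, before anything reaches the GPU, that every ground-truth label falls inside the range the classifier outputs. This is a minimal sketch, not part of maskrcnn-benchmark: the function names are hypothetical, and it assumes class 0 is background and that `num_classes` already counts it (so N foreground classes means valid labels 0..N and `num_classes == N + 1`).

```python
def find_out_of_range_labels(labels, num_classes):
    """Return (position, label) pairs whose label lies outside [0, num_classes).

    Assumes class 0 is background and num_classes already includes it.
    `labels` can be any iterable of ints (e.g. a tensor converted with .tolist()).
    """
    return [
        (i, int(label))
        for i, label in enumerate(labels)
        if not 0 <= int(label) < num_classes
    ]


def check_labels(labels, num_classes):
    """Raise a readable error on the CPU instead of a device-side assert later."""
    bad = find_out_of_range_labels(labels, num_classes)
    if bad:
        # Show at most the first 10 offenders to keep the message short.
        raise ValueError(
            f"{len(bad)} label(s) outside [0, {num_classes}): {bad[:10]}"
        )
```

For instance, the labels stored on each target could be passed in after converting them to a Python list; how your dataset stores them is an assumption to check against your own code.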

5 reactions

Jayis commented, May 30, 2019

Same error.

If the problem happens during training, a smaller learning rate helps. But I've also encountered this error once while testing, and at test time it can't be solved by using a smaller learning rate, right?

I've tried to debug it, but I can't even access the "boxlist": a RuntimeError is raised when I try to print it.

I'd really like to know whether anybody has a solution other than just "using a smaller learning rate"…
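The learning-rate observation is consistent with, though not confirmed by, training diverging so that objectness scores or box coordinates become NaN or infinite before RPN post-processing. The failure to print "boxlist" has a separate explanation: once a device-side assert fires, the CUDA context is left in an error state and every later CUDA call fails too, including copying the tensor for printing. Values therefore have to be inspected before the failing step, or with the data on the CPU. The minimal sketch below shows a finiteness check on box coordinates (for example after `.tolist()`), plus a plain-Python version of the min-size filter from the traceback that shows NaN boxes are silently dropped there rather than reported. The function names and the `to_remove` parameter are hypothetical.

```python
import math


def non_finite_boxes(boxes):
    """Indices of boxes (x1, y1, x2, y2) that contain a NaN or infinite coordinate."""
    return [
        i for i, box in enumerate(boxes)
        if not all(math.isfinite(c) for c in box)
    ]


def min_size_mask(boxes, min_size, to_remove=1.0):
    """Plain-Python version of the `(ws >= min_size) & (hs >= min_size)` filter.

    Width and height are taken as x2 - x1 + to_remove and y2 - y1 + to_remove;
    the +1 default is an assumption about maskrcnn-benchmark's box convention.
    Any comparison involving NaN is False, so a NaN box is dropped here
    without an error, while an infinite coordinate can still pass.
    """
    mask = []
    for x1, y1, x2, y2 in boxes:
        w = x2 - x1 + to_remove
        h = y2 - y1 + to_remove
        mask.append(w >= min_size and h >= min_size)
    return mask
```

Because the filter itself tolerates bad values, a non-empty result from `non_finite_boxes` points upstream, to the regression outputs or the weights, rather than to `remove_small_boxes`.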

Top Results From Across the Web

PyTorch: copy_if failed to synchronize: device-side assert ...

Sometimes when we run code using CUDA, it gives an error message containing "device-side assert triggered", which hides the real error message.

RuntimeError: copy_if failed to synchronize: device ... - GitHub

229 is an illegal memory access was encountered but what I met is device-side assert triggered. I have changed the NUM_CLASSES as...

RuntimeError: copy_if failed to synchronize ... - PyTorch Forums

I'm getting the following errors with my code. It is an adapted version of the PyTorch DQN example.

CUDA Error: Device-Side Assert Triggered: Solved | Built In

A CUDA Error: Device-Side Assert Triggered can either be caused by an inconsistency between the number of labels and output units or an ...

Tpetra: Run-time error in idot unit test, in CUDA build only

I suspect the issue is that KokkosBlas::dot uses its X vector input to determine the execution space, but a raw pointer result argument...
