
copy_if failed to synchronize: device-side assert triggered


❓ Questions and Help

I was training a customized module in fbnet and encountered the following error:

/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [40,0,0], thread: [104,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [40,0,0], thread: [105,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [40,0,0], thread: [106,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [40,0,0], thread: [107,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "tools/train_net.py", line 186, in <module>
    main()
  File "tools/train_net.py", line 179, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 85, in train
    arguments,
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 67, in do_train
    loss_dict = model(images, targets)
  File "/data/ryancheng/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ryancheng/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/_initialize.py", line 194, in new_fwd
    **applier(kwargs, input_caster))
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/data/ryancheng/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 159, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 175, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "/data/ryancheng/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 140, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 115, in forward_for_single_feature_map
    boxlist = remove_small_boxes(boxlist, self.min_size)
  File "/data/ryancheng/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 46, in remove_small_boxes
    (ws >= min_size) & (hs >= min_size)
RuntimeError: copy_if failed to synchronize: device-side assert triggered

Any idea on what this error is caused by? Thanks in advance!
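The assertion message comes from PyTorch's CUDA indexing kernel (IndexKernel.cu). It fires when a value in an index tensor lies outside [-size, size) for the dimension being indexed. CUDA kernels run asynchronously with respect to the host, so the assert is only reported at the next operation that synchronizes. Here that appears to be the copy_if reached from remove_small_boxes, which is why the traceback may not point at the indexing call that actually failed. Below is a minimal sketch of the same bounds check in plain Python, for validating indices on the host. `find_out_of_bounds` is a hypothetical helper, not part of PyTorch or maskrcnn-benchmark.

```python
from typing import Iterable, List


def find_out_of_bounds(indices: Iterable[int], size: int) -> List[int]:
    """Return the indices that would fail the kernel's check
    `index >= -size && index < size` for a dimension of length `size`.

    Hypothetical helper for illustration. Negative indices count from the
    end, so the valid range is [-size, size).
    """
    return [i for i in indices if not (-size <= i < size)]


# A dimension of length 4 accepts -4..3, so 4 and -5 are rejected.
print(find_out_of_bounds([0, 3, -4, 4, -5], size=4))  # [4, -5]
```

For a tensor, the check can be run on host values just before the suspect indexing call, e.g. `find_out_of_bounds(idx.cpu().tolist(), size)`.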

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 7

Top GitHub Comments

6 reactions
jiushishuai88 commented, Oct 9, 2019

I encountered this issue when I set the wrong number of classes. It was solved by correcting the output num_class.
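This fix is consistent with the assertion. If a ground-truth label is greater than or equal to the number of outputs of the classification head, indexing the per-class predictions with that label goes out of bounds. Below is a minimal sketch of a sanity check to run over a dataset's labels before training. `check_labels` is hypothetical. It assumes 0-based integer class ids where `num_classes` already counts the background class. In maskrcnn-benchmark, NUM_CLASSES is commonly set this way, e.g. 81 for COCO's 80 categories plus background, so check your own config.

```python
from typing import Iterable


def check_labels(labels: Iterable[int], num_classes: int) -> None:
    """Raise ValueError if any label falls outside [0, num_classes).

    Hypothetical helper. Assumes 0-based integer class ids and that
    num_classes already includes the background class.
    """
    bad = sorted({label for label in labels if not 0 <= label < num_classes})
    if bad:
        raise ValueError(
            f"labels {bad} are out of range for num_classes={num_classes}; "
            "the classification head needs at least max(label) + 1 outputs"
        )


# 80 object classes + background -> num_classes = 81.
check_labels([1, 17, 80], num_classes=81)  # passes silently
```

Calling `check_labels([1, 81], num_classes=81)` would raise, because label 81 needs an 82nd output.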

5 reactions
Jayis commented, May 30, 2019

Same error.

If the problem happens during training, a smaller learning rate helps, but I’ve also encountered this error once while testing… During testing, it’s not possible to solve it with a smaller learning rate, right?
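The thread does not confirm why a smaller learning rate helps. One plausible reading is that a too-large step makes the losses diverge to NaN or inf, and non-finite values then propagate into the boxes and scores that later indexing operations depend on. Below is a minimal sketch of a guard that stops training as soon as any loss becomes non-finite, so divergence is caught before it turns into an opaque CUDA assert. `assert_finite_losses` is hypothetical. It expects losses already converted to Python floats (e.g. via `.item()`, which forces a host sync each iteration). The loss names in the example are illustrative.

```python
import math
from typing import Mapping


def assert_finite_losses(losses: Mapping[str, float], iteration: int) -> None:
    """Raise FloatingPointError as soon as any loss value is NaN or +/-inf.

    Hypothetical helper; `losses` maps loss names to Python floats,
    e.g. {k: v.item() for k, v in loss_dict.items()} in a PyTorch loop.
    """
    bad = {name: value for name, value in losses.items() if not math.isfinite(value)}
    if bad:
        raise FloatingPointError(
            f"non-finite loss at iteration {iteration}: {bad}; "
            "consider lowering the learning rate"
        )


# Finite losses pass silently.
assert_finite_losses({"loss_objectness": 0.69, "loss_rpn_box_reg": 0.12}, iteration=10)
```

This only localizes divergence during training. At test time there is no loss, so it does not address the testing case raised above.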

I’ve tried to debug, but I can’t even access the “boxlist”: a runtime error is raised when I try to print it.
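Not being able to print the boxlist is consistent with how device-side asserts behave. Once a kernel asserts, the CUDA context is left in an error state, and every later CUDA call in the same process fails as well, including the device-to-host copy needed to print a tensor. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous. The Python traceback then stops at the operation whose kernel actually failed, not at a later synchronization point. It also slows execution, so it is only for debugging. In a shell this is just a prefix on the training command; the sketch below does the same from Python. `run_with_blocking_launches` is a hypothetical helper.

```python
import os
import subprocess
import sys
from typing import Sequence


def run_with_blocking_launches(argv: Sequence[str]) -> int:
    """Run `python <argv...>` in a child process with CUDA_LAUNCH_BLOCKING=1.

    Hypothetical debugging helper. The variable must already be in the
    environment when CUDA initializes, which is guaranteed for a fresh
    child process. Returns the child's exit code.
    """
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    completed = subprocess.run([sys.executable, *argv], env=env)
    return completed.returncode
```

Usage, with placeholder arguments: `run_with_blocking_launches(["tools/train_net.py", "--config-file", "path/to/config.yaml"])`.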

I really want to know if anybody has a solution other than just “using a smaller learning rate”…


Top Results From Across the Web

PyTorch: copy_if failed to synchronize: device-side assert ...
Sometimes when we run code using cuda, it gives error message having device-side assert triggered which hides the real error message.

RuntimeError: copy_if failed to synchronize: device ... - GitHub
229 is an illegal memory access was encountered but what I met is device-side assert triggered . I have changed the NUM_CLASSES as...

RuntimeError: copy_if failed to synchronize ... - PyTorch Forums
I'm getting the following errors with my code. It is an adapted version of the PyTorch DQN example.

CUDA Error: Device-Side Assert Triggered: Solved | Built In
A CUDA Error: Device-Side Assert Triggered can either be caused by an inconsistency between the number of labels and output units or an ......

Tpetra: Run-time error in idot unit test, in CUDA build only
I suspect the issue is that KokkosBlas::dot uses its X vector input to determine the execution space, but a raw pointer result argument...
