RuntimeError: copy_if failed to synchronize: an illegal memory access was encountered

When I am training, the model works fine. But when I start testing, this error occurs. Why?

epoch: 1 loss: 3.392380952835083 train_batch_acc: 0.0625
epoch: 1 loss: 2.941084146499634 train_batch_acc: 0.15625
epoch: 1 loss: 2.138481616973877 train_batch_acc: 0.34375
epoch: 1 loss: 1.3853774070739746 train_batch_acc: 0.59375
epoch: 1 loss: 1.3555893898010254 train_batch_acc: 0.65625
epoch: 1 loss: 0.9268301725387573 train_batch_acc: 0.75
epoch: 1 loss: 0.8250795006752014 train_batch_acc: 0.6875
train acc is 0.436500
epoch: 1 test_batch_acc: 0.03125
Traceback (most recent call last):
  File "train_3d_graph_gait.py", line 174, in <module>
    test_acc = test(test_loader)
  File "train_3d_graph_gait.py", line 134, in test
    end_point = model(data)
  File "/home/anaconda/envs/graph-wyx/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "train_3d_graph_gait.py", line 82, in forward
    x = max_pool_x(cluster, data.x, data.batch, size=1)
  File "/home/anaconda/envs/graph-wyx/lib/python3.6/site-packages/torch_geometric/nn/pool/max_pool.py", line 33, in max_pool_x
    return _max_pool_x(cluster, x, (batch.max().item() + 1) * size)
  File "/home/anaconda/envs/graph-wyx/lib/python3.6/site-packages/torch_geometric/nn/pool/max_pool.py", line 9, in _max_pool_x
    return scatter_('max', x, cluster, dim=0, dim_size=size)
  File "/home/anaconda/envs/graph-wyx/lib/python3.6/site-packages/torch_geometric/utils/scatter.py", line 39, in scatter_
    out[out == fill_value] = 0
RuntimeError: copy_if failed to synchronize: an illegal memory access was encountered

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 17 (4 by maintainers)

Top GitHub Comments

2 reactions
rusty1s commented, May 24, 2020

Seems like not all data is on the same device. Can you check that?
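
One way to run that check is to group every parameter and input tensor by its device and fail if more than one device shows up. The sketch below is a minimal illustration, not part of PyTorch or PyTorch Geometric: `group_by_device` and `assert_single_device` are hypothetical helper names, and they only assume that each object exposes a `.device` attribute, as PyTorch tensors and parameters do.

```python
from collections import defaultdict


def group_by_device(named_tensors):
    """Map str(device) -> list of names for objects exposing `.device`."""
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        groups[str(tensor.device)].append(name)
    return dict(groups)


def assert_single_device(named_tensors):
    """Raise if the given tensors are spread over more than one device."""
    groups = group_by_device(named_tensors)
    if len(groups) > 1:
        raise RuntimeError(f"tensors live on several devices: {groups}")
    # Return the single device name, or None if nothing was passed in.
    return next(iter(groups), None)
```

In the test loop from the traceback, this could be called right before `model(data)`, for example with `list(model.named_parameters()) + [("data.x", data.x), ("data.batch", data.batch)]`. The attribute names `data.x` and `data.batch` are taken from the traceback; everything else here is an assumption.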

1 reaction
netphantom commented, Oct 22, 2020

My fault. I forgot to move the model to CUDA.
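
For reference, a minimal sketch of that fix: pick one device and move both the model and every batch to it before the forward pass. The `torch.nn.Linear` model and the random batch are hypothetical stand-ins for the model and test data in `train_3d_graph_gait.py`, which are not shown in the issue.

```python
import torch

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the real model; .to() moves its parameters and buffers.
model = torch.nn.Linear(4, 2).to(device)
model.eval()

# Hypothetical stand-in for one test batch, moved to the same device as the model.
batch = torch.randn(8, 4).to(device)

# No gradients are needed while testing.
with torch.no_grad():
    end_point = model(batch)

# Device of the model's parameters, for comparison with the output's device.
param_device = next(model.parameters()).device
```

Note that `Module.to` modifies the module in place and returns it, while `Tensor.to` returns a new tensor, so the result of moving a batch has to be assigned back to a variable.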

Top Results From Across the Web

RuntimeError: copy_if failed to synchronize: an illegal memory ...
I add CUDA_LAUNCH_BLOCKING=1 before running the script, some new message appears. RuntimeError: cuda runtime error (77) : an illegal memory ...

python - PyTorch - RuntimeError: transform: failed to synchronize
RuntimeError: transform: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered.

Error Means GPU Out of Memory? - Google Groups
I'm using K40, but the error also pops with another GPU. Error when tring to find the memory information on the GPU: an...

RuntimeError: CUDA error: an illegal memory access was ...
When I run the code, I got random CUDA errors. RuntimeError: CUDA error: an illegal memory access was encountered. This is one of...

CUDA error: an illegal memory access was encountered - Part ...
When I am running following code on Gradient, it is working fine but it is throwing me error after running for few seconds...
