RuntimeError: copy_if failed to synchronize: an illegal memory access was encountered
When I am training, the model works fine, but when I start testing, this error occurs. Why?
epoch: 1 loss: 3.392380952835083 train_batch_acc: 0.0625
epoch: 1 loss: 2.941084146499634 train_batch_acc: 0.15625
epoch: 1 loss: 2.138481616973877 train_batch_acc: 0.34375
epoch: 1 loss: 1.3853774070739746 train_batch_acc: 0.59375
epoch: 1 loss: 1.3555893898010254 train_batch_acc: 0.65625
epoch: 1 loss: 0.9268301725387573 train_batch_acc: 0.75
epoch: 1 loss: 0.8250795006752014 train_batch_acc: 0.6875
train acc is 0.436500
epoch: 1 test_batch_acc: 0.03125
Traceback (most recent call last):
  File "train_3d_graph_gait.py", line 174, in <module>
    test_acc = test(test_loader)
  File "train_3d_graph_gait.py", line 134, in test
    end_point = model(data)
  File "/home/anaconda/envs/graph-wyx/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "train_3d_graph_gait.py", line 82, in forward
    x = max_pool_x(cluster, data.x, data.batch, size=1)
  File "/home/anaconda/envs/graph-wyx/lib/python3.6/site-packages/torch_geometric/nn/pool/max_pool.py", line 33, in max_pool_x
    return _max_pool_x(cluster, x, (batch.max().item() + 1) * size)
  File "/home/anaconda/envs/graph-wyx/lib/python3.6/site-packages/torch_geometric/nn/pool/max_pool.py", line 9, in _max_pool_x
    return scatter_('max', x, cluster, dim=0, dim_size=size)
  File "/home/anaconda/envs/graph-wyx/lib/python3.6/site-packages/torch_geometric/utils/scatter.py", line 39, in scatter_
    out[out == fill_value] = 0
RuntimeError: copy_if failed to synchronize: an illegal memory access was encountered
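The failing call is `max_pool_x(cluster, data.x, data.batch, size=1)`. Per the traceback, it sizes its output as `(batch.max().item() + 1) * size` and then scatters the rows of `x` into that many slots with a `max` reduction, keyed by `cluster`. The plain-Python sketch below only illustrates that computation; it is not PyTorch Geometric's implementation, and the names `scatter_max_rows`, `max_pool_x_sketch` and the sentinel `FILL_VALUE` are hypothetical. It resets empty slots to 0 the way the `out[out == fill_value] = 0` line does, and it checks index bounds explicitly. A GPU scatter kernel may not perform that check, so a bad index or a tensor on the wrong device can surface as an illegal memory access instead of a readable Python error.

```python
from typing import List, Sequence

# Assumed sentinel meaning "no row scattered into this slot yet" (illustrative value).
FILL_VALUE = -1e38


def scatter_max_rows(
    x: Sequence[Sequence[float]],
    cluster: Sequence[int],
    dim_size: int,
) -> List[List[float]]:
    """out[c] = element-wise max over all rows x[i] with cluster[i] == c."""
    if len(x) != len(cluster):
        raise ValueError("x and cluster must have the same number of rows")
    num_features = len(x[0]) if x else 0
    out = [[FILL_VALUE] * num_features for _ in range(dim_size)]
    for row, c in zip(x, cluster):
        # Explicit bounds check; an unchecked GPU kernel would index out of range here.
        if not 0 <= c < dim_size:
            raise IndexError(f"cluster index {c} is outside [0, {dim_size})")
        out[c] = [max(a, b) for a, b in zip(out[c], row)]
    # Slots that received no rows still hold the sentinel; reset them to 0,
    # analogous to `out[out == fill_value] = 0` in the traceback.
    return [[0.0 if v == FILL_VALUE else v for v in slot] for slot in out]


def max_pool_x_sketch(
    cluster: Sequence[int],
    x: Sequence[Sequence[float]],
    batch: Sequence[int],
    size: int,
) -> List[List[float]]:
    """Hypothetical stand-in for max_pool_x(cluster, x, batch, size)."""
    # Same output size as in the traceback: (batch.max().item() + 1) * size.
    dim_size = (max(batch) + 1) * size
    return scatter_max_rows(x, cluster, dim_size)


print(max_pool_x_sketch([0, 0, 1], [[1.0, 5.0], [3.0, 2.0], [0.5, 0.5]], [0, 0, 1], size=1))
# [[3.0, 5.0], [0.5, 0.5]]
```

With `size=1`, every `cluster` value must lie in `[0, number_of_graphs_in_batch)`; the explicit `IndexError` makes that contract visible on the CPU.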
Issue details: created 3 years ago; 17 comments (4 by maintainers).
Top GitHub Comments
Seems like not all data is on the same device. Can you check that?
My fault. I forgot to move the model to CUDA.
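The maintainer's suggestion can be checked mechanically before the forward pass. The sketch below is a hypothetical helper (`find_device_mismatches` is not part of PyTorch or PyTorch Geometric); it reads only the `.device` attribute, so it accepts `torch.Tensor`, `nn.Parameter`, or any object with that attribute, and groups names by device.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def find_device_mismatches(
    named_tensors: Iterable[Tuple[str, Any]],
) -> Dict[str, List[str]]:
    """Group tensor names by device; return {} if they all share one device.

    Hypothetical helper: it only reads `.device`, so it works with
    torch.Tensor, nn.Parameter, or any object exposing that attribute.
    """
    by_device: Dict[str, List[str]] = defaultdict(list)
    for name, tensor in named_tensors:
        by_device[str(tensor.device)].append(name)
    # Zero or one distinct device means there is nothing to report.
    return dict(by_device) if len(by_device) > 1 else {}
```

In the script from this issue, the check could run just before `end_point = model(data)`, for example `find_device_mismatches(itertools.chain(model.named_parameters(), [("data.x", data.x), ("data.batch", data.batch)]))`. A non-empty result points to the situation described above; moving everything to one device with `model.to(device)` and `data.to(device)` resolves it.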