RuntimeError: CUDA error: an illegal memory access was encountered

  File "examples/sem_seg_sparse/train.py", line 142, in <module>
    main()
  File "examples/sem_seg_sparse/train.py", line 61, in main
    train(model, train_loader, optimizer, scheduler, criterion, opt)
  File "examples/sem_seg_sparse/train.py", line 79, in train
    out = model(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/deep_gcns_torch/examples/sem_seg_sparse/architecture.py", line 69, in forward
    feats.append(self.gunet(feats[-1],edge_index=edge_index ,batch=batch))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch_geometric/nn/models/graph_unet.py", line 83, in forward
    x.size(0))
  File "/usr/local/lib/python3.6/dist-packages/torch_geometric/nn/models/graph_unet.py", line 120, in augment_adj
    num_nodes)
  File "/usr/local/lib/python3.6/dist-packages/torch_sparse/spspmm.py", line 30, in spspmm
    C = matmul(A, B)
  File "/usr/local/lib/python3.6/dist-packages/torch_sparse/matmul.py", line 107, in matmul
    return spspmm(src, other, reduce)
  File "/usr/local/lib/python3.6/dist-packages/torch_sparse/matmul.py", line 95, in spspmm
    return spspmm_sum(src, other)
  File "/usr/local/lib/python3.6/dist-packages/torch_sparse/matmul.py", line 83, in spspmm_sum
    rowptrA, colA, valueA, rowptrB, colB, valueB, K)
RuntimeError: CUDA error: an illegal memory access was encountered (launch_kernel at /pytorch/aten/src/ATen/native/cuda/Loops.cuh:103)

Hi, I'm integrating GraphU-Net and other models on Google Colab, but I'm running into the bug above. Could you help me? Thanks.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 20 (8 by maintainers)

Top GitHub Comments

rusty1s commented, Aug 17, 2020 (2 reactions)

The error seems to stem from the fact that cuSPARSE cannot handle duplicated edges in edge_index: with duplicates present, it fails to compute the correct number of output edges. In your case, it may well be that your graph already contains some self-loop edges, which should be removed before calling add_self_loops. I think your fix for augment_adj is correct, and I have added it to the GraphUNet model in PyG.
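To make the duplicate-edge explanation concrete, here is a minimal pure-Python sketch. The helper names (`remove_self_loops_naive`, `add_self_loops_naive`, `duplicated_edges`) are hypothetical and only imitate the idea on a list of (source, target) pairs; they are not the PyG functions. The sketch assumes that self-loops are appended for every node without checking whether one already exists.

```python
from collections import Counter


def remove_self_loops_naive(edges):
    """Drop every (i, i) edge from a list of (source, target) pairs."""
    return [(s, d) for s, d in edges if s != d]


def add_self_loops_naive(edges, num_nodes):
    """Append one (i, i) edge per node, without checking for existing ones."""
    return list(edges) + [(i, i) for i in range(num_nodes)]


def duplicated_edges(edges):
    """Return the sorted list of edges that occur more than once."""
    counts = Counter(edges)
    return sorted(e for e, c in counts.items() if c > 1)


# Toy graph on 3 nodes that already contains the self-loop (1, 1).
edges = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)]

# Adding self-loops directly duplicates the existing (1, 1) edge.
unsafe = add_self_loops_naive(edges, num_nodes=3)

# Removing self-loops first leaves exactly one self-loop per node.
safe = add_self_loops_naive(remove_self_loops_naive(edges), num_nodes=3)

print(duplicated_edges(unsafe))  # [(1, 1)]
print(duplicated_edges(safe))    # []
```

In this toy case the only duplicate is the self-loop that was present before the loops were added, which matches the situation described in the comment above.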

Flawless1202 commented, Aug 17, 2020 (2 reactions)

@vthost @rusty1s Hi, I also hit this error when training Graph-UNet on my own dataset. It occurred randomly on the GPU but never on the CPU. I changed the augment_adj function to call remove_self_loops first, and that solved the problem, but I don't know why.

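from torch_geometric.utils import add_self_loops, remove_self_loops, sort_edge_index
from torch_sparse import spspmm

def augment_adj(self, edge_index, edge_weight, num_nodes):
    # Drop existing self-loops first so add_self_loops cannot create duplicate edges.
    edge_index, edge_weight = remove_self_loops(edge_index, edge_weight)
    edge_index, edge_weight = add_self_loops(edge_index, edge_weight, num_nodes=num_nodes)
    edge_index, edge_weight = sort_edge_index(edge_index, edge_weight, num_nodes)
    # Multiply the adjacency matrix by itself with sparse-sparse matrix multiplication.
    edge_index, edge_weight = spspmm(edge_index, edge_weight, edge_index, edge_weight, num_nodes, num_nodes, num_nodes)
    edge_index, edge_weight = remove_self_loops(edge_index, edge_weight)
    return edge_index, edge_weight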
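
Before applying the fix, it may help to confirm that a dataset really contains self-loops or duplicated edges. The following is a rough diagnostic sketch using only PyTorch, run on the CPU. The function name `edge_index_report` is hypothetical and not part of PyG, and the sketch assumes `edge_index` is an integer tensor of shape [2, E].

```python
import torch


def edge_index_report(edge_index: torch.Tensor) -> dict:
    """Count self-loops and duplicated columns in a [2, E] edge_index tensor."""
    row, col = edge_index
    num_self_loops = int((row == col).sum())
    # torch.unique with dim=1 collapses identical (source, target) columns.
    num_unique = torch.unique(edge_index, dim=1).size(1)
    return {
        "num_edges": edge_index.size(1),
        "num_self_loops": num_self_loops,
        "num_duplicates": edge_index.size(1) - num_unique,
    }


# Toy example: one self-loop (1, 1) and one repeated edge (0, 1).
edge_index = torch.tensor([[0, 1, 1, 1, 2, 0],
                           [1, 0, 1, 2, 1, 1]])
report = edge_index_report(edge_index)
print(report)
```

A non-zero `num_self_loops` or `num_duplicates` on the input graph would be consistent with the explanation given in the thread, though it does not by itself prove that this is the cause of the CUDA error.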
