RuntimeError: CUDA error: an illegal memory access was encountered

  File "examples/sem_seg_sparse/train.py", line 142, in <module>
    main()
  File "examples/sem_seg_sparse/train.py", line 61, in main
    train(model, train_loader, optimizer, scheduler, criterion, opt)
  File "examples/sem_seg_sparse/train.py", line 79, in train
    out = model(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/deep_gcns_torch/examples/sem_seg_sparse/architecture.py", line 69, in forward
    feats.append(self.gunet(feats[-1],edge_index=edge_index ,batch=batch))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch_geometric/nn/models/graph_unet.py", line 83, in forward
    x.size(0))
  File "/usr/local/lib/python3.6/dist-packages/torch_geometric/nn/models/graph_unet.py", line 120, in augment_adj
    num_nodes)
  File "/usr/local/lib/python3.6/dist-packages/torch_sparse/spspmm.py", line 30, in spspmm
    C = matmul(A, B)
  File "/usr/local/lib/python3.6/dist-packages/torch_sparse/matmul.py", line 107, in matmul
    return spspmm(src, other, reduce)
  File "/usr/local/lib/python3.6/dist-packages/torch_sparse/matmul.py", line 95, in spspmm
    return spspmm_sum(src, other)
  File "/usr/local/lib/python3.6/dist-packages/torch_sparse/matmul.py", line 83, in spspmm_sum
    rowptrA, colA, valueA, rowptrB, colB, valueB, K)
RuntimeError: CUDA error: an illegal memory access was encountered (launch_kernel at /pytorch/aten/src/ATen/native/cuda/Loops.cuh:103)

Hi, I’m integrating the Graph U-Net and other models on Google Colab, but I ran into the bug above. Could you help me? Thanks.

Issue Analytics

Created 3 years ago
Comments: 20 (8 by maintainers)

Top GitHub Comments

rusty1s commented, Aug 17, 2020

The error seems to stem from the fact that cuSPARSE cannot handle duplicated edges in edge_index: with duplicates, it fails to compute the correct number of output edges. In your case, it may well be that your graph has some initial self-loop edges, which should be removed before calling add_self_loops. I think your fix for augment_adj is correct, and I added it to the GraphUNet model in PyG.
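To make the duplicate-edge explanation concrete, here is a minimal, library-free sketch. The helpers below are plain-Python stand-ins that only borrow the names remove_self_loops and add_self_loops; they are not the PyG implementations, and the assumption that adding self-loops appends one loop per node without checking for existing ones is an assumption of this illustration. The edge list is made up.

```python
from collections import Counter


def remove_self_loops(edges):
    """Stand-in helper: drop every (i, i) edge."""
    return [(src, dst) for src, dst in edges if src != dst]


def add_self_loops(edges, num_nodes):
    """Stand-in helper: append one (i, i) edge per node, without checking
    whether that self-loop is already present (assumed behaviour)."""
    return edges + [(i, i) for i in range(num_nodes)]


def duplicated_edges(edges):
    """Return the edges that occur more than once, sorted."""
    counts = Counter(edges)
    return sorted(edge for edge, n in counts.items() if n > 1)


# Made-up path graph 0 - 1 - 2 that already carries a self-loop on node 1.
edges = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)]

naive = add_self_loops(edges, num_nodes=3)
fixed = add_self_loops(remove_self_loops(edges), num_nodes=3)

print(duplicated_edges(naive))  # [(1, 1)]
print(duplicated_edges(fixed))  # []
```

In the naive version the self-loop on node 1 appears twice, which is the kind of duplicated entry the comment above says the sparse-sparse product cannot handle. Removing self-loops first leaves exactly one loop per node.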

Flawless1202 commented, Aug 17, 2020

@vthost @rusty1s Hi, I also hit this error when using my own dataset to train Graph U-Net. It occurred randomly on the GPU but never on the CPU. I changed the augment_adj function to call remove_self_loops first, and that solved the problem, but I don’t know why.

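from torch_geometric.utils import add_self_loops, remove_self_loops, sort_edge_index
from torch_sparse import spspmm

def augment_adj(self, edge_index, edge_weight, num_nodes):
    # Drop any pre-existing self-loops so add_self_loops cannot create duplicates.
    edge_index, edge_weight = remove_self_loops(edge_index, edge_weight)
    edge_index, edge_weight = add_self_loops(edge_index, edge_weight, num_nodes=num_nodes)
    edge_index, edge_weight = sort_edge_index(edge_index, edge_weight, num_nodes)
    # Multiply the adjacency matrix (with self-loops) by itself.
    edge_index, edge_weight = spspmm(edge_index, edge_weight, edge_index, edge_weight, num_nodes, num_nodes, num_nodes)
    edge_index, edge_weight = remove_self_loops(edge_index, edge_weight)
    return edge_index, edge_weight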
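For readers who want to see what this function computes, the sketch below mirrors the same steps in plain Python: build A + I without duplicate diagonal entries, multiply it by itself, then drop the diagonal. It is an illustration only, not the PyG or torch_sparse code. The dict-of-dicts layout, the self-loop weight of 1.0, and the example graph are assumptions of this sketch.

```python
from collections import defaultdict


def augment_adj_sketch(edges, num_nodes):
    """Return the weighted edges of (A + I) @ (A + I) with the diagonal removed.

    `edges` maps (src, dst) pairs to weights.
    """
    # Build A + I row by row. Skipping existing self-loops and then assigning
    # the diagonal once guarantees exactly one entry per (i, i).
    rows = defaultdict(dict)
    for (src, dst), weight in edges.items():
        if src != dst:
            rows[src][dst] = weight
    for i in range(num_nodes):
        rows[i][i] = 1.0  # assumed self-loop weight

    # Sparse product: C[i][k] = sum over j of M[i][j] * M[j][k].
    out = {}
    for i in range(num_nodes):
        for j, w_ij in rows[i].items():
            for k, w_jk in rows[j].items():
                if i != k:  # final remove_self_loops step
                    out[(i, k)] = out.get((i, k), 0.0) + w_ij * w_jk
    return out


# Made-up path graph 0 - 1 - 2 with a stray self-loop on node 1.
edges = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0, (1, 1): 1.0}
augmented = augment_adj_sketch(edges, num_nodes=3)
print(sorted(augmented))
```

The result connects nodes 0 and 2, which are two hops apart in the input. The stray self-loop on node 1 has no effect because it is discarded before the diagonal is set.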