spspmm raises error in cuda but works well in cpu


🐛 Bug

To Reproduce

The network is similar to Graph U-Net, but it has only downsampling blocks. The code is:


import torch
import torch.nn as nn
import torch_geometric.nn as gnn
from torch_geometric.nn import GCNConv, TopKPooling
from torch_geometric.utils import add_self_loops, sort_edge_index, remove_self_loops
from torch_sparse import spspmm



class GCNConvBnReLu(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = GCNConv(in_channels, out_channels, bias=False, improved=True)
        self.bn = gnn.BatchNorm(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x, edge_index, edge_weight=None):
        x = self.conv(x, edge_index, edge_weight)
        x = self.bn(x)
        x = self.relu(x)
        return x


class MyNet(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, pool_ratios=0.5, depth=3):
        super().__init__()

        channels = in_channels
        self.depth = depth
        self.down_convs = nn.ModuleList()
        self.pools = nn.ModuleList()
        self.down_convs.append(GCNConvBnReLu(channels, hidden_channels))
        self.fc = nn.Linear(hidden_channels, out_channels)

        for i in range(depth):
            self.down_convs.append(GCNConvBnReLu(hidden_channels, hidden_channels))
            self.pools.append(TopKPooling(hidden_channels, ratio=pool_ratios))


    def forward(self, x, edge_index, batch=None):
        depth = self.depth
        edge_weight = x.new_ones(edge_index.size(1))
        x = self.down_convs[0](x, edge_index, edge_weight)

        for i in range(1, depth + 1):
            print(edge_index.shape)
            print(edge_index.min(), edge_index.max(), x.size(0))

            edge_index, edge_weight = self.augment_adj(edge_index, edge_weight, x.size(0))

            print(edge_index.shape)
            print(edge_index.min(), edge_index.max(), x.size(0))
            print('----------')

            x, edge_index, edge_weight, batch, _, _ = self.pools[i-1](x, edge_index, edge_weight, batch)
            x = self.down_convs[i](x, edge_index, edge_weight)

        x = gnn.global_mean_pool(x, batch)
        out = self.fc(x)
        return out

    def augment_adj(self, edge_index, edge_weight, num_nodes):
        edge_index, edge_weight = remove_self_loops(edge_index, edge_weight)
        edge_index, edge_weight = add_self_loops(edge_index, edge_weight,
                                                 num_nodes=num_nodes)
        edge_index, edge_weight = sort_edge_index(edge_index, edge_weight,
                                                  num_nodes)
        edge_index, edge_weight = spspmm(edge_index, edge_weight, edge_index,
                                         edge_weight, num_nodes, num_nodes,
                                         num_nodes)
        edge_index, edge_weight = remove_self_loops(edge_index, edge_weight)
        return edge_index, edge_weight
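
For readers unfamiliar with augment_adj: it drops self-loops, adds one self-loop of weight 1 per node, multiplies the resulting sparse matrix by itself, and drops the diagonal again. Below is a minimal pure-Python sketch of that computation on plain lists, useful as a small reference to compare against. The name augment_adj_reference is hypothetical, and the sketch assumes duplicate edges are summed; it is not the torch_sparse implementation.

```python
from collections import defaultdict


def augment_adj_reference(edges, weights, num_nodes):
    """Reference for augment_adj: (A + I) @ (A + I) with the diagonal dropped.

    edges is a list of (row, col) pairs, weights the matching edge weights.
    Assumption: duplicate edges are summed before the product.
    """
    # Drop existing self-loops, then add one self-loop of weight 1 per node.
    adj = defaultdict(dict)  # adj[row][col] = weight
    for (r, c), w in zip(edges, weights):
        if r != c:
            adj[r][c] = adj[r].get(c, 0.0) + w
    for n in range(num_nodes):
        adj[n][n] = 1.0

    # Sparse product: out[i][j] = sum over k of adj[i][k] * adj[k][j].
    out = defaultdict(dict)
    for i, row in adj.items():
        for k, w_ik in row.items():
            for j, w_kj in adj.get(k, {}).items():
                out[i][j] = out[i].get(j, 0.0) + w_ik * w_kj

    # Drop the diagonal and return sorted coordinate lists.
    result = sorted(
        (i, j, w) for i, row in out.items() for j, w in row.items() if i != j
    )
    return [(i, j) for i, j, _ in result], [w for _, _, w in result]
```

On the undirected path 0-1-2 with unit weights, this connects nodes 0 and 2 (two hops apart), which is the point of squaring the adjacency matrix before pooling.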

Test MyNet as follows. The test data can be downloaded from Google Drive.


device = torch.device('cuda')
model = MyNet(3, 64, 4, 0.5, 3).to(device)
data1 = torch.load('success.pt').to(device)
y1 = model(data1.x, data1.edge_index)
data2 = torch.load('failed.pt').to(device)
y2 = model(data2.x, data2.edge_index)

Expected behavior

The error log is

  File "D:\Software\anaconda3\lib\site-packages\torch_sparse\spspmm.py", line 30, in spspmm
    C = matmul(A, B)
  File "D:\Software\anaconda3\lib\site-packages\torch_sparse\matmul.py", line 125, in matmul
    return spspmm(src, other, reduce)
  File "D:\Software\anaconda3\lib\site-packages\torch_sparse\matmul.py", line 102, in spspmm
    return spspmm_sum(src, other)
  File "D:\Software\anaconda3\lib\site-packages\torch_sparse\matmul.py", line 92, in spspmm_sum
    sparse_sizes=(M, K), is_sorted=True)
  File "D:\Software\anaconda3\lib\site-packages\torch_sparse\tensor.py", line 25, in __init__
    is_sorted=is_sorted)
  File "D:\Software\anaconda3\lib\site-packages\torch_sparse\storage.py", line 70, in __init__
    assert col.max().item() < sparse_sizes[1]
AssertionError

When I call model.eval() or set device='cpu', the code works.
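
The assertion at the bottom of the traceback checks that every column index is smaller than the declared number of columns. A minimal sketch of the same bounds check, written for plain Python lists so it can be run on the inputs and outputs of spspmm to see which side first goes out of range. The helper check_edge_index is hypothetical and not part of torch_sparse.

```python
def check_edge_index(rows, cols, num_nodes):
    """Raise ValueError naming the first out-of-range edge, if any.

    rows and cols are sequences of integer node indices of equal length.
    """
    for pos, (r, c) in enumerate(zip(rows, cols)):
        if not (0 <= r < num_nodes and 0 <= c < num_nodes):
            raise ValueError(
                f"edge {pos} = ({r}, {c}) is out of range for {num_nodes} nodes"
            )
    return True
```

With tensors, the rows and columns could be passed as edge_index[0].tolist() and edge_index[1].tolist(), once before and once after the spspmm call in augment_adj.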

Environment

OS: Win10
Python version: 3.7
PyTorch version: 1.8.1
PyG version: 1.7.2
CUDA/cuDNN version: 10.2 / 8.0.5
GCC version:
Any other relevant information:

Additional context

Issue Analytics

State:
Created 2 years ago
Comments: 13 (6 by maintainers)

Top GitHub Comments


rusty1s commented, Feb 14, 2022

@wrccrwx @KimKyuSik It’s really a bummer that I cannot reproduce this issue. I’m really sorry. I basically followed the instructions from the cuSPARSE documentation for implementing our CUDA routine in spspmm_cuda.cu:

// assume matrices A, B and D are ready.
int baseC, nnzC;
csrgemm2Info_t info = NULL;
size_t bufferSize;
void *buffer = NULL;
// nnzTotalDevHostPtr points to host memory
int *nnzTotalDevHostPtr = &nnzC;
double alpha = -1.0;
double beta  =  1.0;
cusparseSetPointerMode(handle, CUSPARSE_POINTER_MODE_HOST);

// step 1: create an opaque structure
cusparseCreateCsrgemm2Info(&info);

// step 2: allocate buffer for csrgemm2Nnz and csrgemm2
cusparseDcsrgemm2_bufferSizeExt(handle, m, n, k, &alpha,
    descrA, nnzA, csrRowPtrA, csrColIndA,
    descrB, nnzB, csrRowPtrB, csrColIndB,
    &beta,
    descrD, nnzD, csrRowPtrD, csrColIndD,
    info,
    &bufferSize);
cudaMalloc(&buffer, bufferSize);

// step 3: compute csrRowPtrC
cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));
cusparseXcsrgemm2Nnz(handle, m, n, k,
        descrA, nnzA, csrRowPtrA, csrColIndA,
        descrB, nnzB, csrRowPtrB, csrColIndB,
        descrD, nnzD, csrRowPtrD, csrColIndD,
        descrC, csrRowPtrC, nnzTotalDevHostPtr,
        info, buffer );
if (NULL != nnzTotalDevHostPtr){
    nnzC = *nnzTotalDevHostPtr;
}else{
    cudaMemcpy(&nnzC, csrRowPtrC+m, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&baseC, csrRowPtrC, sizeof(int), cudaMemcpyDeviceToHost);
    nnzC -= baseC;
}

// step 4: finish sparsity pattern and value of C
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnzC);
cudaMalloc((void**)&csrValC, sizeof(double)*nnzC);
// Remark: set csrValC to null if only sparsity pattern is required.
cusparseDcsrgemm2(handle, m, n, k, &alpha,
        descrA, nnzA, csrValA, csrRowPtrA, csrColIndA,
        descrB, nnzB, csrValB, csrRowPtrB, csrColIndB,
        &beta,
        descrD, nnzD, csrValD, csrRowPtrD, csrColIndD,
        descrC, csrValC, csrRowPtrC, csrColIndC,
        info, buffer);

// step 5: destroy the opaque structure
cusparseDestroyCsrgemm2Info(info);

Any chance you can debug where our routine crashes by installing torch-sparse from source?
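
The two-phase structure of the cuSPARSE routine quoted above (step 3 counts the nonzeros of C and builds its row pointer, step 4 fills column indices and values) can be sketched in pure Python. This is an illustrative reference only: the function csr_spgemm is hypothetical, it computes the plain product C = A @ B and leaves out the alpha, beta and D terms, and it is not the code in spspmm_cuda.cu.

```python
def csr_spgemm(m, n, row_ptr_a, col_a, val_a, row_ptr_b, col_b, val_b):
    """Return (row_ptr_c, col_c, val_c) for C = A @ B, all in CSR form.

    A has m rows, B has n columns; the inner dimension is implied by B's
    row pointer.
    """
    # Phase 1 (compare step 3): count nonzeros per row of C.
    row_ptr_c = [0] * (m + 1)
    for i in range(m):
        cols = set()
        for p in range(row_ptr_a[i], row_ptr_a[i + 1]):
            k = col_a[p]
            cols.update(col_b[row_ptr_b[k]:row_ptr_b[k + 1]])
        row_ptr_c[i + 1] = row_ptr_c[i] + len(cols)

    # Phase 2 (compare step 4): allocate exactly nnz_c slots, then fill them.
    nnz_c = row_ptr_c[m]
    col_c = [0] * nnz_c
    val_c = [0.0] * nnz_c
    for i in range(m):
        acc = {}  # column index -> accumulated value for row i
        for p in range(row_ptr_a[i], row_ptr_a[i + 1]):
            k = col_a[p]
            for q in range(row_ptr_b[k], row_ptr_b[k + 1]):
                acc[col_b[q]] = acc.get(col_b[q], 0.0) + val_a[p] * val_b[q]
        for offset, j in enumerate(sorted(acc)):
            col_c[row_ptr_c[i] + offset] = j
            val_c[row_ptr_c[i] + offset] = acc[j]

    # The invariant the failing assertion checks: every column index < n.
    assert all(0 <= j < n for j in col_c)
    return row_ptr_c, col_c, val_c
```

Running the same small inputs through a reference like this and through the CUDA build is one way to narrow down whether the row pointer, the column indices, or the values differ.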


KimKyuSik commented, Feb 13, 2022

I have the exact same issue when I use a Titan RTX and an RTX 3090. Is there any way to solve it?