
Abnormal memory usage of SageConv and GraphConv


I am trying to run the following 5 algorithms on my custom dataset (one single graph):

- GCNConv
- SAGEConv
- GATConv
- GraphConv
- HyperGraphConv

In all cases the task is node classification.

Three of them run perfectly fine, but when I swap in either SAGEConv or GraphConv, I get a memory error saying I am trying to allocate 667946000000 bytes (roughly 668 GB).

Here is my small network:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import (GCNConv, SAGEConv, GATConv, GraphConv,
                                HypergraphConv)

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        #torch.manual_seed(12345)
        #self.conv1 = GCNConv(dataset_num_features, 16)
        #self.conv1 = SAGEConv(dataset_num_features, 16)     # Does NOT work.
        #self.conv1 = GATConv(dataset_num_features, 16, 10, concat=False)
        self.conv1 = GraphConv(dataset_num_features, 16, aggr='mean')  # Does NOT work.
        #self.conv1 = HypergraphConv(dataset_num_features, 16, use_attention=True, heads=5, concat=False)

        # hidden_channels must match the output size of conv1 (16 here);
        # dataset_num_features / dataset_num_classes come from the dataset.
        self.lin = nn.Linear(hidden_channels, dataset_num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = x.relu()
        #x = global_mean_pool(x, batch)
        x = F.dropout(x, p=0.8, training=self.training)
        x = self.lin(x)
        return x

and here is the error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-c43cd0da8520> in <module>
      9     data.to(device)
     10     optimizer.zero_grad()
---> 11     out = model(data)
     12     loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask].squeeze())
     13     loss.backward()

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

<ipython-input-6-b86f676abb44> in forward(self, data)
     18         x, edge_index, edge_attr = data.x, data.edge_index, data.edge_attr
     19 
---> 20         x = self.conv1(x, edge_index)#, edge_weight=edge_attr.squeeze())
     21         x = x.relu()
     22         #x = global_mean_pool(x, batch)

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch_geometric/nn/conv/graph_conv.py in forward(self, x, edge_index, edge_weight, size)
     60 
     61         # propagate_type: (x: OptPairTensor, edge_weight: OptTensor)
---> 62         out = self.propagate(edge_index, x=x, edge_weight=edge_weight,
     63                              size=size)
     64         out = self.lin_l(out)

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py in propagate(self, edge_index, size, **kwargs)
    231         # Otherwise, run both functions in separation.
    232         elif isinstance(edge_index, Tensor) or not self.fuse:
--> 233             coll_dict = self.__collect__(self.__user_args__, edge_index, size,
    234                                          kwargs)
    235 

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py in __collect__(self, args, edge_index, size, kwargs)
    155                 if isinstance(data, Tensor):
    156                     self.__set_size__(size, dim, data)
--> 157                     data = self.__lift__(data, edge_index,
    158                                          j if arg[-2:] == '_j' else i)
    159 

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py in __lift__(self, src, edge_index, dim)
    125         if isinstance(edge_index, Tensor):
    126             index = edge_index[dim]
--> 127             return src.index_select(self.node_dim, index)
    128         elif isinstance(edge_index, SparseTensor):
    129             if dim == 1:

RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 667946000000 bytes. Error code 12 (Cannot allocate memory)

Is there any reason why these two layer types use so much memory? Is there something I can do to solve this problem?
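One rough way to see where an allocation of that size could come from (an estimate, not a confirmed diagnosis): the traceback ends in index_select inside __lift__, which gathers the source-node feature row for every edge, so the intermediate tensor it builds has shape [num_edges, num_features]. Its size can be estimated directly from the standard torch_geometric Data attributes:

# Rough estimate of the per-edge tensor built by __lift__ / index_select,
# assuming float32 node features (4 bytes per element).
num_edges = data.edge_index.size(1)
num_features = data.x.size(1)
lifted_bytes = num_edges * num_features * 4
print(f'edges={num_edges:,}  features={num_features}  '
      f'lifted tensor ~{lifted_bytes / 1e9:.1f} GB')

If that figure is close to the 667946000000 bytes in the error, the allocation is simply the per-edge copy of node features made on the dense edge_index path.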

EDIT: In the case of SAGEConv, I managed to run it by using the SparseTensor functionality. However, it seems GraphConv does not work with SparseTensor input.
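A minimal sketch of that workaround, assuming torch_sparse is installed and a torch_geometric version that ships the ToSparseTensor transform; the transform replaces edge_index with a transposed sparse adjacency stored as data.adj_t:

import torch_geometric.transforms as T
from torch_geometric.nn import SAGEConv

# Replace the dense edge_index with a SparseTensor adjacency (data.adj_t).
data = T.ToSparseTensor()(data)

conv1 = SAGEConv(dataset_num_features, 16)
out = conv1(data.x, data.adj_t)  # sparse aggregation, no [num_edges, num_features] copy

With a SparseTensor adjacency, aggregation is done with sparse matrix operations, so the per-edge feature tensor from the dense path is never materialized.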

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

1 reaction
cszhangzhen commented, Jun 9, 2022

Yes, I understand project is False by default. We could perform the dimensionality reduction inside project and let users decide whether to use it; that would solve the OOM issue.

You can close this issue now. Thanks for your nice work on this wonderful GNN library.

0 reactions
rusty1s commented, Jun 9, 2022

Note that project is False by default. However, you are right that the first transformation will not do a dimensionality reduction.
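For context, a minimal sketch of the option being discussed, assuming a torch_geometric version in which SAGEConv exposes the project flag: with project=True, node features are passed through a learnable linear layer and ReLU before aggregation, but that layer maps in_channels to in_channels, so it does not reduce the feature dimension on its own.

from torch_geometric.nn import SAGEConv

# project=True adds a Linear(in_channels, in_channels) + ReLU applied to node
# features before aggregation; it transforms the features but keeps their size,
# so by itself it does not shrink the memory footprint of the dense edge_index path.
conv = SAGEConv(dataset_num_features, 16, project=True)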
