
Abnormal memory usage of SageConv and GraphConv


I am trying to run the following 5 algorithms on my custom dataset (one single graph):

- GCNConv
- SAGEConv
- GATConv
- GraphConv
- HyperGraphConv

In all cases the task is node classification.

Three of them run perfectly fine, but when I swap in either SAGEConv or GraphConv, I get a memory error saying I am trying to allocate 667946000000 bytes (roughly 668 GB).

Here is my small network:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import (GCNConv, SAGEConv, GATConv, GraphConv,
                                HypergraphConv)

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        #torch.manual_seed(12345)
        #self.conv1 = GCNConv(dataset_num_features, 16)
        #self.conv1 = SAGEConv(dataset_num_features, 16)     # Does NOT work.
        #self.conv1 = GATConv(dataset_num_features, 16, 10, concat=False)
        self.conv1 = GraphConv(dataset_num_features, 16, aggr='mean')  # Does NOT work.
        #self.conv1 = HypergraphConv(dataset_num_features, 16, use_attention=True, heads=5, concat=False)

        # hidden_channels must match the output size of conv1 (16 here);
        # dataset_num_features / dataset_num_classes come from the dataset.
        self.lin = nn.Linear(hidden_channels, dataset_num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = x.relu()
        #x = global_mean_pool(x, batch)
        x = F.dropout(x, p=0.8, training=self.training)
        x = self.lin(x)
        return x

and here is the error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-c43cd0da8520> in <module>
      9     data.to(device)
     10     optimizer.zero_grad()
---> 11     out = model(data)
     12     loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask].squeeze())
     13     loss.backward()

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

<ipython-input-6-b86f676abb44> in forward(self, data)
     18         x, edge_index, edge_attr = data.x, data.edge_index, data.edge_attr
     19 
---> 20         x = self.conv1(x, edge_index)#, edge_weight=edge_attr.squeeze())
     21         x = x.relu()
     22         #x = global_mean_pool(x, batch)

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch_geometric/nn/conv/graph_conv.py in forward(self, x, edge_index, edge_weight, size)
     60 
     61         # propagate_type: (x: OptPairTensor, edge_weight: OptTensor)
---> 62         out = self.propagate(edge_index, x=x, edge_weight=edge_weight,
     63                              size=size)
     64         out = self.lin_l(out)

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py in propagate(self, edge_index, size, **kwargs)
    231         # Otherwise, run both functions in separation.
    232         elif isinstance(edge_index, Tensor) or not self.fuse:
--> 233             coll_dict = self.__collect__(self.__user_args__, edge_index, size,
    234                                          kwargs)
    235 

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py in __collect__(self, args, edge_index, size, kwargs)
    155                 if isinstance(data, Tensor):
    156                     self.__set_size__(size, dim, data)
--> 157                     data = self.__lift__(data, edge_index,
    158                                          j if arg[-2:] == '_j' else i)
    159 

~/anaconda3/envs/py38/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py in __lift__(self, src, edge_index, dim)
    125         if isinstance(edge_index, Tensor):
    126             index = edge_index[dim]
--> 127             return src.index_select(self.node_dim, index)
    128         elif isinstance(edge_index, SparseTensor):
    129             if dim == 1:

RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 667946000000 bytes. Error code 12 (Cannot allocate memory)

Is there any reason why these two layer types use so much memory? Is there something I can do to solve this problem?
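One rough way to see where an allocation of that size could come from (an estimate, not a confirmed diagnosis): the traceback ends in index_select inside __lift__, which gathers the source-node feature row for every edge, so the intermediate tensor it builds has shape [num_edges, num_features]. Its size can be estimated directly from the standard torch_geometric Data attributes:

# Rough estimate of the per-edge tensor built by __lift__ / index_select,
# assuming float32 node features (4 bytes per element).
num_edges = data.edge_index.size(1)
num_features = data.x.size(1)
lifted_bytes = num_edges * num_features * 4
print(f'edges={num_edges:,}  features={num_features}  '
      f'lifted tensor ~{lifted_bytes / 1e9:.1f} GB')

If that figure is close to the 667946000000 bytes in the error, the allocation is simply the per-edge copy of node features made on the dense edge_index path.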

EDIT: In the case of SAGEConv, I managed to run it by using the SparseTensor functionality. However, it seems GraphConv does not work with SparseTensor input.
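A minimal sketch of that workaround, assuming torch_sparse is installed and a torch_geometric version that ships the ToSparseTensor transform; the transform replaces edge_index with a transposed sparse adjacency stored as data.adj_t:

import torch_geometric.transforms as T
from torch_geometric.nn import SAGEConv

# Replace the dense edge_index with a SparseTensor adjacency (data.adj_t).
data = T.ToSparseTensor()(data)

conv1 = SAGEConv(dataset_num_features, 16)
out = conv1(data.x, data.adj_t)  # sparse aggregation, no [num_edges, num_features] copy

With a SparseTensor adjacency, aggregation is done with sparse matrix operations, so the per-edge feature tensor from the dense path is never materialized.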

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

1 reaction
cszhangzhen commented, Jun 9, 2022

Yes, I understand project is False by default. We could perform the dimensionality reduction inside project and let users decide whether to use it; that would solve the OOM issue.

You can close this issue now. Thanks for your nice work on this wonderful GNN library.

0 reactions
rusty1s commented, Jun 9, 2022

Note that project is False by default. However, you are right that the first transformation will not do a dimensionality reduction.
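For context, a minimal sketch of the option being discussed, assuming a torch_geometric version in which SAGEConv exposes the project flag: with project=True, node features are passed through a learnable linear layer and ReLU before aggregation, but that layer maps in_channels to in_channels, so it does not reduce the feature dimension on its own.

from torch_geometric.nn import SAGEConv

# project=True adds a Linear(in_channels, in_channels) + ReLU applied to node
# features before aggregation; it transforms the features but keeps their size,
# so by itself it does not shrink the memory footprint of the dense edge_index path.
conv = SAGEConv(dataset_num_features, 16, project=True)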
