
Training on multiple graphs

See the original GitHub issue: pyg-team/pytorch_geometric#2677

From now on, we recommend using our discussion forum (https://github.com/rusty1s/pytorch_geometric/discussions) for general questions.

❓ Questions & Help

I am developing a model for a node classification task. I batch multiple graphs into training and testing batches. After training the model on one batch, I obtain some results that seem suspicious to me.

Let us say my training batch is given as follows:

Batch(batch=[5811], edge_attr=[8340, 1], edge_index=[2, 8340], ptr=[11], test_mask=[5811], train_mask=[5811], val_mask=[5811], x=[5811, 40], y=[5811])

It contains 10 graphs, as can be seen from ptr (11 boundary pointers delimit 10 graphs).
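For context, a batch of this shape is typically assembled with Batch.from_data_list; a minimal sketch (the graphs below are random stand-ins, not the simulation data from the issue):

import torch
from torch_geometric.data import Data, Batch

graphs = []
for _ in range(10):
    num_nodes = 580                                      # ~5811 nodes / 10 graphs
    x = torch.randn(num_nodes, 40)                       # 40-dimensional node features
    edge_index = torch.randint(0, num_nodes, (2, 830))   # random connectivity
    y = torch.randint(0, 5, (num_nodes,))                # node labels (5 classes assumed)
    graphs.append(Data(x=x, edge_index=edge_index, y=y))

data_batched = Batch.from_data_list(graphs)
print(data_batched.ptr)  # 11 entries: the nodes of graph i are ptr[i]..ptr[i+1]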

Next I train the model:

import numpy as np
import torch
import matplotlib.pyplot as plt

train_epoch = 200
for i in range(len(data_batched.ptr) - 1):  # one training run per graph in the batch
    loss_trained = np.zeros(train_epoch, dtype=float)
    # Note: optimizer and loss are re-created for every graph.
    optimizer = torch.optim.Adam(modelGraphConv.parameters(), lr=0.01, weight_decay=5e-4)
    criterion = torch.nn.CrossEntropyLoss()
    for epoch in range(1, train_epoch + 1):
        loss = train(data_batched[i], modelGraphConv)  # one optimization step on graph i
        loss_trained[epoch - 1] = loss
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
    print('=========================')
    plt.plot(np.arange(train_epoch), loss_trained)
    plt.show()
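The train helper is not shown in the issue; assuming the usual PyG node-classification recipe (a sketch consistent with the optimizer, criterion, and train_mask above, not necessarily the author's exact code), it would look roughly like this:

def train(data, model):
    # One optimization step on a single graph, using masked cross-entropy;
    # `optimizer` and `criterion` are the globals created in the loop above.
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)  # per-node class logits
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()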

The results of the first three training runs are depicted below.

[Figure: loss vs. epoch for the first three training runs.]

Let us ignore the high loss for now. What confuses me the most is that at the beginning of each training run, the loss jumps back to approximately 2. I would expect it to keep decreasing (or at least stay at the same level), since the multiple graphs I use for training come from the same simulation.

So the question is: am I making a mistake in the code, or am I misunderstanding how the neural network should behave?

Thank you!

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 30 (14 by maintainers)

Top GitHub Comments

RostyslavUA commented on Jun 8, 2021

All right, so I have collapsed my 40-dimensional feature vectors to 2D via t-SNE, for one training graph and one test graph:

from sklearn.manifold import TSNE

# data.x / data_test.x assumed to be CPU tensors; call .cpu().numpy() otherwise.
data_emb = TSNE(n_components=2).fit_transform(data.x)
data_test_emb = TSNE(n_components=2).fit_transform(data_test.x)
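Overlaying the two embeddings in one scatter plot (the plotting code is my reconstruction, not from the thread):

import matplotlib.pyplot as plt

plt.scatter(data_emb[:, 0], data_emb[:, 1], s=5, label='train')
plt.scatter(data_test_emb[:, 0], data_test_emb[:, 1], s=5, label='test')
plt.legend()
plt.show()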

and this is the result:

[Figure: 2D t-SNE embeddings of the training and test node features, distributed similarly.]

If I interpret the result correctly, it looks like the test set is not out-of-distribution, since the test and train data are distributed in a very similar manner.

Essentially, my node feature vector represents coordinates in 2D, so before one-hot encoding (before bringing it up to 40 dimensions), it looks like this:

[Figure: scatter plot of the raw 2D node coordinates.]

In other words, the picture above depicts the collected data.
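The thread does not spell out the one-hot step; one hypothetical encoding that would turn 2D coordinates into 40 features is to bin each coordinate into 20 intervals and concatenate the two one-hot vectors:

import torch

def one_hot_coords(pos, bins=20, lo=0.0, hi=1.0):
    # pos: [num_nodes, 2] raw coordinates in [lo, hi).
    # Each coordinate is binned into `bins` intervals and one-hot
    # encoded, giving 2 * bins = 40 features per node.
    idx = ((pos - lo) / (hi - lo) * bins).long().clamp_(0, bins - 1)
    x_oh = torch.nn.functional.one_hot(idx[:, 0], bins).float()
    y_oh = torch.nn.functional.one_hot(idx[:, 1], bins).float()
    return torch.cat([x_oh, y_oh], dim=1)  # [num_nodes, 40]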

At this moment, I cannot think of anything else I could try with the current dataset. If, from looking at my datasets, you have any other ideas on how to improve the model's accuracy or how to process/modify/analyze the data, I would greatly appreciate it if you shared them. Otherwise, thank you very much for helping me; we can close the thread.

rusty1s commented on Jun 23, 2021

The final_edge_index will use a threshold of 0.5 to decide whether to include an edge or not, which might be too low in your experiment. To only keep edges with higher probability, run:

prob_adj = (z @ z.t()).sigmoid()  # edge probabilities from node embeddings
return (prob_adj > threshold).nonzero(as_tuple=False).t()  # keep edges above `threshold`
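For context (my assumption; the surrounding model is not shown in the thread), this snippet would replace the body of decode_all in PyG's link-prediction example, with the threshold chosen by the user:

def decode_all(self, z, threshold=0.9):
    # z: [num_nodes, emb_dim] node embeddings from the encoder.
    prob_adj = (z @ z.t()).sigmoid()  # dense matrix of edge probabilities
    return (prob_adj > threshold).nonzero(as_tuple=False).t()  # shape [2, num_edges_kept]

# Hypothetical usage:
#   z = model.encode(data.x, data.train_pos_edge_index)
#   final_edge_index = model.decode_all(z, threshold=0.9)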

