Training on multiple graphs
From now on, we recommend using our discussion forum (https://github.com/rusty1s/pytorch_geometric/discussions) for general questions.
❓ Questions & Help
I am developing a model for a node classification task. I batch multiple graphs into training and testing batches. After I train the model on one batch, I obtain some results that seem suspicious to me.
Let us say, my batch that contains the nodes for the training is given as follows:
```
Batch(batch=[5811], edge_attr=[8340, 1], edge_index=[2, 8340], ptr=[11], test_mask=[5811], train_mask=[5811], val_mask=[5811], x=[5811, 40], y=[5811])
```
It contains 10 graphs, as can be seen from `ptr`.
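The `ptr` vector stores cumulative node offsets, so a `ptr` of length 11 delineates 10 graphs. A minimal sketch of this bookkeeping, with made-up offsets (only the endpoints 0 and 5811 and the length 11 match the batch above):

```python
# Hypothetical cumulative node offsets; in a real Batch these come from
# concatenating the individual graphs.
ptr = [0, 600, 1150, 1760, 2300, 2900, 3500, 4100, 4700, 5300, 5811]

num_graphs = len(ptr) - 1  # a ptr of length 11 means 10 graphs
nodes_per_graph = [ptr[i + 1] - ptr[i] for i in range(num_graphs)]

# The nodes of graph i occupy rows ptr[i] .. ptr[i+1]-1 of the flat node tensor.
assert num_graphs == 10
assert sum(nodes_per_graph) == 5811
```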
Next I train the model:
```python
import numpy as np
import torch
import matplotlib.pyplot as plt

train_epoch = 200
for i in range(len(data_batched.ptr) - 1):
    loss_trained = np.zeros(train_epoch, dtype=float)
    optimizer = torch.optim.Adam(modelGraphConv.parameters(), lr=0.01, weight_decay=5e-4)
    criterion = torch.nn.CrossEntropyLoss()
    for epoch in range(1, train_epoch + 1):
        loss = train(data_batched[i], modelGraphConv)
        loss_trained[epoch - 1] = loss
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
    print('=========================')
    plt.plot(np.arange(train_epoch), loss_trained)
    plt.show()
```
and the results of the first three training runs are depicted below.
Let us ignore the high loss for now. What confuses me the most is that at the beginning of each training run, the loss jumps back to a value of approximately 2. I would expect it to keep decreasing (or at least remain at the same level), since the multiple graphs I use for training come from the same simulation.
So the question is: am I making a mistake in the code, or am I misunderstanding how neural network training behaves?
Thank you!
Issue Analytics
- State:
- Created 2 years ago
- Comments: 30 (14 by maintainers)
Top GitHub Comments
All right, so I have collapsed my 40-dimensional feature vectors to 2D via t-SNE for one training graph and one test graph, and this is the result.
If I interpret the result correctly, the test set does not look out-of-distribution, since the test and training data are distributed in a very similar manner.
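A crude numeric complement to the visual check is to compare summary statistics of the train and test features. A sketch with made-up 1-D samples (not the real data), assuming similarly distributed sets:

```python
import statistics

# Made-up 1-D samples standing in for one feature dimension of train/test data.
train_x = [0.10, 0.22, 0.35, 0.48, 0.61]
test_x = [0.12, 0.25, 0.33, 0.50, 0.58]

mean_gap = abs(statistics.mean(train_x) - statistics.mean(test_x))
std_gap = abs(statistics.stdev(train_x) - statistics.stdev(test_x))

# Small gaps are consistent with the t-SNE picture: similar distributions.
assert mean_gap < 0.05 and std_gap < 0.05
```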
Essentially, my node feature vector represents coordinates in 2D; before one-hot encoding (i.e., before it is expanded to 40 dimensions), it looks like this.
In other words, the picture above depicts the collected data.
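For concreteness, here is a hypothetical sketch of how a 2-D coordinate could be expanded to 40 dimensions: quantize each coordinate into 20 bins and concatenate the two one-hot vectors. The bin count, value range, and helper name are assumptions, not the actual encoding used above:

```python
NUM_BINS = 20  # assumed: 2 coordinates * 20 bins = 40 features

def one_hot_coord(x, y, lo=0.0, hi=1.0):
    """Quantize (x, y) into NUM_BINS bins each and concatenate the one-hots."""
    def bucket(v):
        # Clamp the top edge so v == hi still falls into the last bin.
        idx = min(int((v - lo) / (hi - lo) * NUM_BINS), NUM_BINS - 1)
        return [1.0 if i == idx else 0.0 for i in range(NUM_BINS)]
    return bucket(x) + bucket(y)

feat = one_hot_coord(0.25, 0.80)
assert len(feat) == 40 and sum(feat) == 2.0  # exactly one hot bin per coordinate
```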
At this moment, I cannot think about anything else I could try with the current dataset. If you got any other ideas by looking at my datasets regarding how to improve accuracy of the model or process/modify/analyze the data, I would widely appreciate if you shared. Otherwise, I thank you very much for helping me and we can close the thread.
The `final_edge_index` will use a threshold of 0.5 to decide whether to include an edge or not, which might be too low in your experiment. To keep only edges with a higher probability, run:
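A hedged sketch of the idea (the `edge_prob` mapping and the 0.9 threshold are illustrative assumptions; in practice the probabilities come from the model's decoder):

```python
# Hypothetical per-edge probabilities; real values would come from the decoder.
edge_prob = {(0, 1): 0.97, (1, 2): 0.55, (2, 3): 0.91, (0, 3): 0.42}

THRESHOLD = 0.9  # stricter than the default 0.5

# Keep only edges whose predicted probability exceeds the threshold.
final_edges = [edge for edge, p in edge_prob.items() if p > THRESHOLD]
assert final_edges == [(0, 1), (2, 3)]
```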