
Weird behavior when using GraphConv with norm=right


🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Define a model with GraphConv layers and set norm='right'
  2. Train the model and evaluate error/metrics on the train data
  3. Metrics logged while training improve as expected, but the same data and model under model.eval() give near-random performance
  4. Re-run the same code, but with norm='right' removed
  5. As expected, evaluating metrics on the train data now shows improvement

From what I can gather, setting norm='right' somehow introduces some form of error (which doesn't make much sense after a brief look at the implementation). The model itself does not have any sources of non-determinism like Dropout either, so that part is ruled out as well.

Also, the error goes away if I do not set the model to evaluation mode while evaluating (i.e. leave it in train mode), which doesn't make any sense: the only difference between the two for this model should be gradient accumulation.
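As a quick sanity check (my sketch, not from the thread), one can list every submodule whose forward pass actually changes between train() and eval(); for the model defined below this should print an empty list, which makes the discrepancy all the more puzzling:

import torch.nn as nn

def mode_sensitive_modules(model):
    # Modules whose behavior differs between train() and eval()
    kinds = (nn.Dropout, nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    return [(name, type(m).__name__)
            for name, m in model.named_modules()
            if isinstance(m, kinds)]

# e.g. print(mode_sensitive_modules(model))  # -> [] for the GCN below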

Code snippet to reproduce

from dgl.nn.pytorch import GraphConv
import torch.nn as nn
import torch.optim as optim
import torch as ch
from tqdm import tqdm


class GCN(nn.Module):
    def __init__(self, n_inp, n_hidden, n_layers, n_classes=2, residual=False):
        super(GCN, self).__init__()
        self.layers = nn.ModuleList()
        self.residual = residual

        # input layer
        self.layers.append(
            GraphConv(n_inp, n_hidden, norm='right'))
            # GraphConv(n_inp, n_hidden))

        # hidden layers
        for i in range(n_layers-1):
            self.layers.append(
                GraphConv(n_hidden, n_hidden, norm='right'))
                # GraphConv(n_hidden, n_hidden))

        # output layer
        self.final = GraphConv(n_hidden, n_classes, norm='right')
        # self.final = GraphConv(n_hidden, n_classes)
        self.activation = nn.ReLU()

    def forward(self, g, latent=None):

        if latent is not None:
            if latent < 0 or latent > len(self.layers):
                raise ValueError("Invald interal layer requested")

        x = g.ndata['feat']
        for i, layer in enumerate(self.layers):
            xo = self.activation(layer(g, x))

            # Add prev layer directly, if requested
            if self.residual and i != 0:
                xo = self.activation(xo + x)

            x = xo

            # Return representation, if requested
            if i == latent:
                return x

        return self.final(g, x)


def true_positive(pred, target):
    return (target[pred == 1] == 1).sum().item()


def get_metrics(y, y_pred, threshold=0.5):
    y_ = 1 * (y_pred > threshold)
    tp = true_positive(y_, y)
    precision = tp / ch.sum(y_ == 1)
    recall = tp / ch.sum(y == 1)
    f1 = (2 * precision * recall) / (precision + recall)

    precision = precision.item()
    recall = recall.item()
    f1 = f1.item()

    # Check for NaNs
    if precision != precision:
        precision = 0
    if recall != recall:
        recall = 0
    if f1 != f1:
        f1 = 0

    return (precision, recall, f1)


# @ch.no_grad()
def lmao(model, loader, gpu):
    loss_func = nn.CrossEntropyLoss()

    tot_loss, precision, recall, f1 = 0, 0, 0, 0
    iterator = enumerate(loader)
    iterator = tqdm(iterator, total=len(loader))

    for e, batch in iterator:

        # Shift graph to GPU
        if gpu:
            batch = batch.to('cuda')

        # Get model predictions and get loss
        labels = batch.ndata['y'].long()
        logits = model(batch)
        loss = loss_func(logits, labels)
        probs = ch.softmax(logits, dim=1)[:, 1]

        # Get metrics
        m = get_metrics(labels, probs)
        precision += m[0]
        recall += m[1]
        f1 += m[2]

        tot_loss += loss.item()
        iterator.set_description(
            "Loss: %.5f | Precision: %.3f | Recall: %.3f | F-1: %.3f" %
            (tot_loss / (e+1), precision / (e+1), recall / (e+1), f1 / (e+1)))
    return tot_loss / (e+1)


def epoch(model, loader, gpu, optimizer=None, verbose=False):
    loss_func = nn.CrossEntropyLoss()
    is_train = True
    if optimizer is None:
        is_train = False

    tot_loss, precision, recall, f1 = 0, 0, 0, 0
    iterator = enumerate(loader)
    if verbose:
        iterator = tqdm(iterator, total=len(loader))

    with ch.set_grad_enabled(is_train):
        for e, batch in iterator:

            if gpu:
                # Shift graph to GPU
                batch = batch.to('cuda')

            # Get model predictions and get loss
            labels = batch.ndata['y'].long()
            logits = model(batch)
            loss = loss_func(logits, labels)

            with ch.no_grad():
                probs = ch.softmax(logits, dim=1)[:, 1]

            # Backprop gradients if training
            if is_train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            # Get metrics
            m = get_metrics(labels, probs)
            precision += m[0]
            recall += m[1]
            f1 += m[2]

            tot_loss += loss.detach().item()
            if verbose:
                iterator.set_description(
                    "Loss: %.5f | Precision: %.3f | Recall: %.3f | F-1: %.3f" %
                    (tot_loss / (e+1), precision / (e+1), recall / (e+1), f1 / (e+1)))
    return tot_loss / (e+1)


def train_model(net, ds, args):
    train_loader, test_loader = ds.get_loaders(1, shuffle=False)
    optimizer = optim.Adam(net.parameters(), lr=args.lr)

    for e in range(args.epochs):
        # Train
        print("[Train]")
        net.train()
        epoch(net, train_loader, args.gpu, optimizer, verbose=args.verbose)

        # Test
        print("[Eval]")
        net.eval()

        epoch(net, train_loader, args.gpu, None, verbose=args.verbose)
        print()

Expected behavior

Loss/metrics keep improving as the model is trained, so re-evaluating them on the SAME data should show similar performance. Instead, the performance logged while training keeps improving, while checking performance on the same dataset and model in evaluation mode leads to near-random performance. Example of what I'm talking about (evaluation is also done on train data):

[Screenshot: per-epoch logs where loss/precision/recall/F-1 improve under model.train() but remain near random under model.eval() on the same data.]

Environment

  • DGL Version: 0.6.1
  • Backend Library & Version: PyTorch 1.7.1
  • OS: Linux
  • How you installed DGL: pip
  • Python version: 3.6.10
  • CUDA/cuDNN version: 10.1/7.5.0
  • GPU models and configuration: NVidia Quadro RTX 4000

Additional context

The error persists without a GPU as well (training on CPU).


Top GitHub Comments

1 reaction
BarclayII commented on Jul 7, 2021
  1. This is a bidirected graph, so the in-degrees and out-degrees are the same.

The in-degree and out-degree are indeed the same for the same node. However, with norm='both' the denominator is the square root of the product of the source node's out-degree and the destination node's in-degree, which are not necessarily the same.

As you can see, the output activation does depend on the node degrees. Even if all node features are the same, the layer will output different features depending on the degrees of the nodes, not the same features for all nodes as you suggested.

With norm='right', by contrast, the output representation is computed by summing the incoming messages and dividing by the in-degree. Since the number of incoming messages at a node equals that node's in-degree, identical input features produce the same output value at every node.
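A minimal sketch (mine, not part of the original comment) illustrating the point: with norm='right' and identical input features, every node gets an identical output, while norm='both' still reflects the node degrees.

import dgl
import torch as ch
from dgl.nn.pytorch import GraphConv

# Bidirected path graph 0-1-2-3: in-degrees (= out-degrees) are [1, 2, 2, 1]
g = dgl.graph(([0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]))
x = ch.ones(4, 3)  # identical input feature on every node

with ch.no_grad():
    right = GraphConv(3, 2, norm='right', bias=False)
    both = GraphConv(3, 2, norm='both', bias=False)
    print(right(g, x))  # all four rows identical: degree information averages away
    print(both(g, x))   # rows differ, via the 1/sqrt(d_out * d_in) denominators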

  1. The architecture I posted above has been used in existing work (on which I based this experiment) and reached an F-1 score upwards of 0.9. The only difference between their implementation and this one is the library used: they used torch_geometric, while this code uses dgl.

The difference between their normalization and ours is that they divide the outgoing messages by out-degrees before message passing. That is OK.

If I write down the equations, things will get clearer. Assume that x is the same input feature for all nodes.

  • Theirs: $h_i = \sum_{j \in \mathcal{N}(i)} \frac{x}{d_j^{\mathrm{out}}}$
  • Our both: $h_i = \sum_{j \in \mathcal{N}(i)} \frac{x}{\sqrt{d_j^{\mathrm{out}}\, d_i^{\mathrm{in}}}}$
  • Our right: $h_i = \frac{1}{d_i^{\mathrm{in}}} \sum_{j \in \mathcal{N}(i)} x = x$

With DGL 0.6+ you can specify your own normalization weights using the EdgeWeightNorm module, though I can add another normalization option to GraphConv if you want.
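For reference, a rough sketch of that route (based on the DGL 0.6 API; the graph, features, and weights here are placeholders):

import dgl
import torch as ch
from dgl.nn.pytorch import GraphConv, EdgeWeightNorm

g = dgl.graph(([0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]))
feat = ch.randn(4, 3)
edge_weight = ch.ones(g.num_edges())  # substitute your own weights here

# Normalize the edge weights (here with the 'right' convention), then
# disable GraphConv's built-in normalization and pass the weights in.
norm_weight = EdgeWeightNorm(norm='right')(g, edge_weight)
conv = GraphConv(3, 2, norm='none')
out = conv(g, feat, edge_weight=norm_weight)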

1 reaction
Rhett-Ying commented on Jul 2, 2021

DROPOUT, DROPOUT, DROPOUT

Finally, the issue could be reproduced on my side. The reason I could not reproduce it before is that no dropout is configured in the code snippet you pasted at the top of this post; dropout is configured in the 'gist.py' you just shared.

As for the issue, I'd like to blame dropout, which is the main difference between model.train() and model.eval(). Why do I blame dropout? If dropout=0.0 when calling model.train() with norm='right', the precision is always ~0.000 (this is what I reproduced before), not to mention model.eval() on train_loader and test_loader. In other words, if dropout=0.0, model.train() is almost the same as model.eval() because there is no dropout at all. But if dropout=0.5, dropout takes effect in model.train(), which obtains good precision (>0.7), while there is no dropout at all in model.eval(), which results in 0.000 precision.

In short, the model is vulnerable and sensitive to dropout if norm='right'. If norm='both', the model is more robust and less sensitive to dropout, even with dropout=0.0, according to my experiment.

I think we'd better train with GraphConv(norm='both') and dropout=0.5 to obtain a robust model in this scenario.
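To make the train/eval difference concrete, a tiny illustration (mine, not from the comment) of the two modes of nn.Dropout:

import torch as ch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = ch.ones(2, 6)

drop.train()
print(drop(x))  # roughly half the entries zeroed, survivors scaled by 1/(1-p) = 2

drop.eval()
print(drop(x))  # identity: dropout is disabled in eval mode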

