How to get test predictions (and other non-scalar metrics)?

See original GitHub issue

❓ Questions and Help

What is your question?

If I have a trained model and I want to test it using Trainer.test(), how do I get the actual predictions of the model on the test set?

I tried logging the predictions and writing a Callback to get the logs at test end, but it seems like I can only log scalar Tensors in the dictionary returned by my model’s test_end().

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

6 reactions
awaelchli commented, Mar 17, 2020

That’s not the same as logging; that wasn’t clear in your original question. You will have to collect your predictions in test_step in a variable like self.predictions (a list, for example). Then, after you call trainer.test(), you can access model.predictions in your notebook. What do you think?
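
A minimal sketch of that suggestion (the module, data, and attribute names below are illustrative, not from this thread):

import torch
from torch.utils.data import DataLoader, TensorDataset

from pytorch_lightning import LightningModule, Trainer


class PredictionCollector(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 2)
        self.predictions = []  # filled during testing instead of being logged

    def test_step(self, batch, batch_idx):
        x, y = batch
        preds = self.layer(x).argmax(dim=-1)
        self.predictions.append(preds.cpu())  # keep predictions off the GPU


if __name__ == '__main__':
    ds = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))
    model = PredictionCollector()
    Trainer(logger=False).test(model, DataLoader(ds, batch_size=16))
    print(torch.cat(model.predictions))  # the actual test-set predictions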

1 reaction
hankyul2 commented, Jan 12, 2022

I know this is an old question, but I think the following could serve as a workaround.

import os

import torch
from torch.utils.data import DataLoader

from torchvision import models, transforms
from torchvision.datasets import CIFAR10

from pytorch_lightning import LightningModule, LightningDataModule, Trainer

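# Make CUDA device indices follow PCI bus order so they match nvidia-smi numbering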
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'


class CIFAR(LightningDataModule):
    def __init__(self, img_size=32, batch_size=32):
        super().__init__()
        self.img_size = img_size if isinstance(img_size, tuple) else (img_size, img_size)
        self.batch_size = batch_size

        self.test_transforms = transforms.Compose([
            transforms.Resize(self.img_size),
            transforms.CenterCrop(self.img_size),
            transforms.ToTensor(),
            transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
        ])

    def prepare_data(self) -> None:
        CIFAR10(root='data', train=True, download=True)
        CIFAR10(root='data', train=False, download=True)
    
    def setup(self, stage=None):
        self.test_ds = CIFAR10(root='data', train=False, download=False, transform=self.test_transforms)

    def test_dataloader(self):
        return DataLoader(self.test_ds, num_workers=4, batch_size=self.batch_size, shuffle=False)


class BasicModule(LightningModule):
    def __init__(self):
        super().__init__()
        self.model = models.resnet18(num_classes=10, pretrained=False)

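    # Return the per-batch labels and predicted classes instead of logging a scalar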
    def test_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        return y, y_hat.argmax(dim=-1)

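    # Aggregate all batches into a 10x10 confusion matrix (rows: true label, cols: prediction)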
    def test_epoch_end(self, outputs):
        results = torch.zeros((10, 10)).to(self.device)
        for output in outputs:
            for label, prediction in zip(*output):
                results[int(label), int(prediction)] += 1
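        # Sum the per-rank confusion matrices onto rank 0 (DDP runs one process per GPU)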
        torch.distributed.reduce(results, 0, torch.distributed.ReduceOp.SUM)
        acc = results.diag().sum() / results.sum()
        if self.trainer.is_global_zero:
            self.log("test_metric", acc, rank_zero_only=True)
            self.trainer.results = results
        
    
if __name__ == '__main__':
    data = CIFAR(batch_size=512)
    model = BasicModule()
    trainer = Trainer(max_epochs=2, gpus='0,1', strategy="ddp", precision=16)
    test_results = trainer.test(model, data)
    if trainer.is_global_zero:
        print(test_results)
        print(trainer.results)
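
A note on the snippet above: torch.distributed.reduce places the summed confusion matrix only on rank 0, so trainer.results is meaningful only in the global-zero process; the is_global_zero checks keep the other DDP ranks from logging or reading partial data. Also, newer Lightning releases (2.x) removed the test_epoch_end hook, so on those versions the aggregation would move to on_test_epoch_end with the per-batch outputs collected manually.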