
FID calculation is five orders of magnitude different from the reference FID implementation

See original GitHub issue

šŸ› Bug description

The documentation states that ignite.metrics.FID is inspired by the reference PyTorch implementation of FID here. However, when I compare ignite's calculated FID score to the reference implementation's, the two scores are off by about five orders of magnitude.

Here is a script based on the example in ignite’s documentation.

import os

import torch
import torchvision
import tqdm
from ignite.metrics.gan import FID

torch.manual_seed(0)

# save_image below writes into these directories, so create them first
os.makedirs('pred', exist_ok=True)
os.makedirs('gt', exist_ok=True)

m = FID()

y_pred, y = torch.rand(100, 3, 299, 299), torch.rand(100, 3, 299, 299)
for i in tqdm.tqdm(range(len(y_pred))):
    torchvision.utils.save_image(y_pred[i], f'pred/{i}.png')
    torchvision.utils.save_image(y[i], f'gt/{i}.png')
    m.update((y_pred[i:i+1], y[i:i+1]))  # one sample at a time

print('ignite online FID', m.compute())  # 8.98434690701287e-05

m = FID()
m.update((y_pred, y))  # the whole batch at once
print('ignite batch FID', m.compute())  # 8.98434072559458e-05

This script saves y_pred to a folder called pred and y to a folder called gt. I then installed pytorch-fid from the reference implementation's repo and ran

python -m pytorch_fid pred gt --num-workers 8

and the FID score I got was 5.980631998318767.
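For reference, the same number can be computed from Python rather than the CLI. This is a sketch assuming pytorch-fid's calculate_fid_given_paths helper, which recent releases expose alongside the CLI entry point:

import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

# compute the reference FID over the two image folders written above
device = "cuda" if torch.cuda.is_available() else "cpu"
fid = calculate_fid_given_paths(["pred", "gt"], batch_size=50, device=device, dims=2048)
print(fid)  # should match the CLI output above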

I would have expected the two FID scores to be the same, or at least equal up to numerical error. However, 5.980631998318767 and 8.98434072559458e-05 differ by five orders of magnitude.

Environment

  • PyTorch Version (e.g., 1.4): 1.7.0
  • Ignite Version (e.g., 0.3.0): 0.4.7
  • OS (e.g., Linux): macOS 10.14.6
  • How you installed Ignite (conda, pip, source): pip
  • Python version: 3.8.5
  • Any other relevant information:

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

2 reactions
louis-she commented, Jan 23, 2022

I have had a look at this; the difference between the two implementations comes from two causes:

  1. The input is not exactly the same

The code provided is not strict enough: the ignite version calls torchvision.utils.save_image to convert the random 0–1 float tensor to a PNG, while the official FID implementation uses torchvision.transforms.ToTensor to convert that PNG back to 0–1 floats.

These transforms introduce a slight difference, so the float input to the ignite version is not the same as the float input to the official FID model.
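
A minimal sketch of that round trip (tmp.png is just a throwaway file, not part of the original report) makes the quantization visible:

import torch
import torchvision
from PIL import Image
from torchvision import transforms

t = torch.rand(3, 8, 8)                     # float pixels in [0, 1]
torchvision.utils.save_image(t, 'tmp.png')  # quantized to 8-bit integers on disk
t2 = transforms.ToTensor()(Image.open('tmp.png'))  # back to floats in [0, 1]
print((t - t2).abs().max())                 # non-zero, on the order of 1/255

We can use the following code to avoid the mismatch: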

import os

import numpy as np
import torch
import torch.nn as nn
import tqdm
from PIL import Image
from torchvision import transforms

from ignite.metrics.gan import FID
from pytorch_fid.inception import InceptionV3

device = "cuda"
dims = 2048

# pytorch-fid's InceptionV3 returns a list of feature maps, one per requested block
block_idx = InceptionV3.BLOCK_INDEX_BY_DIM[dims]
model = InceptionV3([block_idx]).to(device)


class WrapperInceptionV3(nn.Module):
    """Adapt pytorch-fid's InceptionV3 to the (N, num_features) output ignite expects."""

    def __init__(self, fid_incv3):
        super().__init__()
        self.fid_incv3 = fid_incv3

    @torch.no_grad()
    def forward(self, x):
        y = self.fid_incv3(x)
        y = y[0]           # the 2048-d pooling block, shape (N, 2048, 1, 1)
        y = y[:, :, 0, 0]  # squeeze to (N, 2048)
        return y


wrapper_model = WrapperInceptionV3(model)
wrapper_model.eval()

m = FID(num_features=dims, feature_extractor=wrapper_model, device=device)

torch.manual_seed(0)
np.random.seed(0)  # the images below come from np.random, so seed NumPy as well

os.makedirs('pred', exist_ok=True)
os.makedirs('gt', exist_ok=True)

# two lists of two uint8 PIL images each; integer pixels survive the PNG
# round trip exactly, so both implementations see identical inputs
y_pred, y = [[Image.fromarray(np.random.randint(0, 255, (299, 299, 3), dtype=np.uint8)) for _ in range(2)] for _ in range(2)]

transform = transforms.ToTensor()
y_pred_normed, y_normed = torch.stack([transform(y_pred[k]) for k in range(2)]), torch.stack([transform(y[k]) for k in range(2)])

for i in tqdm.tqdm(range(2)):
    y_pred[i].save(f'pred/{i}.png', format="png")
    y[i].save(f'gt/{i}.png', format="png")
    m.update((y_pred_normed[i:i+1], y_normed[i:i+1]))

print('ignite online FID', m.compute())  # 6.109078042628951 in the original run

  2. The batch size is not the same

The ignite version runs inference with batch size 1, while the official version uses batch size 2. It surprised me a lot that batch size 1 and batch size 2 produce different results. Here is the evidence:

[screenshot: Inception outputs for batch and batch[0:1], with the first element differing slightly]

You can see that with inputs batch and batch[0:1], the result for the first element is slightly different.

And I confirmed that model(batch[0:1])'s output is the same as the ignite model's output.
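
The effect can be reproduced with a short sketch (this reuses wrapper_model and device from the snippet above and assumes device is CUDA, where the batch-size-dependent kernels come into play):

import torch

batch = torch.rand(2, 3, 299, 299, device=device)
out_full = wrapper_model(batch)         # features computed with batch size 2
out_single = wrapper_model(batch[0:1])  # features computed with batch size 1
# same image, different batch size: the gap is small but non-zero on CUDA
print((out_full[0] - out_single[0]).abs().max())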

1 reaction
vfdev-5 commented, Feb 28, 2022

@louis-she thanks a lot for the investigation! Let's do the following:

  • let's open a new issue for the CPU/CUDA results inconsistency.
  • let's update the ignite FID docs with code snippets from this issue, providing an example and explaining how to compute nearly identical results to pytorch-fid. @sdesrozis can you do this?