
random seed is wrong implementation

See original GitHub issue

I cloned the latest version of open-reid (latest commit a1df21b). First, I ran the example code:

python examples/softmax_loss.py -d viper -b 64 -j 2 -a resnet50 --logs-dir logs/softmax-loss/viper-resnet50

The result is:

Mean AP: 15.5%
CMC Scores    allshots      cuhk03  market1501
  top-1           7.1%       12.2%        7.1%
  top-5          23.6%       35.6%       23.6%
  top-10         32.9%       47.3%       32.9%

Then I ran the same code again on the same machine:

python examples/softmax_loss.py -d viper -b 64 -j 2 -a resnet50 --logs-dir logs/softmax-loss/viper-resnet50

The result is:

Mean AP: 15.6%
CMC Scores    allshots      cuhk03  market1501
  top-1           7.9%       13.0%        7.9%
  top-5          20.9%       32.8%       20.9%
  top-10         30.9%       44.8%       30.9%

It’s weird that they are different. It seems that these two lines have no effect:

https://github.com/Cysu/open-reid/blob/a1df21b00f9d3ecfce1329fef55af11f406c16a8/examples/softmax_loss.py#L71-L72

In the DataLoader, train_transformer uses RandomSizedRectCrop and RandomHorizontalFlip:

https://github.com/Cysu/open-reid/blob/a1df21b00f9d3ecfce1329fef55af11f406c16a8/examples/softmax_loss.py#L36-L41

But RandomSizedRectCrop and RandomHorizontalFlip use the Python built-in random module rather than numpy.random:

https://github.com/Cysu/open-reid/blob/a1df21b00f9d3ecfce1329fef55af11f406c16a8/reid/utils/data/transforms.py#L19-L42


import random

from PIL import Image


class RandomHorizontalFlip(object):
    """Horizontally flip the given PIL.Image randomly with a probability of 0.5."""

    def __call__(self, img):
        """
        Args:
            img (PIL.Image): Image to be flipped.
        Returns:
            PIL.Image: Randomly flipped image.
        """
        # Draws from the built-in random module, whose state is
        # completely independent of numpy.random.
        if random.random() < 0.5:
            return img.transpose(Image.FLIP_LEFT_RIGHT)
        return img

(Note: the RandomHorizontalFlip source quoted above comes from torchvision.transforms, which reid/utils/data/transforms.py pulls in with a star import.)

So in examples/softmax_loss.py, I imported random and changed:

def main(args):
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)

to:

def main(args):
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)

Then I ran the same example code twice; the results were still different. Next, in reid/utils/data/transforms.py, I changed:

https://github.com/Cysu/open-reid/blob/a1df21b00f9d3ecfce1329fef55af11f406c16a8/reid/utils/data/transforms.py#L26-L29

to:

for attempt in range(10):
    area = img.size[0] * img.size[1]
    target_area = random.uniform(0.64, 1.0) * area
    print(target_area)
    aspect_ratio = random.uniform(2, 3)

Then I ran the example code twice. The printed target_area values differ between the first run and the second, indicating that random.seed(args.seed) does not take effect here. So I rewrote reid/utils/data/transforms.py with numpy.random. The final reid/utils/data/transforms.py is:

from __future__ import absolute_import

import math

from PIL import Image
from torchvision.transforms import *
import numpy as np


class RandomHorizontalFlip(object):
    """Horizontally flip the given PIL.Image randomly with a probability of 0.5."""

    def __call__(self, img):
        """
        Args:
            img (PIL.Image): Image to be flipped.
        Returns:
            PIL.Image: Randomly flipped image.
        """
        if np.random.random() < 0.5:
            return img.transpose(Image.FLIP_LEFT_RIGHT)
        return img


class RectScale(object):
    def __init__(self, height, width, interpolation=Image.BILINEAR):
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def __call__(self, img):
        w, h = img.size
        if h == self.height and w == self.width:
            return img
        return img.resize((self.width, self.height), self.interpolation)


class RandomSizedRectCrop(object):
    def __init__(self, height, width, interpolation=Image.BILINEAR):
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def __call__(self, img):
        for attempt in range(10):
            area = img.size[0] * img.size[1]
            target_area = np.random.uniform(0.64, 1.0) * area
            print(target_area)
            aspect_ratio = np.random.uniform(2, 3)

            h = int(round(math.sqrt(target_area * aspect_ratio)))
            w = int(round(math.sqrt(target_area / aspect_ratio)))

            if w <= img.size[0] and h <= img.size[1]:
                x1 = np.random.randint(0, img.size[0] - w + 1)
                y1 = np.random.randint(0, img.size[1] - h + 1)

                img = img.crop((x1, y1, x1 + w, y1 + h))
                assert(img.size == (w, h))

                return img.resize((self.width, self.height), self.interpolation)

        # Fallback
        scale = RectScale(self.height, self.width,
                          interpolation=self.interpolation)
        return scale(img)

Then I ran the example code twice. This time target_area is identical between the first run and the second, but the final results (mAP, CMC) still differ. I’m wondering what’s wrong with the code. Could you check the code and answer my question?

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
Cysu commented, Sep 12, 2017

@zydou I mean some of the CUDA kernels used by cuDNN or the Torch C implementation could be non-deterministic. One reason could be that floating-point addition is not associative. You can try 0.7 + 0.2 + 0.1 == 0.7 + 0.1 + 0.2 in Python; it will print False. This implies that a reduce op running across multiple threads / processes is non-deterministic.

When the batch size is set to 1, I suspect there is no need to call the reduce op at all, which would explain why the results then come out the same.

0 reactions
zydou commented, Sep 12, 2017

@Cysu Thanks a lot!
