Random seed is not implemented correctly
I cloned the latest version of open-reid (latest commit is a1df21b). First, I ran the example code:
python examples/softmax_loss.py -d viper -b 64 -j 2 -a resnet50 --logs-dir logs/softmax-loss/viper-resnet50
The result is:
Mean AP: 15.5%
CMC Scores    allshots    cuhk03    market1501
  top-1           7.1%     12.2%          7.1%
  top-5          23.6%     35.6%         23.6%
  top-10         32.9%     47.3%         32.9%
Then I ran the same code again on the same machine:
python examples/softmax_loss.py -d viper -b 64 -j 2 -a resnet50 --logs-dir logs/softmax-loss/viper-resnet50
The result is:
Mean AP: 15.6%
CMC Scores    allshots    cuhk03    market1501
  top-1           7.9%     13.0%          7.9%
  top-5          20.9%     32.8%         20.9%
  top-10         30.9%     44.8%         30.9%
It’s weird that they are different. It seems that these two lines do not work:
https://github.com/Cysu/open-reid/blob/a1df21b00f9d3ecfce1329fef55af11f406c16a8/examples/softmax_loss.py#L71-L72
In the DataLoader, train_transformer uses RandomSizedRectCrop and RandomHorizontalFlip:
https://github.com/Cysu/open-reid/blob/a1df21b00f9d3ecfce1329fef55af11f406c16a8/examples/softmax_loss.py#L36-L41
But RandomSizedRectCrop and RandomHorizontalFlip use Python's built-in random module rather than numpy.random:
https://github.com/Cysu/open-reid/blob/a1df21b00f9d3ecfce1329fef55af11f406c16a8/reid/utils/data/transforms.py#L19-L42
class RandomHorizontalFlip(object):
    """Horizontally flip the given PIL.Image randomly with a probability of 0.5."""

    def __call__(self, img):
        """
        Args:
            img (PIL.Image): Image to be flipped.
        Returns:
            PIL.Image: Randomly flipped image.
        """
        if random.random() < 0.5:
            return img.transpose(Image.FLIP_LEFT_RIGHT)
        return img
(Note: the RandomHorizontalFlip source code is here.)
So in examples/softmax_loss.py, I imported random and changed:
def main(args):
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
to:
def main(args):
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
Then I ran the same example code twice. The results were still different.
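The seeding change described above can be gathered into one helper. This is only a minimal sketch: the name seed_everything is hypothetical (it is not part of open-reid), NumPy and PyTorch are seeded only if they can be imported, and it controls only the generators of the calling process, not GPU kernels.

```python
import random


def seed_everything(seed):
    """Seed the generators a training script is likely to touch (hypothetical helper)."""
    random.seed(seed)  # Python's built-in generator, used by the transforms
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's global generator
    except ImportError:
        pass  # NumPy is not installed, nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # PyTorch generator
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)  # generators on all visible GPUs
    except ImportError:
        pass  # PyTorch is not installed, nothing to seed


# Re-seeding with the same value replays the same sequence in this process.
seed_everything(1)
first = [random.random() for _ in range(3)]
seed_everything(1)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Identical seeds make these generators repeatable, but that alone does not guarantee identical training results, as the rest of this thread shows.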
Then, in reid/utils/data/transforms.py, I changed:
https://github.com/Cysu/open-reid/blob/a1df21b00f9d3ecfce1329fef55af11f406c16a8/reid/utils/data/transforms.py#L26-L29
to:
for attempt in range(10):
    area = img.size[0] * img.size[1]
    target_area = random.uniform(0.64, 1.0) * area
    print(target_area)
    aspect_ratio = random.uniform(2, 3)
Then I ran the example code twice. The printed target_area values differed between the first and second run, indicating that random.seed(args.seed) has no effect.
So I rewrote reid/utils/data/transforms.py to use numpy.random. The final reid/utils/data/transforms.py is:
from __future__ import absolute_import
from torchvision.transforms import *
from PIL import Image  # Image.BILINEAR, Image.FLIP_LEFT_RIGHT
import math  # math.sqrt
import numpy as np


class RandomHorizontalFlip(object):
    """Horizontally flip the given PIL.Image randomly with a probability of 0.5."""

    def __call__(self, img):
        """
        Args:
            img (PIL.Image): Image to be flipped.
        Returns:
            PIL.Image: Randomly flipped image.
        """
        if np.random.random() < 0.5:
            return img.transpose(Image.FLIP_LEFT_RIGHT)
        return img


class RectScale(object):
    def __init__(self, height, width, interpolation=Image.BILINEAR):
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def __call__(self, img):
        w, h = img.size
        if h == self.height and w == self.width:
            return img
        return img.resize((self.width, self.height), self.interpolation)


class RandomSizedRectCrop(object):
    def __init__(self, height, width, interpolation=Image.BILINEAR):
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def __call__(self, img):
        for attempt in range(10):
            area = img.size[0] * img.size[1]
            target_area = np.random.uniform(0.64, 1.0) * area
            print(target_area)  # debug output, compared across runs
            aspect_ratio = np.random.uniform(2, 3)
            h = int(round(math.sqrt(target_area * aspect_ratio)))
            w = int(round(math.sqrt(target_area / aspect_ratio)))
            if w <= img.size[0] and h <= img.size[1]:
                # np.random.randint excludes the upper bound, hence the + 1
                x1 = np.random.randint(0, img.size[0] - w + 1)
                y1 = np.random.randint(0, img.size[1] - h + 1)
                img = img.crop((x1, y1, x1 + w, y1 + h))
                assert(img.size == (w, h))
                return img.resize((self.width, self.height), self.interpolation)
        # Fallback
        scale = RectScale(self.height, self.width,
                          interpolation=self.interpolation)
        return scale(img)
Then I ran the example code twice. The target_area values were the same in the first and second run, but the final results (mAP, CMC) were still different.
I’m wondering what’s wrong with the code. Could you check it and answer my question?
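For the transform randomness discussed above, one alternative to seeding module-level state is to pass an explicit generator into the sampling code. The following is only a sketch under stated assumptions: sample_crop_box is a hypothetical function (not part of open-reid) that mirrors the crop sampling of RandomSizedRectCrop, and the 64x192 image size is arbitrary.

```python
import math
import random


def sample_crop_box(img_w, img_h, rng, attempts=10):
    """Return (x1, y1, w, h) for a random crop, or None if no attempt fits.

    Draws from the generator passed in as `rng` (a random.Random instance)
    instead of the global state of the random module.
    """
    area = img_w * img_h
    for _ in range(attempts):
        target_area = rng.uniform(0.64, 1.0) * area
        aspect_ratio = rng.uniform(2, 3)  # height / width
        h = int(round(math.sqrt(target_area * aspect_ratio)))
        w = int(round(math.sqrt(target_area / aspect_ratio)))
        if w <= img_w and h <= img_h:
            x1 = rng.randint(0, img_w - w)  # Random.randint includes both ends
            y1 = rng.randint(0, img_h - h)
            return x1, y1, w, h
    return None  # the caller would fall back to a plain resize


# Two generators built from the same seed produce the same crop decision.
box_a = sample_crop_box(64, 192, random.Random(1))
box_b = sample_crop_box(64, 192, random.Random(1))
print(box_a == box_b)  # True
```

Because the generator is an argument, other code that draws from the global random module cannot shift the sequence of crops.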
Top GitHub Comments
@zydou I mean that some of the CUDA kernels used by cuDNN or the torch C implementation could be non-deterministic. One reason could be that floating-point addition is not associative. You can try this in Python: 0.7 + 0.2 + 0.1 == 0.7 + 0.1 + 0.2. It will print False. This implies that a reduce op run with multiple threads / processes is non-deterministic, because the order of the additions can change between runs. When the batch size is set to 1, I suspect there is no need to call the reduce op, which leads to the same result.
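The non-associativity mentioned in this comment can be checked on the CPU with plain Python floats. This is only an illustration of the arithmetic effect; it does not reproduce what a GPU kernel does.

```python
import math

left_to_right = 0.7 + 0.2 + 0.1  # evaluated as (0.7 + 0.2) + 0.1
reordered = 0.7 + 0.1 + 0.2      # evaluated as (0.7 + 0.1) + 0.2

print(left_to_right == reordered)  # False
print(left_to_right, reordered)    # 0.9999999999999999 1.0

# math.fsum tracks exact partial sums, so its result does not depend
# on the order of the inputs.
fsum_a = math.fsum([0.7, 0.2, 0.1])
fsum_b = math.fsum([0.7, 0.1, 0.2])
print(fsum_a == fsum_b)  # True
```

A parallel reduction that adds the same values in a different order on each run can therefore return slightly different sums, and those small differences can grow over many training iterations.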
@Cysu Thanks a lot!