
Can't guarantee reproducibility even after seeding everything

See original GitHub issue

In the training code, I set all the random seeds,

  import os
  import random
  import numpy as np
  import torch

  # Seed every RNG the training loop touches.
  random.seed(cfg.RNG_SEED)
  np.random.seed(cfg.RNG_SEED)
  torch.manual_seed(cfg.RNG_SEED)
  if args.cuda:
    torch.cuda.manual_seed(cfg.RNG_SEED)
    torch.cuda.manual_seed_all(cfg.RNG_SEED)
  # Force cuDNN to pick deterministic kernels and skip benchmarking.
  torch.backends.cudnn.deterministic = True
  torch.backends.cudnn.benchmark = False
  # Note: this only affects hash randomization if set before the interpreter starts.
  os.environ['PYTHONHASHSEED'] = str(cfg.RNG_SEED)

But when I ran the same code (with the same settings and hyperparameters) multiple times, I got different loss curves and evaluation results. Do you have any idea what might cause this? Thanks very much. @jwyang

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 10

Top GitHub Comments

2 reactions
Jokoe66 commented, Mar 25, 2019

@squirrel16 I have found the same problem. Have you got any ideas?

@jwyang @squirrel16 Finally I found that with RoIPooling the model is reproducible once every RNG seed is set and cuDNN is made deterministic. I'm not sure what makes RoIAlign non-deterministic; I'd be grateful if you could dig out the reason.
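One way to probe this is to run the same backward pass twice with identical seeds and inputs and compare the gradients. Below is a minimal sketch that assumes torchvision's roi_align as a stand-in for this repo's own RoIAlign CUDA layer and a CUDA-capable machine; it is an illustration, not the repo's code.

  import torch
  from torchvision.ops import roi_align

  def roi_align_grad(seed=0):
    # Identical seed and inputs on every call.
    torch.manual_seed(seed)
    feat = torch.randn(1, 16, 32, 32, device='cuda', requires_grad=True)
    # Boxes in [batch_index, x1, y1, x2, y2] format.
    boxes = torch.tensor([[0., 1., 1., 20., 20.], [0., 3., 5., 25., 30.]], device='cuda')
    out = roi_align(feat, boxes, output_size=(7, 7), spatial_scale=1.0, sampling_ratio=2)
    out.sum().backward()
    return feat.grad.clone()

  print('RoIAlign backward deterministic:', torch.equal(roi_align_grad(0), roi_align_grad(0)))

If the two gradients differ, the backward kernel itself is non-deterministic (floating-point atomic adds on the GPU are a common cause). On newer PyTorch releases, torch.use_deterministic_algorithms(True) can also raise an error when a known non-deterministic op is hit, though that API postdates this thread.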

0 reactions
Jokoe66 commented, Oct 17, 2019

Usually the seed in worker_init_fn is set to SEED + worker_id. Maybe you can give that a try; a minimal sketch follows the quoted reply below.

In reply to Mohandass Muthuraja (Oct 17, 2019) on issue #394, who wrote:

  Yes, I did:

    np.random.seed(3)
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=args.batch_size,
                                             sampler=sampler_batch, num_workers=args.num_workers,
                                             worker_init_fn=_init_fn)
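A minimal sketch of per-worker seeding along those lines, assuming a global SEED constant and the DataLoader call quoted above (dataset, sampler_batch and args come from the thread's training script, so the DataLoader call is left commented out):

  import random
  import numpy as np
  import torch

  SEED = 3  # hypothetical global seed; the reply above uses np.random.seed(3)

  def _init_fn(worker_id):
    # Give each DataLoader worker its own reproducible seed.
    worker_seed = SEED + worker_id
    random.seed(worker_seed)
    np.random.seed(worker_seed)
    torch.manual_seed(worker_seed)

  # dataloader = torch.utils.data.DataLoader(dataset, batch_size=args.batch_size,
  #                                          sampler=sampler_batch, num_workers=args.num_workers,
  #                                          worker_init_fn=_init_fn)

This keeps the random, np.random and torch draws inside each worker reproducible across runs while still differing between workers.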

Read more comments on GitHub >

Top Results From Across the Web

  • Reproducibility: fixing random seeds, and why that's not enough
    "Reproducible research is easy. Just log your parameters and metrics somewhere, fix seeds, and you are good to go."
  • Why can't I get reproducible results in Keras even though I set ...
    EDIT: I changed my code by moving the setting of all seeds before importing Keras. The results are still not deterministic, however the ...
  • [Solved] Reproducibility: Where is the randomness coming in?
    So what's going on? It seems that if all the seeds are initialized, the results should be equal. A 1% variation over a ...
  • How to Solve Reproducibility in ML - neptune.ai
    Reproducibility in machine learning means being able to replicate the ML ... Neptune allows you to log anything that happens during ML runs, ...
  • Ensuring Training Reproducibility in PyTorch | LearnOpenCV
    Note: PyTorch does not guarantee reproducibility of results across its ... #seed = 3 #torch.manual_seed(seed) # set device to CUDA if ...
