Training results (IS and FID) are not as good as yours with the same training process
Hi ajbrock,
I was running the training code on ImageNet using the default script launch_BigGAN_bs256x8.sh. It has finished 134k iterations, and here is the log file.
Compared with the log file that you released, I got worse results, even though I kept all the parameters the same as your default settings. The training is on 8x V100. Do you have any suggestions for making it better? Or what should I check to get a similar result to yours?
Thanks a lot!
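For reference, a minimal sketch for inspecting the IS/FID trajectory in a metrics log. It assumes a JSON-lines format with itr, IS_mean, IS_std, and FID fields per record; the file name below is a placeholder, not necessarily the path this repo's logger writes, so adjust both to your actual log.

```python
# Minimal sketch: parse a JSON-lines metrics log and print the IS/FID trajectory.
# Assumption: each line looks like {"itr": 2000, "IS_mean": ..., "IS_std": ..., "FID": ...}.
# LOG_PATH is a hypothetical placeholder -- point it at your own log file.
import json

LOG_PATH = "logs/BigGAN_ImageNet_log.jsonl"

records = []
with open(LOG_PATH) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if "IS_mean" in rec and "FID" in rec:
            records.append(rec)

records.sort(key=lambda r: r["itr"])
for rec in records:
    print(f"itr {rec['itr']:>7d}  IS {rec['IS_mean']:6.2f} ± {rec.get('IS_std', 0.0):4.2f}  FID {rec['FID']:6.2f}")

# Best FID seen so far (lower is better) and the iteration where it occurred.
best = min(records, key=lambda r: r["FID"])
print(f"Best FID {best['FID']:.2f} at itr {best['itr']}")
```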
Issue Analytics
- Created: 4 years ago
- Comments: 14 (4 by maintainers)
Top GitHub Comments
Hi Qi,
There can be a substantial amount of variance in the time to convergence for a model (I only had time to train one with this codebase, as I don’t have unfettered access to that kind of compute), so it’s not surprising that yours might need longer to converge/collapse; it appears to still be training.
I’d say let it run and see what IS/FID it gets to when it explodes and dies. This would also be a helpful datapoint for this repo to start getting a better sense of the variance in #itrs required =); if you wouldn’t mind posting the full logfile (e.g. in a pastebin), I can take a look at it and check for any anomalies.
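For comparing a run against the released log, a rough sketch that overlays the two IS/FID curves and flags a sudden FID jump as a likely collapse point. The paths and JSON keys are assumptions rather than this repo's guaranteed logger output, so adapt them to whatever your training setup actually writes.

```python
# Rough sketch: overlay two IS/FID curves and flag a likely collapse point.
# Assumptions: both logs are JSON-lines with "itr", "IS_mean", "FID" fields;
# the paths below are hypothetical placeholders for your log and the released reference log.
import json
import matplotlib.pyplot as plt

def load_log(path):
    with open(path) as f:
        recs = [json.loads(line) for line in f if line.strip()]
    recs = [r for r in recs if "FID" in r]
    recs.sort(key=lambda r: r["itr"])
    return recs

mine = load_log("logs/my_run_log.jsonl")          # hypothetical path to your run's log
theirs = load_log("logs/released_run_log.jsonl")  # hypothetical path to the released log

fig, (ax_is, ax_fid) = plt.subplots(1, 2, figsize=(10, 4))
for recs, label in [(mine, "my run"), (theirs, "released log")]:
    itrs = [r["itr"] for r in recs]
    ax_is.plot(itrs, [r["IS_mean"] for r in recs], label=label)
    ax_fid.plot(itrs, [r["FID"] for r in recs], label=label)
ax_is.set_xlabel("iteration"); ax_is.set_ylabel("IS"); ax_is.legend()
ax_fid.set_xlabel("iteration"); ax_fid.set_ylabel("FID"); ax_fid.legend()
plt.tight_layout()
plt.savefig("is_fid_comparison.png")

# Crude collapse check: flag the first eval where FID jumps by more than 50% over the previous one.
for prev, cur in zip(mine, mine[1:]):
    if prev["FID"] > 0 and cur["FID"] > 1.5 * prev["FID"]:
        print(f"Possible collapse around itr {cur['itr']}: FID {prev['FID']:.1f} -> {cur['FID']:.1f}")
        break
```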
Hi guys, how do you handle the above issues? Did you manage to reproduce the released results?