Optimal parameters for Google Colab
Hello,
First of all, thank you for sharing your code and insights with the rest of us!
As for your code, I plan to run it for 12 hours on Google Colab, similarly to the set-up shown in the README.
My dataset consists of images at 256x256 resolution, and I have started training with the following command line:
!lightweight_gan \
--data {image_dir} \
--disc-output-size 5 \
--aug-prob 0.25 \
--aug-types [translation,cutout,color] \
--amp
I have noticed that the expected training time is 112.5 hours for 150k iterations (the default setting), which is consistent with the average time of 2.7 seconds per iteration shown in the log. However, that is ~9 times more than what is shown in the README. So I wonder if I am doing something wrong, and I see two solutions.
First, I could decrease the number of iterations so that training takes 12 hours, choosing 16k iterations instead of 150k with:
--num-train-steps 16000
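For reference, the arithmetic behind these numbers: 150,000 iterations × 2.7 s/iteration ≈ 405,000 s ≈ 112.5 hours, while a 12-hour budget corresponds to 12 × 3600 s ÷ 2.7 s/iteration ≈ 16,000 iterations.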
Is that what you did for the results shown in the README?
Second, I have noticed that I am only using 3.8 GB of GPU memory, so I could increase the batch size, as you mentioned in https://github.com/lucidrains/lightweight-gan/issues/13#issuecomment-732486110. Edit: however, the time per iteration also increases with a larger batch size. For instance, with the following settings I use 7.2 GB of GPU memory and each iteration takes 8.2 seconds:
--batch-size 32 \
--gradient-accumulate-every 4
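For completeness, the full invocation I am testing would then look like the sketch below; it is simply my original command with the two flags appended, and {image_dir} is a placeholder as before. If I understand gradient accumulation correctly, each optimizer step then aggregates gradients over 32 × 4 = 128 images.
!lightweight_gan \
--data {image_dir} \
--disc-output-size 5 \
--aug-prob 0.25 \
--aug-types [translation,cutout,color] \
--amp \
--batch-size 32 \
--gradient-accumulate-every 4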
Top GitHub Comments
augmentations are the new feature engineering
Here are the official repository and paper for StyleGAN2-ADA, an augmentation strategy that makes StyleGAN2 work with limited data:
https://github.com/NVlabs/stylegan2-ada
https://arxiv.org/abs/2006.06676
The run-time is longer than advertised (at least for me on Colab, especially if I end up with a K80 GPU), so I cannot comment on results. Since I don't have much computing power, I try to go straight for the parameters most likely to work.
The reason why I chose 0.4 is that the StyleGAN2-ADA paper mentions it was the optimal value when searching for a fixed augmentation strength on their dataset of 2k images.
The reason why I chose “translation” is that the same paper suggests that, among the augmentations offered in this repository, only “translation” is very likely to benefit runs with limited data.
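Concretely, if one wanted to mirror those findings with this repository, the augmentation flags might look like the sketch below (untested on my side, and carried over from a different model, so treat the values only as a starting point; {image_dir} is a placeholder as above):
!lightweight_gan \
--data {image_dir} \
--aug-prob 0.4 \
--aug-types [translation]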
Naturally, take everything with a grain of salt, because this repository is not about StyleGAN2.