
help with training

Thanks for the awesome code! I am training my own model right now and have a few questions:

  • Currently I am using 100k (out of around 1.8M) images from CelebAMask-HQ, FFHQ and VGGFace to train the model. Did you use the full set to train your model?
  • Most losses are no longer improving much (160k steps trained, 4 GPUs × 12 images/batch). Is this normal? Should I just continue training for more steps?
  • I also checked the validation results, and the reconstruction is not good.
  • I noticed that shuffle for the training dataloader is not set to True. Did you use the same setting?

Thanks!
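To get a rough sense of how much data the model has seen at this point, you can plug the numbers from the question into a back-of-the-envelope calculation. This is a minimal sketch. It assumes every step consumes exactly one global batch of 4 × 12 = 48 images, with no gradient accumulation. It also treats "1.8M" as the approximate total quoted above. The helper names `images_seen` and `passes_over_data` are hypothetical.

```python
def images_seen(steps: int, gpus: int, per_gpu_batch: int) -> int:
    """Total training images consumed, assuming one global batch per step."""
    return steps * gpus * per_gpu_batch


def passes_over_data(steps: int, gpus: int, per_gpu_batch: int, dataset_size: int) -> float:
    """Approximate number of epochs (full passes) over a dataset of the given size."""
    return images_seen(steps, gpus, per_gpu_batch) / dataset_size


if __name__ == "__main__":
    # Figures taken from the question: 160k steps, 4 GPUs x 12 images per GPU.
    seen = images_seen(160_000, 4, 12)
    subset = passes_over_data(160_000, 4, 12, 100_000)
    full = passes_over_data(160_000, 4, 12, 1_800_000)
    print(f"images seen: {seen:,}")
    print(f"passes over the 100k subset: {subset:.1f}")
    print(f"passes over the full ~1.8M set: {full:.2f}")
```

Under these assumptions, 160k steps works out to 7.68M images. That is about 77 passes over the 100k-image subset, but only about 4 passes over the full ~1.8M images. This is worth keeping in mind when comparing step counts between runs trained on different amounts of data.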

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 28

Top GitHub Comments

2 reactions
niuyuanc commented, Nov 19, 2022

@antonsanchez Could you please share the pretrained weights? Thank you so much 🙏🙏🙏 My email: niuyuanc@163.com Thanks🙏🙏🙏

2 reactions
y-x-c commented, Nov 9, 2020

Hi! You did very fast training!

  1. Yes, I used the full dataset. I don't know about the IJB-C dataset. The distribution of the dataset can influence your model.
  2. In the paper, they trained for 500K steps; I trained for over 500K. To my eye, your attribute_loss is still going down, but the Rec and ID losses are unstable. In my case, those two losses were more stable and lower at the same number of steps.
  3. The shuffle option in the training dataloader should be True. That was clearly my mistake when publishing.
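In PyTorch, the fix described here amounts to passing `shuffle=True` to `torch.utils.data.DataLoader`. With several GPUs under DistributedDataParallel, shuffling is handled differently:

- Shuffling is done by `torch.utils.data.distributed.DistributedSampler(dataset, shuffle=True)`.
- The DataLoader's own `shuffle` must then stay off, because it is mutually exclusive with `sampler`.
- Call `sampler.set_epoch(epoch)` at the start of every epoch so the order changes between epochs.

The repository's actual loader code is not shown in this thread. What follows is only a dependency-free sketch of that idea: a per-epoch permutation seeded by `seed + epoch`, padded so it divides evenly, then split across ranks by striding. The functions `epoch_indices` and `first_global_batch` are hypothetical. They imitate the behaviour of `DistributedSampler` and are not its source. The three-source toy dataset (sizes made up) illustrates what happens when the datasets are concatenated in order and never shuffled.

```python
import random
from typing import List


def epoch_indices(n: int, epoch: int, rank: int, world_size: int,
                  shuffle: bool = True, seed: int = 0) -> List[int]:
    """Indices one rank visits in one epoch (sketch of DistributedSampler-style logic).

    Assumes n >= world_size, so the padding below never needs more than n items.
    """
    indices = list(range(n))
    if shuffle:
        # Seeding with seed + epoch gives every rank the same permutation
        # within an epoch, but a different permutation in the next epoch.
        random.Random(seed + epoch).shuffle(indices)
    # Pad by repeating leading indices so every rank gets the same count.
    total = -(-n // world_size) * world_size  # ceil(n / world_size) * world_size
    indices += indices[: total - n]
    # Stride through the shared permutation so ranks receive disjoint slices.
    return indices[rank:total:world_size]


def first_global_batch(sources: List[str], world_size: int,
                       per_rank_batch: int, shuffle: bool) -> List[str]:
    """Source labels of the first global batch (first per_rank_batch items on every rank)."""
    batch: List[str] = []
    for rank in range(world_size):
        idx = epoch_indices(len(sources), 0, rank, world_size, shuffle=shuffle)
        batch += [sources[i] for i in idx[:per_rank_batch]]
    return batch


if __name__ == "__main__":
    # Toy stand-in for three datasets concatenated in order (sizes are made up).
    sources = ["celebamask"] * 100 + ["ffhq"] * 100 + ["vggface"] * 100
    for shuffle in (False, True):
        batch = first_global_batch(sources, world_size=4, per_rank_batch=8, shuffle=shuffle)
        print(f"shuffle={shuffle}:", {s: batch.count(s) for s in sorted(set(batch))})
```

Without shuffling, the first global batch in this toy setup comes entirely from the first dataset. With a real 1.5M-image concatenation, each rank would stay inside one source for a long stretch of every epoch. Reseeding per epoch keeps ranks in agreement on the permutation while still varying the order from epoch to epoch.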

Thanks for your reply.

  1. I just corrected the description; I am using the same datasets (CelebAMask-HQ, FFHQ and VGGFace) as well.

  • So in your case, each step has 64 images; assuming there are about 1.5M images in those three datasets, you trained for around 4 epochs (= 64 × 500,000 / 1,500,000 / 5) in total?
  • In my case, each step only has 48 images, so maybe that's why the two losses are higher at the same number of steps.
  • I found that the Rec loss goes much lower in the third epoch, and the results are much better than before. I will continue my current training and see what happens.
  3. Thanks for the clarification; I also changed it to True in my training.
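The batch-size comparison above comes down to sample budgets: at the same number of steps, a 48-image batch has seen fewer images than a 64-image batch. This minimal sketch of that arithmetic reuses the figures quoted in the thread (64 and 48 images per step, 500K steps, ~1.5M images). The trailing division by 5 in the epoch estimate is reproduced as-is, because the thread does not explain it. The helpers `epochs` and `steps_to_match` are hypothetical, and the sketch assumes one global batch per step.

```python
import math


def epochs(batch_per_step: int, steps: int, dataset_size: int) -> float:
    """Full passes over the dataset implied by a batch size and step count."""
    return batch_per_step * steps / dataset_size


def steps_to_match(ref_batch: int, ref_steps: int, batch: int) -> int:
    """Steps needed at `batch` images/step to see as many images as the reference run."""
    return math.ceil(ref_batch * ref_steps / batch)


if __name__ == "__main__":
    raw = epochs(64, 500_000, 1_500_000)
    print(f"64 x 500K steps over ~1.5M images: {raw:.2f} epochs")
    # The thread's estimate additionally divides by 5 (reason not given there).
    print(f"same figure divided by 5, as in the thread: {raw / 5:.2f}")
    print(f"steps at 48 images/step to match that budget: {steps_to_match(64, 500_000, 48):,}")
```

Under these assumptions, a 48-image run would need roughly 667K steps to consume as many images as 500K steps at 64 images. This fits the guess that the smaller batch accounts for part of the gap at equal step counts.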