Discussions for training / VoxSRC
See original GitHub issue.

- Changing `--n_mels` from 40 to 64 leads to a small increase in performance.
- Using `--log_input` also leads to a small increase in performance (see the feature-extraction sketch after this list).
- Combining two loss functions (e.g. `angleproto` and `softmax`) sometimes has a positive effect. This should be defined as a new loss function in the `loss` directory that returns the sum of the two losses (see the combined-loss sketch after this list).
- Zero padding of the input has a significant adverse effect on performance. When there is a large variation in the length of the input audio files (e.g. VoxSRC), I recommend `--eval_frames 0`, which uses whatever length of audio is available without padding or cropping.
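Since a question later in this thread asks what `--log_input` actually does, here is a minimal sketch of the kind of feature pipeline these two flags control: mel filterbank extraction with a configurable number of bins, optionally followed by log compression. The exact preprocessing in the trainer may differ; the parameter values below (sample rate, window and hop sizes) are assumptions for illustration only.

```python
import torch
import torchaudio

n_mels = 64        # --n_mels: 64 mel bins instead of the default 40
log_input = True   # --log_input: apply log compression to the mel energies

melspec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=512, win_length=400, hop_length=160,
    n_mels=n_mels)

waveform = torch.randn(1, 16000)   # one second of dummy 16 kHz audio
features = melspec(waveform)       # shape: (1, n_mels, frames)

if log_input:
    # Log compression reduces the dynamic range of the mel energies,
    # which typically makes optimization a little easier.
    features = (features + 1e-6).log()
```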
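And here is a minimal sketch of a combined loss module written in the style of the `loss` directory. The module names `loss.angleproto` and `loss.softmax` and the `(loss, accuracy)` return convention follow the repository's pattern, but verify the details against your own checkout before relying on them.

```python
import torch.nn as nn

import loss.angleproto as angleproto
import loss.softmax as softmax

class LossFunction(nn.Module):
    """Sum of a classification loss and a metric-learning loss."""

    def __init__(self, **kwargs):
        super(LossFunction, self).__init__()
        self.softmax    = softmax.LossFunction(**kwargs)
        self.angleproto = angleproto.LossFunction(**kwargs)

    def forward(self, x, label=None):
        # x: (batch, utterances_per_speaker, embedding_dim)
        # Classification loss over the flattened embeddings ...
        nloss_s, prec1 = self.softmax(
            x.reshape(-1, x.size(-1)), label.repeat_interleave(x.size(1)))
        # ... plus the angular prototypical loss on the grouped embeddings.
        nloss_p, _ = self.angleproto(x, None)
        return nloss_s + nloss_p, prec1
```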
For example, this configuration gives 1.98% EER using the standard train and test lists. I believe that many of you have trained better models using this trainer. I would appreciate it if you are able to share your knowledge!
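The post does not spell out the full configuration behind the 1.98% EER figure. Purely as an illustration of how the flags above fit together, a run might look like the following; the script name `trainSpeakerNet.py` and the `--trainfunc softmaxproto` value are assumptions based on the repository layout, not the author's stated recipe.

```
python trainSpeakerNet.py \
    --n_mels 64 \
    --log_input True \
    --trainfunc softmaxproto \
    --eval_frames 0
```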
@zh794390558 Did you solve your problem of slow training? I am having the same problem: one epoch takes almost 3 hours (sometimes more) on 8 Tesla T4 GPUs with distributed training.
But my case is a little different, explained here in detail.
If you have solved your problem, could you please share your solution?
Hello, Joonson. Thank you for your ideas. Regarding the `--log_input` feature: what is the principle behind this method, and how does it improve performance? I hope you can reply. Thank you.