[Bug] VITS LJSpeech recipe no improvement in 70k steps (batch size 16)
Describe the bug
I’ve used the VITS LJSpeech recipe for 70k steps (batch size 16) and have seen no drops in loss, the alignment is always perfect, and the audio always sounds the same.
Attachments in the original issue: loss curve, alignment plots at 12k and 70k steps, and audio samples at 12k and 70k steps.
To Reproduce
Apply the fix described in Additional context below.
Copy recipes/ljspeech/vits_tts/train_vits.py to runs2/train_vits.py and make the following changes (see the sketch after this list):
- Fix the LJSpeech dataset path
- batch_size=16
- eval_batch_size=8
Run it with CUDA_VISIBLE_DEVICES=1 python runs2/train_vits.py.
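For concreteness, the edited part of runs2/train_vits.py looks roughly like the sketch below. This is a sketch against the Coqui TTS VitsConfig / BaseDatasetConfig API rather than a verbatim copy of my script; import paths and dataset field names (e.g. formatter vs. name) differ between TTS versions, and the LJSpeech path is a placeholder.

```python
# Sketch of the edited config in runs2/train_vits.py (assumes the Coqui TTS
# recipe API; "formatter" may be called "name" in older TTS releases).
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig

dataset_config = BaseDatasetConfig(
    formatter="ljspeech",
    meta_file_train="metadata.csv",
    path="/data/LJSpeech-1.1/",  # placeholder: the fixed LJSpeech path
)

config = VitsConfig(
    run_name="vits_ljspeech",
    batch_size=16,       # changed from the recipe default
    eval_batch_size=8,   # changed from the recipe default
    datasets=[dataset_config],
)
```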
Expected behavior
I expect that:
- Loss drops over time.
- Alignment starts out blurry and develops a line over time.
- The audio changes over time.
Environment (please complete the following information):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- PyTorch or TensorFlow version (use command below): PyTorch 1.10
- Python version: 3.7
- CUDA/cuDNN version: 11.3 / 8200
- GPU model and memory: RTX 3080 10GB
- Exact command to reproduce: CUDA_VISIBLE_DEVICES=1 python runs2/train_vits.py
Additional context
When I originally ran it, I hit this bug: https://github.com/NVIDIA/apex/issues/694
and applied this fix: https://github.com/NVIDIA/apex/issues/694#issuecomment-918833904
EDIT: Be careful applying the fix I mentioned; I think it’s the reason training was broken for me.
Top GitHub Comments
Looks like apex is supported. Too bad the apex package from conda doesn’t run on Python >= 3.9. With mixed_precision: False, suddenly 90% of the 3090’s memory is used when running batch size 32, instead of 65-70% utilisation with mixed_precision: True.
Had more or less the same issue; ‘solved’ it by setting mixed_precision to False in the config.
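For reference, the workaround described in these comments amounts to flipping one field on the recipe’s config before it is passed to the trainer; a minimal sketch, assuming the same VitsConfig object as in the recipe:

```python
# Workaround sketch: disable automatic mixed precision (AMP) in the recipe's
# config. This avoids the AMP/apex-related breakage at the cost of noticeably
# higher GPU memory use (reported above: ~90% of a 3090 at batch size 32 with
# AMP off, versus 65-70% with AMP on).
config.mixed_precision = False
```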