[Bug] VITS LJSpeech recipe no improvement in 70k steps (batch size 16)
Describe the bug
I’ve used the VITS LJSpeech recipe for 70k steps (batch size 16) and have seen no drops in loss, the alignment is always perfect, and the audio always sounds the same.
Attachments in the original issue: loss curve, alignment plots at 12k and 70k steps, and audio samples at 12k and 70k steps.
To Reproduce
Apply the fix described in Additional context below.
Copy recipes/ljspeech/vits_tts/train_vits.py to runs2/train_vits.py and make the following changes (see the sketch after this list):
- Fix the LJSpeech dataset path
- batch_size=16
- eval_batch_size=8
Run it with CUDA_VISIBLE_DEVICES=1 python runs2/train_vits.py.
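For concreteness, the edited part of runs2/train_vits.py looks roughly like the sketch below. This is a sketch against the Coqui TTS VitsConfig / BaseDatasetConfig API rather than a verbatim copy of my script; import paths and dataset field names (e.g. formatter vs. name) differ between TTS versions, and the LJSpeech path is a placeholder.

```python
# Sketch of the edited config in runs2/train_vits.py (assumes the Coqui TTS
# recipe API; "formatter" may be called "name" in older TTS releases).
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig

dataset_config = BaseDatasetConfig(
    formatter="ljspeech",
    meta_file_train="metadata.csv",
    path="/data/LJSpeech-1.1/",  # placeholder: the fixed LJSpeech path
)

config = VitsConfig(
    run_name="vits_ljspeech",
    batch_size=16,       # changed from the recipe default
    eval_batch_size=8,   # changed from the recipe default
    datasets=[dataset_config],
)
```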
Expected behavior
I expect that:
- Loss drops over time.
- Alignment starts out blurry and develops a line over time.
- The audio changes over time.
Environment (please complete the following information):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- PyTorch or TensorFlow version (use command below): PyTorch 1.10
- Python version: 3.7
- CUDA/cuDNN version: 11.3 / 8200
- GPU model and memory: RTX 3080 10GB
- Exact command to reproduce: CUDA_VISIBLE_DEVICES=1 python runs2/train_vits.py
Additional context
When I originally ran it, I hit this bug: https://github.com/NVIDIA/apex/issues/694
and applied this fix: https://github.com/NVIDIA/apex/issues/694#issuecomment-918833904
EDIT: Be careful applying the fix I mentioned; I think it’s the reason training was broken for me.
Top GitHub Comments
Looks like apex is supported. Too bad the apex package from conda doesn’t run on Python >= 3.9. With mixed_precision: False, suddenly 90% of the 3090’s memory is used when running batch size 32, instead of 65-70% utilisation with mixed_precision: True.
Had more or less the same issue; ‘solved’ it by setting mixed_precision to False in the config.
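For reference, the workaround described in these comments amounts to flipping one field on the recipe’s config before it is passed to the trainer; a minimal sketch, assuming the same VitsConfig object as in the recipe:

```python
# Workaround sketch: disable automatic mixed precision (AMP) in the recipe's
# config. This avoids the AMP/apex-related breakage at the cost of noticeably
# higher GPU memory use (reported above: ~90% of a 3090 at batch size 32 with
# AMP off, versus 65-70% with AMP on).
config.mixed_precision = False
```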