
[Bug] Different behaviour of training HifiGan depending on number of GPUs used

See original GitHub issue

Describe the bug
Running HiFi-GAN training through distribute.py shows different stats from running it through train_vocoder.py.

To Reproduce
Steps to reproduce the behavior:

  1. Run on a single GPU: CUDA_VISIBLE_DEVICES=0 python …/…/TTS/TTS/bin/distribute.py --script train_hifigan.py
  2. Observe output:
--> STEP: 25/349 -- GLOBAL_STEP: 9475
     | > G_l1_spec_loss: 0.28888  (0.28544)
     | > G_gen_loss: 12.99941  (12.84473)
     | > G_adv_loss: 0.00000  (0.00000)
     | > loss_0: 12.99941  (12.84473)
     | > grad_norm_0: 94.30904  (85.09504)
     | > current_lr_0: 0.00048 
     | > current_lr_1: 0.00100 
     | > step_time: 0.29350  (0.29282)
     | > loader_time: 0.00140  (0.00138)
  3. Run train_vocoder.py directly: CUDA_VISIBLE_DEVICES=0 python …/…/TTS/TTS/bin/train_vocoder.py --config_path config.json
  4. Observe output:
 --> STEP: 150/699 -- GLOBAL_STEP: 150
     | > G_l1_spec_loss: 0.65889  (0.96479)
     | > G_mse_fake_loss: 0.35978  (0.37755)
     | > G_feat_match_loss: 0.02501  (0.01835)
     | > G_gen_loss: 29.65002  (43.41537)
     | > G_adv_loss: 3.06036  (2.35913)
     | > loss_0: 32.71038  (45.77450)
     | > grad_norm_0: 0.00000  (0.00000)
     | > D_mse_gan_loss: 0.46084  (0.54663)
     | > D_mse_gan_real_loss: 0.10781  (0.08349)
     | > D_mse_gan_fake_loss: 0.02431  (0.06644)
     | > loss_1: 0.46084  (0.54663)
     | > grad_norm_1: 0.00000  (0.00000)
     | > current_lr_0: 0.00086 
     | > current_lr_1: 0.00086 
     | > step_time: 1.24380  (1.24340)
     | > loader_time: 0.00150  (0.00167)
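
Note the asymmetry between the two logs: the distribute.py run reports only generator terms, with G_adv_loss pinned at 0.00000 and no discriminator losses at all even at GLOBAL_STEP 9475, while the direct train_vocoder.py run shows the full set of adversarial and discriminator losses from step 150 onward. In GAN vocoder training the adversarial term only contributes once the discriminator is actually training; below is a minimal sketch of that gating pattern, assuming a steps_to_start_discriminator-style config threshold (the field name is borrowed from Coqui's vocoder configs, but the exact handling in this TTS version is an assumption):

# Illustrative sketch only: gate the generator's adversarial loss until the
# discriminator starts training. Not the actual Coqui TTS implementation.
def generator_total_loss(spec_loss, adv_loss, feat_match_loss, global_step, config):
    if global_step < config.steps_to_start_discriminator:
        # The generator trains on spectrogram reconstruction alone, so the
        # logs would show G_adv_loss: 0.00000 and no D_* entries.
        return spec_loss
    return spec_loss + adv_loss + feat_match_loss

An adversarial loss still at zero after 9k+ steps suggests the discriminator never engages in the distributed launch mode, which is the substance of this bug.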


Expected behavior
Both launch methods should train equivalently and report comparable statistics.

Environment (please complete the following information):

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • PyTorch or TensorFlow version (use command below): PyTorch 1.10.0
  • Python version: 3.8.11
  • CUDA/cuDNN version: py3.8_cuda11.3_cudnn8.2.0_0
  • GPU model and memory: 2x RTX 3090
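The template's "use command below" snippet didn't survive the copy; a standard one-liner that reports the same PyTorch/CUDA/cuDNN versions is:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())"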

Additional context
TTS version 4.0.2-dev

Here’s train_hifigan.py:

import os

from TTS.trainer import Trainer, TrainingArgs
from TTS.utils.audio import AudioProcessor
from TTS.vocoder.configs import HifiganConfig
from TTS.vocoder.datasets.preprocess import load_wav_data
from TTS.vocoder.models.gan import GAN

output_path = os.path.dirname(os.path.abspath(__file__))

config = HifiganConfig(
    batch_size=64,
    eval_batch_size=16,
    num_loader_workers=4,
    num_eval_loader_workers=4,
    run_eval=True,
    test_delay_epochs=5,
    epochs=1000,
    seq_len=8192,
    pad_short=2000,
    use_noise_augment=True,
    eval_split_size=10,
    print_step=25,
    print_eval=False,
    mixed_precision=False,
    lr_gen=1e-3,
    lr_disc=1e-3,
    data_path=os.path.join(output_path, "../datasets/vctk_all_wavs"),
    output_path=output_path,
)

# init audio processor
ap = AudioProcessor(**config.audio.to_dict())

# load training samples
eval_samples, train_samples = load_wav_data(config.data_path, config.eval_split_size)


# init model
model = GAN(config)

# init the trainer and 🚀
trainer = Trainer(
    TrainingArgs(),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
    training_assets={"audio_processor": ap},
)
trainer.fit()
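
For reference, the two launch modes being compared are: running a script like this directly for single-GPU training, and wrapping it with distribute.py (step 1 above), which spawns one process per visible GPU and forwards --use_ddp/--rank flags to each process (visible in the argv echoed in the comments below).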

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 13 (2 by maintainers)

Top GitHub Comments

1 reaction
skol101 commented, Nov 30, 2021

UPDATE: actually, all that matters for getting a different output is the number of GPUs. With this command:

 CUDA_VISIBLE_DEVICES="0" python ../../TTS/TTS/bin/distribute.py --script ../../TTS/TTS/bin/train_vocoder.py --config_path config.json
['../../TTS/TTS/bin/train_vocoder.py', '--continue_path=', '--restore_path=', '--config_path=config.json', '--group_id=group_2021_11_30-141550', '--use_ddp=true', '--rank=0']

the model trains as expected:

 --> STEP: 225/699 -- GLOBAL_STEP: 225
     | > G_l1_spec_loss: 0.51882  (0.84050)
     | > G_mse_fake_loss: 0.31469  (0.38687)
     | > G_feat_match_loss: 0.01629  (0.01684)
     | > G_gen_loss: 23.34689  (37.82264)
     | > G_adv_loss: 2.07442  (2.20507)
     | > loss_0: 25.42131  (40.02770)
     | > grad_norm_0: 0.00000  (0.00000)
     | > D_mse_gan_loss: 0.45587  (0.55319)
     | > D_mse_gan_real_loss: 0.06953  (0.09215)
     | > D_mse_gan_fake_loss: 0.05403  (0.07818)
     | > loss_1: 0.45587  (0.55319)
     | > grad_norm_1: 0.00000  (0.00000)
     | > current_lr_0: 0.00080 
     | > current_lr_1: 0.00080 
     | > step_time: 1.23120  (1.23022)
     | > loader_time: 0.00170  (0.00163)
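
The argv echoed above shows exactly what distribute.py forwards to the wrapped script; Coqui's Trainer consumes these flags internally. Purely as an illustrative sketch (this is not the actual Trainer code), parsing them would look roughly like:

import argparse

# Flags mirrored from the echoed argv; distribute.py assigns one rank per GPU.
parser = argparse.ArgumentParser()
parser.add_argument("--continue_path", type=str, default="")
parser.add_argument("--restore_path", type=str, default="")
parser.add_argument("--config_path", type=str, default="")
parser.add_argument("--group_id", type=str, default="")
parser.add_argument("--use_ddp", type=str, default="false")  # arrives as the string "true"
parser.add_argument("--rank", type=int, default=0)
args, _ = parser.parse_known_args()
use_ddp = args.use_ddp.lower() == "true"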

0 reactions
skol101 commented, Dec 1, 2021

Thank you; indeed, the training started working on 2 GPUs, but there's no improvement for me after 8k steps.

Here’s step 6k:

 --> STEP: 325/349 -- GLOBAL_STEP: 5925
     | > G_l1_spec_loss: 0.35088  (0.33620)
     | > G_mse_fake_loss: 0.39742  (0.38448)
     | > G_feat_match_loss: 0.05728  (0.05410)
     | > G_gen_loss: 15.78973  (15.12918)
     | > G_adv_loss: 6.58330  (6.22701)
     | > loss_0: 22.37303  (21.35619)
     | > grad_norm_0: 0.00000  (0.00000)
     | > D_mse_gan_loss: 0.38689  (0.39288)
     | > D_mse_gan_real_loss: 0.04797  (0.04380)
     | > D_mse_gan_fake_loss: 0.03050  (0.03773)
     | > loss_1: 0.38689  (0.39288)
     | > grad_norm_1: 0.00000  (0.00000)
     | > current_lr_0: 2.6612350037403064e-06 
     | > current_lr_1: 2.6612350037403064e-06 
     | > step_time: 1.21690  (1.21557)
     | > loader_time: 0.00150  (0.00175)


 > EVALUATION 


  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00036 (+0.00003)
     | > avg_G_l1_spec_loss: 0.32643 (-0.00004)
     | > avg_G_mse_fake_loss: 0.36232 (+0.01273)
     | > avg_G_feat_match_loss: 0.05402 (+0.00055)
     | > avg_G_gen_loss: 14.68919 (-0.00173)
     | > avg_G_adv_loss: 6.19600 (+0.07220)
     | > avg_loss_0: 20.88519 (+0.07047)
     | > avg_D_mse_gan_loss: 0.41815 (+0.00291)
     | > avg_D_mse_gan_real_loss: 0.03527 (+0.00155)
     | > avg_D_mse_gan_fake_loss: 0.04699 (-0.00054)
     | > avg_loss_1: 0.41815 (+0.00291)

Here’s step 9k:

 --> STEP: 325/349 -- GLOBAL_STEP: 9775
     | > G_l1_spec_loss: 0.31023  (0.32699)
     | > G_mse_fake_loss: 0.37565  (0.37749)
     | > G_feat_match_loss: 0.04670  (0.05104)
     | > G_gen_loss: 13.96039  (14.71434)
     | > G_adv_loss: 5.41960  (5.89001)
     | > loss_0: 19.37999  (20.60435)
     | > grad_norm_0: 0.00000  (0.00000)
     | > D_mse_gan_loss: 0.39573  (0.39730)
     | > D_mse_gan_real_loss: 0.04231  (0.04481)
     | > D_mse_gan_fake_loss: 0.03292  (0.03838)
     | > loss_1: 0.39573  (0.39730)
     | > grad_norm_1: 0.00000  (0.00000)
     | > current_lr_0: 5.6521398267571595e-08 
     | > current_lr_1: 5.6521398267571595e-08 
     | > step_time: 1.21540  (1.21563)
     | > loader_time: 0.00150  (0.00164)


 > EVALUATION 


  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.00039 (+0.00005)
     | > avg_G_l1_spec_loss: 0.32615 (+0.00004)
     | > avg_G_mse_fake_loss: 0.36399 (-0.00736)
     | > avg_G_feat_match_loss: 0.05499 (-0.00002)
     | > avg_G_gen_loss: 14.67659 (+0.00172)
     | > avg_G_adv_loss: 6.30315 (-0.00954)
     | > avg_loss_0: 20.97974 (-0.00782)
     | > avg_D_mse_gan_loss: 0.41710 (+0.00061)
     | > avg_D_mse_gan_real_loss: 0.03614 (-0.00112)
     | > avg_D_mse_gan_fake_loss: 0.04358 (+0.00183)
     | > avg_loss_1: 0.41710 (+0.00061)
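
One detail in these logs accounts for the plateau: current_lr has collapsed to 2.66e-6 by global step 5925 and 5.65e-8 by step 9775. Both values match an exponential decay lr = lr0 * gamma**step with lr0 = 1e-3 (the configured lr_gen/lr_disc) and gamma = 0.999, i.e. an ExponentialLR schedule stepped once per training iteration; the per-iteration stepping is inferred from the numbers, not stated in the thread. A quick check:

# Sanity check: exponential decay reproduces the logged learning rates,
# assuming gamma = 0.999 applied once per training step.
lr0, gamma = 1e-3, 0.999
for step in (5925, 9775):
    print(step, lr0 * gamma ** step)
# 5925 -> ~2.66e-06 (logged: 2.6612e-06)
# 9775 -> ~5.66e-08 (logged: 5.6521e-08)

At those magnitudes the optimizer is effectively frozen, which would explain "no improvement after 8k steps" regardless of GPU count.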


