Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

GravesAttention with Tacotron 1 yields empty alignment plots during training and throws a "no attribute" error during inference

See original GitHub issue

I’ve trained a model using Tacotron 1 with GST and GravesAttention. During training, all training and eval alignment plots have been empty (80k+ steps so far). The model produced audio in TensorBoard; however, when I used the logic from one of the notebooks to evaluate the model and synthesize speech, it threw the following error: AttributeError: 'GravesAttention' object has no attribute 'init_win_idx', referring to layers/tacotron.py, line 478: self.attention.init_win_idx(). I suspect that the Tacotron 1 decoder is not properly wired for GravesAttention, because some of the methods it calls in layers/tacotron.py do not exist in the GravesAttention class.
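For illustration, here is a minimal sketch of the suspected failure mode and one possible defensive guard. The class and method names are taken from the traceback above; the bodies are toy placeholders, not the repo's actual code, and the real fix may look different.

    # Toy versions of the two attention classes: only the location-sensitive
    # one defines the windowing helper named in the traceback.
    class OriginalAttention:
        def init_win_idx(self):
            self.win_idx = -1  # windowing state used at inference time

    class GravesAttention:
        pass  # no init_win_idx -> AttributeError when called unconditionally

    def inference_setup(attention):
        # Defensive version of the call at layers/tacotron.py:478 -- only
        # invoke the helper when the attention class actually provides it.
        if hasattr(attention, "init_win_idx"):
            attention.init_win_idx()

    inference_setup(GravesAttention())  # no longer raises AttributeError

The point of the sketch is that the decoder's inference path calls windowing helpers unconditionally, while only the location-sensitive attention class defines them.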

Config:

{
  "model": "Tacotron",
  "run_name": "blizzard-gts",
  "run_description": "tacotron GST.",
  "audio": {
    "fft_size": 1024,
    "win_length": 1024,
    "hop_length": 256,
    "frame_length_ms": null,
    "frame_shift_ms": null,

    "sample_rate": 24000,
    "preemphasis": 0.0,
    "ref_level_db": 20,

    "do_trim_silence": true,
    "trim_db": 60,

    "power": 1.5,
    "griffin_lim_iters": 60,

    "num_mels": 80,
    "mel_fmin": 95.0,
    "mel_fmax": 12000.0,
    "spec_gain": 20,

    "signal_norm": true,
    "min_level_db": -100,
    "symmetric_norm": true,
    "max_norm": 4.0,
    "clip_norm": true,
    "stats_path": null
  },

  "distributed": { "backend": "nccl", "url": "tcp://localhost:54321" },

  "reinit_layers": [],

  "batch_size": 128,
  "eval_batch_size": 16,
  "r": 7,
  "gradual_training": [[0, 7, 64], [1, 5, 64], [50000, 3, 32], [130000, 2, 32], [290000, 1, 32]],
  "mixed_precision": true,

  "loss_masking": false,
  "decoder_loss_alpha": 0.5,
  "postnet_loss_alpha": 0.25,
  "postnet_diff_spec_alpha": 0.25,
  "decoder_diff_spec_alpha": 0.25,
  "decoder_ssim_alpha": 0.5,
  "postnet_ssim_alpha": 0.25,
  "ga_alpha": 5.0,
  "stopnet_pos_weight": 15.0,

  "run_eval": true,
  "test_delay_epochs": 10,
  "test_sentences_file": null,

  "noam_schedule": false,
  "grad_clip": 1.0,
  "epochs": 300000,
  "lr": 0.0001,
  "wd": 0.000001,
  "warmup_steps": 4000,
  "seq_len_norm": false,

  "memory_size": -1,
  "prenet_type": "original",
  "prenet_dropout": true,

  "attention_type": "graves",
  "attention_heads": 4,
  "attention_norm": "sigmoid",
  "windowing": false,
  "use_forward_attn": false,
  "forward_attn_mask": false,
  "transition_agent": false,
  "location_attn": true,
  "bidirectional_decoder": false,
  "double_decoder_consistency": false,
  "ddc_r": 7,

  "stopnet": true,
  "separate_stopnet": true,

  "print_step": 25,
  "tb_plot_step": 100,
  "print_eval": false,
  "save_step": 5000,
  "checkpoint": true,
  "tb_model_param_stats": false,

  "text_cleaner": "phoneme_cleaners",
  "enable_eos_bos_chars": false,
  "num_loader_workers": 8,
  "num_val_loader_workers": 8,
  "batch_group_size": 4,
  "min_seq_len": 6,
  "max_seq_len": 153,
  "compute_input_seq_cache": false,
  "use_noise_augment": true,

  "output_path": "/home/big-boy/Models/Blizzard/",

  "phoneme_cache_path": "/home/big-boy/Models/phoneme_cache/",
  "use_phonemes": true,
  "phoneme_language": "en-us",

  "use_speaker_embedding": false,
  "use_gst": true,
  "use_external_speaker_embedding_file": false,
  "external_speaker_embedding_file": "…/…/speakers-vctk-en.json",
  "gst": {
    "gst_style_input": null,
    "gst_embedding_dim": 512,
    "gst_num_heads": 4,
    "gst_style_tokens": 10,
    "gst_use_speaker_embedding": false
  },

  "datasets": [{
    "name": "ljspeech",
    "path": "/Data/blizzard2013/segmented/",
    "meta_file_train": "metadata.csv",
    "meta_file_val": null
  }]
}
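One detail worth decoding from the config above is "gradual_training". Each entry is commonly interpreted as [first_step, r, batch_size], with the decoder reduction factor r shrinking as training progresses; a small sketch of that interpretation (an assumption based on the project's documentation, not a verbatim copy of its scheduler code):

    def gradual_schedule(schedule, global_step):
        # Pick the last entry whose starting step has already been reached.
        current = tuple(schedule[0])
        for first_step, r, batch_size in schedule:
            if global_step >= first_step:
                current = (first_step, r, batch_size)
        return current

    schedule = [[0, 7, 64], [1, 5, 64], [50000, 3, 32], [130000, 2, 32], [290000, 1, 32]]
    print(gradual_schedule(schedule, 80_000))  # -> (50000, 3, 32), i.e. r=3 at the 80k steps reported above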

Alignment plots:

[image: empty training/eval alignment plots]

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

1 reaction
erogol commented, Apr 29, 2021

Good to hear that, but I am personally not sure the implementation is right compared to this paper: https://arxiv.org/abs/1910.10288

AFAIK this is the most robust Graves attention proposed for TTS so far. It may be wrong.

It’d be nice if you could double-check.
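For comparison, here is a minimal PyTorch sketch of what a GMMv2b-style Graves attention from that paper computes: a monotonic mixture of Gaussians over encoder positions. Names, layer sizes, and constants are assumptions for illustration, not the repo's GravesAttention code.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GMMAttention(nn.Module):
        """Sketch of location-relative GMM attention (GMMv2b-style,
        arXiv:1910.10288). Not the repo's implementation."""

        def __init__(self, query_dim, K=5):
            super().__init__()
            self.K = K
            # MLP predicting 3 parameters (weight, step, scale) per component.
            self.N_a = nn.Sequential(
                nn.Linear(query_dim, query_dim),
                nn.Tanh(),
                nn.Linear(query_dim, 3 * K),
            )
            self.mu_prev = None  # running component means, (B, K)

        def init_states(self, batch_size, device):
            self.mu_prev = torch.zeros(batch_size, self.K, device=device)

        def forward(self, query, inputs):
            # query: (B, query_dim) decoder state; inputs: (B, T, D) encoder outputs.
            B, T, _ = inputs.shape
            if self.mu_prev is None:
                self.init_states(B, inputs.device)
            w_hat, delta_hat, sigma_hat = self.N_a(query).chunk(3, dim=-1)
            w = torch.softmax(w_hat, dim=-1)      # mixture weights, sum to 1
            delta = F.softplus(delta_hat)         # non-negative step => monotonic
            sigma = F.softplus(sigma_hat) + 1e-5  # positive scale
            mu = self.mu_prev + delta             # means only ever move forward
            self.mu_prev = mu
            # Evaluate the normalized Gaussian mixture at every encoder index j.
            j = torch.arange(T, device=inputs.device, dtype=inputs.dtype).view(1, T, 1)
            g = torch.exp(-0.5 * ((j - mu.unsqueeze(1)) / sigma.unsqueeze(1)) ** 2)
            g = g / (sigma.unsqueeze(1) * math.sqrt(2 * math.pi))
            alignment = (w.unsqueeze(1) * g).sum(-1)                        # (B, T)
            context = torch.bmm(alignment.unsqueeze(1), inputs).squeeze(1)  # (B, D)
            return context, alignment

    # Usage sketch:
    attn = GMMAttention(query_dim=256)
    ctx, align = attn(torch.randn(2, 256), torch.randn(2, 50, 128))
    print(align.shape)  # torch.Size([2, 50])

Monotonicity comes from the softplus on delta: the component means can only move forward along the encoder axis, which is why a correct implementation should produce roughly diagonal alignment plots rather than empty ones.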

0 reactions
a-froghyar commented, May 17, 2021

Closing this because the no-attribute bug was fixed in https://github.com/coqui-ai/TTS/pull/479, and GMM (Graves) attention will be looked at in a separate discussion.


Top Results From Across the Web

Regotron: Regularizing the Tacotron2 architecture via ... - arXiv
Our method augments the vanilla Tacotron2 objective function with an additional term, which penalizes non-monotonic alignments in the ...
Gated Recurrent Attention for Multi-Style Speech Synthesis
The attention alignment plots during the early stage of training the Tacotron2-GST with guided attention and decaying guided attention can be found in...
FPETS : Fully Parallel End-to-End Text-to-Speech System
In this paper, we propose a novel non-autoregressive, fully parallel end-to-end TTS system (FPETS). It utilizes a new alignment model and the recently ...
Transfer Learning in Speech Synthesis - UEF eRepo
This paper is an attempt to work towards a unified neural network approach for the issue of speaker adaptation. In recent years, machine ...
Effective and direct control of neural TTS prosody by removing ...
attribute as a random variable in the latent space. ... speech data for training a neural TTS is usually expensive and time consuming ...
