Text-to-Speech problem

Hi, I am trying to use this model with fairseq: https://huggingface.co/facebook/tts_transformer-zh-cv7_css10. I am using the following code snippet to download and initialize the model:

from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.text_to_speech.hub_interface import TTSHubInterface
models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/tts_transformer-zh-cv7_css10",
    arg_overrides={"vocoder": "hifigan", "fp16": False}
)
model = models[0]
TTSHubInterface.update_cfg_with_data_cfg(cfg, task.data_cfg)
generator = task.build_generator(model, cfg)

Running this code raises an error: “TTSTransformerModel object is not subscriptable”. If I pass models instead of model to build_generator, like this:

generator = task.build_generator(models, cfg)

then initialization succeeds, but I get an error during the text-to-speech step. The code snippet for this part is the following:

text = "您好，这是试运行。"
sample = TTSHubInterface.get_model_input(task, text)
wav, rate = TTSHubInterface.get_prediction(task, model, generator, sample)

This raises the following error inside get_prediction: “Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same”.

Any suggestions on how I can get this to work? Or where can I find an example? Thanks!

The environment is the following:

fairseq Version: main
PyTorch Version: 1.8.1+cu111
OS (e.g., Linux): Ubuntu 18.04
How you installed fairseq (pip, source): from source
Build command you used (if compiling from source): pip install --editable ./
Python version: 3.7.12
CUDA/cuDNN version: 11.1.1
GPU models and configuration: NVIDIA A100

Top GitHub Comments

3 reactions

xegulon commented, Mar 21, 2022

Finally this worked. The problem was that IPython.display.Audio did not accept the wav variable while it was on device cuda:0.
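The snippet that accompanied this comment is not reproduced here. As a minimal sketch of the kind of fix it describes, assuming wav is a torch.Tensor living on cuda:0, the tensor can be copied to host memory before it is handed to IPython.display.Audio. The helper name to_host_array is hypothetical (it is neither part of fairseq nor taken from the original comment), and it is duck-typed so that it also runs where PyTorch is not installed.

```python
def to_host_array(wav):
    """Return `wav` as a CPU-side array (hypothetical helper, not a fairseq API)."""
    # Drop autograd tracking, copy to host memory, then convert to NumPy.
    # Each step is skipped when the object does not provide that method,
    # so plain lists or NumPy arrays pass through unchanged.
    for method in ("detach", "cpu", "numpy"):
        if hasattr(wav, method):
            wav = getattr(wav, method)()
    return wav

# Assumed usage in a notebook, after get_prediction has returned `wav` and `rate`:
#   import IPython.display as ipd
#   ipd.Audio(to_host_array(wav), rate=rate)
```

This only addresses the playback step. The earlier "Input type ... and weight type ..." error is a separate device mismatch between the input sample and the weights, and it may also need the sample and the model placed on the same device.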

0 reactions

xegulon commented, Mar 21, 2022

Making some progress, but not too much: @kahne
