
Textless NLP / GSLM: Speech resynthesis produces something unrelated to source speech


What is your question?

As far as I understand, examples/textless_nlp/gslm/tools/resynthesize_speech.py should take a speech sample (audio), encode it to units, and generate output speech from these units. The output speech should resemble the input sample.
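For intuition, the pipeline described above can be mocked in a few lines. This is a toy sketch, not the fairseq API: the centroid values and helper names (`assign_units`, `dedup`) are invented, and it only illustrates the k-means quantization step plus the run-collapse that some GSLM configurations apply to repeated units before TTS.

```python
# Toy illustration of speech -> units: features are quantized to their
# nearest k-means centroid, then consecutive duplicate units are collapsed.

def assign_units(features, centroids):
    """Nearest-centroid assignment, as k-means quantization does."""
    units = []
    for f in features:
        dists = [abs(f - c) for c in centroids]
        units.append(dists.index(min(dists)))
    return units

def dedup(units):
    """Collapse runs of consecutive duplicate units."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out

features = [0.1, 0.12, 0.9, 0.88, 0.5]   # stand-in for HuBERT frame features
centroids = [0.0, 0.5, 1.0]              # stand-in for the k-means codebook
units = assign_units(features, centroids)
print(units)         # [0, 0, 2, 2, 1]
print(dedup(units))  # [0, 2, 1]
```

The real pipeline works on high-dimensional HuBERT features and a learned codebook, but the shape of the computation is the same.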

However, when I do this with the released pre-trained models, the output is gibberish that doesn’t sound like the input at all.

I attach the samples and the steps I took. Is there anything I’m doing wrong?

Thank you!

Code

  1. Download pre-trained models (HuBERT-km200 in this example):
mkdir -p /content/speech/hubert200
cd /content/speech/hubert200
wget https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt -nc 
wget https://dl.fbaipublicfiles.com/textless_nlp/gslm/hubert/km200/km.bin -nc
wget https://dl.fbaipublicfiles.com/textless_nlp/gslm/hubert/tts_km200/tts_checkpoint_best.pt -nc 
wget https://dl.fbaipublicfiles.com/textless_nlp/gslm/waveglow_256channels_new.pt -nc
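Since `wget -nc` silently skips files that already exist, a quick sanity check that all four downloads are present and non-empty can save a confusing failure later. A minimal sketch (the `is_ready` helper is mine, not part of fairseq):

```python
# Check that the four downloaded checkpoints exist and are non-empty.
import os

def is_ready(path):
    """True if the file exists and has non-zero size."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

expected = [
    "hubert_base_ls960.pt",
    "km.bin",
    "tts_checkpoint_best.pt",
    "waveglow_256channels_new.pt",
]
for name in expected:
    print(name, "ok" if is_ready(name) else "MISSING/EMPTY")
```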
  2. Generate the code_dict.txt file. I didn’t find an “official” description of how to do it, so I used this comment. Note that if I use a dict of size 199 or 200, the models fail
with open("code_dict.txt", "wt") as f:
    for i in range(1, 199):   # Effectively 198 items
        f.write(str(i) + "\n")
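It is easy to get an off-by-one here, so before pointing `--code_dict_path` at the file it is worth confirming the count. The snippet below just rebuilds the same ID list in memory that `range(1, 199)` writes:

```python
# range(1, 199) yields "1" through "198" -- 198 entries, not 199 or 200.
entries = [str(i) for i in range(1, 199)]
assert len(entries) == 198
assert entries[0] == "1" and entries[-1] == "198"
print(len(entries))  # 198
```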
  3. Download and convert the source audio sample from the speech resynthesis example site:
wget https://speechbot.github.io/resynthesis/audio/teaser/p269_182.mp3 -nc
ffmpeg -y -i p269_182.mp3 sample.input.wav
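One thing worth checking at this step: HuBERT base was trained on 16 kHz audio, and the ffmpeg call above keeps the mp3's original sample rate rather than forcing one. A small helper (my own, using only the standard-library `wave` module) to inspect the converted file:

```python
# Inspect a PCM wav file's sample rate and channel count.
import wave

def wav_format(path):
    """Return (sample_rate, channels) for a PCM wav file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels()

# Example usage:
#   rate, channels = wav_format("sample.input.wav")
# If the result isn't (16000, 1), reconvert with explicit resampling:
#   ffmpeg -y -i p269_182.mp3 -ar 16000 -ac 1 sample.input.wav
```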
  4. Run resynthesis:
export FAIRSEQ_ROOT=/home/ubuntu/fairseq
export DATA=/content/speech/hubert200
export TYPE=hubert

echo sample.input.wav > input.txt
echo sample.out.layer5.wav >> input.txt

PYTHONPATH=${FAIRSEQ_ROOT}:${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/unit2speech python ${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/tools/resynthesize_speech.py \
    --feature_type $TYPE \
    --layer 5 \
    --acoustic_model_path $DATA/hubert_base_ls960.pt \
    --kmeans_model_path $DATA/km.bin \
    --tts_model_path $DATA/tts_checkpoint_best.pt \
    --code_dict_path $DATA/code_dict.txt \
    --waveglow_path $DATA/waveglow_256channels_new.pt \
    --max_decoder_steps 1000 < input.txt
  5. Check the result (in the attachment). It doesn’t sound like the original audio at all.

What have you tried?

I tried running resynthesis with different numbers of units, different HuBERT layers for features, different audio files, and different offsets for code_dict.txt.

In addition to the steps outlined above, I tried generating speech with units2speech directly from the units in the dev set. It still produces gibberish, which makes me think the problem may lie in a bad pre-trained TTS checkpoint.

What’s your environment?

  • fairseq Version (e.g., 1.0 or main): main
  • PyTorch Version (e.g., 1.0): 1.9.1
  • OS (e.g., Linux): Ubuntu 18.04
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): pip install -e .
  • Python version: 3.7.0
  • CUDA/cuDNN version: cuda_11.1.TC455_06.29190527_0
  • GPU models and configuration: Tesla V100-SXM2
  • Any other relevant information:

samples.zip contains the generated samples – both audio and units.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

4 reactions
asivokon commented on Oct 28, 2021

@eugene-kharitonov, the updated checkpoint df4a9c6f works great!

Having spent over a week in futile attempts to reproduce the results, this newly generated sample sounds like heavenly music to my ears – just can’t stop listening to it! 😃

However, all the other checkpoints I tried (hubert200, hubert500, logmel-100) still have the same problem with generating gibberish. Could you please double-check whether those files (likely all the other TTS checkpoints) should be re-released as well?

Thanks a lot for an impressive piece of work!

2 reactions
eugene-kharitonov commented on Oct 29, 2021

@asivokon I’ve updated TTS checkpoints + provided code_dict files and manually verified that a few of the checkpoints work. @bradgrimm unfortunately, it seems we don’t have a good Hubert500 model. As those were not used in the paper, we decided not to support the case of 500 unit models. Sorry about the confusion.

Thanks for your help!
