ljspeech fast_speech recipe for en_US Fails With assert len(out_dict["token_ids"]) > 0
Describe the bug
I am training fast_speech on the ljspeech en_US set using this recipe (on dev branch).
The data is merged into a single LJSpeech-style set using these commands, as described here:

```sh
cat ../by_book/*/*/*/metadata.csv >> metadata.csv
mkdir wavs
cp ../by_book/*/*/*/wavs/* wavs/
```
I am getting this exception:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1492, in fit
    self._fit()
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1476, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1254, in train_epoch
    for cur_step, batch in enumerate(self.train_loader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1204, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 457, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/apps/tts/TTS/TTS/tts/datasets/dataset.py", line 180, in __getitem__
    return self.load_data(idx)
  File "/apps/tts/TTS/TTS/tts/datasets/dataset.py", line 230, in load_data
    token_ids = self.get_token_ids(idx, item["text"])
  File "/apps/tts/TTS/TTS/tts/datasets/dataset.py", line 213, in get_token_ids
    token_ids = self.get_phonemes(idx, text)["token_ids"]
  File "/apps/tts/TTS/TTS/tts/datasets/dataset.py", line 198, in get_phonemes
    assert len(out_dict["token_ids"]) > 0
AssertionError
```
To Reproduce
Get the data:
```sh
cd /data/
wget https://data.solak.de/data/Training/stt_tts/en_US.tgz
tar zxf en_US.tgz
rm en_US.tgz
cd en_US/
mkdir ljspeech
cd ljspeech/
cat ../by_book/*/*/*/metadata.csv >> metadata.csv
mkdir wavs
cp ../by_book/*/*/*/wavs/* wavs/
```
(If the `cp` fails because the argument list is too long, run it per speaker, and per book if necessary: `cp ../by_book/female/*/*/wavs/* wavs/`)
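Alternatively, the "argument list too long" failure can be avoided entirely by letting `find` batch the file list instead of expanding one huge glob. A sketch, assuming the same `by_book` layout and GNU coreutils (`cp -t` is a GNU extension):

```shell
# Copy every wav under by_book into wavs/ without one giant glob expansion.
# find passes the paths to cp in batches, so ARG_MAX is never exceeded.
find ../by_book -type f -name '*.wav' -exec cp -t wavs/ {} +
```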
Then check with a Python script that every file referenced in the metadata exists:

```python
import os

with open("metadata.csv") as f:
    for line in f:
        file_name = line.split("|")[0]
        if not os.path.exists("wavs/" + file_name + ".wav"):
            print(file_name)
```
Manually remove the non-existent entries from metadata.csv (there are three missing files at the time of writing).
Point `train_fast_speech.py` at the data (`path="/data/en_US/ljspeech/"`) and change the sample rate to match the 16 kHz recordings (`sample_rate=16000`).
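In the recipe, the sample rate lives in the audio config. A hypothetical sketch of the fragment to change (import path and field names follow the TTS 0.6.x recipe layout as I understand it; verify against your checkout of `train_fast_speech.py`):

```python
# Hypothetical fragment of train_fast_speech.py -- verify the import path
# and field names against your own checkout of the recipe.
from TTS.config.shared_configs import BaseAudioConfig

# The en_US recordings here are 16 kHz, unlike LJSpeech's 22.05 kHz,
# so the audio config must be told the true rate.
audio_config = BaseAudioConfig(sample_rate=16000)
```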
Run the code:

```sh
cd TTS
export CUDA_VISIBLE_DEVICES=4
python3 recipes/ljspeech/fast_speech/train_fast_speech.py
```
After the "Pre-computing phonemes" step completes, training fails at the first step (`--> STEP: 0/1433 -- GLOBAL_STEP: 0`):
The same AssertionError traceback as in the description above is raised.
Expected behavior
Training proceeds without the assertion error.
Logs
(Same AssertionError traceback as shown in the description.)
Environment
```json
{
    "CUDA": {
        "GPU": [
            "NVIDIA A100-SXM4-40GB"
        ],
        "available": true,
        "version": "11.5"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.11.0+cu115",
        "TTS": "0.6.2",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.7.13",
        "version": "#61~18.04.3-Ubuntu SMP Fri Oct 1 14:04:01 UTC 2021"
    }
}
```
Additional context
No response
Issue Analytics
- Created: a year ago
- Comments: 9 (7 by maintainers)
Top GitHub Comments

@p0p4k thank you for the pointer! I added this code to the `__getitem__` method per your suggestion and found the item causing the issue: after the tokenization step, its text was just whitespace. After removing it manually from `metadata.csv`, it is working again.

Check if the phonemes cache folder actually exists first. Then, in the `dataset.py` file, you can try to debug by printing out the ids in the `__getitem__` method or the `compute_or_load` method of the `PhonemeDataset` class, and report whether the ids are printed. You can make the debugging quicker by using a temporary metadata file with just a few `wav|text` lines (~5-10 lines). Try it and report back, good luck.
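The root cause found above (a transcript that tokenizes to nothing) can be screened for without running the trainer at all. A minimal standalone sketch that flags metadata rows whose text column is empty, whitespace-only, or punctuation-only; note it will not catch every row a phonemizer could still reduce to zero tokens:

```python
import string

def flag_empty_rows(metadata_path="metadata.csv"):
    """Return wav ids whose transcript has no characters left after
    stripping punctuation and whitespace -- rows likely to yield
    len(token_ids) == 0 during phoneme tokenization."""
    bad = []
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            cols = line.rstrip("\n").split("|")
            # LJSpeech-style format: wav_id|raw_text|normalized_text
            text = cols[-1] if len(cols) > 1 else ""
            stripped = text.translate(
                str.maketrans("", "", string.punctuation)).strip()
            if not stripped:
                bad.append(cols[0])
    return bad

if __name__ == "__main__":
    print(flag_empty_rows())
```

Rows it reports can then be deleted from `metadata.csv` before launching training, avoiding the mid-epoch DataLoader crash.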