[Bug]: Training on CommonVoice Speech classification recipe crashes with AssertionError assert len(target_shape) == tensor.ndim
Describe the bug
This is similar to the issue: https://github.com/speechbrain/speechbrain/issues/651
However, that issue was for ASR; this one is speaker classification.
Training crashed with the error shown in the log below.
Expected behaviour
Training completes the epoch without crashing.
To Reproduce
I converted the CommonVoice audio files to WAV format before processing them with train.py.
Versions
I’m using the CommonVoice German data.
Relevant log output
    valid_loader_kwargs=hparams["dataloader_options"],
  File "E:\Study\Thesis\Voice print\speechbrain\speechbrain\core.py", line 1156, in fit
    self._fit_train(train_set=train_set, epoch=epoch, enable=enable)
  File "E:\Study\Thesis\Voice print\speechbrain\speechbrain\core.py", line 1008, in _fit_train
    for batch in t:
  File "C:\Python\Python37\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "C:\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "C:\Python\Python37\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "E:\Study\Thesis\Voice print\speechbrain\speechbrain\dataio\batch.py", line 125, in __init__
    padded = PaddedData(*padding_func(values, **padding_kwargs))
  File "E:\Study\Thesis\Voice print\speechbrain\speechbrain\utils\data_utils.py", line 445, in batch_pad_right
    t, max_shape, mode=mode, value=value
  File "E:\Study\Thesis\Voice print\speechbrain\speechbrain\utils\data_utils.py", line 372, in pad_right_to
    assert len(target_shape) == tensor.ndim
AssertionError
Additional context
No response
Top GitHub Comments
I added the code and it seems to have worked. Got through the iteration.
Thank you so much @AsuMagic and @TParcollet for helping me understand the signal. From the linked issue I was not able to figure out what ‘channels’ represent. So the increase in dimension is basically an increase in the number of channels.
Thanks again! Closing the issue!
If you’re referring to the first and last few values, not really. Those are just close to zero and about what I’d expect for normalized float audio, it’s just silence here.
The linked issue gives a solution. You could try adding a step right after the read_audio line that downmixes any stereo signal to mono.
The shape of a signal in mono is (number_of_samples,).
The shape of a signal in stereo, as can be seen from the printed value here, is (number_of_samples, number_of_channels).
So if the signal tensor has a second dimension, we can assume it is the number of channels. Taking the mean of both channels is a straightforward and common way to downmix stereo to mono.
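The downmix described above could be sketched roughly as follows. This is an illustration and not the exact snippet from the comment: it assumes the signal is a PyTorch tensor shaped (number_of_samples,) for mono or (number_of_samples, number_of_channels) for multi-channel audio, and downmix_to_mono is a hypothetical helper name, not part of SpeechBrain.

```python
import torch


def downmix_to_mono(sig: torch.Tensor) -> torch.Tensor:
    """Return a 1-D signal, averaging channels if there is a channel axis.

    Hypothetical helper (not a SpeechBrain function). Assumes `sig` is
    (number_of_samples,) or (number_of_samples, number_of_channels).
    """
    if sig.ndim == 2:
        # Average across the channel axis (dim=1) to get one value per sample.
        return sig.mean(dim=1)
    # Already mono: leave the signal unchanged.
    return sig


# Toy check with made-up values: 3 samples, 2 channels.
stereo = torch.tensor([[1.0, 3.0], [2.0, 4.0], [0.0, 0.0]])
mono = downmix_to_mono(stereo)
print(mono.shape)
```

In a recipe, this would presumably be applied to the tensor returned by read_audio before it is handed to the batch collation, for example `sig = downmix_to_mono(sig)`, so that every item in a batch has the same number of dimensions.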