[Bug]: Training on CommonVoice Speech classification recipe crashes with AssertionError assert len(target_shape) == tensor.ndim

Describe the bug

This is similar to the issue: https://github.com/speechbrain/speechbrain/issues/651

However, that issue concerned ASR; this one concerns speaker classification.

Training crashed with the error given in the log:

Expected behaviour

Training should complete the epoch without crashing.

To Reproduce

I converted the Common Voice audio files to WAV format before processing them with train.py.

Versions

I’m using the Common Voice German data.

Relevant log output

valid_loader_kwargs=hparams["dataloader_options"],
  File "E:\Study\Thesis\Voice print\speechbrain\speechbrain\core.py", line 1156, in fit
    self._fit_train(train_set=train_set, epoch=epoch, enable=enable)
  File "E:\Study\Thesis\Voice print\speechbrain\speechbrain\core.py", line 1008, in _fit_train
    for batch in t:
  File "C:\Python\Python37\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "C:\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Python\Python37\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "E:\Study\Thesis\Voice print\speechbrain\speechbrain\dataio\batch.py", line 125, in __init__
    padded = PaddedData(*padding_func(values, **padding_kwargs))
  File "E:\Study\Thesis\Voice print\speechbrain\speechbrain\utils\data_utils.py", line 445, in batch_pad_right
    t, max_shape, mode=mode, value=value
  File "E:\Study\Thesis\Voice print\speechbrain\speechbrain\utils\data_utils.py", line 372, in pad_right_to
    assert len(target_shape) == tensor.ndim
AssertionError

Additional context

No response

Top GitHub Comments

praveenmathew93 commented, Dec 1, 2022

I added the code and it seems to have worked. Got through the iteration.

Thank you so much @AsuMagic and @TParcollet for helping me understand the signal. From the linked issue I was not able to figure out what ‘channels’ represent. So the increase in dimension is basically an increase in the number of channels.

Thanks again! Closing the issue!

AsuMagic commented, Dec 1, 2022

> Plus the values look weird.

If you’re referring to the first and last few values, not really. Those are close to zero, which is about what I’d expect for normalized float audio: it’s just silence here.

> Is there a way I can accommodate them?

The issue linked gives a solution. You could try adding something like this after the read_audio line:

if sig.dim() > 1:
    sig = torch.mean(sig, dim=1)

The shape of a signal in mono is (number_of_samples,).
The shape of a signal in stereo, as can be seen from the printed value here, is (number_of_samples, number_of_channels).

So if the signal tensor has a second dimension, we can assume it’s the number of channels. Taking the mean across channels is a simple and common way to downmix stereo to mono.
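
To make the shape reasoning concrete, here is a minimal, self-contained sketch of the same downmix applied to synthetic tensors. The helper name to_mono and the synthetic signals are assumptions made for illustration; they are not part of SpeechBrain or the recipe. The point is that every signal ends up one-dimensional, so a batch no longer mixes 1-D and 2-D tensors when it is padded.

```python
import torch


def to_mono(sig: torch.Tensor) -> torch.Tensor:
    """Downmix (samples, channels) audio to (samples,) by averaging channels.

    Hypothetical helper: it wraps the two-line fix from the comment above.
    """
    if sig.dim() > 1:
        # Second dimension is assumed to be the channel axis.
        sig = torch.mean(sig, dim=1)
    return sig


# Synthetic stand-ins for what read_audio might return.
mono = torch.zeros(16000)  # shape (16000,)
stereo = torch.stack(
    [torch.ones(16000), -torch.ones(16000)], dim=1
)  # shape (16000, 2)

# Mono input is returned unchanged; stereo input loses its channel axis.
print(to_mono(mono).shape)
print(to_mono(stereo).shape)
```

Both printed shapes are one-dimensional, which is what the padding code expects for every item in the batch.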