Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[Question] How to solve Exception while using another wav file: RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2] ?

See original GitHub issue

Describe your question

I just started learning NeMo for ASR and I get an exception when I send a different wav file to be converted into text. Could you please share what pre-processing has to be performed for wav files/formats other than the an4 dataset?

I am trying to send a wav file of less than 20 seconds duration to get the text output from the QuartzNet model. Here is a sample of the code:

files = ['my_sample.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")
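(For reference, quartznet is not defined in the snippet above; it is presumably a pretrained checkpoint restored roughly as in the NeMo ASR quickstart. A minimal sketch, with the exact model name being an assumption:)

import nemo.collections.asr as nemo_asr

# Hypothetical setup: restore a pretrained English QuartzNet checkpoint.
# The actual checkpoint the reporter used is not shown in the issue.
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")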

After this, I get the exception below.


RuntimeError                              Traceback (most recent call last)
<ipython-input-53-f51e6e675965> in <module>()
      1 files = ['my_sample.wav']
----> 2 for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
      3     print(f"Audio in {fname} was recognized as: {transcription}")

14 frames
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24     def decorate_context(*args, **kwargs):
     25         with self.__class__():
---> 26             return func(*args, **kwargs)
     27     return cast(F, decorate_context)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in transcribe(self, paths2audio_files, batch_size, logprobs)
    158             for test_batch in temporary_datalayer:
    159                 logits, logits_len, greedy_predictions = self.forward(
--> 160                     input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device)
    161                 )
    162                 if logprobs:

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
    509
    510         # Call the method - this can be forward, or any other callable method
--> 511         outputs = wrapped(*args, **kwargs)
    512
    513         instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in forward(self, input_signal, input_signal_length, processed_signal, processed_signal_length)
    394         if not has_processed_signal:
    395             processed_signal, processed_signal_length = self.preprocessor(
--> 396                 input_signal=input_signal, length=input_signal_length,
    397             )

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
    509
    510         # Call the method - this can be forward, or any other callable method
--> 511         outputs = wrapped(*args, **kwargs)
    512
    513         instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24     def decorate_context(*args, **kwargs):
     25         with self.__class__():
---> 26             return func(*args, **kwargs)
     27     return cast(F, decorate_context)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in forward(self, input_signal, length)
     77     @torch.no_grad()
     78     def forward(self, input_signal, length):
---> 79         processed_signal, processed_length = self.get_features(input_signal, length)
     80
     81         return processed_signal, processed_length

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in get_features(self, input_signal, length)
    247
    248     def get_features(self, input_signal, length):
--> 249         return self.featurizer(input_signal, length)
    250
    251     @property

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24     def decorate_context(*args, **kwargs):
     25         with self.__class__():
---> 26             return func(*args, **kwargs)
     27     return cast(F, decorate_context)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in forward(self, x, seq_len)
    345         # disable autocast to get full range of stft values
    346         with torch.cuda.amp.autocast(enabled=False):
--> 347             x = self.stft(x)
    348
    349         # torch returns real, imag; so convert to magnitude

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in <lambda>(x)
    273             win_length=self.win_length,
    274             center=True,
--> 275             window=self.window.to(dtype=torch.float),
    276         )

/usr/local/lib/python3.6/dist-packages/torch/functional.py in stft(input, n_fft, hop_length, win_length, window, center, pad_mode, normalized, onesided, return_complex)
    511         extended_shape = [1] * (3 - signal_dim) + list(input.size())
    512         pad = int(n_fft // 2)
--> 513         input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
    514         input = input.view(input.shape[-signal_dim:])
    515     return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
   3557             assert len(pad) == 2, '3D tensors expect 2 values for padding'
   3558             if mode == 'reflect':
-> 3559                 return torch._C._nn.reflection_pad1d(input, pad)
   3560             elif mode == 'replicate':
   3561                 return torch._C._nn.replication_pad1d(input, pad)

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]

Environment overview (please complete the following information)

  • Environment location: [Bare-metal, Docker, Cloud (specify cloud provider - AWS, Azure, GCP, Colab)] Colab
  • Method of NeMo install: [pip install or from source]. Please specify exact commands you used to install. import nemo; import nemo.collections.asr as nemo_asr
  • If method of install is [Docker], provide docker pull & docker run commands used

Environment details

If NVIDIA docker image is used you don’t need to specify these. Otherwise, please provide:

  • OS version
  • PyTorch version
  • Python version

Additional context

Add any other context about the problem here. Example: GPU model

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

4 reactions
rbracco commented, Oct 13, 2021

I had the same error. It was due to my microphone recording stereo (2-channel) audio at 44.1 kHz instead of mono (1-channel) at 16 kHz as required.

You can check the sample rate and resample if needed using torchaudio:

import torchaudio

y, sr = torchaudio.load('my_sample.wav')
y = y.mean(dim=0, keepdim=True)  # if there are multiple channels, average them down to a single channel
if sr != 16000:
    resampler = torchaudio.transforms.Resample(sr, 16000)
    y = resampler(y)
    sr = 16000
torchaudio.save('my_sample_resampled.wav', y, sr)

files = ['my_sample_resampled.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")
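Note that the converted audio is written back to a wav file because transcribe() takes paths2audio_files (file paths) rather than in-memory tensors, so the resampled signal has to be saved to disk before it can be passed in.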
2 reactions
sheecegardezi commented, Jan 2, 2022

For me this error was generated because the wav file had stereo channels. I needed to convert the file to a single mono channel:

from pydub import AudioSegment

file_path = "input_sound_file.wav"
sound = AudioSegment.from_wav(file_path)
sound = sound.set_channels(1)  # downmix stereo to mono
sound.export(file_path, format="wav")  # overwrite the original file
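If the file's sample rate also differs from the 16 kHz the model expects (see the comment above), pydub can handle that in the same pass. A minimal sketch under that assumption, reusing the same input_sound_file.wav:

from pydub import AudioSegment

file_path = "input_sound_file.wav"
sound = AudioSegment.from_wav(file_path)
sound = sound.set_channels(1)        # downmix to mono, as above
sound = sound.set_frame_rate(16000)  # resample to the 16 kHz the model expects
sound.export(file_path, format="wav")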
