Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[Question] How to solve Exception while using another wav file: RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2] ?

See original GitHub issue

Describe your question

I just started learning NeMo for ASR and I get an exception when I send a different wav file to be converted into text. Could you please share what pre-processing has to be performed for wav files/formats other than the an4 dataset?

I am trying to send a wav file of less than 20 seconds duration to get the text output from the QuartzNet model. Here is a sample of the code:

files = ['my_sample.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")
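(For reference, quartznet is not defined in the snippet above; it is presumably a pretrained checkpoint restored roughly as in the NeMo ASR quickstart. A minimal sketch, with the exact model name being an assumption:)

import nemo.collections.asr as nemo_asr

# Hypothetical setup: restore a pretrained English QuartzNet checkpoint.
# The actual checkpoint the reporter used is not shown in the issue.
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")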

After this, I get the exception below.


RuntimeError                              Traceback (most recent call last)
<ipython-input-53-f51e6e675965> in <module>()
      1 files = ['my_sample.wav']
----> 2 for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
      3     print(f"Audio in {fname} was recognized as: {transcription}")

14 frames
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24     def decorate_context(*args, **kwargs):
     25         with self.__class__():
---> 26             return func(*args, **kwargs)
     27     return cast(F, decorate_context)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in transcribe(self, paths2audio_files, batch_size, logprobs)
    158             for test_batch in temporary_datalayer:
    159                 logits, logits_len, greedy_predictions = self.forward(
--> 160                     input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device)
    161                 )
    162                 if logprobs:

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
    509
    510         # Call the method - this can be forward, or any other callable method
--> 511         outputs = wrapped(*args, **kwargs)
    512
    513         instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in forward(self, input_signal, input_signal_length, processed_signal, processed_signal_length)
    394         if not has_processed_signal:
    395             processed_signal, processed_signal_length = self.preprocessor(
--> 396                 input_signal=input_signal, length=input_signal_length,
    397             )

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
    509
    510         # Call the method - this can be forward, or any other callable method
--> 511         outputs = wrapped(*args, **kwargs)
    512
    513         instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24     def decorate_context(*args, **kwargs):
     25         with self.__class__():
---> 26             return func(*args, **kwargs)
     27     return cast(F, decorate_context)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in forward(self, input_signal, length)
     77     @torch.no_grad()
     78     def forward(self, input_signal, length):
---> 79         processed_signal, processed_length = self.get_features(input_signal, length)
     80
     81         return processed_signal, processed_length

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in get_features(self, input_signal, length)
    247
    248     def get_features(self, input_signal, length):
--> 249         return self.featurizer(input_signal, length)
    250
    251     @property

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24     def decorate_context(*args, **kwargs):
     25         with self.__class__():
---> 26             return func(*args, **kwargs)
     27     return cast(F, decorate_context)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in forward(self, x, seq_len)
    345         # disable autocast to get full range of stft values
    346         with torch.cuda.amp.autocast(enabled=False):
--> 347             x = self.stft(x)
    348
    349         # torch returns real, imag; so convert to magnitude

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in <lambda>(x)
    273             win_length=self.win_length,
    274             center=True,
--> 275             window=self.window.to(dtype=torch.float),
    276         )

/usr/local/lib/python3.6/dist-packages/torch/functional.py in stft(input, n_fft, hop_length, win_length, window, center, pad_mode, normalized, onesided, return_complex)
    511         extended_shape = [1] * (3 - signal_dim) + list(input.size())
    512         pad = int(n_fft // 2)
--> 513         input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
    514         input = input.view(input.shape[-signal_dim:])
    515     return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
   3557             assert len(pad) == 2, '3D tensors expect 2 values for padding'
   3558             if mode == 'reflect':
-> 3559                 return torch._C._nn.reflection_pad1d(input, pad)
   3560             elif mode == 'replicate':
   3561                 return torch._C._nn.replication_pad1d(input, pad)

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]

Environment overview (please complete the following information)

  • Environment location: [Bare-metal, Docker, Cloud (specify cloud provider - AWS, Azure, GCP, Colab)] Colab
  • Method of NeMo install: [pip install or from source]. Please specify exact commands you used to install. import nemo; import nemo.collections.asr as nemo_asr
  • If method of install is [Docker], provide docker pull & docker run commands used

Environment details

If NVIDIA docker image is used you don’t need to specify these. Otherwise, please provide:

  • OS version
  • PyTorch version
  • Python version

Additional context

Add any other context about the problem here. Example: GPU model

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

4 reactions
rbracco commented, Oct 13, 2021

I had the same error. It was due to my microphone recording stereo (2-channel) audio at 44.1 kHz instead of mono (1-channel) at 16 kHz as required.

You can check the sample rate and resample if needed using torchaudio:

import torchaudio

y, sr = torchaudio.load('my_sample.wav')
y = y.mean(dim=0, keepdim=True)  # if there are multiple channels, average them down to a single channel
if sr != 16000:
    resampler = torchaudio.transforms.Resample(sr, 16000)
    y = resampler(y)
    sr = 16000
torchaudio.save('my_sample_resampled.wav', y, sr)

files = ['my_sample_resampled.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")
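Note that the converted audio is written back to a wav file because transcribe() takes paths2audio_files (file paths) rather than in-memory tensors, so the resampled signal has to be saved to disk before it can be passed in.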
2 reactions
sheecegardezi commented, Jan 2, 2022

For me this error was generated because the wav file had stereo channels. I needed to convert the file to a single mono channel:

from pydub import AudioSegment

file_path = "input_sound_file.wav"
sound = AudioSegment.from_wav(file_path)
sound = sound.set_channels(1)  # downmix stereo to mono
sound.export(file_path, format="wav")  # overwrite the original file
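If the file's sample rate also differs from the 16 kHz the model expects (see the comment above), pydub can handle that in the same pass. A minimal sketch under that assumption, reusing the same input_sound_file.wav:

from pydub import AudioSegment

file_path = "input_sound_file.wav"
sound = AudioSegment.from_wav(file_path)
sound = sound.set_channels(1)        # downmix to mono, as above
sound = sound.set_frame_rate(16000)  # resample to the 16 kHz the model expects
sound.export(file_path, format="wav")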
