[Question] How to fix the exception when using a different wav file: RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]?
Describe your question
I just started learning NeMo for ASR tasks, and I get an exception whenever I send a different wav file to convert to text. Could you please share what pre-processing has to be performed for a wav file or format other than the an4 dataset?
I am trying to send a wav file of less than 20 seconds duration to get the text output from the QuartzNet model. Here is a sample of the code:
files = ['my_sample.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")
After this, I get the exception below.
RuntimeError                              Traceback (most recent call last)
<ipython-input-53-f51e6e675965> in <module>()
      1 files = ['my_sample.wav']
----> 2 for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
      3     print(f"Audio in {fname} was recognized as: {transcription}")

14 frames
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24     def decorate_context(*args, **kwargs):
     25         with self.__class__():
---> 26             return func(*args, **kwargs)
     27     return cast(F, decorate_context)
     28

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in transcribe(self, paths2audio_files, batch_size, logprobs)
    158             for test_batch in temporary_datalayer:
    159                 logits, logits_len, greedy_predictions = self.forward(
--> 160                     input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device)
    161                 )
    162                 if logprobs:

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
    509
    510         # Call the method - this can be forward, or any other callable method
--> 511         outputs = wrapped(*args, **kwargs)
    512
    513         instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in forward(self, input_signal, input_signal_length, processed_signal, processed_signal_length)
    394         if not has_processed_signal:
    395             processed_signal, processed_signal_length = self.preprocessor(
--> 396                 input_signal=input_signal, length=input_signal_length,
    397             )
    398

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
    509
    510         # Call the method - this can be forward, or any other callable method
--> 511         outputs = wrapped(*args, **kwargs)
    512
    513         instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24     def decorate_context(*args, **kwargs):
     25         with self.__class__():
---> 26             return func(*args, **kwargs)
     27     return cast(F, decorate_context)
     28

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in forward(self, input_signal, length)
     77     @torch.no_grad()
     78     def forward(self, input_signal, length):
---> 79         processed_signal, processed_length = self.get_features(input_signal, length)
     80
     81         return processed_signal, processed_length

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in get_features(self, input_signal, length)
    247
    248     def get_features(self, input_signal, length):
--> 249         return self.featurizer(input_signal, length)
    250
    251     @property

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24     def decorate_context(*args, **kwargs):
     25         with self.__class__():
---> 26             return func(*args, **kwargs)
     27     return cast(F, decorate_context)
     28

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in forward(self, x, seq_len)
    345         # disable autocast to get full range of stft values
    346         with torch.cuda.amp.autocast(enabled=False):
--> 347             x = self.stft(x)
    348
    349         # torch returns real, imag; so convert to magnitude
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in <lambda>(x)
    273             win_length=self.win_length,
    274             center=True,
--> 275             window=self.window.to(dtype=torch.float),
    276         )
    277

/usr/local/lib/python3.6/dist-packages/torch/functional.py in stft(input, n_fft, hop_length, win_length, window, center, pad_mode, normalized, onesided, return_complex)
    511         extended_shape = [1] * (3 - signal_dim) + list(input.size())
    512         pad = int(n_fft // 2)
--> 513         input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
    514         input = input.view(input.shape[-signal_dim:])
    515     return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
   3557         assert len(pad) == 2, '3D tensors expect 2 values for padding'
   3558     if mode == 'reflect':
-> 3559         return torch._C._nn.reflection_pad1d(input, pad)
   3560     elif mode == 'replicate':
   3561         return torch._C._nn.replication_pad1d(input, pad)

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]

Environment overview
- Environment location: Google Colab
- Method of NeMo install:
  import nemo
  import nemo.collections.asr as nemo_asr
I had the same error. It was due to my microphone being stereo (2 channels) and 44.1 kHz instead of mono (1 channel) and 16 kHz as required. (That also appears to explain the shape in the error message: the 2-element channel axis ends up being treated as a 2-sample signal, and a reflection pad of 256 cannot be applied to an input of length 2.)
You can check the sample_rate and resample if needed using torchaudio, e.g. along these lines.
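A minimal sketch, assuming torchaudio is installed and a target rate of 16 kHz (the rate QuartzNet models are trained on); 'my_sample.wav' and 'my_sample_16k.wav' are placeholder file names:

import torchaudio

# torchaudio.load returns the waveform as [channels, samples] plus the file's sample rate
waveform, sample_rate = torchaudio.load('my_sample.wav')
print(f"sample_rate={sample_rate}, shape={tuple(waveform.shape)}")

if sample_rate != 16000:
    # resample to the 16 kHz the model expects
    resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
    waveform = resampler(waveform)
    torchaudio.save('my_sample_16k.wav', waveform, 16000)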
For me, this error occurred because the wav file had stereo channels. I needed to convert the file to a single mono channel:
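A minimal sketch of one way to do that, again with torchaudio (file names are placeholders; averaging the two channels is one common down-mix):

import torchaudio

waveform, sample_rate = torchaudio.load('my_sample.wav')  # waveform: [channels, samples]
if waveform.shape[0] > 1:
    # down-mix by averaging the channels into a single mono channel
    waveform = waveform.mean(dim=0, keepdim=True)
torchaudio.save('my_sample_mono.wav', waveform, sample_rate)

After resampling and down-mixing, pass the new file path to quartznet.transcribe() as before.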