
RuntimeError: Calculated padded input size per channel: (1). Kernel size: (3). Kernel size can't be greater than actual input size

See original GitHub issue

Code sample

nohup python fairseq_cli/hydra_train.py \
  task.data=/datadrive/ASR/training_data \
  model.w2v_path=/datadrive/ASR/model/checkpoint_best.pt \
  --config-path /home/rashwan/ASR/fairseq/examples/wav2vec/config/finetuning \
  --config-name base_100h > /datadrive/ASR/CTC_Model/train.log

Configuration

# @package _group_

common:
  fp16: false
  log_format: json
  log_interval: 200

checkpoint:
  no_epoch_checkpoints: true
  best_checkpoint_metric: wer
  save_dir: /datadrive/ASR/CTC_Model

task:
  _name: audio_pretraining
  data: ???
  normalize: false
  labels: ltr

dataset:
  num_workers: 6
  max_tokens: 750000
  skip_invalid_size_inputs_valid_test: true
  valid_subset: valid

distributed_training:
  ddp_backend: no_c10d
  distributed_world_size: 4

criterion:
  _name: ctc
  zero_infinity: true

optimization:
  max_update: 80000
  lr: [0.00003]
  sentence_avg: true
  update_freq: [4]

optimizer:
  _name: adam
  adam_betas: (0.9,0.98)
  adam_eps: 1e-08

lr_scheduler:
  _name: tri_stage
  phase_ratio: [0.1, 0.4, 0.5]
  final_lr_scale: 0.05

model:
  _name: wav2vec_ctc
  w2v_path: ???
  apply_mask: true
  mask_prob: 0.65
  mask_channel_prob: 0.5
  mask_channel_length: 64
  layerdrop: 0.1
  activation_dropout: 0.1
  feature_grad_mult: 0.0
  freeze_finetune_updates: 0

Error

2020-11-26 11:25:00 | INFO | fairseq_cli.train | task: AudioPretrainingTask
2020-11-26 11:25:00 | INFO | fairseq_cli.train | model: Wav2VecCtc
2020-11-26 11:25:00 | INFO | fairseq_cli.train | criterion: CtcCriterion
2020-11-26 11:25:00 | INFO | fairseq_cli.train | num. model params: 94420159 (num. trained: 94420159)
2020-11-26 11:25:00 | INFO | fairseq.trainer | detected shared parameter: w2v_encoder.w2v_model.feature_extractor.conv_layers.0.0.bias <- w2v_encoder.w2v_model.feature_extractor.conv_layers.1.0.bias
2020-11-26 11:25:00 | INFO | fairseq.trainer | detected shared parameter: w2v_encoder.w2v_model.feature_extractor.conv_layers.0.0.bias <- w2v_encoder.w2v_model.feature_extractor.conv_layers.2.0.bias
2020-11-26 11:25:00 | INFO | fairseq.trainer | detected shared parameter: w2v_encoder.w2v_model.feature_extractor.conv_layers.0.0.bias <- w2v_encoder.w2v_model.feature_extractor.conv_layers.3.0.bias
2020-11-26 11:25:00 | INFO | fairseq.trainer | detected shared parameter: w2v_encoder.w2v_model.feature_extractor.conv_layers.0.0.bias <- w2v_encoder.w2v_model.feature_extractor.conv_layers.4.0.bias
2020-11-26 11:25:00 | INFO | fairseq.trainer | detected shared parameter: w2v_encoder.w2v_model.feature_extractor.conv_layers.0.0.bias <- w2v_encoder.w2v_model.feature_extractor.conv_layers.5.0.bias
2020-11-26 11:25:00 | INFO | fairseq.trainer | detected shared parameter: w2v_encoder.w2v_model.feature_extractor.conv_layers.0.0.bias <- w2v_encoder.w2v_model.feature_extractor.conv_layers.6.0.bias
2020-11-26 11:25:01 | INFO | fairseq.utils | CUDA enviroments for all 4 workers
2020-11-26 11:25:01 | INFO | fairseq.utils | rank 0: capabilities = 3.7 ; total memory = 11.173 GB ; name = Tesla K80
2020-11-26 11:25:01 | INFO | fairseq.utils | rank 1: capabilities = 3.7 ; total memory = 11.173 GB ; name = Tesla K80
2020-11-26 11:25:01 | INFO | fairseq.utils | rank 2: capabilities = 3.7 ; total memory = 11.173 GB ; name = Tesla K80
2020-11-26 11:25:01 | INFO | fairseq.utils | rank 3: capabilities = 3.7 ; total memory = 11.173 GB ; name = Tesla K80
2020-11-26 11:25:01 | INFO | fairseq.utils | CUDA enviroments for all 4 workers
2020-11-26 11:25:01 | INFO | fairseq_cli.train | training on 4 devices (GPUs/TPUs)
2020-11-26 11:25:01 | INFO | fairseq_cli.train | max tokens per GPU = 750000 and batch size per GPU = None
2020-11-26 11:25:01 | INFO | fairseq.trainer | no existing checkpoint found /datadrive/ASR/CTC_Model/checkpoint_last.pt
2020-11-26 11:25:01 | INFO | fairseq.trainer | loading train data for epoch 1
2020-11-26 11:25:01 | INFO | fairseq.data.audio.raw_audio_dataset | loaded 460997, skipped 0 samples
2020-11-26 11:25:02 | INFO | fairseq.optim.adam | using FusedAdam
2020-11-26 11:25:02 | INFO | fairseq.trainer | begin training epoch 1
2020-11-26 11:40:34 | INFO | train_inner | {"epoch": 1, "update": 0.036, "loss": "2264.27", "ntokens": "6405.06", "nsentences": "59.175", "nll_loss": "20.919", "wps": "1401.2", "ups": "0.22", "wpb": "6405.1", "bsz": "59.2", "num_updates": "200", "lr": "1.0425e-06", "gnorm": "8029.41", "train_wall": "917", "wall": "934"}
Traceback (most recent call last):
  File "fairseq_cli/hydra_train.py", line 70, in <module>
    cli_main()
  File "fairseq_cli/hydra_train.py", line 66, in cli_main
    hydra_main()
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/hydra/main.py", line 37, in decorated_main
    strict=strict,
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/hydra/_internal/utils.py", line 347, in _run_hydra
    lambda: hydra.run(
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/hydra/_internal/utils.py", line 350, in <lambda>
    overrides=args.overrides,
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 112, in run
    configure_logging=with_log_configuration,
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/hydra/core/utils.py", line 125, in run_job
    ret.return_value = task_function(task_cfg)
  File "fairseq_cli/hydra_train.py", line 38, in hydra_main
    distributed_utils.call_main(cfg, pre_main)
  File "/home/rashwan/ASR/fairseq/fairseq/distributed_utils.py", line 318, in call_main
    cfg.distributed_training.distributed_world_size,
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/rashwan/ASR/fairseq/fairseq/distributed_utils.py", line 300, in distributed_main
    main(cfg, **kwargs)
  File "/home/rashwan/ASR/fairseq/fairseq_cli/train.py", line 130, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/anaconda/envs/ASR/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/rashwan/ASR/fairseq/fairseq_cli/train.py", line 219, in train
    log_output = trainer.train_step(samples)
  File "/anaconda/envs/ASR/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/rashwan/ASR/fairseq/fairseq/trainer.py", line 572, in train_step
    raise e
  File "/home/rashwan/ASR/fairseq/fairseq/trainer.py", line 546, in train_step
    ignore_grad=is_dummy_batch,
  File "/home/rashwan/ASR/fairseq/fairseq/tasks/fairseq_task.py", line 428, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/rashwan/ASR/fairseq/fairseq/criterions/ctc.py", line 106, in forward
    net_output = model(**sample["net_input"])
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/rashwan/ASR/fairseq/fairseq/legacy_distributed_data_parallel.py", line 83, in forward
    return self.module(*inputs, **kwargs)
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/rashwan/ASR/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 160, in forward
    x = self.w2v_encoder(**kwargs)
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/rashwan/ASR/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 339, in forward
    x, padding_mask = self.w2v_model.extract_features(**w2v_args)
  File "/home/rashwan/ASR/fairseq/fairseq/models/wav2vec/wav2vec2.py", line 570, in extract_features
    res = self.forward(source, padding_mask, mask=mask, features_only=True)
  File "/home/rashwan/ASR/fairseq/fairseq/models/wav2vec/wav2vec2.py", line 454, in forward
    features = self.feature_extractor(source)
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/rashwan/ASR/fairseq/fairseq/models/wav2vec/wav2vec2.py", line 680, in forward
    x = conv(x)
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/anaconda/envs/ASR/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 257, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Calculated padded input size per channel: (1). Kernel size: (3). Kernel size can't be greater than actual input size
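For context on why the failure surfaces inside the feature extractor: wav2vec 2.0's convolutional front end downsamples raw audio by roughly a factor of 320, so a clip of only a few hundred samples (a few tens of milliseconds) collapses to a single frame before reaching one of its kernel-3 convolutions. A minimal sketch that reproduces the same error (the layer dimensions here are illustrative, not fairseq's actual code):

import torch
import torch.nn as nn

# Illustrative only, not fairseq code: a kernel-3 Conv1d applied to a
# feature map that has already collapsed to length 1.
conv = nn.Conv1d(in_channels=512, out_channels=512, kernel_size=3)
x = torch.randn(1, 512, 1)  # (batch, channels, length)
conv(x)  # RuntimeError: Calculated padded input size per channel: (1). Kernel size: (3). ...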

Environment

  • fairseq Version: master
  • PyTorch Version: 1.6.0
  • OS (e.g., Linux): Ubuntu
  • How you installed fairseq (pip, source): pip install . (from source)
  • Build command you used (if compiling from source):
  • Python version: 3.7.3
  • CUDA/cuDNN version: 10.1
  • GPU models and configuration: 4x Tesla K80 GPUs with 12 GB of memory each
  • Any other relevant information:

Additional context

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
alexeib commented, Dec 1, 2020

This means some of your files are too short. You can set task.min_sample_size to something reasonable (e.g. 32000, i.e. 2 s at a 16 kHz sample rate) to filter out examples that are too short, but then you need to change the code to also skip those examples when loading labels (since you get this error during fine-tuning). Alternatively, you can change the code to pad those examples instead of skipping them.

I'll look into adding some guard rails.
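For reference, one way to apply that filtering offline, without patching the dataset code, is to rewrite the manifest and label files together so they stay aligned. A rough sketch, assuming the wav2vec manifest convention (first line is the audio root directory, then one path<TAB>num_samples entry per line) and hypothetical train.tsv/train.ltr file names:

min_sample_size = 32000  # 2 s at 16 kHz, per the suggestion above

with open("train.tsv") as f_tsv, open("train.ltr") as f_ltr:
    root = f_tsv.readline()  # first manifest line is the audio root directory
    pairs = list(zip(f_tsv, f_ltr))  # manifest line + matching label line

# Keep only examples whose sample count meets the threshold.
kept = [(t, l) for t, l in pairs
        if int(t.rstrip("\n").split("\t")[1]) >= min_sample_size]

with open("train_filtered.tsv", "w") as f_tsv, open("train_filtered.ltr", "w") as f_ltr:
    f_tsv.write(root)
    for t, l in kept:
        f_tsv.write(t)
        f_ltr.write(l)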

0 reactions
alexeib commented, Dec 8, 2020

During training, we compute raw WER (aka "viterbi") inside the CTC criterion. Optionally, if you provide "wer_args", we can also evaluate using a KenLM language model.

If you see strange things happening with WER computation, you might want to print out a few examples. For instance, I think by default "post_process" is set to "letter", which assumes your targets are letters with | as a word boundary.
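To make that concrete, this is roughly what letter-style post-processing does (a sketch of the behavior, not fairseq's implementation): space-separated letter targets are joined back up, and | becomes a word boundary.

def post_process_letter(sentence):
    # "H E L L O | W O R L D |" -> "HELLO WORLD"
    return sentence.replace(" ", "").replace("|", " ").strip()

print(post_process_letter("H E L L O | W O R L D |"))  # HELLO WORLD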

Read more comments on GitHub >

Top Results From Across the Web

Kernel size can't greater than actual input size - PyTorch Forums
Hi, I'm getting the following error. RuntimeError: Calculated padded input size per channel: (1017 x 1). Kernel size: (3 x 3). Kernel size...

Why am I getting calculated padding input size per channel ...
My input is a 6x6 numpy array and I get the following error, any idea why? RuntimeError: Calculated padded input size per channel:...

Kernel size can't be greater than actual input size #362 - GitHub
When I try to transcribe audio, it throws RuntimeError: Calculated padded input size per channel: (61 x 6). Kernel size: (21 x 11)....

convolutional_networks - Washington
Note that the receptive field may be larger than the actual size of the input. In practice, our convolutional layers are composed of...

Layers — NVIDIA TensorRT Standard Python API ...
Default: 1. kernel – Weights The kernel weights for the convolution. ... This layer expects an input tensor of three or more non-batch...
