Wav2vec2_s2s model fine-tuning error: TypeError: forward() missing 1 required positional argument: 'prev_output_tokens'
See original GitHub issue

🐛 Bug
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
- Run cmd:
  #!/bin/bash
  env HYDRA_FULL_ERROR=1 \
  CUDA_VISIBLE_DEVICES=0,1,2,3 \
  python3 /opt/tiger/fairseq/fairseq_cli/hydra_train.py \
    task.data=/mnt/bd/liulu-asr-data/mls_spanish \
    model.w2v_path=/opt/tiger/fairseq/examples/wav2vec/outputs/xlsr_53_56k.pt \
    dataset.train_subset='train_100h' dataset.valid_subset='valid' \
    checkpoint.patience=-1 \
    checkpoint.save_dir='spanish_100h_s2s' \
    checkpoint.maximize_best_checkpoint_metric=False \
    distributed_training.distributed_world_size=4 optimization.update_freq='[6]' \
    --config-dir /opt/tiger/fairseq/examples/wav2vec/config/finetuning/ \
    --config-name vox_100h
- See error
2021-01-27 00:45:20 | INFO | fairseq.distributed_utils | distributed init (rank 1): tcp://localhost:15491
2021-01-27 00:45:20 | INFO | fairseq.distributed_utils | distributed init (rank 0): tcp://localhost:15491
2021-01-27 00:45:20 | INFO | fairseq.distributed_utils | distributed init (rank 3): tcp://localhost:15491
2021-01-27 00:45:20 | INFO | fairseq.distributed_utils | distributed init (rank 2): tcp://localhost:15491
2021-01-27 00:45:27 | INFO | fairseq.distributed_utils | initialized host n147-194-033 as rank 1
2021-01-27 00:45:27 | INFO | fairseq.distributed_utils | initialized host n147-194-033 as rank 0
2021-01-27 00:45:27 | INFO | fairseq.distributed_utils | initialized host n147-194-033 as rank 2
2021-01-27 00:45:27 | INFO | fairseq.distributed_utils | initialized host n147-194-033 as rank 3
2021-01-27 00:45:27 | INFO | fairseq_cli.train | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 200, 'log_format': 'json', 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': True, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': True}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 4, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': 'tcp://localhost:15491', 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'no_c10d', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'fast_stat_sync': False, 'broadcast_buffers': False, 'distributed_wrapper': 'DDP', 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 4, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'tpu': False, 'distributed_num_procs': 4}, 'dataset': {'_name': None, 'num_workers': 6, 'skip_invalid_size_inputs_valid_test': True, 'max_tokens': None, 'batch_size': 32, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train_100h', 'valid_subset': 'valid', 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': None, 'batch_size_valid': 32, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 80000, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': True, 'update_freq': [6], 'lr': [3e-05], 'stop_min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': 'spanish_100h_s2s', 'restore_file': 'checkpoint_last.pt', 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': True, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'wer', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'model_parallel_size': 1, 'distributed_rank': 0}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 4}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': False, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'wav2vec_seq2seq', 'w2v_path': '/opt/tiger/fairseq/examples/wav2vec/outputs/xlsr_53_56k.pt', 'apply_mask': True, 'mask_prob': 0.5, 'mask_channel_prob': 0.5, 'mask_channel_length': 64, 'layerdrop': 0.1, 'activation_dropout': 0.1, 'feature_grad_mult': 0.0, 'freeze_finetune_updates': 10000}, 'task': {'_name': 'audio_pretraining', 'data': '/mnt/bd/liulu-asr-data/mls_spanish', 'normalize': True, 'labels': 'ltr'}, 'criterion': {'_name': 'cross_entropy'}, 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9,0.98)', 'adam_eps': 1e-08}, 'lr_scheduler': {'_name': 'tri_stage', 'phase_ratio': [0.1, 0.4, 0.5], 'final_lr_scale': 0.05}, 'scoring': None, 'bpe': None, 'tokenizer': None}
2021-01-27 00:45:27 | INFO | fairseq.data.audio.raw_audio_dataset | loaded 2408, skipped 0 samples
2021-01-27 00:45:38 | INFO | fairseq_cli.train | task: AudioPretrainingTask
2021-01-27 00:45:38 | INFO | fairseq_cli.train | model: Wav2Vec2Seq2SeqModel
2021-01-27 00:45:38 | INFO | fairseq_cli.train | criterion: CrossEntropyCriterion
2021-01-27 00:45:38 | INFO | fairseq_cli.train | num. model params: 372207744 (num. trained: 372207744)
2021-01-27 00:45:39 | INFO | fairseq.utils | CUDA enviroments for all 4 workers
2021-01-27 00:45:39 | INFO | fairseq.utils | rank 0: capabilities = 7.0 ; total memory = 31.719 GB ; name = Tesla V100-SXM2-32GB
2021-01-27 00:45:39 | INFO | fairseq.utils | rank 1: capabilities = 7.0 ; total memory = 31.719 GB ; name = Tesla V100-SXM2-32GB
2021-01-27 00:45:39 | INFO | fairseq.utils | rank 2: capabilities = 7.0 ; total memory = 31.719 GB ; name = Tesla V100-SXM2-32GB
2021-01-27 00:45:39 | INFO | fairseq.utils | rank 3: capabilities = 7.0 ; total memory = 31.719 GB ; name = Tesla V100-SXM2-32GB
2021-01-27 00:45:39 | INFO | fairseq.utils | CUDA enviroments for all 4 workers
2021-01-27 00:45:39 | INFO | fairseq_cli.train | training on 4 devices (GPUs/TPUs)
2021-01-27 00:45:39 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 32
2021-01-27 00:45:39 | INFO | fairseq.trainer | no existing checkpoint found spanish_100h_s2s/checkpoint_last.pt
2021-01-27 00:45:39 | INFO | fairseq.trainer | loading train data for epoch 1
2021-01-27 00:45:39 | INFO | fairseq.data.audio.raw_audio_dataset | loaded 24103, skipped 0 samples
2021-01-27 00:45:40 | INFO | fairseq.trainer | begin training epoch 1
Traceback (most recent call last):
  File "/opt/tiger/fairseq/fairseq_cli/hydra_train.py", line 70, in <module>
    cli_main()
  File "/opt/tiger/fairseq/fairseq_cli/hydra_train.py", line 66, in cli_main
    hydra_main()
  File "/usr/local/lib/python3.7/dist-packages/hydra/main.py", line 37, in decorated_main
    strict=strict,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 347, in _run_hydra
    lambda: hydra.run(
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 350, in <lambda>
    overrides=args.overrides,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/hydra.py", line 112, in run
    configure_logging=with_log_configuration,
  File "/usr/local/lib/python3.7/dist-packages/hydra/core/utils.py", line 125, in run_job
    ret.return_value = task_function(task_cfg)
  File "/opt/tiger/fairseq/fairseq_cli/hydra_train.py", line 38, in hydra_main
    distributed_utils.call_main(cfg, pre_main)
  File "/opt/tiger/fairseq/fairseq/distributed_utils.py", line 320, in call_main
    cfg.distributed_training.distributed_world_size,
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/opt/tiger/fairseq/fairseq/distributed_utils.py", line 302, in distributed_main
    main(cfg, **kwargs)
  File "/opt/tiger/fairseq/fairseq_cli/train.py", line 138, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/usr/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/opt/tiger/fairseq/fairseq_cli/train.py", line 235, in train
    log_output = trainer.train_step(samples)
  File "/usr/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/opt/tiger/fairseq/fairseq/trainer.py", line 536, in train_step
    ignore_grad=is_dummy_batch,
  File "/opt/tiger/fairseq/fairseq/tasks/fairseq_task.py", line 428, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/tiger/fairseq/fairseq/criterions/cross_entropy.py", line 35, in forward
    net_output = model(**sample['net_input'])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/tiger/fairseq/fairseq/legacy_distributed_data_parallel.py", line 83, in forward
    return self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/tiger/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 249, in forward
    decoder_out = self.decoder(encoder_out=encoder_out, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'prev_output_tokens'
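The traceback shows the cross_entropy criterion unpacking `sample['net_input']` directly into the model. Batches built for CTC-style wav2vec fine-tuning appear to carry only the raw audio (`source`, `padding_mask`) in `net_input`, so the autoregressive decoder of `wav2vec_seq2seq` never receives `prev_output_tokens` and the call fails. The sketch below is a minimal, hypothetical illustration of that mismatch (the `Seq2SeqStub` class and toy tensors are not fairseq code; only the argument names match the error):

```python
import torch


class Seq2SeqStub(torch.nn.Module):
    """Hypothetical stand-in for Wav2Vec2Seq2SeqModel: besides the audio,
    the decoder needs prev_output_tokens (teacher-forcing input)."""

    def forward(self, source, padding_mask, prev_output_tokens):
        # encoder would consume the waveform, decoder the shifted targets
        return source.shape, prev_output_tokens.shape


model = Seq2SeqStub()

# Shape of a CTC-style batch: no prev_output_tokens in net_input.
net_input = {
    "source": torch.randn(2, 16000),              # raw waveform
    "padding_mask": torch.zeros(2, 16000).bool(),  # no padding in this toy batch
}

try:
    model(**net_input)
except TypeError as e:
    print(e)  # forward() missing 1 required positional argument: 'prev_output_tokens'
```

In other words, the reported config (seq2seq model with the plain cross_entropy criterion on a CTC-oriented recipe such as vox_100h) pairs a decoder that expects teacher-forcing tokens with a data pipeline that does not produce them; a seq2seq fine-tuning setup needs a criterion/dataset combination that adds `prev_output_tokens` to `net_input`.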
Code sample
Expected behavior
Environment
- fairseq Version (e.g., 1.0 or master): master
- PyTorch Version (e.g., 1.0): 1.0
- OS (e.g., Linux): Linux
- How you installed fairseq (pip, source): pip
- Build command you used (if compiling from source): pip install -e .
- Python version: 3.7
- CUDA/cuDNN version: 11.0
- GPU models and configuration: Tesla V100 8 GPU
- Any other relevant information: no
Additional context
no
Issue Analytics
- State:
- Created 3 years ago
- Comments: 10 (1 by maintainers)
Top GitHub Comments
Same problem when running inference with the wav2vec_seq2seq model.
The decoder needs prev_output_tokens: the target token from timestep t-1 is fed as input to the decoder at timestep t (teacher forcing during training):
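In fairseq-style pipelines this teacher-forcing input is usually built by the collater, e.g. `data_utils.collate_tokens(..., move_eos_to_beginning=True)`. The helper below is a standalone sketch of the same idea, assuming the fairseq convention that the decoder input starts with EOS; the function name and the pad/eos indices are illustrative, not taken from the issue:

```python
import torch


def make_prev_output_tokens(targets, eos_idx, pad_idx):
    """Build the teacher-forcing decoder input: each target sequence is
    shifted right by one position, with EOS moved to the front, then
    padded to the batch max length."""
    max_len = max(len(t) for t in targets)
    prev = torch.full((len(targets), max_len), pad_idx, dtype=torch.long)
    for i, t in enumerate(targets):
        prev[i, 0] = eos_idx          # decoder starts from EOS
        prev[i, 1:len(t)] = t[:-1]    # drop the final EOS, shift right
    return prev


# toy example: two letter sequences ending in EOS (=2), pad index 1
targets = [torch.tensor([7, 8, 9, 2]), torch.tensor([5, 6, 2])]
print(make_prev_output_tokens(targets, eos_idx=2, pad_idx=1))
# tensor([[2, 7, 8, 9],
#         [2, 5, 6, 1]])
```

With such a tensor added to `net_input['prev_output_tokens']`, the decoder at timestep t sees the gold token from timestep t-1, which is exactly the argument the error message says is missing.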