local_rank error
I used distributed training and followed the instructions here: https://fairseq.readthedocs.io/en/latest/getting_started.html#distributed-training.
However, I got a local rank argument error:
```
usage: fairseq-train [-h] [--no-progress-bar] [--log-interval LOG_INTERVAL] [--log-format {json,none,simple,tqdm}] [--tensorboard-logdir TENSORBOARD_LOGDIR] [--wandb-project WANDB_PROJECT] [--seed SEED] [--cpu] [--tpu] [--bf16] [--memory-efficient-bf16] [--fp16] [--memory-efficient-fp16] [--fp16-no-flatten-grads] [--fp16-init-scale FP16_INIT_SCALE] [--fp16-scale-window FP16_SCALE_WINDOW] [--fp16-scale-tolerance FP16_SCALE_TOLERANCE] [--min-loss-scale MIN_LOSS_SCALE] [--threshold-loss-scale THRESHOLD_LOSS_SCALE] [--user-dir USER_DIR] [--empty-cache-freq EMPTY_CACHE_FREQ] [--all-gather-list-size ALL_GATHER_LIST_SIZE] [--model-parallel-size MODEL_PARALLEL_SIZE] [--quantization-config-path QUANTIZATION_CONFIG_PATH] [--profile] [--tokenizer {nltk,space,moses}] [--bpe {bert,bytes,hf_byte_bpe,characters,fastbpe,sentencepiece,subword_nmt,gpt2,byte_bpe}] [--criterion {wav2vec,cross_entropy,nat_loss,sentence_prediction,composite_loss,legacy_masked_lm_loss,label_smoothed_cross_entropy,label_smoothed_cross_entropy_with_alignment,adaptive_loss,sentence_ranking,ctc,masked_lm,vocab_parallel_cross_entropy}] [--optimizer {adagrad,adamax,adadelta,sgd,adafactor,lamb,nag,adam}] [--lr-scheduler {fixed,inverse_sqrt,tri_stage,triangular,cosine,reduce_lr_on_plateau,polynomial_decay}] [--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK] [--num-workers NUM_WORKERS] [--skip-invalid-size-inputs-valid-test] [--max-tokens MAX_TOKENS] [--batch-size BATCH_SIZE] [--required-batch-size-multiple REQUIRED_BATCH_SIZE_MULTIPLE] [--required-seq-len-multiple REQUIRED_SEQ_LEN_MULTIPLE] [--dataset-impl {raw,lazy,cached,mmap,fasta}] [--data-buffer-size DATA_BUFFER_SIZE] [--train-subset TRAIN_SUBSET] [--valid-subset VALID_SUBSET] [--validate-interval VALIDATE_INTERVAL] [--validate-interval-updates VALIDATE_INTERVAL_UPDATES] [--validate-after-updates VALIDATE_AFTER_UPDATES] [--fixed-validation-seed FIXED_VALIDATION_SEED] [--disable-validation] [--max-tokens-valid MAX_TOKENS_VALID] [--batch-size-valid BATCH_SIZE_VALID] [--curriculum CURRICULUM] [--gen-subset GEN_SUBSET] [--num-shards NUM_SHARDS] [--shard-id SHARD_ID] [--distributed-world-size DISTRIBUTED_WORLD_SIZE] [--distributed-rank DISTRIBUTED_RANK] [--distributed-backend DISTRIBUTED_BACKEND] [--distributed-init-method DISTRIBUTED_INIT_METHOD] [--distributed-port DISTRIBUTED_PORT] [--device-id DEVICE_ID] [--local-rank LOCAL_RANK] [--distributed-no-spawn] [--ddp-backend {c10d,no_c10d}] [--bucket-cap-mb BUCKET_CAP_MB] [--fix-batches-to-gpus] [--find-unused-parameters] [--fast-stat-sync] [--broadcast-buffers] [--distributed-wrapper {DDP,SlowMo}] [--slowmo-momentum SLOWMO_MOMENTUM] [--slowmo-algorithm SLOWMO_ALGORITHM] [--localsgd-frequency LOCALSGD_FREQUENCY] [--nprocs-per-node NPROCS_PER_NODE] [--pipeline-model-parallel] [--pipeline-balance PIPELINE_BALANCE] [--pipeline-devices PIPELINE_DEVICES] [--pipeline-chunks PIPELINE_CHUNKS] [--pipeline-encoder-balance PIPELINE_ENCODER_BALANCE] [--pipeline-encoder-devices PIPELINE_ENCODER_DEVICES] [--pipeline-decoder-balance PIPELINE_DECODER_BALANCE] [--pipeline-decoder-devices PIPELINE_DECODER_DEVICES] [--pipeline-checkpoint {always,never,except_last}] [--zero-sharding {none,os}] [--arch ARCH] [--max-epoch MAX_EPOCH] [--max-update MAX_UPDATE] [--stop-time-hours STOP_TIME_HOURS] [--clip-norm CLIP_NORM] [--sentence-avg] [--update-freq UPDATE_FREQ] [--lr LR] [--min-lr MIN_LR] [--use-bmuf] [--save-dir SAVE_DIR] [--restore-file RESTORE_FILE] [--finetune-from-model FINETUNE_FROM_MODEL] [--reset-dataloader] [--reset-lr-scheduler] [--reset-meters] [--reset-optimizer] [--optimizer-overrides OPTIMIZER_OVERRIDES] [--save-interval SAVE_INTERVAL]
[--save-interval-updates SAVE_INTERVAL_UPDATES] [--keep-interval-updates KEEP_INTERVAL_UPDATES] [--keep-last-epochs KEEP_LAST_EPOCHS] [--keep-best-checkpoints KEEP_BEST_CHECKPOINTS] [--no-save] [--no-epoch-checkpoints] [--no-last-checkpoints] [--no-save-optimizer-state] [--best-checkpoint-metric BEST_CHECKPOINT_METRIC] [--maximize-best-checkpoint-metric] [--patience PATIENCE] [--checkpoint-suffix CHECKPOINT_SUFFIX] [--checkpoint-shard-count CHECKPOINT_SHARD_COUNT] [--activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}] [--dropout D] [--attention-dropout D] [--activation-dropout D] [--encoder-embed-path STR] [--encoder-embed-dim N] [--encoder-ffn-embed-dim N] [--encoder-layers N] [--encoder-attention-heads N] [--encoder-normalize-before] [--encoder-learned-pos] [--decoder-embed-path STR] [--decoder-embed-dim N] [--decoder-ffn-embed-dim N] [--decoder-layers N] [--decoder-attention-heads N] [--decoder-learned-pos] [--decoder-normalize-before] [--decoder-output-dim N] [--share-decoder-input-output-embed] [--share-all-embeddings] [--no-token-positional-embeddings] [--adaptive-softmax-cutoff EXPR] [--adaptive-softmax-dropout D] [--layernorm-embedding] [--no-scale-embedding] [--checkpoint-activations] [--no-cross-attention] [--cross-self-attention] [--encoder-layerdrop D] [--decoder-layerdrop D] [--encoder-layers-to-keep ENCODER_LAYERS_TO_KEEP] [--decoder-layers-to-keep DECODER_LAYERS_TO_KEEP] [--quant-noise-pq D] [--quant-noise-pq-block-size D] [--quant-noise-scalar D] [--pooler-dropout D] [--pooler-activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}] [--spectral-norm-classification-head] [-s SRC] [-t TARGET] [--load-alignments] [--left-pad-source BOOL] [--left-pad-target BOOL] [--max-source-positions N] [--max-target-positions N] [--upsample-primary UPSAMPLE_PRIMARY] [--truncate-source] [--num-batch-buckets N] [--eval-bleu] [--eval-bleu-detok EVAL_BLEU_DETOK] [--eval-bleu-detok-args JSON] [--eval-tokenized-bleu] [--eval-bleu-remove-bpe [EVAL_BLEU_REMOVE_BPE]] [--eval-bleu-args JSON] [--eval-bleu-print-samples] [--label-smoothing D] [--report-accuracy] [--ignore-prefix-size IGNORE_PREFIX_SIZE] [--adam-betas ADAM_BETAS] [--adam-eps ADAM_EPS] [--weight-decay WEIGHT_DECAY] [--use-old-adam] [--force-anneal N] [--warmup-updates N] [--end-learning-rate END_LEARNING_RATE] [--power POWER] [--total-num-update TOTAL_NUM_UPDATE] [--pad PAD] [--eos EOS] [--unk UNK] data
fairseq-train: error: unrecognized arguments: --local_rank=3
```
It seems that fairseq expects --local-rank, but in practice torch.distributed.launch passes --local_rank.
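For context, this is standard argparse behavior: underscores and hyphens in long option names are not interchangeable. A minimal, illustrative repro (fairseq's real parser registers far more options than this):

```python
# repro.py -- illustrative only; mimics how fairseq registers the flag.
import argparse

parser = argparse.ArgumentParser(prog="fairseq-train")
parser.add_argument("--local-rank", type=int, default=0)  # hyphenated, as in fairseq

print(parser.parse_args(["--local-rank=3"]))  # OK: Namespace(local_rank=3)
parser.parse_args(["--local_rank=3"])         # exits: unrecognized arguments: --local_rank=3
```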
Is there any solution to this?
Thanks.
What’s your environment?
- fairseq Version: master (Nov 4, 2020)
- PyTorch Version: 1.6
- OS (e.g., Linux): Linux
- How you installed fairseq (pip, source): pip
- Build command you used (if compiling from source): pip install
- Python version: 3.6
Top GitHub Comments
I encountered the same problem when using fairseq-hydra-train to pretrain a wav2vec 2.0 model:

```
fairseq-hydra-train: error: unrecognized arguments: --local_rank=0
```

Here is the command:

```
python -m torch.distributed.launch --nproc_per_node=1 \
    --nnodes=2 --node_rank=0 --master_addr="192.168.24.42" \
    --master_port=12345 \
    ./fairseq-hydra-train task.data=my_data_set \
    --config-dir ./fairseq-main/examples/wav2vec/config/pretraining \
    --config-name my_config
```
Could you give some advice on how I can use fairseq-hydra-train to train on multiple nodes? Extremely grateful.

Ah yeah, python -m torch.distributed.launch will only populate --local_rank (with an underscore): https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py#L268

@alexeib, can we add an alias?
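For what it's worth, such an alias would be a one-line change, since argparse accepts several option strings for the same argument. A sketch of the idea, not the actual fairseq patch:

```python
# Registering both spellings makes either flag parse into args.local_rank.
parser.add_argument(
    "--local-rank", "--local_rank",
    type=int, default=0,
    help="rank of this worker on the local node",
)
```

Until something like that lands, a user-side workaround is a tiny launcher shim that rewrites the flag before handing control to fairseq. This is a sketch under the assumption that fairseq_cli.train.cli_main is the function behind the fairseq-train console script (true for fairseq of this vintage); local_rank_shim.py is a made-up file name:

```python
# local_rank_shim.py -- hypothetical workaround, not an official fairseq fix.
import sys

from fairseq_cli.train import cli_main  # entry point behind `fairseq-train`

if __name__ == "__main__":
    # torch.distributed.launch appends "--local_rank=<N>" to each worker's
    # argv; rewrite it to the hyphenated form this fairseq version expects.
    sys.argv = [
        arg.replace("--local_rank", "--local-rank", 1)
        if arg.startswith("--local_rank")
        else arg
        for arg in sys.argv
    ]
    cli_main()
```

You would launch it in place of the fairseq binary, keeping all other arguments the same, e.g. python -m torch.distributed.launch --nproc_per_node=1 ... local_rank_shim.py data-bin .... For fairseq-hydra-train the same trick should work by importing cli_main from fairseq_cli.hydra_train instead, assuming that module exists in your checkout.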