Is it possible to use adaptive_loss with transformer?

See original GitHub issue

I would like to use the adaptive_loss criterion with the transformer architecture. Is this possible?

I tried the following command:

fairseq-train data/bin/ --save-dir nn_model --task translation --share-all-embeddings --no-progress-bar --arch transformer --ddp-backend=no_c10d --optimizer adam --adam-betas '(0.9, 0.98)' --update-freq=8 --max-tokens 1536 --warmup-updates 1000 --criterion adaptive_loss --adaptive-softmax-cutoff 1000

but I got the following error:

Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff='1000', adaptive_softmax_dropout=0, arch='transformer', attention_dropout=0.0, bucket_cap_mb=25, clip_norm=25, cpu=False, criterion='adaptive_loss', curriculum=0, data='data/bin/', dataset_impl='cached', ddp_backend='no_c10d', decoder_attention_heads=8, decoder_embed_dim=512, decoder_embed_path=None, decoder_ffn_embed_dim=2048, decoder_input_dim=512, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=512, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, encoder_attention_heads=8, encoder_embed_dim=512, encoder_embed_path=None, encoder_ffn_embed_dim=2048, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=False, find_unused_parameters=False, fix_batches_to_gpus=False, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, global_sync_iter=10, keep_interval_updates=-1, keep_last_epochs=-1, lazy_load=False, left_pad_source='True', left_pad_target='False', log_format=None, log_interval=1000, lr=[0.25], lr_scheduler='fixed', lr_shrink=0.1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=1536, max_update=0, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, no_epoch_checkpoints=False, no_progress_bar=True, no_save=False, no_token_positional_embeddings=False, num_workers=0, optimizer='adam', optimizer_overrides='{}', raw_text=False, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='nn_model', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, 
share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, target_lang=None, task='translation', tbmf_wrapper=False, tensorboard_logdir='', threshold_loss_scale=None, train_subset='train', update_freq=[8], upsample_primary=1, use_bmuf=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_updates=1000, weight_decay=0.0)
| [sl] dictionary: 30088 types
| [tl] dictionary: 30088 types
| data/bin/en__it_XX/ valid sl-tl 171248 examples
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-train", line 11, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 302, in cli_main
    main(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 49, in main
    model = task.build_model(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/tasks/fairseq_task.py", line 169, in build_model
    return models.build_model(args, self)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/models/__init__.py", line 50, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/models/transformer.py", line 165, in build_model
    decoder = cls.build_decoder(args, tgt_dict, decoder_embed_tokens)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/models/transformer.py", line 174, in build_decoder
    return TransformerDecoder(args, tgt_dict, embed_tokens)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/models/transformer.py", line 356, in __init__
    adaptive_inputs=embed_tokens if args.tie_adaptive_weights else None,
AttributeError: 'Namespace' object has no attribute 'tie_adaptive_weights'

What’s wrong?

What should the tie_adaptive_weights parameter be set to?
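
The last frame of the traceback reads `args.tie_adaptive_weights` from a Namespace that, judging by the printed arguments, never had that attribute set. The following is a minimal sketch of that failure mode in plain Python, not a fix for fairseq itself. The `args` object is a hypothetical stand-in holding only a few of the values from the log, both helper functions are made up for illustration, and the `False` fallback is an assumption rather than a confirmed default.

```python
from argparse import Namespace

# Hypothetical stand-in for the parsed arguments; only a few values
# from the log above are copied here, and tie_adaptive_weights is absent.
args = Namespace(
    arch="transformer",
    criterion="adaptive_loss",
    adaptive_softmax_cutoff="1000",
)


def read_tie_flag_strict(ns):
    # Direct attribute access, like the failing line in the traceback.
    return ns.tie_adaptive_weights


def read_tie_flag_with_default(ns, default=False):
    # Assumption: falling back to `default` is acceptable when the
    # attribute was never registered on the Namespace.
    return getattr(ns, "tie_adaptive_weights", default)


try:
    read_tie_flag_strict(args)
    strict_error = None
except AttributeError as exc:
    strict_error = str(exc)

tie_flag = read_tie_flag_with_default(args)
print(strict_error)
print(tie_flag)
```

Whether a fallback like this is the right behaviour for the model depends on how the architecture is expected to populate its defaults, which the issue text does not settle.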

Issue Analytics

State:
Created 4 years ago
Comments:8 (4 by maintainers)

Top GitHub Comments


nicolabertoldi commented, Sep 24, 2019

Thank you for this hint.


lematt1991 commented, Sep 24, 2019

I would try using the PyTorch profiler.
