Is it possible to use adaptive_loss with transformer?
I would like to use adaptive_loss with the transformer architecture. Is this possible?
I tried the following command:
fairseq-train data/bin/ --save-dir nn_model --task translation --share-all-embeddings --no-progress-bar --arch transformer --ddp-backend=no_c10d --optimizer adam --adam-betas '(0.9, 0.98)' --update-freq=8 --max-tokens 1536 --warmup-updates 1000 --criterion adaptive_loss --adaptive-softmax-cutoff 1000
but I got the following error:
Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff='1000', adaptive_softmax_dropout=0, arch='transformer', attention_dropout=0.0, bucket_cap_mb=25, clip_norm=25, cpu=False, criterion='adaptive_loss', curriculum=0, data='data/bin/', dataset_impl='cached', ddp_backend='no_c10d', decoder_attention_heads=8, decoder_embed_dim=512, decoder_embed_path=None, decoder_ffn_embed_dim=2048, decoder_input_dim=512, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=512, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, encoder_attention_heads=8, encoder_embed_dim=512, encoder_embed_path=None, encoder_ffn_embed_dim=2048, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=False, find_unused_parameters=False, fix_batches_to_gpus=False, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, global_sync_iter=10, keep_interval_updates=-1, keep_last_epochs=-1, lazy_load=False, left_pad_source='True', left_pad_target='False', log_format=None, log_interval=1000, lr=[0.25], lr_scheduler='fixed', lr_shrink=0.1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=1536, max_update=0, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, no_epoch_checkpoints=False, no_progress_bar=True, no_save=False, no_token_positional_embeddings=False, num_workers=0, optimizer='adam', optimizer_overrides='{}', raw_text=False, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='nn_model', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, 
share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, target_lang=None, task='translation', tbmf_wrapper=False, tensorboard_logdir='', threshold_loss_scale=None, train_subset='train', update_freq=[8], upsample_primary=1, use_bmuf=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_updates=1000, weight_decay=0.0)
| [sl] dictionary: 30088 types
| [tl] dictionary: 30088 types
| data/bin/en__it_XX/ valid sl-tl 171248 examples
Traceback (most recent call last):
File "/usr/local/bin/fairseq-train", line 11, in <module>
sys.exit(cli_main())
File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 302, in cli_main
main(args)
File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 49, in main
model = task.build_model(args)
File "/usr/local/lib/python3.6/dist-packages/fairseq/tasks/fairseq_task.py", line 169, in build_model
return models.build_model(args, self)
File "/usr/local/lib/python3.6/dist-packages/fairseq/models/__init__.py", line 50, in build_model
return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
File "/usr/local/lib/python3.6/dist-packages/fairseq/models/transformer.py", line 165, in build_model
decoder = cls.build_decoder(args, tgt_dict, decoder_embed_tokens)
File "/usr/local/lib/python3.6/dist-packages/fairseq/models/transformer.py", line 174, in build_decoder
return TransformerDecoder(args, tgt_dict, embed_tokens)
File "/usr/local/lib/python3.6/dist-packages/fairseq/models/transformer.py", line 356, in __init__
adaptive_inputs=embed_tokens if args.tie_adaptive_weights else None,
AttributeError: 'Namespace' object has no attribute 'tie_adaptive_weights'
What's wrong?
What should the tie_adaptive_weights parameter be set to?
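For reference, the traceback shows TransformerDecoder reading args.tie_adaptive_weights directly, and the printed Namespace above contains no such field, so the attribute lookup itself fails before any training starts. The sketch below reproduces that failure mode outside fairseq with a plain argparse.Namespace and shows the usual defensive pattern of reading the attribute with a default. The helper names read_strict and read_with_default are hypothetical, and False as the fallback value is an assumption, not something confirmed by the fairseq source.

```python
from argparse import Namespace

# Hypothetical stand-in for the parsed fairseq-train arguments; only a few
# fields from the log above are included, and tie_adaptive_weights is absent.
args = Namespace(arch="transformer", criterion="adaptive_loss",
                 adaptive_softmax_cutoff="1000")


def read_strict(ns):
    # Same kind of access as the failing line in the traceback:
    # a direct attribute read with no fallback.
    return ns.tie_adaptive_weights


def read_with_default(ns, default=False):
    # Defensive variant: fall back to a default when the attribute is missing.
    # The default of False is an assumption made for this sketch.
    return getattr(ns, "tie_adaptive_weights", default)


try:
    read_strict(args)
    strict_failed = False
    message = ""
except AttributeError as exc:
    strict_failed = True
    message = str(exc)

tie = read_with_default(args)
print(strict_failed, tie)
```

If this reading is right, the problem is a missing default for this architecture rather than a wrong value passed on the command line.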
Issue Analytics
- State:
- Created 4 years ago
- Comments:8 (4 by maintainers)
Top GitHub Comments
Thank you for this hint.
I would try using the PyTorch profiler.