
lr decreases to 0 when fine-tuning CNNDM for summarization


What is your question?

Hi, I was using fairseq to fine-tune on CNNDM for summarization, starting from BART-large. Here is my situation: I have three 1080Ti GPUs with 12 GB of memory each to train this model; however, they don't support fp16. My script is below:

TOTAL_NUM_UPDATES=20000
WARMUP_UPDATES=500
LR=3e-05
MAX_TOKENS=1024
UPDATE_FREQ=1
BART_PATH=./bart.large/model.pt

CUDA_VISIBLE_DEVICES=3,6,7 fairseq-train cnn_dm-bin \
    --restore-file $BART_PATH \
    --max-tokens $MAX_TOKENS \
    --task translation \
    --source-lang source --target-lang target \
    --truncate-source \
    --layernorm-embedding \
    --share-all-embeddings \
    --share-decoder-input-output-embed \
    --reset-optimizer --reset-dataloader --reset-meters \
    --required-batch-size-multiple 1 \
    --arch bart_large \
    --criterion label_smoothed_cross_entropy \
    --label-smoothing 0.1 \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.01 --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08 \
    --clip-norm 0.1 \
    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --update-freq $UPDATE_FREQ \
    --skip-invalid-size-inputs-valid-test \
    --memory-efficient-fp16 \
    --find-unused-parameters
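
For context, a rough back-of-envelope estimate (not part of the original post): fairseq's --max-tokens is a per-GPU limit and --update-freq accumulates gradients, so the effective token budget per optimizer step scales with both. A minimal sketch under that assumption:

```python
# Hedged estimate of the effective batch size for the command above.
# Assumption: --max-tokens is per GPU and --update-freq multiplies it via
# gradient accumulation.
max_tokens = 1024      # --max-tokens
num_gpus = 3           # CUDA_VISIBLE_DEVICES=3,6,7
update_freq = 1        # --update-freq

tokens_per_update = max_tokens * num_gpus * update_freq
print(f"~{tokens_per_update} tokens per optimizer step")  # ~3072
```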

However, when I train the model, the learning rate decreases constantly and finally becomes 0 during epoch 1 (update 20008 / 84773). Meanwhile, the loss and nll_loss stop decreasing. Something is clearly wrong somewhere. What can I do in this situation? I'm looking forward to your kind reply.
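
This behavior is consistent with the polynomial_decay scheduler: after WARMUP_UPDATES the learning rate is annealed from --lr toward an end value (0 by default) over --total-num-update steps, so with TOTAL_NUM_UPDATES=20000 the lr reaches 0 around update 20000 even though one epoch here contains ~84773 updates. A minimal sketch of that schedule, based on my reading of fairseq's polynomial_decay with power=1 (not code from the issue):

```python
def polynomial_decay_lr(num_updates, peak_lr=3e-05, warmup_updates=500,
                        total_num_update=20000, end_lr=0.0, power=1.0):
    """Approximation of fairseq's polynomial_decay schedule (assumption:
    linear warmup from 0 to peak_lr, then polynomial decay to end_lr,
    reached at total_num_update)."""
    if num_updates < warmup_updates:
        return peak_lr * num_updates / warmup_updates
    if num_updates >= total_num_update:
        return end_lr
    remaining = 1 - (num_updates - warmup_updates) / (total_num_update - warmup_updates)
    return (peak_lr - end_lr) * remaining ** power + end_lr

for step in (500, 10000, 20000, 20008):
    print(step, polynomial_decay_lr(step))
# The lr peaks at 3e-05 after warmup, hits 0.0 at update 20000, and stays 0
# afterwards, which matches the reported behavior at update 20008.
```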

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

1 reaction
benjpau commented, Nov 1, 2021

I followed the instructions here exactly to train the model: https://github.com/pytorch/fairseq/blob/main/examples/bart/README.summarization.md.

When num_updates reaches 20000, the lr becomes 0, but the training process doesn't stop and keeps going. Is this normal? Should I just exit training, and does it mean that training is finished?

Thanks a lot! @monologue1107 @myleott

0 reactions
stale[bot] commented, Apr 18, 2022

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!
