LR decreases to 0 when fine-tuning on CNN/DM for summarization
What is your question?
Hi, I am using fairseq to fine-tune BART-large on CNN/DM for summarization. Here is my situation: I have three 1080 Ti GPUs with 12 GB of memory each to train this model; however, they don't support fp16. My script is below:
TOTAL_NUM_UPDATES=20000
WARMUP_UPDATES=500
LR=3e-05
MAX_TOKENS=1024
UPDATE_FREQ=1
BART_PATH=./bart.large/model.pt

CUDA_VISIBLE_DEVICES=3,6,7 fairseq-train cnn_dm-bin \
    --restore-file $BART_PATH \
    --max-tokens $MAX_TOKENS \
    --task translation \
    --source-lang source --target-lang target \
    --truncate-source \
    --layernorm-embedding \
    --share-all-embeddings \
    --share-decoder-input-output-embed \
    --reset-optimizer --reset-dataloader --reset-meters \
    --required-batch-size-multiple 1 \
    --arch bart_large \
    --criterion label_smoothed_cross_entropy \
    --label-smoothing 0.1 \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.01 --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08 \
    --clip-norm 0.1 \
    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --update-freq $UPDATE_FREQ \
    --skip-invalid-size-inputs-valid-test \
    --memory-efficient-fp16 \
    --find-unused-parameters
However, when I train the model, the learning rate decreases steadily and finally reaches 0 within epoch 1 (at update 20008 of 84773). At that point the loss and nll_loss stop decreasing. Something is clearly wrong. What can I do in this situation? I'm looking forward to your kind reply.
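For context on why the LR hits 0 at roughly 20,000 updates: with `--lr-scheduler polynomial_decay`, fairseq warms the LR up over `--warmup-updates` steps and then decays it toward its end value so that it arrives there at `--total-num-update` steps, regardless of how many updates one epoch contains. Below is a minimal sketch of that schedule, assuming the default power of 1.0 and an end learning rate of 0; it is an illustration, not fairseq's actual implementation.

```python
def polynomial_decay_lr(step, peak_lr=3e-05, warmup=500, total=20000,
                        power=1.0, end_lr=0.0):
    """Rough sketch of a polynomial_decay schedule (assumed defaults)."""
    if warmup > 0 and step <= warmup:
        # linear warmup from 0 to peak_lr
        return peak_lr * step / warmup
    if step >= total:
        # schedule exhausted: LR stays at its end value
        return end_lr
    # polynomial (linear when power == 1.0) decay toward end_lr
    pct_remaining = 1 - (step - warmup) / (total - warmup)
    return (peak_lr - end_lr) * pct_remaining ** power + end_lr

print(polynomial_decay_lr(500))    # ~3e-05 at the end of warmup
print(polynomial_decay_lr(10000))  # roughly half of the peak LR
print(polynomial_decay_lr(20008))  # 0.0, matching the behaviour reported above
```

So with TOTAL_NUM_UPDATES=20000 and a dataset that needs about 84,773 updates per epoch, the schedule is expected to run out well inside epoch 1; one common adjustment is to raise `--total-num-update` to the number of updates you actually intend to train for.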
Top GitHub Comments
I followed the instructions here exactly to train the model: https://github.com/pytorch/fairseq/blob/main/examples/bart/README.summarization.md.
When num_updates reaches 20000, the lr becomes 0, but the training process doesn't stop and keeps going. Is this normal? Should I just exit training myself, and does it mean that training is finished?
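A note on this behaviour: fairseq keeps training until a stopping criterion such as `--max-epoch` or `--max-update` is hit; the LR schedule alone does not end the run. If the intent is to stop once the schedule is exhausted rather than continuing at lr 0, adding `--max-update $TOTAL_NUM_UPDATES` to the command above should make the run terminate at that point.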
Thanks a lot! @monologue1107 @myleott
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!