
BART model does NOT work properly when trained from scratch


šŸ› Bug

I trained a BART model from scratch (without the "--restore-file $PATH" argument) for the summarization task. During inference, the decoding seems wrong. Here are some output samples from the model:

s in Wales have been warned to be "inadequate" by the Welsh Government over the next five years.
s of a man who died after being hit by a car have been named by police.
ing the murder of a man who was found dead at a house in County Antrim has been jailed for life.
 Ched Evans has told a court that she would have to be a woman accused of raping a woman in the UK.
 Glamorgan has signed a new two-year contract with the Premier League club.

The beginning of each output sentence seems incomplete. Note that when the model is fine-tuned on the same dataset, everything is fine (with "--restore-file $PATH").

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Train a BART model on the summarization dataset (XSum/CNNDM) from scratch.
TOTAL_NUM_UPDATES=15000
WARMUP_UPDATES=500      
LR=3e-05
MAX_TOKENS=2048
UPDATE_FREQ=2
SAVE_DIR=checkpoints/

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 fairseq-train $DATA_PATH \
    --save-dir $SAVE_DIR \
    --max-tokens $MAX_TOKENS \
    --task translation \
    --source-lang source --target-lang target \
    --truncate-source \
    --layernorm-embedding \
    --share-all-embeddings \
    --share-decoder-input-output-embed \
    --reset-optimizer --reset-dataloader --reset-meters \
    --required-batch-size-multiple 1 \
    --arch bart_large \
    --criterion label_smoothed_cross_entropy \
    --label-smoothing 0.1 \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.01 --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08 \
    --clip-norm 0.1 \
    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --fp16 --update-freq $UPDATE_FREQ \
    --skip-invalid-size-inputs-valid-test \
    --find-unused-parameters;
  2. Inference with the official code.
import torch
from fairseq.models.bart import BARTModel

# args comes from the surrounding script's argument parser (checkpoint and data paths).
bart = BARTModel.from_pretrained(
    args.checkpoint_path,
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path=args.data_path
)

bart.cuda()
bart.eval()
bart.half()
count = 1
bsz = 32
with open('test.source') as source, open('test.hypo', 'w') as fout:
    sline = source.readline().strip()
    slines = [sline]
    for sline in source:
        if count % bsz == 0:
            # Decode a full batch with beam search.
            with torch.no_grad():
                hypotheses_batch = bart.sample(slines, beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3)

            for hypothesis in hypotheses_batch:
                fout.write(hypothesis + '\n')
                fout.flush()
            slines = []

        slines.append(sline.strip())
        count += 1
    # Decode whatever is left over after the last full batch.
    if slines != []:
        hypotheses_batch = bart.sample(slines, beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3)
        for hypothesis in hypotheses_batch:
            fout.write(hypothesis + '\n')
            fout.flush()

Expected behavior

Decoding should produce complete sentences, as it does when the model is fine-tuned from the pretrained checkpoint. Instead, the decoded sentences are incomplete: the beginning of each output is cut off.

Environment

  • fairseq Version: 0.9.0
  • PyTorch Version: 1.5.0
  • OS: Linux
  • How you installed fairseq: pip, source
  • Build command you used (if compiling from source): pip install --editable ./
  • Python version: 3.7.4
  • CUDA/cuDNN version: 10.2
  • GPU models and configuration: V100
  • Any other relevant information: None

Additional context

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 16 (1 by maintainers)

Top GitHub Comments

mcao516 commented on Dec 6, 2020 (3 reactions)

Ah, you're using the translation task rather than denoising. Probably there's a mismatch with the hub interface relating to the beginning of sentence token, and you need to remove the <s> token from here: https://github.com/pytorch/fairseq/blob/f732b403ec15244c41a24b9e28d6c5a411a511df/fairseq/models/bart/hub_interface.py#L58

Hi @myleott, thanks for the reply. Yes, I am using the translation task. However, removing the "<s> " token doesn't work for me. The outputs are the same.

I think the problem is caused by prefix_tokens. Forcing prefix_tokens to be None solves the issue.
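
The prefix mismatch is easy to inspect from Python. Below is a minimal diagnostic sketch, not taken from the issue, assuming the hub-interface behavior described in the quoted reply (encode() wraps the input in <s> ... </s>, and generation forces the source dictionary's BOS id as prefix_tokens); the checkpoint and data paths are hypothetical placeholders.

from fairseq.models.bart import BARTModel

# Hypothetical paths: point these at the from-scratch checkpoint and binarized data.
bart = BARTModel.from_pretrained(
    "checkpoints/",
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="xsum-bin",
)

src = bart.encode("The cat sat on the mat.")
print(src)                                # BPE ids wrapped in <s> ... </s> by the hub interface
print(bart.task.source_dictionary.bos())  # the <s> id that ends up forced as prefix_tokens during generation
print(bart.decode(src))                   # round-trip back to text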

mcao516 commented on Dec 20, 2020 (1 reaction)

Hi. I'm running into the same problem with sample. I'm following the fine-tuning recipe for BART using the latest version of fairseq with my own data (formatted as one sentence per line in the source and target files, without any special tokens added; for example, a line in either file could simply be "The cat sat on the mat.").

@mcao610 Can you please elaborate on what you mean by setting prefix_tokens to None? prefix_tokens appears to be a string in hub_interface.py. Should the call to sample be changed, or hub_interface.py edited directly? Thank you.

I mean setting the prefix_tokens variable in sequence_generator.py to None. For instance, you can add prefix_tokens = None just before this for loop in sequence_generator.py:

https://github.com/pytorch/fairseq/blob/f732b403ec15244c41a24b9e28d6c5a411a511df/fairseq/sequence_generator.py#L293
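
For anyone who prefers not to edit the installed fairseq sources, an equivalent workaround is to shadow the task's inference_step on the loaded hub interface so the forced prefix is dropped. This is a sketch under assumptions rather than the commenter's exact fix: it assumes the fairseq 0.9/0.10-era signature FairseqTask.inference_step(generator, models, sample, prefix_tokens=None), and the checkpoint and data paths are hypothetical.

from fairseq.models.bart import BARTModel

# Hypothetical paths, as in the inference snippet above.
bart = BARTModel.from_pretrained(
    "checkpoints/",
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="xsum-bin",
)
bart.cuda()
bart.eval()
bart.half()

_orig_inference_step = bart.task.inference_step

def _inference_step_no_prefix(generator, models, sample, prefix_tokens=None):
    # Drop the forced <s> prefix and decode without any prefix constraint.
    return _orig_inference_step(generator, models, sample, prefix_tokens=None)

# Shadow the bound method on this task instance only.
bart.task.inference_step = _inference_step_no_prefix

hypotheses = bart.sample(
    ["Replace this with a source document from test.source."],
    beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3,
)
print(hypotheses[0])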
