Transformer generates sentences unrelated to the input
Hello, I am trying to use the transformer on a sentence simplification dataset. Training seems to run without problems, but at generation time the hypothesis sentences do not make any sense. I was wondering if you could help me figure out what I am doing wrong.
I tried to follow this example that you provide for translation using the transformer.
1. Pre-processing: The dataset I am using for training contains aligned sentences such as this pair:
Original: In Holland they were called Stadspijpers , in Germany Stadtpfeifer and in Italy Pifferi .
Simplified: They were called Stadtpfeifer in Germany and Pifferi in Italy .
Since the sentences in the dataset are already tokenised, for pre-processing I only lowercased all sentences and learned/applied BPE using the following script:
src=orig
tgt=simp
prep=data/wikilarge/prep
tmp=$prep/tmp
orig=data/wikilarge

mkdir -p $prep $tmp

# Lowercase every split on both sides
# ($LC, $BPEROOT and $BPE_TOKENS are set as in the translation example)
for d in train dev test; do
    for l in $src $tgt; do
        perl $LC < $orig/wikilarge.$d.$l > $tmp/wikilarge.$d.low.$l
    done
done

# Learn BPE on the concatenated source+target training data
TRAIN=$tmp/train.wikilarge
BPE_CODE=$prep/code
rm -f $TRAIN
for l in $src $tgt; do
    cat $tmp/wikilarge.train.low.$l >> $TRAIN
done
python $BPEROOT/learn_bpe.py -s $BPE_TOKENS < $TRAIN > $BPE_CODE

# Apply BPE to every split on both sides
for L in $src $tgt; do
    for d in train dev test; do
        echo "apply_bpe.py to wikilarge.${d}.low.${L}..."
        python $BPEROOT/apply_bpe.py -c $BPE_CODE < $tmp/wikilarge.$d.low.$L > $prep/$d.$L
    done
done
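Before binarizing, a quick sanity check (my own addition, reusing the variables from the script above) is to confirm that each BPE-applied file still has the same number of lines as its lowercased source, since a mismatch would silently misalign the sentence pairs:
for d in train dev test; do
    for l in $src $tgt; do
        echo "$d.$l: $(wc -l < $tmp/wikilarge.$d.low.$l) lowercased vs $(wc -l < $prep/$d.$l) BPE lines"
    done
done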
Then I proceeded to binarize the dataset:
TEXT=data/wikilarge/prep
fairseq-preprocess --source-lang orig --target-lang simp \
--trainpref $TEXT/train --validpref $TEXT/dev --testpref $TEXT/test \
--destdir data/wikilarge/bin/
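One thing that might be worth trying here (my suggestion, not something the example requires): since both sides are English and the BPE codes were learned on the concatenated source and target, a shared vocabulary seems natural, which fairseq-preprocess supports via --joined-dictionary:
fairseq-preprocess --source-lang orig --target-lang simp \
    --trainpref $TEXT/train --validpref $TEXT/dev --testpref $TEXT/test \
    --joined-dictionary --destdir data/wikilarge/bin/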
2. Training: For training, I used the same command as in the example provided. I am aware that I’d need to adapt the parameters to suit the dataset, but I thought it was a good starting point.
mkdir -p models/wikilarge/transformer/checkpoints/
CUDA_VISIBLE_DEVICES=0 fairseq-train data/wikilarge/bin \
-a transformer --optimizer adam --lr 0.0005 -s orig -t simp \
--label-smoothing 0.1 --dropout 0.3 --max-tokens 4000 \
--min-lr '1e-09' --lr-scheduler inverse_sqrt --weight-decay 0.0001 \
--criterion label_smoothed_cross_entropy --max-update 50000 \
--warmup-updates 4000 --warmup-init-lr '1e-07' \
--adam-betas '(0.9, 0.98)' --save-dir models/wikilarge/transformer/checkpoints/
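As a side note (my own habit, not part of the example), it helps to keep the training log around and check that the validation loss and perplexity keep dropping:
# Assuming the fairseq-train command above is re-run with its output captured,
# e.g. by appending  2>&1 | tee models/wikilarge/transformer/train.log
# the per-epoch validation summaries can then be inspected with:
grep valid models/wikilarge/transformer/train.log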
3. Generation: As in the example, I executed the following commands:
# Average 10 latest checkpoints:
python scripts/average_checkpoints.py --inputs models/wikilarge/transformer/checkpoints \
--num-epoch-checkpoints 10 --output models/wikilarge/transformer/checkpoints/model.pt
# Generate
fairseq-generate data/wikilarge/bin \
--path models/wikilarge/transformer/checkpoints/model.pt \
--batch-size 128 --beam 5 --remove-bpe
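Not part of the example, but saving the generate output to a file makes it easy to pull out the hypotheses and references and score them (the file names below are my own choice; BLEU is not a simplification metric, but it gives a quick signal while debugging):
fairseq-generate data/wikilarge/bin \
    --path models/wikilarge/transformer/checkpoints/model.pt \
    --batch-size 128 --beam 5 --remove-bpe > gen.out
# H-* lines carry the hypotheses, T-* lines the references, in the same interleaved order
grep ^H gen.out | cut -f3- > gen.out.sys
grep ^T gen.out | cut -f2- > gen.out.ref
fairseq-score --sys gen.out.sys --ref gen.out.ref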
Most output sentences I get are like this:
S-124 the two former presidents were later separately charged with mutiny and treason for their roles in the 1979 coup and the 1980 gwangju massacre .
T-124 the two former presidents were later charged , each on their own , with mutiny and treason for their roles in the 1979 coup and the 1980 gwangju massacre .
H-124 -1.1352218389511108 he was the first woman to win the tour de france .
P-124 -2.4326 -1.1815 -1.0359 -1.1694 -1.9666 -0.0793 -2.0569 -0.5309 -2.4636 -0.2983 -0.0907 -1.3463 -0.1060
S-258 a town may be correctly described as a market town or as having market rights even if it no longer holds a market , provided the right to do so still exists .
T-258 a town may be correctly identified by a market or as having market rights even if it no longer holds a market , provided the right to do so still exists .
H-258 -0.9187995195388794 this is a list of people who live in the city .
P-258 -3.2003 -0.9018 -1.4129 -0.1210 -0.0787 -1.7663 -0.2098 -1.9090 -0.2615 -0.8027 -0.7472 -0.4241 -0.1091
As can be seen, the generated H sentences make no sense: they are not related at all to the corresponding input.
Am I doing something wrong at training or generation time that causes this? Maybe I am not understanding the parameters properly?
I hope this is the right place to ask this type of question. Thank you.
Top GitHub Comments
I see. I’ll start changing the parameters and see what happens. Any suggestion on what to try first would be welcome; I’m not very experienced in this. This paper uses the transformer (tensor2tensor) for the same data, so I’ll try to use their configuration in fairseq as a starting point.
Skimming your provided code, it looks alright. Does the model training look stable? Is the perplexity decreasing? If you decode on the training set instead of on test/valid, can the model produce the target sentences?
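For reference, the training-set decode suggested here can be run with fairseq-generate's --gen-subset flag (a minimal sketch, reusing the averaged checkpoint from step 3); if the model cannot roughly reproduce targets it has already seen, the problem is on the training side rather than in generation:
fairseq-generate data/wikilarge/bin \
    --gen-subset train \
    --path models/wikilarge/transformer/checkpoints/model.pt \
    --batch-size 128 --beam 5 --remove-bpe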