EncoderDecoder: after training, generation gives the same results regardless of the input
❓ Questions & Help
Hi everyone, I need help with an encoder-decoder model. I’m trying to train a model that generates a title for a short text.
I’m creating a basic encoder-decoder model with BERT:
from transformers import EncoderDecoderModel, BertTokenizer
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')
After training on my data, generation produces the same result regardless of the input when the model is in model.eval() mode. If I switch the model back to train mode, the generated results do differ.
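The generation code itself is not included in the issue; a minimal sketch of the kind of check being described might look like the following (the tokenizer call and the decoder_start_token_id choice are assumptions, not taken from the issue):

# Sketch only: the original generation code is not shown in the issue.
# In eval mode both inputs reportedly decode to the same title; in train mode
# the outputs differ because dropout is active.
model.eval()
for text in ["first example text", "a completely different second text"]:
    enc = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    out = model.generate(
        enc["input_ids"],
        attention_mask=enc["attention_mask"],
        decoder_start_token_id=tokenizer.cls_token_id,  # assumption: use [CLS] as the start token
        max_length=32,
    )
    print(tokenizer.decode(out[0], skip_special_tokens=True))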
The code I use for training:

from keras.preprocessing.sequence import pad_sequences
from torch.utils.data import TensorDataset, DataLoader, RandomSampler
from transformers import AdamW
from IPython.display import clear_output
import matplotlib.pyplot as plt

# train_sentences, train_gt, max_len_abstract, max_len_title, device and
# model_weigth are defined earlier in my code.

# Tokenize source texts and target titles, convert to ids and pad.
tokenized_texts = [tokenizer.tokenize(sent) for sent in train_sentences]
tokenized_gt = [tokenizer.tokenize(sent) for sent in train_gt]

input_ids = [tokenizer.convert_tokens_to_ids(x) for x in tokenized_texts]
input_ids = pad_sequences(
    input_ids,
    maxlen=max_len_abstract,
    dtype="long",
    truncating="post",
    padding="post",
)

input_ids_decode = [tokenizer.convert_tokens_to_ids(x) for x in tokenized_gt]
input_ids_decode = pad_sequences(
    input_ids_decode,
    maxlen=max_len_title,
    dtype="long",
    truncating="post",
    padding="post",
)

# Attention masks: 1.0 for real tokens, 0.0 for padding.
attention_masks_encode = [[float(i > 0) for i in seq] for seq in input_ids]
attention_masks_decode = [[float(i > 0) for i in seq] for seq in input_ids_decode]

input_ids = torch.tensor(input_ids)
input_ids_decode = torch.tensor(input_ids_decode)
attention_masks_encode = torch.tensor(attention_masks_encode)
attention_masks_decode = torch.tensor(attention_masks_decode)

train_data = TensorDataset(input_ids, input_ids_decode, attention_masks_encode, attention_masks_decode)
train_dataloader = DataLoader(train_data, sampler=RandomSampler(train_data), batch_size=4)

model.cuda()

# Group parameters so that biases and LayerNorm weights get no weight decay.
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'gamma', 'beta']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.0}
]
optimizer = AdamW(optimizer_grouped_parameters, lr=2e-5)

model.train()
train_loss_set = []
train_loss = 0
for i in range(4):
    for step, batch in enumerate(train_dataloader):
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_ids_de, b_attention_masks_encode, b_attention_masks_decode = batch

        optimizer.zero_grad()
        model.zero_grad()
        loss, outputs = model(input_ids=b_input_ids, decoder_input_ids=b_input_ids_de, lm_labels=b_input_ids_de)[:2]

        train_loss_set.append(loss.item())
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

        # Redraw the running loss curve.
        clear_output(True)
        plt.plot(train_loss_set)
        plt.title("Training loss")
        plt.xlabel("Batch")
        plt.ylabel("Loss")
        plt.show()

        if step != 0 and step % 20 == 0:
            torch.save(model.state_dict(), model_weigth)
    print(f'Epoch {i}')
Maybe I’m doing something wrong? I would be grateful for any advice.
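One detail worth flagging, which is not discussed in the thread: the attention masks are built and batched above but never passed into the forward call, so padded positions are attended to during training. A sketch of including them, assuming the installed transformers version accepts these argument names (newer releases expect labels instead of lm_labels):

# Sketch only: forward call with the padding masks included. Exact argument
# names depend on the transformers version (newer releases expect `labels`
# instead of `lm_labels`).
loss, outputs = model(
    input_ids=b_input_ids,
    attention_mask=b_attention_masks_encode,
    decoder_input_ids=b_input_ids_de,
    decoder_attention_mask=b_attention_masks_decode,
    lm_labels=b_input_ids_de,
)[:2]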
Top GitHub Comments
Hey @HodorTheCoder,
Sorry for the late reply. I have been working on the encoder-decoder framework and verified that it works, though so far only with single-GPU training.
This model and its model card show how to train a Bert2Bert model and how it should be used: https://huggingface.co/patrickvonplaten/bert2bert-cnn_dailymail-fp16
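For reference, a sketch of how that checkpoint can be loaded and used for generation (the tokenizer choice and the generation settings below are assumptions rather than details from the model card):

# Sketch only: load the Bert2Bert checkpoint linked above and summarize a text with it.
from transformers import BertTokenizer, EncoderDecoderModel

b2b_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert2bert = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")

article = "Some long news article text ..."
inputs = b2b_tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = bert2bert.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    decoder_start_token_id=b2b_tokenizer.cls_token_id,  # only needed if the checkpoint config does not set it
)
print(b2b_tokenizer.decode(summary_ids[0], skip_special_tokens=True))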
Regarding your code: why do you generate the way you do instead of just calling generate(...) directly?
The encoder-decoder inherits from PreTrainedModel and thus has direct access to generate(...), see here: https://github.com/huggingface/transformers/blob/0b6c255a95368163d2b1d37635e5ce5bdd1b9423/src/transformers/modeling_encoder_decoder.py#L29. Also, there is no need to wrap everything into the torch.no_grad() context: generate() always runs in no_grad mode.
Hope this helps! I will be off for the next two weeks; if it’s urgent, feel free to ping @sshleifer (hope it’s fine to ping you here, Sam 😉)
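Applied to the model above, the suggestion boils down to something like this (a sketch, not code from the thread; the decoder_start_token_id is an assumption, as in the earlier sketch):

# No torch.no_grad() wrapper needed: generate() already disables gradient tracking.
model.eval()
enc = tokenizer("some input text", return_tensors="pt")
output_ids = model.generate(
    enc["input_ids"],
    attention_mask=enc["attention_mask"],
    decoder_start_token_id=tokenizer.cls_token_id,  # assumption
    max_length=max_len_title,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))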
Thank you so much for your work @patrickvonplaten