
Encoder-decoder: after training, generation gives the same results regardless of the input

See original GitHub issue

❓ Questions & Help

Hi, everyone. I need help with an encoder-decoder model. I’m trying to train the model to generate a title for a short text.

I’m creating a basic encoder-decoder model with BERT:

from transformers import EncoderDecoderModel, BertTokenizer
import torch

# BERT2BERT: bert-base-uncased as both the encoder and the decoder
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')

After training on my data, generation returns the same result regardless of the input whenever the model is in model.eval() mode. If I switch the model back to train() mode, different results are generated.
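
For reference, generation looks roughly like this (a minimal sketch rather than my exact inference code; the decoding arguments such as num_beams are placeholders):

model.eval()
model.to("cuda")

text = "Some abstract text that should get a title."
input_ids = tokenizer.encode(text, return_tensors="pt").to("cuda")

generated = model.generate(
    input_ids,
    decoder_start_token_id=tokenizer.cls_token_id,  # BERT has no dedicated BOS token
    max_length=max_len_title,
    num_beams=4,
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))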

The code I use for training:

# Additional imports needed by this snippet
from torch.utils.data import TensorDataset, DataLoader, RandomSampler
from keras.preprocessing.sequence import pad_sequences  # or tensorflow.keras.preprocessing.sequence
from transformers import AdamW
from IPython.display import clear_output
import matplotlib.pyplot as plt

# Tokenize the source texts and the ground-truth titles
tokenized_texts = [tokenizer.tokenize(sent) for sent in train_sentences]
tokenized_gt = [tokenizer.tokenize(sent) for sent in train_gt]

# Encoder inputs: convert tokens to ids and pad/truncate to a fixed length
input_ids = [tokenizer.convert_tokens_to_ids(x) for x in tokenized_texts]
input_ids = pad_sequences(
    input_ids,
    maxlen=max_len_abstract,
    dtype="long",
    truncating="post",
    padding="post"
)

# Decoder inputs: same treatment for the titles
input_ids_decode = [tokenizer.convert_tokens_to_ids(x) for x in tokenized_gt]
input_ids_decode = pad_sequences(
    input_ids_decode,
    maxlen=max_len_title,
    dtype="long",
    truncating="post",
    padding="post"
)

# Attention masks: 1.0 for real tokens, 0.0 for padding
attention_masks_encode = [[float(i > 0) for i in seq] for seq in input_ids]
attention_masks_decode = [[float(i > 0) for i in seq] for seq in input_ids_decode]

input_ids = torch.tensor(input_ids)
input_ids_decode = torch.tensor(input_ids_decode)
attention_masks_encode = torch.tensor(attention_masks_encode)
attention_masks_decode = torch.tensor(attention_masks_decode)

train_data = TensorDataset(input_ids, input_ids_decode, attention_masks_encode, attention_masks_decode)
train_dataloader = DataLoader(train_data, sampler=RandomSampler(train_data), batch_size=4)

device = torch.device("cuda")
model.cuda()

# No weight decay for biases and LayerNorm parameters
# (note: AdamW expects the key 'weight_decay', not 'weight_decay_rate')
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'gamma', 'beta']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     'weight_decay': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     'weight_decay': 0.0}
]

optimizer = AdamW(optimizer_grouped_parameters, lr=2e-5)

model.train()
train_loss_set = []
train_loss = 0
for i in range(4):
    for step, batch in enumerate(train_dataloader):
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_ids_de, b_attention_masks_encode, b_attention_masks_decode = batch
        optimizer.zero_grad()
        model.zero_grad()
        # Forward pass: the decoder inputs double as the language-modeling labels
        loss, outputs = model(input_ids=b_input_ids, decoder_input_ids=b_input_ids_de, lm_labels=b_input_ids_de)[:2]
        train_loss_set.append(loss.item())
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

        # Live-plot the training loss in the notebook
        clear_output(True)
        plt.plot(train_loss_set)
        plt.title("Training loss")
        plt.xlabel("Batch")
        plt.ylabel("Loss")
        plt.show()
        # Checkpoint every 20 batches
        if step != 0 and step % 20 == 0:
            torch.save(model.state_dict(), model_weigth)
    print(f'Epoch {i}')

Maybe I’m doing something wrong? I would be grateful for any advice.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

2 reactions
patrickvonplaten commented, Jul 17, 2020

Hey @HodorTheCoder,

Sorry for the late reply. I have been working on the encoder-decoder framework and verified that it works, but so far only for single-GPU training.

This model + model card shows how to train a Bert2Bert model and how it should be used: https://huggingface.co/patrickvonplaten/bert2bert-cnn_dailymail-fp16
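
Usage of that checkpoint looks roughly like this (a minimal sketch; whether the repo also ships its own tokenizer files is an assumption, otherwise the plain bert-base-uncased tokenizer can be used):

from transformers import BertTokenizer, EncoderDecoderModel

# Assumption: the checkpoint repo hosts tokenizer files as well; if not, fall back
# to BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
bert2bert = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")

article = "Some long news article to summarize ..."
input_ids = tokenizer.encode(article, return_tensors="pt")

# generate() is called directly on the model and already runs without gradients
summary_ids = bert2bert.generate(input_ids)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))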

Regarding your code: why do you call

bert2bert.module.generate(...)

instead of just

bert2bert.generate(...)?

The encoder-decoder model inherits from PreTrainedModel and thus has direct access to generate(...); see here: https://github.com/huggingface/transformers/blob/0b6c255a95368163d2b1d37635e5ce5bdd1b9423/src/transformers/modeling_encoder_decoder.py#L29. Also, there is no need to wrap everything in a torch.no_grad() context: generate() always runs in no-grad mode.
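
Concretely, something like the following should be enough (a minimal sketch; `bert2bert` stands for the trained EncoderDecoderModel, and the decoding arguments are illustrative):

import torch

# If the model was trained inside a DataParallel wrapper (which is usually where
# the `.module` attribute comes from), unwrap it before generating
if isinstance(bert2bert, torch.nn.DataParallel):
    bert2bert = bert2bert.module

bert2bert.eval()

input_ids = tokenizer.encode("Text to turn into a title.", return_tensors="pt").to(bert2bert.device)

# generate() is inherited from PreTrainedModel and disables gradients internally
output_ids = bert2bert.generate(
    input_ids,
    decoder_start_token_id=tokenizer.cls_token_id,  # BERT checkpoints have no BOS token
    max_length=32,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))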

Hope this helps! I will be off for the next two weeks - if it’s urgent feel free to ping @sshleifer (hope it’s fine to ping you here Sam 😉 )

1 reaction
iliemihai commented, Jul 17, 2020

Thank you so much for your work @patrickvonplaten


