EncoderDecoder: after training, generation gives the same results regardless of the input
❓ Questions & Help
Hi everyone, I need help with an encoder-decoder model. I’m trying to train a model that generates a title for a short text.
I’m creating a basic encoder-decoder model with BERT:
from transformers import EncoderDecoderModel, BertTokenizer
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')
After training on my data, generation produces the same result regardless of the input when the model is in model.eval() mode. If I switch the model back to train mode, the generated results do differ.
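The generation code itself is not included in the issue; a minimal sketch of the kind of check being described might look like the following (the tokenizer call and the decoder_start_token_id choice are assumptions, not taken from the issue):

# Sketch only: the original generation code is not shown in the issue.
# In eval mode both inputs reportedly decode to the same title; in train mode
# the outputs differ because dropout is active.
model.eval()
for text in ["first example text", "a completely different second text"]:
    enc = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    out = model.generate(
        enc["input_ids"],
        attention_mask=enc["attention_mask"],
        decoder_start_token_id=tokenizer.cls_token_id,  # assumption: use [CLS] as the start token
        max_length=32,
    )
    print(tokenizer.decode(out[0], skip_special_tokens=True))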
The code I use for training:

from keras.preprocessing.sequence import pad_sequences
from torch.utils.data import TensorDataset, DataLoader, RandomSampler
from transformers import AdamW
from IPython.display import clear_output
import matplotlib.pyplot as plt

# train_sentences, train_gt, max_len_abstract, max_len_title, device and
# model_weigth are defined earlier in my code.

# Tokenize source texts and target titles, convert to ids and pad.
tokenized_texts = [tokenizer.tokenize(sent) for sent in train_sentences]
tokenized_gt = [tokenizer.tokenize(sent) for sent in train_gt]

input_ids = [tokenizer.convert_tokens_to_ids(x) for x in tokenized_texts]
input_ids = pad_sequences(
    input_ids,
    maxlen=max_len_abstract,
    dtype="long",
    truncating="post",
    padding="post",
)

input_ids_decode = [tokenizer.convert_tokens_to_ids(x) for x in tokenized_gt]
input_ids_decode = pad_sequences(
    input_ids_decode,
    maxlen=max_len_title,
    dtype="long",
    truncating="post",
    padding="post",
)

# Attention masks: 1.0 for real tokens, 0.0 for padding.
attention_masks_encode = [[float(i > 0) for i in seq] for seq in input_ids]
attention_masks_decode = [[float(i > 0) for i in seq] for seq in input_ids_decode]

input_ids = torch.tensor(input_ids)
input_ids_decode = torch.tensor(input_ids_decode)
attention_masks_encode = torch.tensor(attention_masks_encode)
attention_masks_decode = torch.tensor(attention_masks_decode)

train_data = TensorDataset(input_ids, input_ids_decode, attention_masks_encode, attention_masks_decode)
train_dataloader = DataLoader(train_data, sampler=RandomSampler(train_data), batch_size=4)

model.cuda()

# Group parameters so that biases and LayerNorm weights get no weight decay.
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'gamma', 'beta']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.0}
]
optimizer = AdamW(optimizer_grouped_parameters, lr=2e-5)

model.train()
train_loss_set = []
train_loss = 0
for i in range(4):
    for step, batch in enumerate(train_dataloader):
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_ids_de, b_attention_masks_encode, b_attention_masks_decode = batch

        optimizer.zero_grad()
        model.zero_grad()
        loss, outputs = model(input_ids=b_input_ids, decoder_input_ids=b_input_ids_de, lm_labels=b_input_ids_de)[:2]

        train_loss_set.append(loss.item())
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

        # Redraw the running loss curve.
        clear_output(True)
        plt.plot(train_loss_set)
        plt.title("Training loss")
        plt.xlabel("Batch")
        plt.ylabel("Loss")
        plt.show()

        if step != 0 and step % 20 == 0:
            torch.save(model.state_dict(), model_weigth)
    print(f'Epoch {i}')
Maybe I’m doing something wrong? I would be grateful for any advice.
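One detail worth flagging, which is not discussed in the thread: the attention masks are built and batched above but never passed into the forward call, so padded positions are attended to during training. A sketch of including them, assuming the installed transformers version accepts these argument names (newer releases expect labels instead of lm_labels):

# Sketch only: forward call with the padding masks included. Exact argument
# names depend on the transformers version (newer releases expect `labels`
# instead of `lm_labels`).
loss, outputs = model(
    input_ids=b_input_ids,
    attention_mask=b_attention_masks_encode,
    decoder_input_ids=b_input_ids_de,
    decoder_attention_mask=b_attention_masks_decode,
    lm_labels=b_input_ids_de,
)[:2]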
Top GitHub Comments
Hey @HodorTheCoder,
Sorry for the late reply. I have been working on the encoder-decoder framework and verified that it works, though so far only with single-GPU training.
This model and its model card show how to train a Bert2Bert model and how it should be used: https://huggingface.co/patrickvonplaten/bert2bert-cnn_dailymail-fp16
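For reference, a sketch of how that checkpoint can be loaded and used for generation (the tokenizer choice and the generation settings below are assumptions rather than details from the model card):

# Sketch only: load the Bert2Bert checkpoint linked above and summarize a text with it.
from transformers import BertTokenizer, EncoderDecoderModel

b2b_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert2bert = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")

article = "Some long news article text ..."
inputs = b2b_tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = bert2bert.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    decoder_start_token_id=b2b_tokenizer.cls_token_id,  # only needed if the checkpoint config does not set it
)
print(b2b_tokenizer.decode(summary_ids[0], skip_special_tokens=True))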
Regarding your code: why do you generate the way you do instead of just calling generate(...) directly?
The encoder-decoder inherits from PreTrainedModel and thus has direct access to generate(...), see here: https://github.com/huggingface/transformers/blob/0b6c255a95368163d2b1d37635e5ce5bdd1b9423/src/transformers/modeling_encoder_decoder.py#L29. Also, there is no need to wrap everything into the torch.no_grad() context: generate() always runs in no_grad mode.
Hope this helps! I will be off for the next two weeks; if it’s urgent, feel free to ping @sshleifer (hope it’s fine to ping you here, Sam 😉)
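Applied to the model above, the suggestion boils down to something like this (a sketch, not code from the thread; the decoder_start_token_id is an assumption, as in the earlier sketch):

# No torch.no_grad() wrapper needed: generate() already disables gradient tracking.
model.eval()
enc = tokenizer("some input text", return_tensors="pt")
output_ids = model.generate(
    enc["input_ids"],
    attention_mask=enc["attention_mask"],
    decoder_start_token_id=tokenizer.cls_token_id,  # assumption
    max_length=max_len_title,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))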
Thank you so much for your work @patrickvonplaten