How to finetune with a new dataset?
Hi, I am trying to fine-tune PRIMERA from Hugging Face using the `Seq2SeqTrainer`, with a new dataset. However, I keep getting ROUGE scores of 0. May I know which part of the code is wrong?
```python
from datasets import load_dataset  # needed for the load_dataset calls below
from huggingface_hub import notebook_login
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
import nltk
import numpy as np
import torch

notebook_login()

TOKENIZER = AutoTokenizer.from_pretrained("allenai/PRIMERA")
MODEL = AutoModelForSeq2SeqLM.from_pretrained("allenai/PRIMERA")
MODEL.gradient_checkpointing_enable()  # trade compute for memory

PAD_TOKEN_ID = TOKENIZER.pad_token_id
DOCSEP_TOKEN_ID = TOKENIZER.convert_tokens_to_ids("<doc-sep>")
```
Here I load my own reformatted version of the multi_news dataset from the Hugging Face Hub. The format is a (`src`, `tgt`) pair, where `src` is the related documents and `tgt` is the summary. It is almost the same as the original multi_news dataset, except that I added a few more words at the front of each source, along with the `|||||` separators.
```python
train = load_dataset('cammy/multi_news_formatted_small', split='train[:100]', use_auth_token=True, cache_dir="D:")
valid = load_dataset('cammy/multi_news_formatted_small', split='valid[:10]', use_auth_token=True, cache_dir="D:")
test = load_dataset('cammy/multi_news_formatted_small', split='test[:10]', use_auth_token=True, cache_dir="D:")
```
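As a quick sanity check on the format (the `src`/`tgt` column names come from the dataset description above; the snippet just inspects the first record):

```python
# Each `src` concatenates the related documents, separated by "|||||"
# (the multi_news convention); `tgt` is the reference summary.
print(train[0]["src"][:300])
print(train[0]["tgt"][:300])
```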
Then I preprocess the data.
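The preprocessing code was not included in the post; below is a minimal sketch of what it does, assuming the standard PRIMERA conventions: documents joined with `<doc-sep>`, global attention on the first token and on each `<doc-sep>` token, and padded label positions masked with -100. The column names (`src`, `tgt`) and the length limits are assumptions:

```python
MAX_SRC_LEN = 4096  # assumed input budget for PRIMERA
MAX_TGT_LEN = 1024  # assumed summary budget

def preprocess(batch):
    # Replace the multi_news "|||||" separators with PRIMERA's <doc-sep> token.
    srcs = [src.replace("|||||", "<doc-sep>") for src in batch["src"]]
    model_inputs = TOKENIZER(
        srcs, max_length=MAX_SRC_LEN, padding="max_length", truncation=True
    )
    labels = TOKENIZER(
        batch["tgt"], max_length=MAX_TGT_LEN, padding="max_length", truncation=True
    )
    # Mask pad positions in the labels with -100 so the loss ignores them.
    model_inputs["labels"] = [
        [(tok if tok != PAD_TOKEN_ID else -100) for tok in seq]
        for seq in labels["input_ids"]
    ]
    # Global attention on <s> and on every <doc-sep>; local attention elsewhere.
    model_inputs["global_attention_mask"] = [
        [1 if tok in (TOKENIZER.bos_token_id, DOCSEP_TOKEN_ID) else 0 for tok in ids]
        for ids in model_inputs["input_ids"]
    ]
    return model_inputs

train = train.map(preprocess, batched=True, remove_columns=["src", "tgt"])
valid = valid.map(preprocess, batched=True, remove_columns=["src", "tgt"])
test = test.map(preprocess, batched=True, remove_columns=["src", "tgt"])
```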
Then, lastly:

```python
trainer.train()
```
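For context, the `trainer` construction was not shown above. Here is a minimal sketch of what it might look like; the ROUGE-based `compute_metrics`, all hyperparameters, and the `evaluate` dependency are illustrative assumptions, not the original code. Note `predict_with_generate=True` in particular: without it, `compute_metrics` receives raw logits rather than generated token ids, which decodes to garbage and scores near-zero ROUGE.

```python
import evaluate  # pip install evaluate rouge_score

rouge = evaluate.load("rouge")
nltk.download("punkt", quiet=True)  # sentence splitter used below

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    # Restore the pad token where the labels were masked with -100.
    labels = np.where(labels != -100, labels, PAD_TOKEN_ID)
    decoded_preds = TOKENIZER.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = TOKENIZER.batch_decode(labels, skip_special_tokens=True)
    # rougeLsum expects newline-separated sentences.
    decoded_preds = ["\n".join(nltk.sent_tokenize(p)) for p in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(l)) for l in decoded_labels]
    # Recent versions of `evaluate` return plain floats here.
    return rouge.compute(
        predictions=decoded_preds, references=decoded_labels, use_stemmer=True
    )

training_args = Seq2SeqTrainingArguments(
    output_dir="primera-multi-news-small",  # hypothetical output path
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    learning_rate=3e-5,
    num_train_epochs=1,
    predict_with_generate=True,   # generate summaries during evaluation
    generation_max_length=256,    # cap generated length during eval
    evaluation_strategy="steps",
    eval_steps=50,
    logging_steps=50,
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=MODEL,
    args=training_args,
    train_dataset=train,
    eval_dataset=valid,
    tokenizer=TOKENIZER,
    compute_metrics=compute_metrics,
)
```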
But the resulting ROUGE scores are all 0.
I did have this issue quite a while ago, and it has since disappeared for me. There was a bug a while back where `Seq2SeqTrainer` was not taking the `global_attention_mask` into account, which may have been the problem? Might be worth updating `transformers` to the latest version (if you haven't already) and trying again.

Hi @JohnGiorgi, thanks for your reply! However, I am still having this problem when running your provided script (https://gist.github.com/JohnGiorgi/8c7dcabd3ee8a362b9174c5d145029ab) with the newest version, `transformers==4.21.0.dev0`. I used the following command to run it (on an 8×32 GB V100 EC2 instance):

The evaluation results are:
Not sure what causes this problem, but there must still be something wrong with the generation method in the Hugging Face implementation. But anyway, thanks so much for your script; it is really helpful.
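One way to check whether generation is the culprit is to bypass `Seq2SeqTrainer` entirely and call `generate` directly, passing the `global_attention_mask` yourself. This is a minimal sketch, assuming the preprocessed `valid` dataset from above:

```python
# Bypass the trainer: if generate() produces sensible summaries when the
# global_attention_mask is passed explicitly, the issue lies in how evaluation
# wires up generation, not in the model or the data.
batch = valid[:2]  # dict of column -> list for the first two examples
generated = MODEL.generate(
    input_ids=torch.tensor(batch["input_ids"]),
    attention_mask=torch.tensor(batch["attention_mask"]),
    global_attention_mask=torch.tensor(batch["global_attention_mask"]),
    max_length=256,
)
print(TOKENIZER.batch_decode(generated, skip_special_tokens=True))
```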