Fine-tuning tips with loss functions & evaluators
Hi,
Before my question, I'd like to thank you for open-sourcing your awesome work to the community.
Context:
I'm working on continue-training the SentenceTransformer model 'bert-base-nli-mean-tokens' with my own data, following the sample training_stsbenchmark_continue_training.py you provided. I use my data to construct an NLI-style training set of the form {(s1, s2), label}, where the labels are mapped to {"entailment": 1, "neutral": 2}; I do not have the {"contradiction": 0} case.
I looked at the tutorial script for continued training. It uses the STS data, whose labels are not the same as the NLI classification labels; the loss is CosineSimilarityLoss and the evaluator is EmbeddingSimilarityEvaluator.
Question:
Is it possible to continue training the 'bert-base-nli-mean-tokens' model with NLI-style training data? If so, which loss function and evaluator would you recommend for the classification task? My customized training data has about 500,000 instances; how many epochs would be good for continued training?
Thank you in advance.
Best, Hetian
You have to pass the SoftmaxLoss model to the evaluator; this should work.
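A minimal sketch of what that could look like (the dev examples and label mapping here are placeholders, and it assumes the SoftmaxLoss and LabelAccuracyEvaluator classes from sentence-transformers; depending on your library version you may need to wrap the examples in a SentencesDataset first):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.evaluation import LabelAccuracyEvaluator

model = SentenceTransformer('bert-base-nli-mean-tokens')

# With only two classes, labels should be 0-indexed (0 .. num_labels-1)
label2int = {"entailment": 0, "neutral": 1}

dev_examples = [  # placeholder dev data; use your own held-out pairs
    InputExample(texts=["A man is eating.", "A man eats food."], label=label2int["entailment"]),
    InputExample(texts=["A man is eating.", "A woman plays violin."], label=label2int["neutral"]),
]
dev_dataloader = DataLoader(dev_examples, shuffle=False, batch_size=16)

train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=len(label2int))

# The evaluator needs the SoftmaxLoss model (its classification head) to predict labels
evaluator = LabelAccuracyEvaluator(dev_dataloader, name='nli-dev', softmax_model=train_loss)
```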
Hi Hetian, for NLI-style training data have a look at the training_nli.py example. There, you just need to change how the model is constructed. There are two ways you can build a sentence embedding model.
Option 1: Take the different models and stick them together, i.e., you start with a BERT / Transformer model and then add a Pooling layer. This is done in training_nli.py, roughly as sketched below.
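A rough sketch of the Option 1 construction (newer library versions use models.Transformer; older examples use models.BERT instead, and the max_seq_length value here is just an illustrative choice):

```python
from sentence_transformers import SentenceTransformer, models

# Start from a plain transformer checkpoint...
word_embedding_model = models.Transformer('bert-base-uncased', max_seq_length=128)

# ...and add a mean-pooling layer on top to get fixed-size sentence embeddings
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```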
Option 2: You take an already built sentence transformer model. This model is loaded via SentenceTransformer('bert-base-nli-mean-tokens'). In the background, it downloads the fine-tuned BERT model and the config for the pooling layer and loads it as in Option 1.
To continue training, you just have to change training_nli.py so that instead of creating your model from scratch from BERT, you load it with: model = SentenceTransformer('bert-base-nli-mean-tokens')
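Putting that together, a sketch of continued training on NLI-style data could look like the following. The training examples, batch size, and output path are placeholders; the single epoch and 10% warm-up mirror what the training_nli.py example uses, and with ~500k pairs one epoch is a reasonable starting point before evaluating on your dev set:

```python
import math
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Option 2 / continued training: load the already fine-tuned model instead of building it from BERT
model = SentenceTransformer('bert-base-nli-mean-tokens')

# Placeholder training data in the {(s1, s2), label} format from the question (labels 0-indexed)
train_examples = [
    InputExample(texts=["sentence one", "sentence two"], label=0),
    InputExample(texts=["sentence three", "sentence four"], label=1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=2)

num_epochs = 1  # training_nli.py trains for a single epoch
warmup_steps = math.ceil(len(train_dataloader) * num_epochs * 0.1)  # 10% warm-up, as in the examples

model.fit(train_objectives=[(train_dataloader, train_loss)],
          # optionally pass an evaluator here, e.g. the LabelAccuracyEvaluator sketched above
          epochs=num_epochs,
          warmup_steps=warmup_steps,
          output_path='output/continue-training-nli')
```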