Fine-tuning tips with loss functions & evaluators
Hi,
Before my question, I'd like to thank you for open-sourcing your awesome work to the community.
Context:
I'm working on continue-training the SentenceTransformer model 'bert-base-nli-mean-tokens' with my own data, following the sample training_stsbenchmark_continue_training.py you provided. I use my data to construct an NLI-style training set of the form {(s1, s2), label}, where the labels are mapped to {"entailment": 1, "neutral": 2}; I do not have the {"contradiction": 0} case.
I looked at the tutorial script for continued training. It uses the STS data, whose labels are not the same as the NLI classification labels; the loss is CosineSimilarityLoss and the evaluator is EmbeddingSimilarityEvaluator.
Question:
Is it possible to continue training the 'bert-base-nli-mean-tokens' model with NLI-style training data? If so, which loss function and evaluator would you recommend for the classification task? My customized training data has about 500,000 instances; how many epochs would be good for continued training?
Thank you in advance.
Best, Hetian
You have to pass the SoftmaxLoss model to the evaluator; this should work.
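A minimal sketch of what that could look like (the dev examples and label mapping here are placeholders, and it assumes the SoftmaxLoss and LabelAccuracyEvaluator classes from sentence-transformers; depending on your library version you may need to wrap the examples in a SentencesDataset first):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.evaluation import LabelAccuracyEvaluator

model = SentenceTransformer('bert-base-nli-mean-tokens')

# With only two classes, labels should be 0-indexed (0 .. num_labels-1)
label2int = {"entailment": 0, "neutral": 1}

dev_examples = [  # placeholder dev data; use your own held-out pairs
    InputExample(texts=["A man is eating.", "A man eats food."], label=label2int["entailment"]),
    InputExample(texts=["A man is eating.", "A woman plays violin."], label=label2int["neutral"]),
]
dev_dataloader = DataLoader(dev_examples, shuffle=False, batch_size=16)

train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=len(label2int))

# The evaluator needs the SoftmaxLoss model (its classification head) to predict labels
evaluator = LabelAccuracyEvaluator(dev_dataloader, name='nli-dev', softmax_model=train_loss)
```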
Hi Hetian, for NLI-style training data have a look at the training_nli.py example. There, you just need to change how the model is constructed. There are two ways you can build a sentence embedding model.
Option 1: Take the different models and stick them together, i.e., you start with a BERT / Transformer model and then add a Pooling layer. This is done in training_nli.py, roughly as sketched below.
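A rough sketch of the Option 1 construction (newer library versions use models.Transformer; older examples use models.BERT instead, and the max_seq_length value here is just an illustrative choice):

```python
from sentence_transformers import SentenceTransformer, models

# Start from a plain transformer checkpoint...
word_embedding_model = models.Transformer('bert-base-uncased', max_seq_length=128)

# ...and add a mean-pooling layer on top to get fixed-size sentence embeddings
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```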
Option 2: You take an already built sentence transformer model. This model is loaded via SentenceTransformer('bert-base-nli-mean-tokens'). In the background, it downloads the fine-tuned BERT model and the config for the pooling layer and loads it as in Option 1.
To continue training, you just have to change training_nli.py so that instead of creating your model from scratch from BERT, you load it with: model = SentenceTransformer('bert-base-nli-mean-tokens')
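Putting that together, a sketch of continued training on NLI-style data could look like the following. The training examples, batch size, and output path are placeholders; the single epoch and 10% warm-up mirror what the training_nli.py example uses, and with ~500k pairs one epoch is a reasonable starting point before evaluating on your dev set:

```python
import math
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Option 2 / continued training: load the already fine-tuned model instead of building it from BERT
model = SentenceTransformer('bert-base-nli-mean-tokens')

# Placeholder training data in the {(s1, s2), label} format from the question (labels 0-indexed)
train_examples = [
    InputExample(texts=["sentence one", "sentence two"], label=0),
    InputExample(texts=["sentence three", "sentence four"], label=1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=2)

num_epochs = 1  # training_nli.py trains for a single epoch
warmup_steps = math.ceil(len(train_dataloader) * num_epochs * 0.1)  # 10% warm-up, as in the examples

model.fit(train_objectives=[(train_dataloader, train_loss)],
          # optionally pass an evaluator here, e.g. the LabelAccuracyEvaluator sketched above
          epochs=num_epochs,
          warmup_steps=warmup_steps,
          output_path='output/continue-training-nli')
```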