Question about training with BatchHardTripletLoss
Maybe this is a naive question (I am not a native PyTorch user).
When training as in the example shown here (for the loss mentioned above):
```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample, losses

train_batch_size = 16  # was undefined in the snippet

model = SentenceTransformer('distilbert-base-nli-mean-tokens')
train_examples = [InputExample(texts=['Sentence from class 0'], label=0), InputExample(texts=['Another sentence from class 0'], label=0),
                  InputExample(texts=['Sentence from class 1'], label=1), InputExample(texts=['Sentence from class 2'], label=2)]
train_dataset = SentencesDataset(train_examples, model)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=train_batch_size)
train_loss = losses.BatchSemiHardTripletLoss(model=model)
```
how is a siamese model trained here, where I have two inputs? You are using a SentenceTransformer (which maps a single input to an output). Also, in your bi-encoder example you build a SentenceTransformer from scratch. I just wonder how training in a siamese manner happens?
In my understanding, SentenceTransformer is a siamese bi-encoder (like in your paper).
Similarly, in your Quora example: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/quora_duplicate_questions/training_multi-task-learning.py
a SentenceTransformer model is also trained and receives two inputs for sentence pairs. I wonder where and when the model “knows” how to fit depending on the number of inputs? I feel I am missing something. When is a “siamese” model trained, and when is a “single” model with one input trained?
@datistiquo All the losses use a single network / model: the inputs are passed through the same network (the same object) in all cases.
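To make the weight sharing concrete, here is a rough sketch (not the library's actual code; the class name is made up and details are simplified) of how a pair loss such as CosineSimilarityLoss is structured:

```python
import torch

class PairLossSketch(torch.nn.Module):  # simplified illustration, not the real class
    def __init__(self, model):
        super().__init__()
        self.model = model  # one SentenceTransformer instance, i.e. one set of weights

    def forward(self, sentence_features, labels):
        # Every input sentence is encoded by the *same* model object,
        # so the weights are shared -- that is all "siamese" means here.
        reps = [self.model(features)['sentence_embedding'] for features in sentence_features]
        scores = torch.cosine_similarity(reps[0], reps[1])
        return torch.nn.functional.mse_loss(scores, labels.float())
```

Whether the loss receives one, two, or three texts per example, they all go through `self.model`; there is never a second copy of the network.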
The only difference is how the losses are computed. Which loss to compute depends on the training data you have available and its properties. So based on what labeled data you have, you choose the right loss.
BatchHard generates the triplets online (as described in the above blog post). So there is no need to generate triplets yourself: the loss looks into the batch and forms the triplets from it on the fly (for each anchor, BatchHard selects the hardest positive and hardest negative in the batch).
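For illustration, here is a simplified sketch of the batch-hard idea (not the library's implementation; the margin value is an assumption):

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=5.0):
    # All pairwise distances within the batch.
    dist = torch.cdist(embeddings, embeddings)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-class mask
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    # Hardest positive per anchor: the farthest example with the same label.
    hardest_pos = dist.masked_fill(~same | self_mask, 0).max(dim=1).values
    # Hardest negative per anchor: the closest example with a different label.
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```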
For evaluation, however, we want to see how well the model works on specific triplets. So we create some fixed triplets and evaluate the model on them.
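For example, with the library's TripletEvaluator (the sentences below are placeholders):

```python
from sentence_transformers.evaluation import TripletEvaluator

# Fixed evaluation triplets (placeholder sentences).
anchors   = ['A sentence from class 0']
positives = ['Another sentence from class 0']
negatives = ['A sentence from class 1']
evaluator = TripletEvaluator(anchors, positives, negatives, name='dev')
# Returns the fraction of triplets where the anchor is closer
# to the positive than to the negative.
evaluator(model)
```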
https://stackoverflow.com/questions/11218477/how-can-i-use-pickle-to-save-a-dict
This also works with any other Python data type.
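A minimal example (the dict contents here are just placeholders):

```python
import pickle

data = {'some sentence': [0.1, 0.2, 0.3]}  # e.g. a dict of sentence -> embedding

with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

with open('data.pkl', 'rb') as f:
    restored = pickle.load(f)
```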