Confusion about MultipleNegativesRankingLoss
First of all, thank you for your work.
I have been using sentence-transformers in recent days, but I am a bit confused about the MultipleNegativesRankingLoss.
If I use it like this:
```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, SentencesDataset, LoggingHandler, losses
from sentence_transformers.readers import InputExample

model = SentenceTransformer('distilbert-base-nli-mean-tokens')

train_batch_size = 16
train_examples = [InputExample(texts=['Anchor 1', 'Positive 1']),
                  InputExample(texts=['Anchor 2', 'Positive 2'])]
train_dataset = SentencesDataset(train_examples, model)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=train_batch_size)
train_loss = losses.MultipleNegativesRankingLoss(model=model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```
Does this mean that train_examples should contain only positive pairs, and that negative examples will be constructed automatically by sentence_transformers? Or should train_examples also contain negative examples that I build manually?
Thank you!
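(For readers landing here with the same question: the loss does construct negatives automatically, from the other examples in the batch. Below is a minimal sketch of that idea in plain PyTorch, assuming the standard in-batch-negatives formulation; the function name and the scale value are illustrative, not the library's internals.)

```python
import torch
import torch.nn.functional as F

def mnr_loss_sketch(anchor_emb, positive_emb, scale=20.0):
    # anchor_emb, positive_emb: (batch_size, dim) sentence embeddings.
    # scores[i][j] = scaled cosine similarity between anchor i and positive j;
    # the diagonal holds the true (anchor_i, positive_i) pairs.
    scores = scale * F.cosine_similarity(
        anchor_emb.unsqueeze(1), positive_emb.unsqueeze(0), dim=-1
    )
    # For anchor i, every positive_j with j != i acts as an in-batch negative:
    # cross-entropy pushes the diagonal score above the rest of its row.
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)
```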
@nreimers Hey, I think it would be very helpful if you could provide at least a rough minimal example of a dataset for MultipleNegativesRankingLoss that structures the batches correctly, with the positives and negatives set up right. I think this loss does not work well except with very large amounts of data, as in the MS MARCO example; in most situations you have to take care that the other positives in a batch really are negatives for a given anchor.
It is still a bit confusing, and I think an example would be great.
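(A rough sketch of what such an example might look like, using the multi-text InputExample form that MultipleNegativesRankingLoss accepts; the sentence strings and batch size are placeholders, and this assumes a sentence-transformers version whose model.fit accepts a plain DataLoader over InputExample lists.)

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.readers import InputExample

model = SentenceTransformer('distilbert-base-nli-mean-tokens')

# Each InputExample carries [anchor, positive, hard negative]; with
# MultipleNegativesRankingLoss the extra texts are treated as hard negatives,
# scored alongside the in-batch negatives contributed by the other examples.
train_examples = [
    InputExample(texts=['Anchor 1', 'Positive 1', 'Hard negative 1']),
    InputExample(texts=['Anchor 2', 'Positive 2', 'Hard negative 2']),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model=model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```

With this layout, each anchor is scored against its own positive, its own hard negative, and everything contributed by the other examples in the batch.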
Okay, I understand. Thanks for your help! @nreimers @datistiquo