
Fine-Tuning STS and semantic search multilingual HF transformer

See original GitHub issue

Hi,

Because I need to export the transformer model to ONNX format for inference, I use a multilingual sentence-transformer model based on the HF transformers library (separate tokenizer and model, plus a mean pooling layer) for semantic textual similarity and semantic search.

Is it possible to fine-tune this model with https://github.com/UKPLab/sentence-transformers/tree/master/examples/training/sts/training_stsbenchmark_continue_training.py even though I am not using the sentence-transformers library directly?
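
For context, the linked training_stsbenchmark_continue_training.py script continues training an existing sentence-transformers checkpoint on STSbenchmark sentence pairs with a cosine-similarity loss. A minimal sketch of that setup, assuming the multilingual checkpoint sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 and two hand-written pairs standing in for the real STSbenchmark files:

    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, models, losses, InputExample

    # Placeholder checkpoint: substitute whichever multilingual model you actually use.
    word_embedding_model = models.Transformer(
        "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", max_seq_length=128
    )
    pooling_model = models.Pooling(
        word_embedding_model.get_word_embedding_dimension(), pooling_mode="mean"
    )
    model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

    # Toy STS pairs with gold similarity scores scaled to [0, 1];
    # the real script reads these from the STSbenchmark dataset instead.
    train_examples = [
        InputExample(texts=["A plane is taking off.", "An air plane is taking off."], label=0.95),
        InputExample(texts=["A man is playing a flute.", "A man is eating pasta."], label=0.05),
    ]
    train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
    train_loss = losses.CosineSimilarityLoss(model)

    model.fit(
        train_objectives=[(train_dataloader, train_loss)],
        epochs=1,
        warmup_steps=100,
        output_path="output/sts-finetuned",  # hypothetical output directory
    )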

If not, is there a workaround that makes this possible (e.g. fine-tune the sentence-transformer model, then split it into tokenizer and model before the ONNX export)?
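
That split is the direction the comments below converge on: fine-tune via sentence-transformers, then export only the underlying transformer. A rough sketch of the split-and-export step, assuming the model was saved to output/sts-finetuned as in the previous snippet and a recent sentence-transformers version (which writes the HF transformer and tokenizer files at the root of the save directory; older versions nest them under 0_Transformer/):

    import torch
    from transformers import AutoModel, AutoTokenizer

    save_dir = "output/sts-finetuned"  # hypothetical path from the fine-tuning sketch
    tokenizer = AutoTokenizer.from_pretrained(save_dir)
    model = AutoModel.from_pretrained(save_dir)
    model.eval()

    # Trace with a dummy batch; dynamic axes keep batch size and sequence length flexible.
    dummy = tokenizer("a dummy sentence", return_tensors="pt")
    torch.onnx.export(
        model,
        (dummy["input_ids"], dummy["attention_mask"]),
        "model.onnx",
        input_names=["input_ids", "attention_mask"],
        output_names=["last_hidden_state"],  # some models also emit pooler_output as a second output
        dynamic_axes={
            "input_ids": {0: "batch", 1: "sequence"},
            "attention_mask": {0: "batch", 1: "sequence"},
            "last_hidden_state": {0: "batch", 1: "sequence"},
        },
        opset_version=14,
    )
    tokenizer.save_pretrained("onnx_tokenizer")  # keep the tokenizer files next to the ONNX model

The exported graph contains only the transformer; the mean pooling that sentence-transformers normally applies has to be reproduced over last_hidden_state in your inference code (see the pooling sketch after the comments below).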

Thanks!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
nreimers commented, Oct 9, 2021

Correct.

The number of examples depends on how complex your domain is. For a simple and narrow domain, 1k examples are already helpful.

With a broad domain that spans basically all topics (physics, math, sports, gaming, dating, programming, …), you need a lot more examples. If you don’t have examples for e.g. math, the model will not work that well for math queries.

0 reactions
Matthieu-Tinycoaching commented, Oct 8, 2021

That sounds great @nreimers! So, a regular transformer model without mean pooling?

Could you give me an idea of the minimum number of training examples needed to benefit from fine-tuning a pre-trained multilingual sentence-transformer for STS and semantic search?
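
Since the exchange above comes down to exporting the bare transformer and applying mean pooling yourself, here is a minimal sketch of that pooling step with onnxruntime, assuming the model.onnx file and tokenizer directory from the export sketch above (all paths are placeholders):

    import numpy as np
    import onnxruntime as ort
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("onnx_tokenizer")
    session = ort.InferenceSession("model.onnx")

    def encode(sentences):
        enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="np")
        token_embeddings = session.run(None, {
            "input_ids": enc["input_ids"].astype(np.int64),
            "attention_mask": enc["attention_mask"].astype(np.int64),
        })[0]  # first output: last_hidden_state with shape (batch, seq, hidden)
        # Mask-aware mean pooling: average only over real tokens, not padding
        mask = enc["attention_mask"][..., None].astype(np.float32)
        summed = (token_embeddings * mask).sum(axis=1)
        counts = np.clip(mask.sum(axis=1), 1e-9, None)
        return summed / counts

    emb = encode(["Der Hund spielt im Garten.", "The dog is playing in the garden."])
    # Cosine similarity between the two sentence embeddings
    sim = emb[0] @ emb[1] / (np.linalg.norm(emb[0]) * np.linalg.norm(emb[1]))
    print(sim)

This mirrors the masked-mean pooling module that sentence-transformers applies internally, just moved outside the exported model.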

Read more comments on GitHub >

Top Results From Across the Web

Training Sentence Transformers with MNR Loss - Pinecone
How to create sentence transformers by fine-tuning with MNR loss. ... We’re going to use a semantic textual similarity (STS) dataset to test...

Train and Fine-Tune Sentence Transformers Models
Training or fine-tuning a Sentence Transformers model highly ... Transformers require heavy computation to perform semantic search tasks.

Fine-tune High Performance Sentence Transformers (with ...
Transformer-produced sentence embeddings have come a long way in a very short time. ... NLP for Semantic Search Course ...

Semantic Textual Similarity - Sentence-Transformers
Semantic Textual Similarity (STS) assigns a score on the similarity of two texts. In this example, we use the STSbenchmark as training data...

Measurement of Semantic Textual Similarity in Clinical Texts
Methods In this study, we explored 3 transformer-based models for clinical STS: Bidirectional Encoder Representations from Transformers (BERT), XLNet, and ...
