Not clear which layers are being trained
In your paper, you state: “The purpose of SBERT sentence embeddings are not to be used for transfer learning for other tasks. Here, we think fine-tuning BERT as described by Devlin et al. (2018) for new tasks is the more suitable method, as it updates all layers of the BERT network.”
The above implies that the SentenceTransformer.fit() method does not update all layers of BERT (or XLNet, etc.). However, there is no code in this repo that partially freezes any layers during training, or anything like that. Could you kindly clarify how you control which layers are trained (other than the parameters in the loss modules, which are clearly trained)? Sorry if this is a stupid question.
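For context on the question: fit() in this repo is an ordinary PyTorch training loop, so every parameter with requires_grad=True gets updated, and nothing is frozen unless you freeze it yourself. A minimal sketch of that PyTorch convention, using a small stand-in module rather than a real SBERT model:

```python
import torch.nn as nn

# Small stand-in for a transformer encoder; SentenceTransformer
# modules are ordinary nn.Modules, so the same convention applies.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

# By default every parameter tensor is trainable (requires_grad=True),
# so an optimizer built from model.parameters() updates all layers.
trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))  # 4: weight + bias for each of the two Linear layers

# To train only some layers, you would have to freeze the rest explicitly:
for p in model[0].parameters():
    p.requires_grad_(False)

still_trainable = [p for p in model.parameters() if p.requires_grad]
print(len(still_trainable))  # 2: only the last Linear remains trainable
```

Since, as noted above, the repo never does the explicit freezing step, fit() updates all layers of the underlying transformer plus any parameters in the loss modules.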
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 1
- Comments: 6 (2 by maintainers)
@peacej The statement was in reference to sentence classification tasks, like sentiment classification. These tasks are evaluated, for example, with SentEval.
There (in SentEval), the assumption is: Sentence --(fixed sentence embedding method)--> Vector --(logistic regression)--> Label
In that SentEval scenario, only the logistic regression classifier is updated. The sentence embedding method is fixed.
If you have a sentence classification task, like sentiment classification, I think this setup doesn’t make sense. There, it is better to fine-tune BERT: Sentence --(BERT)--> Label
Sentence embedding methods make sense for unsupervised tasks, and for tasks where you use cosine similarity / Manhattan / Euclidean distance for retrieval, etc.
I hope this makes it a bit more clear.
Best,
Nils Reimers
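The retrieval setup described in the answer above can be sketched with plain cosine similarity. Random vectors stand in for real SBERT embeddings here; in practice you would obtain them from SentenceTransformer.encode():

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for fixed sentence embeddings (e.g. from model.encode(...));
# real SBERT vectors would have dimension 384 or 768.
corpus_emb = torch.randn(5, 16)
query_emb = corpus_emb[2].clone()  # pretend the query matches sentence 2

# Cosine similarity of the query against every corpus sentence.
sims = F.cosine_similarity(query_emb.unsqueeze(0), corpus_emb, dim=1)

# Rank corpus sentences by similarity. The embedding method itself stays
# fixed -- only a nearest-neighbour lookup happens at query time.
ranking = sims.argsort(descending=True)
print(ranking[0].item())  # 2: the identical sentence ranks first
```

This is the sense in which the embedding method is "fixed": no gradients flow at query time, in contrast to the full fine-tuning setup for classification.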
@peacej I have a similar situation with a specialized domain. What I do is a two-step training process:
1. Fine-tune BERT in an unsupervised fashion on my domain text (via Hugging Face’s transformers repo).
2. Fit a SentenceTransformer on labeled pairs, using the weights from step 1 as the base model.
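Step 2 of the recipe above can be sketched with the standard sentence-transformers building blocks. The checkpoint path and training pairs below are placeholders, not values from this thread:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, InputExample, losses

# Load the domain-adapted checkpoint from step 1 as the word-embedding layer.
# 'path/to/domain-finetuned-bert' is a placeholder for your own output dir.
word_embedding_model = models.Transformer('path/to/domain-finetuned-bert')
pooling = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling])

# Labeled pairs for step 2 (placeholder sentences; labels are similarity scores).
train_examples = [
    InputExample(texts=['first sentence', 'second sentence'], label=0.9),
    InputExample(texts=['third sentence', 'unrelated sentence'], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

# fit() runs a plain PyTorch loop and, per the discussion above,
# updates all layers of the transformer plus the pooling module.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```

Note that fit() here continues training from the domain-adapted weights rather than from the stock pretrained checkpoint, which is the whole point of doing step 1 first.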