
Not clear which layers are being trained


In your paper, you state “The purpose of SBERT sentence embeddings are not to be used for transfer learning for other tasks. Here, we think fine-tuning BERT as described by Devlin et al. (2018) for new tasks is the more suitable method, as it updates all layers of the BERT network.”

The above implies that the SentenceTransformer.fit() method does not update all layers of BERT (or XLNet, etc.). However, there is no code in this repo that partially freezes any layers during training, or anything like that. Could you kindly clarify how you control which layers are being trained (other than the parameters in the loss modules, which are clearly being trained)? Sorry if this is a stupid question.
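
To illustrate what is being asked: a minimal sketch, not taken from the repo, of how fit() behaves by default and how layers could be frozen by hand. The model name and data are placeholders, and the module attribute names can differ between sentence-transformers versions.

# Sketch only: by default fit() optimizes every parameter that requires
# gradients, i.e. all BERT layers plus the loss module's own parameters.
# Freezing has to be done explicitly, e.g. as below.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("bert-base-nli-mean-tokens")  # placeholder model name

# Hypothetical freezing step: keep the transformer fixed so that only the
# classifier inside SoftmaxLoss gets updated.
for param in model[0].auto_model.parameters():
    param.requires_grad = False

train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."], label=0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=1)
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)

# Without the freezing loop above, this call updates all layers of BERT.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)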

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
nreimers commented, Feb 22, 2020

@peacej The statement was in reference to sentence classification tasks, like sentiment classification. These tasks are evaluated for example with SentEval.

There (in SentEval), the assumption is: Sentence --[fixed sentence embedding method]--> Vector --[logistic regression]--> Label

In that SentEval scenario, only the logistic regression classifier is updated. The sentence embedding method is fixed.
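
As a rough illustration of that frozen-embedding setup (toy data and an example model name; this is not SentEval code), only the scikit-learn classifier is trained:

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

embedder = SentenceTransformer("bert-base-nli-mean-tokens")  # fixed sentence embedding method

train_sentences = ["What a great movie!", "The plot was terrible.",
                   "I loved every minute.", "A complete waste of time."]
train_labels = [1, 0, 1, 0]  # toy sentiment labels

# encode() runs inference only; no gradients flow back into the encoder.
X_train = embedder.encode(train_sentences)
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

print(clf.predict(embedder.encode(["A wonderful film."])))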

If you have a sentence classification task, like sentiment classification, I think this setup doesn’t make sense. There, it is better to fine-tune BERT: Sentence --[BERT]--> Label

Sentence embedding methods make sense for unsupervised tasks and for tasks where you use cosine similarity / Manhattan / Euclidean distance for retrieval, etc.
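
A short sketch of that retrieval use case (toy sentences, example model name; util.cos_sim is the cosine-similarity helper in recent sentence-transformers versions):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("bert-base-nli-mean-tokens")

query = model.encode("How do I freeze layers during training?", convert_to_tensor=True)
corpus = model.encode(
    ["Freezing parameters in PyTorch", "Best pizza places in town"],
    convert_to_tensor=True,
)

# Cosine similarity between the query and each corpus sentence.
print(util.cos_sim(query, corpus))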

I hope this makes it a bit more clear.

Best, Nils Reimers

1 reaction
kevinmandich commented, Feb 20, 2020

@peacej I have a similar situation with a specialized domain. What I do is a two-step training process:

1 - Fine-tune BERT in an unsupervised fashion on my domain text (via Huggingface’s transformers repo).
2 - Fit a SentenceTransformer on labeled pairs, using the weights from step 1 as the base model (a sketch of this step follows below).
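
A hedged sketch of step 2 (the "./domain-bert" path is hypothetical and stands for the MLM-finetuned checkpoint produced in step 1, e.g. with Huggingface's language-modeling example scripts; the pairs and scores are toy data):

from sentence_transformers import SentenceTransformer, models, InputExample, losses
from torch.utils.data import DataLoader

# Build an SBERT model on top of the domain-adapted checkpoint from step 1.
word_embedding_model = models.Transformer("./domain-bert", max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Labeled pairs with similarity scores in [0, 1].
train_examples = [
    InputExample(texts=["domain sentence A", "a close paraphrase of A"], label=0.9),
    InputExample(texts=["domain sentence A", "an unrelated sentence"], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("./domain-sbert")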

