fine-tune model with language model head
Hi,
I would like to fine-tune ESM with a language model head. I tried:
import torch
model = torch.hub.load("facebookresearch/esm", "modelWithLMHead", "esm1_t34_670M_UR50S")
but I got RuntimeError: Cannot find callable modelWithLMHead in hubconf
Is there a simple way to do this? Thanks
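For reference, a minimal sketch of a loading path that does work, assuming the standard facebookresearch/esm hub entry points (the hub callable is the model name itself, and the returned model already exposes masked-language-model logits in its forward output); the toy sequence is only an illustration:

import torch

# The hub callables in facebookresearch/esm are the model names themselves;
# there is no "modelWithLMHead" entry point. The returned model already has a
# language-model head, so its forward pass yields per-position logits over the
# amino-acid alphabet.
model, alphabet = torch.hub.load("facebookresearch/esm", "esm1_t34_670M_UR50S")
batch_converter = alphabet.get_batch_converter()

# Toy batch of (name, sequence) pairs.
data = [("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQ")]
_, _, batch_tokens = batch_converter(data)

with torch.no_grad():
    out = model(batch_tokens)
lm_logits = out["logits"]  # (batch, seq_len, vocab_size): ready for an MLM loss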
Issue Analytics
- Created: 3 years ago
- Comments: 5 (3 by maintainers)
Top Results From Across the Web

Fine-tuning a masked language model - Hugging Face Course
However, there are a few cases where you'll want to first fine-tune the language models on your data before training a task-specific head...

How to Fine-Tune BERT Transformer Python
In this tutorial, we'll learn how to fine-tune a BERT transformer model using masked-language modeling (MLM) and next-sentence prediction...

Fine-Tuning BERT with Masked Language Modeling
In the final layer, a model head for MLM is stacked over the BERT core model and outputs the same number of tokens...

Fine-tune transformer language models for linguistic diversity
In this post, we explored fine-tuning pre-trained transformer-based language models for a question-answering task for a mid-resource language...

Fine-tuning Wav2Vec2 with an LM head | TensorFlow Hub
In this notebook, we will load the pre-trained wav2vec2 model from TFHub and fine-tune it on the LibriSpeech dataset by appending a language modeling head...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @brucejwittmann, to quickly answer your questions: […] <pad> tokens in the batch_converter, so I brought up ignore_index as a warning just in case you implement it in a way where masks could be introduced on <pad> tokens. However, the design plan you just described sounds great, and ignore_index won't be needed in that case.

Hi @joshim5, thanks for your quick and helpful response! Just to clarify on the use of ignore_index: my understanding from your paper is that loss was calculated for predictions made on masked tokens only. Does this mean that <pad> tokens were sometimes the ones that were masked? I was planning to design my masking function such that it never masks a padding token (in other words, it knows the length of each given protein and only masks amino-acid tokens). If I were to do that, my understanding is that ignore_index wouldn't be needed, as <pad> could never be a target. I suppose I have a few follow-up questions, then: Were <pad> tokens masked in the original work? If so, is this because there is a downside to restricting masking to amino-acid tokens only? Thanks again!
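Below is a rough sketch of the masking scheme discussed in this exchange, masking only amino-acid positions so that <pad> can never be a target; the mask_tokens helper, the 15% rate, and the reliance on the default ignore_index of -100 are illustrative assumptions, not the exact recipe from the paper:

import torch
import torch.nn.functional as F

def mask_tokens(tokens, alphabet, mask_prob=0.15):
    # Hypothetical helper: mask a fraction of real amino-acid positions only.
    # Unmasked positions are labeled -100, which F.cross_entropy ignores by
    # default, so <pad> (and BOS/EOS) never contribute to the loss.
    targets = tokens.clone()
    special = tokens == alphabet.padding_idx
    if getattr(alphabet, "prepend_bos", False):
        special |= tokens == alphabet.cls_idx
    if getattr(alphabet, "append_eos", False):
        special |= tokens == alphabet.eos_idx
    probs = torch.full(tokens.shape, mask_prob)
    probs.masked_fill_(special, 0.0)
    masked = torch.bernoulli(probs).bool()
    inputs = tokens.clone()
    inputs[masked] = alphabet.mask_idx
    targets[~masked] = -100  # loss is computed on masked positions only
    return inputs, targets

# Usage with the model/alphabet loaded above:
# inputs, targets = mask_tokens(batch_tokens, alphabet)
# logits = model(inputs)["logits"]                        # (batch, seq_len, vocab)
# loss = F.cross_entropy(logits.transpose(1, 2), targets)
# loss.backward()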