
fine-tune model with language model head

See original GitHub issue

Hi

I would like to fine-tune esm with a language model head. I tried

import torch
model = torch.hub.load("facebookresearch/esm", "modelWithLMHead", "esm1_t34_670M_UR50S")  

but I got RuntimeError: Cannot find callable modelWithLMHead in hubconf

Is there a simple way to do this? Thanks
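
For context, the torch.hub entry points in facebookresearch/esm are named after the models themselves, so there is no modelWithLMHead callable; the model returned by the model-named entry point already carries the language-model head, and its forward pass produces per-position token logits. A minimal sketch of loading it that way, following the repository's README (treat it as illustrative rather than a guaranteed API):

import torch

# The entry point returns the model together with its alphabet (tokenizer-like object).
model, alphabet = torch.hub.load("facebookresearch/esm", "esm1_t34_670M_UR50S")
batch_converter = alphabet.get_batch_converter()

data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
labels, strs, tokens = batch_converter(data)

model.train()           # switch out of eval mode before fine-tuning
out = model(tokens)     # forward pass returns a dict
logits = out["logits"]  # per-position token logits from the LM head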

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
joshim5 commented on Nov 29, 2020

Hi @brucejwittmann, to quickly answer your questions:

  1. No, not for pretraining.
  2. No. You asked about <pad> tokens in the batch_converter, so I brought up ignore_index as a warning just in case you implement it in a way where masks could be introduced on <pad> tokens. However, the design plan you just described sounds great, and ignore_index won’t be needed in that case (see the sketch below).
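
For illustration, here is a minimal sketch of where ignore_index would come in: it only matters if the masking scheme could turn <pad> positions into prediction targets. The shapes and index values below are placeholders rather than ESM's actual vocabulary size or padding index:

import torch
import torch.nn.functional as F

vocab_size, pad_idx = 33, 1                      # placeholder values
logits = torch.randn(2, 10, vocab_size)          # (batch, seq_len, vocab)
targets = torch.randint(0, vocab_size, (2, 10))  # token ids the model should predict
targets[:, 6:] = pad_idx                         # pretend these positions are <pad>

# ignore_index tells the loss to skip those positions entirely.
loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),
    targets.reshape(-1),
    ignore_index=pad_idx,
)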

0 reactions
brucejwittmann commented on Nov 29, 2020

Hi @joshim5, thanks for your quick and helpful response! Just to clarify the use of ignore_index: my understanding from your paper is that the loss was calculated only for predictions at masked positions. Does this mean that <pad> tokens were sometimes among the tokens that were masked? I was planning to design my masking function so that it never masks a padding token (in other words, it knows the length of each protein and only masks amino-acid tokens; a sketch of this approach is included after the questions below). If I were to do that, my understanding is that ignore_index wouldn’t be needed, since <pad> could never be a target. I suppose I have a few follow-up questions, then:

  1. Was the loss calculated against more than just the masked tokens in your original work?
  2. Were <pad> tokens masked in the original work? If so, is this because there is a downside to restricting masking to amino-acid tokens only?

Thanks again!
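
A minimal sketch of the masking design described above, where only amino-acid positions are ever eligible for masking and the loss is computed over masked positions only, so ignore_index is never needed. The alphabet attribute names in the usage comment are assumptions about the ESM alphabet object:

import torch
import torch.nn.functional as F

def mask_amino_acids(tokens, mask_idx, special_idxs, mask_prob=0.15):
    """Hypothetical masking helper: only real amino-acid positions are eligible,
    so <pad> (and <cls>/<eos>) can never become prediction targets."""
    eligible = torch.ones_like(tokens, dtype=torch.bool)
    for idx in special_idxs:
        eligible &= tokens != idx
    mask = (torch.rand(tokens.shape) < mask_prob) & eligible
    masked = tokens.clone()
    masked[mask] = mask_idx  # simplified: always substitute <mask>
    return masked, mask

# Usage sketch (attribute names such as alphabet.mask_idx / alphabet.padding_idx
# are assumptions):
# masked, mask = mask_amino_acids(tokens, alphabet.mask_idx,
#                                 [alphabet.padding_idx, alphabet.cls_idx, alphabet.eos_idx])
# logits = model(masked)["logits"]
# loss = F.cross_entropy(logits[mask], tokens[mask])  # loss on masked positions only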


Top Results From Across the Web

Fine-tuning a masked language model - Hugging Face Course
However, there are a few cases where you’ll want to first fine-tune the language models on your data, before training a task-specific head....

How to Fine-Tune BERT Transformer Python
In this tutorial, we’ll learn how to fine-tune a BERT transformer model using masked-language modeling (MLM) and next sentence prediction ...

Fine-Tuning BERT with Masked Language Modeling
In the final layer, a model head for MLM is stacked over the BERT core model and outputs the same number of tokens...

Fine-tune transformer language models for linguistic diversity ...
In this post, we explored fine-tuning pre-trained transformer-based language models for a question answering task for a mid-resource language ( ...

Fine-tuning Wav2Vec2 with an LM head | TensorFlow Hub
In this notebook, we will load the pre-trained wav2vec2 model from TFHub and will fine-tune it on the LibriSpeech dataset by appending a Language Modeling...
