fine-tune model with language model head
Hi,
I would like to fine-tune ESM with a language model head. I tried:
import torch
model = torch.hub.load("facebookresearch/esm", "modelWithLMHead", "esm1_t34_670M_UR50S")
but I got RuntimeError: Cannot find callable modelWithLMHead in hubconf
Is there a simple way to do this? Thanks
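For reference, a minimal sketch of a loading path that does work, assuming the standard facebookresearch/esm hub entry points (the hub callable is the model name itself, and the returned model already exposes masked-language-model logits in its forward output); the toy sequence is only an illustration:

import torch

# The hub callables in facebookresearch/esm are the model names themselves;
# there is no "modelWithLMHead" entry point. The returned model already has a
# language-model head, so its forward pass yields per-position logits over the
# amino-acid alphabet.
model, alphabet = torch.hub.load("facebookresearch/esm", "esm1_t34_670M_UR50S")
batch_converter = alphabet.get_batch_converter()

# Toy batch of (name, sequence) pairs.
data = [("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQ")]
_, _, batch_tokens = batch_converter(data)

with torch.no_grad():
    out = model(batch_tokens)
lm_logits = out["logits"]  # (batch, seq_len, vocab_size): ready for an MLM loss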
Issue Analytics
- Created: 3 years ago
- Comments: 5 (3 by maintainers)
Top Results From Across the Web

Fine-tuning a masked language model - Hugging Face Course
However, there are a few cases where you'll want to first fine-tune the language models on your data before training a task-specific head...

How to Fine-Tune BERT Transformer Python
In this tutorial, we'll learn how to fine-tune a BERT transformer model using masked-language modeling (MLM) and next-sentence prediction...

Fine-Tuning BERT with Masked Language Modeling
In the final layer, a model head for MLM is stacked over the BERT core model and outputs the same number of tokens...

Fine-tune transformer language models for linguistic diversity
In this post, we explored fine-tuning pre-trained transformer-based language models for a question-answering task for a mid-resource language...

Fine-tuning Wav2Vec2 with an LM head | TensorFlow Hub
In this notebook, we will load the pre-trained wav2vec2 model from TFHub and fine-tune it on the LibriSpeech dataset by appending a language modeling head...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @brucejwittmann, to quickly answer your questions: […] <pad> tokens in the batch_converter, so I brought up ignore_index as a warning just in case you implement it in a way where masks could be introduced on <pad> tokens. However, the design plan you just described sounds great, and ignore_index won't be needed in that case.

Hi @joshim5, thanks for your quick and helpful response! Just to clarify on the use of ignore_index: my understanding from your paper is that loss was calculated for predictions made on masked tokens only. Does this mean that <pad> tokens were sometimes the ones that were masked? I was planning to design my masking function such that it never masks a padding token (in other words, it knows the length of each given protein and only masks amino-acid tokens). If I were to do that, my understanding is that ignore_index wouldn't be needed, as <pad> could never be a target. I suppose I have a few follow-up questions, then: Were <pad> tokens masked in the original work? If so, is this because there is a downside to restricting masking to amino-acid tokens only? Thanks again!
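Below is a rough sketch of the masking scheme discussed in this exchange, masking only amino-acid positions so that <pad> can never be a target; the mask_tokens helper, the 15% rate, and the reliance on the default ignore_index of -100 are illustrative assumptions, not the exact recipe from the paper:

import torch
import torch.nn.functional as F

def mask_tokens(tokens, alphabet, mask_prob=0.15):
    # Hypothetical helper: mask a fraction of real amino-acid positions only.
    # Unmasked positions are labeled -100, which F.cross_entropy ignores by
    # default, so <pad> (and BOS/EOS) never contribute to the loss.
    targets = tokens.clone()
    special = tokens == alphabet.padding_idx
    if getattr(alphabet, "prepend_bos", False):
        special |= tokens == alphabet.cls_idx
    if getattr(alphabet, "append_eos", False):
        special |= tokens == alphabet.eos_idx
    probs = torch.full(tokens.shape, mask_prob)
    probs.masked_fill_(special, 0.0)
    masked = torch.bernoulli(probs).bool()
    inputs = tokens.clone()
    inputs[masked] = alphabet.mask_idx
    targets[~masked] = -100  # loss is computed on masked positions only
    return inputs, targets

# Usage with the model/alphabet loaded above:
# inputs, targets = mask_tokens(batch_tokens, alphabet)
# logits = model(inputs)["logits"]                        # (batch, seq_len, vocab)
# loss = F.cross_entropy(logits.transpose(1, 2), targets)
# loss.backward()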