How to use multiple PreTrainedModel models in a custom model?

Details

I am using the Trainer to train a custom model, like this:

import transformers
from torch import nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # I want the code to be clean so I load the pretrained model like this
        self.bert_layer_1 = transformers.AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext")
        self.bert_layer_2 = transformers.AutoModel.from_pretrained("bert-base-chinese")
        self.other_layers = ... # not important

    def forward(self):
        pass # not important

When I run trainer.save_model(), it saves only the model’s state dict, because the custom model is not a PreTrainedModel (see the terminal output below).

Trainer.model is not a `PreTrainedModel`, only saving its state dict.

When reloading the saved model in production, I need to initialize a new MyModel and load its state dict, which is not very convenient. I would like to load this model with transformers.AutoModel.from_pretrained('MODEL_PATH'), like other PreTrainedModels.

I tried changing class MyModel(nn.Module) to class MyModel(PreTrainedModel), but PreTrainedModel needs a PretrainedConfig when it is initialized. I don’t have one in the current implementation, and I don’t know how to manage the config when using multiple PreTrainedModel models. I want to keep self.bert_layer_1 and self.bert_layer_2 as simple as from_pretrained, not = BertModel(config).

Is there a way to do that?

Environment info

  • transformers version: 4.9.2
  • Platform: macOS / Ubuntu
  • Python version: 3.8.6
  • PyTorch version (GPU?): 1.8.1 (False) / (yes)
  • Tensorflow version (GPU?): 2.4.1 (False) / (yes)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: parallel

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

maxpel commented, Oct 22, 2021 (2 reactions)

> A model that is not inside the transformers library won’t work with the AutoModel API. To properly use the save/from pretrained methods, why not subclassing PreTrainedModel instead of nn.Module?

@sgugger Could you give an example of how to subclass PreTrainedModel? I would also like to integrate my model at https://huggingface.co/maxpe/twitter-roberta-base_semeval18_emodetection better with the transformers library:

import torch
from transformers import AutoModel

def loss_fn(outputs, targets):
    return torch.nn.BCEWithLogitsLoss()(outputs, targets)

class RobertaClass(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.l1 = AutoModel.from_pretrained("cardiffnlp/twitter-roberta-base", return_dict=False)
        self.l2 = torch.nn.Dropout(0.3)
        self.l3 = torch.nn.Linear(768, 11)

    def forward(self, input_ids, attention_mask, labels):
        # With return_dict=False the encoder returns a tuple; the second item is the pooled output.
        _, output_1 = self.l1(input_ids=input_ids, attention_mask=attention_mask)
        output_2 = self.l2(output_1)
        output = self.l3(output_2)

        # Logits go first and targets second, matching loss_fn's signature.
        return (loss_fn(output, labels.float()), output)

model = RobertaClass()
model.train()

...

model = RobertaClass()
model.load_state_dict(torch.load(path))
model.eval()

My attempt with PyTorchModelHubMixin didn’t work well.

sgugger commented, Sep 3, 2021 (1 reaction)

A model that is not inside the transformers library won’t work with the AutoModel API. To properly use the save/from pretrained methods, why not subclassing PreTrainedModel instead of nn.Module?
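
The subclassing route suggested above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not code from the thread. DualBertConfig, DualBertModel, the "dual-bert" model type, the bert_1_kwargs / bert_2_kwargs / num_classes fields, and the tiny hyperparameters are all hypothetical names and values chosen for illustration. It also assumes both encoders are BERT-style and share one vocabulary, so the same input_ids can be fed to both. The sub-encoder settings are stored as plain dicts inside one PretrainedConfig, which lets save_pretrained and from_pretrained rebuild the whole model without downloading anything.

```python
import tempfile

import torch
from torch import nn
from transformers import BertConfig, BertModel, PretrainedConfig, PreTrainedModel


class DualBertConfig(PretrainedConfig):
    # Hypothetical model type; it is not a model that ships with transformers.
    model_type = "dual-bert"

    def __init__(self, bert_1_kwargs=None, bert_2_kwargs=None, num_classes=2, **kwargs):
        # Every argument has a default so the config can be rebuilt from its JSON file.
        self.bert_1_kwargs = bert_1_kwargs or {}
        self.bert_2_kwargs = bert_2_kwargs or {}
        self.num_classes = num_classes
        super().__init__(**kwargs)


class DualBertModel(PreTrainedModel):
    config_class = DualBertConfig

    def __init__(self, config):
        super().__init__(config)
        # Build both encoders from the stored hyperparameters (no download here).
        self.bert_layer_1 = BertModel(BertConfig(**config.bert_1_kwargs))
        self.bert_layer_2 = BertModel(BertConfig(**config.bert_2_kwargs))
        hidden = self.bert_layer_1.config.hidden_size + self.bert_layer_2.config.hidden_size
        self.classifier = nn.Linear(hidden, config.num_classes)
        self.post_init()

    def forward(self, input_ids, attention_mask=None):
        out_1 = self.bert_layer_1(input_ids=input_ids, attention_mask=attention_mask)
        out_2 = self.bert_layer_2(input_ids=input_ids, attention_mask=attention_mask)
        # Concatenate the two pooled representations and classify.
        pooled = torch.cat([out_1.pooler_output, out_2.pooler_output], dim=-1)
        return self.classifier(pooled)


# Tiny, made-up hyperparameters so the sketch runs quickly on CPU.
TINY = dict(
    vocab_size=50,
    hidden_size=16,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=32,
    max_position_embeddings=32,
)

config = DualBertConfig(bert_1_kwargs=TINY, bert_2_kwargs=TINY, num_classes=3)
model = DualBertModel(config).eval()

# Round trip: one directory holds the config and the weights of both encoders.
with tempfile.TemporaryDirectory() as save_dir:
    model.save_pretrained(save_dir)
    reloaded = DualBertModel.from_pretrained(save_dir).eval()

input_ids = torch.randint(0, TINY["vocab_size"], (2, 5))
with torch.no_grad():
    logits = model(input_ids)
    reloaded_logits = reloaded(input_ids)

print(logits.shape)  # torch.Size([2, 3])
```

With a class like this, reloading in production becomes a single DualBertModel.from_pretrained(path) call. To go through AutoModel.from_pretrained instead, recent transformers releases let you register the pair with AutoConfig.register("dual-bert", DualBertConfig) and AutoModel.register(DualBertConfig, DualBertModel); that may not be available in older versions such as the 4.9.2 reported in this issue. The pretrained weights of each sub-encoder would still need to be loaded into the model once before the first training run, provided the stored kwargs match the checkpoint's config.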
