How to use multiple PreTrainedModel models in a custom model?
I am using the `Trainer` to train a custom model, like this:

```python
import torch.nn as nn
import transformers

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # I want the code to be clean, so I load the pretrained models like this
        self.bert_layer_1 = transformers.AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext")
        self.bert_layer_2 = transformers.AutoModel.from_pretrained("bert-base-chinese")
        self.other_layers = ...  # not important

    def forward(self):
        pass  # not important
```
When running `trainer.save_model()`, it only saves the model's state dict, because the custom model is not a `PreTrainedModel` (as the terminal output below shows):

> Trainer.model is not a `PreTrainedModel`, only saving its state dict.

And when reloading the saved model in production, I need to initialize a new `MyModel` and load its state dict, which is not very convenient. I would like to load this model with `transformers.AutoModel.from_pretrained('MODEL_PATH')`, like other `PreTrainedModel`s.
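The manual reload path described above can be sketched with a toy stand-in for `MyModel` (a plain `nn.Module`; the mechanics are the same for the real model, only the architecture and checkpoint path differ):

```python
import torch
import torch.nn as nn

# Toy stand-in for MyModel; the save/reload mechanics are identical.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 2)

    def forward(self, x):
        return self.layer(x)

model = ToyModel()
# For a plain nn.Module, Trainer.save_model() effectively just does this:
torch.save(model.state_dict(), "pytorch_model.bin")

# On the production side, the class must be re-instantiated by hand
# before the weights can be loaded back in:
reloaded = ToyModel()
reloaded.load_state_dict(torch.load("pytorch_model.bin"))
reloaded.eval()
```

This two-step dance (re-create the architecture, then load the state dict) is exactly the inconvenience described above, compared with a single `from_pretrained` call.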
I tried changing `class MyModel(nn.Module)` to `class MyModel(PreTrainedModel)`, but `PreTrainedModel` needs a `PretrainedConfig` at initialization. I don't have one in the current implementation, and I don't know how to manage the config when using multiple `PreTrainedModel` models. I want to keep `self.bert_layer_1` and `self.bert_layer_2` as simple `from_pretrained` calls, not `= BertModel(config)`.

Is there a way to do that?
Environment info
- `transformers` version: 4.9.2
- Platform: macOS / Ubuntu
- Python version: 3.8.6
- PyTorch version (GPU?): 1.8.1 (False) / (yes)
- Tensorflow version (GPU?): 2.4.1 (False) / (yes)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: parallel
Issue Analytics
- State:
- Created: 2 years ago
- Comments: 6 (3 by maintainers)
@sgugger Could you give an example of how to subclass `PreTrainedModel`? I would also like to integrate my model at https://huggingface.co/maxpe/twitter-roberta-base_semeval18_emodetection better with the `transformers` library. My attempt with `PyTorchModelHubMixin` didn't work well.

A model that is not inside the `transformers` library won't work with the AutoModel API. To properly use the save/from_pretrained methods, why not subclass `PreTrainedModel` instead of `nn.Module`?