Exporting a fine-tuned T5ForConditionalGeneration model to TF-Serving using ONNX


Environment info

transformers version: 4.9.1
Platform: Linux-5.4.0-1049-gcp-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.6.9
PyTorch version (GPU?): 1.9.0 (False)
Tensorflow version (GPU?): 2.5.0 (False)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: No
Using distributed or parallel set-up in script?: No

Who can help

@patrickvonplaten, @patil-suraj

Information

Model I am using (Bert, XLNet …): T5

The problem arises when using:

my own modified scripts: I use a fine-tuned version of T5 (fine-tuned using Hugging Face and PyTorch), trained on a custom dataset for summarization. Since PyTorch Serving is no longer an option for unrelated reasons, I need TF-Serving for a production-optimized setting. I’m using the ONNX pipeline detailed here: https://huggingface.co/transformers/serialization.html#converting-an-onnx-model-using-the-transformers-onnx-package, with the necessary changes to the paths.

When I serve this model and run inference, it seems the model being loaded isn’t the fine-tuned one, as it gives output of the following nature: In In In In In auf auf auf auf auf auf auf auf ... (the token "auf" keeps repeating until the end of the output)

The task I am working on is:

an official GLUE/SQUaD task: Summarization

To reproduce

Steps to reproduce the behavior:

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer


def load_ckp(checkpoint_fpath, model, optimizer):
    # Restore model and optimizer state from a training checkpoint (loaded onto CPU)
    checkpoint = torch.load(checkpoint_fpath, map_location=torch.device('cpu'))
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    return model, optimizer, checkpoint['epoch']


tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
optimizer = torch.optim.Adam(params=model.parameters(), lr=1e-4)

ckp_path = '/checkpoint_dir/checkpoint.pt'
model, optimizer, start_epoch = load_ckp(ckp_path, model, optimizer)

# Save pytorch_model.bin and config.json, which the ONNX export needs
model.save_pretrained('fine-tuned')
tokenizer.save_pretrained('fine-tuned')

# Reload from the saved directory
tokenizer = T5Tokenizer.from_pretrained("fine-tuned")
model = T5ForConditionalGeneration.from_pretrained("fine-tuned")

==== COMMAND LINE ====

python -m transformers.onnx --model=fine-tuned onnx/t5-tf-serving/

At this point, I get the following warning: Some weights of the model checkpoint at fine-tuned were not used when initializing T5Model: ['lm_head.weight']. The process still completes with the following message: All good, model saved at: onnx/t5-tf-serving/model.onnx.
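
To make the warning concrete, the sketch below compares parameter names by hand. It is only an illustration under stated assumptions: the key lists are a hand-written subset of T5 parameter names rather than values read from a real checkpoint, and `unused_checkpoint_keys` is a hypothetical helper, not part of transformers. With real models, the same comparison could be run on the `state_dict().keys()` of the saved checkpoint and of the class the exporter instantiates.

```python
def unused_checkpoint_keys(checkpoint_keys, model_keys):
    """Return checkpoint parameter names the target model has no slot for."""
    model_keys = set(model_keys)
    return sorted(k for k in checkpoint_keys if k not in model_keys)


# Illustrative subset of parameter names (assumption: not read from a real file).
finetuned_keys = [
    "shared.weight",
    "encoder.block.0.layer.0.SelfAttention.q.weight",
    "decoder.block.0.layer.0.SelfAttention.q.weight",
    "lm_head.weight",  # present only on the model with a language-modeling head
]
# A headless encoder-decoder (what the warning calls T5Model) lacks lm_head.
headless_keys = [k for k in finetuned_keys if not k.startswith("lm_head.")]

dropped = unused_checkpoint_keys(finetuned_keys, headless_keys)
print(dropped)
```

If `lm_head.weight` is the only dropped key, the exported graph would end at the decoder hidden states and could not produce vocabulary logits on its own, which may be related to the outputs I am seeing.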

After this, I run onnx-tf convert -i onnx/t5-tf-serving/model.onnx -o output.pb to get the corresponding TensorFlow SavedModel, and use the standard Docker-based procedure to deploy it with TF-Serving.
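
For reference, a request to the served model could be assembled roughly as in the sketch below. It assumes TF-Serving's REST predict endpoint and its columnar `inputs` request format. The model name `t5`, the host and port, and the input tensor names (`input_ids`, `attention_mask`, `decoder_input_ids`, `decoder_attention_mask`) are assumptions and should be checked against the actual SavedModel signature, for example with `saved_model_cli show`. The token IDs are placeholders, not real tokenizer output, and `build_predict_request` is a hypothetical helper.

```python
import json


def build_predict_request(input_ids, attention_mask, decoder_input_ids,
                          decoder_attention_mask, model_name="t5",
                          host="localhost", port=8501):
    """Build the URL and JSON body for a TF-Serving REST predict call."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = {
        # Columnar format: one named tensor per model input.
        "inputs": {
            "input_ids": input_ids,
            "attention_mask": attention_mask,
            "decoder_input_ids": decoder_input_ids,
            "decoder_attention_mask": decoder_attention_mask,
        }
    }
    return url, json.dumps(body)


# Placeholder IDs for a batch of one sequence (assumption, not tokenizer output).
url, payload = build_predict_request(
    input_ids=[[100, 200, 1]],
    attention_mask=[[1, 1, 1]],
    decoder_input_ids=[[0]],  # T5 starts decoding from the pad token, ID 0
    decoder_attention_mask=[[1]],
)
print(url)
```

The returned `url` and `payload` can then be sent with any HTTP client as a POST request with a JSON content type.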

Expected behavior

I’m able to serve the model using the proper request formats, but the outputs are way off, as shown above. I’m guessing this has to do with the warning displayed when converting the PyTorch model to ONNX. For what it’s worth, I tested normal inference on the .bin-format PyTorch model saved by model.save_pretrained('fine-tuned'), and it generated the expected outputs.
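
One more difference worth ruling out: an exported graph performs a single forward pass, whereas `generate()` in PyTorch runs an autoregressive loop. The sketch below shows a minimal greedy loop under stated assumptions. `next_token_logits` is a hypothetical callable standing in for one request to the served model plus whatever projection to vocabulary logits is needed. The start token ID 0 and end-of-sequence ID 1 match T5's defaults. The toy scorer exists only to make the example runnable and says nothing about the real model.

```python
def greedy_decode(next_token_logits, start_id=0, eos_id=1, max_len=20):
    """Greedy autoregressive decoding around a single-step scoring function.

    next_token_logits(decoder_ids) must return one score per vocabulary entry
    for the token that follows decoder_ids. It is a hypothetical stand-in for
    a call to the served model.
    """
    decoder_ids = [start_id]
    for _ in range(max_len):
        logits = next_token_logits(decoder_ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        decoder_ids.append(next_id)
        if next_id == eos_id:
            break
    return decoder_ids[1:]  # drop the start token


def toy_scorer(decoder_ids):
    """Toy model over a 5-token vocabulary: emits 3, then 4, then EOS (1)."""
    script = [3, 4, 1]
    step = len(decoder_ids) - 1
    target = script[min(step, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(5)]


result = greedy_decode(toy_scorer)
print(result)
```

A loop like this makes one model call per generated token, so each step would resend the growing list of decoder IDs to the served model.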

Can you please suggest workarounds? @patrickvonplaten, @patil-suraj