
SavedModel load sees different input tensor shape than exists in the model

See original GitHub issue

An attempt to load a SavedModel-format ResNet50 model (converted from a Keras HDF5 file) fails: TRTIS v1.8.0, running from the NGC-provided Docker image, sees a different shape for the input tensor than the one specified in config.pbtxt, which matches both the saved_model_cli probe of the model and the original Keras model information.

The saved_model_cli command shows the model’s input tensor shape as [-1, -1, -1, 3] as you can see here:

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['resnet50_input:0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, 3)
        name: resnet50_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense_2/Softmax:0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 5)
        name: dense_2/Softmax:0
  Method name is: tensorflow/serving/predict

I set up the config.pbtxt accordingly:

input [
  {
    name: "resnet50_input:0"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1, 3 ]
  }
]

However, when TRTIS attempts to load this model, it gets the wrong shape for the input tensor, namely [ -1, 3 ], and sees this as a conflict with what’s specified in the config.pbtxt:

model_repository_manager.cc:810] failed to load 'TR46_ResNet50_FFS_300ep0.6train_5classes_noshuffle' version 1: Invalid argument: unable to load model 'TR46_ResNet50_FFS_300ep0.6train_5classes_noshuffle', tensor 'resnet50_input:0' shape expected by framework [-1,3] doesn't match model configuration shape [-1,-1,-1,3]

Curiously, when I set the config.pbtxt to use the input dimensions that the above error shows, I get the same kind of error:

model_repository_manager.cc:810] failed to load 'TR46_ResNet50_FFS_300ep0.6train_5classes_noshuffle' version 1: Invalid argument: unable to load model 'TR46_ResNet50_FFS_300ep0.6train_5classes_noshuffle', tensor 'resnet50_input:0' shape expected by framework [-1,3] doesn't match model configuration shape [-1,3]

The logical inconsistency in this error message is what prompted me to open an issue.

When I move the config.pbtxt file out of the way and run with --strict-model-config=false, TRTIS is able to generate a config and load the model successfully, and it shows the proper dimensions in the api/status call:

      max_batch_size: 1
      input {
        name: "resnet50_input:0"
        data_type: TYPE_FP32
        dims: -1
        dims: -1
        dims: 3
      }

However, as this passage from the documentation explains:

Generated Model Configuration Some TensorFlow SavedModel models do not require a model configuration file. The models must specify all inputs and outputs as fixed-size tensors (with an optional initial batch dimension) for the model configuration to be generated automatically.

Predictably, an attempt to use this model comes back with the error "variable-size dimension in model input not supported", so the model can't be used.

I’d appreciate any suggestions, and would be happy to provide whatever information I’m able.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

mvpel commented on Jan 8, 2020 (1 reaction)

The issue we run into when we specify the dimensions of the input tensor without changing the model is this:

509191 model_repository_manager.cc:832] failed to load 'TR46_ResNet50_FFS_300ep0.6train_5classes_noshuffle' version 1: Invalid argument: unable to load model 'TR46_ResNet50_FFS_300ep0.6train_5classes_noshuffle', tensor 'resnet50_input:0' shape expected by framework [-1,3] doesn't match model configuration shape [-1,494,648,3]

So it would appear this is related to #977; perhaps the problem is that I'm specifying a batch dimension in the config.pbtxt when I shouldn't? Yep, that appears to be it:

510471 model_repository_manager.cc:832] failed to load 'TR46_ResNet50_FFS_300ep0.6train_5classes_noshuffle' version 1: Invalid argument: unable to load model 'TR46_ResNet50_FFS_300ep0.6train_5classes_noshuffle', tensor 'dense_2/Softmax:0' shape expected by framework [5] doesn't match model configuration shape [-1,5]

And fixing the config.pbtxt to omit the batch size dimension…

511047 model_repository_manager.cc:829] successfully loaded 'TR46_ResNet50_FFS_300ep0.6train_5classes_noshuffle' version 1
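Putting it together, a config.pbtxt along these lines loads cleanly once the batch dimension is left to max_batch_size. This is a sketch reconstructed from the shapes in the logs above, not my exact file; the platform line is the standard one for the SavedModel backend:

name: "TR46_ResNet50_FFS_300ep0.6train_5classes_noshuffle"
platform: "tensorflow_savedmodel"
max_batch_size: 1
input [
  {
    name: "resnet50_input:0"
    data_type: TYPE_FP32
    dims: [ -1, -1, 3 ]
  }
]
output [
  {
    name: "dense_2/Softmax:0"
    data_type: TYPE_FP32
    dims: [ 5 ]
  }
]

Note that both the input and the output omit the leading -1 batch dimension; with max_batch_size set, TRTIS supplies that dimension itself.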

Perhaps some additional documentation tweaks in TRTIS could be made to help those of us who are comparatively new to this world. Since the Inference Server entails some firewall and service-maintenance elements, I'm coming at this from an IT-department perspective, and so had a somewhat limited understanding of the nuances of batch sizes and tensor dimensions.

Since saved_model_cli showed "shape: (-1, -1, -1, 3)", I thought that was what should go in the config.pbtxt. Guidance in the TRTIS docs on how to determine the input and output tensor names and dimensions of a model, and clarification that the config.pbtxt can specify fixed dimensions even if the model has dynamic dimensions, would have reduced the slope of my learning curve.

Thanks again!

deadeyegoodwin commented on Jan 7, 2020 (1 reaction)

The error you are getting from image_client.py is because your model allows variable-size images, so image_client doesn't know how to resize your input image. We have an open ticket to add an argument to image_client that lets you explicitly give a size for this case (or perhaps we could just use the image at its native size), but for now this is not supported. So to use image_client you need to change the model config to have an exact size for the input tensor (no need to change the actual model: it is fine for the model to accept variable sizes while the model configuration specifies an exact size).
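As a sketch of that, using the fixed 494x648 size that appears in one of the earlier log lines (the FORMAT_NHWC line is an assumption, since image_client needs a format hint to know how to lay out the image), the input stanza can pin the size even though the model itself is dynamic:

input [
  {
    name: "resnet50_input:0"
    data_type: TYPE_FP32
    format: FORMAT_NHWC
    dims: [ 494, 648, 3 ]
  }
]

With a fixed size in place, image_client can resize incoming images to match before sending the request.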
