Device Auto Reallocation not working as expected
Description
I have many classification models in TensorFlow SavedModel format that run perfectly on the GPU version of tensorflow/serving. The label lookup defined in the graph is CPU-only, and the Triton server fails to reallocate these ops from GPU to CPU even with --strict-model-config=true --backend-config=tensorflow,allow-soft-placement=true specified in this specific version.
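(For reference, here is a minimal stand-alone TensorFlow 1.x sketch, not the actual model; the keys, labels, and shapes are made up. It shows the situation the colocation warning below describes: lookup-table ops such as HashTableV2 and LookupTableFindV2 only have CPU kernels, and plain TensorFlow will move them off a requested /gpu:0 when allow_soft_placement is set in the session config, which is the behavior the backend-config flag is expected to enable.)

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Request the lookup table on GPU even though its kernels are CPU-only,
# mirroring the index_to_string table in the failing graph.
with tf.device("/gpu:0"):
    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            keys=tf.constant([0, 1], dtype=tf.int64),
            values=tf.constant(["O", "B-PER"])),
        default_value="UNK")
    labels = table.lookup(tf.constant([1], dtype=tf.int64))

# allow_soft_placement lets TF fall back to CPU instead of failing placement.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.tables_initializer())
    print(sess.run(labels))  # [b'B-PER']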
Here are the logs from triton-model-server. Please refer to #3344 as well for test results from other users.
Triton Information
What version of Triton are you using?
nvcr.io/nvidia/tritonserver:21.08-py3
Are you using the Triton container or did you build it yourself?
Container
To Reproduce
server options
--strict-model-config=true --backend-config=tensorflow,allow-soft-placement=true
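A full launch along these lines reproduces the setup (the host model-repository path and published ports are assumptions; the image tag and server options are the ones above):

docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
  -v /path/to/model-repo:/home/model-repo \
  nvcr.io/nvidia/tritonserver:21.08-py3 \
  tritonserver --model-repository=/home/model-repo \
    --strict-model-config=true \
    --backend-config=tensorflow,allow-soft-placement=true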
config.pbtxt
platform: "tensorflow_savedmodel"
backend: "tensorflow"
max_batch_size: 2
input: [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ 512 ]
    allow_ragged_batch: false
  },
  {
    name: "input_mask"
    data_type: TYPE_INT64
    dims: [ 512 ]
    reshape: { shape: [ ] }
  },
  {
    name: "segment_ids"
    data_type: TYPE_INT64
    dims: [ 512 ]
    reshape: { shape: [ ] }
  }
]
output [
  {
    name: "cls_embedding"
    data_type: TYPE_FP32
    dims: [ 768 ]
  }
]
batch_input []
batch_output []
instance_group: [
  {
    kind: KIND_MODEL
  }
]
Logs
2021-09-11 10:11:47.932340: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
LookupTableFindV2: CPU
LookupTableImportV2: CPU
HashTableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
index_to_string (HashTableV2) /gpu:0
index_to_string/table_init (LookupTableImportV2) /gpu:0
index_to_string_Lookup (LookupTableFindV2) /gpu:0
2021-09-11 10:11:47.933076: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-09-11 10:11:47.975969: I tensorflow/cc/saved_model/loader.cc:200] Running initialization op on SavedModel bundle at path: /home/model-repo/ner/1/model.savedmodel
2021-09-11 10:11:47.993625: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 4368833 microseconds.
2021-09-11 10:11:47.993684: W triton/tensorflow_backend_tf.cc:986] unable to find serving signature 'serving_default
2021-09-11 10:11:47.993693: W triton/tensorflow_backend_tf.cc:988] using signature 'predict'
I0911 02:11:47.993920 1 model_repository_manager.cc:1212] successfully loaded 'ner' version 1
I0911 02:11:47.994044 1 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0911 02:11:47.994199 1 server.cc:543]
+-------------+-----------------------------------------------------------------+---------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------+
| tensorrt | <built-in> | {} |
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {"cmdline":{"allow-soft-placement":"true"}} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
| openvino | /opt/tritonserver/backends/openvino/libtriton_openvino.so | {} |
+-------------+-----------------------------------------------------------------+---------------------------------------------+
I0911 02:11:47.994222 1 server.cc:586]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
| ner | 1 | READY |
+-------+---------+--------+
I0911 02:11:47.994306 1 tritonserver.cc:1718]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.13.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /home/model-repo |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0911 02:11:47.994314 1 server.cc:234] Waiting for in-flight requests to complete.
I0911 02:11:47.994318 1 model_repository_manager.cc:1078] unloading: ner:1
I0911 02:11:47.994357 1 server.cc:249] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0911 02:11:47.994527 1 tensorflow.cc:2356] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0911 02:11:47.994569 1 tensorflow.cc:2295] TRITONBACKEND_ModelFinalize: delete model state
I0911 02:11:48.000163 1 model_repository_manager.cc:1195] successfully unloaded 'ner' version 1
W0911 02:11:48.939825 1 metrics.cc:395] Unable to get power limit for GPU 0: Success
W0911 02:11:48.939849 1 metrics.cc:410] Unable to get power usage for GPU 0: Success
I0911 02:11:48.994473 1 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Expected behavior
The model with label lookup ops defined should have its CPU-only ops reallocated automatically, as test results from other users (#3344) suggest that this feature worked in previous versions.
Top GitHub Comments
Hello, I used the following script for running the model for the test I mentioned above:
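(The script itself is not included in this excerpt. As a hypothetical stand-in, a minimal client along these lines, using the tritonclient HTTP API with the tensor names and shapes from the config.pbtxt above, would exercise the model; the URL, batch size, and dummy data are assumptions.)

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

seq_len = 512  # matches dims: [ 512 ] in config.pbtxt
batch = 1

# Build the three INT64 inputs declared in the model config with dummy data.
inputs = []
for name in ("input_ids", "input_mask", "segment_ids"):
    inp = httpclient.InferInput(name, [batch, seq_len], "INT64")
    inp.set_data_from_numpy(np.zeros((batch, seq_len), dtype=np.int64))
    inputs.append(inp)

result = client.infer(
    model_name="ner",
    inputs=inputs,
    outputs=[httpclient.InferRequestedOutput("cls_embedding")])

print(result.as_numpy("cls_embedding").shape)  # expected (1, 768)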
I just looked through my config.pbtxt file and found something wrong with my configuration. I fixed it as below, changed STRICT_MODEL_CONFIG to true, and re-spawned the server; the model was finally in the READY state, with the following warning message. It looks like these messages are for warning purposes only and are not the reason the model loading process failed. Here's how I corrected my config.pbtxt file.

There is a TF_SIGNATURE_DEF parameter that you can set to select the signature def to use: https://github.com/triton-inference-server/tensorflow_backend#parameters In any case, Triton should have consistent behavior when that parameter is not specified; we shouldn't intermittently use one sig_def or another.
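For reference, the signature def can be pinned in config.pbtxt with a parameters block along these lines (a sketch following the tensorflow_backend README linked above; the signature name "predict" is the one the log fell back to):

parameters: {
  key: "TF_SIGNATURE_DEF"
  value: {
    string_value: "predict"
  }
}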