Device Auto Reallocation not working as expected
Description
I have many classification models in TensorFlow SavedModel format that run perfectly on the GPU version of tensorflow/serving. The label lookup defined in the graph is CPU-only, and the Triton server fails to reallocate these ops from GPU to CPU even with --strict-model-config=true --backend-config=tensorflow,allow-soft-placement=true specified in this specific version.
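(For reference, here is a minimal stand-alone TensorFlow 1.x sketch, not the actual model; the keys, labels, and shapes are made up. It shows the situation the colocation warning below describes: lookup-table ops such as HashTableV2 and LookupTableFindV2 only have CPU kernels, and plain TensorFlow will move them off a requested /gpu:0 when allow_soft_placement is set in the session config, which is the behavior the backend-config flag is expected to enable.)

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Request the lookup table on GPU even though its kernels are CPU-only,
# mirroring the index_to_string table in the failing graph.
with tf.device("/gpu:0"):
    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            keys=tf.constant([0, 1], dtype=tf.int64),
            values=tf.constant(["O", "B-PER"])),
        default_value="UNK")
    labels = table.lookup(tf.constant([1], dtype=tf.int64))

# allow_soft_placement lets TF fall back to CPU instead of failing placement.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.tables_initializer())
    print(sess.run(labels))  # [b'B-PER']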
Here are the logs from triton-model-server. Please refer to #3344 as well for test results from other users.
Triton Information
What version of Triton are you using?
nvcr.io/nvidia/tritonserver:21.08-py3
Are you using the Triton container or did you build it yourself?
Container
To Reproduce
server options
--strict-model-config=true --backend-config=tensorflow,allow-soft-placement=true
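A full launch along these lines reproduces the setup (the host model-repository path and published ports are assumptions; the image tag and server options are the ones above):

docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
  -v /path/to/model-repo:/home/model-repo \
  nvcr.io/nvidia/tritonserver:21.08-py3 \
  tritonserver --model-repository=/home/model-repo \
    --strict-model-config=true \
    --backend-config=tensorflow,allow-soft-placement=true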
config.pbtxt
platform: "tensorflow_savedmodel"
backend: "tensorflow"
max_batch_size: 2
input: [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ 512 ]
    allow_ragged_batch: false
  },
  {
    name: "input_mask"
    data_type: TYPE_INT64
    dims: [ 512 ]
    reshape: { shape: [ ] }
  },
  {
    name: "segment_ids"
    data_type: TYPE_INT64
    dims: [ 512 ]
    reshape: { shape: [ ] }
  }
]
output [
  {
    name: "cls_embedding"
    data_type: TYPE_FP32
    dims: [ 768 ]
  }
]
batch_input []
batch_output []
instance_group: [
  {
    kind: KIND_MODEL
  }
]
Logs
2021-09-11 10:11:47.932340: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
LookupTableFindV2: CPU
LookupTableImportV2: CPU
HashTableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
index_to_string (HashTableV2) /gpu:0
index_to_string/table_init (LookupTableImportV2) /gpu:0
index_to_string_Lookup (LookupTableFindV2) /gpu:0
2021-09-11 10:11:47.933076: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-09-11 10:11:47.975969: I tensorflow/cc/saved_model/loader.cc:200] Running initialization op on SavedModel bundle at path: /home/model-repo/ner/1/model.savedmodel
2021-09-11 10:11:47.993625: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 4368833 microseconds.
2021-09-11 10:11:47.993684: W triton/tensorflow_backend_tf.cc:986] unable to find serving signature 'serving_default
2021-09-11 10:11:47.993693: W triton/tensorflow_backend_tf.cc:988] using signature 'predict'
I0911 02:11:47.993920 1 model_repository_manager.cc:1212] successfully loaded 'ner' version 1
I0911 02:11:47.994044 1 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0911 02:11:47.994199 1 server.cc:543]
+-------------+-----------------------------------------------------------------+---------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------+
| tensorrt | <built-in> | {} |
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {"cmdline":{"allow-soft-placement":"true"}} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
| openvino | /opt/tritonserver/backends/openvino/libtriton_openvino.so | {} |
+-------------+-----------------------------------------------------------------+---------------------------------------------+
I0911 02:11:47.994222 1 server.cc:586]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
| ner | 1 | READY |
+-------+---------+--------+
I0911 02:11:47.994306 1 tritonserver.cc:1718]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.13.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /home/model-repo |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0911 02:11:47.994314 1 server.cc:234] Waiting for in-flight requests to complete.
I0911 02:11:47.994318 1 model_repository_manager.cc:1078] unloading: ner:1
I0911 02:11:47.994357 1 server.cc:249] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0911 02:11:47.994527 1 tensorflow.cc:2356] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0911 02:11:47.994569 1 tensorflow.cc:2295] TRITONBACKEND_ModelFinalize: delete model state
I0911 02:11:48.000163 1 model_repository_manager.cc:1195] successfully unloaded 'ner' version 1
W0911 02:11:48.939825 1 metrics.cc:395] Unable to get power limit for GPU 0: Success
W0911 02:11:48.939849 1 metrics.cc:410] Unable to get power usage for GPU 0: Success
I0911 02:11:48.994473 1 server.cc:249] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Expected behavior
The model with label lookup ops defined should have its CPU-only ops reallocated automatically, as test results from other users (#3344) suggest that this feature worked in previous versions.
Top GitHub Comments
Hello, I used the following script for running the model for the test I mentioned above:
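(The script itself is not included in this excerpt. As a hypothetical stand-in, a minimal client along these lines, using the tritonclient HTTP API with the tensor names and shapes from the config.pbtxt above, would exercise the model; the URL, batch size, and dummy data are assumptions.)

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

seq_len = 512  # matches dims: [ 512 ] in config.pbtxt
batch = 1

# Build the three INT64 inputs declared in the model config with dummy data.
inputs = []
for name in ("input_ids", "input_mask", "segment_ids"):
    inp = httpclient.InferInput(name, [batch, seq_len], "INT64")
    inp.set_data_from_numpy(np.zeros((batch, seq_len), dtype=np.int64))
    inputs.append(inp)

result = client.infer(
    model_name="ner",
    inputs=inputs,
    outputs=[httpclient.InferRequestedOutput("cls_embedding")])

print(result.as_numpy("cls_embedding").shape)  # expected (1, 768)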
I just looked through my config.pbtxt file and found something wrong with my configuration. I fixed it as below, changed STRICT_MODEL_CONFIG to true, and re-spawned the server; the model was finally in the READY state, with the following warning message. It looks like these messages are for warning purposes only and are not the reason the model loading process failed. Here's how I corrected my config.pbtxt file.

There is a TF_SIGNATURE_DEF parameter that you can set to select the signature def to use: https://github.com/triton-inference-server/tensorflow_backend#parameters In any case, Triton should have consistent behavior when that parameter is not specified; we shouldn't intermittently use one sig_def or another.
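For reference, the signature def can be pinned in config.pbtxt with a parameters block along these lines (a sketch following the tensorflow_backend README linked above; the signature name "predict" is the one the log fell back to):

parameters: {
  key: "TF_SIGNATURE_DEF"
  value: {
    string_value: "predict"
  }
}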