Internal: failed to connect to all addresses
Description
When launching Triton on a Jetson TX2 with the Python backend, this error sometimes occurs:
Error: E0114 model_repository_manager.cc:986] failed to load 'greedy' version 1: Internal: failed to connect to all addresses
However, if I relaunch Triton one or more times, it eventually loads the models successfully. This is annoying because the failure is intermittent: sometimes it happens and other times it does not.
+----------------------+-----------+------------------------------------------------------------------------------+
| Model                | Version   | Status                                                                       |
+----------------------+-----------+------------------------------------------------------------------------------+
| SpanishQ10x5         | 1         | READY                                                                        |
| greedy               | 1         | UNAVAILABLE: Internal: failed to connect to all addresses                    |
| preprocess           | 1         | READY                                                                        |
+----------------------+-----------+------------------------------------------------------------------------------+
Even when all models load successfully, there are warnings and errors like these (although inference from a client works perfectly):
I0114 autofill.cc:138] TensorFlow SavedModel autofill: Internal: unable to autofill for 'greedy', unable to find savedmodel directory named 'model.savedmodel'
I0114 autofill.cc:151] TensorFlow GraphDef autofill: Internal: unable to autofill for 'greedy', unable to find graphdef file named 'model.graphdef'
E0114 logging.cc:43] coreReadArchive.cpp (31) - Serialization Error in verifyHeader: 0 (Magic tag does not match)
E0114 logging.cc:43] INVALID_STATE: std::exception
E0114 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
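These autofill and TensorRT messages appear to be a side effect of running with --strict-model-config=false: Triton probes the 'greedy' model directory for SavedModel, GraphDef and TensorRT plan files, finds none for a Python backend model, and the deserialize error likely comes from it attempting to read a non-plan file (such as model.py) as a CUDA engine, so they are probably harmless. For context, the config.pbtxt for 'greedy' is not posted in the issue; a hypothetical sketch consistent with the model.py shown below might look like this (the tensor names follow model.py, while the data types, dims and instance settings are assumptions):

# Hypothetical config.pbtxt for 'greedy' (not from the issue); dims and types are assumptions.
name: "greedy"
backend: "python"
input [
  {
    name: "LOGITS"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]   # assumed [batch, time, vocab]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]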
Triton Information
2.5.0-jetpack4.4-1795341
Are you using the Triton container or did you build it yourself? I am using my own container based on: nvcr.io/nvidia/l4t-ml:r32.4.4-py3
To Reproduce
model.py:
import numpy as np
import json
from torch import nn
import torch

import triton_python_backend_utils as pb_utils


class GreedyCTCDecoder(nn.Module):
    """Greedy CTC Decoder"""

    def __init__(self, **kwargs):
        nn.Module.__init__(self)  # For PyTorch API

    @torch.no_grad()
    def forward(self, log_probs):
        argmx = log_probs.argmax(dim=-1, keepdim=False).int()
        return argmx


class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    def initialize(self, args):
        """`initialize` is called only once when the model is being loaded.
        Implementing `initialize` function is optional. This function allows
        the model to initialize any state associated with this model.

        Parameters
        ----------
        args : dict
          Both keys and values are strings. The dictionary keys and values are:
          * model_config: A JSON string containing the model configuration
          * model_instance_kind: A string containing model instance kind
          * model_instance_device_id: A string containing model instance device ID
          * model_repository: Model repository path
          * model_version: Model version
          * model_name: Model name
        """
        # You must parse model_config. The JSON string is not parsed here.
        self.model_config = model_config = json.loads(args['model_config'])

        # Get OUTPUT0 configuration
        output0_config = pb_utils.get_output_config_by_name(model_config, "OUTPUT0")

        # Convert Triton types to numpy types
        self.output0_dtype = pb_utils.triton_string_to_numpy(output0_config['data_type'])

        # HARDCODED CONFIGURATION
        self.labels = [" ", "a", "á", "b", "c", "ç", "d", "e", "é", "f", "g", "h", "i", "í", "j", "k", "l",
                       "m", "n", "ñ", "o", "ó", "p", "q", "r", "s", "t", "u", "ú", "ü", "v", "w", "x", "y",
                       "z", "'", "<BLANK>"]

    def execute(self, requests):
        """`execute` must be implemented in every Python model. The `execute`
        function receives a list of pb_utils.InferenceRequest as its only
        argument. This function is called when an inference is requested
        for this model. Depending on the batching configuration (e.g. Dynamic
        Batching) used, `requests` may contain multiple requests. Every
        Python model must create one pb_utils.InferenceResponse for every
        pb_utils.InferenceRequest in `requests`. If there is an error, you can
        set the error argument when creating a pb_utils.InferenceResponse.

        Parameters
        ----------
        requests : list
          A list of pb_utils.InferenceRequest

        Returns
        -------
        list
          A list of pb_utils.InferenceResponse. The length of this list must
          be the same as `requests`.
        """

        def __ctc_decoder_predictions_tensor(tensor, labels):
            """
            Takes the output of the greedy CTC decoder and performs the CTC decoding
            algorithm to remove duplicates and the special (blank) symbol. Returns the predictions.

            Args:
                tensor: model output tensor
                labels: a list of labels
            Returns:
                prediction
            """
            blank_id = len(labels) - 1
            hypotheses = []
            labels_map = dict([(i, labels[i]) for i in range(len(labels))])
            prediction_cpu_tensor = tensor.long().cpu()
            # Iterate over the batch
            for ind in range(prediction_cpu_tensor.shape[0]):
                prediction = prediction_cpu_tensor[ind].numpy().tolist()
                # CTC decoding procedure
                decoded_prediction = []
                previous = len(labels) - 1  # id of the blank symbol
                for p in prediction:
                    if (p != previous or previous == blank_id) and p != blank_id:
                        decoded_prediction.append(p)
                    previous = p
                hypothesis = ''.join([labels_map[c] for c in decoded_prediction])
                hypotheses.append(hypothesis)
            return hypotheses

        output0_dtype = self.output0_dtype
        responses = []

        # Every Python backend must iterate over every one of the requests
        # and create a pb_utils.InferenceResponse for each of them.
        for request in requests:
            # Get INPUT0
            self.in_0 = pb_utils.get_input_tensor_by_name(request, "LOGITS")

            # GREEDY DECODING
            greedy_decoder = GreedyCTCDecoder()
            t_predictions_e = greedy_decoder(torch.from_numpy(self.in_0.as_numpy()))
            hypotheses = __ctc_decoder_predictions_tensor(t_predictions_e, labels=self.labels)

            # Create output tensors. You need pb_utils.Tensor objects to create pb_utils.InferenceResponse.
            out_0 = np.asarray(hypotheses)
            # out_tensor_0 = pb_utils.Tensor("OUTPUT0", out_0.astype(output0_dtype))
            out_tensor_0 = pb_utils.Tensor("OUTPUT0", np.char.encode(out_0, 'utf-8'))  # Encode to bytes
            inference_response = pb_utils.InferenceResponse(output_tensors=[out_tensor_0])
            responses.append(inference_response)

        return responses

    def finalize(self):
        """`finalize` is called only once when the model is being unloaded.
        Implementing `finalize` function is optional. This function allows
        the model to perform any necessary clean ups before exit.
        """
        print('Cleaning up...')
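Note that the "failed to connect to all addresses" error is raised while the Python backend stub is starting, before any request reaches execute, so the model code itself is probably not at fault. Still, to rule out the PyTorch installation (as the maintainer suggests below), the decoder can be exercised outside Triton. A minimal sketch, with the decoder class copied inline because triton_python_backend_utils is only importable inside the backend; the vocabulary size and tensor shapes are illustrative assumptions:

# decoder_check.py -- standalone sanity check, run outside Triton.
# The class is copied from model.py; the shapes and vocab size (37) are assumptions.
import torch
from torch import nn

class GreedyCTCDecoder(nn.Module):
    @torch.no_grad()
    def forward(self, log_probs):
        # Argmax over the vocabulary dimension, same as in model.py
        return log_probs.argmax(dim=-1, keepdim=False).int()

log_probs = torch.randn(2, 50, 37)       # assumed [batch, time, vocab] logits
ids = GreedyCTCDecoder()(log_probs)
print(ids.shape, ids.dtype)              # expected: torch.Size([2, 50]) torch.int32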
config.pbtxt:
name: "preprocess"
backend: "python"
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 1, -1 ]
  }
]
output [
  {
    name: "FEATURES"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
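Since the issue notes that inference from a client works once the models load, here for completeness is a minimal client sketch against the preprocess model above. It is hypothetical: the gRPC port (8001), input shape, and random values are assumptions; only the tensor names and types come from the config.

# Hypothetical gRPC client for the 'preprocess' model; port, shape and data are assumptions.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

audio = np.random.randn(1, 16000).astype(np.float32)            # dummy [1, -1] FP32 input
inp = grpcclient.InferInput("INPUT0", list(audio.shape), "FP32")
inp.set_data_from_numpy(audio)

out = grpcclient.InferRequestedOutput("FEATURES")
result = client.infer(model_name="preprocess", inputs=[inp], outputs=[out])
print(result.as_numpy("FEATURES").shape)                        # [-1, -1, -1] features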
I launch Triton using: ./bin/tritonserver --log-verbose=100 --strict-model-config=false --model-store=/home/igonzalez/models --backend-directory=backends --backend-config=python,grpc-timeout-milliseconds=5000
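Given the maintainer's suggestion below that the board may be overloaded and the Python stub's gRPC server fails to come up in time, one untested option is simply to raise the timeout that is already being passed, e.g.:

./bin/tritonserver --log-verbose=100 --strict-model-config=false \
    --model-store=/home/igonzalez/models --backend-directory=backends \
    --backend-config=python,grpc-timeout-milliseconds=20000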
Expected behavior
Triton Inference Server should always launch successfully with the Python backend:
+--------------------+---------+--------+
| Model | Version | Status |
+--------------------+---------+--------+
| SpanishQ10x5 | 1 | READY |
| greedy | 1 | READY |
| preprocess | 1 | READY |
+--------------------+---------+--------+
And no other warnings or errors.
Top GitHub Comments
@ivangtorre I tried the Python backend on Jetson and it seems to be working fine. Can you please provide the versions of the packages that you are using?
I'm using the versions of the packages posted above. Also, you might need to make sure that your PyTorch installation on Jetson is correct. I suspect that the reason you are seeing this error is that the Jetson is overloaded and the gRPC server fails to start.
@ivangtorre I see. I was not building the Python backend inside Docker. That would be great! You can consider submitting your PR to the https://github.com/triton-inference-server/contrib repo.