Internal: failed to connect to all addresses
Description
When launching Triton on a Jetson TX2 with the Python backend, this error sometimes occurs:
Error: E0114 model_repository_manager.cc:986] failed to load 'greedy' version 1: Internal: failed to connect to all addresses
However, if I relaunch Triton one or more times, it eventually loads the models successfully. This is annoying because the failure is intermittent: sometimes it happens and other times it does not.
+----------------------+-----------+------------------------------------------------------------------------------+
| Model                | Version   | Status                                                                       |
+----------------------+-----------+------------------------------------------------------------------------------+
| SpanishQ10x5         | 1         | READY                                                                        |
| greedy               | 1         | UNAVAILABLE: Internal: failed to connect to all addresses                    |
| preprocess           | 1         | READY                                                                        |
+----------------------+-----------+------------------------------------------------------------------------------+
Even when all models load successfully, there are warnings and errors like these (although inference from a client works perfectly):
I0114 autofill.cc:138] TensorFlow SavedModel autofill: Internal: unable to autofill for 'greedy', unable to find savedmodel directory named 'model.savedmodel'
I0114 autofill.cc:151] TensorFlow GraphDef autofill: Internal: unable to autofill for 'greedy', unable to find graphdef file named 'model.graphdef'
E0114 logging.cc:43] coreReadArchive.cpp (31) - Serialization Error in verifyHeader: 0 (Magic tag does not match)
E0114 logging.cc:43] INVALID_STATE: std::exception
E0114 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
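These autofill and TensorRT messages appear to be a side effect of running with --strict-model-config=false: Triton probes the 'greedy' model directory for SavedModel, GraphDef and TensorRT plan files, finds none for a Python backend model, and the deserialize error likely comes from it attempting to read a non-plan file (such as model.py) as a CUDA engine, so they are probably harmless. For context, the config.pbtxt for 'greedy' is not posted in the issue; a hypothetical sketch consistent with the model.py shown below might look like this (the tensor names follow model.py, while the data types, dims and instance settings are assumptions):

# Hypothetical config.pbtxt for 'greedy' (not from the issue); dims and types are assumptions.
name: "greedy"
backend: "python"
input [
  {
    name: "LOGITS"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]   # assumed [batch, time, vocab]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]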
Triton Information
2.5.0-jetpack4.4-1795341
Are you using the Triton container or did you build it yourself? I am using my own container based on: nvcr.io/nvidia/l4t-ml:r32.4.4-py3
To Reproduce
model.py:
import numpy as np
import json
from torch import nn
import torch

import triton_python_backend_utils as pb_utils


class GreedyCTCDecoder(nn.Module):
    """Greedy CTC Decoder"""

    def __init__(self, **kwargs):
        nn.Module.__init__(self)  # For PyTorch API

    @torch.no_grad()
    def forward(self, log_probs):
        argmx = log_probs.argmax(dim=-1, keepdim=False).int()
        return argmx


class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    def initialize(self, args):
        """`initialize` is called only once when the model is being loaded.
        Implementing `initialize` function is optional. This function allows
        the model to initialize any state associated with this model.

        Parameters
        ----------
        args : dict
          Both keys and values are strings. The dictionary keys and values are:
          * model_config: A JSON string containing the model configuration
          * model_instance_kind: A string containing model instance kind
          * model_instance_device_id: A string containing model instance device ID
          * model_repository: Model repository path
          * model_version: Model version
          * model_name: Model name
        """
        # You must parse model_config. The JSON string is not parsed here.
        self.model_config = model_config = json.loads(args['model_config'])

        # Get OUTPUT0 configuration
        output0_config = pb_utils.get_output_config_by_name(model_config, "OUTPUT0")

        # Convert Triton types to numpy types
        self.output0_dtype = pb_utils.triton_string_to_numpy(output0_config['data_type'])

        # HARDCODED CONFIGURATION
        self.labels = [" ", "a", "á", "b", "c", "ç", "d", "e", "é", "f", "g", "h", "i", "í", "j", "k", "l",
                       "m", "n", "ñ", "o", "ó", "p", "q", "r", "s", "t", "u", "ú", "ü", "v", "w", "x", "y",
                       "z", "'", "<BLANK>"]

    def execute(self, requests):
        """`execute` must be implemented in every Python model. The `execute`
        function receives a list of pb_utils.InferenceRequest as its only
        argument. This function is called when an inference is requested
        for this model. Depending on the batching configuration (e.g. Dynamic
        Batching) used, `requests` may contain multiple requests. Every
        Python model must create one pb_utils.InferenceResponse for every
        pb_utils.InferenceRequest in `requests`. If there is an error, you can
        set the error argument when creating a pb_utils.InferenceResponse.

        Parameters
        ----------
        requests : list
          A list of pb_utils.InferenceRequest

        Returns
        -------
        list
          A list of pb_utils.InferenceResponse. The length of this list must
          be the same as `requests`.
        """

        def __ctc_decoder_predictions_tensor(tensor, labels):
            """
            Takes the output of the greedy CTC decoder and performs the CTC decoding
            algorithm to remove duplicates and the special (blank) symbol. Returns the predictions.

            Args:
                tensor: model output tensor
                labels: a list of labels
            Returns:
                prediction
            """
            blank_id = len(labels) - 1
            hypotheses = []
            labels_map = dict([(i, labels[i]) for i in range(len(labels))])
            prediction_cpu_tensor = tensor.long().cpu()
            # Iterate over the batch
            for ind in range(prediction_cpu_tensor.shape[0]):
                prediction = prediction_cpu_tensor[ind].numpy().tolist()
                # CTC decoding procedure
                decoded_prediction = []
                previous = len(labels) - 1  # id of the blank symbol
                for p in prediction:
                    if (p != previous or previous == blank_id) and p != blank_id:
                        decoded_prediction.append(p)
                    previous = p
                hypothesis = ''.join([labels_map[c] for c in decoded_prediction])
                hypotheses.append(hypothesis)
            return hypotheses

        output0_dtype = self.output0_dtype
        responses = []

        # Every Python backend must iterate over every one of the requests
        # and create a pb_utils.InferenceResponse for each of them.
        for request in requests:
            # Get INPUT0
            self.in_0 = pb_utils.get_input_tensor_by_name(request, "LOGITS")

            # GREEDY DECODING
            greedy_decoder = GreedyCTCDecoder()
            t_predictions_e = greedy_decoder(torch.from_numpy(self.in_0.as_numpy()))
            hypotheses = __ctc_decoder_predictions_tensor(t_predictions_e, labels=self.labels)

            # Create output tensors. You need pb_utils.Tensor objects to create pb_utils.InferenceResponse.
            out_0 = np.asarray(hypotheses)
            # out_tensor_0 = pb_utils.Tensor("OUTPUT0", out_0.astype(output0_dtype))
            out_tensor_0 = pb_utils.Tensor("OUTPUT0", np.char.encode(out_0, 'utf-8'))  # Encode to bytes
            inference_response = pb_utils.InferenceResponse(output_tensors=[out_tensor_0])
            responses.append(inference_response)

        return responses

    def finalize(self):
        """`finalize` is called only once when the model is being unloaded.
        Implementing `finalize` function is optional. This function allows
        the model to perform any necessary clean ups before exit.
        """
        print('Cleaning up...')
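Note that the "failed to connect to all addresses" error is raised while the Python backend stub is starting, before any request reaches execute, so the model code itself is probably not at fault. Still, to rule out the PyTorch installation (as the maintainer suggests below), the decoder can be exercised outside Triton. A minimal sketch, with the decoder class copied inline because triton_python_backend_utils is only importable inside the backend; the vocabulary size and tensor shapes are illustrative assumptions:

# decoder_check.py -- standalone sanity check, run outside Triton.
# The class is copied from model.py; the shapes and vocab size (37) are assumptions.
import torch
from torch import nn

class GreedyCTCDecoder(nn.Module):
    @torch.no_grad()
    def forward(self, log_probs):
        # Argmax over the vocabulary dimension, same as in model.py
        return log_probs.argmax(dim=-1, keepdim=False).int()

log_probs = torch.randn(2, 50, 37)       # assumed [batch, time, vocab] logits
ids = GreedyCTCDecoder()(log_probs)
print(ids.shape, ids.dtype)              # expected: torch.Size([2, 50]) torch.int32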
config.pbtxt:
name: "preprocess"
backend: "python"
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 1, -1 ]
  }
]
output [
  {
    name: "FEATURES"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
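Since the issue notes that inference from a client works once the models load, here for completeness is a minimal client sketch against the preprocess model above. It is hypothetical: the gRPC port (8001), input shape, and random values are assumptions; only the tensor names and types come from the config.

# Hypothetical gRPC client for the 'preprocess' model; port, shape and data are assumptions.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

audio = np.random.randn(1, 16000).astype(np.float32)            # dummy [1, -1] FP32 input
inp = grpcclient.InferInput("INPUT0", list(audio.shape), "FP32")
inp.set_data_from_numpy(audio)

out = grpcclient.InferRequestedOutput("FEATURES")
result = client.infer(model_name="preprocess", inputs=[inp], outputs=[out])
print(result.as_numpy("FEATURES").shape)                        # [-1, -1, -1] features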
I launch Triton using: ./bin/tritonserver --log-verbose=100 --strict-model-config=false --model-store=/home/igonzalez/models --backend-directory=backends --backend-config=python,grpc-timeout-milliseconds=5000
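Given the maintainer's suggestion below that the board may be overloaded and the Python stub's gRPC server fails to come up in time, one untested option is simply to raise the timeout that is already being passed, e.g.:

./bin/tritonserver --log-verbose=100 --strict-model-config=false \
    --model-store=/home/igonzalez/models --backend-directory=backends \
    --backend-config=python,grpc-timeout-milliseconds=20000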
Expected behavior
Triton Inference Server should always launch successfully with the Python backend:
+--------------------+---------+--------+
| Model | Version | Status |
+--------------------+---------+--------+
| SpanishQ10x5 | 1 | READY |
| greedy | 1 | READY |
| preprocess | 1 | READY |
+--------------------+---------+--------+
And no other warnings or errors.
Top GitHub Comments
@ivangtorre I tried the Python backend on Jetson and it seems to be working fine. Can you please provide the versions of the packages that you are using?
I'm using the versions of the packages posted above. Also, you might need to make sure that your PyTorch installation on Jetson is correct. I suspect that the reason you are seeing this error is that the Jetson is overloaded and the gRPC server fails to start.
@ivangtorre I see. I was not building the Python backend inside Docker. That would be great! You can consider submitting your PR to the https://github.com/triton-inference-server/contrib repo.