
Failing to get output tensor on GPU device

See original GitHub issue

Description

Hello,

I’m trying to get an output tensor on the GPU device when making an InferenceRequest from the Python backend.

Triton Information

What version of Triton are you using? Are you using the Triton container or did you build it yourself?

nvcr.io/nvidia/tritonserver:22.06-py3

To Reproduce

Here is a simple reproduction: I just make a request to a simple ONNX graph and print whether the output tensor is on the GPU or the CPU.

import triton_python_backend_utils as pb_utils
import json
import asyncio

class TritonPythonModel:
    def initialize(self, args):
        self.model_config = json.loads(args['model_config'])

    async def execute(self, requests):
        responses = []
        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "input")

            # Issue a BLS request to the composing ONNX model
            inference_response_awaits = []
            infer_request = pb_utils.InferenceRequest(
                model_name="onnx",
                requested_output_names=["output"],
                inputs=[in_0])

            inference_response_awaits.append(infer_request.async_exec())

            inference_responses = await asyncio.gather(
                *inference_response_awaits)

            for infer_response in inference_responses:
                if infer_response.has_error():
                    raise pb_utils.TritonModelException(
                        infer_response.error().message())

            output0_tensor = pb_utils.get_output_tensor_by_name(
                inference_responses[0], "output")

            # Here we print whether the tensor is on CPU or GPU
            print(output0_tensor.is_cpu())

            inference_response = pb_utils.InferenceResponse(
                output_tensors=[output0_tensor])
            responses.append(inference_response)

        return responses

    def finalize(self):
        print('Cleaning up...')
name: "bls_async2"
backend: "python"
max_batch_size: 0

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 1, 3 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1, 3 ]
  }
]

instance_group [
    {
      count: 1
      kind: KIND_GPU
    }
]

parameters: { key: "FORCE_CPU_ONLY_INPUT_TENSORS" value: {string_value:"no"}}

I run the server with

tritonserver --model-repository `pwd`/models --model-control-mode=poll --repository-poll-secs 2 --log-verbose 100
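
Since the server runs with --model-control-mode=poll, the repository is rescanned every 2 seconds, so the model can take a moment to appear. As a side note, a quick way to confirm the server and model are up is a sketch like the following, assuming the default HTTP port 8000 and that the tritonclient package is installed:

import tritonclient.http as httpclient

# Connect to the local Triton server (assumed default HTTP port)
client = httpclient.InferenceServerClient(url="localhost:8000")

# Both should print True once polling has picked up the model
print(client.is_server_ready())
print(client.is_model_ready("bls_async2"))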

In the log I see that it enters here: https://github.com/triton-inference-server/onnxruntime_backend/blob/5568172eab065ae9bf31fe9dc1e2bed9dfc363d9/src/onnxruntime.cc#L1640

It will print True, because is_cpu() is true for the output tensor.

Expected behavior

is_cpu() to be false.
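
For context, once the output tensor does live on the GPU, the usual way to consume it in the Python backend is through DLPack. A minimal sketch, assuming PyTorch is available inside the backend environment (the helper name is hypothetical):

import torch
from torch.utils.dlpack import from_dlpack

# Hypothetical helper: wrap a pb_utils.Tensor as a torch.Tensor without
# forcing a copy to host memory
def to_torch(output_tensor):
    if output_tensor.is_cpu():
        # Host-resident tensors can be read as numpy and converted normally
        return torch.from_numpy(output_tensor.as_numpy())
    # GPU-resident tensors are exchanged zero-copy via the DLPack protocol
    return from_dlpack(output_tensor.to_dlpack())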

I’ve attached a zip containing the example models; you just need to run python3 make_request.py to run an inference -> issue.zip
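
The actual make_request.py ships only inside the zip; a minimal equivalent using the Triton HTTP client might look like the sketch below (model name, input/output names, shape, and dtype come from the config above; the random input data is just a placeholder):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a [1, 3] FP32 input matching the config.pbtxt above
inp = httpclient.InferInput("input", [1, 3], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 3).astype(np.float32))

# The BLS model forwards the request to the "onnx" model internally
result = client.infer(model_name="bls_async2", inputs=[inp])
print(result.as_numpy("output"))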

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

2 reactions
amircodota commented, Jul 28, 2022

@Tabrizian Works like a charm ⭐

Thanks

0 reactions
amircodota commented, Jul 28, 2022

Thanks a lot! Will try.

Read more comments on GitHub >

Top Results From Across the Web

Tensorflow doesn't seem to see my gpu - Stack Overflow
I came across this same issue in jupyter notebooks. This could be an easy fix. $ pip uninstall tensorflow $ pip install tensorflow-gpu....

TensorFlow Lite inference
The term inference refers to the process of executing a TensorFlow Lite model on-device in order to make predictions based on input data....

Solution to TensorFlow 2 not using GPU | by Shakti Wadekar
Step I: Find out if the tensorflow is able to see the GPU. Command: Output: You will see only CPU info if no...

Introduction to PyTorch Tensors
You can see when we print the new tensor, PyTorch informs us which device it's on (if it's not on CPU). You can...

Pipeline — NVIDIA DALI 1.20.0 documentation
Data nodes (see DataNode ) - represent outputs and inputs of operators; ... If the device parameter is not specified, it is selected...
