
Failing to get output tensor on GPU device

See original GitHub issue

Description

Hello,

I’m trying to get an output tensor on the GPU device when making an InferenceRequest from the Python backend.

Triton Information

What version of Triton are you using? Are you using the Triton container or did you build it yourself?

nvcr.io/nvidia/tritonserver:22.06-py3

To Reproduce

Here is a simple reproduction: I just make a request to a simple ONNX graph and print whether the output tensor is on the GPU or the CPU.

import triton_python_backend_utils as pb_utils
import json
import asyncio

class TritonPythonModel:
    def initialize(self, args):
        self.model_config = json.loads(args['model_config'])

    async def execute(self, requests):
        responses = []
        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "input")

            # Issue a BLS request to the composing ONNX model
            inference_response_awaits = []
            infer_request = pb_utils.InferenceRequest(
                model_name="onnx",
                requested_output_names=["output"],
                inputs=[in_0])

            inference_response_awaits.append(infer_request.async_exec())

            inference_responses = await asyncio.gather(
                *inference_response_awaits)

            for infer_response in inference_responses:
                if infer_response.has_error():
                    raise pb_utils.TritonModelException(
                        infer_response.error().message())

            output0_tensor = pb_utils.get_output_tensor_by_name(
                inference_responses[0], "output")

            # Here we print whether the tensor is on CPU or GPU
            print(output0_tensor.is_cpu())

            inference_response = pb_utils.InferenceResponse(
                output_tensors=[output0_tensor])
            responses.append(inference_response)

        return responses

    def finalize(self):
        print('Cleaning up...')
name: "bls_async2"
backend: "python"
max_batch_size: 0

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 1, 3 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1, 3 ]
  }
]

instance_group [
    {
      count: 1
      kind: KIND_GPU
    }
]

parameters: { key: "FORCE_CPU_ONLY_INPUT_TENSORS" value: {string_value:"no"}}

I run the server with

tritonserver --model-repository `pwd`/models --model-control-mode=poll --repository-poll-secs 2 --log-verbose 100
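
Since the server runs with --model-control-mode=poll, the repository is rescanned every 2 seconds, so the model can take a moment to appear. As a side note, a quick way to confirm the server and model are up is a sketch like the following, assuming the default HTTP port 8000 and that the tritonclient package is installed:

import tritonclient.http as httpclient

# Connect to the local Triton server (assumed default HTTP port)
client = httpclient.InferenceServerClient(url="localhost:8000")

# Both should print True once polling has picked up the model
print(client.is_server_ready())
print(client.is_model_ready("bls_async2"))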

In the log I see that it enters here: https://github.com/triton-inference-server/onnxruntime_backend/blob/5568172eab065ae9bf31fe9dc1e2bed9dfc363d9/src/onnxruntime.cc#L1640

It will print True, because is_cpu() is true for the output tensor.

Expected behavior

is_cpu() to be false.
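
For context, once the output tensor does live on the GPU, the usual way to consume it in the Python backend is through DLPack. A minimal sketch, assuming PyTorch is available inside the backend environment (the helper name is hypothetical):

import torch
from torch.utils.dlpack import from_dlpack

# Hypothetical helper: wrap a pb_utils.Tensor as a torch.Tensor without
# forcing a copy to host memory
def to_torch(output_tensor):
    if output_tensor.is_cpu():
        # Host-resident tensors can be read as numpy and converted normally
        return torch.from_numpy(output_tensor.as_numpy())
    # GPU-resident tensors are exchanged zero-copy via the DLPack protocol
    return from_dlpack(output_tensor.to_dlpack())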

I’ve attached a zip containing the example models; you just need to run python3 make_request.py to run an inference -> issue.zip
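
The actual make_request.py ships only inside the zip; a minimal equivalent using the Triton HTTP client might look like the sketch below (model name, input/output names, shape, and dtype come from the config above; the random input data is just a placeholder):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a [1, 3] FP32 input matching the config.pbtxt above
inp = httpclient.InferInput("input", [1, 3], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 3).astype(np.float32))

# The BLS model forwards the request to the "onnx" model internally
result = client.infer(model_name="bls_async2", inputs=[inp])
print(result.as_numpy("output"))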

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

2 reactions
amircodota commented, Jul 28, 2022

@Tabrizian Works like a charm ⭐

Thanks

0 reactions
amircodota commented, Jul 28, 2022

Thanks a lot! Will try.

Read more comments on GitHub >

Top Results From Across the Web

Tensorflow doesn't seem to see my gpu - Stack Overflow
I came across this same issue in jupyter notebooks. This could be an easy fix. $ pip uninstall tensorflow $ pip install tensorflow-gpu....

TensorFlow Lite inference
The term inference refers to the process of executing a TensorFlow Lite model on-device in order to make predictions based on input data....

Solution to TensorFlow 2 not using GPU | by Shakti Wadekar
Step I: Find out if the tensorflow is able to see the GPU. Command: Output: You will see only CPU info if no...

Introduction to PyTorch Tensors
You can see when we print the new tensor, PyTorch informs us which device it's on (if it's not on CPU). You can...

Pipeline — NVIDIA DALI 1.20.0 documentation
Data nodes (see DataNode ) - represent outputs and inputs of operators; ... If the device parameter is not specified, it is selected...
