
openvino_backend has much lower performance than tensorflow_backend


Description

I have a recommendation model that loads normally with the TensorFlow backend, but when I load it with the OpenVINO (OV) backend, Triton reports that an operator is not supported. I implemented the unsupported operator and packaged it as a custom-op shared library, and now the OV backend can also load the model. However, in performance testing the OV backend turned out to be much slower than the TensorFlow backend. The results are as follows.

tensorflow_backend:

root@test:/data/xxxx# ./perf_client -a -b 600 -u localhost:8001 -i gRPC -m lat_tf --concurrency-range 1
*** Measurement Settings ***
  Batch size: 600
  Measurement window: 5000 msec
  Using asynchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
  Client:
    Request count: 157
    Throughput: 18840 infer/sec
    Avg latency: 31705 usec (standard deviation 2264 usec)
    p50 latency: 31287 usec
    p90 latency: 34560 usec
    p95 latency: 36016 usec
    p99 latency: 37818 usec
    Avg gRPC time: 31804 usec ((un)marshal request/response 2205 usec + response wait 29599 usec)
  Server:
    Inference count: 113400
    Execution count: 189
    Successful request count: 189
    Avg request latency: 27745 usec (overhead 218 usec + queue 180 usec + compute input 647 usec + compute infer 26637 usec + compute output 63 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 18840 infer/sec, latency 31705 usec

ov backend:

root@test:/data/xxxxx# ./perf_client -a -b 600 -u localhost:8001 -i gRPC -m lat_openvino --concurrency-range 1
*** Measurement Settings ***
  Batch size: 600
  Measurement window: 5000 msec
  Using asynchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
  Client:
    Request count: 35
    Throughput: 4200 infer/sec
    Avg latency: 142570 usec (standard deviation 11775 usec)
    p50 latency: 141110 usec
    p90 latency: 157490 usec
    p95 latency: 158868 usec
    p99 latency: 171043 usec
    Avg gRPC time: 142093 usec ((un)marshal request/response 992 usec + response wait 141101 usec)
  Server:
    Inference count: 25200
    Execution count: 42
    Successful request count: 42
    Avg request latency: 140174 usec (overhead 138 usec + queue 35 usec + compute input 683 usec + compute infer 138491 usec + compute output 827 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 4200 infer/sec, latency 142570 usec

Triton Information
What version of Triton are you using? 21.06

Are you using the Triton container or did you build it yourself? Yes, I rebuilt the OV backend myself (see below).

I recompiled the OV backend from openvino_backend r21.06 against OpenVINO 2021.3. This is my command:

cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_BUILD_OPENVINO_VERSION=2021.3.394 \
  -DTRITON_BUILD_CONTAINER_VERSION=21.06 ..
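For completeness, here is a rough sketch of the steps that usually follow the cmake configure when building a Triton backend standalone. The install layout and the server backend path below are assumptions based on the common Triton backend build flow, not details taken from this issue, so check the openvino_backend README for the exact locations:

make install
# The built backend typically ends up under install/backends/openvino
# (assumed layout); copy it into the server's backends directory
# (path shown is also an assumption):
cp -r install/backends/openvino /opt/tritonserver/backends/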

To Reproduce

My code: https://drive.google.com/file/d/1WObxsMidRSEnkSr97P2bHUytX5UKoym_/view?usp=sharing. It includes the model repository (OV and TF versions) and the custom-op shared library. You can use Triton to load the repository; place the shared library somewhere on the server and update the shared library path in the OV model's configuration file accordingly.
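For orientation, a minimal sketch of what such an OpenVINO model configuration might look like. The model name, tensor names, shapes, and especially the parameter key used to point at the custom-op library are assumptions for illustration only; the real values live in the repository linked above.

name: "lat_openvino"          # hypothetical model name
backend: "openvino"
max_batch_size: 600
input [
  {
    name: "INPUT0"            # hypothetical input tensor
    data_type: TYPE_FP32
    dims: [ 128 ]
  }
]
output [
  {
    name: "OUTPUT0"           # hypothetical output tensor
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
parameters: {
  key: "CUSTOM_OP_LIBRARY"    # placeholder key: use whatever key the custom-built backend actually reads
  value: { string_value: "/path/to/libcustom_op.so" }
}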

Expected behavior

The OV backend should perform at least as well as the TF backend. Who can help me? Thank you very much.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 15 (5 by maintainers)

Top GitHub Comments

1 reaction
zhaohb commented, Aug 4, 2021

Yes, model.xml and model.bin were created from the TF SavedModel in my repo.
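For context, a rough sketch of how a TF SavedModel is typically converted to OpenVINO IR (model.xml / model.bin) with the 2021.x Model Optimizer. The paths and batch size are placeholders for illustration, not the exact command used for this model:

# Hypothetical paths; mo.py ships with OpenVINO 2021.x under deployment_tools/model_optimizer
python3 mo.py \
  --saved_model_dir /path/to/tf_savedmodel \
  --output_dir /path/to/model_repository/lat_openvino/1 \
  --batch 600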


0 reactions
tanmayv25 commented, Jan 7, 2022

We don’t officially support OpenVINO 2021.4, and onnxruntime_backend is still at 2021.2. But you can use our build.py script to build OpenVINO backends for multiple versions: https://github.com/triton-inference-server/server/blob/main/build.py#L62-L83
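A hedged sketch of what such a build.py invocation might look like. The exact flags and the backend name/version selection are defined in build.py and its TRITON_VERSION_MAP for the branch you check out, so treat the command below as an assumption and verify it against the script:

# Assumed invocation; confirm flag names against build.py on the r21.06 branch
./build.py --build-dir=/tmp/tritonbuild --enable-logging --endpoint=grpc --backend=openvino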

Also, this ticket compares the performance of the OV backend with the TF backend for the model. Just so that we can better track the issue, can you create a new issue that describes how to reproduce the perf regression between Triton and Intel’s model server? I think that would be an interesting take. Please make sure you are using the same library versions.


Top Results From Across the Web

Triton Inference Server is 10X Slower than TensorFlow ...
Convert your TF model to ONNX and use Triton’s ONNX backend with the OpenVINO accelerator. Both the above methods should give better performance ... (a hedged config sketch for this approach follows after this list)

OpenVINO TensorFlow Integration FAQ Sheet - Intel
The integration is designed for developers who would like to boost performance of their inferencing applications with minimal code modifications.

Introduction to the Performance Topics - OpenVINO™ Toolkit
The GPU backend comes with a feature that allows model tuning, so the workload is configured to fit better into hardware. Tuning is time...

EXTREME VISION Accelerates TensorFlow workloads ...
In this blog, we showcase how to further accelerate inference performance of TensorFlow developers on the Extreme Vision platform, ...

Why is TF Keras inference way slower than Numpy operations?
I’m working on a reinforcement learning model implemented with Keras and Tensorflow. I have to do frequent calls to model.predict() on single ...
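Following up on the first result above, a minimal sketch of how Triton's ONNX Runtime backend can be told to use the OpenVINO execution accelerator via the model configuration. The model name, batch size, and tensor details are assumptions for illustration; only the optimization block reflects the documented mechanism:

name: "lat_onnx"              # hypothetical model name
backend: "onnxruntime"
max_batch_size: 600
optimization {
  execution_accelerators {
    cpu_execution_accelerator: [ {
      name: "openvino"
    } ]
  }
}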
