Inference is slow on Jetson Xavier NX

I am running the following test.py script to benchmark inference. For resnet50 the FPS is supposed to be ~312, but I get ~68.

import torch
import torchvision.models as models
import numpy as np
from time import time
from torch2trt import torch2trt

def inference_test():
    device = torch.device('cuda:0')

    # Create model and input.
    model = models.resnet50(pretrained=True)
    tmp = (np.random.standard_normal([1, 3, 224, 224]) * 255).astype(np.uint8)  
    # tmp = (np.random.standard_normal([1, 3, 416, 416]) * 255).astype(np.uint8)  #mobilenet_v2

    # move them to the device 
    model.eval()
    model.to(device)   
    img = torch.from_numpy(tmp.astype(np.float32)).to(device)

    # convert to TensorRT feeding sample data as input
    model_trt = torch2trt(model, [img])

    def infer():
        with torch.no_grad():
            before = time()
            # outs = model(img)
            outs = model_trt(img)
            infer_time = time() - before
        return infer_time

    print("Running warming up iterations..")
    for i in range(0, 100):
        infer()
    
    total_infer_time = 0
    print("Running the test iterations..")
    for i in range(0, 100):
        total_infer_time += infer()
    print(f"FPS: {100 / total_infer_time}")


inference_test()

OUTPUT:

Running warming up iterations..
Running the test iterations..
FPS: 67.9085001712161

Jetson env:

 - NVIDIA Jetson Xavier NX (Developer Kit Version)
   * Jetpack 4.4 [L4T 32.4.3]
   * NV Power Mode: MODE_15W_6CORE - Type: 2
   * jetson_stats.service: active
 - Libraries:
   * CUDA: 10.2.89
   * cuDNN: 8.0.0.180
   * TensorRT: 7.1.3.0
   * Visionworks: 1.6.0.501
   * OpenCV: 4.1.1 compiled CUDA: NO
   * VPI: 0.3.7
   * Vulkan: 1.2.70

$ sudo jetson_clocks --show

SOC family:tegra194  Machine:NVIDIA Jetson Xavier NX Developer Kit
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0 
cpu1: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0 
cpu2: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0 
cpu3: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0 
cpu4: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0 
cpu5: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0 
GPU MinFreq=1109250000 MaxFreq=1109250000 CurrentFreq=1109250000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Fan: speed=130
NV Power Mode: MODE_15W_6CORE

torch version: 1.6
torchvision version: 0.7.0

Could you please help me find the issue here? Thanks.
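A side note on the measurement itself: CUDA work is launched asynchronously, so a wall-clock timer wrapped around a single forward call may stop before the GPU has finished. This is not the cause identified in the comments; it is only a way to make the reported number more trustworthy. The helper below is a minimal sketch under that assumption. The names `measure_fps`, `run_once` and `sync` are hypothetical and not part of torch2trt.

```python
import time


def measure_fps(run_once, sync=None, warmup=100, iters=100):
    """Return frames per second for `run_once`, timing the whole loop at once.

    `sync` is an optional zero-argument callable that blocks until queued
    device work has finished (for CUDA, that would be torch.cuda.synchronize).
    """
    for _ in range(warmup):
        run_once()
    if sync is not None:
        sync()  # drain warm-up work before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()  # wait for the last launch to finish before stopping the clock
    elapsed = time.perf_counter() - start
    return iters / elapsed


if __name__ == "__main__":
    # CPU-only stand-in workload so the sketch runs without a GPU.
    fps = measure_fps(lambda: sum(range(1000)), warmup=10, iters=50)
    print(f"FPS: {fps:.1f}")
```

With the script above, the call would look roughly like `measure_fps(lambda: model_trt(img), sync=torch.cuda.synchronize)`, run inside `torch.no_grad()`.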

Top GitHub Comments

IDayday commented, May 10, 2021

Hi there,

You can set the flags int8_mode or fp16_mode in the torch2trt lib to achieve it.

More info at line 482 in this file

Thanks a lot. So I need a TensorRT version >= 7.0 and a device that supports INT8 inference, such as the Xavier NX.
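For readers who want to try the flags mentioned in this comment, here is a minimal sketch of passing them to `torch2trt`. It assumes `fp16_mode` and `int8_mode` are keyword arguments of `torch2trt`, as the comment describes. The helpers `precision_kwargs` and `convert` are hypothetical names used only for illustration.

```python
def precision_kwargs(precision):
    """Map a precision name to the torch2trt keyword arguments that enable it."""
    table = {
        "fp32": {},                    # default conversion, no extra flag
        "fp16": {"fp16_mode": True},   # half-precision engine
        "int8": {"int8_mode": True},   # quantised engine
    }
    if precision not in table:
        raise ValueError(f"unknown precision: {precision!r}")
    return dict(table[precision])


def convert(model, example_inputs, precision="fp16"):
    """Convert `model` with torch2trt at the requested precision.

    torch2trt is imported lazily so this file can be loaded on a machine
    without TensorRT; the call itself needs a CUDA device.
    """
    from torch2trt import torch2trt

    return torch2trt(model, example_inputs, **precision_kwargs(precision))
```

In the benchmark script from the question, `model_trt = torch2trt(model, [img])` would become `model_trt = convert(model, [img], precision="fp16")`. INT8 generally needs representative calibration data to keep accuracy usable, which this sketch does not cover.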

dixantmittal commented, Dec 22, 2020

Hi there! I tried it on my Xavier NX. I got the following results:

TRT + fp32 = ~61 fps
TRT + fp16 = ~220 fps
TRT + int8 = ~340 fps

I think NVIDIA's claimed speed is for a quantised model.
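To put the figures from this comment side by side, the short sketch below converts each reported FPS value into per-frame latency and a speedup relative to fp32. The numbers are the approximate ones quoted above; the function names are illustrative only.

```python
# Approximate FPS values reported in the comment above (Xavier NX, resnet50).
reported_fps = {"fp32": 61, "fp16": 220, "int8": 340}


def latency_ms(fps):
    """Per-frame latency in milliseconds for a given throughput."""
    return 1000.0 / fps


def speedup(fps, baseline_fps):
    """How many times faster `fps` is than `baseline_fps`."""
    return fps / baseline_fps


for mode, fps in reported_fps.items():
    print(
        f"{mode}: {latency_ms(fps):.1f} ms/frame, "
        f"{speedup(fps, reported_fps['fp32']):.1f}x vs fp32"
    )
```

This works out to roughly 16.4 ms, 4.5 ms and 2.9 ms per frame, or about 3.6x and 5.6x over fp32 for fp16 and int8 respectively. The ~68 FPS measured in the question sits close to the fp32 figure, and the ~312 FPS target is only approached by the int8 result.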
