BERT model returns NaN logits in output
Description

I am able to deploy a fine-tuned "bert-base-uncased" model on Triton Inference Server using TensorRT, but during inference I get NaN logits.
Converted the ONNX model to TensorRT using the command below:

trtexec --onnx=model.onnx --saveEngine=model.plan --minShapes=input_ids:1x1,attention_mask:1x1,token_type_ids:1x1 --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
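A note on the conversion: --fp16 is a common source of NaN for BERT-style models, since LayerNorm and Softmax intermediates can overflow the FP16 range. One way to check whether reduced precision is the culprit (a sketch, assuming Polygraphy is installed; flag spellings may vary across versions) is to compare the TensorRT FP16 result against ONNX Runtime:

```sh
# Compare TensorRT (FP16) against ONNX Runtime on the same generated inputs.
# Assumption: Polygraphy installed via `pip install polygraphy`; adjust the
# shape flags if your version expects a different syntax.
polygraphy run model.onnx --trt --fp16 --onnxrt \
    --trt-min-shapes input_ids:[1,1] attention_mask:[1,1] token_type_ids:[1,1] \
    --trt-opt-shapes input_ids:[16,128] attention_mask:[16,128] token_type_ids:[16,128] \
    --trt-max-shapes input_ids:[128,128] attention_mask:[128,128] token_type_ids:[128,128] \
    --input-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128]
```

If an FP32 build matches ONNX Runtime but the FP16 build produces NaN, constraining the offending layers to FP32 is the usual fix.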
Output logs
logits: [[[nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan] ............ [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]]]
Triton Information

Triton Server version 2.22.0, NVIDIA release 22.05.
Using Triton container: 007439368137.dkr.ecr.us-east-2.amazonaws.com/sagemaker-tritonserver:22.05-py3
To Reproduce
- Deploy the TensorRT model on Triton Inference Server.
- Send an inference request with the payload below.
text = "Published by HT Digital Content Services with permission from Construction Digital."
batch_size = 1
payload = { "inputs": [ { "name": "TEXT", "shape": (batch_size,), "datatype": "BYTES", "data": [text], } ] }
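For completeness, a minimal sketch of how this payload can be sent (assuming Triton's standard KServe v2 HTTP endpoint; on SageMaker the same JSON body goes through invoke_endpoint instead, and the URL and model name here are placeholders):

```python
import requests

text = "Published by HT Digital Content Services with permission from Construction Digital."
batch_size = 1
payload = {"inputs": [{"name": "TEXT", "shape": (batch_size,), "datatype": "BYTES", "data": [text]}]}

# Assumption: "model" is a placeholder for the actual model (or ensemble)
# name in the Triton model repository.
resp = requests.post("http://localhost:8000/v2/models/model/infer", json=payload)
print(resp.json())
```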
Preprocessed the input text to get input_ids, token_type_ids, and attention_mask from the tokenizer, then sent the input below to the model; a sketch of this step follows.
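A minimal sketch of that preprocessing step (the checkpoint name is an assumption; note that ids such as 87651 exceed the bert-base-uncased vocabulary, so the fine-tuned model appears to use a different, possibly multilingual, tokenizer):

```python
from transformers import AutoTokenizer

# Assumption: use the tokenizer the model was fine-tuned with; ids like
# 87651 are outside the bert-base-uncased vocabulary (30522 tokens), which
# suggests a multilingual checkpoint was actually used.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Published by HT Digital Content Services with permission from Construction Digital."
enc = tokenizer(text, return_tensors="np", return_token_type_ids=True)

# Cast to int32 to match the dtypes the TensorRT engine expects.
model_input = {name: arr.astype("int32") for name, arr in enc.items()}
```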
Model input:
{'input_ids': array([[ 101, 12414, 10151, 87651, 10764, 18491, 12238, 10171, 48822, 10195, 13154, 10764, 119, 102]], dtype=int32),
 'token_type_ids': array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32),
 'attention_mask': array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)}
The model then produces NaN logits.
All deployment files are available on Google Drive: https://drive.google.com/file/d/1uteEOgnSLwtfTonJtgukKjnDwycezFg3/view?usp=sharing
Expected behavior

I expect valid logits from the BERT model instead of NaN.
Please help me with this issue. Thanks.
Top GitHub Comments
@Vinayaks117 Hopefully something like this gets you started (assuming you're in the same directory as your ONNX model; otherwise change the mount paths):
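The code block from the original comment was not preserved in this archive; a sketch of what it likely contained, assuming the suggestion was to rebuild the engine inside the matching NGC TensorRT container (the container tag and the FP32 rebuild are assumptions):

```sh
# Assumption: run the TensorRT container matching the Triton 22.05 release,
# mounting the current directory (which holds model.onnx) into /workspace.
docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt:22.05-py3

# Inside the container, rebuild without --fp16 to check whether the NaNs
# come from reduced precision (trtexec lives under /usr/src/tensorrt/bin
# if it is not already on PATH):
trtexec --onnx=model.onnx --saveEngine=model_fp32.plan \
    --minShapes=input_ids:1x1,attention_mask:1x1,token_type_ids:1x1 \
    --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 \
    --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128
```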
Closing due to inactivity. If you'd like to follow up, let us know and we can reopen the issue.