Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Op type not registered 'SentencepieceOp' for Universal Sentence Encoder

See original GitHub issue

Description I am running into issues while trying to serve a TF HUB model on Triton. Model I am using is: https://tfhub.dev/google/universal-sentence-encoder-multilingual/3

Triton Information I have reproduced this issue on following Triton versions: 21.04, 21.10 & 21.11 I am using the official Triton containers docker pull nvcr.io/nvidia/tritonserver:21.xx-py3

To Reproduce

Download the above model
Generate config.pbtxt
Start the triton server
Load the model
Send data for prediction (this step fails)

The error I get is

2021-11-23 02:27:50.394337: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:568] function_optimizer failed: Not found: Op type not registered ‘SentencepieceOp’ in binary running on db03c21d5f13. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed. 2021-11-23 02:27:50.442544: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at partitioned_function_ops.cc:113 : Not found: Op type not registered ‘SentencepieceOp’ in binary running on db03c21d5f13. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.

Model framework, inputs, outputs & the model configuration file

name: “USEV3” platform: “tensorflow_savedmodel” input [ { name: “inputs”, data_type: TYPE_STRING, dims: [-1] } ] output [ { name: “outputs”, data_type: TYPE_FP32, dims: [-1, 512] } ] version_policy: { latest { num_versions : 1 }} optimization { graph { level: 1 } }

Expected behavior Successful output with a 512 dim vector for the prediction request

Issue Analytics

State:
Created 2 years ago
Comments:7 (5 by maintainers)

Top GitHub Comments

3reactions

tanmayv25commented, Nov 29, 2021

@shashankharinath I was seeing the protobuf mismatch because of the different versions of TF being used. pip installs the latest tensorflow-text which is for TF 2.x. For deploying the model with TF 1.x, we would need python=3.6, we can use the miniconda envioronment in the container. These steps deploy the model successfully:

TensorFlow 1

Within nvcr.io/nvidia/tritonserver:21.10-py3 container:

Get the miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.10.3-Linux-x86_64.sh

Install miniconda and enter yes/agree for all prompts:

bash Miniconda3-py38_4.10.3-Linux-x86_64.sh

Source the bashrc

source /root/.bashrc

Create an environment with python=3.6

 conda create -q -y -n test_conda_env python=3.6

Activate the environment

source activate test_conda_env

Install tensorflow-text==1.15.1

pip install tensorflow-text==1.15.1

Create the model repository.

my_model/
`-- test_model
    `-- 1
        `-- model.savedmodel
            |-- assets
            |-- saved_model.pb
            |-- universal-sentence-encoder-multilingual_3.tar.gz
            `-- variables
                |-- variables.data-00000-of-00001
                `-- variables.index

5 directories, 4 files

Export the Tensorflow library to system paths.

export LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow1:$LD_LIBRARY_PATH

Launch the server.

LD_PRELOAD=/root/miniconda3/envs/test_conda_env/lib/python3.6/site-packages/tensorflow_text/python/ops/_sentencepiece_tokenizer.so tritonserver --model-store=my_model/ --strict-model-config=false

Confirm the server launched successfully. You should be able to see the below messages:

I1129 22:47:38.266012 825 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I1129 22:47:38.266583 825 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I1129 22:47:38.309267 825 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002

Follow below instructions only to test whether deployed model runs inferences without issue. These steps don’t validate the results.

Open another terminal and run docker container ps. Take note of the the Container ID running tritonserver.

Launch the container with below command.

docker exec -it <Container ID> bash

Verify the endpoint.

# curl -v localhost:8000/v2/health/ready
*   Trying 127.0.0.1:8000...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
< 
* Connection #0 to host localhost left intact

Install tritonclient python pip package and utilize the in-built perf_analyzer to test the model deployed on the server.

# pip install tritonclient[all]==2.15.0
# perf_analyzer -m test_model
*** Measurement Settings ***
  Batch size: 1
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
  Client: 
    Request count: 360
    Throughput: 72 infer/sec
    Avg latency: 13903 usec (standard deviation 1007 usec)
    p50 latency: 13914 usec
    p90 latency: 14898 usec
    p95 latency: 15374 usec
    p99 latency: 16214 usec
    Avg HTTP time: 13913 usec (send/recv 110 usec + response wait 13803 usec)
  Server: 
    Inference count: 430
    Execution count: 430
    Successful request count: 430
    Avg request latency: 13466 usec (overhead 55 usec + queue 29 usec + compute input 21 usec + compute infer 13332 usec + compute output 29 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 72 infer/sec, latency 13903 usec

As it can be seen the model was successfully deployed and served. Look at this section to add validation logic. Instead of installing tritonclient pip package in the same container, you can use tritonserver:21.10-py3-sdk container as described here. It would require the port-forwarding to be enabled when launching the tritonserver container.

TensorFlow 2

Within nvcr.io/nvidia/tritonserver:21.10-py3 container:

Install tensorflow-text==2.6.0

pip install tensorflow-text==2.6.0

Create the model repository.

my_model/
`-- test_model
    `-- 1
        `-- model.savedmodel
            |-- assets
            |-- saved_model.pb
            |-- universal-sentence-encoder-multilingual_3.tar.gz
            `-- variables
                |-- variables.data-00000-of-00001
                `-- variables.index

5 directories, 4 files

Export the Tensorflow 2 library to system paths. Note this is different from above.

export LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow2:$LD_LIBRARY_PATH

Launch the server and use backend-config to use TF 2. Specify correct path to the custom ops

LD_PRELOAD=/usr/local/lib/python3.8/dist-packages/tensorflow_text/python/ops/_sentencepiece_tokenizer.so tritonserver --model-store=my_model/ --strict-model-config=false --backend-config=tensorflow,version=2

It should load properly with the logs

I1129 23:14:18.161191 92 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I1129 23:14:18.161735 92 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I1129 23:14:18.203984 92 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002

You can follow the instructions in TensorFow 1 section to test whether model can serve the inference requests.

Conclusion

I have verified that Triton can load and run the TF-Text model successfully for both TF1 and TF2. Care must be taken that tensorflow-text version matches with the tensorflow version shipped with corresponding triton version. See support matrix for version information.

2reactions

tanmayv25commented, Nov 24, 2021

I did pip install tensorflow_text and loaded it in tritonserver as a custom op. It looks like there is a mismatch between protobuf versions in tf_text and tf in triton.

LD_PRELOAD=/root/miniconda3/lib/python3.8/site-packages/tensorflow_text/python/ops/_sentencepiece_tokenizer.so tritonserver --model-store=tf_store/ --strict-model-config=false
I1124 22:21:57.895074 26879 metrics.cc:298] Collecting metrics for GPU 0: NVIDIA TITAN RTX
I1124 22:21:58.163343 26879 libtorch.cc:1192] TRITONBACKEND_Initialize: pytorch
I1124 22:21:58.163368 26879 libtorch.cc:1202] Triton TRITONBACKEND API version: 1.6
I1124 22:21:58.163373 26879 libtorch.cc:1208] 'pytorch' TRITONBACKEND API version: 1.6
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/stubs/common.cc:86] This program was compiled against version 3.8.0 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.9.2).  Contact the program author for an update.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "bazel-out/k8-opt/genfiles/tensorflow/contrib/boosted_trees/proto/tree_config.pb.cc".)
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  This program was compiled against version 3.8.0 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.9.2).  Contact the program author for an update.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "bazel-out/k8-opt/genfiles/tensorflow/contrib/boosted_trees/proto/tree_config.pb.cc".)

The solution is to apply a small patch to TF-Text and Hub and building TF-Text from source code such that protobuf versions can match.