Triton hangs on tensorflow1 backend cpu-only build
Description
Triton hangs during the initialization of the TensorFlow 1 runtime.
Triton Information
What version of Triton are you using? 2.20
Are you using the Triton container or did you build it yourself?
Triton container built using the build.py convenience script.
To Reproduce
Steps to reproduce the behavior. All of this was tested on aarch64 and x86 non-CUDA machines. The Docker image was built using the command:
./build.py --cmake-dir=$(pwd)/build --build-dir=/tmp/citritonbuild --enable-logging --enable-stats --enable-tracing --enable-metrics --endpoint=http --endpoint=grpc --backend=tensorflow1 --extra-backend-cmake-arg=tensorflow1:TRITON_TENSORFLOW_INSTALL_EXTRA_DEPS=ON
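As a quick sanity check (not part of the original report) that the resulting CPU-only image actually contains the backend, you can list the backend directory inside the image. This assumes Triton's default backend install location /opt/tritonserver/backends and the same image name used in the run command below:
docker run --rm --entrypoint="" tritonserver ls /opt/tritonserver/backends/tensorflow1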
The Docker run command was:
docker run --rm -it --entrypoint="" -v $(pwd)/triton_model_repo:/models tritonserver bash
After doing this, the server output should look something like the following, and then hang:
root@b0c1ff746084:/opt/tritonserver# tritonserver --model-repository /models
2022-03-30 16:36:43.807897: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-30 16:36:43.808035: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-03-30 16:36:43.808230: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-30 16:36:43.808262: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
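One way to narrow down where startup hangs (a debugging sketch, not something from the original report) is to open a second shell in the same container with docker exec, install gdb, and dump backtraces for all threads of the stuck process. The container ID below is a placeholder; substitute the one from docker ps:
docker exec -it <container-id> bash
apt-get update && apt-get install -y gdb
gdb -p $(pidof tritonserver) -batch -ex 'thread apply all bt'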
Describe the models (framework, inputs, outputs), ideally including the model configuration file (if using an ensemble, include the model configuration file for that as well).
The server hangs before even looking at the model repository; you can pass an invalid directory for the model repository and the behavior is the same.
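Since the hang occurs before the model repository is read, raising the server's log verbosity can confirm how far initialization gets before stalling. --log-verbose is a standard tritonserver flag; the level shown here is just an example:
tritonserver --model-repository /models --log-verbose 1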
Expected behavior
The TensorFlow 1 model should load properly and the server should not hang on a CPU-only build of Triton.
Comments
Yes, running from the r22.03 branch as opposed to the release tag. I can also confirm that I saw the same behavior when running on main yesterday, before the 22.03 release.
@CoderHam I can confirm this PR fixed the hang issue. Here are the log outputs for reference: