Model Analyzer on a remote host and in Docker
Can you please help me with this issue?
I have generated models that work with Triton 20.09 in a standalone Triton Inference Server container. I have built Model Analyzer, which by default supports Triton Inference Server 20.11. When I pass models and plugins generated with 20.09, Model Analyzer gives an error while loading them, since it targets 20.11. On the other hand, when I generate the models and plugins with the TensorRT NGC container 20.11 and load them into the 20.11 Model Analyzer, it runs without any issue. My requirement is to load the models and plugins generated for 20.09 into Model Analyzer.
Running the Model Analyzer container
sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v /home/ubuntu/cuda/sec_models:/models -v /home/ubuntu/cuda/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" triton_modelanalyzer bash
The models and plugins given in the above command are generated for 20.09-py3. They load fine with the 20.09-py3 Triton Inference Server.
Command inside the Docker container:
model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode docker --triton-version 20.09-py3
Error
model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode docker --triton-version 20.09-py3
2021-01-23 19:39:10.854 INFO[entrypoint.py:368] Triton Model Analyzer started: config={'model_repository': '/models/', 'model_names': 'yolo1', 'batch_sizes': '1', 'concurrency': '1', 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5.0, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'docker', 'triton_version': '20.09-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-23 19:39:10.859 INFO[entrypoint.py:105] Starting a Triton Server using docker...
2021-01-23 19:39:10.859 INFO[driver.py:236] init
2021-01-23 19:39:13.687 INFO[entrypoint.py:209] Triton Server is ready.
2021-01-23 19:39:14.714 INFO[entrypoint.py:383] Starting perf_analyzer...
2021-01-23 19:39:14.714 INFO[analyzer.py:91] Profiling server only metrics...
2021-01-23 19:39:15.737 INFO[monitor.py:74] Using GPU(s) with UUID(s) = { GPU-5df6aea1-a690-25ee-c16e-bd46a1d95792 } for the analysis.
2021-01-23 19:39:21.852 ERROR[entrypoint.py:387] Model Analyzer encountered an error: Unable to load the model : [StatusCode.INTERNAL] failed to load 'yolo1', no version is available
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 79, in load_model
    self._client.load_model(model.name())
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 555, in load_model
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] failed to load 'yolo1', no version is available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 384, in main
    run_analyzer(config, analyzer, client, run_configs)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 323, in run_analyzer
    client.load_model(model=model)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 82, in load_model
    f"Unable to load the model : {e}")
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Unable to load the model : [StatusCode.INTERNAL] failed to load 'yolo1', no version is available
2021-01-23 19:39:21.854 INFO[server_docker.py:128] Stopping triton server.
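(Side note: the "no version is available" error usually means Triton did not find a loadable numeric version subdirectory for the model. A typical TensorRT model repository layout is sketched below; the model.plan file name is illustrative. If the repository already follows this layout on the host, the failure in docker launch mode is more likely the container-path issue discussed in the comments further down.)

/models/
  yolo1/
    config.pbtxt
    1/
      model.plan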
Also, how do we run Model Analyzer in remote mode with Docker?
Standalone inference server (20.09-py3)
sudo docker run --gpus all --rm --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v/home/ubuntu/cuda/sec_models:/models -v/home/ubuntu/cuda/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" bdb0cbe1c039 tritonserver --model-repository=/models --grpc-infer-allocation-pool-size=512 --log-verbose 1
Output
I0123 19:44:29.564053 1 grpc_server.cc:2078] Thread started for ModelStreamInferHandler
I0123 19:44:29.564070 1 grpc_server.cc:3897] Started GRPCInferenceService at 0.0.0.0:8001
I0123 19:44:29.564351 1 http_server.cc:2705] Started HTTPService at 0.0.0.0:8000
I0123 19:44:29.605837 1 http_server.cc:2724] Started Metrics Service at 0.0.0.0:8002
Model Analyzer command
sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v /home/ubuntu/cuda/sec_models:/models -v /home/ubuntu/cuda/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" triton_modelanalyzer bash
Inside the Docker container:
model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode remote --triton-grpc-endpoint localhost:8001
2021-01-23 19:53:10.191 INFO[entrypoint.py:368] Triton Model Analyzer started: config={'model_repository': '/models/', 'model_names': 'yolo1', 'batch_sizes': '1', 'concurrency': '1', 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5.0, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'remote', 'triton_version': '20.11-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-23 19:53:10.197 INFO[entrypoint.py:84] Using remote Triton Server...
2021-01-23 19:53:10.199 INFO[entrypoint.py:209] Triton Server is ready.
2021-01-23 19:53:10.199 INFO[driver.py:236] init
2021-01-23 19:53:11.299 INFO[entrypoint.py:383] Starting perf_analyzer...
2021-01-23 19:53:11.299 INFO[analyzer.py:91] Profiling server only metrics...
2021-01-23 19:53:12.323 INFO[monitor.py:74] Using GPU(s) with UUID(s) = { GPU-5df6aea1-a690-25ee-c16e-bd46a1d95792 } for the analysis.
2021-01-23 19:53:18.438 ERROR[entrypoint.py:387] Model Analyzer encountered an error: Unable to load the model : [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 79, in load_model
    self._client.load_model(model.name())
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 555, in load_model
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 384, in main
    run_analyzer(config, analyzer, client, run_configs)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 323, in run_analyzer
    client.load_model(model=model)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 82, in load_model
    f"Unable to load the model : {e}")
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Unable to load the model : [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
root@tensorgo-rppg:/opt/triton-model-analyzer#
Top GitHub Comments
@alphapibeta Regarding your first question, there is a bug in Model Analyzer currently that requires the path inside the container to be the same as the path outside the container.
For now, I recommend mounting the model repository at the same path inside the container as on the host machine.
/cc @xprotobeast2
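For illustration, a minimal sketch of that workaround based on the commands above, assuming the repository lives at /home/ubuntu/cuda/sec_models and the plugins at /home/ubuntu/cuda/plugins/ on the host; the identical-path mounts are an assumption drawn from the comment above, not a verified fix:

# Mount the model repository (and plugins) at the same path inside the container as on the host.
sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti \
  -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged \
  -v /home/ubuntu/cuda/sec_models:/home/ubuntu/cuda/sec_models \
  -v /home/ubuntu/cuda/plugins/:/home/ubuntu/cuda/plugins/ \
  --env LD_PRELOAD="/home/ubuntu/cuda/plugins/libyolo_layer.so:/home/ubuntu/cuda/plugins/libdecodeplugin.so" \
  triton_modelanalyzer bash
# Inside the container, point Model Analyzer at the host-identical path:
model-analyzer -m /home/ubuntu/cuda/sec_models -n yolo1 --batch-size 1 -c 1 \
  --triton-launch-mode docker --triton-version 20.09-py3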
Regarding your second question, you need to start the tritonserver using
--model-control-mode=explicit
flag when you want to use the remote mode. I’ll update the doc to reflect this. Thanks for pointing this out.

@xprotobeast2 Actually, I tried to load a model that does not exist after restarting the Triton Server with --model-control-mode=explicit, and then ran into this error (Number of retries exceeded) on the second run. However, these flags (--model-control-mode=explicit, --triton-launch-mode remote) worked when the loaded model exists in the repo.
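For completeness, a sketch of the remote-mode workflow with explicit model control, adapted from the commands in the issue; nvcr.io/nvidia/tritonserver:20.09-py3 is assumed here in place of the image hash bdb0cbe1c039 used above, and the combination has not been verified on this setup:

# Start the standalone Triton server with explicit model control so Model Analyzer
# can load/unload models over gRPC in remote mode.
sudo docker run --gpus all --rm --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -p8000:8000 -p8001:8001 -p8002:8002 --net host --privileged \
  -v /home/ubuntu/cuda/sec_models:/models -v /home/ubuntu/cuda/plugins/:/plugins \
  --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" \
  nvcr.io/nvidia/tritonserver:20.09-py3 tritonserver --model-repository=/models \
  --model-control-mode=explicit --grpc-infer-allocation-pool-size=512 --log-verbose 1
# Then, from the Model Analyzer container, point at the running server:
model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1 \
  --triton-launch-mode remote --triton-grpc-endpoint localhost:8001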