
Huge inference speed difference when loading a model from S3

See original GitHub issue

Hi all! I found a huge difference in RPS (requests per second) when loading a model from a MinIO S3 bucket.

Specifically, when mounting the model from a local path I get 40 RPS with the following command:

docker run --rm -t -i --gpus "0" \
    -p "$TRITON_HTTP_PORT:$TRITON_HTTP_PORT" \
    -p "$TRITON_gRPC_PORT:$TRITON_gRPC_PORT" \
    -p "$TRITON_METRICS_PORT:$TRITON_METRICS_PORT" \
    -v "$TRITON_MODEL_DIR:/models" \
    -e CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES \
    $TRITON_DOCKER_IMAGE tritonserver \
        --http-port "$TRITON_HTTP_PORT" \
        --grpc-port "$TRITON_gRPC_PORT" \
        --metrics-port "$TRITON_METRICS_PORT" \
        --model-repository "/models" \
        --log-verbose "0"

But when I point the --model-repository flag directly at my MinIO instance I get only 4 RPS, using the following:

docker run --rm -t -i --gpus "0" \
    -p "$TRITON_HTTP_PORT:$TRITON_HTTP_PORT" \
    -p "$TRITON_gRPC_PORT:$TRITON_gRPC_PORT" \
    -p "$TRITON_METRICS_PORT:$TRITON_METRICS_PORT" \
    -e CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES \
    -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
    $TRITON_DOCKER_IMAGE tritonserver \
        --http-port "$TRITON_HTTP_PORT" \
        --grpc-port "$TRITON_gRPC_PORT" \
        --metrics-port "$TRITON_METRICS_PORT" \
        --model-repository "$TRITON_MINIO_BUCKET" \
        --log-verbose "0"
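For context, the RPS figures above can be reproduced with a rough sequential benchmark like the sketch below. This is a hedged example: `run_bench`, the payload file, and the model-name variable are placeholders of mine, not taken from the issue, and the commented-out `curl` line assumes Triton's standard KServe-style HTTP inference endpoint.

```shell
#!/usr/bin/env bash

# Time n sequential runs of a command and report requests per second.
run_bench() {
  local n=$1; shift
  local start end
  start=$(date +%s.%N)
  for ((i = 0; i < n; i++)); do
    "$@" > /dev/null
  done
  end=$(date +%s.%N)
  awk -v n="$n" -v s="$start" -v e="$end" \
    'BEGIN { printf "%.1f RPS\n", n / (e - s) }'
}

# Example (placeholders): 100 sequential inference requests against Triton.
# run_bench 100 curl -s -X POST \
#   "http://localhost:${TRITON_HTTP_PORT}/v2/models/${MODEL_NAME}/infer" \
#   -d @payload.json

run_bench 10 true   # dry run with a no-op command
```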

Could this be resolved with a special configuration, or is it more likely an internal bug?
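If the slowdown does come from reading the model over the network rather than from Triton itself, one workaround worth trying is to sync the bucket to a local directory at startup and mount that, combining the two commands above. This is only a sketch under assumptions: the endpoint URL, bucket name, and local path are placeholders, and it assumes the bucket already has a valid Triton model-repository layout. The AWS CLI's global `--endpoint-url` option points `s3 sync` at MinIO instead of AWS S3.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholders -- substitute your own values.
MINIO_ENDPOINT="http://minio.example.com:9000"   # assumption: MinIO endpoint
BUCKET="s3://my-models"                          # assumption: bucket name
LOCAL_REPO="/tmp/triton-models"

# Mirror the model repository locally; credentials come from the usual
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
aws s3 sync "$BUCKET" "$LOCAL_REPO" --endpoint-url "$MINIO_ENDPOINT"

# Then launch Triton against the local copy, as in the first command above:
# docker run ... -v "$LOCAL_REPO:/models" ... --model-repository /models
```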

Thanks all!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 14 (9 by maintainers)

Top GitHub Comments

1 reaction
CoderHam commented, Aug 3, 2021

@ioangatop I have filed a ticket and the Triton team and I will look into root causing this issue.

0 reactions
CoderHam commented, Feb 24, 2022

@ioangatop Closing due to inactivity. Will reopen if needed.

Read more comments on GitHub >

Top Results From Across the Web

Deploy large models on Amazon SageMaker using ...
DeepSpeed Inference supports large Transformer-based models with ... You can load different versions of a model on a single endpoint.
Read more >
Use Batch Transform - Amazon SageMaker - Amazon Web Services
Use a batch transform job to get inferences for an entire dataset, when you don't need a persistent endpoint, or to preprocess your...
Read more >
How to write/load machine learning model to/from S3 bucket ...
For some reasons, I'm storing the fitted model in a dictionary. The idea is to dump/load the model through joblib to/from an S3...
Read more >
Deploy BLOOM-176B and OPT-30B on Amazon SageMaker ...
In this post, we use the SageMaker large model inference container to generate and compare latency and throughput performance using these two ...
Read more >
A complete guide to AI accelerators for deep learning inference
And you can speed up inference by offloading ML model prediction ... Since the system had different types of processors (the CPU and...
Read more >
