
Huge inference speed difference when loading a model from S3

See original GitHub issue

Hi all! I found a huge difference in RPS (requests per second) when loading a model from a MinIO S3 bucket.

Specifically, when mounting the model from a local path I get 40 RPS with the following command:

docker run --rm -t -i --gpus "0" \
    -p "$TRITON_HTTP_PORT:$TRITON_HTTP_PORT" \
    -p "$TRITON_gRPC_PORT:$TRITON_gRPC_PORT" \
    -p "$TRITON_METRICS_PORT:$TRITON_METRICS_PORT" \
    -v "$TRITON_MODEL_DIR:/models" \
    -e CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES \
    $TRITON_DOCKER_IMAGE tritonserver \
        --http-port "$TRITON_HTTP_PORT" \
        --grpc-port "$TRITON_gRPC_PORT" \
        --metrics-port "$TRITON_METRICS_PORT" \
        --model-repository "/models" \
        --log-verbose "0"

But when I point the --model-repository flag directly at my MinIO instance I get only 4 RPS, using the following:

docker run --rm -t -i --gpus "0" \
    -p "$TRITON_HTTP_PORT:$TRITON_HTTP_PORT" \
    -p "$TRITON_gRPC_PORT:$TRITON_gRPC_PORT" \
    -p "$TRITON_METRICS_PORT:$TRITON_METRICS_PORT" \
    -e CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES \
    -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
    $TRITON_DOCKER_IMAGE tritonserver \
        --http-port "$TRITON_HTTP_PORT" \
        --grpc-port "$TRITON_gRPC_PORT" \
        --metrics-port "$TRITON_METRICS_PORT" \
        --model-repository "$TRITON_MINIO_BUCKET" \
        --log-verbose "0"
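For context, the RPS figures above can be reproduced with a rough sequential benchmark like the sketch below. This is a hedged example: `run_bench`, the payload file, and the model-name variable are placeholders of mine, not taken from the issue, and the commented-out `curl` line assumes Triton's standard KServe-style HTTP inference endpoint.

```shell
#!/usr/bin/env bash

# Time n sequential runs of a command and report requests per second.
run_bench() {
  local n=$1; shift
  local start end
  start=$(date +%s.%N)
  for ((i = 0; i < n; i++)); do
    "$@" > /dev/null
  done
  end=$(date +%s.%N)
  awk -v n="$n" -v s="$start" -v e="$end" \
    'BEGIN { printf "%.1f RPS\n", n / (e - s) }'
}

# Example (placeholders): 100 sequential inference requests against Triton.
# run_bench 100 curl -s -X POST \
#   "http://localhost:${TRITON_HTTP_PORT}/v2/models/${MODEL_NAME}/infer" \
#   -d @payload.json

run_bench 10 true   # dry run with a no-op command
```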

Could this be resolved with a special configuration, or is it more likely an internal bug?
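If the slowdown does come from reading the model over the network rather than from Triton itself, one workaround worth trying is to sync the bucket to a local directory at startup and mount that, combining the two commands above. This is only a sketch under assumptions: the endpoint URL, bucket name, and local path are placeholders, and it assumes the bucket already has a valid Triton model-repository layout. The AWS CLI's global `--endpoint-url` option points `s3 sync` at MinIO instead of AWS S3.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholders -- substitute your own values.
MINIO_ENDPOINT="http://minio.example.com:9000"   # assumption: MinIO endpoint
BUCKET="s3://my-models"                          # assumption: bucket name
LOCAL_REPO="/tmp/triton-models"

# Mirror the model repository locally; credentials come from the usual
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
aws s3 sync "$BUCKET" "$LOCAL_REPO" --endpoint-url "$MINIO_ENDPOINT"

# Then launch Triton against the local copy, as in the first command above:
# docker run ... -v "$LOCAL_REPO:/models" ... --model-repository /models
```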

Thanks all!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 14 (9 by maintainers)

Top GitHub Comments

1 reaction
CoderHam commented, Aug 3, 2021

@ioangatop I have filed a ticket and the Triton team and I will look into root causing this issue.

0 reactions
CoderHam commented, Feb 24, 2022

@ioangatop Closing due to inactivity. Will reopen if needed.

Read more comments on GitHub >

Top Results From Across the Web

Deploy large models on Amazon SageMaker using ...
DeepSpeed Inference supports large Transformer-based models with ... You can load different versions of a model on a single endpoint.
Read more >
Use Batch Transform - Amazon SageMaker - Amazon Web Services
Use a batch transform job to get inferences for an entire dataset, when you don't need a persistent endpoint, or to preprocess your...
Read more >
How to write/load machine learning model to/from S3 bucket ...
For some reasons, I'm storing the fitted model in a dictionary. The idea is to dump/load the model through joblib to/from an S3...
Read more >
Deploy BLOOM-176B and OPT-30B on Amazon SageMaker ...
In this post, we use the SageMaker large model inference container to generate and compare latency and throughput performance using these two ...
Read more >
A complete guide to AI accelerators for deep learning inference
And you can speed up inference by offloading ML model prediction ... Since the system had different types of processors (the CPU and...
Read more >
