Make KFServing support batch inference
See original GitHub issue
/kind feature
Describe the solution you’d like
Make KFServing support batch inference.
It could take the following form:
BatchPredict POST /v1/models/<model_name>:batch_predict
Request: {instances_path, predictions_path}
Response: {}
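For illustration, a client call against the proposed endpoint might look like the sketch below. The host, model name, and storage paths are hypothetical placeholders; this API is only a proposal and does not exist in KFServing:

import requests

# Hypothetical call against the proposed batch-predict endpoint.
# The host, model name ("flowers"), and GCS paths are placeholders.
resp = requests.post(
    "http://flowers.default.example.com/v1/models/flowers:batch_predict",
    json={
        "instances_path": "gs://my-bucket/batch/instances.jsonl",  # input records
        "predictions_path": "gs://my-bucket/batch/predictions/",   # output location
    },
)
resp.raise_for_status()
print(resp.json())  # per the proposal, the response body is empty: {}

Presumably the server would write results to predictions_path rather than return them inline, which is why the proposed response body is empty.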
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 11
- Comments: 11 (2 by maintainers)
Top Results From Across the Web

Inference Batcher - KServe Documentation Website
This batcher is implemented in the KServe model agent sidecar, so the requests first hit the agent sidecar, when a batch prediction is...

How to make an ML model inference on KFServing from ...
It abstracts different ML frameworks such as TensorFlow, PyTorch, and XGBoost. It supports auto scaling, scale to zero, canary rollouts, GPUs, ...

Simplifying and Scaling Inference Serving with NVIDIA ...
Triton supports real-time, batch, and streaming inference queries for ... Serverless inferencing in Kubernetes with Triton and KFServing.

Overview
KFServing includes support for NVIDIA Triton Inference Server. BentoML. BentoML is an open-source platform for high-performance ML model serving ...

Amazon SageMaker Autopilot now supports batch ...
Starting today, you can select any of the SageMaker Autopilot models and proceed with batch inference within SageMaker Studio. To perform batch ...
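A note on the first result above: the KServe inference batcher is enabled on the InferenceService spec rather than through a new endpoint. Below is a minimal sketch of creating such a service with the kubernetes Python client; the batcher fields maxBatchSize and maxLatency follow the KServe batcher documentation, while the service name and storageUri are hypothetical placeholders:

from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig with cluster access

# KServe v1beta1 InferenceService with the batcher enabled on the predictor.
# "sklearn-batch-demo" and the storageUri are illustrative only.
isvc = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-batch-demo", "namespace": "default"},
    "spec": {
        "predictor": {
            "batcher": {
                "maxBatchSize": 32,  # flush once 32 requests are queued...
                "maxLatency": 5000,  # ...or after 5000 ms, whichever comes first
            },
            "sklearn": {
                "storageUri": "gs://my-bucket/models/sklearn/demo",
            },
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    body=isvc,
)

With this in place, individual :predict requests are grouped by the agent sidecar before reaching the model, which covers online micro-batching but not the offline, storage-to-storage job this issue asks for.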
Top GitHub Comments
I’m also looking to do batched inference with Kubeflow, so +1 for this feature, and thanks @yuzisun for those resources. Our use case is running inference over very large sets (tens to hundreds of millions) of satellite images. On-demand or event triggering would also be helpful for us, as the frequency can vary quite a bit by project (likely between once a week and once every three months).
Yes, it’s a bit similar, but this should be more efficient. SageMaker Batch Transform uses regular inference endpoints to do batch inference with HTTP as the transfer protocol, which is not efficient for big datasets.