Make KFServing support batch inference
See original GitHub issue
/kind feature
Describe the solution you’d like
Make KFServing support batch inference.
It could take the following form:
BatchPredict POST /v1/models/<model_name>:batch_predict
Request: {instances_path, predictions_path}
Response: {}
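For illustration, a client call against the proposed endpoint might look like the sketch below. The host, model name, and storage paths are hypothetical placeholders; this API is only a proposal and does not exist in KFServing:

import requests

# Hypothetical call against the proposed batch-predict endpoint.
# The host, model name ("flowers"), and GCS paths are placeholders.
resp = requests.post(
    "http://flowers.default.example.com/v1/models/flowers:batch_predict",
    json={
        "instances_path": "gs://my-bucket/batch/instances.jsonl",  # input records
        "predictions_path": "gs://my-bucket/batch/predictions/",   # output location
    },
)
resp.raise_for_status()
print(resp.json())  # per the proposal, the response body is empty: {}

Presumably the server would write results to predictions_path rather than return them inline, which is why the proposed response body is empty.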
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 11
- Comments: 11 (2 by maintainers)
Top Results From Across the Web

Inference Batcher - KServe Documentation Website
This batcher is implemented in the KServe model agent sidecar, so the requests first hit the agent sidecar, when a batch prediction is...

How to make an ML model inference on KFServing from ...
It abstracts different ML frameworks such as TensorFlow, PyTorch, and XGBoost. It supports auto scaling, scale to zero, canary rollouts, GPUs, ...

Simplifying and Scaling Inference Serving with NVIDIA ...
Triton supports real-time, batch, and streaming inference queries for ... Serverless inferencing in Kubernetes with Triton and KFServing.

Overview
KFServing includes support for NVIDIA Triton Inference Server. BentoML. BentoML is an open-source platform for high-performance ML model serving ...

Amazon SageMaker Autopilot now supports batch ...
Starting today, you can select any of the SageMaker Autopilot models and proceed with batch inference within SageMaker Studio. To perform batch ...
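A note on the first result above: the KServe inference batcher is enabled on the InferenceService spec rather than through a new endpoint. Below is a minimal sketch of creating such a service with the kubernetes Python client; the batcher fields maxBatchSize and maxLatency follow the KServe batcher documentation, while the service name and storageUri are hypothetical placeholders:

from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig with cluster access

# KServe v1beta1 InferenceService with the batcher enabled on the predictor.
# "sklearn-batch-demo" and the storageUri are illustrative only.
isvc = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-batch-demo", "namespace": "default"},
    "spec": {
        "predictor": {
            "batcher": {
                "maxBatchSize": 32,  # flush once 32 requests are queued...
                "maxLatency": 5000,  # ...or after 5000 ms, whichever comes first
            },
            "sklearn": {
                "storageUri": "gs://my-bucket/models/sklearn/demo",
            },
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    body=isvc,
)

With this in place, individual :predict requests are grouped by the agent sidecar before reaching the model, which covers online micro-batching but not the offline, storage-to-storage job this issue asks for.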
Top GitHub Comments
I’m also looking to do batched inference with Kubeflow, so +1 for this feature, and thanks @yuzisun for those resources. Our use case is running inference over very large sets (tens to hundreds of millions) of satellite images. On-demand or event triggering would also be helpful for us, as the frequency can vary quite a bit by project (likely between once a week and once every three months).
Yes, it’s a bit similar, but this should be more efficient. SageMaker Batch Transform uses regular inference endpoints to do batch inference with HTTP as the transfer protocol, which is not efficient for big datasets.