
Make KFServing support batch inference

See original GitHub issue

/kind feature

Describe the solution you’d like

Make KFServing support batch inference.

It could take the following form:

BatchPredict
POST /v1/models/<model_name>:batch_predict
Request: {instances_path, predictions_path}
Response: {}
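As a rough sketch of how a client could call such an endpoint, the snippet below posts the two paths from the proposal. The :batch_predict verb and the instances_path/predictions_path fields come from the proposal above and do not exist in KFServing today; the host, model name, and storage paths are made-up placeholders.

```python
import requests

# Hypothetical host and model name; the ":batch_predict" verb and the
# instances_path/predictions_path fields follow the proposal above and
# are not an existing KFServing API.
URL = "http://kfserving.example.com/v1/models/flowers:batch_predict"

payload = {
    # Where the input instances live (e.g. a file on object storage).
    "instances_path": "gs://my-bucket/batch/instances.jsonl",
    # Where the service should write the resulting predictions.
    "predictions_path": "gs://my-bucket/batch/predictions.jsonl",
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
# Per the proposal the response body is empty ({}); the results would
# presumably be written to predictions_path rather than returned inline.
print(resp.json())
```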

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 11
  • Comments: 11 (2 by maintainers)

Top GitHub Comments

5 reactions
wronk commented, Oct 3, 2019

I’m also looking to do batched inference with Kubeflow, so +1 for this feature, and thanks @yuzisun for those resources. Our use case is running inference over very large sets (tens to hundreds of millions) of satellite images. On-demand or event triggering would also be helpful for us, as the frequency can vary quite a bit by project (likely anywhere from once a week to once every 3 months).

4 reactions
radcheb commented, Apr 15, 2020

As I understand it, the ask is similar to SageMaker batch transform?

Yes, it’s a bit similar, but it should be more efficient. SageMaker batch transform uses regular inference endpoints for batch inference, with HTTP as the transfer protocol, which is not efficient for big datasets.
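To make the efficiency point concrete, endpoint-based batching means every chunk of instances travels as its own HTTP request body, whereas the proposal above moves only two storage paths over HTTP and leaves the bulk data transfer to storage. The payload shape below follows the existing KFServing v1 :predict protocol; the host, model name, and chunk size are illustrative.

```python
import requests

# Endpoint-style batch inference: each chunk of instances is serialized
# into its own HTTP request, so a dataset of millions of images means
# millions of request/response bodies over the wire.
# Host and model name are illustrative; the ":predict" payload shape is
# the existing KFServing v1 protocol.
ENDPOINT = "http://kfserving.example.com/v1/models/flowers:predict"

def predict_in_chunks(instances, chunk_size=64):
    predictions = []
    for i in range(0, len(instances), chunk_size):
        chunk = instances[i:i + chunk_size]
        resp = requests.post(ENDPOINT, json={"instances": chunk}, timeout=60)
        resp.raise_for_status()
        predictions.extend(resp.json()["predictions"])
    return predictions
```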

Read more comments on GitHub >

Top Results From Across the Web

Inference Batcher - KServe Documentation Website
This batcher is implemented in the KServe model agent sidecar, so the requests first hit the agent sidecar, when a batch prediction is...
How to make an ML model inference on KFServing from ...
It abstracts different ML frameworks such as TensorFlow, PyTorch, and XGBoost. It supports auto scaling, scale to zero, canary rollouts, GPUs, ...
Simplifying and Scaling Inference Serving with NVIDIA ...
Triton supports real-time, batch, and streaming inference queries for ... Serverless inferencing in Kubernetes with Triton and KFServing.
Overview
KFServing includes support for NVIDIA Triton Inference Server. BentoML. BentoML is an open-source platform for high-performance ML model serving ...
Amazon SageMaker Autopilot now supports batch ...
Starting today, you can select any of the SageMaker Autopilot models and proceed with batch inference within SageMaker Studio. To perform batch ...
