number of batch response mismatched for models with custom handler
Context
I want to serve a model that accepts 8 ordered images per request and returns an output.
I wrote a custom handler whose preprocess method collects the images coming in with the request and forms a tensor of size [8, 3, H, W]. It then passes this tensor to the inference method, where it is used as the input to my model.
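Roughly, the preprocess looks like this (a simplified sketch, not my exact code; it assumes all 8 images arrive as separate fields of one multipart request):

```python
# Simplified sketch of the preprocess step (illustrative, not the exact handler code).
import io

import torch
from PIL import Image
from torchvision import transforms


class MyModelHandler:
    def __init__(self):
        self.transform = transforms.ToTensor()

    def preprocess(self, data):
        # TorchServe passes a list with one dict per request in the batch;
        # here the batch holds a single request whose fields are the 8 images.
        request = data[0]
        images = []
        for value in request.values():
            image = Image.open(io.BytesIO(value)).convert("RGB")
            images.append(self.transform(image))
        # Stack the 8 images into one [8, 3, H, W] tensor fed to the model.
        return torch.stack(images)
```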
I sent a single request and got the following error.
2020-10-15 06:53:50,802 [INFO ] W-9001-mymodel_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - model: mymodel, number of batch response mismatched, expect: 1, got: 8.
Trying to fix this issue, I thought that changing the batch size value would help.
I deleted my model and registered a new one with a new batch size.
curl -X DELETE http://localhost:8081/models/mymodel
curl -X POST "localhost:8081/models?model_name=mymodel&url=mymodel.mar&batch_size=8&max_batch_delay=1000&initial_workers=4"
and checked that the new configuration was applied:
[ { "modelName": "mymodel", "modelVersion": "1.0", "modelUrl": "fa.mar", "runtime": "python", "minWorkers": 4, "maxWorkers": 4, "batchSize": 8, "maxBatchDelay": 1000, "loadedAtStartup": false, "workers": [ { "id": "9012", "startTime": "2020-10-15T06:53:51.466Z", "status": "READY", "gpu": false, "memoryUsage": 998322176 },...
Everything looks good, and "batchSize" is now 8.
Next, I sent the same request again and got the same error.
2020-10-15 06:57:12,616 [INFO ] W-9008-mymodel_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - model: mymodel, number of batch response mismatched, expect: 1, got: 8.
- torchserve version: 0.2.0
- torch version: 1.6.0
- torchvision version: 0.7.0
- java version: java 14.0.2 2020-07-14
- Operating System and version: macOS, 10.15.6
Expected Behavior
I send a request with 8 images and get the output back.
Ideally, knowing my GPU capacity (= 80 images), I expect that setting batch_size to 10 (= GPU capacity / number of images per request) would work too. In other words, by sending 10 requests simultaneously, TorchServe would fully load my GPU and I would receive all 10 responses at the same time, as in the sketch below.
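What I mean by "sending 10 requests simultaneously" is something like this hypothetical client script (the endpoint and file paths are placeholders, not my actual setup):

```python
# Hypothetical client sketch: fire 10 requests concurrently, each carrying 8 images.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/predictions/mymodel"


def send_request(request_id):
    # Read the 8 ordered images for this request and attach them as multipart fields.
    files = {}
    for i in range(8):
        with open(f"request_{request_id}/img_{i}.jpg", "rb") as f:
            files[f"image_{i}"] = f.read()
    return requests.post(URL, files=files)


with ThreadPoolExecutor(max_workers=10) as pool:
    responses = list(pool.map(send_request, range(10)))

print([r.status_code for r in responses])
```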
Current Behavior
I get the error number of batch response mismatched, expect: 1, got: 8.
P.S. If I am doing it wrong and I should use another approach to handle requests with 8 ordered images, I would very much appreciate any help and advice!

@veronikayurchuk:
In TorchServe, batching happens at the server frontend: the frontend creates a batch of n (the configured model batch_size, in your case 8) out of n input requests, passes it on to the model handler, and expects n responses back from the handler. IIUC, you are building the batch of 8 images yourself inside the handler's preprocess.
In this case the TorchServe frontend creates a batch of size 1, because it receives only a single request within the configured max_batch_delay time, and it therefore expects a response of size 1. That is why you see the following error in the logs: number of batch response mismatched, expect: 1, got: 8.
There are a couple of approaches for resolving this issue:
Option 1 (recommended): Send every image in an independent request. The TorchServe frontend will create a batch from the requests received within the configured max_batch_delay time and pass it on to the handler.
Option 2: If your use-case requires sending all 8 images in a single request, then return a single list of outputs from your current handler, so that the handler produces exactly one response for that one request (see the sketch below).
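For Option 2, something along these lines in the handler's postprocess would do it (an illustrative sketch, assuming inference returns one output row per image; names are not from the actual handler):

```python
# Rough sketch of Option 2: wrap the 8 per-image results in a single outer list
# so the handler returns exactly one entry per request in the batch.
def postprocess(self, inference_output):
    # inference_output: tensor of shape [8, num_classes], one row per input image.
    per_image_results = inference_output.tolist()
    # The batch contained one request, so the response list must have length 1.
    return [per_image_results]
```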
@veronikayurchuk was your problem resolved? I faced the same issue: the number of requests is still 1, but that single request inherently carries 8 inputs. Updating the postprocess method solved it for me. Since the number of predictions returned must equal the number of requests, returning a list of lists (one outer entry per request) makes the error go away. Would love to know how you handled it. Thanks.
@harshbafna I have used gRPC with TensorFlow Serving and saw a huge difference in inference time. Here I fail to observe any speedup, regardless of the number of inputs. Is it because of the handler layer in between? Can you share your insights on benchmarks/profiling as to what is causing the delay?