number of batch response mismatched for models with custom handler
Context
I want to serve a model that accepts 8 ordered images per request and returns an output.
I wrote a custom handler whose preprocess method collects the images coming in with the request and forms a tensor of size [8, 3, H, W]. It then passes this tensor to the inference method, where it is used as the input to my model.
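Roughly, the preprocess looks like this (a simplified sketch, not my exact code; it assumes all 8 images arrive as separate fields of one multipart request):

```python
# Simplified sketch of the preprocess step (illustrative, not the exact handler code).
import io

import torch
from PIL import Image
from torchvision import transforms


class MyModelHandler:
    def __init__(self):
        self.transform = transforms.ToTensor()

    def preprocess(self, data):
        # TorchServe passes a list with one dict per request in the batch;
        # here the batch holds a single request whose fields are the 8 images.
        request = data[0]
        images = []
        for value in request.values():
            image = Image.open(io.BytesIO(value)).convert("RGB")
            images.append(self.transform(image))
        # Stack the 8 images into one [8, 3, H, W] tensor fed to the model.
        return torch.stack(images)
```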
I sent a single request and got the following error.
2020-10-15 06:53:50,802 [INFO ] W-9001-mymodel_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - model: mymodel, number of batch response mismatched, expect: 1, got: 8.
Trying to fix this issue, I thought that changing the batch size value would help.
I deleted my model and registered a new one with a new batch size.
curl -X DELETE http://localhost:8081/models/mymodel
curl -X POST "localhost:8081/models?model_name=mymodel&url=mymodel.mar&batch_size=8&max_batch_delay=1000&initial_workers=4"
and checked that the new configuration was applied:
[ { "modelName": "mymodel", "modelVersion": "1.0", "modelUrl": "fa.mar", "runtime": "python", "minWorkers": 4, "maxWorkers": 4, "batchSize": 8, "maxBatchDelay": 1000, "loadedAtStartup": false, "workers": [ { "id": "9012", "startTime": "2020-10-15T06:53:51.466Z", "status": "READY", "gpu": false, "memoryUsage": 998322176 },...
Everything looks good, and "batchSize" is now 8.
Next, I sent the same request again and got the same error.
2020-10-15 06:57:12,616 [INFO ] W-9008-mymodel_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - model: mymodel, number of batch response mismatched, expect: 1, got: 8.
- torchserve version: 0.2.0
- torch version: 1.6.0
- torchvision version: 0.7.0
- java version: java 14.0.2 2020-07-14
- Operating System and version: macOS, 10.15.6
Expected Behavior
I send a request with 8 images and get the output back.
Ideally, knowing my GPU capacity (= 80 images), I expect that setting batch_size to 10 (= GPU capacity / number of images per request) would work too. In other words, by sending 10 requests simultaneously, TorchServe would fully load my GPU and I would receive all 10 responses at the same time, as in the sketch below.
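What I mean by "sending 10 requests simultaneously" is something like this hypothetical client script (the endpoint and file paths are placeholders, not my actual setup):

```python
# Hypothetical client sketch: fire 10 requests concurrently, each carrying 8 images.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/predictions/mymodel"


def send_request(request_id):
    # Read the 8 ordered images for this request and attach them as multipart fields.
    files = {}
    for i in range(8):
        with open(f"request_{request_id}/img_{i}.jpg", "rb") as f:
            files[f"image_{i}"] = f.read()
    return requests.post(URL, files=files)


with ThreadPoolExecutor(max_workers=10) as pool:
    responses = list(pool.map(send_request, range(10)))

print([r.status_code for r in responses])
```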
Current Behavior
I get the error number of batch response mismatched, expect: 1, got: 8.
P.S. If I am doing it wrong and I should use another approach to handle requests with 8 ordered images, I would very much appreciate any help and advice!

@veronikayurchuk:
In TorchServe, batching happens at the server frontend: the frontend creates a batch of n (the configured model batch_size, in your case 8) out of n input requests, passes it on to the model handler, and expects n responses back from the handler. IIUC, you are building the batch of 8 images yourself inside the handler's preprocess.
In this case the TorchServe frontend creates a batch of size 1, because it receives only a single request within the configured max_batch_delay time, and it therefore expects a response of size 1. That is why you see the following error in the logs: number of batch response mismatched, expect: 1, got: 8.
There are a couple of approaches for resolving this issue:
Option 1 (recommended): Send every image in an independent request. The TorchServe frontend will create a batch from the requests received within the configured max_batch_delay time and pass it on to the handler.
Option 2: If your use-case requires sending all 8 images in a single request, then return a single list of outputs from your current handler, so that the handler produces exactly one response for that one request (see the sketch below).
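For Option 2, something along these lines in the handler's postprocess would do it (an illustrative sketch, assuming inference returns one output row per image; names are not from the actual handler):

```python
# Rough sketch of Option 2: wrap the 8 per-image results in a single outer list
# so the handler returns exactly one entry per request in the batch.
def postprocess(self, inference_output):
    # inference_output: tensor of shape [8, num_classes], one row per input image.
    per_image_results = inference_output.tolist()
    # The batch contained one request, so the response list must have length 1.
    return [per_image_results]
```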
@veronikayurchuk was your problem resolved? I faced the same issue: the number of requests is still 1, but that single request inherently carries 8 inputs. Updating the postprocess method solved it for me. Since the number of predictions returned must equal the number of requests, returning a list of lists (one outer entry per request) makes the error go away. Would love to know how you handled it. Thanks.
@harshbafna I have used gRPC with TensorFlow Serving and saw a huge difference in inference time. Here I fail to observe any speedup, regardless of the number of inputs. Is it because of the handler layer in between? Can you share your insights on benchmarks/profiling as to what is causing the delay?