Multi request batching
I’ve been looking through previous issues, but I could not find a satisfying answer.
I have packed the model using torch-model-archiver in Docker:
docker run --rm -it --name mar -v $(pwd)/output:/output -v \
$(pwd)/model:/model -v $(pwd)/src/:/src pytorch/torchserve:latest \
torch-model-archiver --model-name u2net --version ${MODEL_VERSION:-'1.0'} \
--model-file /src/u2net.py \
--serialized-file /model/u2net.pth --export-path /output \
--extra-files /src/unet_classes.py --handler /src/custom_handler.py
Then I run the model in Docker:
docker run --rm -it -v $(pwd)/output:/home/model-server/model-store \
-v $(pwd)/config.properties:/tmp/config.properties \
-p 8080:8080 -p 8081:8081 -p 8082:8082 pytorch/torchserve:latest \
torchserve --start --model-store model-store --ts-config /tmp/config.properties
Python executable: /usr/bin/python3
Config file: /tmp/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: u2net.mar
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 12
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/model-server/model-store
Model config: {"u2net": {"1.0": {"defaultVersion": true,"marName": "u2net.mar","minWorkers": 1,"maxWorkers": 1,"batchSize": 8,"maxBatchDelay": 250,"responseTimeout": 120}},"u2netp": {"1.0": {"defaultVersion": true,"marName": "u2netp.mar","minWorkers": 1,"maxWorkers": 1,"batchSize": 8,"maxBatchDelay": 250,"responseTimeout": 120}}}
2021-09-03 09:10:33,042 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2021-09-03 09:10:33,044 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: u2net.mar
2021-09-03 09:10:35,220 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model u2net
2021-09-03 09:10:35,220 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model u2net
2021-09-03 09:10:35,221 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model u2net loaded.
2021-09-03 09:10:35,221 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: u2net, count: 1
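The mounted config.properties itself is not shown above; a sketch that would produce the parsed "Model config" line in this log (reconstructed from that output, u2netp entry omitted for brevity) might look like:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
load_models=u2net.mar
models={\
  "u2net": {\
    "1.0": {\
        "defaultVersion": true,\
        "marName": "u2net.mar",\
        "minWorkers": 1,\
        "maxWorkers": 1,\
        "batchSize": 8,\
        "maxBatchDelay": 250,\
        "responseTimeout": 120\
    }\
  }\
}

batchSize and maxBatchDelay are the two settings that control batching: up to batchSize requests that arrive within maxBatchDelay milliseconds of each other are handed to the worker as a single batch.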
Then I call the model multiple times:
curl -X POST http://127.0.0.1:8080/predictions/u2net -T "{bike.jpg,boat.jpg,horse.jpg}"
or from Python:
import aiohttp
import asyncio
import glob

images = glob.glob('test_data/test_images/*')

async def main():
    async with aiohttp.ClientSession() as session:
        for image in images:
            async with session.post('http://localhost:8080/predictions/u2net', data=open(image, 'rb')) as resp:
                print(resp.status)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
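Note that this loop awaits each response before posting the next image, so at most one request is ever in flight from the client. A variant that submits all images concurrently, so that several requests can arrive within the same maxBatchDelay window, might look like this (a sketch; the predict helper is mine, the endpoint and paths are the same as above):

import asyncio
import glob

import aiohttp

images = glob.glob('test_data/test_images/*')


async def predict(session, path):
    # one POST per image; the file stays open for the duration of the upload
    with open(path, 'rb') as f:
        async with session.post('http://localhost:8080/predictions/u2net', data=f) as resp:
            print(path, resp.status)


async def main():
    async with aiohttp.ClientSession() as session:
        # fire all requests at once instead of awaiting each response
        # before sending the next one
        await asyncio.gather(*(predict(session, image) for image in images))


loop = asyncio.get_event_loop()
loop.run_until_complete(main())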
With the original sequential loop, however, the TorchServe log shows that the requests are processed one by one:
2021-09-03 09:11:58,652 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-09-03 09:11:58,652 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-09-03 09:11:58,652 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (3000, 2000)
2021-09-03 09:11:58,697 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:903.98|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:076a76ee-d493-4b47-9253-f3e81335ae91,timestamp:1630660318
2021-09-03 09:11:58,700 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:904.05|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:076a76ee-d493-4b47-9253-f3e81335ae91,timestamp:1630660318
2021-09-03 09:11:58,703 [INFO ] W-9000-u2net_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 925
2021-09-03 09:11:58,707 [INFO ] W-9000-u2net_1.0 ACCESS_LOG - /172.17.0.1:53032 "POST /predictions/u2net HTTP/1.1" 200 1182
2021-09-03 09:11:58,707 [INFO ] W-9000-u2net_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:58,708 [DEBUG] W-9000-u2net_1.0 org.pytorch.serve.job.Job - Waiting time ns: 250420767, Backend time ns: 932032627
2021-09-03 09:11:58,708 [INFO ] W-9000-u2net_1.0 TS_METRICS - QueueTime.ms:250|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:58,708 [INFO ] W-9000-u2net_1.0 TS_METRICS - WorkerThreadTime.ms:7|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:59,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-09-03 09:11:59,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-09-03 09:11:59,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (1280, 720)
2021-09-03 09:11:59,550 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:574.61|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:9b5fdc78-df93-4c73-9eff-ae2bf4e3142f,timestamp:1630660319
2021-09-03 09:11:59,550 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:574.66|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:9b5fdc78-df93-4c73-9eff-ae2bf4e3142f,timestamp:1630660319
2021-09-03 09:11:59,551 [INFO ] W-9000-u2net_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 578
2021-09-03 09:11:59,551 [INFO ] W-9000-u2net_1.0 ACCESS_LOG - /172.17.0.1:53036 "POST /predictions/u2net HTTP/1.1" 200 832
2021-09-03 09:11:59,552 [INFO ] W-9000-u2net_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:59,552 [DEBUG] W-9000-u2net_1.0 org.pytorch.serve.job.Job - Waiting time ns: 250759189, Backend time ns: 581669301
2021-09-03 09:11:59,552 [INFO ] W-9000-u2net_1.0 TS_METRICS - QueueTime.ms:250|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:59,552 [INFO ] W-9000-u2net_1.0 TS_METRICS - WorkerThreadTime.ms:4|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:12:00,421 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-09-03 09:12:00,421 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-09-03 09:12:00,421 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (770, 595)
2021-09-03 09:12:00,425 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:608.38|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:e36697cd-22a4-4074-9d68-b93927f7ef45,timestamp:1630660320
2021-09-03 09:12:00,425 [INFO ] W-9000-u2net_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 610
Context
We would like to batch multiple requests and run inference just once for the whole batch.
- torchserve version: 0.4.2
- torch-model-archiver version: 0.4.2
- torch version:
- torchvision version [if any]:
- torchtext version [if any]:
- torchaudio version [if any]:
- java version:
- Operating System and version:
Your Environment
There is a full repository to reproduce the issue: https://github.com/Biano-AI/TorchServe-u2net-handler
- Installed using source? [yes/no]: no
- Are you planning to deploy it using docker container? [yes/no]: yes
- Is it a CPU or GPU environment?: both
- Using a default/custom handler? [If possible upload/share custom handler/model]: custom handler
- What kind of model is it e.g. vision, text, audio?: vision
- Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]:
- Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs:
- Link to your project [if any]: https://github.com/Biano-AI/TorchServe-u2net-handler/
Custom handler
import base64
import io
import logging
import time

import numpy as np
import torch
from PIL import Image
from torchvision.transforms import Compose, Normalize, Resize, ToTensor
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class U2NetHandler(BaseHandler):

    def preprocess(self, data):
        """
        Scales, crops, and normalizes the PIL images for a PyTorch model,
        returns a batched tensor
        """
        normalize = Compose([
            Resize((320, 320)),
            ToTensor(),
            Normalize(mean=[0.485, 0.456, 0.406],
                      std=[0.229, 0.224, 0.225])
        ])
        return torch.stack([normalize(im) for im in data])

    def _get_mask_bytes(self, img, mask):
        logger.info(img.size)
        return Image.fromarray(mask).resize(img.size, Image.BILINEAR).tobytes()

    def postprocess(self, images, output):
        pred = output[0][:, 0, :, :]
        predict = self._normPRED(pred)
        predict_np = predict.cpu().detach().numpy()
        logger.info(f'predict_np shape {predict_np.shape}')
        res = []
        for i, im in enumerate(images):
            logger.info(f'postprocessing image {i}')
            mask = (predict_np[i] * 255).astype(np.uint8)
            res.append(self._get_mask_bytes(im, mask))
        return res

    # normalize the predicted SOD probability map
    # from the official U^2-Net repo
    def _normPRED(self, d):
        ma = torch.max(d)
        mi = torch.min(d)
        dn = (d - mi) / (ma - mi)
        return dn

    def load_images(self, data):
        images = []
        for row in data:
            # each row is one request in the batch
            image = row.get("data") or row.get("body")
            if isinstance(image, str):
                image = base64.b64decode(image)
            image = Image.open(io.BytesIO(image))
            images.append(image)
        return images

    def handle(self, data, context):
        start_time = time.time()
        self.context = context
        metrics = self.context.metrics
        images = self.load_images(data)
        data_preprocess = self.preprocess(images)
        if not self._is_explain():
            output = self.inference(data_preprocess)
            output = self.postprocess(images, output)
        else:
            output = self.explain_handle(data_preprocess, data)
        stop_time = time.time()
        metrics.add_time('HandlerTime', round((stop_time - start_time) * 1000, 2), None, 'ms')
        return output
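For context on what the handler receives when batching does kick in (an illustration of TorchServe's batching contract, not code from the issue): data is a list with one dict per request in the batch, each payload under the "data" or "body" key, and handle must return a list of the same length, in the same order. For a batch of three requests:

# Hypothetical shape of `data` for a batch of three requests;
# the file names are placeholders, not values from the issue.
data = [
    {"body": b"<bytes of bike.jpg>"},
    {"body": b"<bytes of boat.jpg>"},
    {"body": b"<bytes of horse.jpg>"},
]
# handle(data, context) must return a list of exactly three
# responses, one per request, in the same order -- here, the raw
# mask bytes produced by postprocess() above.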
Expected Behavior
I understand from the documentation that TorchServe should be able to aggregate multiple requests and call the model just once. If not, my apologies…
Thanks
Top GitHub Comments
This should really be added to the batch inference documentation, as the example there only shows how to run a single image. I was pretty confused until I stumbled on this issue.
https://pytorch.org/serve/batch_inference_with_ts.html
“”" Run inference to test the model.
$ curl http://localhost:8080/predictions/resnet-152-batch_v2 -T kitten.jpg { “tiger_cat”: 0.5848360657691956, “tabby”: 0.3782736361026764, “Egyptian_cat”: 0.03441936895251274, “lynx”: 0.0005633446853607893, “quilt”: 0.0002698268508538604 } “”"
@msaroufim you are absolutely right… I was sending requests using this curl all the time, and that was the issue: with -T "{bike.jpg,boat.jpg,horse.jpg}", curl uploads the files one after another over a single connection, so only one request is ever in flight. When the requests are sent concurrently instead, batching works! So thank you very much! It was quite a surprise for me…