Multi request batching
I’ve been looking through previous issues, but I could not find a satisfying answer.
I have packed the model using torch-model-archiver in Docker:
docker run --rm -it --name mar -v $(pwd)/output:/output -v \
$(pwd)/model:/model -v $(pwd)/src/:/src pytorch/torchserve:latest \
torch-model-archiver --model-name u2net --version ${MODEL_VERSION:-'1.0'} \
--model-file /src/u2net.py \
--serialized-file /model/u2net.pth --export-path /output \
--extra-files /src/unet_classes.py --handler /src/custom_handler.py
Then I run the model in Docker:
docker run --rm -it -v $(pwd)/output:/home/model-server/model-store \
-v $(pwd)/config.properties:/tmp/config.properties \
-p 8080:8080 -p 8081:8081 -p 8082:8082 pytorch/torchserve:latest \
torchserve --start --model-store model-store --ts-config /tmp/config.properties
Python executable: /usr/bin/python3
Config file: /tmp/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: u2net.mar
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 12
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/model-server/model-store
Model config: {"u2net": {"1.0": {"defaultVersion": true,"marName": "u2net.mar","minWorkers": 1,"maxWorkers": 1,"batchSize": 8,"maxBatchDelay": 250,"responseTimeout": 120}},"u2netp": {"1.0": {"defaultVersion": true,"marName": "u2netp.mar","minWorkers": 1,"maxWorkers": 1,"batchSize": 8,"maxBatchDelay": 250,"responseTimeout": 120}}}
2021-09-03 09:10:33,042 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2021-09-03 09:10:33,044 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: u2net.mar
2021-09-03 09:10:35,220 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model u2net
2021-09-03 09:10:35,220 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model u2net
2021-09-03 09:10:35,221 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model u2net loaded.
2021-09-03 09:10:35,221 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: u2net, count: 1
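The mounted config.properties itself is not shown above; a sketch that would produce the parsed "Model config" line in this log (reconstructed from that output, u2netp entry omitted for brevity) might look like:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
load_models=u2net.mar
models={\
  "u2net": {\
    "1.0": {\
        "defaultVersion": true,\
        "marName": "u2net.mar",\
        "minWorkers": 1,\
        "maxWorkers": 1,\
        "batchSize": 8,\
        "maxBatchDelay": 250,\
        "responseTimeout": 120\
    }\
  }\
}

batchSize and maxBatchDelay are the two settings that control batching: up to batchSize requests that arrive within maxBatchDelay milliseconds of each other are handed to the worker as a single batch.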
Then I call the model multiple times:
curl -X POST http://127.0.0.1:8080/predictions/u2net -T "{bike.jpg,boat.jpg,horse.jpg}"
or from Python:
import aiohttp
import asyncio
import glob

images = glob.glob('test_data/test_images/*')

async def main():
    async with aiohttp.ClientSession() as session:
        for image in images:
            async with session.post('http://localhost:8080/predictions/u2net', data=open(image, 'rb')) as resp:
                print(resp.status)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
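Note that this loop awaits each response before posting the next image, so at most one request is ever in flight from the client. A variant that submits all images concurrently, so that several requests can arrive within the same maxBatchDelay window, might look like this (a sketch; the predict helper is mine, the endpoint and paths are the same as above):

import asyncio
import glob

import aiohttp

images = glob.glob('test_data/test_images/*')


async def predict(session, path):
    # one POST per image; the file stays open for the duration of the upload
    with open(path, 'rb') as f:
        async with session.post('http://localhost:8080/predictions/u2net', data=f) as resp:
            print(path, resp.status)


async def main():
    async with aiohttp.ClientSession() as session:
        # fire all requests at once instead of awaiting each response
        # before sending the next one
        await asyncio.gather(*(predict(session, image) for image in images))


loop = asyncio.get_event_loop()
loop.run_until_complete(main())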
With the original sequential loop, however, the TorchServe log shows that the requests are processed one by one:
2021-09-03 09:11:58,652 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-09-03 09:11:58,652 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-09-03 09:11:58,652 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (3000, 2000)
2021-09-03 09:11:58,697 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:903.98|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:076a76ee-d493-4b47-9253-f3e81335ae91,timestamp:1630660318
2021-09-03 09:11:58,700 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:904.05|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:076a76ee-d493-4b47-9253-f3e81335ae91,timestamp:1630660318
2021-09-03 09:11:58,703 [INFO ] W-9000-u2net_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 925
2021-09-03 09:11:58,707 [INFO ] W-9000-u2net_1.0 ACCESS_LOG - /172.17.0.1:53032 "POST /predictions/u2net HTTP/1.1" 200 1182
2021-09-03 09:11:58,707 [INFO ] W-9000-u2net_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:58,708 [DEBUG] W-9000-u2net_1.0 org.pytorch.serve.job.Job - Waiting time ns: 250420767, Backend time ns: 932032627
2021-09-03 09:11:58,708 [INFO ] W-9000-u2net_1.0 TS_METRICS - QueueTime.ms:250|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:58,708 [INFO ] W-9000-u2net_1.0 TS_METRICS - WorkerThreadTime.ms:7|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:59,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-09-03 09:11:59,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-09-03 09:11:59,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (1280, 720)
2021-09-03 09:11:59,550 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:574.61|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:9b5fdc78-df93-4c73-9eff-ae2bf4e3142f,timestamp:1630660319
2021-09-03 09:11:59,550 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:574.66|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:9b5fdc78-df93-4c73-9eff-ae2bf4e3142f,timestamp:1630660319
2021-09-03 09:11:59,551 [INFO ] W-9000-u2net_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 578
2021-09-03 09:11:59,551 [INFO ] W-9000-u2net_1.0 ACCESS_LOG - /172.17.0.1:53036 "POST /predictions/u2net HTTP/1.1" 200 832
2021-09-03 09:11:59,552 [INFO ] W-9000-u2net_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:59,552 [DEBUG] W-9000-u2net_1.0 org.pytorch.serve.job.Job - Waiting time ns: 250759189, Backend time ns: 581669301
2021-09-03 09:11:59,552 [INFO ] W-9000-u2net_1.0 TS_METRICS - QueueTime.ms:250|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:59,552 [INFO ] W-9000-u2net_1.0 TS_METRICS - WorkerThreadTime.ms:4|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:12:00,421 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-09-03 09:12:00,421 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-09-03 09:12:00,421 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (770, 595)
2021-09-03 09:12:00,425 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:608.38|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:e36697cd-22a4-4074-9d68-b93927f7ef45,timestamp:1630660320
2021-09-03 09:12:00,425 [INFO ] W-9000-u2net_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 610
Context
We would like to batch multiple requests and run inference just once for the whole batch.
- torchserve version: 0.4.2
- torch-model-archiver version: 0.4.2
- torch version:
- torchvision version [if any]:
- torchtext version [if any]:
- torchaudio version [if any]:
- java version:
- Operating System and version:
Your Environment
There is a full repository to reproduce the issue: https://github.com/Biano-AI/TorchServe-u2net-handler
- Installed using source? [yes/no]: no
- Are you planning to deploy it using docker container? [yes/no]: yes
- Is it a CPU or GPU environment?: both
- Using a default/custom handler? [If possible upload/share custom handler/model]: custom handler
- What kind of model is it e.g. vision, text, audio?: vision
- Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]:
- Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs:
- Link to your project [if any]: https://github.com/Biano-AI/TorchServe-u2net-handler/
Custom handler
import base64
import io
import logging
import time

import numpy as np
import torch
from PIL import Image
from torchvision.transforms import Compose, Normalize, Resize, ToTensor
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class U2NetHandler(BaseHandler):

    def preprocess(self, data):
        """
        Scales, crops, and normalizes the PIL images for a PyTorch model,
        returns a batched tensor
        """
        normalize = Compose([
            Resize((320, 320)),
            ToTensor(),
            Normalize(mean=[0.485, 0.456, 0.406],
                      std=[0.229, 0.224, 0.225])
        ])
        return torch.stack([normalize(im) for im in data])

    def _get_mask_bytes(self, img, mask):
        logger.info(img.size)
        return Image.fromarray(mask).resize(img.size, Image.BILINEAR).tobytes()

    def postprocess(self, images, output):
        pred = output[0][:, 0, :, :]
        predict = self._normPRED(pred)
        predict_np = predict.cpu().detach().numpy()
        logger.info(f'predict_np shape {predict_np.shape}')
        res = []
        for i, im in enumerate(images):
            logger.info(f'postprocessing image {i}')
            mask = (predict_np[i] * 255).astype(np.uint8)
            res.append(self._get_mask_bytes(im, mask))
        return res

    # normalize the predicted SOD probability map
    # from the official U^2-Net repo
    def _normPRED(self, d):
        ma = torch.max(d)
        mi = torch.min(d)
        dn = (d - mi) / (ma - mi)
        return dn

    def load_images(self, data):
        images = []
        for row in data:
            # each row is one request in the batch
            image = row.get("data") or row.get("body")
            if isinstance(image, str):
                image = base64.b64decode(image)
            image = Image.open(io.BytesIO(image))
            images.append(image)
        return images

    def handle(self, data, context):
        start_time = time.time()
        self.context = context
        metrics = self.context.metrics
        images = self.load_images(data)
        data_preprocess = self.preprocess(images)
        if not self._is_explain():
            output = self.inference(data_preprocess)
            output = self.postprocess(images, output)
        else:
            output = self.explain_handle(data_preprocess, data)
        stop_time = time.time()
        metrics.add_time('HandlerTime', round((stop_time - start_time) * 1000, 2), None, 'ms')
        return output
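For context on what the handler receives when batching does kick in (an illustration of TorchServe's batching contract, not code from the issue): data is a list with one dict per request in the batch, each payload under the "data" or "body" key, and handle must return a list of the same length, in the same order. For a batch of three requests:

# Hypothetical shape of `data` for a batch of three requests;
# the file names are placeholders, not values from the issue.
data = [
    {"body": b"<bytes of bike.jpg>"},
    {"body": b"<bytes of boat.jpg>"},
    {"body": b"<bytes of horse.jpg>"},
]
# handle(data, context) must return a list of exactly three
# responses, one per request, in the same order -- here, the raw
# mask bytes produced by postprocess() above.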
Expected Behavior
I understand from the documentation that TorchServe should be able to aggregate multiple requests and call the model just once. If not, my apologies…
Thanks
Top GitHub Comments
This should really be added to the batch inference documentation, as the example there only shows how to run a single image. I was pretty confused until I stumbled on this issue.
https://pytorch.org/serve/batch_inference_with_ts.html
“”" Run inference to test the model.
$ curl http://localhost:8080/predictions/resnet-152-batch_v2 -T kitten.jpg { “tiger_cat”: 0.5848360657691956, “tabby”: 0.3782736361026764, “Egyptian_cat”: 0.03441936895251274, “lynx”: 0.0005633446853607893, “quilt”: 0.0002698268508538604 } “”"
@msaroufim you are absolutely right… I was sending requests using this curl all the time, and that was the issue: with -T "{bike.jpg,boat.jpg,horse.jpg}", curl uploads the files one after another over a single connection, so only one request is ever in flight. When the requests are sent concurrently instead, batching works! So thank you very much! It was quite a surprise for me…