
Multi request batching


I've been looking through previous issues, but I could not find a satisfying answer.

I have packaged the model using torch-model-archiver in Docker:

docker run --rm -it --name mar -v $(pwd)/output:/output -v \
$(pwd)/model:/model -v $(pwd)/src/:/src pytorch/torchserve:latest \
torch-model-archiver --model-name u2net --version ${MODEL_VERSION:-'1.0'} \
--model-file /src/u2net.py \
--serialized-file /model/u2net.pth --export-path /output \
--extra-files /src/unet_classes.py --handler /src/custom_handler.py

Then I run the model in Docker:

docker run --rm -it -v $(pwd)/output:/home/model-server/model-store \
-v $(pwd)/config.properties:/tmp/config.properties \
-p 8080:8080 -p 8081:8081 -p 8082:8082 pytorch/torchserve:latest \
torchserve --start --model-store model-store --ts-config /tmp/config.properties
Python executable: /usr/bin/python3
Config file: /tmp/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: u2net.mar
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 12
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/model-server/model-store
Model config: {"u2net": {"1.0": {"defaultVersion": true,"marName": "u2net.mar","minWorkers": 1,"maxWorkers": 1,"batchSize": 8,"maxBatchDelay": 250,"responseTimeout": 120}},"u2netp": {"1.0": {"defaultVersion": true,"marName": "u2netp.mar","minWorkers": 1,"maxWorkers": 1,"batchSize": 8,"maxBatchDelay": 250,"responseTimeout": 120}}}
2021-09-03 09:10:33,042 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2021-09-03 09:10:33,044 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: u2net.mar
2021-09-03 09:10:35,220 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model u2net
2021-09-03 09:10:35,220 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model u2net
2021-09-03 09:10:35,221 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model u2net loaded.
2021-09-03 09:10:35,221 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: u2net, count: 1
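
For reference, the Model config line above is what TorchServe prints when per-model settings are supplied through the models property in config.properties. A minimal sketch of the batching-relevant part (the surrounding properties are assumptions reconstructed from the startup log; batchSize and maxBatchDelay are what control batching):

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/home/model-server/model-store
load_models=u2net.mar
models={"u2net": {"1.0": {"defaultVersion": true,"marName": "u2net.mar","minWorkers": 1,"maxWorkers": 1,"batchSize": 8,"maxBatchDelay": 250,"responseTimeout": 120}}}

With batchSize 8 and maxBatchDelay 250, the frontend waits up to 250 ms for up to 8 requests before handing a batch to a worker; a lone request simply sits out the delay and is processed alone, which matches the QueueTime.ms:250 entries in the logs below.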

Then I call the model multiple times:

curl -X POST http://127.0.0.1:8080/predictions/u2net -T "{bike.jpg,boat.jpg,horse.jpg}"

or from Python:

import aiohttp
import asyncio
import glob

images = glob.glob('test_data/test_images/*')

async def main():
    async with aiohttp.ClientSession() as session:
        for image in images:
            # each request is awaited before the next one is sent,
            # so the server only ever sees one request at a time
            with open(image, 'rb') as f:
                async with session.post('http://localhost:8080/predictions/u2net', data=f) as resp:
                    print(resp.status)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

but in the TorchServe log I can see that the requests are processed sequentially:

2021-09-03 09:11:58,652 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-09-03 09:11:58,652 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-09-03 09:11:58,652 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (3000, 2000)
2021-09-03 09:11:58,697 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:903.98|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:076a76ee-d493-4b47-9253-f3e81335ae91,timestamp:1630660318
2021-09-03 09:11:58,700 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:904.05|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:076a76ee-d493-4b47-9253-f3e81335ae91,timestamp:1630660318
2021-09-03 09:11:58,703 [INFO ] W-9000-u2net_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 925
2021-09-03 09:11:58,707 [INFO ] W-9000-u2net_1.0 ACCESS_LOG - /172.17.0.1:53032 "POST /predictions/u2net HTTP/1.1" 200 1182
2021-09-03 09:11:58,707 [INFO ] W-9000-u2net_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:58,708 [DEBUG] W-9000-u2net_1.0 org.pytorch.serve.job.Job - Waiting time ns: 250420767, Backend time ns: 932032627
2021-09-03 09:11:58,708 [INFO ] W-9000-u2net_1.0 TS_METRICS - QueueTime.ms:250|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:58,708 [INFO ] W-9000-u2net_1.0 TS_METRICS - WorkerThreadTime.ms:7|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:59,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-09-03 09:11:59,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-09-03 09:11:59,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (1280, 720)
2021-09-03 09:11:59,550 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:574.61|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:9b5fdc78-df93-4c73-9eff-ae2bf4e3142f,timestamp:1630660319
2021-09-03 09:11:59,550 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:574.66|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:9b5fdc78-df93-4c73-9eff-ae2bf4e3142f,timestamp:1630660319
2021-09-03 09:11:59,551 [INFO ] W-9000-u2net_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 578
2021-09-03 09:11:59,551 [INFO ] W-9000-u2net_1.0 ACCESS_LOG - /172.17.0.1:53036 "POST /predictions/u2net HTTP/1.1" 200 832
2021-09-03 09:11:59,552 [INFO ] W-9000-u2net_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:59,552 [DEBUG] W-9000-u2net_1.0 org.pytorch.serve.job.Job - Waiting time ns: 250759189, Backend time ns: 581669301
2021-09-03 09:11:59,552 [INFO ] W-9000-u2net_1.0 TS_METRICS - QueueTime.ms:250|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:11:59,552 [INFO ] W-9000-u2net_1.0 TS_METRICS - WorkerThreadTime.ms:4|#Level:Host|#hostname:52033c12a7e8,timestamp:null
2021-09-03 09:12:00,421 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-09-03 09:12:00,421 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-09-03 09:12:00,421 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (770, 595)
2021-09-03 09:12:00,425 [INFO ] W-9000-u2net_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:608.38|#ModelName:u2net,Level:Model|#hostname:52033c12a7e8,requestID:e36697cd-22a4-4074-9d68-b93927f7ef45,timestamp:1630660320
2021-09-03 09:12:00,425 [INFO ] W-9000-u2net_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 610

Context

We would like to batch multiple requests so that inference runs just once for several requests.
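
Batching can also be configured per model at registration time through the management API instead of config.properties; a sketch using the same values (host and port assumed from the setup above):

curl -X POST "http://127.0.0.1:8081/models?url=u2net.mar&batch_size=8&max_batch_delay=250&initial_workers=1"

Either way, a batch is only formed when several requests are queued within the same max_batch_delay window.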

  • torchserve version: 0.4.2
  • torch-model-archiver version: 0.4.2
  • torch version:
  • torchvision version [if any]:
  • torchtext version [if any]:
  • torchaudio version [if any]:
  • java version:
  • Operating System and version:

Your Environment

There is a full repository to reproduce the issue: https://github.com/Biano-AI/TorchServe-u2net-handler

  • Installed using source? [yes/no]: no
  • Are you planning to deploy it using docker container? [yes/no]: yes
  • Is it a CPU or GPU environment?: both
  • Using a default/custom handler? [If possible upload/share custom handler/model]: custom handler
  • What kind of model is it e.g. vision, text, audio?: vision
  • Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]:
  • Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs:
  • Link to your project [if any]: https://github.com/Biano-AI/TorchServe-u2net-handler/

Custom handler


# imports omitted in the original excerpt
import base64
import io
import logging
import time

import numpy as np
import torch
from PIL import Image
from torchvision.transforms import Compose, Normalize, Resize, ToTensor

from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class U2NetHandler(BaseHandler):

    def preprocess(self, data):
        """
         Scales, crops, and normalizes a PIL image for a PyTorch model,
         returns an Numpy array
        """
        normalize = Compose([
            Resize((320, 320)),
            ToTensor(),
            Normalize(mean=[0.485, 0.456, 0.406],
                      std=[0.229, 0.224, 0.225])
        ])
        return torch.stack([normalize(im) for im in data])

    def _get_mask_bytes(self, img, mask):
        logger.info(img.size)
        return Image.fromarray(mask).resize(img.size, Image.BILINEAR).tobytes()

    def postprocess(self, images, output):
        pred = output[0][:, 0, :, :]
        predict = self._normPRED(pred)
        predict_np = predict.cpu().detach().numpy()
        logger.info(f'predict_np shape {predict_np.shape}')
        res = []
        # enumerate the batch so each image is paired with its own mask
        for i, im in enumerate(images):
            logger.info(f'postprocessing image {i}')
            mask = (predict_np[i] * 255).astype(np.uint8)
            res.append(self._get_mask_bytes(im, mask))
        return res

    # normalize the predicted SOD probability map
    # from oficial U^2-Net repo
    def _normPRED(self, d):
        ma = torch.max(d)
        mi = torch.min(d)
        dn = (d - mi) / (ma - mi)
        return dn

    def load_images(self, data):
        images = []
        for row in data:
            image = row.get("data") or row.get("body")
            if isinstance(image, str):
                image = base64.b64decode(image)
            image = Image.open(io.BytesIO(image))
            images.append(image)
        return images

    def handle(self, data, context):
        start_time = time.time()

        self.context = context
        metrics = self.context.metrics

        images = self.load_images(data)
        data_preprocess = self.preprocess(images)

        if not self._is_explain():
            output = self.inference(data_preprocess)
            output = self.postprocess(images, output)
        else:
            output = self.explain_handle(data_preprocess, data)

        stop_time = time.time()
        metrics.add_time('HandlerTime', round((stop_time - start_time) * 1000, 2), None, 'ms')
        return output

Expected Behavior

I understand from the documentation that TorchServe should be able to aggregate multiple requests and call the model just once. If not, I apologize…
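
For context on what that aggregation looks like from the handler's side: when a batch is formed, handle(data, context) receives one entry per batched request and must return one result per entry, in the same order. A minimal sketch mirroring the handler above (illustrative, not taken from the repository):

    def handle(self, data, context):
        # data is a list with one {"data"/"body": bytes} dict per batched request
        images = self.load_images(data)          # len(images) == len(data)
        batch = self.preprocess(images)          # tensor of shape (len(data), 3, 320, 320)
        output = self.inference(batch)           # a single forward pass for the whole batch
        return self.postprocess(images, output)  # list of len(data) results, in request order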

Thanks

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 14 (4 by maintainers)

Top GitHub Comments

1 reaction
Vert53 commented, Aug 18, 2022

This should really be added to the batch inference documentation, as the example there only shows how to run inference on a single image. I was pretty confused until I stumbled on this issue.

https://pytorch.org/serve/batch_inference_with_ts.html

“”" Run inference to test the model.

$ curl http://localhost:8080/predictions/resnet-152-batch_v2 -T kitten.jpg { “tiger_cat”: 0.5848360657691956, “tabby”: 0.3782736361026764, “Egyptian_cat”: 0.03441936895251274, “lynx”: 0.0005633446853607893, “quilt”: 0.0002698268508538604 } “”"

1 reaction
toretak commented, Oct 8, 2021

@msaroufim you are absolutely right… I was sending requests using this curl command the whole time:

curl -X POST http://127.0.0.1:8080/predictions/u2net -T "{bike.jpg,boat.jpg,horse.jpg}"

and that was the issue

When requests are sent like this instead:

curl -X POST http://127.0.0.1:8080/predictions/u2net -T "{bike.jpg}" & curl -X POST http://127.0.0.1:8080/predictions/u2net -T "{boat.jpg}"

batching works!

...
2021-10-08 07:29:12,653 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2021-10-08 07:29:12,674 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - model_name: u2net, batchSize: 8
2021-10-08 07:29:20,237 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === preprocess called ===
2021-10-08 07:29:20,377 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === inference in handler called ===
2021-10-08 07:29:21,382 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (2, 320, 320)
2021-10-08 07:29:21,382 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-10-08 07:29:21,382 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (1280, 720)
2021-10-08 07:29:21,388 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-10-08 07:29:21,389 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (3000, 2000)

So thank you very much! It's quite a surprise for me…
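
For completeness, the Python snippet from the question can also trigger batching if the requests are fired concurrently instead of awaited one by one (a sketch, assuming the same endpoint as above):

import asyncio
import glob

import aiohttp

async def predict(session, path):
    # Requests are sent without waiting for earlier responses, so several of
    # them can land within the same maxBatchDelay window on the server.
    with open(path, 'rb') as f:
        payload = f.read()
    async with session.post('http://localhost:8080/predictions/u2net', data=payload) as resp:
        return resp.status

async def main():
    images = glob.glob('test_data/test_images/*')
    async with aiohttp.ClientSession() as session:
        print(await asyncio.gather(*(predict(session, p) for p in images)))

asyncio.run(main())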
