Batching for Large dataset

See original GitHub issue

I have looked into the documentation and previous issues, but I could not find a satisfying answer for batching large datasets.

Server

Torchserve version: 0.4.2
TS Home: /home/gunalan/miniconda3/envs/classification_pipeline/lib/python3.8/site-packages
Current directory: /home/gunalan/PycharmProjects/classification_pipeline/torch serving
Temp directory: /tmp
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 3920 M
Python executable: /home/gunalan/miniconda3/envs/classification_pipeline/bin/python
Config file: config/config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/gunalan/PycharmProjects/classification_pipeline/torch serving/model_store
Initial Models: N/A
Log dir: /home/gunalan/PycharmProjects/classification_pipeline/torch serving/logs
Metrics dir: /home/gunalan/PycharmProjects/classification_pipeline/torch serving/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 16553500
Maximum Request Size: 16553500
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/gunalan/PycharmProjects/classification_pipeline/torch serving/model_store
Model config: N/A
2021-10-04 20:53:23,857 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2021-10-04 20:53:23,899 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2021-10-04 20:53:24,008 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2021-10-04 20:53:24,009 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2021-10-04 20:53:24,010 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2021-10-04 20:53:24,010 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2021-10-04 20:53:24,011 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.

Config properties

inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
metrics_address=http://127.0.0.1:8082
model_store=/home/gunalan/PycharmProjects/classification_pipeline/torch serving/model_store
service_envelope=json
max_request_size=16553500
max_response_size=16553500

Registered model

curl -X POST "http://localhost:8081/models?model_name=classification-sample&url=classification-sample.mar&initial_workers=1&batch_size=3"
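
For reference, the same registration call can also set max_batch_delay (in milliseconds), which controls how long TorchServe waits to fill a batch before dispatching it to a worker; the value of 5000 below is only an illustrative choice, not one used in the original issue:

curl -X POST "http://localhost:8081/models?model_name=classification-sample&url=classification-sample.mar&initial_workers=1&batch_size=3&max_batch_delay=5000"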

I am using JSON as the service envelope and passing requests through the REST API. Sample input JSON request, sending 2 images in a single request for prediction:

import requests

# Inference endpoint for the registered model (see the config above)
url = "http://127.0.0.1:8080/predictions/classification-sample"

# bytes_array holds the base64-encoded images (see the sketch below)
request = {
    "instances":
        [
            {
                "b64": bytes_array[0]
            },
            {
                "b64": bytes_array[1]
            }
        ]
}

resp = requests.post(url, json=request)
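
For context, bytes_array above is assumed to hold base64-encoded image strings (the "b64" key suggests this); a minimal sketch of how it could be built, with placeholder file paths that are not part of the original issue:

import base64

# Placeholder image paths; assumed for illustration only
image_paths = ["image_0.jpg", "image_1.jpg"]

bytes_array = []
for path in image_paths:
    with open(path, "rb") as f:
        # base64-encode the raw bytes and decode to str so the value is JSON-serializable
        bytes_array.append(base64.b64encode(f.read()).decode("utf-8"))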

Solutions tried

  • I can send multiple data points in a single request, as shown above, but then request size becomes a problem. Although I can raise max_request_size in config.properties, I can't send a large dataset of images that might be several GB in one request.
  • I can send parallel requests and let TorchServe form the batches based on the batch_size and max_batch_delay specified while registering the model (please let me know if my understanding is wrong here), as in the snippet below.
from concurrent.futures import ThreadPoolExecutor

import requests


def post_request(args):
    # args is a (url, request_payload) tuple
    return requests.post(args[0], json=args[1])


# Three identical (url, request) pairs, sent concurrently
list_of_urls = [(url, request)] * 3
with ThreadPoolExecutor(max_workers=10) as pool:
    response_list = list(pool.map(post_request, list_of_urls))

Server log

2021-10-04 21:01:44,397 [INFO ] epollEventLoopGroup-3-5 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:N9596,timestamp:null
2021-10-04 21:02:13,460 [INFO ] W-9000-classification-sample_1.0-stdout MODEL_LOG - Type of data<class 'list'>
2021-10-04 21:02:14,201 [INFO ] W-9000-classification-sample_1.0-stdout MODEL_LOG - Shape of the data: torch.Size([6, 3, 224, 224])
2021-10-04 21:02:14,364 [INFO ] W-9000-classification-sample_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1020
2021-10-04 21:02:14,363 [INFO ] W-9000-classification-sample_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:902.92|#ModelName:classification-sample,Level:Model|#hostname:N9596,requestID:da485326-b1f7-4f06-a652-9e0710771d56,199e0b94-9725-437e-8fbf-51af67f01ed1,timestamp:1633361534
2021-10-04 21:02:14,364 [INFO ] W-9000-classification-sample_1.0 ACCESS_LOG - /127.0.0.1:59388 "POST /predictions/classification-sample HTTP/1.1" 200 1246
2021-10-04 21:02:14,364 [INFO ] W-9000-classification-sample_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:N9596,timestamp:null
2021-10-04 21:02:14,364 [INFO ] W-9000-classification-sample_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:998.72|#ModelName:classification-sample,Level:Model|#hostname:N9596,requestID:da485326-b1f7-4f06-a652-9e0710771d56,199e0b94-9725-437e-8fbf-51af67f01ed1,timestamp:1633361534
2021-10-04 21:02:14,364 [DEBUG] W-9000-classification-sample_1.0 org.pytorch.serve.job.Job - Waiting time ns: 100926211, Backend time ns: 1110781497
2021-10-04 21:02:14,364 [INFO ] W-9000-classification-sample_1.0 TS_METRICS - QueueTime.ms:100|#Level:Host|#hostname:N9596,timestamp:null

Output

b'{"predictions": [{"label1": 1.0}, {"label2": 1.0}]}'
b'{"predictions": [{"label1": 1.0}, {"label2": 1.0}]}'
b'{"predictions": [{"label1": 1.0}, {"label2": 1.0}]}'

Here there are 3 requests, each with 2 images, and they have been accumulated into a single batch of 6, since the batch size is 3. If I go with this solution, I would also have to handle batching inside the handler when the data is too large, which means maintaining two batch sizes: one set while registering the model and one inside the handler (a rough sketch of the latter follows below).
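
A rough sketch of that inner batching; run_inference and inner_batch_size are hypothetical names rather than part of the actual handler, and only the torch.split pattern is the point:

import torch

# Illustrative only: split the server-side batch into smaller inner batches
# before running the model, then stitch the outputs back together.
def run_inference(model, batch, inner_batch_size=8):
    outputs = []
    with torch.no_grad():
        for chunk in torch.split(batch, inner_batch_size, dim=0):
            outputs.append(model(chunk))
    return torch.cat(outputs, dim=0)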

  • I can batch the data externally before sending it to TorchServe (a rough sketch follows below).
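
A rough sketch of that external batching, reusing url, bytes_array, and the request format from above; chunk_size is a hypothetical value that would be tuned against max_request_size:

# Hypothetical client-side chunking so no single request exceeds max_request_size
chunk_size = 32
responses = []
for i in range(0, len(bytes_array), chunk_size):
    chunk = bytes_array[i:i + chunk_size]
    payload = {"instances": [{"b64": image} for image in chunk]}
    responses.append(requests.post(url, json=payload))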

Problem

In order to do batching for a large dataset, is there any efficient way other than the solutions I mentioned? If possible, the predictions for the entire dataset should happen in a single request.

Thanks in advance

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
gunalan-lakshmanan commented on Oct 6, 2021

But here I could only do that via batch_size parameter which handles batching based on number of requests, not based on number of data

That is correct. Is your ask about making it possible to set a maximum amount of memory for the total number of requests sent to TorchServe? If you know roughly how large an example in a batch is, you can set the batch size to determine how much memory should be involved in an inference.

So if you want to do inference on 10,000 images, you can either do what you're doing now, which is sending batched requests to TorchServe (although I'm not too familiar with how much quicker that is vs sending 10,000 requests), or just send a single request per image.

If you're trying to optimize the speed of inference on 10,000 images, you also need to decide what's more important to you: latency or throughput. If latency, you want a small batch size; if throughput, you want a larger batch size.

Also, is your bottleneck the model or sending data to the model? The tools here should help you answer that: https://github.com/pytorch/serve/tree/master/benchmarks

I have to do the benchmarks and see which option is better for this. Thanks @msaroufim.

0 reactions
msaroufim commented on Oct 14, 2021

Excellent, feel free to reopen this if you need any more help

