Batching for large datasets
I have looked into the documentation and previous issues, but I could not find a satisfying answer for batching large datasets.
Server
Torchserve version: 0.4.2
TS Home: /home/gunalan/miniconda3/envs/classification_pipeline/lib/python3.8/site-packages
Current directory: /home/gunalan/PycharmProjects/classification_pipeline/torch serving
Temp directory: /tmp
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 3920 M
Python executable: /home/gunalan/miniconda3/envs/classification_pipeline/bin/python
Config file: config/config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/gunalan/PycharmProjects/classification_pipeline/torch serving/model_store
Initial Models: N/A
Log dir: /home/gunalan/PycharmProjects/classification_pipeline/torch serving/logs
Metrics dir: /home/gunalan/PycharmProjects/classification_pipeline/torch serving/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 16553500
Maximum Request Size: 16553500
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/gunalan/PycharmProjects/classification_pipeline/torch serving/model_store
Model config: N/A
2021-10-04 20:53:23,857 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2021-10-04 20:53:23,899 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2021-10-04 20:53:24,008 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2021-10-04 20:53:24,009 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2021-10-04 20:53:24,010 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2021-10-04 20:53:24,010 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2021-10-04 20:53:24,011 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
Config properties
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
metrics_address=http://127.0.0.1:8082
model_store=/home/gunalan/PycharmProjects/classification_pipeline/torch serving/model_store
service_envelope=json
max_request_size=16553500
max_response_size=16553500
Registered model
curl -X POST "http://localhost:8081/models?model_name=classification-sample&url=classification-sample.mar&initial_workers=1&batch_size=3"
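For reference, the same management endpoint also accepts a max_batch_delay parameter (discussed under "Solutions tried" below); a hypothetical variant of the registration call, where the 100 ms delay value is only an example:
curl -X POST "http://localhost:8081/models?model_name=classification-sample&url=classification-sample.mar&initial_workers=1&batch_size=3&max_batch_delay=100"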
I am using JSON as the service envelope and I am passing the request through the REST API. Sample input JSON request (I am sending 2 images in a single request for prediction):
import requests

request = {
    "instances": [
        {"b64": bytes_array[0]},
        {"b64": bytes_array[1]},
    ]
}
resp = requests.post(url, json=request)
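For context, bytes_array and url are not defined in the snippet above; a minimal sketch of how they might be built, assuming the images are base64-encoded on the client (the file names are hypothetical, not from the original issue):
import base64

# Hypothetical image paths; each entry of bytes_array is a base64-encoded
# string (one per image), matching the "b64" fields in the request above.
image_paths = ["sample_0.jpg", "sample_1.jpg"]
bytes_array = []
for path in image_paths:
    with open(path, "rb") as f:
        bytes_array.append(base64.b64encode(f.read()).decode("utf-8"))

# Prediction endpoint for the model registered above.
url = "http://127.0.0.1:8080/predictions/classification-sample"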
Solutions tried
- I can send multiple images in a single request as shown above, but there is the problem of request size. Though I can handle that with max_request_size in config.properties, I can't send a large dataset of images, which might be several GBs, in one request.
- I can send parallel requests, where TorchServe forms the batches based on the batch_size and max_batch_delay specified while registering the model. Please let me know if my understanding is wrong here.
import requests
from concurrent.futures import ThreadPoolExecutor

def post_request(args):
    return requests.post(args[0], json=args[1])

list_of_urls = [(url, request)] * 3
with ThreadPoolExecutor(max_workers=10) as pool:
    response_list = list(pool.map(post_request, list_of_urls))
Server log
2021-10-04 21:01:44,397 [INFO ] epollEventLoopGroup-3-5 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:N9596,timestamp:null
2021-10-04 21:02:13,460 [INFO ] W-9000-classification-sample_1.0-stdout MODEL_LOG - Type of data<class 'list'>
2021-10-04 21:02:14,201 [INFO ] W-9000-classification-sample_1.0-stdout MODEL_LOG - Shape of the data: torch.Size([6, 3, 224, 224])
2021-10-04 21:02:14,364 [INFO ] W-9000-classification-sample_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1020
2021-10-04 21:02:14,363 [INFO ] W-9000-classification-sample_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:902.92|#ModelName:classification-sample,Level:Model|#hostname:N9596,requestID:da485326-b1f7-4f06-a652-9e0710771d56,199e0b94-9725-437e-8fbf-51af67f01ed1,timestamp:1633361534
2021-10-04 21:02:14,364 [INFO ] W-9000-classification-sample_1.0 ACCESS_LOG - /127.0.0.1:59388 "POST /predictions/classification-sample HTTP/1.1" 200 1246
2021-10-04 21:02:14,364 [INFO ] W-9000-classification-sample_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:N9596,timestamp:null
2021-10-04 21:02:14,364 [INFO ] W-9000-classification-sample_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:998.72|#ModelName:classification-sample,Level:Model|#hostname:N9596,requestID:da485326-b1f7-4f06-a652-9e0710771d56,199e0b94-9725-437e-8fbf-51af67f01ed1,timestamp:1633361534
2021-10-04 21:02:14,364 [DEBUG] W-9000-classification-sample_1.0 org.pytorch.serve.job.Job - Waiting time ns: 100926211, Backend time ns: 1110781497
2021-10-04 21:02:14,364 [INFO ] W-9000-classification-sample_1.0 TS_METRICS - QueueTime.ms:100|#Level:Host|#hostname:N9596,timestamp:null
Output
b'{"predictions": [{"label1": 1.0}, {"label2": 1.0}]}'
b'{"predictions": [{"label1": 1.0}, {"label2": 1.0}]}'
b'{"predictions": [{"label1": 1.0}, {"label2": 1.0}]}'
Here there are 3 requests of 2 images each, which get accumulated into a single batch of 6, since the batch size is 3. If I go with this solution, I also have to handle batching inside the handler when the data is too large, which means maintaining two batch sizes: one while registering the model and one inside the handler. A rough sketch of that handler-side sub-batching follows this list.
- I can batch the data externally before sending it to TorchServe.
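A minimal sketch of the handler-side sub-batching mentioned above, assuming the custom handler extends TorchServe's ImageClassifier and that preprocess() returns a single batched tensor; the class name and INNER_BATCH_SIZE value are illustrative only, not part of the original handler:
import torch
from ts.torch_handler.image_classifier import ImageClassifier

class SubBatchClassifier(ImageClassifier):
    # Hypothetical inner batch size, separate from the batch_size used
    # when registering the model with TorchServe.
    INNER_BATCH_SIZE = 2

    def inference(self, data, *args, **kwargs):
        # data is the tensor produced by preprocess(), e.g. [6, 3, 224, 224]
        # when three requests of two images each are accumulated.
        outputs = []
        with torch.no_grad():
            for chunk in torch.split(data, self.INNER_BATCH_SIZE):
                outputs.append(self.model(chunk))
        return torch.cat(outputs, dim=0)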
Problem
In order to do batching for a large dataset, is there any other efficient way apart from the solutions I mentioned? If possible, the predictions for the entire dataset should happen in a single request.
Thanks in advance
Top GitHub Comments
I will do the benchmarks and see which option is better for this. Thanks @msaroufim.
Excellent, feel free to reopen this if you need any more help