ConcurrentModificationException when doing batch classification for more than one instance
- torchserve version: 0.4.1
- torch-model-archiver version: 0.4.1
- torch version: 1.7.1+cu101
- java version: openjdk version "11.0.12"
- Operating System and version: Ubuntu 18.04.5 LTS
Getting the following exception:
```
2021-10-14 22:57:59,490 [WARN ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.util.ConcurrentModificationException
    at java.base/java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
    at java.base/java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:741)
    at org.pytorch.serve.wlm.BatchAggregator.sendResponse(BatchAggregator.java:81)
    at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:194)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
2021-10-14 22:57:59,491 [ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.BatchAggregator - Unexpected job: 460bf910-0e24-4b65-9ed1-bc2d376352dc
```
This happens when more than one request reaches the server within the maxBatchDelay window. If only one request is sent, everything works and I get back the expected array with a single prediction.
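For context, request batching only kicks in when the model is registered with a batch size greater than one. Below is an illustrative sketch of such a registration call against the TorchServe management API; the archive name and parameter values are assumptions, not the reporter's actual settings. Two requests arriving within the `max_batch_delay` window end up in the same batch, which is the scenario that triggers the exception.

```python
# Illustrative model registration with batching enabled (TorchServe
# management API). Archive name and values are placeholders.
import requests

resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "model.mar",        # name/path of the model archive
        "batch_size": 4,           # aggregate up to 4 requests per batch
        "max_batch_delay": 100,    # ms to wait before sending a partial batch
        "initial_workers": 1,
    },
)
print(resp.status_code, resp.text)
```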
The handler is implemented so that the method `handle(self, data, context)` returns a list of dicts with exactly as many elements as `data` contains. Before returning the list of dicts, the following code is also executed:
```python
for idx, response_content_type in enumerate(response_content_types):
    context.set_response_content_type(idx, response_content_type)
```
where `response_content_types` is a list with as many elements as `data` (equivalently, as the returned list). In this case, all elements are just 'application/json'.
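To make the setup concrete, here is a minimal sketch of that handler pattern. The class name and the "inference" step are placeholders, not the reporter's actual code; only the list-in/list-out contract and the per-index content-type loop mirror the description above.

```python
# Minimal sketch of the batch-handler pattern described above.
# Placeholder names; only the contract mirrors the report.
class BatchClassifierHandler:
    def handle(self, data, context):
        if data is None:
            return None

        responses = []
        response_content_types = []
        for row in data:
            # Each batch element carries its payload under "data" or "body".
            payload = row.get("data") or row.get("body")
            # Placeholder for real preprocessing + model inference.
            responses.append({"num_input_bytes": len(payload)})
            response_content_types.append("application/json")

        # One content type per batch index, exactly as in the report.
        for idx, response_content_type in enumerate(response_content_types):
            context.set_response_content_type(idx, response_content_type)

        # As many elements as `data`: one response dict per request.
        return responses
```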
The exception points to a problem in the Java code, which I assume should be completely independent of the Python code running in the handler. In Java, a ConcurrentModificationException usually means that an element was added to or removed from a collection while it was being iterated over, and I do not see how my handler could cause that. I therefore assume this is a severe bug in TorchServe; even if it were triggered by a client-side problem, crashing the worker with this exception is certainly not the right way to report it.
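For readers less familiar with Java, the same failure mode is easy to reproduce in Python: mutating a collection while iterating over it invalidates the iterator, which is what the stack trace above suggests is happening inside `BatchAggregator.sendResponse`. This is only an analogue for intuition, not the TorchServe code itself.

```python
# Python analogue of Java's ConcurrentModificationException: removing
# entries from a dict while iterating over its keys.
jobs = {"job-1": "pending", "job-2": "pending"}
try:
    for job_id in jobs:
        del jobs[job_id]  # mutating the dict mid-iteration
except RuntimeError as err:
    print(err)  # "dictionary changed size during iteration"
```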
Top GitHub Comments
PR #1272 merged
OK, I noticed there were some new commits in the meantime, re-installed from commit f713d49d9e5ccf61e6be2ea099f2cbe3f9643dff, and tested again: works fine!