Error 507 when predicting
Hello,
I have a rather large model that I need to use for prediction. When I make the request, I receive a 507 error and a message stating that the worker has died. On the server side I see:
2020-06-10 11:30:41,286 [DEBUG] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
at java.base/java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:513)
at java.base/java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:675)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:155)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
I suspect this is somehow related to JVM memory. I am therefore using a config.properties file with the following entry:
vmargs=-Xmx128g
(of course the model needs much less than 128 GB)
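For reference, a minimal config.properties along these lines might look as follows; vmargs is the only entry actually taken from my setup, the other keys exist in TorchServe but the values here are purely illustrative:
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
vmargs=-Xmx128g
# fewer workers means fewer copies of the model held in memory
default_workers_per_model=1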
I am not using the GPU for this prediction, as it is just a test. I am also running inside a Docker container.
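As a sketch of what I mean (the image tag, mount path, and limits below are illustrative, not my exact command), the container is started roughly like this; an explicit --memory limit at least makes it obvious when the container itself hits its memory ceiling:
docker run --rm -p 8080:8080 -p 8081:8081 \
    --memory=64g --shm-size=2g \
    -v $(pwd)/model_store:/home/model-server/model-store \
    pytorch/torchserve:latest
# the model is then registered through the management API on port 8081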
How can I debug this? Is there a way to get better error messages and a stack trace (for example, to find out whether PyTorch has trouble allocating the model, or similar problems)?
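So far the only diagnostics I know of (assuming the default ports and log configuration; my_model is a placeholder for the real model name) are the management API's describe endpoint, which reports each worker's status and memory usage, and the default log files:
curl http://localhost:8081/models/my_model
# Python-side stack traces from the workers end up in the default log files
tail -n 100 logs/model_log.log logs/ts_log.log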
Issue Analytics
- State:
- Created 3 years ago
- Comments: 8 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thank you for your help. I will try on a bigger machine in the cloud with a GPU. I suspect that, since the Docker container cannot swap and there are multiple workers that may each hold their own copy of the model, the memory consumption is high enough that the system legitimately runs out of memory.
I will update you within 1 hour.
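One way to test that hypothesis (assuming the management API is reachable on its default port 8081; my_model stands in for the actual model name) is to scale the model down to a single worker and retry the request:
# scale to a single worker; synchronous=true makes the call block until scaling is done
curl -X PUT "http://localhost:8081/models/my_model?min_worker=1&max_worker=1&synchronous=true"
# confirm worker count, status and per-worker memory usage
curl http://localhost:8081/models/my_model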
@faustomilletari: How many workers are you using for your model, and can you share the output of the top command for different scenarios, e.g. after starting the workers, and while running inference with a smaller and a larger input size?
Please also share the log files from the logs folder, which is by default generated in the directory from which you start TorchServe.
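A sketch of how that information could be collected (the top flags assume a procps-style top inside the container; docker stats is only relevant because the issue mentions running in Docker):
# per-process memory, sorted by %MEM: shows the frontend JVM and each Python worker
top -b -n 1 -o %MEM | head -n 20
# container-level memory usage and limit, run from the host
docker stats --no-stream
# the default log files, written under logs/ next to where torchserve was started
ls -lh logs/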