
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[50,68,394054]

See original GitHub issue

I am trying to use this code for English-Hindi translation. The details of my corpus are as follows:

  • training file: 1,761,368 lines
  • testing file: 3,537 lines
  • validation file: 3,537 lines

I have a server with 4 GTX 1080 GPUs with 8 GB each. The htop output when this program is not running is shown in a screenshot attached to the original issue.

I tried reducing the batch size to 16, but nothing worked. I have also edited the following parameters in the config file: MAX_INPUT_TEXT_LEN = 11351 and MAX_OUTPUT_TEXT_LEN = 7898. I also tried running the program with the default parameters, but the same error persists.

Please help. Thanks
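For context, a rough calculation shows why this allocation fails, assuming the reported shape is batch size × target sequence length × output vocabulary size and the tensor holds float32 logits (an inference from the error message, not something stated in the issue):

    # Shape from the error: [50, 68, 394054]
    batch_size, seq_len, vocab_size = 50, 68, 394054
    bytes_per_float32 = 4

    tensor_bytes = batch_size * seq_len * vocab_size * bytes_per_float32
    print(tensor_bytes / 1024 ** 3)  # ~5.0 GiB for a single logits tensor

A single tensor of roughly 5 GiB, on top of the model weights and other activations, cannot fit on one 8 GB GTX 1080, which is why reducing the vocabulary size and sequence lengths (see the settings in the comments below) makes the error go away.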

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:5 (3 by maintainers)

Top GitHub Comments

3 reactions
hiteshvaidya commented, Apr 13, 2018

So, I googled a bit and found the solution. TensorFlow and Keras by default take all the available GPUs. If we want to avoid that, we can tweak the settings. For your information, here are links describing how that can be done:
https://github.com/keras-team/keras/issues/6031
https://www.tensorflow.org/programmers_guide/using_gpu

Also, we can run the program as python <program_name> --num_gpus=<number_of_GPUs_to_allocate>. Passing 2, for example, will take the first 2 GPUs for execution, since TensorFlow considers GPUs in numeric order.
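A minimal sketch of the approach described in those links, assuming a TensorFlow 1.x / Keras setup (the environment variable and session-config API below are plain TensorFlow, not options specific to this project):

    import os

    # Make only the first two GPUs visible; must be set before
    # TensorFlow initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

    import tensorflow as tf
    from keras import backend as K

    # Grow GPU memory on demand instead of grabbing it all up front,
    # which makes it easier to see what is actually running out.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    K.set_session(tf.Session(config=config))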

2 reactions
hiteshvaidya commented, Apr 12, 2018

Thank you so much for your help. I made the following changes as per your suggestion and I am able to train:

  • MAX_INPUT_TEXT_LEN = 100
  • MAX_OUTPUT_TEXT_LEN = 100
  • INPUT_VOCABULARY_SIZE = 50000
  • OUTPUT_VOCABULARY_SIZE = 50000
  • BATCH_SIZE = 50
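With these values, the same back-of-the-envelope estimate as above gives a much smaller logits tensor, again assuming float32 and that the last dimension is the output vocabulary size:

    # 50 * 100 * 50000 * 4 bytes
    print(50 * 100 * 50000 * 4 / 1024 ** 3)  # ~0.93 GiB, which fits on an 8 GB GPU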

However, I didn’t find any option for setting parameters like gpuid or multiple gpuids. Please let me know if there is any way to control which GPUs are used. The experiment I described above was running on 3 GPUs but still showed 10 hours to complete just one epoch. I don’t think it should take that much time, because in the past I ran a few experiments on the OpenNMT backend with the same corpus and they didn’t take this long.

Thank you.

Read more comments on GitHub >

Top Results From Across the Web

OOM when allocating tensor with shape - Stack Overflow
I can't see the actual GPU usage of my script since tensorflow always steals all memory at the beginning. And the actual problem...
Read more >
ResourceExhaustedError (see above for traceback): OOM ...
I am trying to perform faster rcnn on a custom dataset based on pascal_VOC. But I get this error when I start to...
Read more >
How to solve Error of ResourceExhaustedError in Tensorflow
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. [[Node: ......
Read more >
resource exhausted error using tensorflow on jetson nano
_traceback = tf_stack.extract_stack() ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,77,120,192] ...
Read more >
Resource Exhausted Error: OOM when allocating tensor
ResourceExhaustedError : OOM when allocating tensor with shape[1 ... the frames and not sacrifice DLC performance, see our preprint here: ...
Read more >
