
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[50,68,394054]

See original GitHub issue

I am trying to use this code for English-Hindi translation. The details of my corpus are as follows:

  • training file: 1,761,368 lines
  • testing file: 3,537 lines
  • validation file: 3,537 lines

I have a server with 4 GTX 1080 GPUs with 8 GB each. The htop output when this program is not running is shown in a screenshot attached to the original issue.

I tried reducing the batch size to 16, but nothing worked. I have also edited the following parameters in the config file: MAX_INPUT_TEXT_LEN = 11351 and MAX_OUTPUT_TEXT_LEN = 7898. I also tried running the program with the default parameters, but the same error persists.

Please help. Thanks
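For context, a rough calculation shows why this allocation fails, assuming the reported shape is batch size × target sequence length × output vocabulary size and the tensor holds float32 logits (an inference from the error message, not something stated in the issue):

    # Shape from the error: [50, 68, 394054]
    batch_size, seq_len, vocab_size = 50, 68, 394054
    bytes_per_float32 = 4

    tensor_bytes = batch_size * seq_len * vocab_size * bytes_per_float32
    print(tensor_bytes / 1024 ** 3)  # ~5.0 GiB for a single logits tensor

A single tensor of roughly 5 GiB, on top of the model weights and other activations, cannot fit on one 8 GB GTX 1080, which is why reducing the vocabulary size and sequence lengths (see the settings in the comments below) makes the error go away.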

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:5 (3 by maintainers)

Top GitHub Comments

3 reactions
hiteshvaidya commented, Apr 13, 2018

So, I googled a bit and found the solution. TensorFlow and Keras by default take all the available GPUs. If we want to avoid that, we can tweak the settings. For your information, here are links describing how that can be done:
https://github.com/keras-team/keras/issues/6031
https://www.tensorflow.org/programmers_guide/using_gpu

Also, we can run the program as python <program_name> --num_gpus=<number_of_GPUs_to_allocate>. Passing 2, for example, will take the first 2 GPUs for execution, since TensorFlow considers GPUs in numeric order.
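A minimal sketch of the approach described in those links, assuming a TensorFlow 1.x / Keras setup (the environment variable and session-config API below are plain TensorFlow, not options specific to this project):

    import os

    # Make only the first two GPUs visible; must be set before
    # TensorFlow initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

    import tensorflow as tf
    from keras import backend as K

    # Grow GPU memory on demand instead of grabbing it all up front,
    # which makes it easier to see what is actually running out.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    K.set_session(tf.Session(config=config))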

2 reactions
hiteshvaidya commented, Apr 12, 2018

Thank you so much for your help. I made the following changes as per your suggestion and I am able to train:

  • MAX_INPUT_TEXT_LEN = 100
  • MAX_OUTPUT_TEXT_LEN = 100
  • INPUT_VOCABULARY_SIZE = 50000
  • OUTPUT_VOCABULARY_SIZE = 50000
  • BATCH_SIZE = 50
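With these values, the same back-of-the-envelope estimate as above gives a much smaller logits tensor, again assuming float32 and that the last dimension is the output vocabulary size:

    # 50 * 100 * 50000 * 4 bytes
    print(50 * 100 * 50000 * 4 / 1024 ** 3)  # ~0.93 GiB, which fits on an 8 GB GPU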

However, I didn’t find any option for setting parameters like gpuid or multiple gpuids. Please let me know if there is any way to control which GPUs are used. The experiment I described above was running on 3 GPUs but still showed 10 hours to complete just one epoch. I don’t think it should take that much time, because in the past I ran a few experiments on the OpenNMT backend with the same corpus and they didn’t take this long.

Thank you.

Read more comments on GitHub >

Top Results From Across the Web

OOM when allocating tensor with shape - Stack Overflow
I can't see the actual GPU usage of my script since tensorflow always steals all memory at the beginning. And the actual problem...
Read more >
ResourceExhaustedError (see above for traceback): OOM ...
I am trying to perform faster rcnn on a custom dataset based on pascal_VOC. But I get this error when I start to...
Read more >
How to solve Error of ResourceExhaustedError in Tensorflow
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. [[Node: ......
Read more >
resource exhausted error using tensorflow on jetson nano
_traceback = tf_stack.extract_stack() ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,77,120,192] ...
Read more >
Resource Exhausted Error: OOM when allocating tensor
ResourceExhaustedError : OOM when allocating tensor with shape[1 ... the frames and not sacrifice DLC performance, see our preprint here: ...
Read more >
