ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[50,68,394054]
I am trying to use this code for English-Hindi translation. Following are the details of my corpus:
training file = 1761368 lines
testing file = 3537 lines
validation file = 3537 lines
I have a server with 4 GTX 1080 GPUs with 8 GB of memory each. (An htop screenshot in the original issue shows resource usage while the program is not running.)
I tried reducing the batch size to 16, but nothing worked. I have also edited the following parameters in the config file: MAX_INPUT_TEXT_LEN = 11351 and MAX_OUTPUT_TEXT_LEN = 7898. I also tried running the program with the default parameters, but the same error persists.
Please help. Thanks
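For context, the shape in the error message points at the likely cause: [50, 68, 394054] appears to be batch size x target sequence length x output vocabulary size, and a single float32 tensor of that shape needs about 5 GiB on its own, which cannot fit on an 8 GB GTX 1080 alongside the model weights, gradients and activations. A quick back-of-the-envelope check in plain Python:

# One float32 tensor of shape [50, 68, 394054]:
# batch_size x target_sequence_length x output_vocabulary_size.
batch_size, seq_len, vocab_size = 50, 68, 394054
bytes_per_float32 = 4
size_gib = batch_size * seq_len * vocab_size * bytes_per_float32 / 1024**3
print(round(size_gib, 2))  # ~4.99 GiB for this one tensor alone

This also suggests why capping the output vocabulary (as in the accepted fix below) reduces memory use far more than shrinking the batch size does.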
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
So, I googled a bit and found the solution. By default, TensorFlow and Keras take all the available GPUs. If we want to avoid that, we can tweak the settings. For your information, here are links describing how that can be done:
https://github.com/keras-team/keras/issues/6031
https://www.tensorflow.org/programmers_guide/using_gpu
Also, we can run the program as:
python <program_name> --num_gpus=<number_of_GPUs_you_want_to_allocate>
With --num_gpus=2, this will take the first two GPUs for execution, as TensorFlow considers GPUs in numeric order.
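As a concrete sketch of the two approaches described in those links (assuming the TensorFlow 1.x and standalone Keras APIs current at the time; tf.ConfigProto no longer exists in TensorFlow 2):

import os
# Option 1: hide all but the chosen GPUs before TensorFlow initializes.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # first two GPUs only

import tensorflow as tf
import keras.backend as K

# Option 2: allocate GPU memory on demand instead of grabbing it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))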
Thank you so much for your help. I made the following changes as per your suggestion and I am able to train:
MAX_INPUT_TEXT_LEN = 100
MAX_OUTPUT_TEXT_LEN = 100
INPUT_VOCABULARY_SIZE = 50000
OUTPUT_VOCABULARY_SIZE = 50000
BATCH_SIZE = 50
However, I didn't find any option for setting parameters like gpuid or multiple gpuids. Please let me know if there is any way to control which GPUs are used. The experiment above was running on 3 GPUs but still showed 10 hours to complete just one epoch. I don't think it should take that much time, because in the past I ran a few experiments on the OpenNMT backend with the same corpus and they didn't take this long.
Thank you.
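A note on the gpuid question, not specific to this project: the CUDA runtime honors the CUDA_VISIBLE_DEVICES environment variable, so specific GPU ids can be pinned at launch time without any code changes. For example, to run on only the second and fourth GPU (ids 1 and 3; <program_name> is the same placeholder as above):

CUDA_VISIBLE_DEVICES=1,3 python <program_name>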