About per_process_gpu_memory_fraction and multi-GPU training.
Hi, I have some problems when I train the model on multiple GPUs. The error is:

ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: [u'/cpu:0']. Try reducing `gpus`.
But I can train the model on a single GPU, and the GPU memory usage stays at 105 MB no matter how I change the batch size from 8 to 128. Could you give me some help? Thank you very much.
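One common cause of the "only has [u'/cpu:0']" error is that the GPUs are masked from TensorFlow, e.g. by an empty or restrictive `CUDA_VISIBLE_DEVICES` environment variable (or a CPU-only TensorFlow build). A minimal stdlib sketch for checking that variable; the helper name `visible_gpu_ids` is my own, not from the issue:

```python
import os

def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of GPU id strings.

    Returns None when the variable is unset (all GPUs visible to CUDA)
    and [] when it is set to an empty string (all GPUs masked).
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [v.strip() for v in value.split(",") if v.strip() != ""]

# An empty CUDA_VISIBLE_DEVICES hides every GPU from TensorFlow,
# which reproduces the "only has /cpu:0" error above.
print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": ""}))     # []
print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0,1"}))  # ['0', '1']
print(visible_gpu_ids({}))                               # None
```

If this returns `[]` in your training environment, unset the variable (or set it to `0,1`) before launching the script.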
Issue Analytics
- Created 5 years ago
- Comments: 8 (3 by maintainers)
Top GitHub Comments
I’m running 1.7.
It sounds to me like your GPUs are not being found or used at all. The network takes far more than 105 MB. Does `nvidia-smi` list your GPUs?
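Beyond `nvidia-smi`, it helps to confirm what TensorFlow itself can see. A minimal sketch, assuming TensorFlow 1.x (the issue reports 1.7); it also shows `per_process_gpu_memory_fraction` from the title, which caps how much GPU memory each process may allocate. The fraction value 0.4 is an arbitrary example:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# List every device TensorFlow can see. If no '/gpu:N' entries appear here,
# multi_gpu_model(gpus=2) will fail exactly as in the error above.
print([d.name for d in device_lib.list_local_devices()])

# Once the GPUs are visible, cap each process to a fraction of GPU memory
# instead of TensorFlow's default near-total allocation.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # up to 40% per GPU
sess = tf.Session(config=config)
```

Note that `per_process_gpu_memory_fraction` limits memory once a GPU is in use; it does not help with the device-not-found error itself.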