[BUG] getting started `03-Training-with-TF` nb gives OOM on a 16 GB GPU
Describe the bug
I am getting an OOM error when I train the `03-Training-with-TF` notebook on a 16 GB GPU, which can be a problem for users running these notebooks in the cloud on GPUs with similar memory sizes. I can avoid the OOM if I comment out the `os.environ["TF_MEMORY_ALLOCATION"] = "0.7"` line; it works fine then.
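For reference, the relevant cell looks roughly like the sketch below; commenting the line out (or lowering the fraction) is what avoids the OOM for me. The `0.5` value and the exact dataloader import are only illustrative, not a verified recommendation:

```python
import os

# The notebook sets this before creating the NVTabular Keras dataloader:
# os.environ["TF_MEMORY_ALLOCATION"] = "0.7"

# Workaround: drop the line entirely, or use a smaller fraction
# (0.5 here is just an illustrative value).
os.environ["TF_MEMORY_ALLOCATION"] = "0.5"

# The variable appears to be read when the TensorFlow dataloader is
# imported/configured, so it has to be set before this import.
from nvtabular.loader.tensorflow import KerasSequenceLoader  # noqa: F401
```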
Steps/Code to reproduce bug
Run the `03-Training-with-TF` notebook to reproduce.
Expected behavior
The notebook should run without OOM. I recommend removing the `os.environ["TF_MEMORY_ALLOCATION"] = "0.7"` line from the example notebook.
Environment details (please complete the following information):
- Environment location: [Bare-metal, Docker, Cloud (specify cloud provider)]
- Method of NVTabular install: [conda, Docker, or from source]
- If method of install is [Docker], provide `docker pull` & `docker run` commands used: I am using `merlin-tensorflow-training:22.04`.
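For completeness, a typical way to pull and run this image would look something like the commands below. These are illustrative only (the registry path and flags are my assumption, not the exact commands used here):

```shell
docker pull nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.04

docker run --gpus all --rm -it \
  -p 8888:8888 \
  nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.04
```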
Top GitHub Comments
@EvenOldridge and @karlhigley My understanding is that `os.environ["TF_MEMORY_ALLOCATION"] = "0.5"` is already the default behavior under the hood (because of `configure_tensorflow()`) when we use the NVT Keras dataloader, so I don't think we need to define it again in the notebook.

Which example is this for? Let me see how it goes on an 11 GB GPU. 😺
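For context, a rough sketch of what I believe `configure_tensorflow()` does with this variable (paraphrased from memory, not the actual NVTabular source; the real implementation queries the device for its total memory rather than assuming 16 GB):

```python
import os
import tensorflow as tf

def configure_tensorflow_sketch(default_fraction=0.5):
    """Hypothetical paraphrase of the dataloader's GPU memory setup."""
    # TF_MEMORY_ALLOCATION: a fraction of GPU memory if < 1, otherwise MB.
    alloc = float(os.environ.get("TF_MEMORY_ALLOCATION", default_fraction))

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return

    if alloc < 1:
        total_mb = 16 * 1024  # illustrative 16 GB card; real code queries the device
        limit_mb = int(alloc * total_mb)
    else:
        limit_mb = int(alloc)

    # Cap how much of the GPU TensorFlow is allowed to allocate up front.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mb)],
    )
```

If that is right, the notebook's explicit `0.7` only raises the cap above the `0.5` default rather than lowering it.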