GPU usage
Hi all,

I ran the training example from the README and found that the volatile GPU-util of every GPU except the first one is 0%, even though memory is fully allocated on all of them. I'm not sure whether this is a tensorflow or a tensor2tensor error.

Thank you
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26 Driver Version: 375.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M40 24GB Off | 0000:04:00.0 Off | 0 |
| N/A 56C P0 187W / 250W | 21871MiB / 22939MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M40 24GB Off | 0000:05:00.0 Off | 0 |
| N/A 28C P0 56W / 250W | 21806MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M40 24GB Off | 0000:08:00.0 Off | 0 |
| N/A 28C P0 55W / 250W | 21804MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M40 24GB Off | 0000:09:00.0 Off | 0 |
| N/A 29C P0 55W / 250W | 21804MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla M40 24GB Off | 0000:86:00.0 Off | 0 |
| N/A 29C P0 56W / 250W | 21808MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla M40 24GB Off | 0000:87:00.0 Off | 0 |
| N/A 27C P0 57W / 250W | 21806MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla M40 24GB Off | 0000:8A:00.0 Off | 0 |
| N/A 30C P0 57W / 250W | 21804MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla M40 24GB Off | 0000:8B:00.0 Off | 0 |
| N/A 27C P0 56W / 250W | 21804MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
A single step runs 1 batch on each GPU. So a step is always slower with more GPUs: in addition to running it on every GPU, you need them all to sync. But you're effectively running 8x more examples per step.
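As a toy illustration of that (a minimal NumPy sketch, not tensor2tensor's actual implementation; `tower_gradient` is a made-up stand-in for one GPU's forward/backward pass):

```python
import numpy as np

# Toy sketch of one synchronous data-parallel step (NOT t2t's real code):
# each GPU computes a gradient on its own batch, the gradients are averaged
# at a sync point, and a single update is applied. More GPUs means a somewhat
# slower step, but num_gpus times more examples consumed per step.
num_gpus = 8
params = np.zeros(4)

def tower_gradient(batch):
    # Stand-in for one GPU's forward/backward pass.
    return batch.mean(axis=0) - params

batches = [np.random.randn(32, 4) for _ in range(num_gpus)]  # one batch per GPU
tower_grads = [tower_gradient(b) for b in batches]           # parallel on real hardware
avg_grad = np.mean(tower_grads, axis=0)                      # the sync point
params -= 0.1 * avg_grad                                     # one step, 8 batches consumed
```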
Our `batch_size` might be a bit misleading: it's calculated (1) per token and (2) per GPU. To understand (1), assume you first process a batch of sentences of 15 words each, and then another of sentences of 40 words each. If you keep the same batch size, your memory use in the second case (40 words) will be over 2x that of the first case (15 words), since all hidden activations have a length dimension. So either you set a low `batch_size` and under-utilize your GPU in the 15-word case, or you set a high one and possibly get an OOM in the 40-word case. That's why we have a per-token batch size: the number of sentences in a batch varies with sentence length. If `batch_size=4096` and a sentence has 15 words, we'll actually get a batch of `4096 // 15 = 273` sentences; but if the length is 40, we'll take a batch of `4096 // 40 = 102` sentences.

Coming to (2), this is per GPU. If you have another GPU, you can easily process another batch of the same size there. We want to avoid changing batch sizes when we use more GPUs, because we sometimes test a model on 1 or 2 GPUs and then run it on more, and it's helpful not to have to change the hyperparameters each time we run on different hardware. That's why `batch_size = 4096` actually means you're running 4096 tokens (not sentences) on each GPU you have. So that's `0.912966 * 1 * 4096 = 3740` tokens/s in the 1-GPU case, and `0.762162 * 8 * 4096 = 24975` tokens/s on 8 GPUs.

Hope that helps to understand it!
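To make the arithmetic in that comment concrete, here is a minimal sketch (the steps-per-second figures are the ones quoted above; everything else is just the stated formulas):

```python
# Per-token batching: batch_size counts tokens per GPU, so the number of
# sentences in a batch depends on sentence length.
batch_size = 4096  # tokens per GPU

for sentence_len in (15, 40):
    print(f"length {sentence_len}: {batch_size // sentence_len} sentences per batch")
# length 15: 273 sentences per batch
# length 40: 102 sentences per batch

# Throughput = steps/sec * num_gpus * tokens per GPU.
for steps_per_sec, num_gpus in ((0.912966, 1), (0.762162, 8)):
    print(f"{num_gpus} GPU(s): {steps_per_sec * num_gpus * batch_size:.0f} tokens/s")
# 1 GPU(s): 3740 tokens/s
# 8 GPU(s): 24975 tokens/s
```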
I have the same problem: my sole GPU's usage is close to 0 and CPU usage is high while training a tensor2tensor model. I'm using the following installations:

tensor2tensor==1.7.0
tensorboard==1.13.1
tensorflow==1.13.1
tensorflow-estimator==1.13.0
tensorflow-gpu==1.13.1
tensorflow-tensorboard==1.5.1

I tried to uninstall tensorflow and keep tensorflow-gpu, and it reported an error: `ModuleNotFoundError: No module named 'tensorflow.python'`. Does anyone know what is going wrong in the training?
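Not a definitive answer, but one common culprit with that package list: having both `tensorflow` and `tensorflow-gpu` installed in the same environment can leave the CPU-only build active. A quick TF 1.x check (standard TensorFlow calls, nothing tensor2tensor-specific) of whether the interpreter sees a GPU at all:

```python
# TF 1.x diagnostic: does this interpreter's TensorFlow build see a GPU?
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
print(tf.test.is_gpu_available())  # False means training will run on the CPU
print([d.name for d in device_lib.list_local_devices()])
# expect something like ['/device:CPU:0', '/device:GPU:0'] if the GPU is visible
```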