
Hi all, I tested the training example in the README. I found that the volatile GPU-util of almost all GPUs is 0% (except the first one), yet the full memory of every GPU is allocated. I'm not sure whether this is a TensorFlow or a tensor2tensor error.

Thank you

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40 24GB      Off  | 0000:04:00.0     Off |                    0 |
| N/A   56C    P0   187W / 250W |  21871MiB / 22939MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M40 24GB      Off  | 0000:05:00.0     Off |                    0 |
| N/A   28C    P0    56W / 250W |  21806MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M40 24GB      Off  | 0000:08:00.0     Off |                    0 |
| N/A   28C    P0    55W / 250W |  21804MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M40 24GB      Off  | 0000:09:00.0     Off |                    0 |
| N/A   29C    P0    55W / 250W |  21804MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla M40 24GB      Off  | 0000:86:00.0     Off |                    0 |
| N/A   29C    P0    56W / 250W |  21808MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla M40 24GB      Off  | 0000:87:00.0     Off |                    0 |
| N/A   27C    P0    57W / 250W |  21806MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla M40 24GB      Off  | 0000:8A:00.0     Off |                    0 |
| N/A   30C    P0    57W / 250W |  21804MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla M40 24GB      Off  | 0000:8B:00.0     Off |                    0 |
| N/A   27C    P0    56W / 250W |  21804MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
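(A side note on the memory numbers above: TensorFlow 1.x by default reserves nearly all memory on every visible GPU when a session starts, even on GPUs that end up doing no work, which matches the ~21 GB allocated on the idle GPUs here. A minimal sketch of the standard TF 1.x option to allocate on demand instead, so nvidia-smi's memory column reflects actual use:)

import tensorflow as tf

# By default TF 1.x grabs almost all memory on every visible GPU at startup.
# allow_growth makes it allocate memory on demand instead.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)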

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 16 (13 by maintainers)

Top GitHub Comments

31 reactions
lukaszkaiser commented, Jun 22, 2017

A single step runs 1 batch on each GPU, so a step is always slower with more GPUs: in addition to running the batch on every GPU, you need them all to sync. But you're effectively running 8x more examples per step.

Our batch_size might be a bit misleading: it's calculated (1) per token and (2) per GPU.

To understand (1), assume you first process a batch of sentences of 15 words each, and then another of sentences of 40 words each. If you keep the same batch size, your memory use in the second case (40 words) will be over 2x that of the first case (15 words), since all hidden activations have a length dimension. So either you set a low batch_size and under-utilize your GPU in the 15-word case, or you set a high one and risk an OOM in the 40-word case. That's why we have a per-token batch size: the number of sentences varies with sentence length. If batch_size=4096 and the sentences have 15 words, we'll actually get a batch of 4096 // 15 = 273 sentences; but if the length is 40, we'll take a batch of 4096 // 40 = 102 sentences.

Coming to (2), this is per GPU. If you have another GPU, you can easily process another batch of the same size there. We want to avoid changing batch sizes when we use more GPUs, because we sometimes test a model on 1 or 2 GPUs and then run it on more, and it's helpful not to have to change the hyperparameters each time we run on different hardware. So batch_size = 4096 actually means you're running 4096 tokens (not sentences) on each GPU you have: 0.912966 * 1 * 4096 = 3740 tokens/s in the 1-GPU case, and 0.762162 * 8 * 4096 = 24975 tokens/s on 8 GPUs.

Hope that helps you understand it!
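(To make the arithmetic above concrete, here is a small standalone sketch, plain Python rather than tensor2tensor's actual input pipeline; the step rates 0.912966 and 0.762162 are the steps/sec figures quoted in the comment:)

def sentences_per_batch(batch_size_tokens, sentence_len):
    # With a per-token batch size, longer sentences mean fewer
    # sentences per batch, keeping memory use roughly constant.
    return batch_size_tokens // sentence_len

def tokens_per_sec(steps_per_sec, num_gpus, batch_size_tokens):
    # batch_size is per GPU, so throughput scales with GPU count.
    return steps_per_sec * num_gpus * batch_size_tokens

print(sentences_per_batch(4096, 15))             # 273 sentences of 15 words
print(sentences_per_batch(4096, 40))             # 102 sentences of 40 words
print(round(tokens_per_sec(0.912966, 1, 4096)))  # ~3740 tokens/s on 1 GPU
print(round(tokens_per_sec(0.762162, 8, 4096)))  # ~24975 tokens/s on 8 GPUs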

0 reactions
EthannyDing commented, May 17, 2019

I have the same problem: my sole GPU's usage is close to 0% and CPU usage is high while training a tensor2tensor model. I'm using the following installations:

tensor2tensor==1.7.0
tensorboard==1.13.1
tensorflow==1.13.1
tensorflow-estimator==1.13.0
tensorflow-gpu==1.13.1
tensorflow-tensorboard==1.5.1

I tried to uninstall tensorflow and keep tensorflow-gpu, and it reported an error: ModuleNotFoundError: No module named 'tensorflow.python'. Does anyone know what is going wrong in the training?
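(One sanity check worth noting here: having both tensorflow and tensorflow-gpu installed at once is a common cause of this symptom, since the CPU-only package can shadow the GPU build. A minimal check using the TF 1.x API to see whether TensorFlow can use a GPU at all:)

import tensorflow as tf
from tensorflow.python.client import device_lib

# True only if TF was built with CUDA and a GPU device is visible.
print(tf.test.is_gpu_available())

# Lists every device TF can use; a working GPU install shows
# a /device:GPU:0 entry alongside the CPU.
print([d.name for d in device_lib.list_local_devices()])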


Top Results From Across the Web

What Should Your GPU Utilization Be? [Different Workloads ...
During regular desktop use, your GPU utilization shouldn't be very high. If you aren't watching any videos or something of that nature, your...

How to Monitor GPU Usage in the Windows Task Manager
To monitor overall GPU resource usage statistics, click the "Performance" tab and look for the "GPU" option in the sidebar; you may have to...

How To Check GPU Usage In Windows - TechNewsToday
You can check overall GPU Utilization, Dedicated/Shared GPU Memory, usage per engine, and much more directly from the Task Manager.

GPU usage monitoring (CUDA) - Unix & Linux Stack Exchange
For Nvidia GPUs there is a tool nvidia-smi that can show memory usage, GPU utilization and temperature of the GPU. There also is a...

Is 100% GPU Usage Bad or Good? How to Fix 100 ... - MiniTool
GPU usage is a quite contextual parameter, thus it reaches different values in different games. For heavy games, 100% GPU usage is good,...
