
Low GPU utilization with tfjs-node-gpu


TensorFlow.js version

  "dependencies": {
    "@tensorflow/tfjs": "^0.11.4",
    "@tensorflow/tfjs-node": "^0.1.5",
    "@tensorflow/tfjs-node-gpu": "^0.1.7",
}

Browser version

N/A. Node v8.9.4. Ubuntu 16.04

Describe the problem or feature request

Using tfjs-node-gpu, I can’t seem to get GPU utilization above ~0–3%. I have CUDA 9 and cuDNN 7.1 installed, am importing @tensorflow/tfjs-node-gpu, and am setting the “tensorflow” backend with tf.setBackend('tensorflow'). CPU usage is at 100% on one core, but GPU utilization is practically zero. I’ve tried tfjs-examples/baseball-node (replacing import '@tensorflow/tfjs-node' with import '@tensorflow/tfjs-node-gpu', of course) as well as my own custom LSTM code. Does tfjs-node-gpu actually run operations on the GPU?
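
For reference, the setup described above amounts to something like the following (a minimal sketch against the 0.x-era API listed in the dependencies; the large matMul at the end is just an illustrative workload, not part of the original report):

import '@tensorflow/tfjs-node-gpu';
import * as tf from '@tensorflow/tfjs';

// In the 0.x releases the native binding registers itself as the
// 'tensorflow' backend, which must be selected explicitly.
tf.setBackend('tensorflow');

// Illustrative workload: a large matMul that should land on the GPU
// if CUDA/cuDNN were found when the binding was installed.
const a = tf.randomNormal([2048, 2048]);
const b = tf.randomNormal([2048, 2048]);
tf.matMul(a, b).data().then(() => console.log('matMul finished'));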

Code to reproduce the bug / link to feature request

# assumes CUDA 9, cuDNN 7.1, and the latest NVIDIA drivers are already installed
git clone https://github.com/tensorflow/tfjs-examples
cd tfjs-examples/baseball-node

# replace tfjs-node import with tfjs-node-gpu
sed -i 's/tfjs-node/tfjs-node-gpu/' src/server/server.ts

# install dependencies and download data
yarn add @tensorflow/tfjs-node-gpu
yarn && yarn download-data

# start the server
yarn start-server

Now open another terminal and watch GPU usage. Note that if you are running the process on the same GPU as an X window server, GPU utilization will likely read higher than 3% because of that process. I’ve tested this on a dedicated GPU running no other processes by using the CUDA_VISIBLE_DEVICES env var (see the snippet after the monitoring command below).

# monitor GPU utilization
watch -n 0.1 nvidia-smi
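
For completeness, pinning the server to an otherwise idle GPU looks like this (the device index 1 is just an example):

# run the server on GPU 1 only, so the reading isn't polluted by e.g. an X server on GPU 0
CUDA_VISIBLE_DEVICES=1 yarn start-server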

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

brannondorsey commented on Jun 28, 2018 (4 reactions)

Gotcha, thanks for the clarification. I’ve revisited the char-rnn tfjs-node-gpu example I mentioned, and it does appear to be running on the GPU, since memory is allocated, but GPU utilization is ~1%. If I’m understanding you correctly, this is because tfjs-node-gpu uses TF eager mode. So I should expect the same type of model to run at ~1% GPU utilization if it were written in Python using TF eager mode as well, correct?
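
(For intuition, here is a rough, generic illustration of why op-by-op eager dispatch can leave a GPU mostly idle; this sketch is not from the original thread and says nothing about tfjs-node-gpu’s internals:)

import '@tensorflow/tfjs-node-gpu';
import * as tf from '@tensorflow/tfjs';

// Many tiny ops: each matMul/add/tanh is dispatched to the backend
// individually, so per-op launch overhead dominates and the GPU idles.
let x = tf.randomNormal([64, 64]) as tf.Tensor2D;
for (let i = 0; i < 1000; i++) {
  const next = tf.tidy(() => tf.matMul(x, x).add(1).tanh() as tf.Tensor2D);
  x.dispose();
  x = next;
}

// One big op: a single large matMul keeps the GPU busy, so utilization
// spikes even though far less JavaScript runs.
const a = tf.randomNormal([4096, 4096]);
const b = tf.randomNormal([4096, 4096]);
tf.matMul(a, b).data().then(() => console.log('done'));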

Does tfjs-node-gpu intend to add support for graph-based execution at some point in the near future? Unless I’m missing something, this “eager mode only” behavior creates some significant performance hurdles, no? In general, how does tfjs-node-gpu compare in performance to similar implementations in Keras?

I ask because I’m writing some documentation for my team and am beginning to consider a JavaScript-first approach to common high-level ML tasks. A year ago that would have seemed like a crazy idea, but with tfjs, maybe not so much. Basically, I’m curious whether tfjs-node-gpu will ever be comparable in performance to Keras and Python TensorFlow.

f4z3k4s commented on Jan 27, 2022 (0 reactions)

We experience the same thing. Running our model on the CPU takes ~400ms; running it on the GPU takes ~3000ms. This happens on a server with two NVIDIA GeForce RTX 3090s, CUDA 11.6, and cuDNN 8.3. Relevant logs:

2022-01-27 22:48:03.044007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 19758 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:65:00.0, compute capability: 8.6
2022-01-27 22:48:03.044598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22307 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:b4:00.0, compute capability: 8.6
2022-01-27 22:48:04.985189: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8302
2022-01-27 22:48:06.383271: I tensorflow/stream_executor/cuda/cuda_blas.cc:1774] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

I can confirm that CUDA is installed correctly, as I am able to use it with several other tools.

This does not happen in the browser, though: running on WebGL is much faster than CPU inference.


UPDATE: I have to admit that I was only testing with a single inference rather than hundreds or thousands. I created test suites for larger numbers of inferences, and it is indeed the copying of the model into GPU memory that takes most of the time. Once that is done, GPU inference is much faster than CPU inference:

GPU info:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:65:00.0 Off |                  N/A |
|  0%   26C    P8    34W / 390W |   2552MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:B4:00.0 Off |                  N/A |
|  0%   28C    P8    24W / 350W |      3MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    360790      C   ...9TtSrW0h-py3.7/bin/python     2549MiB |
+-----------------------------------------------------------------------------+

CPU info:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              2
Core(s) per socket:              12
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
Stepping:                        4
CPU MHz:                         1000.089
CPU max MHz:                     3200.0000
CPU min MHz:                     1000.0000
BogoMIPS:                        4600.00
Virtualization:                  VT-x
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        12 MiB
L3 cache:                        16.5 MiB
NUMA node0 CPU(s):               0-23

The following are the results of averaging 100 inferences on a hot GPU (the model is loaded into GPU memory and not disposed between model.execute calls); a sketch of this measurement loop follows the table:

Model            GPU      CPU
yolov5 s model   61.9ms   146.6ms
yolov5 m model   73.9ms   255.1ms
yolov5 l model   85.1ms   386.4ms
yolov5 x model   97.3ms   609.1ms
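
A minimal sketch of such a warm-up-then-measure loop (the model path and input shape are hypothetical, and a single-output model is assumed; loadGraphModel and model.execute are the standard tfjs GraphModel APIs):

import * as tf from '@tensorflow/tfjs-node-gpu';

async function benchmark() {
  // Hypothetical path to a converted yolov5 graph model.
  const model = await tf.loadGraphModel('file://./yolov5s_web_model/model.json');
  const input = tf.zeros([1, 640, 640, 3]); // assumed input shape

  // Warm-up: the first executions pay for copying the weights into GPU
  // memory, so keep them out of the measurement.
  for (let i = 0; i < 10; i++) {
    const out = model.execute(input) as tf.Tensor;
    await out.data();
    out.dispose();
  }

  // Hot measurement: average 100 runs without disposing the model.
  const runs = 100;
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    const out = model.execute(input) as tf.Tensor;
    await out.data(); // force the async GPU work to finish
    out.dispose();
  }
  console.log(`avg inference: ${((Date.now() - start) / runs).toFixed(1)}ms`);

  input.dispose();
}

benchmark();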