Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

nv_gpu_memory_used_bytes metric does not decrease on model unload

See original GitHub issue

From local testing with 20.01, the metric nv_gpu_memory_used_bytes exposed on /metrics does not decrease on model unload. Assuming this is expected, would there be some way to expose the actual memory used by the loaded models?

I ask because orchestrators (for example, when running inside Kubernetes) might wish to determine the available memory for new models on the server.
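For anyone who wants to watch this value directly, here is a minimal sketch of reading it over HTTP. It assumes the server exposes Prometheus-format metrics at http://localhost:8002/metrics (the default metrics port) and that samples follow the standard text exposition format; adjust the URL for your deployment.

import urllib.request

METRICS_URL = "http://localhost:8002/metrics"  # assumption: default metrics endpoint

def gpu_memory_used_bytes(url: str = METRICS_URL) -> dict:
    """Return {metric_labels: value} for every nv_gpu_memory_used_bytes sample."""
    text = urllib.request.urlopen(url).read().decode("utf-8")
    samples = {}
    for line in text.splitlines():
        # Samples look like: nv_gpu_memory_used_bytes{gpu_uuid="GPU-..."} 1.2e+09
        if line.startswith("nv_gpu_memory_used_bytes"):
            name_and_labels, _, value = line.rpartition(" ")
            samples[name_and_labels] = float(value)
    return samples

if __name__ == "__main__":
    for labels, used in gpu_memory_used_bytes().items():
        print(f"{labels}: {used / 2**20:.1f} MiB used")

Polling this around a load/unload cycle is enough to reproduce the behaviour described above: the value rises on load but does not fall back after unload.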

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
deadeyegoodwin commented, Feb 3, 2020

Any deployment plan should be aware of the limitations of the different backends. TF is not a particularly good framework for inference for a couple of reasons (this memory usage policy being one of them). If you want to have a long-running “dynamic model repository” TRTIS instance where you use the model-control APIs to load and unload models, then you need to account for TF’s memory behavior. TRTIS provides a command-line option to limit TF to a fraction of the GPU memory… but this doesn’t cause it to release its memory.

For a system like Kubernetes, an alternative is to use a “static model repository” TRTIS instance, where TRTIS is started/initialized with some set of models and that set never changes. If a change is needed, use k8s to change the configuration via a rolling update or similar.

Each approach has advantages and disadvantages.
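Given that the reported value only ever grows, one rough workaround for an orchestrator running a dynamic model repository is to sample the metric immediately before and after each load and track the delta itself. The sketch below assumes the gpu_memory_used_bytes() helper from the earlier snippet; do_load is a hypothetical callable standing in for whatever model-control mechanism you use, and concurrent loads or other GPU allocations would skew the estimate.

from typing import Callable, Dict

model_memory_bytes: Dict[str, float] = {}  # model name -> estimated GPU footprint

def record_load(model_name: str, do_load: Callable[[], None]) -> float:
    """Load a model and record the resulting increase in reported GPU memory."""
    before = sum(gpu_memory_used_bytes().values())
    do_load()  # hypothetical hook: trigger the load via your model-control call
    after = sum(gpu_memory_used_bytes().values())
    delta = max(0.0, after - before)
    model_memory_bytes[model_name] = delta
    return delta

This only estimates what each model consumed at load time; as noted above, TF keeps the memory allocated after unload, so the reported total will not drop back.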

0 reactions
cliveseldon commented, Feb 4, 2020

Thanks for your feedback. Very useful.

Read more comments on GitHub >

Top Results From Across the Web

nv_gpu_memory_used_bytes metric does not decrease on ...
From local testing with 20.01 the metric nv_gpu_memory_used_bytes exposed on /metrics does not decrease on model unload.
Read more >
How can I solve 'ran out of gpu memory' in TensorFlow
I use train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size=batch_size) and then iterate as for x_batch, y_batch in ...
Read more >
Improving GPU Memory Oversubscription Performance
In this post, we dive into the performance characteristics of a micro-benchmark that stresses different memory access patterns for the ...
Read more >
How To Fit a Bigger Model and Train It Faster - Hugging Face
In this section we have a look at a few tricks to reduce the memory ... That looks good: the GPU memory is...
Read more >
nvgpu · PyPI
Often we want to train a ML model on one of GPUs installed on a multi-GPU machine. Since TensorFlow allocates all memory, only...
Read more >
