nv_gpu_memory_used_bytes metric does not decrease on model unload
See original GitHub issue
From local testing with 20.01, the metric nv_gpu_memory_used_bytes exposed on /metrics does not decrease on model unload. Assuming this is expected, would there be some way to expose the actual memory used by the loaded models?
I ask because orchestrators (for example, when running inside Kubernetes) might wish to determine the memory available for new models on the server.
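Until per-model memory is exposed, an orchestrator can only read the aggregate gauge. Below is a minimal sketch of parsing it out of the Prometheus text format served on /metrics; the label names in the sample are illustrative, not taken from a real server response.

```python
def parse_gpu_memory_used(metrics_text):
    """Parse nv_gpu_memory_used_bytes samples from Prometheus
    text-format output, returning {series_name: bytes_used}."""
    usage = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip HELP/TYPE comments and unrelated metrics.
        if not line.startswith("nv_gpu_memory_used_bytes"):
            continue
        # Prometheus text format: "<name>{labels} <value>".
        series, _, value = line.rpartition(" ")
        usage[series] = float(value)
    return usage

# Example against a captured /metrics snippet (labels are illustrative):
sample = (
    '# HELP nv_gpu_memory_used_bytes GPU used memory, in bytes\n'
    '# TYPE nv_gpu_memory_used_bytes gauge\n'
    'nv_gpu_memory_used_bytes{gpu_uuid="GPU-abc"} 4.2e+09\n'
)
print(parse_gpu_memory_used(sample))
```

Note that, per the issue above, this gauge reflects total GPU memory in use (including memory a backend holds after unload), not the memory attributable to currently loaded models.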
Issue Analytics
- Created: 4 years ago
- Comments: 6 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Any deployment plan should be aware of the limitations of the different backends. TensorFlow (TF) is not a particularly good framework for inference, for a couple of reasons (this memory-usage policy being one of them). If you want a long-running “dynamic model repository” TRTIS instance, where you use the model-control APIs to load and unload models, then you need to account for TF's behavior. TRTIS provides a command-line option to limit TF to a fraction of the GPU memory, but this does not cause TF to release its memory on unload.
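As a sketch of the option mentioned above: the flag name shown here matches my recollection of the r20.01 trtserver CLI and should be verified against `trtserver --help` for your release.

```shell
# Launch TRTIS with TensorFlow capped to 40% of GPU memory.
# This limits how much TF may allocate, but does not make TF
# release that memory when a model is unloaded.
trtserver --model-repository=/models --tf-gpu-memory-fraction=0.4
```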
For a system like Kubernetes, an alternative is to use a “static model repository” TRTIS instance: TRTIS is started with a fixed set of models, and that set never changes. If a change is needed, use Kubernetes to roll out the new configuration via a rolling update or similar.
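The rolling-update approach could look like the following; the Deployment name `trtis` and the image tag are hypothetical placeholders for your own cluster.

```shell
# Point the Deployment at a new server image (or an image baked
# with the updated model repository), then let Kubernetes replace
# pods one at a time so capacity stays available during the swap.
kubectl set image deployment/trtis trtis=nvcr.io/nvidia/tensorrtserver:20.01-py3
kubectl rollout status deployment/trtis
```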
Each approach has advantages and disadvantages.
Thanks for your feedback. Very useful.