MB-MelGAN consumes a lot of memory while starting evaluation

(Opened separately from https://github.com/TensorSpeech/TensorFlowTTS/issues/354, as this appears to be a different issue.)

I am fine-tuning the multiband_melgan.v1_24k Universal Vocoder with the following command:

CUDA_VISIBLE_DEVICES=0 python examples/multiband_melgan/train_multiband_melgan.py \
  --train-dir ./dump_LibriTTSFormatted/train/ \
  --dev-dir ./dump_LibriTTSFormatted/valid/ \
  --outdir ./outdir/MBMELGAN/MBMelgan-Tune-Experiment1 \
  --config ./models/multiband_melgan.v1_24k.yaml \
  --use-norm 1 \
  --pretrained ./models/libritts_24k.h5

After 5000 steps it begins evaluation…

Sometimes the process is killed while filling the shuffle buffer. Other times it fills the buffer but then stalls in evaluation for several minutes before being killed.
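The shuffle-buffer stage is most likely tf.data's `Dataset.shuffle`, which is documented to fill a buffer with `buffer_size` elements and then emit random elements from it, replacing each emitted element with the next input. Every buffered element stays in host RAM, so the buffer's footprint is roughly `buffer_size` times the element size. The pure-Python sketch below imitates that documented strategy (it is not TensorFlow's implementation), records how many elements are held at once, and estimates the audio footprint of a full buffer. The helper names, the clip count and length, and the float32 sample size are hypothetical; whether TensorFlowTTS sizes its buffer to the whole dataset is an assumption to verify in its dataset loader.

```python
import random
from typing import Dict, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    items: Iterable[T],
    buffer_size: int,
    seed: int = 0,
    stats: Optional[Dict[str, int]] = None,
) -> Iterator[T]:
    """Shuffle a stream with a fixed-size buffer (tf.data's documented strategy).

    If `stats` is given, stats["peak"] is set to the largest number of
    elements held in the buffer at any time.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    if stats is not None:
        stats["peak"] = 0
    for item in items:
        if len(buffer) < buffer_size:
            # Filling phase: elements are only stored, nothing is emitted yet.
            buffer.append(item)
            if stats is not None:
                stats["peak"] = max(stats["peak"], len(buffer))
        else:
            # Buffer full: emit a random buffered element, then reuse its slot.
            idx = rng.randrange(len(buffer))
            yield buffer[idx]
            buffer[idx] = item
    # Input exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def shuffle_buffer_bytes(
    buffer_size: int,
    seconds_per_item: float,
    sample_rate: int = 24000,
    bytes_per_sample: int = 4,
) -> int:
    """Rough audio-only size of a full buffer (hypothetical float32 samples).

    Mel spectrograms or other per-example tensors would add to this.
    """
    return buffer_size * int(seconds_per_item * sample_rate) * bytes_per_sample


if __name__ == "__main__":
    stats: Dict[str, int] = {}
    shuffled = list(buffered_shuffle(range(100), buffer_size=10, stats=stats))
    print(sorted(shuffled) == list(range(100)), stats["peak"])  # True 10
    # Hypothetical: 10,000 ten-second 24 kHz clips held in the buffer at once.
    print(shuffle_buffer_bytes(10_000, 10.0) / 1e9, "GB")  # 9.6 GB
```

With those hypothetical numbers the audio alone needs about 9.6 GB, which illustrates how a buffer sized close to the dataset can exhaust host RAM. A smaller `buffer_size` would trade shuffle quality for memory.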

I put training into the background and opened top once it reached the "Shuffle buffer filled" stage. SSH became extremely laggy and slow, probably because system memory was full. top reported that python was using almost 35GB of virtual memory and ~15GB of RAM, and the rest of the RAM was full.
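Instead of keeping top open over a laggy SSH session, the training process could log its own memory figures. A minimal sketch, assuming a Linux host: `/proc/<pid>/status` reports `VmSize` (virtual size, comparable to top's VIRT column) and `VmRSS` (resident RAM, comparable to RES), plus their peaks `VmPeak` and `VmHWM`. The helper names are hypothetical; calling `read_process_memory()` periodically from the training loop, or from a separate script with the training PID, is one possible way to use it.

```python
import os
from typing import Dict, Union

# Fields of interest in /proc/<pid>/status (Linux only), all reported in kB:
# VmSize ~ top's VIRT, VmRSS ~ top's RES; VmPeak and VmHWM are their peaks.
FIELDS = ("VmPeak", "VmSize", "VmHWM", "VmRSS")


def parse_proc_status(text: str) -> Dict[str, int]:
    """Extract the memory fields (in kB) from the text of /proc/<pid>/status."""
    result: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in FIELDS:
            # Lines look like "VmRSS:\t15728640 kB"; keep only the number.
            result[key] = int(value.split()[0])
    return result


def read_process_memory(pid: Union[int, str] = "self") -> Dict[str, int]:
    """Read memory fields for a process; returns {} where /proc is unavailable."""
    path = f"/proc/{pid}/status"
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return parse_proc_status(f.read())


def kb_to_gib(kb: int) -> float:
    """Convert kB (as reported by /proc) to GiB."""
    return kb / (1024 ** 2)


if __name__ == "__main__":
    mem = read_process_memory()
    for name in FIELDS:
        if name in mem:
            print(f"{name}: {kb_to_gib(mem[name]):.2f} GiB")
```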

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
OscarVanL commented, Nov 7, 2020

As for debugging over a headless SSH session, should I try the TensorFlow Profiler? It seems to show detailed memory usage.

0 reactions
OscarVanL commented, Nov 8, 2020

Here is a small tutorial for easier debugging with PyCharm on a remote machine: https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html 😃

Wow, this is a really nice method, I will try this out in the future 😄

Top Results From Across the Web

Memory Hygiene With TensorFlow During Model Training and ...
It can be clearly observed that GPU has 10 GB of memory and of which only 489 MB is occupied. Now let's load...

Getting started with TensorFlow large model support - IBM
The data transformations produce tensors which will consume GPU memory during model execution. This memory overhead can limit the data resolution, batch sizes, ......

Your Jest Tests are Leaking Memory
So, due to no code that you wrote yourself, your Jest tests can start leaking memory. Of course, you could also monkey-patch a...

How to avoid excessive memory usage while training multiple ...
Clear memory with tf.keras.backend.clear_session() after each model trains. Keras documentation states the following:.

PPO trainer eating up memory - RLlib - Ray
This is kinda frustrating as it makes it hard to see what my metrics are doing over a lot of iterations, and I...