MB-MelGAN consumes a lot of memory while starting evaluation
(Issue opened after https://github.com/TensorSpeech/TensorFlowTTS/issues/354, as this appears to be a separate issue.)
I am fine-tuning the multiband_melgan.v1_24k Universal Vocoder with the command:
CUDA_VISIBLE_DEVICES=0 python examples/multiband_melgan/train_multiband_melgan.py \
--train-dir ./dump_LibriTTSFormatted/train/ \
--dev-dir ./dump_LibriTTSFormatted/valid/ \
--outdir ./outdir/MBMELGAN/MBMelgan-Tune-Experiment1 \
--config ./models/multiband_melgan.v1_24k.yaml \
--use-norm 1 \
--pretrained ./models/libritts_24k.h5
After 5000 steps it begins evaluation…
Sometimes the process gets killed while filling the shuffle buffer. Other times it fills the buffer but gets stuck in evaluation for a few minutes before being killed.
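For a rough sense of how much host RAM the shuffle buffer alone could hold, here is a back-of-the-envelope sketch. The buffer size, utterance length, hop size, and mel-bin count are illustrative assumptions, not values read from multiband_melgan.v1_24k.yaml, and the helper name is hypothetical. Substitute the numbers from your own config and dataset. This only estimates the buffer; it does not show that the buffer is what gets the process killed.

```python
def shuffle_buffer_bytes(buffer_size, audio_samples, hop_size, num_mels,
                         bytes_per_value=4):
    """Estimate host memory held by a shuffle buffer of (audio, mel) pairs.

    Assumes every buffered example stores one float32 waveform plus one
    float32 mel spectrogram with audio_samples // hop_size frames.
    """
    mel_frames = audio_samples // hop_size
    values_per_example = audio_samples + mel_frames * num_mels
    return buffer_size * values_per_example * bytes_per_value


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 buffered utterances of 10 s at 24 kHz,
    # hop size 300, 80 mel bins. Replace with your own values.
    estimate = shuffle_buffer_bytes(
        buffer_size=10_000,
        audio_samples=24_000 * 10,
        hop_size=300,
        num_mels=80,
    )
    print(f"estimated shuffle buffer: {estimate / 1024**3:.1f} GiB")
```

If the estimate is a large fraction of system RAM, comparing a run with a smaller buffer would be a cheap way to check whether the buffer matters here.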
I put training into the background and ran top as it reached the "Shuffle buffer filled" stage. SSH became extremely laggy and slow, probably because the system memory was full.
top reports python using almost 35 GB of virtual memory and ~15 GB of RAM (the remaining RAM is full).
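Since the SSH session becomes too laggy to watch top, one option is to have the process log its own memory at key points. Below is a minimal sketch using only Python's standard `resource` module (Unix only). `peak_rss_mib` and `log_memory` are hypothetical helper names, not part of TensorFlowTTS, and where to call them in the training script is an assumption. Note that this reports peak resident memory, not the virtual size that top also shows.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(tag):
    """Print and return a one-line memory report labelled with `tag`."""
    line = f"[mem] {tag}: peak RSS {peak_rss_mib():.0f} MiB"
    print(line, flush=True)
    return line


if __name__ == "__main__":
    log_memory("startup")
    # Allocate ~80 MB to show the peak moving; stands in for real work.
    block = bytearray(80 * 1024 * 1024)
    log_memory("after allocation")
```

Calling `log_memory("before eval")` and `log_memory("after eval")` around the evaluation step, with stdout redirected to a file, would leave a record even if the process is killed shortly afterwards.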
Issue Analytics
- Created 3 years ago
- Comments: 5 (4 by maintainers)
As for debugging in headless SSH, should I try the TensorFlow Profiler? It seems to show a lot of memory usage details.
Wow, this is a really nice method, I will try this out in the future 😄