FastSpeech2 Out of Memory when allocating tensor

I’m trying to fine-tune a FastSpeech2 model, as we discussed in https://github.com/TensorSpeech/TensorFlowTTS/issues/296. For this I am using the pretrained fastspeech2.v1 model and a dataset of my own voice.

I have followed the steps in examples/mfa_extraction/README.md and examples/fastspeech2_libritts/README.md to prepare my dataset.

I have taken the existing fastspeech2.v1.yaml config and changed var_train_expr to train only embeddings.

I have printed the shape of the train dataset CharactorDurationF0EnergyMelDataset:

utt_ids shape: (15,)
inpu5_iew shape: (15, 1085)
speaker_ids shape: (15,)
duration_gts shape: (15, 1085)
f0_gts shape: (15, 1085)
energy_gts shape: (15, 1085)
mel_gts shape: (15, 7531, 80)

Here is my launch configuration:

CUDA_VISIBLE_DEVICES=0 python examples/fastspeech2_libritts/train_fastspeech2.py \
  --train-dir ./dump_briefhistory/train/ \
  --dev-dir ./dump_briefhistory/valid/ \
  --outdir ./outdir_briefhistory/ \
  --config ./briefhistory/fastspeech2_finetune.v1.yaml \
  --use-norm 1 \
  --f0-stat ./dump_briefhistory/stats_f0.npy \
  --energy-stat ./dump_briefhistory/stats_energy.npy \
  --mixed_precision 1 \
  --dataset_config preprocess/libritts_preprocess.yaml \
  --dataset_stats dump_briefhistory/stats.npy \
  --pretrained ./pretrained/model-150000.h5

My hardware/OS configuration:

Intel i7-8700 @ 3.20GHz
16GB RAM
Nvidia GeForce RTX 2070 8GB
Ubuntu 18.04.3

When I start, I get an Out Of Memory error…

2020-10-14 23:46:06.504767: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 23:46:07.322303: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-14 23:46:07.351471: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:07.351994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-10-14 23:46:07.352013: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 23:46:07.353336: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-14 23:46:07.354542: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-10-14 23:46:07.354919: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-10-14 23:46:07.356281: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-10-14 23:46:07.357015: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-10-14 23:46:07.359658: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-14 23:46:07.359831: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:07.360213: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:07.360649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-10-14 23:46:08.826174: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-10-14 23:46:08.847615: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3199980000 Hz
2020-10-14 23:46:08.848166: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x50f8fa0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-14 23:46:08.848179: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-10-14 23:46:08.910158: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:08.910730: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x50ae430 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-10-14 23:46:08.910744: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-10-14 23:46:08.910937: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:08.911223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-10-14 23:46:08.911259: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 23:46:08.911274: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-14 23:46:08.911285: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-10-14 23:46:08.911295: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-10-14 23:46:08.911305: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-10-14 23:46:08.911314: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-10-14 23:46:08.911331: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-14 23:46:08.911382: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:08.911663: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:08.911918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-10-14 23:46:08.911937: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 23:46:09.364677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-14 23:46:09.364704: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2020-10-14 23:46:09.364710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2020-10-14 23:46:09.364929: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:09.365317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:09.365711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7274 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-10-14 23:46:09,380 (train_fastspeech2:329) INFO: hop_size = 256
2020-10-14 23:46:09,380 (train_fastspeech2:329) INFO: format = npy
2020-10-14 23:46:09,380 (train_fastspeech2:329) INFO: model_type = fastspeech2
2020-10-14 23:46:09,380 (train_fastspeech2:329) INFO: fastspeech2_params = {'n_speakers': 1, 'encoder_hidden_size': 384, 'encoder_num_hidden_layers': 4, 'encoder_num_attention_heads': 2, 'encoder_attention_head_size': 192, 'encoder_intermediate_size': 1024, 'encoder_intermediate_kernel_size': 3, 'encoder_hidden_act': 'mish', 'decoder_hidden_size': 384, 'decoder_num_hidden_layers': 4, 'decoder_num_attention_heads': 2, 'decoder_attention_head_size': 192, 'decoder_intermediate_size': 1024, 'decoder_intermediate_kernel_size': 3, 'decoder_hidden_act': 'mish', 'variant_prediction_num_conv_layers': 2, 'variant_predictor_filter': 256, 'variant_predictor_kernel_size': 3, 'variant_predictor_dropout_rate': 0.5, 'num_mels': 80, 'hidden_dropout_prob': 0.2, 'attention_probs_dropout_prob': 0.1, 'max_position_embeddings': 2048, 'initializer_range': 0.02, 'output_attentions': False, 'output_hidden_states': False}
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: batch_size = 16
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: remove_short_samples = True
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: allow_cache = True
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: mel_length_threshold = 32
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: is_shuffle = True
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 5e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: var_train_expr = embeddings
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: train_max_steps = 200000
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: save_interval_steps = 5000
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: eval_interval_steps = 500
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: log_interval_steps = 200
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: num_save_intermediate_results = 1
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: train_dir = ./dump_briefhistory/train/
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: dev_dir = ./dump_briefhistory/valid/
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: use_norm = True
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: f0_stat = ./dump_briefhistory/stats_f0.npy
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: energy_stat = ./dump_briefhistory/stats_energy.npy
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: outdir = ./outdir_briefhistory/
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: config = ./briefhistory/fastspeech2_finetune.v1.yaml
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: resume =
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: verbose = 1
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: mixed_precision = True
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: dataset_config = preprocess/libritts_preprocess.yaml
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: dataset_stats = dump_briefhistory/stats.npy
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: pretrained = ./pretrained/model-150000.h5
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: version = 0.0
PRINTING TENSOR CONTENT ----------------------------
2020-10-14 23:46:09.704292: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
2020-10-14 23:46:09.709369: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
utt_ids shape: (15,)
inpu5_iew shape: (15, 1085)
speaker_ids shape: (15,)
duration_gts shape: (15, 1085)
f0_gts shape: (15, 1085)
energy_gts shape: (15, 1085)
mel_gts shape: (15, 7531, 80)
mel_lengths shape: (15,)
2020-10-14 23:46:14.814446: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-14 23:46:15.003089: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
Model: "tf_fast_speech2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embeddings (TFFastSpeechEmbe multiple                  844032
_________________________________________________________________
encoder (TFFastSpeechEncoder multiple                  11814400
_________________________________________________________________
length_regulator (TFFastSpee multiple                  0
_________________________________________________________________
decoder (TFFastSpeechDecoder multiple                  12601216
_________________________________________________________________
mel_before (Dense)           multiple                  30800
_________________________________________________________________
postnet (TFTacotronPostnet)  multiple                  4352400
_________________________________________________________________
f0_predictor (TFFastSpeechVa multiple                  493313
_________________________________________________________________
energy_predictor (TFFastSpee multiple                  493313
_________________________________________________________________
duration_predictor (TFFastSp multiple                  493313
_________________________________________________________________
f0_embeddings (Conv1D)       multiple                  3840
_________________________________________________________________
dropout_32 (Dropout)         multiple                  0
_________________________________________________________________
energy_embeddings (Conv1D)   multiple                  3840
_________________________________________________________________
dropout_33 (Dropout)         multiple                  0
=================================================================
Total params: 31,130,467
Trainable params: 29,552,579
Non-trainable params: 1,577,888
_________________________________________________________________
2020-10-14 23:46:16,304 (train_fastspeech2:411) INFO: Successfully loaded pretrained weight from ./pretrained/model-150000.h5.
[train]:   0%|                                                                                                                    | 0/200000 [00:00<?, ?it/s]/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2020-10-14 23:46:26.041045: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1924] Converted 1146/6423 nodes to float16 precision using 113 cast(s) to float16 (excluding Const and Variable casts)
2020-10-14 23:46:26.984089: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1924] Converted 0/5391 nodes to float16 precision using 0 cast(s) to float16 (excluding Const and Variable casts)
2020-10-14 23:46:37.835415: W tensorflow/core/common_runtime/bfc_allocator.cc:431] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.17GiB (rounded to 3402957824)requested by op tf_fast_speech2/decoder/layer_._0/attention/self/MatMul
Current allocation summary follows.
2020-10-14 23:46:37.835552: I tensorflow/core/common_runtime/bfc_allocator.cc:970] BFCAllocator dump for GPU_0_bfc

........

2020-10-14 23:46:37.845455: I tensorflow/core/common_runtime/bfc_allocator.cc:1038] Sum Total of in-use chunks: 3.70GiB
2020-10-14 23:46:37.845459: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] total_region_allocated_bytes_: 7628218624 memory_limit_: 7628218720 available bytes: 96 curr_region_allocation_bytes_: 8589934592
2020-10-14 23:46:37.845466: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Stats:
Limit:                      7628218720
InUse:                      3974140928
MaxInUse:                   5580161536
NumAllocs:                        3098
MaxAllocSize:                929824768
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2020-10-14 23:46:37.845499: W tensorflow/core/common_runtime/bfc_allocator.cc:439] ****_____________________________________**********************_______******************************
2020-10-14 23:46:37.845513: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at batch_matmul_op_impl.h:730 : Resource exhausted: OOM when allocating tensor with shape[15,2,7531,7531] and type half on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "examples/fastspeech2_libritts/train_fastspeech2.py", line 458, in <module>
    main()
  File "examples/fastspeech2_libritts/train_fastspeech2.py", line 450, in main
    resume=args.resume,
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 870, in fit
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 101, in run
    self._train_epoch()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 123, in _train_epoch
    self._train_step(batch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 686, in _train_step
    self.one_step_forward(batch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError:  OOM when allocating tensor with shape[15,2,7531,7531] and type half on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[node tf_fast_speech2/decoder/layer_._0/attention/self/MatMul (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_tts/models/fastspeech.py:229) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference__one_step_forward_22864]

Errors may have originated from an input operation.
Input Source operations connected to node tf_fast_speech2/decoder/layer_._0/attention/self/MatMul:
 tf_fast_speech2/decoder/layer_._0/attention/self/transpose_1 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_tts/models/fastspeech.py:214)

Function call stack:
_one_step_forward

Any idea what the cause may be?
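
For reference, the size of the failing allocation can be reproduced from the tensor shape in the error message. This is a rough sketch, assuming the `half` type takes 2 bytes per element; the helper name `attention_scores_bytes` is made up for illustration and is not part of TensorFlowTTS.

```python
def attention_scores_bytes(batch, heads, query_len, key_len, bytes_per_element=2):
    """Bytes needed for one tensor of shape [batch, heads, query_len, key_len]."""
    return batch * heads * query_len * key_len * bytes_per_element


# Shape copied from the OOM message: [15, 2, 7531, 7531], type half.
needed = attention_scores_bytes(15, 2, 7531, 7531)
print(needed)                     # 3402957660
print(round(needed / 2**30, 2))   # 3.17 (GiB)

# Hypothetical comparison: the same batch padded to 1000 mel frames instead.
shorter = attention_scores_bytes(15, 2, 1000, 1000)
print(round(shorter / 2**20, 1))  # 57.2 (MiB)
```

The first figure is close to the 3402957824 bytes the allocator reports after rounding. The tensor grows with the square of the padded mel length, so the 7531-frame dimension in mel_gts is what makes this single allocation so large.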

Top GitHub Comments

ZDisket commented, Oct 15, 2020

@OscarVanL I personally haven’t encountered any problems with very trimmed clips, but I would recommend manually adding some silence.

OscarVanL commented, Oct 15, 2020

Awesome! I got it working with the trimmed clips without any errors. It’s now training! 😃 It’s going at 3-3.5it/s
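
Since training worked once the clips were trimmed, it may help to check for over-long utterances before launching a run. Below is a minimal sketch, not the project's own tooling: the helpers `frames_to_seconds` and `find_long_clips`, the clip ids, and the 2000-frame threshold are all hypothetical. Only `hop_size = 256` and the 7531-frame length come from the log above, and the 24000 Hz sample rate is an assumption that should be checked against the preprocessing config.

```python
def frames_to_seconds(num_frames, hop_size, sample_rate):
    """Approximate clip duration implied by a mel-spectrogram length."""
    return num_frames * hop_size / sample_rate


def find_long_clips(mel_lengths, max_frames):
    """Return the ids (sorted) whose mel length exceeds max_frames."""
    return sorted(utt_id for utt_id, n in mel_lengths.items() if n > max_frames)


# Made-up ids and lengths for illustration; only 7531 appears in the log.
mel_lengths = {"clip_a": 412, "clip_b": 7531, "clip_c": 903}

# Arbitrary example threshold, not a documented limit of the model.
too_long = find_long_clips(mel_lengths, max_frames=2000)
print(too_long)

# hop_size = 256 is from the log; sample_rate = 24000 is an assumption.
for utt_id in too_long:
    seconds = frames_to_seconds(mel_lengths[utt_id], hop_size=256, sample_rate=24000)
    print(utt_id, round(seconds, 1), "seconds")
```

Because each batch is padded to its longest item, a single very long clip inflates memory use for the whole batch, so listing the outliers first shows which clips to trim or split.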
