FastSpeech2 Out of Memory when allocating tensor
I'm trying to fine-tune a FastSpeech2 model, as we discussed in https://github.com/TensorSpeech/TensorFlowTTS/issues/296. For this I am using the pretrained fastspeech2.v1 model and a dataset of my own voice.
I followed the steps in examples/mfa_extraction/README.md and examples/fastspeech2_libritts/README.md to prepare my dataset.
I took the existing fastspeech2.v1.yaml config and changed var_train_expr to train only the embeddings.
I have printed the shapes from the train dataset CharactorDurationF0EnergyMelDataset:
utt_ids shape: (15,)
inpu5_iew shape: (15, 1085)
speaker_ids shape: (15,)
duration_gts shape: (15, 1085)
f0_gts shape: (15, 1085)
energy_gts shape: (15, 1085)
mel_gts shape: (15, 7531, 80)
Here is my launch configuration:
CUDA_VISIBLE_DEVICES=0 python examples/fastspeech2_libritts/train_fastspeech2.py \
--train-dir ./dump_briefhistory/train/ \
--dev-dir ./dump_briefhistory/valid/ \
--outdir ./outdir_briefhistory/ \
--config ./briefhistory/fastspeech2_finetune.v1.yaml \
--use-norm 1 \
--f0-stat ./dump_briefhistory/stats_f0.npy \
--energy-stat ./dump_briefhistory/stats_energy.npy \
--mixed_precision 1 \
--dataset_config preprocess/libritts_preprocess.yaml \
--dataset_stats dump_briefhistory/stats.npy \
--pretrained ./pretrained/model-150000.h5
My hardware/OS configuration: Intel i7-8700 @ 3.20GHz, 16GB RAM, Nvidia GeForce RTX 2070 8GB, Ubuntu 18.04.3
When I start, I get an Out Of Memory error…
2020-10-14 23:46:06.504767: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 23:46:07.322303: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-14 23:46:07.351471: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:07.351994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-10-14 23:46:07.352013: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 23:46:07.353336: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-14 23:46:07.354542: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-10-14 23:46:07.354919: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-10-14 23:46:07.356281: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-10-14 23:46:07.357015: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-10-14 23:46:07.359658: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-14 23:46:07.359831: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:07.360213: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:07.360649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-10-14 23:46:08.826174: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-10-14 23:46:08.847615: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3199980000 Hz
2020-10-14 23:46:08.848166: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x50f8fa0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-14 23:46:08.848179: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-10-14 23:46:08.910158: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:08.910730: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x50ae430 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-10-14 23:46:08.910744: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-10-14 23:46:08.910937: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:08.911223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-10-14 23:46:08.911259: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 23:46:08.911274: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-14 23:46:08.911285: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-10-14 23:46:08.911295: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-10-14 23:46:08.911305: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-10-14 23:46:08.911314: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-10-14 23:46:08.911331: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-14 23:46:08.911382: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:08.911663: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:08.911918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-10-14 23:46:08.911937: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 23:46:09.364677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-14 23:46:09.364704: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-10-14 23:46:09.364710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-10-14 23:46:09.364929: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:09.365317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 23:46:09.365711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7274 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-10-14 23:46:09,380 (train_fastspeech2:329) INFO: hop_size = 256
2020-10-14 23:46:09,380 (train_fastspeech2:329) INFO: format = npy
2020-10-14 23:46:09,380 (train_fastspeech2:329) INFO: model_type = fastspeech2
2020-10-14 23:46:09,380 (train_fastspeech2:329) INFO: fastspeech2_params = {'n_speakers': 1, 'encoder_hidden_size': 384, 'encoder_num_hidden_layers': 4, 'encoder_num_attention_heads': 2, 'encoder_attention_head_size': 192, 'encoder_intermediate_size': 1024, 'encoder_intermediate_kernel_size': 3, 'encoder_hidden_act': 'mish', 'decoder_hidden_size': 384, 'decoder_num_hidden_layers': 4, 'decoder_num_attention_heads': 2, 'decoder_attention_head_size': 192, 'decoder_intermediate_size': 1024, 'decoder_intermediate_kernel_size': 3, 'decoder_hidden_act': 'mish', 'variant_prediction_num_conv_layers': 2, 'variant_predictor_filter': 256, 'variant_predictor_kernel_size': 3, 'variant_predictor_dropout_rate': 0.5, 'num_mels': 80, 'hidden_dropout_prob': 0.2, 'attention_probs_dropout_prob': 0.1, 'max_position_embeddings': 2048, 'initializer_range': 0.02, 'output_attentions': False, 'output_hidden_states': False}
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: batch_size = 16
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: remove_short_samples = True
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: allow_cache = True
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: mel_length_threshold = 32
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: is_shuffle = True
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 5e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: var_train_expr = embeddings
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: train_max_steps = 200000
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: save_interval_steps = 5000
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: eval_interval_steps = 500
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: log_interval_steps = 200
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: num_save_intermediate_results = 1
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: train_dir = ./dump_briefhistory/train/
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: dev_dir = ./dump_briefhistory/valid/
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: use_norm = True
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: f0_stat = ./dump_briefhistory/stats_f0.npy
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: energy_stat = ./dump_briefhistory/stats_energy.npy
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: outdir = ./outdir_briefhistory/
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: config = ./briefhistory/fastspeech2_finetune.v1.yaml
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: resume =
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: verbose = 1
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: mixed_precision = True
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: dataset_config = preprocess/libritts_preprocess.yaml
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: dataset_stats = dump_briefhistory/stats.npy
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: pretrained = ./pretrained/model-150000.h5
2020-10-14 23:46:09,381 (train_fastspeech2:329) INFO: version = 0.0
PRINTING TENSOR CONTENT ----------------------------
2020-10-14 23:46:09.704292: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
2020-10-14 23:46:09.709369: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
utt_ids shape: (15,)
inpu5_iew shape: (15, 1085)
speaker_ids shape: (15,)
duration_gts shape: (15, 1085)
f0_gts shape: (15, 1085)
energy_gts shape: (15, 1085)
mel_gts shape: (15, 7531, 80)
mel_lengths shape: (15,)
2020-10-14 23:46:14.814446: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-14 23:46:15.003089: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
Model: "tf_fast_speech2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embeddings (TFFastSpeechEmbe multiple 844032
_________________________________________________________________
encoder (TFFastSpeechEncoder multiple 11814400
_________________________________________________________________
length_regulator (TFFastSpee multiple 0
_________________________________________________________________
decoder (TFFastSpeechDecoder multiple 12601216
_________________________________________________________________
mel_before (Dense) multiple 30800
_________________________________________________________________
postnet (TFTacotronPostnet) multiple 4352400
_________________________________________________________________
f0_predictor (TFFastSpeechVa multiple 493313
_________________________________________________________________
energy_predictor (TFFastSpee multiple 493313
_________________________________________________________________
duration_predictor (TFFastSp multiple 493313
_________________________________________________________________
f0_embeddings (Conv1D) multiple 3840
_________________________________________________________________
dropout_32 (Dropout) multiple 0
_________________________________________________________________
energy_embeddings (Conv1D) multiple 3840
_________________________________________________________________
dropout_33 (Dropout) multiple 0
=================================================================
Total params: 31,130,467
Trainable params: 29,552,579
Non-trainable params: 1,577,888
_________________________________________________________________
2020-10-14 23:46:16,304 (train_fastspeech2:411) INFO: Successfully loaded pretrained weight from ./pretrained/model-150000.h5.
[train]: 0%| | 0/200000 [00:00<?, ?it/s]/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2020-10-14 23:46:26.041045: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1924] Converted 1146/6423 nodes to float16 precision using 113 cast(s) to float16 (excluding Const and Variable casts)
2020-10-14 23:46:26.984089: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1924] Converted 0/5391 nodes to float16 precision using 0 cast(s) to float16 (excluding Const and Variable casts)
2020-10-14 23:46:37.835415: W tensorflow/core/common_runtime/bfc_allocator.cc:431] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.17GiB (rounded to 3402957824)requested by op tf_fast_speech2/decoder/layer_._0/attention/self/MatMul
Current allocation summary follows.
2020-10-14 23:46:37.835552: I tensorflow/core/common_runtime/bfc_allocator.cc:970] BFCAllocator dump for GPU_0_bfc
........
2020-10-14 23:46:37.845455: I tensorflow/core/common_runtime/bfc_allocator.cc:1038] Sum Total of in-use chunks: 3.70GiB
2020-10-14 23:46:37.845459: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] total_region_allocated_bytes_: 7628218624 memory_limit_: 7628218720 available bytes: 96 curr_region_allocation_bytes_: 8589934592
2020-10-14 23:46:37.845466: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Stats:
Limit: 7628218720
InUse: 3974140928
MaxInUse: 5580161536
NumAllocs: 3098
MaxAllocSize: 929824768
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2020-10-14 23:46:37.845499: W tensorflow/core/common_runtime/bfc_allocator.cc:439] ****_____________________________________**********************_______******************************
2020-10-14 23:46:37.845513: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at batch_matmul_op_impl.h:730 : Resource exhausted: OOM when allocating tensor with shape[15,2,7531,7531] and type half on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "examples/fastspeech2_libritts/train_fastspeech2.py", line 458, in <module>
main()
File "examples/fastspeech2_libritts/train_fastspeech2.py", line 450, in main
resume=args.resume,
File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 870, in fit
self.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 101, in run
self._train_epoch()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 123, in _train_epoch
self._train_step(batch)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 686, in _train_step
self.one_step_forward(batch)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
cancellation_manager=cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 550, in call
ctx=ctx)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[15,2,7531,7531] and type half on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node tf_fast_speech2/decoder/layer_._0/attention/self/MatMul (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_tts/models/fastspeech.py:229) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference__one_step_forward_22864]
Errors may have originated from an input operation.
Input Source operations connected to node tf_fast_speech2/decoder/layer_._0/attention/self/MatMul:
tf_fast_speech2/decoder/layer_._0/attention/self/transpose_1 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_tts/models/fastspeech.py:214)
Function call stack:
_one_step_forward
Any idea what the cause may be?
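For reference, the size of the tensor in the error can be checked by hand. The sketch below only multiplies out the shape reported in the log, [15, 2, 7531, 7531] in half precision. The helper names are made up for illustration, and the 256-byte rounding granularity is an assumption about the allocator, chosen because it reproduces the "rounded to" figure in the warning.

```python
def attention_scores_bytes(batch, heads, frames, bytes_per_value=2):
    """Size in bytes of one [batch, heads, frames, frames] attention score tensor."""
    return batch * heads * frames * frames * bytes_per_value


def round_up(n, multiple=256):
    """Round n up to a multiple (256 bytes is an assumed allocator granularity)."""
    return -(-n // multiple) * multiple


# Shape taken from the OOM message; float16 ("half") uses 2 bytes per value.
raw = attention_scores_bytes(batch=15, heads=2, frames=7531)
print(raw)                    # 3402957660
print(round_up(raw))          # 3402957824, the figure in the allocator warning
print(round(raw / 2**30, 2))  # 3.17 (GiB)
```

The cost grows with the square of the mel length, so a batch padded to 7531 frames needs about 3.17 GiB for a single attention score tensor in the first decoder layer, while the allocator reports only about 3.70 GiB in use out of a 7.1 GiB limit at that point.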
@OscarVanL I personally haven’t encountered any problems with very trimmed clips, but I would recommend manually adding some silence.
Awesome! I got it working with the trimmed clips without any errors. It’s now training! 😃 It’s going at 3-3.5it/s
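Since training worked once the clips were trimmed, a rough way to pick a clip length is to invert the same arithmetic: choose a memory budget for one attention score tensor and solve for the longest mel length that fits. This is a minimal sketch, not part of TensorFlowTTS. The function names and the 1 GiB budget are arbitrary illustrations, hop_size = 256 comes from the training log above, and the 24000 Hz sample rate is an assumption about the preprocessing config. It needs Python 3.8+ for math.isqrt.

```python
import math


def max_frames_for_budget(budget_bytes, batch, heads, bytes_per_value=2):
    """Largest T with batch * heads * T * T * bytes_per_value <= budget_bytes."""
    return math.isqrt(budget_bytes // (batch * heads * bytes_per_value))


def frames_to_seconds(frames, hop_size=256, sample_rate=24000):
    """Approximate audio duration of a mel spectrogram with the given frame count.

    hop_size is from the log; sample_rate is an assumed value.
    """
    return frames * hop_size / sample_rate


# Hypothetical budget: 1 GiB per attention tensor, batch_size 16, 2 heads, float16.
limit = max_frames_for_budget(2**30, batch=16, heads=2)
print(limit)                                 # 4096
print(round(frames_to_seconds(limit), 1))    # 43.7
print(round(frames_to_seconds(7531), 1))     # 80.3
```

Under these assumptions the longest clip in the failing batch was around 80 seconds of audio. Every attention layer holds a tensor of this size during training, so the practical limit is likely well below what a single-tensor budget suggests.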