Cannot train MFA-aligned FastSpeech2 with gradient accumulator: ValueError: None values not supported.
I tried training FastSpeech2 on LJSpeech resampled to 24 kHz, with gradient_accumulation_steps: 1, batch size 128, and mixed precision on a Tesla T4 (14 GB of VRAM), and got this:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Traceback (most recent call last):
File "/content/TensorflowTTS/ttsexamples/fastspeech2/train_fastspeech2.py", line 436, in <module>
main()
File "/content/TensorflowTTS/ttsexamples/fastspeech2/train_fastspeech2.py", line 428, in main
resume=args.resume,
File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 1002, in fit
self.run()
File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 103, in run
self._train_epoch()
File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 125, in _train_epoch
self._train_step(batch)
File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 780, in _train_step
self.one_step_forward(batch)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 823, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 697, in _initialize
*args, **kwds))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2855, in _get_concrete_function_internal_garbage_collected
graph_function, _, _ = self._maybe_define_function(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 600, in wrapped_fn
return weak_wrapped_fn().__wrapped__(*args, **kwds)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 973, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py:788 _one_step_forward *
per_replica_losses = self._strategy.run(
/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py:835 _one_step_forward_per_replica *
self._optimizer.apply_gradients(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:380 apply_gradients **
args=(grads_and_vars, name, experimental_aggregate_gradients))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2715 merge_call
return self._merge_call(merge_fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2722 _merge_call
return merge_fn(self._strategy, *args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:410 _apply_gradients_cross_replica **
do_not_apply_fn)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/smart_cond.py:59 smart_cond
name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
return target(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507 new_func
return func(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/control_flow_ops.py:1180 cond
return cond_v2.cond_v2(pred, true_fn, false_fn, name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/cond_v2.py:85 cond_v2
op_return_value=pred)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py:986 func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:396 apply_fn
args=(grads, wrapped_vars, name, experimental_aggregate_gradients))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/one_device_strategy.py:367 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:420 _apply_gradients
experimental_aggregate_gradients=experimental_aggregate_gradients)
/content/TensorflowTTS/tensorflow_tts/optimizers/adamweightdecay.py:124 apply_gradients
(grads, _) = tf.clip_by_global_norm(grads, clip_norm=clip_norm)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
return target(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/clip_ops.py:352 clip_by_global_norm
constant_op.constant(1.0, dtype=use_norm.dtype) / clip_norm)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:1124 binary_op_wrapper
return func(x, y, name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
return target(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:1296 truediv
return _truediv_python3(x, y, name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:1222 _truediv_python3
y = ops.convert_to_tensor(y, dtype_hint=x.dtype.base_dtype, name="y")
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:1499 convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py:338 _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py:264 constant
allow_broadcast=True)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py:282 _constant_impl
allow_broadcast=allow_broadcast))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:444 make_tensor_proto
raise ValueError("None values not supported.")
ValueError: None values not supported.
[train]: 0% 0/150000 [01:14<?, ?it/s]
Any ideas?
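For reference, the bottom of the trace points at adamweightdecay.py calling tf.clip_by_global_norm and TF then dividing by clip_norm, so one plausible cause (an assumption, not confirmed here) is that the clip-norm value reached apply_gradients as None, e.g. because it was never set in the config. A minimal sketch of that failure mode, not the project's actual code:

```python
import tensorflow as tf

# Minimal reproduction of the failure at the bottom of the traceback:
# clip_by_global_norm divides by clip_norm internally, so a None value
# fails when TF tries to convert it to a tensor.
grads = [tf.constant([1.0, 2.0]), tf.constant([3.0])]

clip_norm = None  # stands in for a clipping value that was never set

try:
    clipped, global_norm = tf.clip_by_global_norm(grads, clip_norm=clip_norm)
except (TypeError, ValueError) as e:
    print(e)  # clip_norm=None cannot be converted to a tensor

# With a concrete value the same call works fine:
clipped, global_norm = tf.clip_by_global_norm(grads, clip_norm=5.0)
```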
@dathudeptrai
Now it works well.
Normally we train with batch_size 16, which gives about 3 it/s; since you are training with batch size 128, 2.7 s/it is normal.
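A quick back-of-the-envelope check of that comparison (using the figures quoted above, not measured here) shows both settings process roughly the same number of samples per second, which is why the slower iteration time is expected:

```python
# Rough throughput comparison based on the figures quoted in the comment.
samples_per_sec_small = 16 * 3.0    # batch 16 at ~3 it/s   -> ~48 samples/s
samples_per_sec_large = 128 / 2.7   # batch 128 at ~2.7 s/it -> ~47 samples/s
print(samples_per_sec_small, samples_per_sec_large)
```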