`CaseOptimizer` broken on multi-GPU
Describe the bug
When training a model using the `CaseOptimizer` on multiple GPUs (4 in my case; it breaks on both P100 and V100), I get the following error:
WARNING:tensorflow:There is non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
distributed training: False
Train on 60000 samples
60000/60000 [==============================] - 4s 61us/sample - loss: 8.2390
Successfully fitted model
distributed training: True
Train on 60000 samples
INFO:tensorflow:Error reported to Coordinator: list index out of range
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/coordinator.py", line 297, in stop_on_exception
yield
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 190, in _call_for_each_replica
**merge_kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 446, in _distributed_apply
ds_reduce_util.ReduceOp.SUM, grads_and_vars)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1481, in batch_reduce_to
return self._batch_reduce_to(reduce_op, value_destination_pairs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 707, in _batch_reduce_to
value_destination_pairs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/cross_device_ops.py", line 317, in batch_reduce
value_destination_pairs[0][0].values) == 1:
IndexError: list index out of range
32/60000 [..............................] - ETA: 10:32Exception raised:
list index out of range
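The final frame fails while indexing `value_destination_pairs[0][0]`, which raises this `IndexError` whenever the list of pairs is empty. One possible reading, not confirmed in this issue, is that an optimizer was asked to apply an empty set of gradients. The sketch below uses plain Python stand-ins (`split_by_predicate` and `first_value` are hypothetical helpers, not Larq or TensorFlow functions) to show how a predicate that matches nothing leaves an empty group, and how indexing that group produces the same message.

```python
def split_by_predicate(grads_and_vars, predicate):
    """Hypothetical helper: partition (gradient, variable) pairs by a variable predicate."""
    matched = [gv for gv in grads_and_vars if predicate(gv[1])]
    unmatched = [gv for gv in grads_and_vars if not predicate(gv[1])]
    return matched, unmatched


def first_value(value_destination_pairs):
    """Mimics the `value_destination_pairs[0][0]` access seen in the traceback."""
    return value_destination_pairs[0][0]


# Stand-in data: floats for gradients, strings for variables.
pairs = [(0.1, "layer_1/kernel:0"), (0.2, "layer_2/kernel:0")]
matched, unmatched = split_by_predicate(pairs, lambda var: False)

try:
    first_value(matched)  # `matched` is empty, so indexing fails
    message = None
except IndexError as err:
    message = str(err)
print(message)
```

This only illustrates the failure mode of the indexing expression; it does not establish where the empty list comes from in the real code path.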
To Reproduce
import contextlib
import numpy as np
import tensorflow.keras as keras
import larq as lq
import tensorflow as tf


def get_model():
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    model.add(
        lq.layers.QuantDense(
            units=10,
            input_quantizer="ste_sign",
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            name="layer_1"
        )
    )
    model.add(keras.layers.Dense(units=10, name="layer_2"))

    def is_layer_1(var: tf.Variable) -> bool:
        layer_name = var.name.split("/")[-2]
        return layer_name == "layer_1"

    optimizer = lq.optimizers.CaseOptimizer(
        (is_layer_1, keras.optimizers.Adam()),
        default_optimizer=keras.optimizers.Adam(),
    )
    # optimizer = keras.optimizers.Adam()
    model.compile(
        optimizer=optimizer, loss="sparse_categorical_crossentropy"
    )
    return model


def attempt_fit(distributed_training=False):
    fashion_mnist = keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    train_images = train_images / 255.0
    test_images = test_images / 255.0
    with strategy.scope() if distributed_training else contextlib.nullcontext():
        model = get_model()
        model.fit(train_images, train_labels, epochs=1)


if __name__ == "__main__":
    keras.backend.clear_session()
    strategy = tf.distribute.MirroredStrategy()
    for distributed_training in [False, True]:
        print("distributed training: ", distributed_training)
        try:
            attempt_fit(distributed_training)
            print("Successfully fitted model")
        except Exception as e:
            print("Exception raised: \n", e)
        print()
For simplicity, you can change the predicate of the optimizer to `lambda x: False`; it makes no difference whether the predicate actually selects any layers. Using `keras.optimizers.Adam` instead of the `CaseOptimizer` works just fine.
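For reference, the `is_layer_1` predicate in the script depends only on the variable's name, so its behavior can be checked without TensorFlow. The sketch below is a minimal illustration that assumes variable names of the form `layer_name/kernel:0`; the `SimpleNamespace` objects are stand-ins for `tf.Variable`, and the deeper `nested` name is purely hypothetical.

```python
from types import SimpleNamespace


def is_layer_1(var) -> bool:
    # Same logic as the script: take the second-to-last path component.
    layer_name = var.name.split("/")[-2]
    return layer_name == "layer_1"


# Stand-ins for tf.Variable; only the `.name` attribute is used.
kernel_1 = SimpleNamespace(name="layer_1/kernel:0")
kernel_2 = SimpleNamespace(name="layer_2/kernel:0")
# Hypothetical name with one extra path component after the kernel.
nested = SimpleNamespace(name="layer_1/kernel/extra:0")

results = [is_layer_1(v) for v in (kernel_1, kernel_2, nested)]
print(results)
```

The predicate is sensitive to how deeply the name is nested: the third stand-in belongs to `layer_1` by its prefix but is not selected, because the second-to-last component is `kernel`.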
Expected behavior
I expected it to train, as it does in the single-GPU case.
Environment
TensorFlow version: 2.0.0
Larq version: 0.8.3
Issue Analytics
- State:
- Created 4 years ago
- Comments:10 (9 by maintainers)
Top GitHub Comments
Hi, I found that the case optimizer works for both TF 2.0 and 2.1. The original script above can be fixed by calling keras.backend.clear_session() between the single-GPU and multi-GPU tests (notebook). I did encounter the error "SystemError: <built-in function Flatten> returned a result with an error set", but only when running in reduced precision.
Could someone double-check whether this solves the problems with the case optimizer? If so, we can close this issue.
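The suggestion above amounts to resetting state before every run rather than once at startup. A minimal sketch of that loop structure follows; `run_isolated` is a hypothetical helper, and the stand-in callables only keep the example runnable without TensorFlow.

```python
def run_isolated(settings, fit_fn, reset_fn):
    """Call reset_fn before each fit_fn(setting); record the error message or None."""
    outcomes = {}
    for setting in settings:
        reset_fn()  # in the real script this would be keras.backend.clear_session
        try:
            fit_fn(setting)
            outcomes[setting] = None
        except Exception as exc:
            outcomes[setting] = str(exc)
    return outcomes


# Stand-in callables so the sketch runs without TensorFlow.
reset_calls = []
outcomes = run_isolated(
    settings=[False, True],
    fit_fn=lambda distributed: None,
    reset_fn=lambda: reset_calls.append("reset"),
)
print(outcomes, len(reset_calls))
```

In the original script this would correspond to passing `attempt_fit` as `fit_fn` and `keras.backend.clear_session` as `reset_fn`.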
I found that the case optimizer breaks on multi-GPU with TF 2.0 because of an error related to the distribution strategy, but it works fine with TF 2.1 when using lq.optimizers.Bop.is_binary_variable as the predicate.