
`CaseOptimizer` broken on multi-GPU

See original GitHub issue

Describe the bug

When training a model with the `CaseOptimizer` on multiple GPUs (4 in my case; both P100s and V100s break), I get the following error:

WARNING:tensorflow:There is non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
distributed training:  False
Train on 60000 samples
60000/60000 [==============================] - 4s 61us/sample - loss: 8.2390
Successfully fitted model

distributed training:  True
Train on 60000 samples
INFO:tensorflow:Error reported to Coordinator: list index out of range
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 190, in _call_for_each_replica
    **merge_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 446, in _distributed_apply
    ds_reduce_util.ReduceOp.SUM, grads_and_vars)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1481, in batch_reduce_to
    return self._batch_reduce_to(reduce_op, value_destination_pairs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 707, in _batch_reduce_to
    value_destination_pairs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/cross_device_ops.py", line 317, in batch_reduce
    value_destination_pairs[0][0].values) == 1:
IndexError: list index out of range
   32/60000 [..............................] - ETA: 10:32Exception raised: 
 list index out of range

To Reproduce

import contextlib
import numpy as np
import tensorflow.keras as keras
import larq as lq
import tensorflow as tf


def get_model():
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    model.add(
        lq.layers.QuantDense(
            units=10,
            input_quantizer="ste_sign",
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            name="layer_1"
        )
    )
    model.add(keras.layers.Dense(units=10, name="layer_2"))
    
    
    # Predicate for the CaseOptimizer: match only the variables belonging to "layer_1".
    def is_layer_1(var: tf.Variable) -> bool:
        layer_name = var.name.split("/")[-2]
        return layer_name == "layer_1"
    
    optimizer = lq.optimizers.CaseOptimizer(
        (is_layer_1, keras.optimizers.Adam()),
        default_optimizer=keras.optimizers.Adam(),
    )

    # Using plain Adam instead of the CaseOptimizer trains fine on multi-GPU:
    # optimizer = keras.optimizers.Adam()

    model.compile(
        optimizer=optimizer, loss="sparse_categorical_crossentropy"
    )
    return model


def attempt_fit(distributed_training=False):
    fashion_mnist = keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    train_images = train_images / 255.0
    test_images = test_images / 255.0

    # Only enter the MirroredStrategy scope for the distributed run.
    with strategy.scope() if distributed_training else contextlib.nullcontext():
        model = get_model()
        model.fit(train_images, train_labels, epochs=1)


if __name__ == "__main__":
    keras.backend.clear_session()
    
    strategy = tf.distribute.MirroredStrategy()
    for distributed_training in [False, True]:
        print("distributed training: ", distributed_training)
        try:
            attempt_fit(distributed_training)
            print("Successfully fitted model")
        except Exception as e:
            print("Exception raised: \n", e)
        print()          

For simplicity, you can change the optimizer's predicate to `lambda x: False`; it makes no difference whether it actually selects any layers or not. Using `keras.optimizers.Adam` instead of the `CaseOptimizer` works just fine.
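For reference, a minimal sketch of the two variants described above (the no-op predicate is purely illustrative):

# Variant 1: CaseOptimizer whose predicate never matches, so every variable
# falls through to the default optimizer -- multi-GPU training still breaks.
optimizer = lq.optimizers.CaseOptimizer(
    (lambda x: False, keras.optimizers.Adam()),
    default_optimizer=keras.optimizers.Adam(),
)

# Variant 2: plain Adam -- multi-GPU training works.
# optimizer = keras.optimizers.Adam()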

Expected behavior

I expected it to train, as it does in the single-GPU case.

Environment

TensorFlow version: 2.0.0
Larq version: 0.8.3
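
For completeness, the versions can be confirmed with a quick check like the following (generic snippet, not part of the reproduction; it assumes larq exposes `__version__` like most packages):

import tensorflow as tf
import larq as lq

# Print the library versions used for this report.
print("TensorFlow version:", tf.__version__)
print("Larq version:", lq.__version__)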

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (9 by maintainers)

Top GitHub Comments

2 reactions
koenhelwegen commented, Mar 16, 2020

Hi, I found the case optimizer works for both TF 2.0 and 2.1. The original script above can be fixed by calling keras.backend.clear_session() in between the single- and multi-GPU tests (notebook). I did encounter the problem with SystemError: <built-in function Flatten> returned a result with an error set, but only when running in reduced precision.

Could someone double-check whether this solves the problems with the case optimizer? If so, we can close this issue.
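A minimal sketch of where that call would go in the script above (untested adaptation; clear_session() is moved inside the loop so it runs before each fit):

if __name__ == "__main__":
    strategy = tf.distribute.MirroredStrategy()
    for distributed_training in [False, True]:
        # Reset Keras' global state between the single- and multi-GPU runs.
        keras.backend.clear_session()
        print("distributed training: ", distributed_training)
        try:
            attempt_fit(distributed_training)
            print("Successfully fitted model")
        except Exception as e:
            print("Exception raised: \n", e)
        print()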

1 reaction
sib1 commented, Feb 14, 2020

I found that the case optimizer breaks on multi-GPU when using TF 2.0 due to a distribution-strategy-related error, but works fine when using TF 2.1 with lq.optimizers.Bop.is_binary_variable as the predicate.
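For context, roughly the setup that refers to, following the usual larq CaseOptimizer pattern (the Adam learning rate and the Bop defaults here are illustrative):

import tensorflow as tf
import larq as lq

# Route binary (quantized) variables to Bop and everything else to Adam.
optimizer = lq.optimizers.CaseOptimizer(
    (lq.optimizers.Bop.is_binary_variable, lq.optimizers.Bop()),
    default_optimizer=tf.keras.optimizers.Adam(0.01),
)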


