
`CaseOptimizer` broken on multi-GPU

See original GitHub issue

Describe the bug

When training a model with the `CaseOptimizer` on multiple GPUs (4 in my case; both P100s and V100s break), I get the following error:

WARNING:tensorflow:There is non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
distributed training:  False
Train on 60000 samples
60000/60000 [==============================] - 4s 61us/sample - loss: 8.2390
Successfully fitted model

distributed training:  True
Train on 60000 samples
INFO:tensorflow:Error reported to Coordinator: list index out of range
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 190, in _call_for_each_replica
    **merge_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 446, in _distributed_apply
    ds_reduce_util.ReduceOp.SUM, grads_and_vars)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1481, in batch_reduce_to
    return self._batch_reduce_to(reduce_op, value_destination_pairs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 707, in _batch_reduce_to
    value_destination_pairs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/distribute/cross_device_ops.py", line 317, in batch_reduce
    value_destination_pairs[0][0].values) == 1:
IndexError: list index out of range
   32/60000 [..............................] - ETA: 10:32Exception raised: 
 list index out of range

To Reproduce

import contextlib
import numpy as np
import tensorflow.keras as keras
import larq as lq
import tensorflow as tf


def get_model():
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    model.add(
        lq.layers.QuantDense(
            units=10,
            input_quantizer="ste_sign",
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            name="layer_1"
        )
    )
    model.add(keras.layers.Dense(units=10, name="layer_2"))
    
    
    # Predicate for the CaseOptimizer: match only the variables belonging to "layer_1".
    def is_layer_1(var: tf.Variable) -> bool:
        layer_name = var.name.split("/")[-2]
        return layer_name == "layer_1"
    
    optimizer = lq.optimizers.CaseOptimizer(
        (is_layer_1, keras.optimizers.Adam()),
        default_optimizer=keras.optimizers.Adam(),
    )

    # Using plain Adam instead of the CaseOptimizer trains fine on multi-GPU:
    # optimizer = keras.optimizers.Adam()

    model.compile(
        optimizer=optimizer, loss="sparse_categorical_crossentropy"
    )
    return model


def attempt_fit(distributed_training=False):
    fashion_mnist = keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    train_images = train_images / 255.0
    test_images = test_images / 255.0

    # Only enter the MirroredStrategy scope for the distributed run.
    with strategy.scope() if distributed_training else contextlib.nullcontext():
        model = get_model()
        model.fit(train_images, train_labels, epochs=1)


if __name__ == "__main__":
    keras.backend.clear_session()
    
    strategy = tf.distribute.MirroredStrategy()
    for distributed_training in [False, True]:
        print("distributed training: ", distributed_training)
        try:
            attempt_fit(distributed_training)
            print("Successfully fitted model")
        except Exception as e:
            print("Exception raised: \n", e)
        print()          

For simplicity, you can change the optimizer's predicate to `lambda x: False`; it makes no difference whether it actually selects any layers or not. Using `keras.optimizers.Adam` instead of the `CaseOptimizer` works just fine.
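For reference, a minimal sketch of the two variants described above (the no-op predicate is purely illustrative):

# Variant 1: CaseOptimizer whose predicate never matches, so every variable
# falls through to the default optimizer -- multi-GPU training still breaks.
optimizer = lq.optimizers.CaseOptimizer(
    (lambda x: False, keras.optimizers.Adam()),
    default_optimizer=keras.optimizers.Adam(),
)

# Variant 2: plain Adam -- multi-GPU training works.
# optimizer = keras.optimizers.Adam()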

Expected behavior

I expected it to train, as it does in the single-GPU case.

Environment

TensorFlow version: 2.0.0
Larq version: 0.8.3
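
For completeness, the versions can be confirmed with a quick check like the following (generic snippet, not part of the reproduction; it assumes larq exposes `__version__` like most packages):

import tensorflow as tf
import larq as lq

# Print the library versions used for this report.
print("TensorFlow version:", tf.__version__)
print("Larq version:", lq.__version__)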

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (9 by maintainers)

Top GitHub Comments

2 reactions
koenhelwegen commented, Mar 16, 2020

Hi, I found the case optimizer works for both TF 2.0 and 2.1. The original script above can be fixed by calling keras.backend.clear_session() in between the single- and multi-GPU tests (notebook). I did encounter the problem with SystemError: <built-in function Flatten> returned a result with an error set, but only when running in reduced precision.

Could someone double-check whether this solves the problems with the case optimizer? If so, we can close this issue.
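A minimal sketch of where that call would go in the script above (untested adaptation; clear_session() is moved inside the loop so it runs before each fit):

if __name__ == "__main__":
    strategy = tf.distribute.MirroredStrategy()
    for distributed_training in [False, True]:
        # Reset Keras' global state between the single- and multi-GPU runs.
        keras.backend.clear_session()
        print("distributed training: ", distributed_training)
        try:
            attempt_fit(distributed_training)
            print("Successfully fitted model")
        except Exception as e:
            print("Exception raised: \n", e)
        print()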

1 reaction
sib1 commented, Feb 14, 2020

I found that the case optimizer breaks on multi-GPU when using TF 2.0 due to a distribution-strategy-related error, but works fine when using TF 2.1 with lq.optimizers.Bop.is_binary_variable as the predicate.
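For context, roughly the setup that refers to, following the usual larq CaseOptimizer pattern (the Adam learning rate and the Bop defaults here are illustrative):

import tensorflow as tf
import larq as lq

# Route binary (quantized) variables to Bop and everything else to Adam.
optimizer = lq.optimizers.CaseOptimizer(
    (lq.optimizers.Bop.is_binary_variable, lq.optimizers.Bop()),
    default_optimizer=tf.keras.optimizers.Adam(0.01),
)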


