multi_gpu_model doesn't work with stateful models
Background:
- Keras 2.0.9 (released three days ago; there seem to be no relevant changes after this release)
- TensorFlow 1.2
I'm working on a stateful stacked RNN model, using the new CuDNNGRU as recurrent layers (although I don't think the type of recurrent layer is relevant for this issue). Applying the multi_gpu_model utility results in the following error when trying to train the parallelised model:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [256,75,39] vs. [512,75,39]
[[Node: training/Adam/gradients/loss/concatenate_1_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/concatenate_1_loss/mul"], _device="/job:localhost/replica:0/task:0/gpu:0"](training/Adam/gradients/loss/concatenate_1_loss/mul_grad/Shape, training/Adam/gradients/loss/concatenate_1_loss/mul_grad/Shape_1)]]
[[Node: replica_1/sequential_1/dense_1/truediv/_473 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:1", send_device_incarnation=1, tensor_name="edge_3032_replica_1/sequential_1/dense_1/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
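For reference, a minimal sketch that reproduces the error for me. This is not my actual model; the unit count and layer choices here are placeholders, only the stateful setup and the hardwired batch size matter:

import numpy as np
from keras.models import Sequential
from keras.layers import InputLayer, Dense, CuDNNGRU
from keras.utils import multi_gpu_model

batch_size, max_seq_length, num_dim = 256, 75, 39

# Stateful model: the batch size must be hardwired via batch_input_shape
model = Sequential()
model.add(InputLayer(batch_input_shape=(batch_size, max_seq_length, num_dim)))
model.add(CuDNNGRU(512, stateful=True, return_sequences=True))
model.add(Dense(num_dim, activation='softmax'))

parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')

x = np.random.random((batch_size, max_seq_length, num_dim))
y = np.random.random((batch_size, max_seq_length, num_dim))

# Fails with: Incompatible shapes: [256,75,39] vs. [512,75,39]
parallel_model.fit(x, y, batch_size=batch_size, shuffle=False, epochs=1)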
For stateful models it is compulsory to specify the batch_input_shape in the first layer; this seems to hardwire the batch size throughout the model, likely resulting in the above error:
model.add(InputLayer(
    batch_input_shape=(batch_size, max_seq_length, num_dim)))
This can be seen in the summary of the parallelised stateful model:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (256, 75, 39)        0
__________________________________________________________________________________________________
lambda_1 (Lambda)               (256, 75, 39)        0           input_1[0][0]
__________________________________________________________________________________________________
lambda_2 (Lambda)               (256, 75, 39)        0           input_1[0][0]
__________________________________________________________________________________________________
sequential_1 (Sequential)       (256, 75, 39)        7524903     lambda_1[0][0]
                                                                 lambda_2[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (512, 75, 39)        0           sequential_1[1][0]
                                                                 sequential_1[2][0]
==================================================================================================
Total params: 7,524,903
Trainable params: 7,524,903
Non-trainable params: 0
__________________________________________________________________________________________________
Note how concatenate_1 reports an output shape of (512, 75, 39): multi_gpu_model slices each batch over the replicas and concatenates their outputs, but with the batch size hardwired, each replica declares the full 256 samples, so the concatenated output is declared as 512 while the targets have 256, which presumably explains the incompatible shapes above. When I make my model NOT stateful, I'm allowed to specify the batch_input_shape without a hardcoded batch size:
model.add(InputLayer(
    batch_input_shape=(None, max_seq_length, num_dim)))
This works fine: no error is given and both of my GPUs are used.
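For completeness, the parallelisation and training calls are identical in both variants; with the None batch dimension the following runs on both GPUs without complaint (a sketch, using the same placeholder sizes as above):

parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')

# With batch_input_shape=(None, ...) and stateful=False this trains on both GPUs
parallel_model.fit(x, y, batch_size=batch_size, epochs=1)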
The summary of the parallelised NOT stateful model:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 75, 39)       0
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 75, 39)       0           input_1[0][0]
__________________________________________________________________________________________________
lambda_2 (Lambda)               (None, 75, 39)       0           input_1[0][0]
__________________________________________________________________________________________________
sequential_1 (Sequential)       (None, 75, 39)       7524903     lambda_1[0][0]
                                                                 lambda_2[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 75, 39)       0           sequential_1[1][0]
                                                                 sequential_1[2][0]
==================================================================================================
Total params: 7,524,903
Trainable params: 7,524,903
Non-trainable params: 0
__________________________________________________________________________________________________
I've tried a few workarounds but didn't get the parallelised stateful model to work. Is this a bug, or is there a way to make multi_gpu_model work with stateful models?
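For reference, one of the directions I tried (purely a sketch, not a verified fix): since multi_gpu_model slices each batch across the replicas, I hardwired the per-replica batch size into the sub-model instead of the full one, hoping the declared shape would then match what each replica actually receives:

n_gpus = 2

model = Sequential()
# Hypothetical workaround: hardwire batch_size // n_gpus rather than batch_size
model.add(InputLayer(batch_input_shape=(batch_size // n_gpus, max_seq_length, num_dim)))
model.add(CuDNNGRU(512, stateful=True, return_sequences=True))
model.add(Dense(num_dim, activation='softmax'))

parallel_model = multi_gpu_model(model, gpus=n_gpus)

Even if something like this got past the shape mismatch, I suspect the stateful semantics would be questionable: each replica keeps its own recurrent state for its slice of the batch, so samples would have to stay at the same positions within the batch across successive batches.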
Comments:
There seems to be a problem there indeed. Same thing here using CuDNNLSTM.
Hi @dolaamon2 or @visionscaper! I am planning to use multi-GPU model training in my thesis and have come upon this bug. Have you tested your solutions further, and do you deem them fit for use? My network architecture uses two LSTM layers after concatenating the inputs of several Embeddings.