
multi_gpu_model doesn't work with stateful models

See original GitHub issue

Background:

  • Keras 2.0.9 (released three days ago); there appear to be no relevant changes after this release
  • TensorFlow 1.2

I’m working on a stateful stacked RNN model that uses the new CuDNNGRU as its recurrent layers (although I don’t think the type of recurrent layer is relevant for this issue).
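
A minimal sketch of the setup (not my exact code: the layer widths of 512 units are illustrative, and only batch_size=256, max_seq_length=75 and num_dim=39 are taken from the summaries below):

from keras.models import Sequential
from keras.layers import InputLayer, CuDNNGRU, TimeDistributed, Dense
from keras.utils import multi_gpu_model

batch_size, max_seq_length, num_dim = 256, 75, 39

model = Sequential()
# A stateful model requires a fully specified batch_input_shape,
# which fixes the batch size in every downstream layer.
model.add(InputLayer(batch_input_shape=(batch_size, max_seq_length, num_dim)))
model.add(CuDNNGRU(512, return_sequences=True, stateful=True))
model.add(CuDNNGRU(512, return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(num_dim, activation='softmax')))

parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')
# Training the parallel model (x_train / y_train are placeholders for my data)
# then fails with the InvalidArgumentError shown below:
# parallel_model.fit(x_train, y_train, batch_size=256, epochs=10)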

Applying the multi_gpu_model utility results in the following error when trying to train the parallelised model:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [256,75,39] vs. [512,75,39] 	
    [[Node: training/Adam/gradients/loss/concatenate_1_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/concatenate_1_loss/mul"], _device="/job:localhost/replica:0/task:0/gpu:0"](training/Adam/gradients/loss/concatenate_1_loss/mul_grad/Shape, training/Adam/gradients/loss/concatenate_1_loss/mul_grad/Shape_1)]]
    [[Node: replica_1/sequential_1/dense_1/truediv/_473 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:1", send_device_incarnation=1, tensor_name="edge_3032_replica_1/sequential_1/dense_1/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

For stateful models it is compulsory to specify the batch_input_shape in the first layer; this seems to hardwire the batch_size throughout the model, likely resulting in the error above:

model.add(InputLayer(
        batch_input_shape=(batch_size, max_seq_length, num_dim)))

This can be seen in the summary of the parallelised stateful model:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (256, 75, 39)        0                                            
__________________________________________________________________________________________________
lambda_1 (Lambda)               (256, 75, 39)        0           input_1[0][0]                    
__________________________________________________________________________________________________
lambda_2 (Lambda)               (256, 75, 39)        0           input_1[0][0]                    
__________________________________________________________________________________________________
sequential_1 (Sequential)       (256, 75, 39)        7524903     lambda_1[0][0]                   
                                                                 lambda_2[0][0]                   
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (512, 75, 39)        0           sequential_1[1][0]               
                                                                 sequential_1[2][0]               
==================================================================================================
Total params: 7,524,903
Trainable params: 7,524,903
Non-trainable params: 0
__________________________________________________________________________________________________
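
For context, multi_gpu_model applies data parallelism roughly along these lines (a simplified sketch, not the actual Keras source): each GPU gets a Lambda that slices out its share of the incoming batch, the original model is applied to each slice, and the per-GPU outputs are concatenated back along the batch axis.

import tensorflow as tf
from keras.layers import Lambda, concatenate

def get_slice(x, idx, parts):
    # Dynamic slice of the batch dimension: samples idx*step .. (idx+1)*step.
    step = tf.shape(x)[0] // parts
    return x[idx * step:(idx + 1) * step]

def data_parallel_outputs(model, inputs, gpus=2):
    outputs = []
    for i in range(gpus):
        with tf.device('/gpu:%d' % i):
            sliced = Lambda(get_slice, arguments={'idx': i, 'parts': gpus})(inputs)
            outputs.append(model(sliced))
    # Merge the per-GPU predictions back along the batch axis.
    return concatenate(outputs, axis=0)

Because the slicing is done on dynamic tensor shapes, a hardwired batch_input_shape keeps reporting the full batch size for every slice, which is consistent with the (256, 75, 39) replica outputs and the (512, 75, 39) concatenation in the summary above, and with the [256,75,39] vs. [512,75,39] mismatch in the error.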

When I make my model NOT stateful, I can specify the batch_input_shape without a hardcoded batch size:

model.add(InputLayer(
        batch_input_shape=(None, max_seq_length, num_dim)))

This works fine: no error is raised and both of my GPUs are used.

The summary of the parallelised NOT stateful model:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 75, 39)       0                                            
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 75, 39)       0           input_1[0][0]                    
__________________________________________________________________________________________________
lambda_2 (Lambda)               (None, 75, 39)       0           input_1[0][0]                    
__________________________________________________________________________________________________
sequential_1 (Sequential)       (None, 75, 39)       7524903     lambda_1[0][0]                   
                                                                 lambda_2[0][0]                   
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 75, 39)       0           sequential_1[1][0]               
                                                                 sequential_1[2][0]               
==================================================================================================
Total params: 7,524,903
Trainable params: 7,524,903
Non-trainable params: 0
__________________________________________________________________________________________________
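
For completeness, a sketch of the non-stateful path that works for me (again with illustrative layer widths; the random x_train / y_train arrays are dummy stand-ins for my data):

import numpy as np
from keras.models import Sequential
from keras.layers import InputLayer, CuDNNGRU, TimeDistributed, Dense
from keras.utils import multi_gpu_model

max_seq_length, num_dim = 75, 39
# Dummy data, only to make the sketch self-contained.
x_train = np.random.random((1024, max_seq_length, num_dim)).astype('float32')
y_train = np.random.random((1024, max_seq_length, num_dim)).astype('float32')

model = Sequential()
model.add(InputLayer(batch_input_shape=(None, max_seq_length, num_dim)))
model.add(CuDNNGRU(512, return_sequences=True))   # stateful left at its default, False
model.add(CuDNNGRU(512, return_sequences=True))
model.add(TimeDistributed(Dense(num_dim, activation='softmax')))

parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')
parallel_model.fit(x_train, y_train, batch_size=512, epochs=10)  # both GPUs are used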

I’ve tried a few workarounds but didn’t get the parallelised stateful model to work.

Is this a bug or is there a way to make multi_gpu_model work with stateful models?

Top GitHub Comments

1 reaction
douglas125 commented, Jun 12, 2019

There seems to be a problem there indeed. Same thing here using CuDNNLSTM.

1 reaction
flxw commented, Oct 29, 2018

Hi @dolaamon2 or @visionscaper! I am planning to use multi-GPU model training in my thesis and have come upon this bug. Have you tested your solutions further, and do you consider them fit for use? My network architecture uses two LSTM layers after concatenating the inputs of several Embeddings.
