multi_gpu_model doesn't work with stateful models
Background:
- Keras 2.0.9 (released three days ago; there seem to be no relevant changes after this release)
- TensorFlow 1.2
I'm working on a stateful stacked RNN model, using the new CuDNNGRU as recurrent layers (although I don't think the type of recurrent layer is relevant for this issue). Applying the multi_gpu_model utility results in the following error when trying to train the parallelised model:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [256,75,39] vs. [512,75,39]
[[Node: training/Adam/gradients/loss/concatenate_1_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/concatenate_1_loss/mul"], _device="/job:localhost/replica:0/task:0/gpu:0"](training/Adam/gradients/loss/concatenate_1_loss/mul_grad/Shape, training/Adam/gradients/loss/concatenate_1_loss/mul_grad/Shape_1)]]
[[Node: replica_1/sequential_1/dense_1/truediv/_473 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:1", send_device_incarnation=1, tensor_name="edge_3032_replica_1/sequential_1/dense_1/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
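For reference, a minimal sketch that reproduces the error for me. This is not my actual model; the unit count and layer choices here are placeholders, only the stateful setup and the hardwired batch size matter:

import numpy as np
from keras.models import Sequential
from keras.layers import InputLayer, Dense, CuDNNGRU
from keras.utils import multi_gpu_model

batch_size, max_seq_length, num_dim = 256, 75, 39

# Stateful model: the batch size must be hardwired via batch_input_shape
model = Sequential()
model.add(InputLayer(batch_input_shape=(batch_size, max_seq_length, num_dim)))
model.add(CuDNNGRU(512, stateful=True, return_sequences=True))
model.add(Dense(num_dim, activation='softmax'))

parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')

x = np.random.random((batch_size, max_seq_length, num_dim))
y = np.random.random((batch_size, max_seq_length, num_dim))

# Fails with: Incompatible shapes: [256,75,39] vs. [512,75,39]
parallel_model.fit(x, y, batch_size=batch_size, shuffle=False, epochs=1)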
For stateful models it is compulsory to specify the batch_input_shape in the first layer; this seems to hardwire the batch size throughout the model, likely resulting in the above error:
model.add(InputLayer(
    batch_input_shape=(batch_size, max_seq_length, num_dim)))
This can be seen in the summary of the parallelised stateful model:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (256, 75, 39)        0
__________________________________________________________________________________________________
lambda_1 (Lambda)               (256, 75, 39)        0           input_1[0][0]
__________________________________________________________________________________________________
lambda_2 (Lambda)               (256, 75, 39)        0           input_1[0][0]
__________________________________________________________________________________________________
sequential_1 (Sequential)       (256, 75, 39)        7524903     lambda_1[0][0]
                                                                 lambda_2[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (512, 75, 39)        0           sequential_1[1][0]
                                                                 sequential_1[2][0]
==================================================================================================
Total params: 7,524,903
Trainable params: 7,524,903
Non-trainable params: 0
__________________________________________________________________________________________________
Note how concatenate_1 reports an output shape of (512, 75, 39): multi_gpu_model slices each batch over the replicas and concatenates their outputs, but with the batch size hardwired, each replica declares the full 256 samples, so the concatenated output is declared as 512 while the targets have 256, which presumably explains the incompatible shapes above. When I make my model NOT stateful, I'm allowed to specify the batch_input_shape without a hardcoded batch size:
model.add(InputLayer(
    batch_input_shape=(None, max_seq_length, num_dim)))
This works fine: no error is given and both of my GPUs are used.
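For completeness, the parallelisation and training calls are identical in both variants; with the None batch dimension the following runs on both GPUs without complaint (a sketch, using the same placeholder sizes as above):

parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')

# With batch_input_shape=(None, ...) and stateful=False this trains on both GPUs
parallel_model.fit(x, y, batch_size=batch_size, epochs=1)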
The summary of the parallelised NOT stateful model:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 75, 39)       0
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 75, 39)       0           input_1[0][0]
__________________________________________________________________________________________________
lambda_2 (Lambda)               (None, 75, 39)       0           input_1[0][0]
__________________________________________________________________________________________________
sequential_1 (Sequential)       (None, 75, 39)       7524903     lambda_1[0][0]
                                                                 lambda_2[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 75, 39)       0           sequential_1[1][0]
                                                                 sequential_1[2][0]
==================================================================================================
Total params: 7,524,903
Trainable params: 7,524,903
Non-trainable params: 0
__________________________________________________________________________________________________
I've tried a few workarounds but didn't get the parallelised stateful model to work. Is this a bug, or is there a way to make multi_gpu_model work with stateful models?
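For reference, one of the directions I tried (purely a sketch, not a verified fix): since multi_gpu_model slices each batch across the replicas, I hardwired the per-replica batch size into the sub-model instead of the full one, hoping the declared shape would then match what each replica actually receives:

n_gpus = 2

model = Sequential()
# Hypothetical workaround: hardwire batch_size // n_gpus rather than batch_size
model.add(InputLayer(batch_input_shape=(batch_size // n_gpus, max_seq_length, num_dim)))
model.add(CuDNNGRU(512, stateful=True, return_sequences=True))
model.add(Dense(num_dim, activation='softmax'))

parallel_model = multi_gpu_model(model, gpus=n_gpus)

Even if something like this got past the shape mismatch, I suspect the stateful semantics would be questionable: each replica keeps its own recurrent state for its slice of the batch, so samples would have to stay at the same positions within the batch across successive batches.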
Comments:
There seems to be a problem there indeed. Same thing here using CuDNNLSTM.
Hi @dolaamon2 or @visionscaper! I am planning to use multi-GPU model training in my thesis and have come upon this bug. Have you tested your solutions further, and do you deem them fit for use? My network architecture uses two LSTM layers after concatenating the inputs of several Embeddings.