In multi_gpu_model with cpu_relocation the weights of the template model do not change
When using multi_gpu_model with cpu_relocation, the weights of the template model do not change during training, and they differ from the weights of the parallel model, which do change. See below for an example.
This contradicts the documentation, which states:
To save the multi-gpu model, use .save(fname) or .save_weights(fname) with the template model (the argument you passed to multi_gpu_model), rather than the model returned by multi_gpu_model.
But saving the template model is useless if its weights do not update during training and differ from those of the parallel model.
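For reference, the documented save pattern looks roughly like this. This is a minimal sketch on my part, not from the issue; the two-GPU count, the dummy data, and the file name are assumptions:

from keras import Model, Input
from keras.layers import Dense
from keras.utils import multi_gpu_model
import numpy as np

# Template model (the argument later passed to multi_gpu_model).
inp = Input(shape=(4,))
out = Dense(1)(inp)
model = Model(inputs=inp, outputs=out)

parallel_model = multi_gpu_model(model, gpus=2)  # assumes two GPUs are available
parallel_model.compile(optimizer='sgd', loss='mse')
parallel_model.fit(np.random.randn(8, 4), np.random.randn(8, 1), epochs=1, verbose=0)

# Per the documentation, saving the template should capture the trained weights:
model.save_weights('weights.h5')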
See the following minimal example:
from keras import Model, Input
from keras.layers import Dense
from keras.utils import multi_gpu_model
import keras.backend as K
import numpy as np

BATCHSIZE = 8
NITER = 4

# Dummy model
x = Input(shape=(4,))
layer = Dense(2, activation='relu')(x)
y = Dense(1)(layer)
model = Model(inputs=x, outputs=y)

try:
    parallel_model = multi_gpu_model(model, cpu_relocation=True)
    print("Training using multiple GPUs..")
except ValueError:
    parallel_model = model
    print("Training using single GPU or CPU..")

parallel_model.compile(optimizer='sgd', loss='mse')

# Snapshot the template model's weights before training
original_weights = K.batch_get_value(model.weights)

# Dummy training
for i in range(NITER):
    x_batch = np.random.randn(BATCHSIZE, 4)
    y_batch = np.random.randn(BATCHSIZE, 1)  # match the (batch, 1) output shape
    parallel_model.train_on_batch(x_batch, y_batch)

# Compare the template model's weights before and after training,
# and against the parallel model's weights
weights = K.batch_get_value(model.weights)
parallel_weights = K.batch_get_value(parallel_model.weights)

if all(np.all(w == ow) for w, ow in zip(weights, original_weights)):
    print('Weights in the template model have not changed')
else:
    print('Weights in the template model have changed')

if all(np.all(w == pw) for w, pw in zip(weights, parallel_weights)):
    print('Weights in the template and parallel model are equal')
else:
    print('Weights in the template and parallel model are different')
When executing on a single GPU or CPU, the result is:
Training using single GPU or CPU..
Weights in the template model have changed
Weights in the template and parallel model are equal
When executing on multiple GPUs, the result is:
Training using multiple GPUs..
Weights in the template model have not changed
Weights in the template and parallel model are different
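Continuing the script above, one possible mitigation is to copy the trained weights from the parallel model back into the template before saving. This is my own sketch, not from the issue, and it assumes parallel_model.get_weights() returns the replica weights in the same order as model.get_weights():

# Mitigation sketch (assumption: the weight lists of the two models line up,
# since the parallel model's only weighted layer is the relocated template).
model.set_weights(parallel_model.get_weights())
model.save_weights('weights.h5')  # now saves the weights that actually trained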
If I replace … by …, then, when training with multiple GPUs, I get, as expected:

Training using multiple GPUs..
Weights in the template model have changed
Weights in the template and parallel model are equal
@loretoparisi I haven’t tried it, since the workaround in https://github.com/keras-team/keras/issues/11313#issuecomment-427768441 works fine for me. BTW, notice that #8123 is closed because it makes reference to this bug.
However, https://github.com/keras-team/keras/issues/11313#issuecomment-427768441 is still a workaround, and not a solution to the bug, which, as far as I can see, remains open and unresolved.
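For completeness, the alternative the Keras documentation itself recommends over cpu_relocation is to instantiate the template model under an explicit CPU device scope and then replicate it. Whether this matches the linked workaround exactly is my assumption, since that snippet is not reproduced above:

import tensorflow as tf
from keras import Model, Input
from keras.layers import Dense
from keras.utils import multi_gpu_model

# Build the template model on the CPU explicitly instead of using
# cpu_relocation; multi_gpu_model then shares these same weights with
# the per-GPU replicas, so the template keeps training.
with tf.device('/cpu:0'):
    x = Input(shape=(4,))
    hidden = Dense(2, activation='relu')(x)
    y = Dense(1)(hidden)
    model = Model(inputs=x, outputs=y)

parallel_model = multi_gpu_model(model, gpus=2)  # assumes two GPUs
parallel_model.compile(optimizer='sgd', loss='mse')
# ...train parallel_model as before, then save the template:
# model.save_weights('weights.h5')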