CuDNNGRU/LSTM weights trained on GPU can't be used on GRU/LSTM (i.e. the CPU versions)
If you train a model on the GPU using the CuDNN (GRU or LSTM) layers and save the weights, it is not possible to load those weights into their respective CPU variants.
Is this a bug or expected? I tried messing about with implementation=0, 1, 2 for the GRU layer but this didn’t seem to help.
The code below raises the following exception:
ValueError: Dimension 0 in both shapes must be equal, but are 48 and 96 for 'Assign_39' (op: 'Assign') with input shapes: [48], [96].
import numpy as np
import keras
from keras import layers
from keras.utils import to_categorical

T = 10           # timesteps per sequence
k = 3            # features per timestep
batch_size = 32
classes = 5

# Random training data: one batch of sequences with one-hot labels
X = np.random.random((batch_size, T, k))
y = to_categorical(np.random.randint(0, classes, size=(batch_size,)), num_classes=classes)

# GPU model using the CuDNN-backed GRU layer
model = keras.models.Sequential()
model.add(layers.InputLayer(input_shape=(T, k)))
model.add(layers.CuDNNGRU(16, return_sequences=False))
model.add(layers.Dense(classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')
model.fit(X, y)
model.save_weights('GPU.weights')

# Same architecture with the plain GRU layer
cpu_model = keras.models.Sequential()
cpu_model.add(layers.InputLayer(input_shape=(T, k)))
cpu_model.add(layers.GRU(16, return_sequences=False))
cpu_model.add(layers.Dense(classes, activation='softmax'))
cpu_model.compile(loss='categorical_crossentropy', optimizer='sgd')
cpu_model.load_weights('GPU.weights')  # raises the ValueError shown above
- [x] Check that you are up-to-date with the master branch of Keras. You can update with: pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps
- [x] If running on TensorFlow, check that you are up-to-date with the latest version. The installation instructions can be found here.
- [x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).
Issue Analytics
- Created 6 years ago
- Reactions:2
- Comments:14 (5 by maintainers)
Activations for CuDNN LSTM/GRU are hard-coded in CuDNN and cannot be changed from Keras. They correspond to activation='tanh' and recurrent_activation='sigmoid' (slightly different from the default hard_sigmoid in Keras).
@rsmith49: CuDNNLSTM has 2x biases. To use weights/biases from one implementation in another, you need to perform a conversion. If you just dump the whole model's weights and load them again, the conversion is performed automatically (preferred). See the tests for examples.
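To make the "2x biases" remark concrete, here is a small arithmetic sketch of where the 48 and 96 in the error message come from. The helper name `bias_sizes` is hypothetical and only counts elements; it assumes one bias vector per gate on the CPU layer (3 gates for GRU, 4 for LSTM) and separate input and recurrent biases on the CuDNN layer.

```python
def bias_sizes(units, gates):
    """Return (cpu_bias_len, cudnn_bias_len) for a recurrent layer.

    gates is 3 for GRU and 4 for LSTM. The CuDNN variant is assumed to
    store an input bias and a recurrent bias per gate, hence the factor 2.
    """
    cpu_bias_len = gates * units
    cudnn_bias_len = 2 * gates * units
    return cpu_bias_len, cudnn_bias_len


# GRU with 16 units, as in the script from the issue
print(bias_sizes(16, gates=3))  # (48, 96)
```

With 16 units the two lengths are 48 and 96, which matches the shapes [48] and [96] reported by the failing Assign op.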
When picking weights from just one layer and setting them on another layer, you need to do the conversion manually: https://github.com/keras-team/keras/blob/master/keras/engine/saving.py#L468. But I’d rather load the whole model, since the preprocess_weights_for_loading() function is more of an internal function than part of the API.
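A minimal sketch of the "load the whole model" route, under some assumptions: the names `CUDNN_COMPATIBLE_GRU_KWARGS` and `build_cpu_model` are made up for illustration, and `reset_after=True` is only available in Keras releases whose GRU layer accepts that argument (it is not mentioned in the comments above). The activations are the ones stated above as hard-coded in CuDNN.

```python
# Keyword arguments intended to make the CPU GRU behave like CuDNNGRU.
# The two activations come from the maintainer's comment; reset_after=True
# is an assumption and requires a Keras version that supports it.
CUDNN_COMPATIBLE_GRU_KWARGS = {
    "activation": "tanh",
    "recurrent_activation": "sigmoid",
    "reset_after": True,
}


def build_cpu_model(T=10, k=3, units=16, classes=5):
    """Rebuild the model from the issue with a plain (non-CuDNN) GRU layer."""
    # Imported lazily so this module can be loaded without Keras installed.
    import keras
    from keras import layers

    model = keras.models.Sequential()
    model.add(layers.InputLayer(input_shape=(T, k)))
    model.add(layers.GRU(units, return_sequences=False,
                         **CUDNN_COMPATIBLE_GRU_KWARGS))
    model.add(layers.Dense(classes, activation="softmax"))
    return model


# Usage, assuming 'GPU.weights' was written by model.save_weights() as in the issue:
# cpu_model = build_cpu_model()
# cpu_model.load_weights('GPU.weights')
```

Loading through `load_weights()` on the whole model keeps the conversion inside Keras, so nothing here calls preprocess_weights_for_loading() directly.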