CuDNNGRU/LSTM weights trained on GPU can't be used on GRU/LSTM (i.e. the CPU versions)

If you train a model on the GPU using the CuDNN (GRU or LSTM) layers and save the weights, it is not possible to load those weights into their respective CPU variants.

Is this a bug or expected? I tried messing about with implementation=0, 1, 2 for the GRU layer but this didn’t seem to help.

The code below raises the following exception:

ValueError: Dimension 0 in both shapes must be equal, but are 48 and 96 for 'Assign_39' (op: 'Assign') with input shapes: [48], [96].

import numpy as np
import keras
from keras import layers
from keras.utils.np_utils import to_categorical

T = 10           # timesteps
k = 3            # features per timestep
batch_size = 32
classes = 5

X = np.random.random((batch_size, T, k))
y = to_categorical(np.random.randint(0, classes, size=(batch_size,)), num_classes=classes)

# Train on the GPU with the CuDNN-backed layer and save the weights.
model = keras.models.Sequential()
model.add(layers.InputLayer(input_shape=(T, k)))
model.add(layers.CuDNNGRU(16, return_sequences=False))
model.add(layers.Dense(classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')

model.fit(X, y)

model.save_weights('GPU.weights')

# Rebuild the same architecture with the CPU layer and load the saved weights.
cpu_model = keras.models.Sequential()
cpu_model.add(layers.InputLayer(input_shape=(T, k)))
cpu_model.add(layers.GRU(16, return_sequences=False))
cpu_model.add(layers.Dense(classes, activation='softmax'))
cpu_model.compile(loss='categorical_crossentropy', optimizer='sgd')
cpu_model.load_weights('GPU.weights')  # raises the ValueError shown above
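
The two sizes in the error message line up with the 16-unit layers in the script. Here is a minimal sketch of the arithmetic, assuming (as the 48 vs. 96 mismatch suggests) that the CPU GRU stores one bias vector per gate while CuDNNGRU stores two. The helper name is made up for illustration and is not part of Keras:

```python
def gru_bias_size(units, bias_sets):
    """Length of a flat GRU bias vector: 3 gates, `bias_sets` bias vectors per gate."""
    n_gates = 3  # update, reset, candidate
    return bias_sets * n_gates * units

units = 16  # same layer width as in the reproduction script

cpu_bias = gru_bias_size(units, bias_sets=1)    # GRU
cudnn_bias = gru_bias_size(units, bias_sets=2)  # CuDNNGRU (assumed: input + recurrent bias)

print(cpu_bias, cudnn_bias)  # 48 96
```

So the failing 'Assign' op is most likely the bias variable of the recurrent layer, not one of the kernels.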

[x] Check that you are up-to-date with the master branch of Keras. You can update with: pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps
[x] If running on TensorFlow, check that you are up-to-date with the latest version. The installation instructions can be found here.
[x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

Top GitHub Comments

9 reactions

bzamecnik commented, Jan 18, 2018

Activations for CuDNN LSTM/GRU are hard-coded in CuDNN and cannot be changed from Keras. They correspond to activation='tanh' and recurrent_activation='sigmoid' (slightly different from the default hard_sigmoid in Keras).
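
To get a feel for how far apart the two recurrent activations are, here is a small NumPy sketch. It assumes the Keras 2.x definition of hard_sigmoid as clip(0.2 * x + 0.5, 0, 1); the function names are local to this example:

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid, the gate activation hard-coded in CuDNN
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x):
    # Piecewise-linear approximation (assumed Keras 2.x definition)
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

x = np.linspace(-4.0, 4.0, 9)  # -4, -3, ..., 4
gap = np.abs(sigmoid(x) - hard_sigmoid(x))

# Both agree at x = 0 and drift apart elsewhere.
print(gap.max())
```

Even with correctly converted weights, a CPU layer left at the default hard_sigmoid would therefore not reproduce the GPU outputs exactly. Passing recurrent_activation='sigmoid' to the CPU layer should remove that source of difference.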

5 reactions

bzamecnik commented, Jun 1, 2018

@rsmith49: CuDNNLSTM has 2x biases. In order to use weights/biases from one implementation in another you need to perform conversion. If you just dump the whole model weights and load again, the conversion is performed automatically (preferred). See the tests for examples.
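
One way to see why such a conversion is possible: if both bias vectors are simply added to the same gate pre-activation, as in an LSTM of the form W x + b_in + R h + b_rec, they can be folded into a single bias by summing them. The NumPy sketch below illustrates only that idea. The variable names, and the assumption that the flat bias stores the input half followed by the recurrent half, are mine. It is not the Keras conversion code, which also has to rearrange the kernels.

```python
import numpy as np

units, n_gates = 16, 4  # LSTM: input, forget, cell, output gates
rng = np.random.RandomState(0)

# Hypothetical CuDNN-style flat bias: [input biases | recurrent biases]
cudnn_bias = rng.randn(2 * n_gates * units)

# Fold the two halves into one bias vector of the size a CPU LSTM expects.
b_in, b_rec = np.split(cudnn_bias, 2)
merged_bias = b_in + b_rec

# The gate pre-activation is unchanged for any kernel contribution z.
z = rng.randn(n_gates * units)
assert np.allclose(z + b_in + b_rec, z + merged_bias)

print(cudnn_bias.shape, merged_bias.shape)  # (128,) (64,)
```

Going the other way (CPU to CuDNN) needs a choice of how to split one bias into two, which is one more reason to let the library handle it.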

When picking weights from just one layer and setting them on another layer, you need to do the conversion manually: https://github.com/keras-team/keras/blob/master/keras/engine/saving.py#L468. I'd rather load the whole model, though, since preprocess_weights_for_loading() is more of an internal function than part of the API.

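from keras.engine.saving import preprocess_weights_for_loading

# The models here are assumed to contain just the one recurrent layer,
# so the model weights are exactly that layer's weights.
cudnn_weights = cudnn_lstm_model.get_weights()
weights2 = preprocess_weights_for_loading(lstm_layer, cudnn_weights) # target layer, source weights
lstm_model.set_weights(weights2)  # set the converted weights, not the raw CuDNN ones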