Bug in deserializing TF2.0 GRU layer's bias vector in tf.loadLayersModel()
Dear tfjs-team,
I recently ran into an issue when I tried to import an existing Keras .h5 model with GRU layers into tfjs; see https://github.com/tensorflow/tfjs/issues/2437
tl;dr: this error popped up:

```
Uncaught (in promise) Error: Shape mismatch: [384] vs. [2,384]
    at variables.ts:135
    at t.write (variables.ts:98)
    at variables.ts:339
    at Array.forEach (<anonymous>)
    at sf (variables.ts:337)
    at e.loadWeights (container.ts:598)
    at models.ts:315
    at common.ts:14
    at Object.next (common.ts:14)
    at a (common.ts:14)
```
On further investigation, I figured out that there’s something wrong in the `deserialize()` function at models.ts:300. It turns out that the model object returned by `deserialize()` sets faulty shapes for the bias vectors of GRU layers: those should be of shape [2, x] but are set to [x] (here, [384]).
In contrast, the weights loaded in models.ts:313 by `io.decodeWeights()` are set correctly (here, [2, 384]). So there must be something wrong with `deserialize()` or one of the functions it calls. I really tried to dig further, but I’m basically a complete stranger to JS/TS, so it’s hard for me to figure it out any further.
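For context, the [2, 384] layout in the checkpoint is what TF2’s GRU produces by default. Here is a minimal sketch of the two bias layouts (the unit and feature counts are made up; 3 × 128 = 384 matches the shapes in the error):

```python
import tensorflow as tf

units, features = 128, 39  # made-up sizes; bias length is 3 * units = 384

# TF2's default GRU (reset_after=True) keeps separate input and recurrent
# biases, stacked into a single (2, 3 * units) tensor -- the [2, 384] above.
gru_v2 = tf.keras.layers.GRU(units, reset_after=True)
gru_v2.build((None, None, features))
print(gru_v2.cell.bias.shape)  # (2, 384)

# The classic formulation (reset_after=False) has a single merged bias vector.
gru_v1 = tf.keras.layers.GRU(units, reset_after=False)
gru_v1.build((None, None, features))
print(gru_v1.cell.bias.shape)  # (384,)
```

So the converted weights carry the (2, 384) bias, while the deserialized tfjs layer apparently expects the merged (384,) layout.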
This bug should be easy to reproduce: just create a model with GRU layers in Keras, like this:
```python
from tensorflow import keras

max_id = 38       # placeholder vocabulary size (any value works)
output_size = 39  # placeholder number of output classes

model = keras.models.Sequential([
    # keras.layers.GRU(128, return_sequences=True, batch_input_shape=[batch_size, None, max_id + 1]),
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id + 1]),
    keras.layers.GRU(128, return_sequences=True),
    keras.layers.GRU(128),
    keras.layers.Flatten(),
    keras.layers.Dense(output_size, activation="softmax"),
])
```
I guess you don’t even need to train it; just initializing it should be fine. Then convert it with the tfjs-converter and load it with `tf.loadLayersModel()`.
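For reference, here is one way the conversion could look from Python, assuming the `tensorflowjs` pip package is installed (the output directory name is arbitrary); pointing `tf.loadLayersModel()` at the resulting `model.json` should then reproduce the shape mismatch:

```python
import tensorflowjs as tfjs

# `model` is the Sequential model defined above; the target
# directory will contain model.json plus the weight shards.
tfjs.converters.save_keras_model(model, 'tfjs_gru_model')
```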
I’d be really grateful for any fixes or quick workarounds. Thank you in advance!
Top GitHub Comments
`reset_after=True` doesn’t work for me (version 1.6.0).
Can someone explain why the GRU would need twice as many biases in TF2.0 compared to all other implementations of GRUs (original paper, TF1, other frameworks, etc.)? Is this a TF2 bug?
The original bias tensor already contained the non-recurrent and recurrent parts, so the (2, XX) shape seems superfluous.
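For what it’s worth, the doubling appears to come from the `reset_after=True` formulation (TF2’s default, chosen for CuDNN compatibility) rather than from a bug. With `reset_after=False` the candidate state is h̃ = tanh(x·W + (r ∘ h)·U + b), so the input and recurrent bias terms add up outside any gating and can be merged into a single vector. With `reset_after=True` the reset gate is applied after the recurrent matmul, h̃ = tanh(x·W + b_in + r ∘ (h·U + b_rec)), so the recurrent bias sits inside the gated term and can no longer be folded into the input bias; Keras therefore stores the two bias sets stacked as (2, 3 · units).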