Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Cannot call `load_model` on network trained using `multi_gpu`.

See original GitHub issue

I came across your post on Medium and was instantly hooked. Nice job!

I’ve been developing a series of deep learning experiments that use only a single GPU and decided to switch them over to a multi-GPU setting. After training, the models are serialized to disk via model.save.

However, when I try to call load_model to load the pre-trained network from disk, I get an error:

[INFO] loading model...
Traceback (most recent call last):
  File "rank_accuracy.py", line 28, in <module>
    model = load_model(config.MODEL_PATH)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/models.py", line 140, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/models.py", line 189, in model_from_config
    return layer_from_config(config, custom_objects=custom_objects)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/utils/layer_utils.py", line 34, in layer_from_config
    return layer_class.from_config(config['config'])
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2395, in from_config
    process_layer(layer_data)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2390, in process_layer
    layer(input_tensors[0])
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 517, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 571, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 155, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/layers/core.py", line 587, in call
    return self.function(x, **arguments)
  File "/home/ubuntu/deep-learning-book/dataset_to_hdf5/multi_gpu.py", line 9, in get_slice
    shape = tf.shape(data)
NameError: global name 'tf' is not defined

Looking at multi_gpu.py, it’s clear that TensorFlow is imported, so I’m not sure why the error is being generated.
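The traceback makes sense once you know how Keras serializes Lambda layers: the function passed to Lambda (here get_slice) is saved by its code object, and on load it is rebuilt and executed in a namespace that does not contain multi_gpu.py’s module-level import of tf. A minimal stand-alone sketch of the same failure mode, using math in place of tf (not Keras code, just an illustration of the mechanism):

```python
import math
import types

def area(r):
    # relies on the module-level import of math,
    # just as get_slice relies on the module-level import of tf
    return math.pi * r * r

# Rebuild the function from only its code object, with empty globals --
# roughly what happens when a serialized Lambda function is deserialized
# into a namespace that lacks the original module's imports.
rebuilt = types.FunctionType(area.__code__, {})

try:
    rebuilt(2.0)
except NameError as err:
    print(err)  # name 'math' is not defined
```

This is also why the workaround further down the thread (adding the import to keras/layers/core.py, the module where the Lambda function is rebuilt) makes the error go away.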

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Reactions: 1
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

7 reactions
tstandley commented, Dec 2, 2016

I have an ugly but functional workaround:

Change the return statement to the following code:

new_model = Model(input=model.inputs, output=merged)
funcType = type(model.save)

# monkey-patch save so that saving the parallel model
# saves only the underlying single-GPU model
def new_save(self_, filepath, overwrite=True):
    model.save(filepath, overwrite)

new_model.save = funcType(new_save, new_model)
return new_model

This monkey-patches the simple model’s save onto the parallel model, so calling the parallel model’s save saves the simple (single-GPU) model instead.
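The binding trick in the workaround (funcType(new_save, new_model)) can be shown in isolation with plain Python; types.MethodType does the same job. The classes and names below are stand-ins, not Keras objects:

```python
import types

class Inner:
    """Stand-in for the simple single-GPU model."""
    def save(self, path):
        return "saved inner to " + path

class Parallel:
    """Stand-in for the multi-GPU wrapper model."""
    pass

inner = Inner()
parallel = Parallel()

def new_save(self_, path):
    # delegate to the simple model's save, mirroring the workaround above
    return inner.save(path)

# types.MethodType plays the role of funcType(new_save, new_model):
# it binds new_save as a bound method on the parallel instance
parallel.save = types.MethodType(new_save, parallel)

print(parallel.save("model.hdf5"))  # saved inner to model.hdf5
```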

When loading, you must load the simple model before creating the parallel model.
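The load order described above can be sketched with pickle stand-ins (again plain Python, not Keras; make_parallel here is a hypothetical placeholder for the helper in the thread’s multi_gpu.py):

```python
import pickle

class SimpleModel:
    """Stand-in for the single-GPU model that actually gets saved."""
    def __init__(self, weights):
        self.weights = weights

def make_parallel(model, n):
    # placeholder for the thread's make_parallel: wraps the simple model
    return {"replicas": [model] * n, "base": model}

model = SimpleModel([1.0, 2.0])
blob = pickle.dumps(model)           # save only the simple model

loaded = pickle.loads(blob)          # on load: restore the simple model first...
parallel = make_parallel(loaded, 2)  # ...then re-create the parallel wrapper
```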

5 reactions
aschampion commented, Feb 7, 2017

@aman-tiwari and anyone else who may stumble across this: you can recover an already-saved multi-GPU model fairly easily. Temporarily edit your virtualenv’s keras/layers/core.py to add the necessary import (import tensorflow as tf). Then:

from keras.models import load_model

multi_model = load_model('your_multi_gpu_model.hdf5')
old_model = multi_model.layers[-2]  # the last layer is the merge layer from make_parallel
old_model.save('single_gpu_model.hdf5')

Top Results From Across the Web

Keras model trained with Multi GPUs not loading on non ...
For the first ValueError, the problem is that you have to add multi_gpu_model(model, gpus=2) to the function call, as the default value is...
Efficient Training on Multiple GPUs
When training on a single GPU is too slow or the model weights don't fit in a single GPU's memory, we use a...
How-To: Multi-GPU training with Keras, Python, and deep ...
Enabling multi-GPU training with Keras is as easy as a single function call — I recommend you utilize multi-GPU training whenever possible. In ...
Multi-GPU and distributed training
Specifically, this guide teaches you how to use the tf.distribute API to train Keras models on multiple GPUs, with minimal changes to your ...
Multi GPU Model Training: Monitoring and Optimizing
In this article, we will discuss multi-GPU training with PyTorch Lightning and find out the best practices that should be adopted to ...
