
'ValueError: Dimensions must be equal, but are 32 and 127 for 'replica_0/model_1/add_17/add' (op: 'Add') with input shapes: [?,32,32,256], [?,127,127,256].' when using multi_gpu_model

See original GitHub issue

Environment: Keras 2.2.4, tensorflow-gpu 1.10.1

Here is my network structure:


from keras.models import Model
from keras.layers import Conv2D, BatchNormalization, Activation, Lambda, add, Input, concatenate
from keras import backend as K
from keras.applications.resnet50 import ResNet50

def conv_bn_relu(feature_map, filters, kernel, activation=True):
    # Conv -> BatchNorm -> (optional) ReLU building block.
    feature_map = Conv2D(filters, kernel, padding='same')(feature_map)
    feature_map = BatchNormalization()(feature_map)
    if activation:
        feature_map = Activation('relu')(feature_map)
    return feature_map


def bottleneck(inputs, depth, depth_bottleneck, stride=1):
    # ResNet-style bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand,
    # with a 1x1 projection shortcut and a ReLU after the add.
    residual = conv_bn_relu(inputs, depth_bottleneck, (1, 1))
    residual = conv_bn_relu(residual, depth_bottleneck, (3, 3))
    residual = conv_bn_relu(residual, depth, (1, 1), activation=False)

    shortcut = conv_bn_relu(inputs, depth, (1, 1))

    output = Activation('relu')(add([residual, shortcut]))
    return output


def global_net(feature_maps, point_num):
    # Top-down (FPN-style) pathway over the backbone feature maps,
    # iterating from the coarsest map to the finest.
    global_feature_maps = []
    global_outputs = []

    for i, feature_map in enumerate(reversed(feature_maps)):
        feature_map = conv_bn_relu(feature_map, 256, (1, 1))

        # 'last_feature_map' only appears in the local scope after the
        # first iteration; checking dir() is a fragile way to test that.
        if 'last_feature_map' in dir():
            # The lambda below closes over the *variable* `shape`, not
            # its current value; see the note after this listing.
            shape = feature_map.get_shape()
            upsample = Lambda(lambda x: K.tf.image.resize_bilinear(x, (shape[1], shape[2])))(feature_map)
            upsample = Conv2D(256, (1, 1), padding='same')(upsample)
            last_feature_map = add([feature_map, upsample])
        else:
            last_feature_map = feature_map

        tmp = conv_bn_relu(last_feature_map, 256, (1, 1))
        out = conv_bn_relu(tmp, point_num, (3, 3))
        out = Lambda(lambda x: K.tf.image.resize_bilinear(x, (128, 128)))(out)
        global_feature_maps.append(last_feature_map)
        global_outputs.append(out)

    return global_feature_maps[::-1], global_outputs[::-1]

def refine_net(feature_maps, point_num):
    # Refine stage: run each global feature map through bottleneck
    # blocks, upsample to 128x128, concatenate, and predict heatmaps.
    refine_feature_maps = []

    for i, feature_map in enumerate(feature_maps):
        # Note: nothing is appended when i == 0, so the first (finest)
        # feature map never reaches the concat below.
        for j in range(i):
            feature_map = bottleneck(feature_map, 256, 128)
            feature_map = Lambda(lambda x: K.tf.image.resize_bilinear(x, (128, 128)))(feature_map)
            refine_feature_maps.append(feature_map)

    refine_feature_map = Lambda(lambda x: K.tf.concat(x, axis=3))(refine_feature_maps)
    refine_feature_map = bottleneck(refine_feature_map, 256, 128)
    res = conv_bn_relu(refine_feature_map, point_num, (3, 3))

    return res

def build_cpn(style):
    # Cascaded Pyramid Network on a ResNet50 backbone (the `style`
    # argument is currently unused).
    point_num = 13
    backbone = ResNet50(weights='imagenet', input_shape=(512, 512, 3), include_top=False)

    # One feature map from the end of each ResNet stage.
    resnet_feature_maps = []
    for layer_name in ['activation_10', 'activation_22', 'activation_40', 'activation_49']:
        feature_maps = backbone.get_layer(layer_name).output
        resnet_feature_maps.append(feature_maps)

    global_feature_maps, global_outputs = global_net(resnet_feature_maps, point_num)
    refine_output = refine_net(global_feature_maps, point_num)
    cpn_outputs = global_outputs + [refine_output]
    cpn_outputs = concatenate(cpn_outputs)

    cpn = Model(inputs=backbone.input, outputs=cpn_outputs)

    return cpn
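
One likely culprit, offered as a hypothesis rather than a confirmed diagnosis: Python closures bind `shape` by name, not by value. When the model is first built, each resize Lambda runs immediately with the `shape` of its own loop iteration, so everything lines up. But multi_gpu_model re-applies the shared layers to each replica's sliced input, which re-executes every lambda with the final value of `shape` from the loop, the 127x127 map from activation_10 (this Keras ResNet50 uses an unpadded max-pool, so a 512 input yields 127x127 at stage 2). A stale target like that would produce exactly the [?,32,32,256] vs [?,127,127,256] mismatch in the title. A minimal sketch of a fix under that assumption (resize_bilinear_to is a helper name introduced here for illustration):

def resize_bilinear_to(height, width):
    # Each call creates a fresh scope, so the lambda captures a size
    # that cannot be rebound by a later loop iteration. This keeps the
    # target correct when multi_gpu_model re-applies the model to each
    # replica's sliced input.
    hw = (int(height), int(width))
    return Lambda(lambda x: K.tf.image.resize_bilinear(x, hw))

# Inside global_net, replacing the original resize:
#   shape = feature_map.get_shape()
#   upsample = resize_bilinear_to(shape[1], shape[2])(feature_map)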

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

3 reactions
maxima120 commented, May 28, 2019

I have the same issue. I cut pretty much everything down to a single layer and changed sizes; nothing works.

code:

import tensorflow as tf
import keras

window_size = 1024
inputs_n = 128
outputs_n = 128
neurons = 128

# `days` comes from data loading elsewhere (not shown in the snippet)
n_steps = len(days[0][1]) - window_size

from keras.layers import Dense, Activation, Dropout, LSTM
from keras.models import Sequential, load_model

model = Sequential()
# stateful=True pins the batch size to window_size (1024)
model.add(LSTM(neurons, batch_input_shape=(window_size, n_steps, inputs_n), stateful=True))

from keras.utils import multi_gpu_model
# multi_gpu_model splits each batch across the GPUs: 512 per replica
parallel_model = multi_gpu_model(model, gpus=2)

error:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs)
   1658   try:
-> 1659     c_op = c_api.TF_FinishOperation(op_desc)
   1660   except errors.InvalidArgumentError as e:

InvalidArgumentError: Dimensions must be equal, but are 512 and 1024 for 'replica_0_8/sequential_12/lstm_12/add' (op: 'Add') with input shapes: [512,128], [1024,128].

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-18-db18560e3da4> in <module>
     22 
     23 from keras.utils import multi_gpu_model
---> 24 parallel_model = multi_gpu_model(model, gpus=2)
     25 
     26 #parallel_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])

~/.local/lib/python3.5/site-packages/keras/utils/multi_gpu_utils.py in multi_gpu_model(model, gpus, cpu_merge, cpu_relocation)
    225                 # Apply model on slice
    226                 # (creating a model replica on the target device).
--> 227                 outputs = model(inputs)
    228                 outputs = to_list(outputs)
    229 

~/.local/lib/python3.5/site-packages/keras/engine/base_layer.py in __call__(self, inputs, **kwargs)
    455             # Actually call the layer,
    456             # collecting output(s), mask(s), and shape(s).
--> 457             output = self.call(inputs, **kwargs)
    458             output_mask = self.compute_mask(inputs, previous_mask)
    459 

~/.local/lib/python3.5/site-packages/keras/engine/network.py in call(self, inputs, mask)
    562             return self._output_tensor_cache[cache_key]
    563         else:
--> 564             output_tensors, _, _ = self.run_internal_graph(inputs, masks)
    565             return output_tensors
    566 

~/.local/lib/python3.5/site-packages/keras/engine/network.py in run_internal_graph(self, inputs, masks)
    719                                     kwargs['mask'] = computed_mask
    720                             output_tensors = to_list(
--> 721                                 layer.call(computed_tensor, **kwargs))
    722                             output_masks = layer.compute_mask(computed_tensor,
    723                                                               computed_mask)

~/.local/lib/python3.5/site-packages/keras/layers/recurrent.py in call(self, inputs, mask, training, initial_state)
   2192                                       mask=mask,
   2193                                       training=training,
-> 2194                                       initial_state=initial_state)
   2195 
   2196     @property

~/.local/lib/python3.5/site-packages/keras/layers/recurrent.py in call(self, inputs, mask, training, initial_state, constants)
    647                                              mask=mask,
    648                                              unroll=self.unroll,
--> 649                                              input_length=timesteps)
    650         if self.stateful:
    651             updates = []

~/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py in rnn(step_function, inputs, initial_states, go_backwards, mask, constants, unroll, input_length)
   2920 
   2921         time_steps = tf.shape(inputs)[0]
-> 2922         outputs, _ = step_function(inputs[0], initial_states + constants)
   2923         output_ta = tensor_array_ops.TensorArray(
   2924             dtype=outputs.dtype,

~/.local/lib/python3.5/site-packages/keras/layers/recurrent.py in step(inputs, states)
    638         else:
    639             def step(inputs, states):
--> 640                 return self.cell.call(inputs, states, **kwargs)
    641 
    642         last_output, outputs, states = K.rnn(step,

~/.local/lib/python3.5/site-packages/keras/layers/recurrent.py in call(self, inputs, states, training)
   1971                 h_tm1_o = h_tm1
   1972             i = self.recurrent_activation(x_i + K.dot(h_tm1_i,
-> 1973                                                       self.recurrent_kernel_i))
   1974             f = self.recurrent_activation(x_f + K.dot(h_tm1_f,
   1975                                                       self.recurrent_kernel_f))

/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py in binary_op_wrapper(x, y)
    810     with ops.name_scope(None, op_name, [x, y]) as name:
    811       if isinstance(x, ops.Tensor) and isinstance(y, ops.Tensor):
--> 812         return func(x, y, name=name)
    813       elif not isinstance(y, sparse_tensor.SparseTensor):
    814         try:

/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py in add(x, y, name)
    363   try:
    364     _, _, _op = _op_def_lib._apply_op_helper(
--> 365         "Add", x=x, y=y, name=name)
    366   except (TypeError, ValueError):
    367     result = _dispatch.dispatch(

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
    786         op = g.create_op(op_type_name, inputs, output_types, name=scope,
    787                          input_types=input_types, attrs=attr_protos,
--> 788                          op_def=op_def)
    789       return output_structure, op_def.is_stateful, op
    790 

/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py in new_func(*args, **kwargs)
    505                 'in a future version' if date is None else ('after %s' % date),
    506                 instructions)
--> 507       return func(*args, **kwargs)
    508 
    509     doc = _add_deprecated_arg_notice_to_docstring(

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in create_op(***failed resolving arguments***)
   3298           input_types=input_types,
   3299           original_op=self._default_original_op,
-> 3300           op_def=op_def)
   3301       self._create_op_helper(ret, compute_device=compute_device)
   3302     return ret

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in __init__(self, node_def, g, inputs, output_types, control_inputs, input_types, original_op, op_def)
   1821           op_def, inputs, node_def.attr)
   1822       self._c_op = _create_c_op(self._graph, node_def, grouped_inputs,
-> 1823                                 control_input_ops)
   1824 
   1825     # Initialize self._outputs.

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs)
   1660   except errors.InvalidArgumentError as e:
   1661     # Convert to ValueError for backwards compatibility.
-> 1662     raise ValueError(str(e))
   1663 
   1664   return c_op

ValueError: Dimensions must be equal, but are 512 and 1024 for 'replica_0_8/sequential_12/lstm_12/add' (op: 'Add') with input shapes: [512,128], [1024,128].
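
A plausible reading of this traceback: multi_gpu_model slices each incoming batch of window_size (1024) into two slices of 512, one per GPU, and re-calls the model on each slice. The stateful LSTM, though, was built with batch_input_shape=(1024, ...), so its recurrent state h_tm1 has 1024 rows while each replica's input contributes only 512, which is the [512,128] vs [1024,128] Add failure above. An untested sketch of one workaround, with placeholder values wherever the original snippet depends on data not shown: build the template for the per-replica batch, so the state matches what each GPU actually receives.

from keras.layers import LSTM
from keras.models import Sequential
from keras.utils import multi_gpu_model

gpus = 2
window_size = 1024                  # full batch fed to the parallel model
per_replica = window_size // gpus   # 512 samples reach each GPU
n_steps, inputs_n, neurons = 100, 128, 128   # n_steps is a placeholder here

model = Sequential()
# Size the stateful template for the *sliced* batch each replica sees,
# so the recurrent state has per_replica rows instead of window_size.
model.add(LSTM(neurons,
               batch_input_shape=(per_replica, n_steps, inputs_n),
               stateful=True))

parallel_model = multi_gpu_model(model, gpus=gpus)
# Feed parallel_model batches of window_size samples; each replica then
# receives per_replica of them, matching its state size.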
1 reaction
gabrieldemarmiesse commented, Nov 21, 2018

I don't have a multi-GPU setup to work on this piece of code. I've flagged this issue as a bug because I think it needs attention, so hopefully someone will fix it in the future.
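
For anyone hitting this today: multi_gpu_model was later deprecated and removed in favor of tf.distribute.MirroredStrategy, which mirrors variables across GPUs and splits batches inside fit() instead of re-calling the model on sliced inputs, sidestepping both failure modes in this thread. A minimal sketch for TF 2.x, assuming the model-building code above has been ported to tf.keras (the build_cpn argument is hypothetical):

import tensorflow as tf

# Variables created inside the scope are mirrored across all visible
# GPUs; model.fit() then splits each batch across the replicas.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_cpn('resnet50')   # build_cpn from the original post
    model.compile(optimizer='adam', loss='mean_squared_error')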

