
Training works in keras 2.2.2 but not in 2.2.3 and 2.2.4

See original GitHub issue

Hi, I’m training a semantic image segmentation model.

I’ve built a generator that yields (images, masks, sample_weights). My batch size is 8, and the shapes are:

((8, 256, 256, 3), (8, 65536, 1), (8, 65536))

In the compilation I also defined sample_weight_mode = "temporal".

This is the error I’m getting (it looks like these versions reshape the validation labels to be a 1D vector):

InvalidArgumentError: Incompatible shapes: [524288] vs. [8,65536]
	 [[Node: metrics/acc/Equal = Equal[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](metrics/acc/Reshape, metrics/acc/Cast)]]
	 [[Node: training/Adam/gradients/activation_1_1/concat_grad/Slice_1/_799 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2406_training/Adam/gradients/activation_1_1/concat_grad/Slice_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]

When I set batch_size=1 it works.
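
For context, here is a minimal sketch of the setup described above. The model architecture, loss, and data are stand-ins; only the shapes, the batch size, and sample_weight_mode="temporal" are taken from the report:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Conv2D, Reshape

    # Stand-in for the reporter's segmentation model: only the input/output
    # shapes matter for reproducing the metric error.
    model = Sequential([
        Conv2D(1, (1, 1), activation='sigmoid', input_shape=(256, 256, 3)),
        Reshape((256 * 256, 1)),  # -> (batch, 65536, 1)
    ])
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'],
                  sample_weight_mode='temporal')

    def generator(batch_size=8):
        # Yields (images, masks, sample_weights) with the shapes from the report.
        while True:
            images = np.random.rand(batch_size, 256, 256, 3).astype('float32')
            masks = np.random.randint(0, 2, (batch_size, 256 * 256, 1)).astype('float32')
            weights = np.ones((batch_size, 256 * 256), dtype='float32')
            yield images, masks, weights

    # Per the report, this trains on Keras 2.2.2 (and with batch_size=1) but
    # raises "Incompatible shapes: [524288] vs. [8,65536]" on 2.2.3/2.2.4.
    model.fit_generator(generator(), steps_per_epoch=2, epochs=1)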

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 7
  • Comments: 20 (5 by maintainers)

Top GitHub Comments

4 reactions
mrluin commented, Dec 19, 2018

The issue happens with TF 1.9 and Keras 2.2.4.

2 reactions
durandg12 commented, Feb 18, 2019

I have a similar issue using tf.keras. My Keras version is 2.2.4, my TensorFlow version is 1.12.0, and I use Python 3.6. I train a simple RNN with one Embedding layer, one GRU layer with return_sequences=True, and one Dense layer (wrapping the Dense layer in TimeDistributed or not doesn’t change the problem; I get the same error). My metric is sparse_categorical_accuracy.

The length of my sequences is 4, and there are 60 categories, so the shape of my input is (batch_size, 4, 60). The shape of my output is also (batch_size, 4, 60) since return_sequences=True.

The error I get is InvalidArgumentError: Incompatible shapes: [batch_size] vs. [batch_size,4] [Op:Equal]

Note that, like in the posts above, the error does not occur when batch_size=1 or when I don’t use any metric. In both cases the neural network trains without error.

Note also that if I set return_sequences=False, that is, if I only try to predict one category instead of a sequence and my output has shape (batch_size, 60), then the network also trains without error.
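
For reference, a minimal tf.keras model along the lines described, with assumed embedding size and GRU width (only the sequence length of 4, the 60 categories, the optimizer, the loss, and the metric come from this comment):

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Illustrative model: Embedding -> GRU(return_sequences=True) -> Dense.
    model = models.Sequential([
        layers.Embedding(input_dim=60, output_dim=16, input_length=4),
        layers.GRU(32, return_sequences=True),
        layers.TimeDistributed(layers.Dense(60, activation='softmax')),
    ])
    model.compile(optimizer=tf.train.AdamOptimizer(),
                  loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])

    # Integer-coded sequences, one integer label per timestep.
    x = np.random.randint(0, 60, size=(32, 4))
    y = np.random.randint(0, 60, size=(32, 4))

    # Per the comment, training with batch_size > 1 fails on TF 1.12 with
    # "Incompatible shapes: [batch_size] vs. [batch_size,4] [Op:Equal]".
    model.fit(x, y, batch_size=8, epochs=1)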

EDIT: I managed to make my training work by replacing

    model.compile(
      optimizer = tf.train.AdamOptimizer(),
      loss = 'sparse_categorical_crossentropy',
      metrics=['sparse_categorical_accuracy'])

by

    model.compile(
      optimizer = tf.train.AdamOptimizer(),
      loss = 'sparse_categorical_crossentropy',
      metrics=[new_sparse_categorical_accuracy])

where I defined new_sparse_categorical_accuracy by copying the code taken from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/metrics.py, that is:

    from tensorflow.python.ops import math_ops
    from tensorflow.python.framework import ops
    from tensorflow.python.keras import backend as K
    from tensorflow.python.ops import array_ops
    def new_sparse_categorical_accuracy(y_true, y_pred):
        y_pred_rank = ops.convert_to_tensor(y_pred).get_shape().ndims
        y_true_rank = ops.convert_to_tensor(y_true).get_shape().ndims
        # If the shape of y_true is (num_samples, 1), squeeze to (num_samples,)
        if (y_true_rank is not None) and (y_pred_rank is not None) and (len(K.int_shape(y_true)) == len(K.int_shape(y_pred))):
            y_true = array_ops.squeeze(y_true, [-1])
        y_pred = math_ops.argmax(y_pred, axis=-1)
        # If the predicted output and actual output types don't match, force cast them
        # to match.
        if K.dtype(y_pred) != K.dtype(y_true):
            y_pred = math_ops.cast(y_pred, K.dtype(y_true))
        return math_ops.cast(math_ops.equal(y_true, y_pred), K.floatx())
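
The practical difference appears to be the rank check before the squeeze: y_true is only squeezed along its last axis when it has the same rank as y_pred, so a (batch_size, 4) target keeps its time dimension and ends up with the same shape as the argmax of the (batch_size, 4, 60) predictions.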
Read more comments on GitHub >

Top Results From Across the Web

Keras 2.2.4 with TensorFlow 1.4.1 crashing GPU instances
I'm having trouble when using a Conv2D architecture, the GPU instance seems to crash, i.e. i can see the GPU memory fill up...
Read more >
How to correctly install Keras and Tensorflow - ActiveState
Once TensorFlow and Keras are installed, you can start working with them. ... It's not necessary to import all of the Keras and...
Read more >
About Keras
Keras is: Simple -- but not simplistic. Keras reduces developer cognitive load to free you to focus on the parts of the problem...
Read more >
Deep Learning with Keras - Keras 2.2.4 - INTERMEDIATE
In this 19-video course, learners explore deep learning with Keras, including how to create and use neural networks with Keras for machine learning ......
Read more >
Keras vs. tf.keras: What's the difference in TensorFlow 2.0?
What does the TensorFlow 2.0 release mean for me as a Keras user? Am I supposed to use the keras package for training...
Read more >
