
Custom loss function y_true y_pred shape mismatch

See original GitHub issue

Hello,

I am trying to create a custom loss function in Keras, where the target values for my network and the output of my network are of different shapes. Here is the custom loss function I have defined:

import theano.tensor as T  # the snippet assumes the Theano backend

def custom_loss(y_true, y_pred):
    # y_pred: (batch_size, 1) network output; y_true: (batch_size, 3) targets
    sml = T.nnet.sigmoid(-y_pred)
    s1ml = T.nnet.sigmoid(1.0 - y_pred)
    a = sml           # first class probability
    b = s1ml - sml    # second class probability
    c = 1.0 - s1ml    # third class probability (a + b + c == 1)
    # concatenate along axis 1 so p has shape (batch_size, 3) like y_true
    # (T.stack with axis=1 would insert an extra axis)
    p = T.concatenate((a, b, c), axis=1)
    part1 = T.log(p + 1.0e-20)  # T.log, not np.log, for symbolic tensors
    part2 = y_true * part1
    cost = -part2.sum()
    return cost

y_pred is of shape (batch_size, 1) and y_true is of shape (batch_size, 3), and I aim to calculate a single error value using the above code. However, Keras gives me the following error:

ValueError: Input dimension mis-match. (input[0].shape[1] = 3, input[1].shape[1] = 1)
Apply node that caused the error: Elemwise{Composite{EQ(i0, RoundHalfAwayFromZero(i1))}}(dense_3_target, Elemwise{Add}[(0, 0)].0)
Toposort index: 83
Inputs types: [TensorType(float32, matrix), TensorType(float32, matrix)]
Inputs shapes: [(1001, 3), (1001, 1)]
Inputs strides: [(12, 4), (4, 4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Sum{acc_dtype=int64}(Elemwise{Composite{EQ(i0, RoundHalfAwayFromZero(i1))}}.0)]]

Does Keras not allow you to have different y_true and y_pred shapes? My cost function requires the single-unit output of my network and must calculate the cost against a y_true matrix of shape (batch_size, 3).
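
For context, the failing `EQ(i0, RoundHalfAwayFromZero(i1))` node reduced by a `Sum` looks like what the Theano backend builds for Keras's stock `binary_accuracy` metric, which compares `y_true` elementwise against the rounded `y_pred` and so needs the two shapes to agree before the loss is ever evaluated. A minimal sketch of that comparison, assuming the Theano backend and the Keras 1.x of the era:

import keras.backend as K

def binary_accuracy_like(y_true, y_pred):
    # Elementwise equality against the rounded prediction: both tensors must
    # share a shape, which (batch_size, 3) targets and a (batch_size, 1)
    # output cannot satisfy.
    return K.mean(K.equal(y_true, K.round(y_pred)))

If the model was compiled with `metrics=['accuracy']`, dropping that metric or supplying a shape-aware custom metric may remove this particular error, independently of the custom loss.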

Here is the output of model.summary():

____________________________________________________________________________________________________
Layer (type)                       Output Shape        Param #     Connected to                     
===================================================================================================
convolution2d_1 (Convolution2D)    (None, 30, 1, 591)  1830        convolution2d_input_1[0][0]      
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)      (None, 30, 1, 147)  0           convolution2d_1[0][0]            
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)    (None, 30, 1, 138)  9030        maxpooling2d_1[0][0]             
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)      (None, 30, 1, 34)   0           convolution2d_2[0][0]            
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)    (None, 30, 1, 25)   9030        maxpooling2d_2[0][0]             
____________________________________________________________________________________________________
maxpooling2d_3 (MaxPooling2D)      (None, 30, 1, 6)    0           convolution2d_3[0][0]            
____________________________________________________________________________________________________
flatten_1 (Flatten)                (None, 180)         0           maxpooling2d_3[0][0]             
____________________________________________________________________________________________________
dense_1 (Dense)                    (None, 20)          3620        flatten_1[0][0]                  
____________________________________________________________________________________________________
activation_1 (Activation)          (None, 20)          0           dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                    (None, 20)          420         activation_1[0][0]               
____________________________________________________________________________________________________
activation_2 (Activation)          (None, 20)          0           dense_2[0][0]                    
____________________________________________________________________________________________________
dense_3 (Dense)                    (None, 1)           21          activation_2[0][0]               
====================================================================================================
Total params: 23951

Thank you for the help!
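
One way to sidestep the mismatch entirely, sketched here as a suggestion rather than code from the thread, is to move the probability construction out of the loss and into the model, so the final output has shape (batch_size, 3) and matches y_true; the stock `categorical_crossentropy` loss and accuracy metric then apply unchanged. A rough sketch, assuming the Keras 1 Sequential API used for the model above:

import keras.backend as K
from keras.layers import Lambda

def scalar_to_probs(z):
    # z: (batch_size, 1) scalar score -> three probabilities, (batch_size, 3)
    sml = K.sigmoid(-z)
    s1ml = K.sigmoid(1.0 - z)
    return K.concatenate([sml, s1ml - sml, 1.0 - s1ml], axis=1)

# Appended after the dense_3 layer shown in the summary:
# model.add(Lambda(scalar_to_probs, output_shape=(3,)))
# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])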

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Reactions: 4
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

24 reactions
zach-nervana commented, Dec 21, 2017

Is there any chance this will get supported in a more natural way? This is quite a hack

9 reactions
bstriner commented, Jun 1, 2017

Wow! @ssfrr @RishabGargeya, this is a little weird architecturally and I didn’t think it would work, but try the code below. It trains a model where the inputs are x and y (not one-hot), and the targets are None.

@fchollet do you have any thoughts on how to approach this type of problem? In some situations, like sequence learning, you need your output sequence to also be an Input so you can use it in an RNN, and you don’t want the redundancy of it being both an input and a target. I had been using dummy targets, but that still meant I had to pass zeros or something to train, which is kind of awkward. This is also the kind of thing you might do if you don’t want to one-hot encode your targets.
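
For contrast, the "dummy targets" workaround mentioned above usually looks something like the sketch below (an illustrative sketch, not code from the thread): the model's single output already is the per-sample loss, the loss function passes it straight through, and `fit()` is fed throwaway zeros as targets.

import numpy as np

def pass_through_loss(y_true, y_pred):
    # y_pred is already the per-sample loss computed inside the model;
    # the dummy y_true is ignored entirely.
    return y_pred

# Hypothetical usage, assuming `model` takes both x and the real labels as
# inputs and outputs the loss tensor:
# model.compile(optimizer='adam', loss=pass_through_loss)
# model.fit([x_train, y_train], np.zeros(len(x_train)), batch_size=64)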

I had no idea about how to skip outputs. Maybe need more examples or docs about that feature.

The below approach works for passing your target as an input but it is verbose and you have to add the losses and the metrics in the right order. If there isn’t something significantly better, I can abstract it into a custom model.

import keras.backend as K
from keras.callbacks import CSVLogger
from keras.datasets import mnist
from keras.layers import Input, Lambda, Dense, Flatten, BatchNormalization, Activation
from keras.models import Model


def main():
    # Both inputs and targets are `Input` tensors
    input_x = Input((28, 28), name='input_x', dtype='uint8')  # uint8 [0-255]
    y_true = Input((1,), name='y_true', dtype='uint8')  # uint8 [0-9]
    # Build prediction network as usual
    h = Flatten()(input_x)
    h = Lambda(lambda _x: K.cast(_x, 'float32'),
               output_shape=lambda _x: _x,
               name='cast')(h)  # cast uint8 to float32
    h = BatchNormalization()(h)  # normalize pixels
    for i in range(3):  # hidden relu and batchnorm layers
        h = Dense(256)(h)
        h = BatchNormalization()(h)
        h = Activation('relu')(h)
    y_pred = Dense(10, activation='softmax', name='y_pred')(h)  # softmax output layer
    # Lambda layer performs loss calculation (negative log likelihood)
    # (Tuple-unpacking lambdas such as `lambda (_yt, _yp): ...` are Python 2-only
    # syntax, so the pair is unpacked by index here.)
    loss = Lambda(lambda args: -K.log(args[1][K.reshape(K.arange(K.shape(args[0])[0]), (-1, 1)), args[0]] + K.epsilon()),
                  output_shape=lambda shapes: shapes[0],
                  name='loss')([y_true, y_pred])

    # Model `inputs` are both x and y. `outputs` is the loss.
    model = Model(inputs=[input_x, y_true], outputs=[loss])
    # Manually add the loss to the model. Required because the loss_weight will be None.
    model.add_loss(K.sum(loss, axis=None))
    # Compile with the loss weight set to None, so it will be omitted
    model.compile('adam', loss=[None], loss_weights=[None])
    # Add accuracy to the metrics
    # Cannot add as a metric to compile, because metrics for skipped outputs are skipped
    accuracy = K.mean(K.equal(K.argmax(y_pred, axis=1), K.flatten(y_true)))
    model.metrics_names.append('accuracy')
    model.metrics_tensors.append(accuracy)
    # Model summary
    model.summary()

    # Train model
    train, test = mnist.load_data()
    cb = CSVLogger("mnist_training.csv")
    model.fit(list(train), [None], epochs=300, batch_size=64, callbacks=[cb], validation_data=(list(test), [None]))


if __name__ == "__main__":
    main()

Cheers
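
For readers on current tf.keras, the same "target as an Input" idea can often be expressed more directly with `Model.add_loss` and a `compile()` call that omits the `loss=` argument. The sketch below is untested and version-sensitive (recent releases may require building the loss inside a custom layer that calls `self.add_loss()`); the layer sizes and names are illustrative, not taken from the comment above.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

input_x = keras.Input((28, 28), name='input_x')
y_true = keras.Input((1,), name='y_true', dtype='int32')

h = layers.Flatten()(input_x)
h = layers.Dense(256, activation='relu')(h)
y_pred = layers.Dense(10, activation='softmax', name='y_pred')(h)

model = keras.Model(inputs=[input_x, y_true], outputs=y_pred)
# The loss is built symbolically from the two Inputs and attached to the model,
# so compile() needs no loss argument and fit() needs no separate targets.
per_sample = keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
model.add_loss(tf.reduce_mean(per_sample))
model.compile(optimizer='adam')

(x_train, y_train), _ = keras.datasets.mnist.load_data()
model.fit([x_train.astype('float32') / 255.0, y_train.reshape(-1, 1)],
          epochs=1, batch_size=64)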

Read more comments on GitHub >

