Keras losses discussion
I’ve been working with Keras for a little while now and one of my main frustrations is the way it handles losses. It appears restricted to the form of y_true, y_pred, where both tensors must have the same shape. In regular classification problems such as VOC, ImageNet, CIFAR, etc. this works fine, but for more ‘exotic’ networks it poses issues.
For example, the MNIST siamese example network works around this issue by outputting the distance between two inputs as the model output and then computing the loss based on that distance.
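To make the workaround concrete: the siamese model outputs a scalar distance per pair and the label is a binary same/different flag, so y_true and y_pred have matching shapes. A minimal numpy sketch of the contrastive-loss arithmetic used there (a margin of 1.0 is assumed; the actual example uses Keras backend ops):

```python
import numpy as np

def contrastive_loss(y_true, d, margin=1.0):
    """Contrastive loss on a distance d between two embeddings.

    y_true is 1 for a matching pair, 0 for a non-matching pair. Because
    y_true and d have the same shape, this fits Keras' (y_true, y_pred)
    loss signature -- which is exactly why the distance is made the output.
    """
    return np.mean(y_true * np.square(d)
                   + (1.0 - y_true) * np.square(np.maximum(margin - d, 0.0)))
```

Matching pairs are pulled together, non-matching pairs are pushed apart up to the margin.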
Similarly, the triplet loss depends on not two but three images, meaning two distance measures. This is circumvented in this project by computing the loss inside the model, outputting that loss, and using an “identity_loss” to compute the mean loss.
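The “identity loss” trick looks roughly like this (a minimal numpy sketch; the Keras version would use K.mean): the model’s output already *is* the per-sample loss, and the compiled loss simply averages y_pred while ignoring y_true.

```python
import numpy as np

def identity_loss(y_true, y_pred):
    # y_pred is the loss tensor computed inside the model graph; y_true is a
    # dummy target (e.g. an array of zeros) passed only to satisfy Keras'
    # (y_true, y_pred) loss signature.
    return np.mean(y_pred)
```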
The variational autoencoder demo finds yet another way around this: a custom layer adds to the model’s losses directly (via add_loss), and the model’s loss targets are set to None. This apparently works, but it is far from ideal and feels like a massive workaround.
For an RPN (Region Proposal Network) it is even more awkward. An RPN generates a variable number of proposals from an image and calculates the targets (similar to y_true) for those proposals. In other words, y_true cannot be computed beforehand, nor can its shape be known beforehand. The only solutions here are to use an identity_loss as in the triplet case, or a custom layer which adds a loss term as with the variational autoencoder.
There are multiple issues that discuss this problem.
This issue aims to be a discussion point on how to improve the current scenario.
My proposal would be to relax the restrictions on the loss parameter of a Keras model and allow for arbitrary loss tensors. Below is some code showing how I imagine it (triplet network example):
def create_base_network(input_shape):
    seq = Sequential()
    seq.add(Conv2D(20, (5, 5), input_shape=input_shape))
    seq.add(MaxPool2D(pool_size=(2, 2)))
    seq.add(Flatten())
    seq.add(Dense(128))
    return seq

base_network = create_base_network((224, 224, 3))
query = Input((224, 224, 3))
positive = Input((224, 224, 3))
negative = Input((224, 224, 3))
query_embedding = base_network(query)
positive_embedding = base_network(positive)
negative_embedding = base_network(negative)
loss = Lambda(triplet_loss)([query_embedding, positive_embedding, negative_embedding])
model = Model(inputs=[query, positive, negative], outputs=None, loss=[loss])
That network wouldn’t require an output, but it does have a loss function. The deployed version would output an embedding, but this isn’t necessary for the training model.
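The triplet_loss passed to the Lambda above is not defined in the snippet; a hypothetical numpy sketch of the usual hinge form it would take (squared Euclidean distances, with an assumed margin of 1.0):

```python
import numpy as np

def triplet_loss(tensors, margin=1.0):
    query, positive, negative = tensors
    # Squared Euclidean distance between the query and each candidate.
    pos_dist = np.sum(np.square(query - positive), axis=-1)
    neg_dist = np.sum(np.square(query - negative), axis=-1)
    # Hinge: the positive should be at least `margin` closer than the negative.
    return np.maximum(pos_dist - neg_dist + margin, 0.0)
```

In the proposed API this tensor would be handed to the model as a loss directly, with no matching y_true required.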
EDIT: Basically, this proposal would drop the relation between outputs and loss, and thus its restrictions. In addition, it would change the meaning of loss into a list of tensors to be minimized during training; outputs would simply be the list of tensors returned by the model after prediction.
@fchollet I would love to hear what you think of this proposal; is it something you would support? I can make an attempt to work on this but it is better to discuss such a significant change first. Will we have to worry about backwards compatibility, or should this be a change for the next major release of Keras?
Issue Analytics
- State:
- Created 6 years ago
- Reactions: 15
- Comments: 8 (2 by maintainers)
I want to echo @hgaiser. It’s extremely difficult to implement the aforementioned RPN losses in a flexible and intuitive manner. I disagree with @hgaiser on one point, however: losses on intermediate layers are not exotic (consider every model presented at CVPR or ICCV that uses RCNN-like anchor generation). ☺️
```python
class CustomVariationalLayers(keras.layers.Layer):
    def vae_loss(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        x = K.flatten(x)
        z_decoded = K.flatten(z_decoded)
        xent_loss = keras.metrics.binary_crossentropy(x, z_decoded)
        kl_loss = -5e-4 * K.mean(
            1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        loss = self.vae_loss([x, z_decoded])
        self.add_loss(loss, inputs=inputs)
        return x

y = CustomVariationalLayers()([input_img, z_decoded])
```
When calling CustomVariationalLayers(), why are the arguments z_mean and z_log_var not needed?