
model.evaluate() gives a different loss on training data from the one in training process

See original GitHub issue

I’m implementing a CNN model. When it has only a few layers, it works well. With a deeper network I can reach high performance on the training data (a small loss reported during training), but when I run model.evaluate() on that same training data, I get much worse results (a much larger loss). I don’t understand why this happens, since both numbers are computed on the training data.

Here is my code:

# Imports needed for this snippet (standalone Keras 2.x API, as used in the issue)
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import (Conv2D, Activation, BatchNormalization, MaxPooling2D,
                          Dropout, GlobalAveragePooling2D, Dense)
from keras.optimizers import Adam

input_shape = (X.shape[1], X.shape[2], 1)
model = Sequential()

# Map string labels to integer ids, then one-hot encode them
y = [label2id[l] for l in labels.reshape(-1)]
y = keras.utils.to_categorical(y)

# Conv block 1
model.add(Conv2D(32, (5, 5), strides=(2, 2), input_shape=input_shape))
model.add(Activation('relu'))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.3))

# Conv block 2
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.3))

# 1x1 conv blocks
model.add(Conv2D(512, (1, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Conv2D(15, (1, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())

# Classifier head
model.add(GlobalAveragePooling2D())
model.add(Dense(500, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(15, activation='softmax'))

opt = Adam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

model.fit(np.expand_dims(X, axis=3), y, batch_size=200, epochs=15,
          validation_data=(np.expand_dims(X_val, 3), y_val))

The log during training:

Train on 582 samples, validate on 290 samples
Epoch 1/15
582/582 [==============================] - 14s - loss: 2.6431 - acc: 0.1821 - val_loss: 2.6653 - val_acc: 0.0759
Epoch 2/15
582/582 [==============================] - 12s - loss: 2.3759 - acc: 0.3832 - val_loss: 3.9411 - val_acc: 0.0655
Epoch 3/15
582/582 [==============================] - 13s - loss: 2.0834 - acc: 0.4141 - val_loss: 7.2338 - val_acc: 0.0655
Epoch 4/15
582/582 [==============================] - 13s - loss: 1.8380 - acc: 0.5120 - val_loss: 9.4135 - val_acc: 0.0655
Epoch 5/15
582/582 [==============================] - 13s - loss: 1.6002 - acc: 0.5550 - val_loss: 10.0389 - val_acc: 0.0655
Epoch 6/15
582/582 [==============================] - 13s - loss: 1.3725 - acc: 0.6117 - val_loss: 11.0042 - val_acc: 0.0759
Epoch 7/15
582/582 [==============================] - 13s - loss: 1.1924 - acc: 0.6443 - val_loss: 10.2766 - val_acc: 0.0862
Epoch 8/15
582/582 [==============================] - 13s - loss: 1.0529 - acc: 0.6993 - val_loss: 9.2593 - val_acc: 0.0862
Epoch 9/15
582/582 [==============================] - 13s - loss: 0.9137 - acc: 0.7491 - val_loss: 9.9668 - val_acc: 0.0897
Epoch 10/15
582/582 [==============================] - 13s - loss: 0.7928 - acc: 0.7784 - val_loss: 9.4821 - val_acc: 0.0966
Epoch 11/15
582/582 [==============================] - 13s - loss: 0.6885 - acc: 0.8179 - val_loss: 8.7342 - val_acc: 0.1000
Epoch 12/15
582/582 [==============================] - 12s - loss: 0.6094 - acc: 0.8213 - val_loss: 8.5325 - val_acc: 0.1207
Epoch 13/15
582/582 [==============================] - 12s - loss: 0.5345 - acc: 0.8488 - val_loss: 7.9924 - val_acc: 0.1207
Epoch 14/15
582/582 [==============================] - 12s - loss: 0.4800 - acc: 0.8643 - val_loss: 7.8522 - val_acc: 0.1000
Epoch 15/15
582/582 [==============================] - 12s - loss: 0.4357 - acc: 0.8660 - val_loss: 7.1004 - val_acc: 0.1172

When I evaluate on the training data:

score = model.evaluate(np.expand_dims(X, axis=3), y, batch_size=32)
print(score)
576/582 [============================>.] - ETA: 0s[7.6189327469396426, 0.10309278350515463]

On the validation data:

score = model.evaluate(np.expand_dims(X_val, axis=3), y_val, batch_size=32)
print(score)
288/290 [============================>.] - ETA: 0s[7.1004119609964302, 0.11724137931034483]

Could someone help me? Thanks a lot.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 73
  • Comments: 65 (4 by maintainers)

Top GitHub Comments

109 reactions
j0bby commented, Jun 26, 2018

Hello everyone,

Here is the official Keras answer to this question: https://keras.io/getting-started/faq/#why-is-the-training-loss-much-higher-than-the-testing-loss

Even without dropout or batch normalization, the problem persists. The reason is that when you call fit, the weights are updated after every batch of training data. The loss value reported by fit is therefore not the loss of the final model, but the mean of the losses of all the slightly different models used on the individual batches. When you call evaluate, on the other hand, one and the same model is applied to the whole dataset. The final model never even contributes to the loss reported by fit, because the loss computed on the last batch is immediately used to update the weights one more time.

To sum up: fit and evaluate behave completely differently, and comparing their reported losses directly doesn’t make sense!
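
If you want a training-set loss that is directly comparable to what evaluate() reports later, one option is to re-run evaluate() on the training data at the end of every epoch. Below is a minimal sketch of such a callback; the class name EvaluateOnTrain is just a placeholder and not part of the original issue, and it reuses the X and y variables from the snippet above.

from keras.callbacks import Callback

class EvaluateOnTrain(Callback):
    """Re-evaluate the training set with frozen weights after each epoch."""
    def __init__(self, x, y):
        super(EvaluateOnTrain, self).__init__()
        self.x, self.y = x, y

    def on_epoch_end(self, epoch, logs=None):
        # evaluate() uses the current weights in inference mode (no dropout,
        # batch norm using its moving averages), so this number is directly
        # comparable to a later model.evaluate() call.
        loss, acc = self.model.evaluate(self.x, self.y, verbose=0)
        print('epoch %d: evaluated train loss %.4f, acc %.4f' % (epoch + 1, loss, acc))

# Usage (reusing the variables from the question's snippet):
# model.fit(np.expand_dims(X, axis=3), y, batch_size=200, epochs=15,
#           callbacks=[EvaluateOnTrain(np.expand_dims(X, axis=3), y)])

Note that the per-epoch loss printed by fit itself will still look better, because it is a running average over batches computed in training mode.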

94 reactions
danielS91 commented, Jun 17, 2017

It’s due to the dropout layers. During the training phase neurons are randomly dropped, whereas during prediction all neurons remain active in the network. So it’s quite likely that the results will differ. You can see this directly in the results for the validation data: the two values are equal, because both are computed in the same way (with the model in inference mode).

Edit: The batch normalization layers also influence the results.
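
To see the dropout / batch-norm effect in isolation, you can run the same batch through a model once in training mode and once in inference mode. The sketch below uses the training= argument of the newer tf.keras call API (not the Keras version from the original issue); the layer sizes are arbitrary.

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])

x = np.random.rand(4, 10).astype('float32')

train_out = model(x, training=True)   # neurons dropped, BN uses batch statistics
infer_out = model(x, training=False)  # no dropout, BN uses its moving averages

# fit() computes its losses in the first mode; evaluate() and predict() use
# the second, which is why the validation numbers from fit and evaluate match.
print(np.allclose(train_out.numpy(), infer_out.numpy()))  # usually False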

Read more comments on GitHub >

Top Results From Across the Web

Evaluating on training data gives different loss - Cross Validated
When using model.fit and model.evaluate on different datasets, the result will NEVER be exactly the same. There is a multitude of factors, but ......
Read more >
Training and evaluation with the built-in methods - TensorFlow
Now, let's review each piece of this workflow in detail. The compile() method: specifying a loss, metrics, and an optimizer. To train a...
Read more >
Tensorflow model.evaluate gives different result from that ...
I am using tensorflow to do a multi-class classification. I load the training dataset and validation dataset in the ...
Read more >
Evaluate the Performance of Deep Learning Models in Keras
The gold standard for machine learning model evaluation is k-fold cross validation. It provides a robust estimate of the performance of a model...
Read more >
Step 4: Build, Train, and Evaluate Your Model
In each training iteration, batch_size number of samples from your training data are used to compute the loss, and the weights are updated...
Read more >
