
model.evaluate() gives a different loss on training data from the one in training process

See original GitHub issue

I’m implementing a CNN model. When it has only a few layers, it works well. With a deeper network I can reach high performance on the training data (a small loss reported during training), but when I run model.evaluate() on that same training data, I get much worse results (a much larger loss). I don’t understand why this happens, since both numbers are computed on the training data.

Here is my code:

# Imports needed for this snippet (standalone Keras 2.x API, as used in the issue)
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import (Conv2D, Activation, BatchNormalization, MaxPooling2D,
                          Dropout, GlobalAveragePooling2D, Dense)
from keras.optimizers import Adam

input_shape = (X.shape[1], X.shape[2], 1)
model = Sequential()

# Map string labels to integer ids, then one-hot encode them
y = [label2id[l] for l in labels.reshape(-1)]
y = keras.utils.to_categorical(y)

# Conv block 1
model.add(Conv2D(32, (5, 5), strides=(2, 2), input_shape=input_shape))
model.add(Activation('relu'))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.3))

# Conv block 2
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.3))

# 1x1 conv blocks
model.add(Conv2D(512, (1, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Conv2D(15, (1, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())

# Classifier head
model.add(GlobalAveragePooling2D())
model.add(Dense(500, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(15, activation='softmax'))

opt = Adam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

model.fit(np.expand_dims(X, axis=3), y, batch_size=200, epochs=15,
          validation_data=(np.expand_dims(X_val, 3), y_val))

The log during training:

Train on 582 samples, validate on 290 samples
Epoch 1/15
582/582 [==============================] - 14s - loss: 2.6431 - acc: 0.1821 - val_loss: 2.6653 - val_acc: 0.0759
Epoch 2/15
582/582 [==============================] - 12s - loss: 2.3759 - acc: 0.3832 - val_loss: 3.9411 - val_acc: 0.0655
Epoch 3/15
582/582 [==============================] - 13s - loss: 2.0834 - acc: 0.4141 - val_loss: 7.2338 - val_acc: 0.0655
Epoch 4/15
582/582 [==============================] - 13s - loss: 1.8380 - acc: 0.5120 - val_loss: 9.4135 - val_acc: 0.0655
Epoch 5/15
582/582 [==============================] - 13s - loss: 1.6002 - acc: 0.5550 - val_loss: 10.0389 - val_acc: 0.0655
Epoch 6/15
582/582 [==============================] - 13s - loss: 1.3725 - acc: 0.6117 - val_loss: 11.0042 - val_acc: 0.0759
Epoch 7/15
582/582 [==============================] - 13s - loss: 1.1924 - acc: 0.6443 - val_loss: 10.2766 - val_acc: 0.0862
Epoch 8/15
582/582 [==============================] - 13s - loss: 1.0529 - acc: 0.6993 - val_loss: 9.2593 - val_acc: 0.0862
Epoch 9/15
582/582 [==============================] - 13s - loss: 0.9137 - acc: 0.7491 - val_loss: 9.9668 - val_acc: 0.0897
Epoch 10/15
582/582 [==============================] - 13s - loss: 0.7928 - acc: 0.7784 - val_loss: 9.4821 - val_acc: 0.0966
Epoch 11/15
582/582 [==============================] - 13s - loss: 0.6885 - acc: 0.8179 - val_loss: 8.7342 - val_acc: 0.1000
Epoch 12/15
582/582 [==============================] - 12s - loss: 0.6094 - acc: 0.8213 - val_loss: 8.5325 - val_acc: 0.1207
Epoch 13/15
582/582 [==============================] - 12s - loss: 0.5345 - acc: 0.8488 - val_loss: 7.9924 - val_acc: 0.1207
Epoch 14/15
582/582 [==============================] - 12s - loss: 0.4800 - acc: 0.8643 - val_loss: 7.8522 - val_acc: 0.1000
Epoch 15/15
582/582 [==============================] - 12s - loss: 0.4357 - acc: 0.8660 - val_loss: 7.1004 - val_acc: 0.1172

When I evaluate on the training data:

score = model.evaluate(np.expand_dims(X, axis=3), y, batch_size=32)
print(score)
576/582 [============================>.] - ETA: 0s[7.6189327469396426, 0.10309278350515463]

On the validation data:

score = model.evaluate(np.expand_dims(X_val, axis=3), y_val, batch_size=32)
print(score)
288/290 [============================>.] - ETA: 0s[7.1004119609964302, 0.11724137931034483]

Could someone help me? Thanks a lot.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 73
  • Comments: 65 (4 by maintainers)

Top GitHub Comments

109 reactions
j0bby commented, Jun 26, 2018

Hello everyone,

Here is the official Keras answer to this question: https://keras.io/getting-started/faq/#why-is-the-training-loss-much-higher-than-the-testing-loss

Even without dropout or batch normalization, the problem persists. The reason is that when you call fit, the weights are updated after every batch of training data. The loss value reported by fit is therefore not the loss of the final model, but the mean of the losses of all the slightly different models used on the individual batches. When you call evaluate, on the other hand, one and the same model is applied to the whole dataset. The final model never even contributes to the loss reported by fit, because the loss computed on the last batch is immediately used to update the weights one more time.

To sum up: fit and evaluate behave completely differently, and comparing their reported losses directly doesn’t make sense!
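
If you want a training-set loss that is directly comparable to what evaluate() reports later, one option is to re-run evaluate() on the training data at the end of every epoch. Below is a minimal sketch of such a callback; the class name EvaluateOnTrain is just a placeholder and not part of the original issue, and it reuses the X and y variables from the snippet above.

from keras.callbacks import Callback

class EvaluateOnTrain(Callback):
    """Re-evaluate the training set with frozen weights after each epoch."""
    def __init__(self, x, y):
        super(EvaluateOnTrain, self).__init__()
        self.x, self.y = x, y

    def on_epoch_end(self, epoch, logs=None):
        # evaluate() uses the current weights in inference mode (no dropout,
        # batch norm using its moving averages), so this number is directly
        # comparable to a later model.evaluate() call.
        loss, acc = self.model.evaluate(self.x, self.y, verbose=0)
        print('epoch %d: evaluated train loss %.4f, acc %.4f' % (epoch + 1, loss, acc))

# Usage (reusing the variables from the question's snippet):
# model.fit(np.expand_dims(X, axis=3), y, batch_size=200, epochs=15,
#           callbacks=[EvaluateOnTrain(np.expand_dims(X, axis=3), y)])

Note that the per-epoch loss printed by fit itself will still look better, because it is a running average over batches computed in training mode.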

94 reactions
danielS91 commented, Jun 17, 2017

It’s due to the dropout layers. During the training phase neurons are randomly dropped, whereas during prediction all neurons remain active in the network. So it’s quite likely that the results will differ. You can see this directly in the results for the validation data: the two values are equal, because both are computed in the same way (with the model in inference mode).

Edit: The batch normalization layers also influence the results.
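
To see the dropout / batch-norm effect in isolation, you can run the same batch through a model once in training mode and once in inference mode. The sketch below uses the training= argument of the newer tf.keras call API (not the Keras version from the original issue); the layer sizes are arbitrary.

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])

x = np.random.rand(4, 10).astype('float32')

train_out = model(x, training=True)   # neurons dropped, BN uses batch statistics
infer_out = model(x, training=False)  # no dropout, BN uses its moving averages

# fit() computes its losses in the first mode; evaluate() and predict() use
# the second, which is why the validation numbers from fit and evaluate match.
print(np.allclose(train_out.numpy(), infer_out.numpy()))  # usually False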

Read more comments on GitHub >

Top Results From Across the Web

Evaluating on training data gives different loss - Cross Validated
When using model.fit and model.evaluate on different datasets, the result will NEVER be exactly the same. There is a multitude of factors, but ......
Read more >
Training and evaluation with the built-in methods - TensorFlow
Now, let's review each piece of this workflow in detail. The compile() method: specifying a loss, metrics, and an optimizer. To train a...
Read more >
Tensorflow model.evaluate gives different result from that ...
I am using tensorflow to do a multi-class classification. I load the training dataset and validation dataset in the ...
Read more >
Evaluate the Performance of Deep Learning Models in Keras
The gold standard for machine learning model evaluation is k-fold cross validation. It provides a robust estimate of the performance of a model...
Read more >
Step 4: Build, Train, and Evaluate Your Model
In each training iteration, batch_size number of samples from your training data are used to compute the loss, and the weights are updated...
Read more >
