model.evaluate() gives a different loss on training data from the one reported during training
See original GitHub issue
I'm implementing a CNN model. When I have just a few layers it works well, but when I try a deeper network I achieve good performance during training (a small loss reported on the training data), yet when I call model.evaluate() on the same training data I get a much greater loss. I wonder why this happens, since both numbers are computed on the training data.
Here is what I got:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import (Conv2D, Activation, BatchNormalization, MaxPooling2D,
                          Dropout, GlobalAveragePooling2D, Dense)
from keras.optimizers import Adam

# X, X_val, labels, label2id and y_val are prepared elsewhere.
# One-hot encode the training labels.
y = [label2id[l] for l in labels.reshape(-1)]
y = keras.utils.to_categorical(y)

input_shape = (X.shape[1], X.shape[2], 1)

model = Sequential()
model.add(Conv2D(32, (5, 5), strides=(2, 2), input_shape=input_shape))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Conv2D(512, (1, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Conv2D(15, (1, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(GlobalAveragePooling2D())
model.add(Dense(500, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(15, activation='softmax'))

opt = Adam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

model.fit(np.expand_dims(X, axis=3), y, batch_size=200, epochs=15,
          validation_data=(np.expand_dims(X_val, 3), y_val))
The log during training:
Train on 582 samples, validate on 290 samples
Epoch 1/15
582/582 [==============================] - 14s - loss: 2.6431 - acc: 0.1821 - val_loss: 2.6653 - val_acc: 0.0759
Epoch 2/15
582/582 [==============================] - 12s - loss: 2.3759 - acc: 0.3832 - val_loss: 3.9411 - val_acc: 0.0655
Epoch 3/15
582/582 [==============================] - 13s - loss: 2.0834 - acc: 0.4141 - val_loss: 7.2338 - val_acc: 0.0655
Epoch 4/15
582/582 [==============================] - 13s - loss: 1.8380 - acc: 0.5120 - val_loss: 9.4135 - val_acc: 0.0655
Epoch 5/15
582/582 [==============================] - 13s - loss: 1.6002 - acc: 0.5550 - val_loss: 10.0389 - val_acc: 0.0655
Epoch 6/15
582/582 [==============================] - 13s - loss: 1.3725 - acc: 0.6117 - val_loss: 11.0042 - val_acc: 0.0759
Epoch 7/15
582/582 [==============================] - 13s - loss: 1.1924 - acc: 0.6443 - val_loss: 10.2766 - val_acc: 0.0862
Epoch 8/15
582/582 [==============================] - 13s - loss: 1.0529 - acc: 0.6993 - val_loss: 9.2593 - val_acc: 0.0862
Epoch 9/15
582/582 [==============================] - 13s - loss: 0.9137 - acc: 0.7491 - val_loss: 9.9668 - val_acc: 0.0897
Epoch 10/15
582/582 [==============================] - 13s - loss: 0.7928 - acc: 0.7784 - val_loss: 9.4821 - val_acc: 0.0966
Epoch 11/15
582/582 [==============================] - 13s - loss: 0.6885 - acc: 0.8179 - val_loss: 8.7342 - val_acc: 0.1000
Epoch 12/15
582/582 [==============================] - 12s - loss: 0.6094 - acc: 0.8213 - val_loss: 8.5325 - val_acc: 0.1207
Epoch 13/15
582/582 [==============================] - 12s - loss: 0.5345 - acc: 0.8488 - val_loss: 7.9924 - val_acc: 0.1207
Epoch 14/15
582/582 [==============================] - 12s - loss: 0.4800 - acc: 0.8643 - val_loss: 7.8522 - val_acc: 0.1000
Epoch 15/15
582/582 [==============================] - 12s - loss: 0.4357 - acc: 0.8660 - val_loss: 7.1004 - val_acc: 0.1172
When I evaluate on training data:
score = model.evaluate(np.expand_dims(X, axis=3), y, batch_size=32)
print score
576/582 [============================>.] - ETA: 0s[7.6189327469396426, 0.10309278350515463]
On validation data
score = model.evaluate(np.expand_dims(X_val, axis=3), y_val, batch_size=32)
print score
288/290 [============================>.] - ETA: 0s[7.1004119609964302, 0.11724137931034483]
Could someone help me? Thanks a lot.
Hello everyone,

Here is the official Keras answer to this question: https://keras.io/getting-started/faq/#why-is-the-training-loss-much-higher-than-the-testing-loss

Even without dropout or batch normalization the problem will persist. The reason is that when you use fit, the weights are updated at each batch of the training data. The loss value returned by fit is therefore not the mean loss of the final model, but the mean of the losses of all the slightly different models used on each batch. On the other hand, when you use evaluate, one and the same model is used on the whole dataset, and that model never even appears in the loss reported by fit, since even at the last batch of training the computed loss is used to update the weights once more. To sum everything up, fit and evaluate have two completely different behaviors, and comparing their output doesn't make sense!
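One way to make the two numbers comparable is to re-run model.evaluate() on the training set at the end of every epoch, so that the reported training loss always comes from a single, fixed set of weights, just like the final evaluate() call. A minimal sketch, assuming the X, y, X_val and y_val arrays from the question; the TrainSetEvaluation callback name is made up for illustration:

import numpy as np
from keras.callbacks import Callback

class TrainSetEvaluation(Callback):
    # Hypothetical helper: after each epoch, evaluate the training set with the
    # weights frozen, exactly the way model.evaluate() does after training.
    def __init__(self, x_train, y_train):
        super(TrainSetEvaluation, self).__init__()
        self.x_train = x_train
        self.y_train = y_train

    def on_epoch_end(self, epoch, logs=None):
        loss, acc = self.model.evaluate(self.x_train, self.y_train, verbose=0)
        print('epoch %d: evaluate() on training data -> loss %.4f, acc %.4f'
              % (epoch + 1, loss, acc))

x_train = np.expand_dims(X, axis=3)
model.fit(x_train, y, batch_size=200, epochs=15,
          validation_data=(np.expand_dims(X_val, 3), y_val),
          callbacks=[TrainSetEvaluation(x_train, y)])

The per-epoch numbers printed by this callback should track the final evaluate() result much more closely than the running averages shown in the fit progress bar.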
It's due to the dropout layers: during the training phase neurons are randomly dropped, whereas during prediction all neurons remain in the network. So it's quite likely that the results will be different. You can see this directly from the results for the validation data: they are equal, because both are generated in the same way, with the network in inference mode.
Edit: The batch normalization layers also influence the results.
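The training-versus-inference difference is easy to observe directly. A minimal sketch, assuming standalone Keras 2.x with the TensorFlow 1.x backend (where K.learning_phase() is available) and the model and X from the question: run the same batch through the network once in training mode and once in inference mode; because Dropout and BatchNormalization behave differently in the two phases, the outputs generally differ.

import numpy as np
from keras import backend as K

x_batch = np.expand_dims(X[:32], axis=3)

# Build a function that takes the learning phase as an extra input:
# 1 -> training behaviour (dropout active, BN uses batch statistics)
# 0 -> inference behaviour (dropout off, BN uses moving averages)
forward = K.function([model.input, K.learning_phase()], [model.output])

train_mode_out = forward([x_batch, 1])[0]
test_mode_out = forward([x_batch, 0])[0]

print(np.abs(train_mode_out - test_mode_out).max())  # typically well above zero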