
Got different accuracy between history and evaluate

See original GitHub issue

I fit a model as follows:

history = model.fit(x_train, y_train, epochs=50, verbose=1, validation_data=(x_val, y_val))

The last few epochs of the training log look like this:

Epoch 48/50
49/49 [==============================] - 0s 3ms/step - loss: 0.0228 - acc: 0.9796 - val_loss: 3.3064 - val_acc: 0.6923
Epoch 49/50
49/49 [==============================] - 0s 3ms/step - loss: 0.0186 - acc: 1.0000 - val_loss: 3.3164 - val_acc: 0.6923
Epoch 50/50
49/49 [==============================] - 0s 2ms/step - loss: 0.0150 - acc: 1.0000 - val_loss: 3.3186 - val_acc: 0.6923

However, when I evaluate the model on the training set with model.evaluate(x_train, y_train)

I get [4.552013397216797, 0.44897958636283875]

I have no idea how this happens. Thank you.
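
One common source of this kind of gap: the accuracy printed by fit() for each epoch is a running average over batches, computed while the weights are still changing and with layers such as Dropout or BatchNormalization in training mode, whereas model.evaluate() scores the final weights in inference mode. The sketch below illustrates this with a hypothetical toy model and random data, not the code from this issue:

import numpy as np
import tensorflow as tf

# Hypothetical toy data: 200 samples, 10 features, roughly balanced binary labels.
x_train = np.random.rand(200, 10).astype("float32")
y_train = (x_train.sum(axis=1) > 5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # behaves differently during training vs. inference
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(x_train, y_train, epochs=5, verbose=0)

# The last epoch's "accuracy" is an average over batches taken while training...
print("history accuracy:", history.history["accuracy"][-1])
# ...while evaluate() scores the final weights with Dropout disabled.
print("evaluate accuracy:", model.evaluate(x_train, y_train, verbose=0)[1])

A small difference between the two numbers is expected for this reason alone; a much larger gap, like the one reported here, is often tied to how the evaluation data is fed in, such as the generator shuffling discussed in the comments below.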

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 13
  • Comments: 27 (3 by maintainers)

Top GitHub Comments

19 reactions
mblouin02 commented, Nov 11, 2018

Same issue here. When training with fit_generator, I get training and validation accuracies that are both much higher than the ones I get when I evaluate the model manually on the training and testing data.
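
One frequent cause of that pattern when metrics are computed by hand from a generator is ordering: if the generator shuffles, its labels no longer line up with the predictions, and the manually computed accuracy comes out far too low. A minimal sketch, assuming a TF2-style Keras setup in which `model` is an already trained model and `test_generator` comes from flow_from_directory:

import numpy as np

# Assumed setup: `model` is a trained Keras model and `test_generator` was
# created with ImageDataGenerator(...).flow_from_directory(..., shuffle=False).
# If shuffle were True, test_generator.classes would not be in the same order
# as the predictions, and this "manual" accuracy would look much worse than
# the values reported during training.
preds = model.predict(test_generator)
pred_labels = np.argmax(preds, axis=1)  # assumes class_mode="categorical"
manual_acc = np.mean(pred_labels == test_generator.classes)
print("manual accuracy:", manual_acc)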

10 reactions
RaviBansal7717 commented, Aug 4, 2020

I have found a fix for this issue. I ran into it while using ImageDataGenerator, and my model's accuracy on both the training and validation sets was far lower with model.evaluate() than the values returned in the model history. The fix is to set shuffle=False when creating your validation generator; your accuracy will then match on the validation set. For the training set it may not match, since we generally keep shuffle=True for training. Below is an example of how to create a validation data generator for reproducible results (set shuffle=False):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Only rescale pixel values for validation; no augmentation.
validation_datagen = ImageDataGenerator(rescale=1./255)

# shuffle=False keeps the sample order fixed, so metrics computed by
# model.evaluate() line up with the val_acc reported during training.
validation_generator = validation_datagen.flow_from_directory(
    validation_directory,
    target_size=target_size,
    batch_size=validation_batch_size,
    class_mode=class_mode,
    shuffle=False
)
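
As a quick check (not part of the original comment; `model` and `history` are assumed to come from an earlier training run on this data), evaluating with the generator above should now reproduce the validation accuracy from the history:

# Assumed: `model` was trained with something like
# history = model.fit(train_generator, validation_data=validation_generator, epochs=...)
val_loss, val_acc = model.evaluate(validation_generator, verbose=0)
# With shuffle=False, this should match history.history["val_acc"]
# ("val_accuracy" in newer Keras versions) for the final epoch.
print("evaluate val_acc:", val_acc)
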
Read more comments on GitHub >

Top Results From Across the Web

Keras evaluate give different result than history. Which one is ...
The accuracy I got by using evaluate function is 0.448. And accuracy I got by using history by averaging the validation accuracy is...
Read more >
model.fit vs model.evaluate gives different results?
The following is a small snippet of the code, but I'm trying to understand the results of model.fit with train and test dataset...
Read more >
Why Do I Get Different Results Each Time in Machine Learning?
Different results because of stochastic evaluation procedures. ... They are learning a model conditional on the historical data you have ...
Read more >
Different Results for model.evaluate() compared to model()
Hi. I have trained a MobileNets model and in the same code used the model.evaluate() on a set of test data to determine...
Read more >
Metrics - Keras
A metric is a function that is used to judge the performance of your model. ... the results from evaluating a metric are...
Read more >
