Keras training and validation accuracy always converges to 0.5
System information
- Have I written custom code (as opposed to using example directory): yes
- OS Platform and Distribution: Linux ubuntu 4.15.0-51-generic #16~18.04.1-Ubuntu SMP
- TensorFlow backend: yes
- TensorFlow version: 1.13.1
- Keras version: 2.2.4
- Python version: 3.6.8
I have used a model (provided here) that is trained on two categories of pictures and then tries to classify them. I force the network to use the same seeds when training so as to get comparable results. I also create and close the TF sessions explicitly, as I have read that leftover sessions may cause problems.
Describe the current behaviour
Most of the time the training and validation accuracy converge to around 0.5 and the loss stays at exactly the same value for every epoch. I can run the same code, without changing it, several times in a row and get this problem; only rarely do I get a run where the neural network trains properly and reaches 0.9 accuracy or more for both training and validation.
Epoch 1/5
20/20 [==============================] - 5s 255ms/step - loss: 8.0572 - acc: 0.4994 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 2/5
20/20 [==============================] - 5s 252ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 3/5
20/20 [==============================] - 5s 251ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 4/5
20/20 [==============================] - 5s 250ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 5/5
20/20 [==============================] - 5s 252ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
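A side note on the plateau value, offered as a sketch rather than a confirmed diagnosis: a loss of 8.0590 is what binary cross-entropy gives when the sigmoid output has saturated and every image receives the same class on a balanced dataset. The example assumes predictions are clipped to Keras's default backend epsilon of 1e-7 before the logarithm is taken. The function name saturated_bce and the 50/50 class split are illustrative assumptions, not something taken from the code in this report.

```python
import math

KERAS_EPSILON = 1e-7  # default value returned by keras.backend.epsilon()


def saturated_bce(positive_fraction, predicted_prob, eps=KERAS_EPSILON):
    """Mean binary cross-entropy when every sample gets the same prediction.

    Assumption: the prediction is clipped to [eps, 1 - eps] before the log,
    as a clipping-based loss implementation would do.
    """
    p = min(max(predicted_prob, eps), 1.0 - eps)
    loss_for_positives = -math.log(p)        # samples whose label is 1
    loss_for_negatives = -math.log(1.0 - p)  # samples whose label is 0
    return (positive_fraction * loss_for_positives
            + (1.0 - positive_fraction) * loss_for_negatives)


# Balanced data, model outputs 0.0 for every image.
stuck_loss = saturated_bce(positive_fraction=0.5, predicted_prob=0.0)
print(round(stuck_loss, 4))
```

If that reading is right, the constant loss indicates an output pinned at one class, where the sigmoid gradient is effectively zero, so further epochs cannot move the weights.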
Describe the expected behaviour
For the same seeds, every run should produce at least very similar results, i.e. training should not get stuck at around 0.5 accuracy.
Code to reproduce the issue
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import keras
from keras import backend as K
import tensorflow as tf
import time
import matplotlib.pyplot as plt
import sys
import os
import numpy
import random as rn
os.environ['PYTHONHASHSEED'] = '0'
numpy.random.seed(11)
rn.seed(11)
tf.set_random_seed(11)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(300, 300, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
keras.optimizers.Adam(lr=0.001)
model.trainable
model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['accuracy'])
train_datagen = ImageDataGenerator()
test_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory('train', target_size=(100, 100), batch_size=16, class_mode='binary', shuffle=False)
validation_generator = test_datagen.flow_from_directory('validation', target_size=(100, 100), batch_size=16, class_mode='binary', shuffle=False)
history = model.fit_generator(train_generator, steps_per_epoch=320//16, epochs=5, validation_data=validation_generator, validation_steps=80//16)
K.clear_session()
[Edit] I've tried a different tutorial and I get the same result. The only changes I made to the code were to set the image dimensions to the size of my images, change the image directories, turn off image augmentation, and change the number of images.
About half of my runs produced output such as this:
Epoch 1/10
100/100 [==============================] - 11s 115ms/step - loss: 7.9406 - acc: 0.5006 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 2/10
100/100 [==============================] - 11s 111ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 3/10
100/100 [==============================] - 11s 110ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 4/10
100/100 [==============================] - 11s 111ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 5/10
100/100 [==============================] - 11s 110ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 6/10
100/100 [==============================] - 11s 109ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 7/10
100/100 [==============================] - 11s 110ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 8/10
100/100 [==============================] - 11s 109ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 9/10
100/100 [==============================] - 11s 106ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 10/10
100/100 [==============================] - 11s 107ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
while the other half produced
Epoch 1/10
100/100 [==============================] - 11s 114ms/step - loss: 6.7217 - acc: 0.5731 - val_loss: 4.1994 - val_acc: 0.7362
Epoch 2/10
100/100 [==============================] - 11s 110ms/step - loss: 3.4067 - acc: 0.6759 - val_loss: 0.3655 - val_acc: 0.8662
Epoch 3/10
100/100 [==============================] - 11s 109ms/step - loss: 0.4138 - acc: 0.8434 - val_loss: 0.3426 - val_acc: 0.8612
Epoch 4/10
100/100 [==============================] - 11s 110ms/step - loss: 0.3220 - acc: 0.8831 - val_loss: 0.3140 - val_acc: 0.8788
Epoch 5/10
100/100 [==============================] - 11s 110ms/step - loss: 0.2494 - acc: 0.9106 - val_loss: 0.3999 - val_acc: 0.8575
Epoch 6/10
100/100 [==============================] - 11s 110ms/step - loss: 0.2244 - acc: 0.9250 - val_loss: 0.3218 - val_acc: 0.8900
Epoch 7/10
100/100 [==============================] - 11s 108ms/step - loss: 0.1907 - acc: 0.9344 - val_loss: 0.6445 - val_acc: 0.8375
Epoch 8/10
100/100 [==============================] - 11s 108ms/step - loss: 0.1743 - acc: 0.9409 - val_loss: 0.4450 - val_acc: 0.8738
Epoch 9/10
100/100 [==============================] - 11s 110ms/step - loss: 0.1382 - acc: 0.9503 - val_loss: 0.4937 - val_acc: 0.8738
Epoch 10/10
100/100 [==============================] - 11s 110ms/step - loss: 0.1301 - acc: 0.9550 - val_loss: 0.4955 - val_acc: 0.8950
(the numbers were not always exactly the same).
I thought that maybe my images were bad, so I tried using images from other tutorials (like this), but I got the same results: about half or more of the runs are useless because they converge at 0.5 accuracy. What's going on?
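One way to read the two plateau values in the logs above is to invert the loss. If a saturated output assigns every image to one class on a balanced dataset, half of the samples contribute about -ln(floor) each, where floor is the smallest probability the loss computation allows. The sketch below recovers that floor from the reported losses. The helper implied_floor is hypothetical and the balanced-dataset assumption is not confirmed anywhere in this thread, so treat the result as a consistency check rather than a diagnosis.

```python
import math


def implied_floor(plateau_loss, wrong_fraction=0.5):
    """Probability floor implied by a constant binary cross-entropy plateau.

    Assumes a fraction `wrong_fraction` of the samples is confidently wrong
    and the remaining samples contribute (almost) nothing to the mean loss.
    """
    return math.exp(-plateau_loss / wrong_fraction)


floor_a = implied_floor(8.0590)  # plateau from the first set of logs
floor_b = implied_floor(7.9712)  # plateau from the second set of logs

print(floor_a)      # close to 1e-7
print(floor_b)      # close to 2**-23
print(2.0 ** -23)   # float32 machine epsilon, for comparison
```

1e-7 is the default of keras.backend.epsilon() and 2**-23 (about 1.19e-7) is float32 machine epsilon, so both plateaus look like a prediction pinned at a numerical limit rather than a slowly learning model.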
Issue Analytics
- State:
- Created 4 years ago
- Comments:11 (2 by maintainers)
When using activation='sigmoid', be sure to add class_mode='binary' to gen_train.flow_from_directory, for example gen_train.flow_from_directory('…/data/train', shuffle=True, class_mode='binary').
One has to change the last layer
to
When train_datagen.flow_from_directory generates batches with its default class_mode='categorical', it uses one-hot encoding, returning a vector of two values per image instead of a scalar.
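To make the label-shape point concrete, here is a minimal pure-Python sketch (no Keras required) of the two encodings. flow_from_directory defaults to class_mode='categorical', which produces one-hot vectors, while class_mode='binary' yields one scalar per image. The helper names below are hypothetical and only mimic those shapes.

```python
def binary_labels(class_indices):
    """One scalar per image, like class_mode='binary'."""
    return [float(index) for index in class_indices]


def one_hot_labels(class_indices, num_classes):
    """One vector per image, like class_mode='categorical'."""
    return [
        [1.0 if k == index else 0.0 for k in range(num_classes)]
        for index in class_indices
    ]


def required_output_units(labels):
    """Number of output units the final layer needs to match the labels."""
    first = labels[0]
    return len(first) if isinstance(first, list) else 1


batch = [0, 1, 1, 0]                # class indices of four images
scalars = binary_labels(batch)      # [0.0, 1.0, 1.0, 0.0]
vectors = one_hot_labels(batch, 2)  # [[1.0, 0.0], [0.0, 1.0], ...]

print(required_output_units(scalars))  # 1
print(required_output_units(vectors))  # 2
```

A single sigmoid unit (Dense(1)) with binary_crossentropy matches the scalar form. One-hot labels need two output units instead, typically with softmax and categorical_crossentropy.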