Keras training and validation accuracy always converges to 0.5

System information

Have I written custom code (as opposed to using example directory): yes
OS Platform and Distribution: Linux ubuntu 4.15.0-51-generic #16~18.04.1-Ubuntu SMP
TensorFlow backend: yes
TensorFlow version: 1.13.1
Keras version: 2.2.4
Python version: 3.6.8

I have used a model (provided here) that trains on two categories of pictures and then tries to classify them. I force the network to use the same seeds when training so as to get comparable results. I also create and close the TF sessions explicitly, as I have read that sessions may also cause problems.

Describe the current behaviour
Most of the time the training and validation accuracy converge around 0.5 and the loss stays at exactly the same value for every epoch. I can run the same code, without changing it, several times in a row and get this problem. Only rarely do I get a run where the neural network trains properly and rises to 0.9 accuracy or above for both training and validation.

Epoch 1/5
20/20 [==============================] - 5s 255ms/step - loss: 8.0572 - acc: 0.4994 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 2/5
20/20 [==============================] - 5s 252ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 3/5
20/20 [==============================] - 5s 251ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 4/5
20/20 [==============================] - 5s 250ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 5/5
20/20 [==============================] - 5s 252ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
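
One possible reading of these numbers, offered as a sketch rather than a confirmed diagnosis: a loss frozen at 8.0590 with accuracy at 0.5 is what binary cross-entropy gives when the sigmoid output is saturated at 0 for every image and the two classes are balanced. The example below uses only the Python standard library. It assumes predictions are clipped to [eps, 1 - eps] with eps = 1e-7 in float32 (the Keras backend default epsilon) and a balanced batch of 16. The helper names `to_float32` and `clipped_bce` are hypothetical and not part of Keras.

```python
import math
import struct


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Assumption: clipping epsilon of 1e-7, stored as float32.
EPS = to_float32(1e-7)
P_MIN = EPS
P_MAX = to_float32(1.0 - EPS)  # rounds to 1 - 2**-23 in float32


def clipped_bce(labels, predictions):
    """Mean binary cross-entropy with predictions clipped to [P_MIN, P_MAX]."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, P_MIN), P_MAX)
        total += -(y * math.log(p) + (1 - y) * math.log1p(-p))
    return total / len(labels)


# Balanced batch: half of the labels are 0, half are 1.
labels = [0, 1] * 8

# Sigmoid saturated at 0 for every sample: every class-1 image is wrong.
loss_all_zero = clipped_bce(labels, [0.0] * len(labels))

# Sigmoid saturated at 1 for every sample: every class-0 image is wrong.
loss_all_one = clipped_bce(labels, [1.0] * len(labels))

print(round(loss_all_zero, 4), round(loss_all_one, 4))
```

Under these assumptions the first value is about 8.0590, matching this log, and the second is about 7.9712, matching the stuck runs reported further down. The asymmetry comes from 1 - 1e-7 not being exactly representable in float32. If this reading is right, the stuck runs are ones where the output saturates in the first epoch and the gradient through the sigmoid vanishes.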

Describe the expected behaviour
For the same seeds, repeated runs should produce at least very similar results, i.e. training should not get stuck around 0.5 accuracy in some runs and succeed in others.

Code to reproduce the issue

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import keras
from keras import backend as K

import tensorflow as tf

import time
import matplotlib.pyplot as plt
import sys
import os
import numpy
import random as rn

os.environ['PYTHONHASHSEED'] = '0'
numpy.random.seed(11)
rn.seed(11)  
tf.set_random_seed(11)

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(300, 300, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))


model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))


keras.optimizers.Adam(lr=0.001)
model.trainable
model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['accuracy'])

train_datagen = ImageDataGenerator()
test_datagen = ImageDataGenerator()

train_generator = train_datagen.flow_from_directory('train', target_size=(100, 100), batch_size=16, class_mode='binary', shuffle=False)
validation_generator = test_datagen.flow_from_directory('validation', target_size=(100, 100), batch_size=16, class_mode='binary', shuffle=False)

history = model.fit_generator(train_generator, steps_per_epoch=320//16, epochs=5, validation_data=validation_generator, validation_steps=80//16)

K.clear_session()

[Edit] I’ve tried a different tutorial and I get the same result. The only changes I made to the code were to set the image dimensions to the size of my images, change the image directories, turn off image augmentation, and change the number of images.

About half of my runs produced output such as this

Epoch 1/10
100/100 [==============================] - 11s 115ms/step - loss: 7.9406 - acc: 0.5006 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 2/10
100/100 [==============================] - 11s 111ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 3/10
100/100 [==============================] - 11s 110ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 4/10
100/100 [==============================] - 11s 111ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 5/10
100/100 [==============================] - 11s 110ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 6/10
100/100 [==============================] - 11s 109ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 7/10
100/100 [==============================] - 11s 110ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 8/10
100/100 [==============================] - 11s 109ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 9/10
100/100 [==============================] - 11s 106ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 10/10
100/100 [==============================] - 11s 107ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000

while the other half produced

Epoch 1/10
100/100 [==============================] - 11s 114ms/step - loss: 6.7217 - acc: 0.5731 - val_loss: 4.1994 - val_acc: 0.7362
Epoch 2/10
100/100 [==============================] - 11s 110ms/step - loss: 3.4067 - acc: 0.6759 - val_loss: 0.3655 - val_acc: 0.8662
Epoch 3/10
100/100 [==============================] - 11s 109ms/step - loss: 0.4138 - acc: 0.8434 - val_loss: 0.3426 - val_acc: 0.8612
Epoch 4/10
100/100 [==============================] - 11s 110ms/step - loss: 0.3220 - acc: 0.8831 - val_loss: 0.3140 - val_acc: 0.8788
Epoch 5/10
100/100 [==============================] - 11s 110ms/step - loss: 0.2494 - acc: 0.9106 - val_loss: 0.3999 - val_acc: 0.8575
Epoch 6/10
100/100 [==============================] - 11s 110ms/step - loss: 0.2244 - acc: 0.9250 - val_loss: 0.3218 - val_acc: 0.8900
Epoch 7/10
100/100 [==============================] - 11s 108ms/step - loss: 0.1907 - acc: 0.9344 - val_loss: 0.6445 - val_acc: 0.8375
Epoch 8/10
100/100 [==============================] - 11s 108ms/step - loss: 0.1743 - acc: 0.9409 - val_loss: 0.4450 - val_acc: 0.8738
Epoch 9/10
100/100 [==============================] - 11s 110ms/step - loss: 0.1382 - acc: 0.9503 - val_loss: 0.4937 - val_acc: 0.8738
Epoch 10/10
100/100 [==============================] - 11s 110ms/step - loss: 0.1301 - acc: 0.9550 - val_loss: 0.4955 - val_acc: 0.8950

(the numbers were not always exactly the same).

I thought that maybe I had bad images, so I tried using images from other tutorials (like this), but I got the same results: half or more of the runs are useless because they converge at 0.5 accuracy. What’s going on?

Issue Analytics

State:
Created 4 years ago
Comments: 11 (2 by maintainers)

Top GitHub Comments

5 reactions

khatbahusain commented, Jan 29, 2021

When the output layer uses activation='sigmoid', be sure to pass class_mode='binary' to gen_train.flow_from_directory, for example gen_train.flow_from_directory('…/data/train', shuffle=True, class_mode='binary').

3 reactions

irudnyts commented, Apr 22, 2020

One has to change the last layer from

model.add(Dense(1))
model.add(Activation('sigmoid'))

to

model.add(Dense(2))
model.add(Activation('softmax'))

When train_datagen.flow_from_directory generates batches with its default class_mode='categorical', it uses one-hot encoding, returning a vector of two values per label instead of a scalar.
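
Both comments come down to the same rule: the label encoding has to match the output layer. The sketch below illustrates that pairing in plain Python, without calling Keras. The helper `encode_label` is hypothetical and only mimics what the 'binary' and 'categorical' class modes produce for a two-class problem. It also checks that a two-unit softmax over logits [0, z] assigns class 1 the same probability as a one-unit sigmoid on z, so either head can express the same classifier as long as the labels have the matching shape.

```python
import math


def sigmoid(z):
    """Probability of class 1 from a single logit."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def encode_label(class_index, class_mode):
    """Hypothetical helper mimicking two-class label encodings."""
    if class_mode == "binary":
        # Scalar 0.0 or 1.0: pairs with Dense(1) + sigmoid.
        return float(class_index)
    if class_mode == "categorical":
        # One-hot vector of length 2: pairs with Dense(2) + softmax.
        return [1.0 if i == class_index else 0.0 for i in range(2)]
    raise ValueError("unsupported class_mode: %r" % class_mode)


binary_label = encode_label(1, "binary")
one_hot_label = encode_label(1, "categorical")

# The same logit z seen through both heads.
z = 0.8
p_sigmoid = sigmoid(z)
p_softmax = softmax([0.0, z])[1]

print(binary_label, one_hot_label, p_sigmoid, p_softmax)
```

The choice between the two fixes is therefore one of convention: keep Dense(1) with sigmoid and scalar labels, or switch to Dense(2) with softmax and one-hot labels together with a categorical loss.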

