predict_generator cannot maintain data order

It seems that predict_generator cannot maintain the data order when using multiprocessing. When several batches of test data are fed into predict_generator, the rows of the output array do not correspond to the input batch indices. That leaves no way to tell which output is the prediction for which input, which makes the function useless. One possible remedy might be to use a priority queue rather than a normal queue to maintain the order.
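To illustrate the priority-queue idea, here is a minimal sketch, not Keras code. It assumes each result arrives tagged with a unique integer batch index starting at 0; the `reorder` helper is a hypothetical name. Early arrivals wait in a min-heap until every lower index has been emitted.

```python
import heapq


def reorder(indexed_results):
    """Yield results in index order from (index, result) pairs that may
    arrive out of order. Assumes unique indices 0, 1, 2, ... (hypothetical
    helper, not part of Keras)."""
    heap = []
    next_index = 0
    for index, result in indexed_results:
        # Buffer the arrival; the heap keeps the smallest index on top.
        heapq.heappush(heap, (index, result))
        # Emit everything that is now contiguous with what was already emitted.
        while heap and heap[0][0] == next_index:
            yield heapq.heappop(heap)[1]
            next_index += 1


# Usage: results arriving as 2, 0, 1, 3 come out as 0, 1, 2, 3.
arrivals = [(2, "c"), (0, "a"), (1, "b"), (3, "d")]
ordered = list(reorder(arrivals))
```

The buffer only grows as large as the gap between the earliest missing index and the latest arrival, so it stays small when workers finish roughly in order.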

Here is the detailed test code.

## mnist_cnn.py in examples
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.models import *
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K

batch_size = 128
nb_classes = 10
nb_epoch = 8

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))


############# Core test code starts here #####################
def generator_from_array(X_test):
    # Endlessly yield the first 100 test samples, one sample per batch
    while True:
        for i in range(100):
            yield X_test[i:i+1]

print('Predict on batch:')
out = []
for i in range(100):
    out_tmp = model.predict_on_batch(X_test[i:i+1])
    out.append(out_tmp)
print(out[1])
print(out[50])
print(out[-1])
print("Predict generator")
output = model.predict_generator(generator_from_array(X_test), 100, max_q_size=10, nb_worker=4, pickle_safe=True)
print(output.shape)
print(output[1])
print(output[50])
print(output[-1])

And here are the results.

Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.8.0 locally
X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/8
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 0.405
pciBusID 0000:81:00.0
Total memory: 15.89GiB
Free memory: 15.61GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:81:00.0)
60000/60000 [==============================] - 7s - loss: 0.3829 - acc: 0.8815 - val_loss: 0.0859 - val_acc: 0.9743
Epoch 2/8
60000/60000 [==============================] - 5s - loss: 0.1336 - acc: 0.9603 - val_loss: 0.0606 - val_acc: 0.9806
Epoch 3/8
60000/60000 [==============================] - 5s - loss: 0.1041 - acc: 0.9690 - val_loss: 0.0533 - val_acc: 0.9833
Epoch 4/8
60000/60000 [==============================] - 5s - loss: 0.0861 - acc: 0.9735 - val_loss: 0.0441 - val_acc: 0.9852
Epoch 5/8
60000/60000 [==============================] - 5s - loss: 0.0781 - acc: 0.9763 - val_loss: 0.0409 - val_acc: 0.9861
Epoch 6/8
60000/60000 [==============================] - 5s - loss: 0.0702 - acc: 0.9793 - val_loss: 0.0387 - val_acc: 0.9870
Epoch 7/8
60000/60000 [==============================] - 5s - loss: 0.0626 - acc: 0.9815 - val_loss: 0.0379 - val_acc: 0.9867
Epoch 8/8
60000/60000 [==============================] - 5s - loss: 0.0605 - acc: 0.9817 - val_loss: 0.0352 - val_acc: 0.9891
Predict on batch:
[[  1.61985781e-07   9.81094581e-06   9.99989748e-01   2.69348943e-09
    1.97990360e-10   8.48836210e-11   4.53296529e-08   7.74509276e-11
    2.23150167e-07   2.99653670e-11]]
[[  9.75747753e-06   2.34337261e-09   5.09917042e-09   1.79785129e-08
    6.84200643e-08   6.34509252e-06   9.99983668e-01   4.00663530e-11
    1.21496996e-07   1.95249289e-10]]
[[  9.69054281e-10   1.05993847e-09   1.87508320e-09   1.94809417e-07
    1.49762297e-06   4.11489260e-08   8.54344595e-10   8.08601499e-07
    1.35151751e-07   9.99997377e-01]]
Predict generator
(100, 10)
[  1.61985781e-07   9.81094581e-06   9.99989748e-01   2.69348943e-09
   1.97990360e-10   8.48836210e-11   4.53296529e-08   7.74509276e-11
   2.23150167e-07   2.99653670e-11]
[  9.99998927e-01   6.84537635e-11   4.53024768e-07   9.15579487e-11
   1.19156296e-10   1.37983824e-09   6.24313543e-08   5.71949954e-09
   1.34752597e-07   4.58147241e-07]
[  6.04119035e-04   2.68195297e-08   1.23279997e-05   2.34821496e-10
   9.99363124e-01   1.72202430e-08   1.96394576e-05   6.58836768e-07
   1.14492806e-07   3.96185520e-08]

Issue Analytics

Created 7 years ago
Reactions: 10
Comments: 14 (3 by maintainers)

Top GitHub Comments

20 reactions

patyork commented, Jan 15, 2017

That’s a use case, of course. If setting nb_worker=1 won’t work for you because it is too slow, and loading all of the inputs and calling predict takes too much memory, you’d probably be better off writing your own generator + predict/predict_on_batch routine. That way you can queue the inputs however you like, save the predictions (along with a reference to the inputs that produced them) on the fly, and then unload them to preserve memory.

That’s a pretty niche/uncommon issue to need to solve (high speed, large dataset, prediction and saving); most likely too niche for inclusion in the Keras core.
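A rough sketch of the "own generator + predict_on_batch" routine described above might look like the following. It only assumes that `model` exposes `predict_on_batch`, as Keras models do; `keyed_batches_from_array` and `predict_with_keys` are hypothetical helper names, and the key used here is simply the start index of each batch.

```python
def keyed_batches_from_array(X, batch_size):
    """Yield (start_index, batch) pairs, covering X exactly once.
    Hypothetical helper; the key could equally be a filename or an ID."""
    for start in range(0, len(X), batch_size):
        yield start, X[start:start + batch_size]


def predict_with_keys(model, keyed_batches):
    """Predict batch by batch in a single thread, yielding (key, prediction)
    so each output stays attached to the input that produced it."""
    for key, batch in keyed_batches:
        yield key, model.predict_on_batch(batch)


# Usage with a real model (not executed here):
# for start, preds in predict_with_keys(model, keyed_batches_from_array(X_test, 128)):
#     ...  # save preds together with `start`, then let them be garbage collected
```

Because predictions are yielded one batch at a time, only the current batch and its output need to be in memory at once.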

17 reactions

LamDang commented, Apr 17, 2017

@patyork @iammarvelous Hello there, I just ran into the same issue, and it cost me quite some time debugging. I think this should be considered a bug: if the predict_generator() method cannot reconstruct the order, then the predictions are not usable and the method is useless.

I suggest one of the following:

At least show a warning message to alert users to the risk of using predict_generator with workers > 1
Force workers=1 all the time
Somehow keep the batch index in the queue and reconstruct the right order
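The last suggestion, keeping the batch index in the queue, could be sketched roughly as below. This is not how Keras implements its queue; it is a standalone illustration using threads, and `load_in_parallel` and `load_batch` are hypothetical names. Each worker puts `(index, batch)` on the results queue, and the consumer writes every batch into the slot given by its index.

```python
import queue
import threading


def load_in_parallel(load_batch, num_batches, num_workers=4):
    """Load batches with several threads and return them in index order.
    `load_batch(i)` is a hypothetical user function returning batch i."""
    tasks = queue.Queue()
    for i in range(num_batches):
        tasks.put(i)
    results = queue.Queue()

    def worker():
        # Pull batch indices until none are left; tag each result with its index.
        while True:
            try:
                i = tasks.get_nowait()
            except queue.Empty:
                return
            results.put((i, load_batch(i)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Arrival order is arbitrary, but the index says where each batch belongs.
    ordered = [None] * num_batches
    while not results.empty():
        i, batch = results.get()
        ordered[i] = batch
    return ordered


# Usage: squares stand in for real batches.
batches = load_in_parallel(lambda i: i * i, 10)
```

This version holds all batches in memory before returning, so it is a demonstration of the indexing idea rather than a memory-friendly replacement; a streaming variant would need a bounded queue plus a small reorder buffer.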