Support Python generators

I have a simple task: find the best CNN architecture for image regression. However, my dataset is too large to load into memory at once. In the current release, ImageRegressor appears to support only a fit method that requires all of the data (x and y) to be loaded in memory. How can I use a generator with AutoKeras? I checked the closed issue #204, but it does not seem to have been resolved.

I have already tried converting my generator to a tf.data.Dataset, but it didn’t work. For example:

    dataset = tf.data.Dataset.from_generator(generate_batch, (tf.float32, tf.float32))
    vq_predictor = ak.ImageRegressor()
    for i, (X, y) in enumerate(dataset):
        X_dataset = tf.data.Dataset.from_tensors(X)
        y_dataset = tf.data.Dataset.from_tensors(y)
        vq_predictor.fit(X_dataset, y_dataset, validation_split=0.2)

Then I got this error:

    File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\tasks\image.py", line 222, in fit
      **kwargs)
    File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\auto_model.py", line 231, in fit
      validation_split=validation_split)
    File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\auto_model.py", line 313, in _prepare_data
      dataset, validation_data = utils.split_dataset(dataset, validation_split)
    File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\utils.py", line 69, in split_dataset
      raise ValueError('The dataset should at least contain 2 '
    ValueError: The dataset should at least contain 2 instances to be split.

Any suggestions are highly appreciated.
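
A likely reading of the traceback, offered as a sketch rather than a confirmed diagnosis: tf.data.Dataset.from_tensors wraps its whole input as a single dataset element, so each X_dataset in the loop holds one element and cannot be split for validation. The batch shape below is a hypothetical stand-in, and from_tensor_slices is shown only for contrast.

```python
import numpy as np
import tensorflow as tf

# Hypothetical batch of 8 small RGB images (not from the original post).
X = np.zeros((8, 32, 32, 3), dtype=np.float32)

# from_tensors: the entire array becomes ONE dataset element.
whole = tf.data.Dataset.from_tensors(X)
# from_tensor_slices: one dataset element per row along the first axis.
sliced = tf.data.Dataset.from_tensor_slices(X)

# Count elements by iterating (works in eager mode).
n_whole = sum(1 for _ in whole)
n_sliced = sum(1 for _ in sliced)
print(n_whole, n_sliced)
```

With a single element there is nothing to split off, which matches the "at least contain 2 instances" message.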

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 24 (7 by maintainers)

Top GitHub Comments

5 reactions
theGOTOguy commented, Nov 28, 2020

@VictorReaver1999 I was also getting the “Cannot take the length of shape with unknown rank” error. The issue is that autokeras.utils.data_utils.batched doesn’t know what to do with the generator. Since the generators in the example are indeed outputting batches, just monkey-patching this function to return True will get you up and running for now.

train_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(train_generator),output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(val_generator),output_types=(tf.float32, tf.float32))

# Autokeras data_utils gets confused by the generator.
# Just let it know that the data is indeed batched.
ak.utils.data_utils.batched = lambda _: True

clf = ak.ImageRegressor(
    max_trials=args.max_trials,
    directory=args.save_model_dir)

Update (above left for posterity):

I found that there was a place downstream that was also failing in spite of the monkey-patch. As I experimented with AutoKeras, I found a cleaner way that both clears the downstream error and avoids the ugly monkey-patching:

def callable_iterator(generator, expected_batch_size):
  for img_batch, targets_batch in generator:
    if img_batch.shape[0] == expected_batch_size:
      yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(train_generator, args.batch_size),
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, args.crop_size, args.crop_size, 1 if args.gray else 3]),
                   tf.TensorShape([None, 1])))
val_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(val_generator, args.batch_size),
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, args.crop_size, args.crop_size, 1 if args.gray else 3]),
                   tf.TensorShape([None, 1])))

Replace anything starting with args. with values appropriate to your code.

Specifying output_shapes surfaces the appropriate dimensions to the necessary places in AutoKeras. AutoKeras does not behave well if you give it any incomplete batches, so I had to modify callable_iterator as above to reject incomplete batches from the iterator.
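
As a self-contained illustration of the approach above, here is a minimal sketch that swaps the real image generator for a toy one. The names toy_generator, BATCH_SIZE, SIZE, and CHANNELS are hypothetical stand-ins for the args.* values. It also uses output_signature, which newer TensorFlow releases (2.4 and later, as far as I know) accept in place of output_types and output_shapes; that substitution is an assumption, not part of the original comment.

```python
import numpy as np
import tensorflow as tf

# Hypothetical values standing in for args.batch_size, args.crop_size, etc.
BATCH_SIZE, SIZE, CHANNELS = 4, 8, 3

def toy_generator():
    """Stand-in for a Keras generator: 10 samples arrive as batches of 4, 4, 2."""
    for n in (4, 4, 2):
        images = np.zeros((n, SIZE, SIZE, CHANNELS), dtype=np.float32)
        targets = np.zeros((n, 1), dtype=np.float32)
        yield images, targets

def callable_iterator(generator, expected_batch_size):
    # Pass through only full batches; the short final batch is dropped.
    for img_batch, targets_batch in generator:
        if img_batch.shape[0] == expected_batch_size:
            yield img_batch, targets_batch

dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(toy_generator(), BATCH_SIZE),
    output_signature=(
        tf.TensorSpec(shape=(None, SIZE, SIZE, CHANNELS), dtype=tf.float32),
        tf.TensorSpec(shape=(None, 1), dtype=tf.float32),
    ),
)

# Collect the batch sizes that survive the filter and the declared image rank.
batch_sizes = [int(images.shape[0]) for images, _ in dataset]
image_rank = dataset.element_spec[0].shape.rank
print(batch_sizes, image_rank)
```

Because the shapes are declared up front, the dataset reports a known rank for the image tensor, which is the information the "unknown rank" error complained about.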

5 reactions
haruiz commented, May 17, 2020

@ciberger, this code seems to work; you can give it a shot:

import tensorflow as tf
import numpy as np
import autokeras as ak
from tensorflow.keras.preprocessing import image
import pathlib
import matplotlib.pylab as plt

BATCH_SIZE = 32
IMG_HEIGHT = 224
IMG_WIDTH = 224
data_dir = "dataset/train"
data_dir = pathlib.Path(data_dir)
#image_count = len(list(data_dir.glob('*/*.jpg')))
#STEPS_PER_EPOCH = np.ceil(image_count/BATCH_SIZE)

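def preprocess(img):
    # Round-trip through PIL to resize the image to the expected dimensions.
    img = image.array_to_img(img, scale=False)
    img = img.resize((IMG_WIDTH, IMG_HEIGHT))
    img = image.img_to_array(img)
    # Return unscaled pixels: rescale=1./255 below is applied after this
    # function, so dividing here as well would scale the images twice.
    return img

image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,
                                              horizontal_flip=True,
                                              validation_split=0.2,
                                              preprocessing_function=preprocess)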

train_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
    batch_size=BATCH_SIZE,
    shuffle=True,
    #class_mode="categorical",
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='training'
)

val_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
    batch_size=BATCH_SIZE,
    shuffle=True,
    #class_mode="categorical",
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='validation'
)

def callable_iterator(generator):
    for img_batch, targets_batch in generator:
        yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(train_generator),output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(val_generator),output_types=(tf.float32, tf.float32))

# for image, label in train_dataset.take(1):
#   print("Image shape: ", image.numpy().shape)
#   print("Label: ", label.numpy())
#   plt.imshow(image.numpy()[0] * 255)
#   plt.show()


clf = ak.ImageClassifier(max_trials=10)
# Feed the TensorFlow Dataset to the classifier.
clf.fit(train_dataset, epochs=60)
# Evaluate the best model.
print(clf.evaluate(val_dataset))

You can then pass the tf.data.Dataset objects to your fit function.
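
One detail in the code above that may be worth spelling out: tf.data.Dataset.from_generator expects a callable that returns an iterator, not a generator object, which is why the generators are wrapped in a lambda. The plain-Python sketch below (batch_generator is a hypothetical stand-in, not a Keras generator) shows the general reason a factory is useful: a generator object is spent after one pass, while a callable can build a fresh one each time.

```python
def batch_generator():
    """Hypothetical stand-in for a data generator that yields three batches."""
    for batch_index in range(3):
        yield batch_index

# A generator object can be consumed only once.
gen = batch_generator()
first_pass = list(gen)
second_pass = list(gen)  # the generator is already exhausted here

# A zero-argument callable builds a fresh generator on every call.
make_generator = lambda: batch_generator()
fresh_first = list(make_generator())
fresh_second = list(make_generator())

print(first_pass, second_pass, fresh_first, fresh_second)
```

Keras data generators manage their own iteration state, so this sketch only illustrates the callable-versus-object distinction, not their exact behavior.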
