Support python generators
I have a simple task: find the best CNN architecture for image regression. However, my dataset is too large to load into memory at once. In the current release, ImageRegressor seems to support only a fit method that requires all the data (x and y) to be loaded in memory. How can I use a generator in AutoKeras? I checked the closed issue #204, but it does not seem to have been solved.
I have already tried converting my generator to a tf.data.Dataset, but it didn't work. For example:
dataset = tf.data.Dataset.from_generator(generate_batch, (tf.float32, tf.float32))
vq_predictor = ak.ImageRegressor()
for i, (X, y) in enumerate(dataset):
    X_dataset = tf.data.Dataset.from_tensors(X)
    y_dataset = tf.data.Dataset.from_tensors(y)
    vq_predictor.fit(X_dataset, y_dataset, validation_split=0.2)
Then I got this error:
  File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\tasks\image.py", line 222, in fit
    **kwargs)
  File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\auto_model.py", line 231, in fit
    validation_split=validation_split)
  File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\auto_model.py", line 313, in _prepare_data
    dataset, validation_data = utils.split_dataset(dataset, validation_split)
  File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\utils.py", line 69, in split_dataset
    raise ValueError('The dataset should at least contain 2 '
ValueError: The dataset should at least contain 2 instances to be split.
Any suggestions are highly appreciated.
Issue Analytics
- Created 4 years ago
- Reactions:1
- Comments:24 (7 by maintainers)
Top GitHub Comments
@VictorReaver1999 I was also getting the “Cannot take the length of shape with unknown rank” error. The issue is that autokeras.utils.data_utils.batched doesn’t know what to do with the generator. Since the generators in the example are indeed outputting batches, just monkey-patching this function to return True will get you up and running for now.
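For anyone who wants to try the monkey-patch, a minimal sketch follows. The module path autokeras.utils.data_utils comes from the comment above; the exact signature of batched is not shown there, so the replacement accepts any arguments. The helper name force_batched and the stand-in module are hypothetical, used only so the sketch runs without AutoKeras installed.

```python
import types


def force_batched(data_utils_module):
    """Replace `batched` on the given module so it always returns True.

    Returns the original function so the patch can be undone later.
    """
    original = data_utils_module.batched
    # Accept any arguments because the real signature is an assumption here.
    data_utils_module.batched = lambda *args, **kwargs: True
    return original


# Stand-in for autokeras.utils.data_utils (hypothetical, for illustration).
fake_data_utils = types.SimpleNamespace(batched=lambda dataset: False)

original = force_batched(fake_data_utils)
patched_result = fake_data_utils.batched(object())  # now always True
```

With the real library this would presumably be `import autokeras.utils.data_utils as data_utils` followed by `force_batched(data_utils)` before calling fit. Note that any module that imported batched by name before the patch would still hold the old function.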
Update (above left for posterity):
I found that there was a place downstream that was also failing in spite of the monkey-patch. As I experimented with AutoKeras, I found a cleaner way that both clears the downstream error and avoids the ugly monkey-patching:
Obviously, the things starting with args. should be replaced with something appropriate to your code. This surfaces the appropriate dimensions to the necessary places in AutoKeras. AutoKeras does not behave well if you give it any incomplete batches, so I had to modify callable_iterator as above to reject incomplete batches from the iterator.
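A minimal sketch of the idea described above, not the original snippet: drop incomplete batches, then wrap the generator in a tf.data.Dataset whose shapes are fully specified. The names full_batches_only and make_dataset, the image_shape and target_shape parameters, and the float32 dtypes are all assumptions for illustration. output_signature is available in newer TensorFlow 2 releases; older ones take output_types and output_shapes instead.

```python
def full_batches_only(batch_iter, batch_size):
    """Yield only (x, y) batches whose length equals batch_size."""
    for x, y in batch_iter:
        if len(x) == batch_size and len(y) == batch_size:
            yield x, y


def make_dataset(generate_batch, batch_size, image_shape, target_shape=(1,)):
    """Wrap a batch generator in a tf.data.Dataset with known shapes.

    `generate_batch` is a hypothetical zero-argument callable returning an
    iterator of (x, y) batches; `image_shape` is (height, width, channels).
    The generator must yield arrays matching the shapes declared below.
    """
    import tensorflow as tf  # imported lazily; only needed for this step

    return tf.data.Dataset.from_generator(
        lambda: full_batches_only(generate_batch(), batch_size),
        output_signature=(
            tf.TensorSpec(shape=(batch_size, *image_shape), dtype=tf.float32),
            tf.TensorSpec(shape=(batch_size, *target_shape), dtype=tf.float32),
        ),
    )


def toy_batches():
    """Toy generator: batches of size 4, 4, and 2 (the last is incomplete)."""
    data = list(range(10))
    for start in range(0, len(data), 4):
        chunk = data[start:start + 4]
        yield chunk, [v * 2 for v in chunk]


# Only the two complete batches survive the filter.
kept = list(full_batches_only(toy_batches(), batch_size=4))
```

The filter is plain Python, so it can be checked without TensorFlow. The result of make_dataset(...) is what would be handed to fit in place of in-memory arrays.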
@ciberger, this code seems to work; you can give it a shot:
Then you can pass the resulting tf.data datasets to your fit function.