
KeyError: 'input_2' while running data generators in fit_generator


Creating a new issue for this comment: https://github.com/tensorflow/neural-structured-learning/issues/73#issuecomment-731274617

I'm using fit_generator and NSL to train a multi-label image classifier. I have already split full_df into train_df and valid_df dataframes.

import os

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (128, 128)
core_idg = ImageDataGenerator(samplewise_center=True,
                              samplewise_std_normalization=True,
                              horizontal_flip=True,
                              vertical_flip=False,
                              height_shift_range=0.05,
                              width_shift_range=0.1,
                              rotation_range=5,
                              shear_range=0.1,
                              fill_mode='reflect',
                              zoom_range=0.15)
# custom flow_from_dataframe
# Note: the NSL generators below call Keras's built-in
# ImageDataGenerator.flow_from_dataframe method, so this helper is not used there.
def flow_from_dataframe(img_data_gen, in_df, path_col, y_col, **dflow_args):
    base_dir = os.path.dirname(in_df[path_col].values[0])
    print('## Ignore next message from keras, values are replaced anyways')
    df_gen = img_data_gen.flow_from_directory(base_dir,
                                              class_mode='sparse',
                                              **dflow_args)
    # Overwrite the iterator's file list and labels with the dataframe contents.
    df_gen.filenames = in_df[path_col].values
    df_gen.classes = np.stack(in_df[y_col].values)
    df_gen.samples = in_df.shape[0]
    df_gen.n = in_df.shape[0]
    df_gen._set_index_array()
    df_gen.directory = ''  # since we have the full path
    print('Reinserting dataframe: {} images'.format(in_df.shape[0]))
    return df_gen

def nsl_train_generator(datagen):
    train_gen = datagen.flow_from_dataframe(dataframe=train_df,
                                            directory=None,
                                            x_col='newpath',
                                            y_col='newLabel',
                                            class_mode='categorical',
                                            classes=all_labels,
                                            target_size=IMG_SIZE,
                                            color_mode='rgb',
                                            batch_size=64)
    # Each batch is yielded as a dict: 'feature' holds the images and 'label'
    # the targets ('label' is the NSL wrapper's default label key).
    for x_batch, y_batch in train_gen:
        yield {'feature': x_batch, 'label': y_batch}

def nsl_valid_generator(datagen):
    valid_gen = datagen.flow_from_dataframe(dataframe=valid_df,
                                            directory=None,
                                            x_col='newpath',
                                            y_col='newLabel',
                                            class_mode='categorical',
                                            classes=all_labels,
                                            target_size=IMG_SIZE,
                                            color_mode='rgb',
                                            batch_size=1024)
    for x_batch, y_batch in valid_gen:
        yield {'feature': x_batch, 'label': y_batch}

For the nsl_train_generator and nsl_valid_generator functions above, I referred to #3 (comment).

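train_generator = nsl_train_generator(core_idg)
valid_generator = nsl_valid_generator(core_idg)
# fit_generator is deprecated in TF 2.x; adv_model.fit(...) also accepts generators.
hist = adv_model.fit_generator(train_generator,
                               steps_per_epoch=STEP_SIZE_TRAIN,
                               validation_data=valid_generator,
                               # The Keras iterators loop forever, so the number of
                               # validation batches must be given explicitly.
                               # STEP_SIZE_VALID is assumed, e.g. ceil(len(valid_df) / 1024).
                               validation_steps=STEP_SIZE_VALID,
                               epochs=1,
                               callbacks=callbacks_list)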

I’m getting this error:

Found 42276 validated image filenames belonging to 13 classes.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-38-6b51ad0b4644> in <module>()
      4                               validation_data = valid_gen,
      5                               epochs = 1,
----> 6                               callbacks = callbacks_list)
      7 

15 frames
/usr/local/lib/python3.6/dist-packages/neural_structured_learning/keras/adversarial_regularization.py in <listcomp>(.0)
    633         # Converts input dictionary to a list so it conforms with the model's
    634         # expected input.
--> 635         inputs = [inputs[name] for name in base_input_names]
    636     elif not self._base_with_labels_in_features:
    637       # Removes labels and sample weights from the input dictionary, since they

KeyError: 'input_2'
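The failing line builds the list of model inputs by looking up each of the base model's input names in the batch dictionary. A minimal plain-Python sketch of that lookup (no TensorFlow needed; gather_model_inputs is a hypothetical stand-in, not an NSL function) shows why an unnamed Input, which Keras auto-names something like 'input_2', cannot be found in a batch keyed by 'feature':

```python
def gather_model_inputs(features, base_input_names):
    """Hypothetical stand-in for NSL's
    `inputs = [inputs[name] for name in base_input_names]`:
    order the batch dict's values by the base model's input names."""
    return [features[name] for name in base_input_names]


batch = {"feature": [[0.1, 0.2]], "label": [[1, 0]]}

# Unnamed tf.keras.Input -> auto-generated name that is not a key of the batch.
try:
    gather_model_inputs(batch, ["input_2"])
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'input_2'

# Input named 'feature' -> the lookup succeeds.
print(gather_model_inputs(batch, ["feature"]))  # [[[0.1, 0.2]]]
```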

Issue analytics: created 3 years ago; 1 reaction; 7 comments (2 by maintainers).

Top GitHub Comments

2 reactions

csferng commented, Nov 24, 2020

Thanks for your question, @WrathofBhuvan11.

The error occurs because Keras doesn’t know which feature in the input dictionary to feed to the model. One solution is to name the input placeholder during model construction:

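import tensorflow as tf

# Name the input after the dictionary key the generator yields ('feature').
x_in = tf.keras.Input(shape=..., dtype=..., name='feature')
y_pred = ...
model = tf.keras.Model(inputs=x_in, outputs=y_pred)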

With the name argument, Keras will use the corresponding feature in the input dictionary for follow-up computation.
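If rebuilding the model with a named Input is inconvenient, another option (a workaround assumed here, not one suggested in the thread) is to rename the generator's dictionary key to the name the base model's input already has. In tf.keras 2.x, a functional model lists those names in model.input_names. The helper rename_feature_keys below is hypothetical:

```python
def rename_feature_keys(batches, mapping):
    """Yield each batch dict with its keys renamed via `mapping`.

    Keys missing from `mapping` (e.g. 'label', which the NSL wrapper reads
    through its label keys) pass through unchanged.
    """
    for batch in batches:
        yield {mapping.get(key, key): value for key, value in batch.items()}


# Stand-in batches shaped like the ones nsl_train_generator yields.
demo = [{"feature": [0.5], "label": [1]}]
for renamed in rename_feature_keys(iter(demo), {"feature": "input_2"}):
    print(sorted(renamed))  # ['input_2', 'label']

# With the real generator ('input_2' is the name from the error above and
# may differ between sessions):
# train_generator = rename_feature_keys(nsl_train_generator(core_idg),
#                                       {"feature": "input_2"})
```

Naming the Input explicitly, as suggested above, is the sturdier fix: auto-generated names such as 'input_2' depend on how many Input layers were created earlier in the session.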

1 reaction

DualityGap commented, Nov 25, 2020

If we have information about the hierarchical structure of the label space, we should be able to leverage ‘nearby’ labels (for example, labels under the same parent class) to enhance learning. How to do this effectively, however, remains an open research question. My intuition is that by leveraging the structure in the label space, the model should learn more effectively than a regular DNN classifier.

Let’s use semantic segmentation as an example. Assume ‘beach’ and ‘sea’ are under the same parent category, ‘scenery’, while ‘people’ and ‘pets’ are under another category, ‘portrait’. In this hierarchy, the distance between ‘beach’ and ‘sea’ should be shorter (since they share a parent class) than the distance between ‘beach’ and ‘pets’ (which have different parent classes). The idea is similar to graph regularization, which leverages structure in the feature space; here, the structure is in the label space.

This means that during training we tell the model that segmenting ‘beach’ as ‘sea’ is incorrect, but not as wrong as segmenting ‘beach’ as ‘pets’. The illustration in Fig. 1(a)(c) of this paper may provide some intuition for this explanation.

Happy to discuss further.
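One possible way to make the intuition in the comment above concrete (an assumption for illustration; neither the comment nor the referenced paper specifies this formulation) is to score a prediction by its expected distance in the label tree, so that confusing ‘beach’ with ‘sea’ costs less than confusing it with ‘pets’. The hierarchy and function names below are hypothetical:

```python
# Hypothetical two-level label hierarchy from the example above.
PARENT = {
    "beach": "scenery",
    "sea": "scenery",
    "people": "portrait",
    "pets": "portrait",
}


def tree_distance(a, b):
    """Path length between two leaf labels in a two-level tree:
    0 for the same label, 2 for siblings (up to the shared parent and back),
    4 for labels under different parents (up to the root and back)."""
    if a == b:
        return 0
    return 2 if PARENT[a] == PARENT[b] else 4


def expected_cost(true_label, probs):
    """Hierarchy-aware loss: expected tree distance between the true label
    and a prediction drawn from `probs` (dict: label -> probability)."""
    return sum(p * tree_distance(true_label, lbl) for lbl, p in probs.items())


# Mistaking 'beach' for 'sea' is penalized less than mistaking it for 'pets'.
print(expected_cost("beach", {"sea": 1.0}))   # 2.0
print(expected_cost("beach", {"pets": 1.0}))  # 4.0
```

Tree path length is only one choice of label distance; a loss like this would typically be added to, not substituted for, the usual cross-entropy term.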