
Loading frames to 3D CNN


Hi, everyone,

I’m trying to load frames from a dataset into a 3D Convolutional Neural Network. I wrote a script to extract frames from the videos of the UCF101 Action Recognition dataset, 40 frames per video, so I now have a new dataset with one subfolder per class, and each class folder holds 40 frames for each of its videos. For example, if a class folder has 50 videos, it contains 50*40 = 2000 frames.

The model that I’m using is coded below:

`def cnn_3d(self):
    """
    The 3D CNN method
    """
    # Layers
    model = Sequential()
    model.add(Conv3D(32, (3, 3, 3), activation='relu', input_shape=(40, 80, 80, 3)))
    model.add(MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    model.add(Conv3D(64, (3, 3, 3), activation='relu'))
    model.add(MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    model.add(Conv3D(128, (3, 3, 3), activation='relu'))
    model.add(Conv3D(128, (3, 3, 3), activation='relu'))
    model.add(MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2)))
    model.add(Conv3D(256, (2, 2, 2), activation='relu'))
    model.add(Conv3D(256, (2, 2, 2), activation='relu'))
    model.add(MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2)))

    # FC Layers
    model.add(Flatten())
    model.add(Dense(1024))
    model.add(Dropout(0.5))
    model.add(Dense(1024))
    model.add(Dropout(0.5))
    model.add(Dense(self.n_classes, activation='softmax'))

    return model`

And I’m trying to load 40 frames at a time to train the network. What is the best way to do this in Keras? Is it enough to set input_shape to a tuple of (frames, w, h, color)? Does a 3D CNN with that input shape take 40 frames at a time?

I’m trying to use ImageDataGenerator to fit the model:

 `train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.0,
    zoom_range=0.0,
    horizontal_flip=False,
    featurewise_center=False,
    featurewise_std_normalization=False,
    rotation_range=0.0,
    width_shift_range=0.0,
    height_shift_range=0.0)

 test_datagen = ImageDataGenerator(rescale=1./255)

 train_generator = train_datagen.flow_from_directory(
    '/train/',
    target_size=(80, 80),
    batch_size=32,
    class_mode='categorical')

 validation_generator = test_datagen.flow_from_directory(
    '/test/',
    target_size=(80, 80),
    batch_size=32,
    class_mode='categorical')

 model.fit_generator(
    train_generator,
    steps_per_epoch=1000,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=1000)`

And I’m getting this error:

 `ValueError: Error when checking input: expected conv3d_62_input to have 5 dimensions, but got array with shape (32, 80, 80, 3)`

Can someone help me? Thanks for the support and attention!

Issue Analytics

State:
Created 5 years ago
Reactions: 2
Comments: 15

Top GitHub Comments

5 reactions

ghost commented, Aug 11, 2018

I used OpenCV to preprocess the frames and build one big tensor of images (or batches of images, such as sets of 15 frames).

Use this snippet:

import os

import cv2
import numpy as np

video_folder = '/path.../'
X_data = []
y_data = []
list_of_videos = os.listdir(video_folder)

for i in list_of_videos:
    # Path to each video in the folder
    vid = os.path.join(video_folder, i)
    # Open the video
    cap = cv2.VideoCapture(vid)
    # fps = cap.get(cv2.CAP_PROP_FPS)
    # Frames read from this video
    frames = []
    for j in range(40):  # here we get 40 frames, for example
        ret, frame = cap.read()
        if ret:
            print('Class 1 - Success!')
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # converting to gray
            frame = cv2.resize(frame, (30, 30), interpolation=cv2.INTER_AREA)
            frames.append(frame)
        else:
            print('Error!')
    cap.release()  # free the video handle
    X_data.append(frames)  # appending each tensor of 40 frames resized to 30x30
    y_data.append(1)  # appending a class label to the set of 40 frames
X_data = np.array(X_data)  # stacks cleanly only if every video yielded 40 frames
y_data = np.array(y_data)  # ready to split! :)

You can use NumPy to reshape your final dataset to (batch, frames, height, width, channels), as nicholasding said.

So, if you have 10000 samples in total, using sets of 10 frames per input, with 30 x 30 dimensions and 1 color channel, you can reshape your X_data like this:

X_data = X_data.reshape(10000, 10, 30, 30, 1) #this way you get 5 dimensions. :)
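As a minimal sketch of the same idea, the channel axis can also be appended without hard-coding every dimension. The array sizes below are hypothetical (8 clips instead of 10000) and the data is all zeros, just to show the shapes; the division by 255 mirrors the rescale=1./255 setting from the question.

```python
import numpy as np

# Hypothetical small dataset: 8 clips, 10 grayscale frames each, 30x30 pixels.
n_clips, n_frames, height, width = 8, 10, 30, 30
X_data = np.zeros((n_clips, n_frames, height, width), dtype=np.uint8)

# Append a trailing channel axis instead of spelling out every dimension.
X_5d = X_data[..., np.newaxis]

# Scale pixel values to [0, 1], mirroring rescale=1./255 in the question.
X_5d = X_5d.astype(np.float32) / 255.0

print(X_5d.shape)  # (8, 10, 30, 30, 1)
```

For color frames the array would already end in a channel axis of size 3, so no extra axis would be needed.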

Note that the OpenCV snippet above appends a target of 1 to a separate array (y_data) for each set of frames in X_data. If you are working with more than one class, repeat the code for each class, appending a different number to y_data. At the end of the preprocessing, stack the per-class X_data arrays to get your complete X. You can do this with NumPy’s vstack() function, like this:

X_data_class_1 #numpy big tensor that contains sets of frames for the first class
X_data_class_2 #numpy big tensor that contains sets of frames for the second class
X_final = np.vstack((X_data_class_1, X_data_class_2))

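This way you’ll get your final X_data with 10000 samples. Repeat the process for y_data (it is one-dimensional, so join the per-class label arrays with np.concatenate() rather than vstack()) and train your model! 😃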
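A rough end-to-end sketch of that stacking step might look like the following. Everything here is an assumption for illustration: the per-class arrays are tiny placeholders, the labels are 0-based (rather than the 1 used in the snippet above) so they can index one-hot rows, and the one-hot encoding is included only because the model in the question ends in a softmax layer.

```python
import numpy as np

# Hypothetical per-class tensors: 6 and 4 clips of 10 frames at 30x30x1.
X_data_class_1 = np.zeros((6, 10, 30, 30, 1), dtype=np.float32)
X_data_class_2 = np.ones((4, 10, 30, 30, 1), dtype=np.float32)

# Integer labels, one per clip (0-based so they can index one-hot rows).
y_data_class_1 = np.full(len(X_data_class_1), 0)
y_data_class_2 = np.full(len(X_data_class_2), 1)

# Stack clips along the sample axis; labels are 1-D, so concatenate them.
X_final = np.vstack((X_data_class_1, X_data_class_2))
y_final = np.concatenate((y_data_class_1, y_data_class_2))

# One-hot encode the labels for a softmax output layer.
n_classes = 2
y_onehot = np.eye(n_classes, dtype=np.float32)[y_final]

# Shuffle clips and labels with the same permutation so pairs stay aligned.
rng = np.random.default_rng(seed=0)
order = rng.permutation(len(X_final))
X_final, y_onehot = X_final[order], y_onehot[order]

print(X_final.shape, y_onehot.shape)  # (10, 10, 30, 30, 1) (10, 2)
```

Shuffling before the train/test split keeps each split from containing a single class, since the stacked array is otherwise ordered class by class.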

1 reaction

nicholasding commented, Jul 31, 2018

The 5 dimensions are (batch, frames, height, width, channels). I wrote a generator to produce such tensors for 3D CNN training.
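The generator mentioned here was not posted, so the following is only a sketch of what one could look like, not nicholasding's code. It assumes the clips already sit in memory as a 5-D NumPy array; the function name clip_batch_generator, the placeholder data, and the 101 one-hot classes (matching UCF101) are all hypothetical.

```python
import numpy as np


def clip_batch_generator(X, y, batch_size, rng=None):
    """Yield (clips, labels) batches forever; clips keep their 5-D layout."""
    if rng is None:
        rng = np.random.default_rng()
    n_samples = len(X)
    while True:
        # Reshuffle the clip order at the start of every pass over the data.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            yield X[idx], y[idx]


# Hypothetical data: 6 clips of 40 RGB frames at 80x80, 101 one-hot classes.
X = np.zeros((6, 40, 80, 80, 3), dtype=np.float32)
y = np.zeros((6, 101), dtype=np.float32)

gen = clip_batch_generator(X, y, batch_size=4, rng=np.random.default_rng(0))
batch_x, batch_y = next(gen)
print(batch_x.shape, batch_y.shape)  # (4, 40, 80, 80, 3) (4, 101)
```

Each batch has the (batch, frames, height, width, channels) layout that the Conv3D input in the question expects. In the Keras version used in the question, a generator of this kind could presumably be passed to model.fit_generator in place of train_generator, with steps_per_epoch set to the number of batches per pass.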
