
Finetune the pretrained model on UCF101

See original GitHub issue

Hi, when I fine-tune the pretrained model on UCF101, I adapt `evaluate_sample.py`: I use only the `rgb` input, change `_NUM_CLASSES` to 101, add a loss and an optimizer after the logits, and feed the training data and labels to the net, but I get the following error:

2017-09-02 15:28:24.771133: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771169: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771190: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771194: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:24.771198: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 15:28:25.113985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:84:00.0
Total memory: 10.91GiB
Free memory: 2.11GiB
2017-09-02 15:28:25.114035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-09-02 15:28:25.114043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y
2017-09-02 15:28:25.114067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:84:00.0)
INFO:tensorflow:Restoring parameters from data/checkpoints/rgb_scratch/model.ckpt
Traceback (most recent call last):
  File "i3d_finetune_ucf101.py", line 175, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "i3d_finetune_ucf101.py", line 133, in main
    feed_dict={rgb_input:batch_xs, rgb_y: batch_ys})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1100, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (8, 101) for Tensor u'Placeholder_1:0', which has shape '(?, 400)'

Here is my Python file:

# Copyright 2017 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Loads a sample video and classifies using a trained Kinetics checkpoint."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

import i3d
from dataset import Dataset

batch_size = 8
training_iter = 1000
learning_rate = 0.001

_IMAGE_SIZE = 227
_NUM_CLASSES = 101

_SAMPLE_VIDEO_FRAMES = 79
_SAMPLE_PATHS = {
    'rgb': 'data/v_CricketShot_g04_c01_rgb.npy',
    'flow': 'data/v_CricketShot_g04_c01_flow.npy',
}

_CHECKPOINT_PATHS = {
    'rgb': 'data/checkpoints/rgb_scratch/model.ckpt',
    'flow': 'data/checkpoints/flow_scratch/model.ckpt',
    'rgb_imagenet': 'data/checkpoints/rgb_imagenet/model.ckpt',
    'flow_imagenet': 'data/checkpoints/flow_imagenet/model.ckpt',
}

_LABEL_MAP_PATH = 'data/label_map.txt'

FLAGS = tf.flags.FLAGS

tf.flags.DEFINE_string('eval_type', 'rgb', 'rgb, flow, or joint')
tf.flags.DEFINE_boolean('imagenet_pretrained', True, '')


def main(unused_argv):
  tf.logging.set_verbosity(tf.logging.INFO)
  eval_type = FLAGS.eval_type
  imagenet_pretrained = FLAGS.imagenet_pretrained

  if eval_type not in ['rgb', 'flow', 'joint']:
    raise ValueError('Bad `eval_type`, must be one of rgb, flow, joint')

  kinetics_classes = [x.strip() for x in open(_LABEL_MAP_PATH)]

  if eval_type in ['rgb', 'joint']:
    # RGB input has 3 channels.
    rgb_input = tf.placeholder(
        tf.float32,
        shape=(batch_size, 10, _IMAGE_SIZE, _IMAGE_SIZE, 3))
    rgb_y = tf.placeholder(tf.float32, [None, _NUM_CLASSES])
    with tf.variable_scope('RGB'):
      rgb_model = i3d.InceptionI3d(
          _NUM_CLASSES, spatial_squeeze=False, final_endpoint='Logits')
      rgb_logits, _ = rgb_model(
          rgb_input, is_training=True, dropout_keep_prob=1.0)
    rgb_variable_map = {}
    for variable in tf.global_variables():
      if variable.name.split('/')[0] == 'RGB':
        rgb_variable_map[variable.name.replace(':0', '')] = variable
        print('===variable:', variable)
    rgb_saver = tf.train.Saver(var_list=rgb_variable_map, reshape=True)
#    print('=====variables', rgb_variable_map)

  if eval_type in ['flow', 'joint']:
    # Flow input has only 2 channels.
    flow_input = tf.placeholder(
        tf.float32,
        shape=(1, _SAMPLE_VIDEO_FRAMES, _IMAGE_SIZE, _IMAGE_SIZE, 2))
    with tf.variable_scope('Flow'):
      flow_model = i3d.InceptionI3d(
          _NUM_CLASSES, spatial_squeeze=True, final_endpoint='Logits')
      flow_logits, _ = flow_model(
          flow_input, is_training=False, dropout_keep_prob=1.0)
    flow_variable_map = {}
    for variable in tf.global_variables():
      if variable.name.split('/')[0] == 'Flow':
        flow_variable_map[variable.name.replace(':0', '')] = variable
    flow_saver = tf.train.Saver(var_list=flow_variable_map, reshape=True)

  if eval_type == 'rgb':
    model_logits = rgb_logits
  elif eval_type == 'flow':
    model_logits = flow_logits
  else:
    model_logits = rgb_logits + flow_logits
  model_predictions = tf.nn.softmax(model_logits)
  print( '===model_predictions.shape:', model_predictions.shape)
  model_predictions = tf.reduce_mean(model_predictions, (1,2))
  print( '===model_predictions.shape:', model_predictions.shape)
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=model_predictions, labels=rgb_y))
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)

  dataset = Dataset('data/rgb_train_split1.txt', 'data/rgb_test_split1.txt')
  config = tf.ConfigProto()
  config.gpu_options.allow_growth = True
  with tf.Session(config=config) as sess:
    step = 1
    while step < training_iter:
      batch_xs, batch_ys = dataset.next_batch(batch_size, 'train')
      rgb_saver.restore(sess, _CHECKPOINT_PATHS['rgb'])
      sess.run(
        optimizer,
        feed_dict={rgb_input:batch_xs, rgb_y: batch_ys})

if __name__ == '__main__':
  tf.app.run(main)
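For context on the ValueError above: Session.run checks every value in feed_dict against the static shape of its placeholder, so an (8, 101) label batch cannot be fed into a placeholder declared as (?, 400). The message therefore suggests the graph that actually ran still had a 400-class (Kinetics-sized) label placeholder, even though the file above declares rgb_y with _NUM_CLASSES = 101. A minimal sketch of that shape check (the 400-class placeholder here is assumed purely to reproduce the error):

import numpy as np
import tensorflow as tf  # TF 1.x, as in the issue

# Hypothetical repro: a Kinetics-sized label placeholder fed UCF101-sized labels.
y = tf.placeholder(tf.float32, shape=(None, 400))
labels = np.zeros((8, 101), dtype=np.float32)

with tf.Session() as sess:
    # Raises: ValueError: Cannot feed value of shape (8, 101) for Tensor ...,
    # which has shape '(?, 400)'
    sess.run(tf.reduce_sum(y), feed_dict={y: labels})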

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 3
  • Comments: 14

Top GitHub Comments

5 reactions
prakashjayy commented, Nov 8, 2017
# Imports assumed for this snippet (not shown in the original comment):
import glob

import numpy as np
import tensorflow as tf
import sonnet as snt

import i3d

_SAMPLE_VIDEO_FRAMES = 64
_IMAGE_SIZE = 224
_NUM_CLASSES = 101
_EPOCHS = 10
_BATCH_SIZE = 4

_FILE_LOC_TRAIN = glob.glob("data/train/*.npy")
print("[Total Files: {}]".format(len(_FILE_LOC_TRAIN)))

_MEAN_DATA = np.load("data/mean_data__ucf.npy")[np.newaxis, :, :, :, :]
TRAINING = True
print("Mean_Data: {}".format(_MEAN_DATA.shape))

rgb_input = tf.placeholder(
    tf.float32,
    shape=(None, _SAMPLE_VIDEO_FRAMES, _IMAGE_SIZE, _IMAGE_SIZE, 3))

y_true = tf.placeholder(
    tf.float32,
    shape=(None, _NUM_CLASSES))

# Build the pretrained I3D backbone up to Mixed_5c, then attach a fresh
# 101-class Logits head on top of it.
with tf.variable_scope('RGB'):
  rgb_model = i3d.InceptionI3d(_NUM_CLASSES, spatial_squeeze=True, final_endpoint='Mixed_5c')
  rgb_net, _ = rgb_model( rgb_input, is_training=False, dropout_keep_prob=1.0)
  end_point = 'Logits'
  with tf.variable_scope(end_point):
    rgb_net = tf.nn.avg_pool3d(rgb_net, ksize=[1, 2, 7, 7, 1],
                           strides=[1, 1, 1, 1, 1], padding=snt.VALID)
    if TRAINING:
        rgb_net = tf.nn.dropout(rgb_net, 0.7)
    logits = i3d.Unit3D(output_channels=_NUM_CLASSES,
                    kernel_shape=[1, 1, 1],
                    activation_fn=None,
                    use_batch_norm=False,
                    use_bias=True,
                    name='Conv3d_0c_1x1')(rgb_net, is_training=True)

    logits = tf.squeeze(logits, [2, 3], name='SpatialSqueeze')
    averaged_logits = tf.reduce_mean(logits, axis=1)

  # predictions = tf.nn.softmax(averaged_logits)


rgb_variable_map = {}

# Skip the new Logits variables so the Kinetics checkpoint restores only the backbone.
for variable in tf.global_variables():
    if variable.name.split("/")[-4] == "Logits": continue
    if variable.name.split('/')[0] == 'RGB':
        rgb_variable_map[variable.name.replace(':0', '')] = variable

#print(rgb_variable_map)
rgb_saver = tf.train.Saver(var_list=rgb_variable_map, reshape=True)

model_logits = averaged_logits
model_predictions = tf.nn.softmax(model_logits)

This worked for me.
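The snippet above only builds the graph; the loss, optimizer, checkpoint restore, and training loop are left out. A rough sketch of how those pieces could be wired together, reusing rgb_input, y_true, averaged_logits, and rgb_saver from the snippet and the _CHECKPOINT_PATHS / Dataset helpers from the original post (the optimizer, learning rate, and batching are assumptions, not part of the answer):

# Sketch only: continues the snippet above. Feed the raw averaged_logits to the
# cross-entropy (not their softmax), and initialize variables before restoring so
# the new Logits head keeps its fresh weights while the backbone is overwritten
# by the checkpoint.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=averaged_logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)  # assumed optimizer and learning rate

dataset = Dataset('data/rgb_train_split1.txt', 'data/rgb_test_split1.txt')  # from the original post

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    rgb_saver.restore(sess, _CHECKPOINT_PATHS['rgb_imagenet'])  # Logits excluded above
    for step in range(1000):
        batch_xs, batch_ys = dataset.next_batch(_BATCH_SIZE, 'train')
        _, batch_loss = sess.run([train_op, loss],
                                 feed_dict={rgb_input: batch_xs, y_true: batch_ys})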

1 reaction
joaoluiscarreira commented, Jan 4, 2019

Hi,

Yes, 25 fps.

Best,

Joao

On Fri, Jan 4, 2019, 8:06 AM ChristopherSTAN <notifications@github.com> wrote:


Hello,

I want to ask you a question about the data processing.

How do you sample the video clips? 25 frames per second?

Thank you a lot. PS: My English is poor. I am sorry about that.

Chris
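For readers with the same data-processing question: below is a hypothetical loader (OpenCV is an assumption here; the repository does not ship one) that decimates a clip to roughly 25 fps, keeps 64 frames, and rescales pixels to [-1, 1], matching the constants used earlier in this thread:

import cv2
import numpy as np

def sample_clip(path, num_frames=64, target_fps=25.0, size=224):
    """Decimate a video to ~target_fps, keep num_frames RGB frames resized to
    size x size, and rescale pixel values to [-1, 1]."""
    cap = cv2.VideoCapture(path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
    keep_every = max(native_fps / target_fps, 1.0)  # keep ~one frame per keep_every
    frames, next_keep, i = [], 0.0, 0
    while len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        if i >= next_keep:
            frame = cv2.cvtColor(cv2.resize(frame, (size, size)), cv2.COLOR_BGR2RGB)
            frames.append(frame.astype(np.float32) / 255.0 * 2.0 - 1.0)
            next_keep += keep_every
        i += 1
    cap.release()
    n = len(frames)
    while n and len(frames) < num_frames:  # loop-pad clips shorter than num_frames
        frames.append(frames[len(frames) % n])
    return np.asarray(frames, dtype=np.float32)[np.newaxis]  # (1, T, H, W, 3)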


Read more comments on GitHub >

Top Results From Across the Web

Tutorial 2: Finetuning Models - MMAction2's documentation!
This tutorial provides instructions for users to use the pre-trained models to finetune them on other datasets, so that better performance can be...
Read more >
1. Getting Started with Pre-trained TSN Models on UCF101
In this tutorial, we will demonstrate how to load a pre-trained TSN model from gluoncv-model-zoo and classify video frames from the Internet or...
Read more >
Self-Supervised Action Recognition on UCF101 (finetuned)
Rank  Model                     3-fold Accuracy  Pretrain  Year  Tags
1     BraVe:V-FA (TSM-50x2)     95.7                       2021
2     XDC                       95.5                       2019
3     CVRL (R3D-152 2x; K600)   93.9             K600      2020  ResNet
Read more >
TSN Pretrained Models on Kinetics Dataset
The performance is compared against models with only ImageNet pretraining. Trimmed Video Classification (UCF101). We finetune the Kinetics ...
Read more >
Can Temporal Information Help with Contrastive Self-Supervised ...
Our best model achieves 85.1% (UCF-101) and 51.6% (HMDB-51) top-1 accuracy, ... the self-supervised pretrained model is then fully finetuned, i.e., ...
Read more >
