Save Model with Batch Size = 1 for Production
Dear all,
I am facing a problem when I try to save a model to disk with batch size = 1 and then freeze it into a .pb. I am doing this in four steps:
1. I am able to train squeezeDet Plus with my custom dataset. For training I use a batch size of 20. The resulting trained model.ckpt is 60 MB and the model.meta is 19.9 MB in size.
2. When I evaluate this model with eval.py, I re-create the graph with batch size = 1 (as in the demo), and the Python script works fine: I correctly get the bounding boxes for each image.
3. Now I would like to save this graph with batch size = 1 to disk. However, when I save it as a checkpoint, the model.ckpt shrinks to 30 MB and the model.meta to 233 KB.
4. Then, when I try to freeze the files from step 3, I get the error "tensorflow.python.framework.errors.FailedPreconditionError: Attempting to use uninitialized value" (the same freezing script works fine with the .ckpt and .meta files from step 1).
I realize that something is wrong with the way I am saving the data in step 3, since the graph and weights are not saved properly, but I cannot figure out what. Am I missing something obvious here?
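For reference, my freezing script follows the standard `convert_variables_to_constants` pattern; this is only a rough sketch (the function name and output node names here are placeholders, not the exact ones in my script):

```python
import tensorflow as tf
from tensorflow.python.framework import graph_util

def freeze_to_pb(sess, pb_path, output_node_names):
  # Every variable reachable from the output nodes is read from the
  # session here; an unrestored/uninitialized variable is what raises
  # the FailedPreconditionError seen in step 4.
  frozen_graph_def = graph_util.convert_variables_to_constants(
      sess, sess.graph.as_graph_def(), output_node_names)
  with open(pb_path, 'wb') as f:
    f.write(frozen_graph_def.SerializeToString())
```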
This is the minimal code (adapted from eval.py) that I am using to load the graph and the weights and save them with a batch size of 1 (step 3):
```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os.path

import cv2
import numpy as np
import tensorflow as tf

from config import *
from nets import *

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('dataset', 'KITTI',
                           """Currently support PASCAL_VOC or KITTI dataset.""")
tf.app.flags.DEFINE_string('data_path', '', """Root directory of data""")
tf.app.flags.DEFINE_string('image_set', 'test',
                           """Only used for VOC data."""
                           """Can be train, trainval, val, or test""")
tf.app.flags.DEFINE_string('year', '2007',
                           """VOC challenge year. 2007 or 2012"""
                           """Only used for VOC data""")
tf.app.flags.DEFINE_string('eval_dir', '/tmp/bichen/logs/squeezeDet/eval',
                           """Directory where to write event logs """)
tf.app.flags.DEFINE_string('checkpoint_path', '/tmp/bichen/logs/squeezeDet/train',
                           """Path to the training checkpoint.""")
tf.app.flags.DEFINE_integer('eval_interval_secs', 60 * 1,
                            """How often to check if new cpt is saved.""")
tf.app.flags.DEFINE_boolean('run_once', False,
                            """Whether to run eval only once.""")
tf.app.flags.DEFINE_string('net', 'squeezeDet',
                           """Neural net architecture.""")
tf.app.flags.DEFINE_string('gpu', '0', """gpu id.""")


def main(argv=None):
  """Load weights from a pre-trained squeezeDet network trained with batch > 1
  and save the model with batch = 1 for production."""
  with tf.Graph().as_default():
    # Rebuild the graph with batch size 1.
    mc = kitti_squeezeDetPlus_config()
    mc.BATCH_SIZE = 1
    mc.LOAD_PRETRAINED_MODEL = False
    model = SqueezeDetPlus(mc, FLAGS.gpu)
    saver = tf.train.Saver(model.model_params)

    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
      # Restore from the training checkpoint.
      ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_path)
      print('Loading {}...'.format(ckpt.model_checkpoint_path))
      saver.restore(sess, ckpt.model_checkpoint_path)
      sess.run(tf.initialize_all_variables())

      # Run one image to test that it works.
      read_full_name = "/data/squeezeDet_TF011/src/test2.jpg"
      im = cv2.imread(read_full_name)
      im = im.astype(np.float32, copy=False)
      im = cv2.resize(im, (mc.IMAGE_WIDTH, mc.IMAGE_HEIGHT))
      input_image = im - mc.BGR_MEANS

      # Detect.
      det_boxes, det_probs, det_class = sess.run(
          [model.det_boxes, model.det_probs, model.det_class],
          feed_dict={model.image_input: [input_image],
                     model.keep_prob: 1.0})  # works fine

      # Save to disk.
      checkpoint_path = os.path.join(
          "/data/squeezeDet_TF011/logs/test_freeze", 'evalBatch1.ckpt')
      saver.save(sess, checkpoint_path, global_step=1)


if __name__ == '__main__':
  tf.app.run()
```
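One detail I am unsure about in the snippet above: `tf.initialize_all_variables()` runs after `saver.restore()`, and the Saver is built from `model.model_params` only, so the initializer may overwrite the restored weights, and any variable outside `model_params` may be missing from the new checkpoint. If the ordering matters, the conventional sequence would be (a sketch):

```python
# Conventional ordering: initialize everything first, then let restore
# overwrite the fresh initial values with the trained weights.
sess.run(tf.initialize_all_variables())
saver.restore(sess, ckpt.model_checkpoint_path)
```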
Thanks a lot
Cheers
Top GitHub Comments
Hi all, when I load the original model.ckpt-8700 and check the node names in the graph, it has "image_input". But when I check again after running the code below, the "image_input" node has disappeared:

```python
output_graph_def = graph_util.convert_variables_to_constants(
    sess, input_graph_def, output_node_names.split(","))
```

However, the "batch/fifo_queue" node mentioned by @venuktan still exists. The issue should be located in the code below:

```python
self.image_input, self.input_mask, self.box_delta_input, \
    self.box_input, self.labels = tf.train.batch(
        self.FIFOQueue.dequeue(), batch_size=mc.BATCH_SIZE,
        capacity=mc.QUEUE_CAPACITY)
```
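For anyone who wants to reproduce the check, this is roughly how the node names can be listed (a trivial sketch):

```python
# List graph nodes to see whether "image_input" and "batch/fifo_queue"
# are present before and after the conversion.
for node in tf.get_default_graph().as_graph_def().node:
    if 'image_input' in node.name or 'fifo_queue' in node.name:
        print(node.name)
```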
Anyone know why?

Because the operation is not linked by the Makefile/Bazel when you compile for mobile. You could link the operation manually and pass 1 as keep_prob, and it would work.
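If the queue input is indeed what swallows "image_input", one common workaround is to rebuild the inference graph for export with a plain placeholder instead of the queue. A minimal sketch, assuming a 384x1248 input resolution (substitute mc.IMAGE_HEIGHT / mc.IMAGE_WIDTH from the actual config):

```python
import tensorflow as tf

with tf.Graph().as_default():
    # Feedable input instead of the FIFO-queue/tf.train.batch pipeline,
    # so the "image_input" node survives freezing. 384x1248 is an assumed
    # resolution; take the real values from the model config.
    image_input = tf.placeholder(
        tf.float32, shape=[1, 384, 1248, 3], name='image_input')
    # Dropout fixed to 1.0 for inference, so nothing has to be fed at run time.
    keep_prob = tf.placeholder_with_default(
        tf.constant(1.0), shape=[], name='keep_prob')
    # ... build the detection graph on top of image_input, restore the
    # trained weights, then freeze with convert_variables_to_constants ...
```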