
Save Model with Batch Size = 1 for production

See original GitHub issue: https://github.com/BichenWuUCB/squeezeDet/issues/35

Dear all,

I am facing a problem when I try to save a model to disk with batch size = 1 and then freeze it into a .pb. I am doing this in four steps:

  1. I am able to train SqueezeDet+ with my custom dataset. For training I use a batch size of 20. The resulting trained model.ckpt is 60 MB and the model.meta is 19.9 MB.
  2. When I evaluate this model with eval.py, I re-create the graph with batch size = 1 (as in the demo), and the Python script works fine: I correctly get the bounding boxes for each image.
  3. Now I would like to save this graph with batch size = 1 to disk. However, when I save this graph as a checkpoint, the size of the model.ckpt drops to 30 MB and the size of the model.meta to 233 KB.
  4. When I then try to freeze the files from step 3, I get the error "tensorflow.python.framework.errors.FailedPreconditionError: Attempting to use uninitialized value" (the same freezing script works fine with the .ckpt and .meta files from step 1).

I realize that there is something wrong in the way I am saving the data in step 3; the graph and weights are not saved properly, but I cannot figure out what. Am I missing something very obvious here?
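For reference, the freezing in step 4 follows the usual TF 1.x recipe, roughly the sketch below (the checkpoint paths and the output node name 'output_node' are placeholders here, not my actual values):

import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Session() as sess:
    # Rebuild the graph stored in the .meta file and restore the weights
    saver = tf.train.import_meta_graph('model.ckpt-8700.meta')
    saver.restore(sess, 'model.ckpt-8700')

    # Fold the restored variables into constants; any node that is not
    # needed to compute the listed outputs is pruned from the result
    frozen_graph_def = graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ['output_node'])

    # Serialize the frozen graph to a .pb file
    with tf.gfile.GFile('frozen_model.pb', 'wb') as f:
        f.write(frozen_graph_def.SerializeToString())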

This is the minimal code (adapted from eval.py) that I am using to load the graph and the weights and save them with a batch size of 1 (step 3):

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import cv2
import os.path
import numpy as np
import tensorflow as tf
from config import *
from nets import *

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('dataset', 'KITTI',
                           """Currently support PASCAL_VOC or KITTI dataset.""")
tf.app.flags.DEFINE_string('data_path', '', """Root directory of data""")
tf.app.flags.DEFINE_string('image_set', 'test',
                           """Only used for VOC data."""
                           """Can be train, trainval, val, or test""")
tf.app.flags.DEFINE_string('year', '2007',
                            """VOC challenge year. 2007 or 2012"""
                            """Only used for VOC data""")
tf.app.flags.DEFINE_string('eval_dir', '/tmp/bichen/logs/squeezeDet/eval',
                            """Directory where to write event logs """)
tf.app.flags.DEFINE_string('checkpoint_path', '/tmp/bichen/logs/squeezeDet/train',
                            """Path to the training checkpoint.""")
tf.app.flags.DEFINE_integer('eval_interval_secs', 60 * 1,
                             """How often to check if new cpt is saved.""")
tf.app.flags.DEFINE_boolean('run_once', False,
                             """Whether to run eval only once.""")
tf.app.flags.DEFINE_string('net', 'squeezeDet',
                           """Neural net architecture.""")
tf.app.flags.DEFINE_string('gpu', '0', """gpu id.""")


def main(argv=None):

  """Load weights from a pre-trained squeezeDet network trained with Batch > 1
  and save the model with batch = 1 for production"""

  with tf.Graph().as_default():

    mc = kitti_squeezeDetPlus_config()
    mc.BATCH_SIZE = 1
    mc.LOAD_PRETRAINED_MODEL = False
    model = SqueezeDetPlus(mc, FLAGS.gpu)

    saver = tf.train.Saver(model.model_params)

    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:

        # Restores from checkpoint
        ckpts = set()
        ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_path)
        ckpts.add(ckpt.model_checkpoint_path)
        print('Loading {}...'.format(ckpt.model_checkpoint_path))
        saver.restore(sess, ckpt.model_checkpoint_path)

        sess.run(tf.initialize_all_variables())

        # Run one image to test that it works
        read_full_name = "/data/squeezeDet_TF011/src/test2.jpg"
        im = cv2.imread(read_full_name)
        im = im.astype(np.float32, copy=False)
        im = cv2.resize(im, (mc.IMAGE_WIDTH, mc.IMAGE_HEIGHT))
        input_image = im - mc.BGR_MEANS

        # Detect
        det_boxes, det_probs, det_class = sess.run(
            [model.det_boxes, model.det_probs, model.det_class],
            feed_dict={model.image_input: [input_image], model.keep_prob: 1.0})  # works fine

        # Save to disk
        checkpoint_path = os.path.join("/data/squeezeDet_TF011/logs/test_freeze", 'evalBatch1.ckpt')
        step = 1
        saver.save(sess, checkpoint_path, global_step=step)


if __name__ == '__main__':
    tf.app.run()
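To compare what actually gets written in the two checkpoints, the saved variables can be listed with a checkpoint reader; a minimal sketch (the checkpoint paths are placeholders standing in for the files from steps 1 and 3):

import tensorflow as tf

for path in ['model.ckpt', 'evalBatch1.ckpt-1']:
    # NewCheckpointReader reads the checkpoint index without building a graph
    reader = tf.train.NewCheckpointReader(path)
    shapes = reader.get_variable_to_shape_map()
    print('{}: {} variables'.format(path, len(shapes)))
    for name in sorted(shapes):
        print('  {} {}'.format(name, shapes[name]))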

Thanks a lot

Cheers

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 22

Top GitHub Comments

1 reaction
keymanchen1215 commented, Apr 25, 2019

Hi all,

When I load the original model.ckpt-8700 and check the node names in the graph, it has "image_input". But when I check again after running the code below, the "image_input" node has disappeared:

output_graph_def = graph_util.convert_variables_to_constants(
    sess, input_graph_def, output_node_names.split(","))

The "batch/fifo_queue" node mentioned by @venuktan still exists, though. The issue should be located in the code below:

self.image_input, self.input_mask, self.box_delta_input, \
    self.box_input, self.labels = tf.train.batch(
        self.FIFOQueue.dequeue(), batch_size=mc.BATCH_SIZE,
        capacity=mc.QUEUE_CAPACITY)

Anyone know why?
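For what it's worth, convert_variables_to_constants prunes every node that is not an ancestor of the requested output nodes, so anything that only feeds the enqueue side of the queue is dropped, while the queue itself survives because the dequeue path depends on it. A quick way to check which input nodes survive, assuming an open session sess with the restored graph (the output node name 'output_node' is a placeholder):

from tensorflow.python.framework import graph_util

input_graph_def = sess.graph.as_graph_def()
print([n.name for n in input_graph_def.node if 'input' in n.name])

# The conversion keeps only nodes needed to compute the listed outputs
output_graph_def = graph_util.convert_variables_to_constants(
    sess, input_graph_def, ['output_node'])
print([n.name for n in output_graph_def.node if 'input' in n.name])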

1 reaction
andreapiso commented, Apr 1, 2018

Because the operation is not linked by the bazel makefile when you compile for mobile. You could link the operation manually and pass 1 as keep_prob and it would work.

On Sun, 1 Apr 2018 at 12:11 AM, hoonkai notifications@github.com wrote:

We did optimization for inference and removed dropout (this step is necessary if you want to run SqueezeDet on some mobile devices)

@Lisandro79 @BichenWuUCB Can I ask why dropout needs to be removed? Isn't the dropout layer trivial during inference?

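If the dropout op is kept in the graph (and linked into the mobile build), one way to avoid having to feed keep_prob at inference is tf.placeholder_with_default, so the frozen graph runs with no dropout unless a value is explicitly fed. A sketch with illustrative names, not taken from the SqueezeDet code:

import tensorflow as tf

# Defaults to 1.0 (no dropout) unless a value is fed during training
keep_prob = tf.placeholder_with_default(1.0, shape=(), name='keep_prob')

# Illustrative activation tensor
features = tf.placeholder(tf.float32, [None, 256], name='features')
dropped = tf.nn.dropout(features, keep_prob)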


