Save Model with Batch Size = 1 for Production
Dear all,
I am facing a problem when I try to save a model to disk with batch size = 1 and then freeze it into a .pb. I am doing this in four steps:
1. I am able to train squeezeDet Plus with my custom dataset. For training I use a batch size of 20. The resulting trained model.ckpt is 60 MB and the model.meta is 19.9 MB in size.
2. When I evaluate this model with eval.py, I re-create the graph with batch size = 1 (as in the demo), and the Python script works fine: I correctly get the bounding boxes for each image.
3. Now I would like to save this graph with batch size = 1 to disk. However, when I save it as a checkpoint, the model.ckpt shrinks to 30 MB and the model.meta to 233 KB.
4. Then, when I try to freeze the files from step 3, I get the error "tensorflow.python.framework.errors.FailedPreconditionError: Attempting to use uninitialized value" (the same freezing script works fine with the .ckpt and .meta files from step 1).
I realize that something is wrong with the way I am saving the data in step 3, since the graph and weights are not saved properly, but I cannot figure out what. Am I missing something obvious here?
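For reference, my freezing script follows the standard `convert_variables_to_constants` pattern; this is only a rough sketch (the function name and output node names here are placeholders, not the exact ones in my script):

```python
import tensorflow as tf
from tensorflow.python.framework import graph_util

def freeze_to_pb(sess, pb_path, output_node_names):
  # Every variable reachable from the output nodes is read from the
  # session here; an unrestored/uninitialized variable is what raises
  # the FailedPreconditionError seen in step 4.
  frozen_graph_def = graph_util.convert_variables_to_constants(
      sess, sess.graph.as_graph_def(), output_node_names)
  with open(pb_path, 'wb') as f:
    f.write(frozen_graph_def.SerializeToString())
```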
This is the minimal code (adapted from eval.py) that I am using to load the graph and the weights and save them with a batch size of 1 (step 3):
```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os.path

import cv2
import numpy as np
import tensorflow as tf

from config import *
from nets import *

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('dataset', 'KITTI',
                           """Currently support PASCAL_VOC or KITTI dataset.""")
tf.app.flags.DEFINE_string('data_path', '', """Root directory of data""")
tf.app.flags.DEFINE_string('image_set', 'test',
                           """Only used for VOC data."""
                           """Can be train, trainval, val, or test""")
tf.app.flags.DEFINE_string('year', '2007',
                           """VOC challenge year. 2007 or 2012"""
                           """Only used for VOC data""")
tf.app.flags.DEFINE_string('eval_dir', '/tmp/bichen/logs/squeezeDet/eval',
                           """Directory where to write event logs """)
tf.app.flags.DEFINE_string('checkpoint_path', '/tmp/bichen/logs/squeezeDet/train',
                           """Path to the training checkpoint.""")
tf.app.flags.DEFINE_integer('eval_interval_secs', 60 * 1,
                            """How often to check if new cpt is saved.""")
tf.app.flags.DEFINE_boolean('run_once', False,
                            """Whether to run eval only once.""")
tf.app.flags.DEFINE_string('net', 'squeezeDet',
                           """Neural net architecture.""")
tf.app.flags.DEFINE_string('gpu', '0', """gpu id.""")


def main(argv=None):
  """Load weights from a pre-trained squeezeDet network trained with batch > 1
  and save the model with batch = 1 for production."""
  with tf.Graph().as_default():
    # Rebuild the graph with batch size 1.
    mc = kitti_squeezeDetPlus_config()
    mc.BATCH_SIZE = 1
    mc.LOAD_PRETRAINED_MODEL = False
    model = SqueezeDetPlus(mc, FLAGS.gpu)
    saver = tf.train.Saver(model.model_params)

    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
      # Restore from the training checkpoint.
      ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_path)
      print('Loading {}...'.format(ckpt.model_checkpoint_path))
      saver.restore(sess, ckpt.model_checkpoint_path)
      sess.run(tf.initialize_all_variables())

      # Run one image to test that it works.
      read_full_name = "/data/squeezeDet_TF011/src/test2.jpg"
      im = cv2.imread(read_full_name)
      im = im.astype(np.float32, copy=False)
      im = cv2.resize(im, (mc.IMAGE_WIDTH, mc.IMAGE_HEIGHT))
      input_image = im - mc.BGR_MEANS

      # Detect.
      det_boxes, det_probs, det_class = sess.run(
          [model.det_boxes, model.det_probs, model.det_class],
          feed_dict={model.image_input: [input_image],
                     model.keep_prob: 1.0})  # works fine

      # Save to disk.
      checkpoint_path = os.path.join(
          "/data/squeezeDet_TF011/logs/test_freeze", 'evalBatch1.ckpt')
      saver.save(sess, checkpoint_path, global_step=1)


if __name__ == '__main__':
  tf.app.run()
```
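One detail I am unsure about in the snippet above: `tf.initialize_all_variables()` runs after `saver.restore()`, and the Saver is built from `model.model_params` only, so the initializer may overwrite the restored weights, and any variable outside `model_params` may be missing from the new checkpoint. If the ordering matters, the conventional sequence would be (a sketch):

```python
# Conventional ordering: initialize everything first, then let restore
# overwrite the fresh initial values with the trained weights.
sess.run(tf.initialize_all_variables())
saver.restore(sess, ckpt.model_checkpoint_path)
```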
Thanks a lot
Cheers
Top GitHub Comments
Hi all, when I load the original model.ckpt-8700 and check the node names in the graph, it has "image_input". But when I check again after running the code below, the "image_input" node has disappeared:

```python
output_graph_def = graph_util.convert_variables_to_constants(
    sess, input_graph_def, output_node_names.split(","))
```

However, the "batch/fifo_queue" node mentioned by @venuktan still exists. The issue should be located in the code below:

```python
self.image_input, self.input_mask, self.box_delta_input, \
    self.box_input, self.labels = tf.train.batch(
        self.FIFOQueue.dequeue(), batch_size=mc.BATCH_SIZE,
        capacity=mc.QUEUE_CAPACITY)
```
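For anyone who wants to reproduce the check, this is roughly how the node names can be listed (a trivial sketch):

```python
# List graph nodes to see whether "image_input" and "batch/fifo_queue"
# are present before and after the conversion.
for node in tf.get_default_graph().as_graph_def().node:
    if 'image_input' in node.name or 'fifo_queue' in node.name:
        print(node.name)
```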
Anyone know why?

Because the operation is not linked by the Makefile/Bazel when you compile for mobile. You could link the operation manually and pass 1 as keep_prob, and it would work.
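If the queue input is indeed what swallows "image_input", one common workaround is to rebuild the inference graph for export with a plain placeholder instead of the queue. A minimal sketch, assuming a 384x1248 input resolution (substitute mc.IMAGE_HEIGHT / mc.IMAGE_WIDTH from the actual config):

```python
import tensorflow as tf

with tf.Graph().as_default():
    # Feedable input instead of the FIFO-queue/tf.train.batch pipeline,
    # so the "image_input" node survives freezing. 384x1248 is an assumed
    # resolution; take the real values from the model config.
    image_input = tf.placeholder(
        tf.float32, shape=[1, 384, 1248, 3], name='image_input')
    # Dropout fixed to 1.0 for inference, so nothing has to be fed at run time.
    keep_prob = tf.placeholder_with_default(
        tf.constant(1.0), shape=[], name='keep_prob')
    # ... build the detection graph on top of image_input, restore the
    # trained weights, then freeze with convert_variables_to_constants ...
```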