
Save/restore for TensorGraph


In the past, our models have generally had a persistent Session object containing all the variables for the model. They automatically wrote out checkpoints during training, and provided a restore() method to load the variables from the most recent checkpoint into memory.

TensorGraph works differently. It creates a new Session every time you call fit() or predict(), loads the variables from the latest checkpoint, then throws away the Session before returning. I find this behavior has a couple of problems.
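The contrast can be sketched in plain Python (no TensorFlow is used; the class and method names here are illustrative stand-ins, not the actual TensorGraph API):

```python
# Plain-Python sketch of the two Session lifecycles (no TensorFlow;
# class and method names are illustrative, not the real TensorGraph API).

class PerCallModel:
    """Mimics TensorGraph: every fit()/predict() opens a fresh session,
    restores variables from the latest checkpoint, then discards both."""

    def __init__(self, checkpoint_dir):
        self.checkpoint_dir = checkpoint_dir
        self.restores = 0  # count how often variables are read from disk

    def predict(self, x):
        self.restores += 1   # load latest checkpoint into a new session
        result = x           # placeholder for the actual computation
        return result        # session is thrown away here


class PersistentModel:
    """Mimics the older design: one long-lived session, restore on demand."""

    def __init__(self, checkpoint_dir):
        self.checkpoint_dir = checkpoint_dir
        self.restores = 0

    def restore(self):
        self.restores += 1   # load the most recent checkpoint once

    def predict(self, x):
        return x             # reuses the variables already in memory
```

Running predict() a thousand times costs a thousand disk restores under the first design and exactly one under the second.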

First, it’s really slow. If you want to run prediction thousands of times, the variables get loaded from disk thousands of times. This is why for both RL and MAML I’ve basically had to subvert the design of TensorGraph: I use it to build the TensorFlow graph, but when I need to do any calculations I bypass the corresponding parts of TensorGraph, pull out the needed internal fields, and do the calculations directly.

It’s also inflexible, since it enforces that prediction can only ever be based on the variables in the latest checkpoint file, never any others. A good example of where this causes problems is MAML. It uses TensorGraph to define the model, but the actual optimization is done with a different optimizer and a different loss function. That produces a model (which gets saved to disk) that is designed to be easy to train, but isn’t optimized for any particular task. To use it for prediction, you first do a few steps of gradient descent to produce a fine-tuned version of the model (which does not get saved to disk), then do the prediction based on that. But I can’t use any of TensorGraph’s prediction methods for it, because they would just throw away the tuned variables and replace them with the generic ones from disk.
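A schematic, TensorFlow-free sketch of why this breaks the MAML workflow (all names here are hypothetical, purely to illustrate the variable lifecycle):

```python
# Hypothetical sketch of the MAML workflow described above, in plain
# Python. The point: the fine-tuned variables exist only in memory, so a
# predict() that always restores the latest checkpoint would discard them.

class Model:
    def __init__(self):
        self.variables = {"w": 0.0}   # generic meta-learned weights
        self.checkpoint = None

    def save_checkpoint(self):
        self.checkpoint = dict(self.variables)   # what gets written to disk

    def restore_latest(self):
        self.variables = dict(self.checkpoint)   # what predict() does now

    def fine_tune(self, steps):
        # a few gradient steps for one specific task; never saved to disk
        for _ in range(steps):
            self.variables["w"] += 1.0


m = Model()
m.save_checkpoint()        # generic model lands on disk
m.fine_tune(steps=3)       # task-specific weights, in memory only
assert m.variables["w"] == 3.0

m.restore_latest()         # what a checkpoint-loading predict() would do
assert m.variables["w"] == 0.0   # the tuned weights are gone
```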

I think we should consider changing this behavior.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 11 (9 by maintainers)

Top GitHub Comments

1 reaction
peastman commented, Aug 16, 2017

I’d suggest something roughly identical to what’s currently in A3C. When you first create the object, it creates a persistent Session. There’s a restore() method to load the most recent checkpoint from disk. I also included a restore argument to fit() that you can use to have it automatically load the latest checkpoint and continue training from where it left off.
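A rough sketch of that interface (the names are assumptions based on the A3C-style design described above, not the actual DeepChem API):

```python
# Rough sketch of the proposed interface (names are assumptions based on
# the A3C-style design described above, not the actual DeepChem API).

class TensorGraph:
    def __init__(self, model_dir):
        self.model_dir = model_dir
        self.session = self._create_session()  # persistent, created once
        self.restored = False

    def _create_session(self):
        # stands in for something like tf.Session(graph=self.graph)
        return object()

    def restore(self):
        """Load the most recent checkpoint into the live session."""
        self.restored = True

    def fit(self, dataset, restore=False):
        if restore:
            self.restore()   # pick up training where it left off
        # ... run training ops in self.session ...
        return self

    def predict(self, dataset):
        # reuses self.session; never reloads variables behind your back
        return list(dataset)
```

Because the session outlives individual calls, a fine-tuned set of variables stays in memory across as many predict() calls as you like.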

0 reactions
vid33 commented, Nov 8, 2017

FWIW, the first thing I did was check whether 1. is possible.


Top Results From Across the Web

4 - Save and Restore - Easy TensorFlow
To save and restore your variables, all you need to do is call tf.train.Saver() at the end of your graph. …

A quick complete tutorial to save and restore Tensorflow models
In this quick Tensorflow tutorial, you will learn what a Tensorflow model is and how to save and restore Tensorflow models for fine-tuning and…

how to add text preprocessing tokenization step into ...
I have a TensorFlow SavedModel which includes saved_model.pb and a variables folder. The preprocessing step has not been incorporated into …

tensorbuilder.api.builder API documentation
Original documentation for tensorbuilder.add_regularization_loss. def add_regularization_loss(tensor, graph=None, scope="add_regularization_loss").

TensorFlow Learning (12): Saving and Restoring Models (Part 1): Basic Operations
… information from the graph (both Save/Restore ops and SaverDefs) that … Tensor.graph is meaningless when eager execution is enabled.
