
Using data tensors as data sources: action plan

See original GitHub issue

We want to add the ability to feed TensorFlow data tensors (e.g. input queues) into Keras models. A few days ago I met with @athundt and we discussed his previous efforts to make it happen. Here is how we will handle it:

First step [Update: done]

The following API:

# Get data tensors
data_tensor, target_tensor = ...

# Build model on top of the data tensor
inputs = Input(tensor=data_tensor)
outputs = Dense(...)(inputs)
model = Model(inputs, outputs)

# Add internal loss
loss = loss_fn(target_tensor, outputs)
model.add_loss(loss)

# Compile without external loss
model.compile(optimizer='sgd', loss=None)

# Fit without external data
model.fit(epochs=10, steps_per_epoch=1000)

This is already about 90% supported. What is missing is the steps_per_epoch argument: currently, fit would only draw a single batch, so you would have to call it in a loop.
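The step-based mode can be illustrated with a plain-Python sketch (the function and argument names here are hypothetical stand-ins, not the actual Keras internals): each epoch pulls a fixed number of batches from the data source instead of slicing a fixed-size array of samples.

```python
def run_training(get_batch, train_on_batch, epochs, steps_per_epoch):
    # Step-based mode: each epoch draws steps_per_epoch batches from
    # the data source, rather than iterating over an in-memory array.
    history = []
    for epoch in range(epochs):
        losses = [train_on_batch(get_batch()) for _ in range(steps_per_epoch)]
        history.append(sum(losses) / len(losses))  # mean loss per epoch
    return history

# Dummy data source and train step, just to exercise the loop shape.
batches = iter(range(10_000))
history = run_training(
    get_batch=lambda: next(batches),                  # e.g. dequeue a batch
    train_on_batch=lambda batch: 1.0 / (batch + 1),   # fake decreasing loss
    epochs=10,
    steps_per_epoch=1000,
)
```

With steps_per_epoch in fit, this loop lives inside Keras; without it, the user would have to write the outer loop themselves.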

NEEDED:

  • [Update: done] PR introducing the steps_per_epoch argument in fit. Here’s how it works:
    • Based on arguments received, we determine whether training should be step-based (like in fit_generator) or sample-based (like in fit currently).
    • We have two independent code branches handling each mode.
  • [Update: done] PR introducing an MNIST example of how to use data tensors for inputs and targets, following the code snippet above. It should use the MNIST data tensors built into TF.

Second step

The following API:

# Get data tensors
data_tensor, target_tensor = ...

# Build model on top of the data tensor
inputs = Input(tensor=data_tensor)
outputs = Dense(...)(inputs)
model = Model(inputs, outputs)

# Compile as usual
model.compile(optimizer='sgd', loss='mse')

# Fit by passing the target tensor
model.fit(y=target_tensor, epochs=10, steps_per_epoch=1000)

Main issue: in compile, we create placeholders for the targets. We need to discard them (cache them, actually) and use the provided target tensor instead.

Solution: add a recompilation step inside fit that caches the previous target placeholder and replaces it with the provided target tensor.
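The cache-and-swap idea can be sketched in plain Python (this is a toy stand-in, not the actual Keras code; all names here are hypothetical):

```python
class CompiledModel:
    """Toy stand-in for a compiled model. `targets` plays the role of
    the target placeholders that compile() creates."""

    def __init__(self):
        self.targets = ["target_placeholder"]  # created at compile time
        self._cached_targets = None            # filled on recompilation

    def fit(self, y=None):
        if y is not None:
            # Cache the compile-time placeholder and recompile against
            # the provided target tensor instead.
            self._cached_targets = self.targets
            self.targets = [y]
        return self.targets

model = CompiledModel()
model.fit(y="target_tensor")   # targets now point at the passed tensor
```

Caching (rather than discarding) the original placeholder keeps the model usable afterwards with ordinary NumPy targets.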

NEEDED:

  • PR adding support for a target tensor in the call to fit for a normally compiled model. Involves a recompilation step.

Third step

The following API:

# Get data tensors
data_tensor, target_tensor = ...

# Build model on top of placeholders
inputs = Input(shape=(...))
outputs = Dense(...)(inputs)
model = Model(inputs, outputs)

# Compile as usual
model.compile(optimizer='sgd', loss='mse')

# Fit by passing the data tensor and target tensor
model.fit(data_tensor, target_tensor, epochs=10, steps_per_epoch=1000)

It’s not 100% clear at this point how we will handle it, but we will figure it out. Most likely this will involve building a new TF graph inside fit, running training with it, then transferring weight values back to the initial graph. I’ll handle it.
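The build-train-transfer idea above can be sketched abstractly (a dict stands in for a TF graph's weights; the helper names are hypothetical, since the actual mechanism is not settled):

```python
def fit_via_new_graph(weights, data, train_step):
    """Sketch of the third step: rebuild the model in a fresh graph
    (here, a copied dict), run training there, then transfer the
    trained weight values back to the original model."""
    new_graph_weights = dict(weights)      # rebuild in the new graph
    for batch in data:                     # run training in that graph
        new_graph_weights = train_step(new_graph_weights, batch)
    weights.update(new_graph_weights)      # transfer values back
    return weights

# Toy 'training': nudge a single weight halfway toward each batch value.
w = {"w": 0.0}
fit_via_new_graph(
    w,
    data=[1.0, 1.0, 1.0],
    train_step=lambda ws, b: {"w": ws["w"] + 0.5 * (b - ws["w"])},
)
```

The key property is that the caller's model object ends up with the trained weights even though training happened in a separate graph.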

CC: @athundt @Dref360 @colinskow @TimZaman

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 7
  • Comments: 22 (10 by maintainers)

Top GitHub Comments

4 reactions
fchollet commented, Sep 20, 2017

For distributed training you should be using the TensorFlow estimator API. We are about to release an integration between the estimator API and Keras models. It will be in TF 1.4.

On 20 September 2017 at 06:53, PBehr notifications@github.com wrote:

Update 2 and 3 will lead to issues with distributed training. Tensorflow distributed finalizes the graph, so we get an error if we try to recompile the model. See #3997 https://github.com/fchollet/keras/issues/3997 for reference


2 reactions
fchollet commented, Nov 20, 2017

Yes, that’s still in the pipeline, as well as the ability to call fit/evaluate/predict directly on data tensors for a model built on top of placeholders. You’ll probably have it by TF 1.6.

On 20 November 2017 at 08:43, N-McA notifications@github.com wrote:

Maybe this is planned, but support for the automatic validation features (running a test on the validation set after each epoch, early stopping, learning rate adjustment based on val scores) that Keras allows would be great through this API as well. That in the pipeline?

