Embedding with TensorFlow very slow: converts indices to dense gradients

See original GitHub issue

I’ve noticed that the Embedding layer with the TensorFlow backend converts sparse gradient updates to dense ones, which kills performance and consumes a large amount of memory. This makes it unusable for large-scale problems with a large Embedding layer.

Here are two scripts that build a model with a single large embedding layer, first with Keras and then with TensorFlow directly. In Keras, training takes about 2.3 seconds per batch and uses more than 9 GB of memory; in TensorFlow it takes only 20 ms per batch (roughly 100x faster) and uses less than 4 GB.

This is using TensorFlow 0.11.0rc2 and the master branch of Keras.

import numpy as np

from keras.layers import Embedding, Input
from keras.models import Model

# a model with just one large Embedding layer (vocabulary 793471, dimension 512)
token_ids = Input(batch_shape=(128, 20), dtype='int32', name='token_ids')
token_embedding = Embedding(793471, 512, mask_zero=False,
                            input_length=20)(token_ids)

model = Model(input=[token_ids], output=token_embedding)
model.compile(loss='mse', optimizer='sgd')

X = np.random.randint(0, 793471, (128, 20)).astype(np.int32)
y = np.random.rand(128, 20, 512)

# warm-up: the first train_on_batch builds the backend training function
model.train_on_batch(X, y)

# now time
%timeit model.train_on_batch(X, y)

Outputs:

Using TensorFlow backend.
/Users/matthewp/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gradients.py:87: UserWarning: Converting sparse IndexedSlices to a dense Tensor with 406257152 elements. This may consume a large amount of memory.
  "This may consume a large amount of memory." % num_elements)
1 loop, best of 3: 2.32 s per loop

With TensorFlow:

import numpy as np
import tensorflow as tf

token_ids = tf.placeholder(tf.int32, [128, 20])
W = tf.Variable(tf.zeros([793471, 512]))
token_embedding = tf.gather(W, token_ids)
y_ = tf.placeholder(tf.float32, [128, 20, 512])
loss = tf.reduce_mean((token_embedding - y_) ** 2)

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

init = tf.initialize_all_variables()

X = np.random.randint(0, 793471, (128, 20)).astype(np.int32)
y = np.random.rand(128, 20, 512)

with tf.Session() as sess:
    sess.run(init)
    sess.run(train_step, feed_dict={token_ids: X, y_: y})
    %timeit sess.run(train_step, feed_dict={token_ids: X, y_: y})

Outputs:

10 loops, best of 3: 20.8 ms per loop
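
As a side note, the number in the Keras warning makes the problem concrete: 406,257,152 elements is exactly 793471 × 512, i.e. densifying the gradient allocates a tensor the size of the entire embedding table on every batch. Here is a minimal sketch of where the sparse representation comes from (using the same TF-0.x-era API as above; the variable names and printed shapes are illustrative, based on the shapes in this issue):

import tensorflow as tf

# same shapes as the issue: a 793471 x 512 embedding table and a 128 x 20 batch
W = tf.Variable(tf.zeros([793471, 512]))
token_ids = tf.placeholder(tf.int32, [128, 20])
loss = tf.reduce_mean(tf.gather(W, token_ids) ** 2)

# the gradient of a gather w.r.t. W is an IndexedSlices, not a dense tensor:
# only the 128 * 20 = 2560 looked-up rows carry values
grad_w, = tf.gradients(loss, [W])
print(isinstance(grad_w, tf.IndexedSlices))  # True
print(grad_w.values)       # shape (2560, 512): one row per looked-up index
print(grad_w.dense_shape)  # holds [793471, 512]: what densifying would allocate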

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

9 reactions
matt-peters commented on Nov 14, 2016

After some debugging, it turns out this is due to the Keras optimizers and the way in which they compute gradient updates. The sparse gradient updates produced by the embedding layer need to be handled differently than the dense gradient updates. TensorFlow optimizers provide a mechanism to do this (the _apply_sparse and _apply_dense methods). The problem goes away by using a TensorFlow optimizer, e.g.:

import tensorflow as tf
from keras.optimizers import TFOptimizer  # wrapper around a native TF optimizer

model.compile(loss='mse',
              optimizer=TFOptimizer(tf.train.GradientDescentOptimizer(0.1)))

Outputs:

100 loops, best of 3: 17.7 ms per loop
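
Conceptually, the difference between the two paths can be sketched as follows (this is an illustration of the idea, not the actual optimizer internals; W, token_ids, loss and lr are just stand-ins for the quantities above):

import tensorflow as tf

# same setup as the reproduction: a large embedding table and a lookup
W = tf.Variable(tf.zeros([793471, 512]))
token_ids = tf.placeholder(tf.int32, [128, 20])
loss = tf.reduce_mean(tf.gather(W, token_ids) ** 2)
grad, = tf.gradients(loss, [W])  # an IndexedSlices
lr = 0.01

# dense path (what densifying the gradient amounts to):
# rewrite all 793471 rows of W on every step
dense_update = W.assign_sub(lr * tf.convert_to_tensor(grad))

# sparse path (roughly what a TF optimizer's _apply_sparse does):
# subtract updates only at the rows actually seen in the batch
sparse_update = tf.scatter_sub(W, grad.indices, lr * grad.values)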

0 reactions
stale[bot] commented on Jul 2, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

