Embedding with TensorFlow very slow: converts indices to dense gradients
I've noticed that the Embedding layer with the TensorFlow backend converts sparse gradient updates to dense ones, which kills performance and consumes a lot of memory. This makes it unusable for large-scale problems with a large Embedding layer.
Here is a script that builds a model with a single large embedding layer, once using Keras and once using TensorFlow directly. In Keras, it takes about 2.3 seconds / batch and uses > 9 GB of memory while training. In TensorFlow it takes only about 20 ms / batch (roughly 100x faster) and uses < 4 GB of memory.
This is using TensorFlow 0.11.0rc2 and the master branch of Keras.
import numpy as np
from keras.layers import Embedding, Input
from keras.models import Model
# a model with just one Embedding layer
token_ids = Input(batch_shape=(128, 20),
                  dtype='int32', name='token_ids')
token_embedding = Embedding(793471,
                            512, mask_zero=False, input_length=20)(token_ids)
model = Model(input=[token_ids], output=token_embedding)
model.compile(loss='mse', optimizer='sgd')
X = np.random.randint(0, 793471, (128, 20)).astype(np.int32)
y = np.random.rand(128, 20, 512)
# first call builds the training function (warm-up)
model.train_on_batch(X, y)
# now time
%timeit model.train_on_batch(X, y)
Outputs:
Using TensorFlow backend.
/Users/matthewp/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gradients.py:87: UserWarning: Converting sparse IndexedSlices to a dense Tensor with 406257152 elements. This may consume a large amount of memory.
"This may consume a large amount of memory." % num_elements)
1 loop, best of 3: 2.32 s per loop
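The element count in the warning matches the full embedding matrix, which a quick back-of-the-envelope check makes concrete. The sketch below only redoes the arithmetic from the script's shapes; the 4-byte element size assumes float32 weights (an assumption, not something stated in the warning).

```python
# Shapes taken from the script above.
vocab_size, embed_dim = 793471, 512
batch_size, seq_len = 128, 20
bytes_per_element = 4  # assumption: float32 weights

# A dense gradient has one entry per weight in the embedding matrix.
dense_elements = vocab_size * embed_dim
dense_gib = dense_elements * bytes_per_element / 2**30

# A sparse gradient only needs the rows looked up in this batch
# (at most batch_size * seq_len rows, fewer if token ids repeat).
sparse_elements = batch_size * seq_len * embed_dim
sparse_mib = sparse_elements * bytes_per_element / 2**20

print("dense gradient elements: ", dense_elements)
print("dense gradient size:      %.2f GiB" % dense_gib)
print("sparse gradient elements:", sparse_elements)
print("sparse gradient size:     %.2f MiB" % sparse_mib)
print("dense / sparse ratio:     %.0fx" % (dense_elements / sparse_elements))
```

So every dense gradient buffer is roughly 1.5 GiB under the float32 assumption, while the rows actually touched in a batch amount to about 5 MiB.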
With TensorFlow:
import numpy as np
import tensorflow as tf
token_ids = tf.placeholder(tf.int32, [128, 20])
W = tf.Variable(tf.zeros([793471, 512]))
token_embedding = tf.gather(W, token_ids)
y_ = tf.placeholder(tf.float32, [128, 20, 512])
loss = tf.reduce_mean((token_embedding - y_) ** 2)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
init = tf.initialize_all_variables()
X = np.random.randint(0, 793471, (128, 20)).astype(np.int32)
y = np.random.rand(128, 20, 512)
with tf.Session() as sess:
    sess.run(init)
    sess.run(train_step, feed_dict={token_ids: X, y_: y})
    %timeit sess.run(train_step, feed_dict={token_ids: X, y_: y})
Outputs:
10 loops, best of 3: 20.8 ms per loop
Issue Analytics
- Created 7 years ago
- Comments:6 (2 by maintainers)
After some debugging, it turns out this is due to the Keras optimizers and the way in which they compute gradient updates. The sparse gradient updates produced by the embedding layer need to be handled differently than the dense gradient updates. TensorFlow optimizers provide a mechanism to do this (the methods _apply_sparse and _apply_dense). The problem goes away when using a TensorFlow optimizer.
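To illustrate why the two kinds of gradient need different handling, here is a minimal NumPy sketch of a plain SGD step on an embedding matrix, done once densely and once sparsely. It is not the Keras or TensorFlow implementation; the function names, toy sizes, and learning rate are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, lr = 1000, 8, 0.1              # toy sizes, not the issue's
W = rng.standard_normal((vocab_size, dim))      # embedding matrix
ids = rng.integers(0, vocab_size, size=(4, 5))  # looked-up token ids
grad_rows = rng.standard_normal((4, 5, dim))    # d(loss)/d(looked-up rows)

def dense_update(W, ids, grad_rows, lr):
    """Scatter row gradients into a full-size buffer, then update every row."""
    dense_grad = np.zeros_like(W)               # O(vocab_size * dim) memory
    np.add.at(dense_grad, ids.ravel(), grad_rows.reshape(-1, W.shape[1]))
    return W - lr * dense_grad                  # touches every row

def sparse_update(W, ids, grad_rows, lr):
    """Update only the rows that were looked up; repeated ids accumulate."""
    W_new = W.copy()                            # copy only so results can be compared
    np.subtract.at(W_new, ids.ravel(), lr * grad_rows.reshape(-1, W.shape[1]))
    return W_new

W_dense = dense_update(W, ids, grad_rows, lr)
W_sparse = sparse_update(W, ids, grad_rows, lr)

print("same result:", np.allclose(W_dense, W_sparse))
print("rows changed:", int(np.any(W_sparse != W, axis=1).sum()), "of", vocab_size)
```

Both paths give the same weights, but the dense path allocates and sweeps a buffer the size of the whole matrix on every step, while the sparse path's work scales with the number of looked-up rows.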
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.