Why is Keras with the TensorFlow backend much slower than native TensorFlow?

Hi, I compared training speed between Keras (Theano and TensorFlow backends) and native TensorFlow. I ran the same toy model in all three setups and found that native TensorFlow is much faster than Keras during training. The model is simply an embedding layer followed by two dense layers. With TensorFlow as the Keras backend, I also compared TFOptimizer against the Keras optimizer to rule out the embedding layer's influence, as mentioned in #4365.

All experiments ran on a single NVIDIA K40 GPU with:
keras 2.0.8
theano 0.9.0
tensorflow 1.2.0

Here are the results (unit: samples/sec):

keras-theano: 160
keras-tf-keras_opt: 246
keras-tf-tf_opt: 640
tensorflow: 1625

Native TensorFlow is about 2.5x faster than Keras with the TensorFlow backend and TFOptimizer.
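The ratios implied by these figures can be checked with a few lines of Python. This is only arithmetic on the numbers reported above; the names `throughput` and `speedups` are made up for illustration.

```python
# Throughput figures reported in the issue, in samples/sec.
throughput = {
    "keras-theano": 160,
    "keras-tf-keras_opt": 246,
    "keras-tf-tf_opt": 640,
    "tensorflow": 1625,
}

# How many times faster native TensorFlow is than each setup.
baseline = throughput["tensorflow"]
speedups = {name: baseline / float(rate) for name, rate in throughput.items()}

for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1]):
    print("native TensorFlow vs %s: %.2fx" % (name, ratio))
```

By the same arithmetic, the gap to Keras with its own SGD optimizer is roughly 6.6x, so switching to TFOptimizer closes only part of the difference.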

The scripts:

Keras:

from keras.models import Sequential, Model
import numpy as np
from keras.layers import Dense,Activation, Input
from keras.layers.embeddings import Embedding
#import tensorflow as tf
from keras.optimizers import TFOptimizer
import os

os.environ['CUDA_VISIBLE_DEVICES']='1'

#model=Sequential()
inputs=Input(shape=(50,))
embedding_vec=Embedding(700000,512)(inputs)
d1=Dense(256, activation='sigmoid')(embedding_vec)

d2=Dense(10000, activation='softmax')(d1)

model=Model(inputs=inputs,outputs=d2)
#model.compile(loss='sparse_categorical_crossentropy', optimizer=TFOptimizer(tf.train.GradientDescentOptimizer(0.1)))

model.compile(loss='sparse_categorical_crossentropy',optimizer='sgd')

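# randint's upper bound is exclusive, so this draws integers in [0, 9999],
# matching the deprecated random_integers(0, 9999, ...)
x_train=np.random.randint(0,10000,size=(3200,50))
y_train=np.random.randint(0,10000,size=(3200,50,1))
model.summary()  # prints the summary itself and returns None
model.fit(x_train, y_train, epochs=20, batch_size=50)  # `epochs` is the Keras 2 name for `nb_epoch`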

TensorFlow:

import tensorflow as tf
import numpy as np
from tqdm import tqdm
import os
os.environ['CUDA_VISIBLE_DEVICES']='3'
inputs=tf.placeholder(shape=(None, 50),dtype=tf.int32)
outputs=tf.placeholder(shape=(None,50),dtype=tf.int32)
#embedding_vec=EmbeddingLayer(70000,128)(inputs)
embedding = tf.get_variable(name = 'embedding', shape=(700000, 512))
embedding_vec = tf.gather(embedding, inputs)

#d1=Dense(128,256)(embedding_vec,scope='dense1')
W1 = tf.get_variable(name='W1',shape=(512,256),dtype=tf.float32)
b1 = tf.get_variable(name='b1',shape=(256,),dtype=tf.float32)
d1 = tf.matmul(tf.reshape(embedding_vec,shape=(-1,512)),W1) + b1
d1 = tf.reshape(d1,shape=(-1,50,256))
d1=tf.sigmoid(d1)

#d2=Dense(256,10000)(d1,scope='predict')
W2 = tf.get_variable(name='W2',shape=(256,10000),dtype=tf.float32)
b2 = tf.get_variable(name='b2',shape=(10000,),dtype=tf.float32)
d2 = tf.matmul(tf.reshape(d1,shape=(-1,256)),W2) + b2
d2 = tf.reshape(d2,shape=(-1,50,10000))


loss=tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=outputs,logits=d2))
opt=tf.train.GradientDescentOptimizer(0.1)
update=opt.minimize(loss)

config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
config.gpu_options.allow_growth=True

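# randint's upper bound is exclusive, so this draws integers in [0, 9999]
x=np.random.randint(0,10000,size=(3200,50))
y=np.random.randint(0,10000,size=(3200,50))
batch_size=50
n_samples=(len(x)//batch_size)*batch_size  # drop any incomplete final batch
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(20):
        print('epoch: %d' % epoch)
        # separate name for the batch offset so it does not shadow the epoch counter
        for start in tqdm(range(0,n_samples,batch_size)):
            x_batch=x[start:start+batch_size]
            y_batch=y[start:start+batch_size]
            _,loss_val=sess.run((update,loss),feed_dict={inputs:x_batch,outputs:y_batch})
            #print(loss_val)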
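The reshape, matmul, reshape pattern in the TensorFlow script is meant to mirror what a Dense layer does when given a 3-D input, namely applying the same weights along the last axis. The NumPy sketch below checks that the two formulations agree. It uses small random arrays rather than the real model, and all names in it are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
batch, steps, in_dim, out_dim = 4, 50, 512, 256

x = rng.randn(batch, steps, in_dim)
W = rng.randn(in_dim, out_dim)
b = rng.randn(out_dim)

# Pattern from the TensorFlow script: flatten to 2-D, matmul, restore 3-D shape.
flat = x.reshape(-1, in_dim).dot(W) + b
via_reshape = flat.reshape(-1, steps, out_dim)

# Same weights contracted against the last axis of the 3-D tensor.
via_last_axis = np.tensordot(x, W, axes=([2], [0])) + b

print(via_reshape.shape)
print(np.allclose(via_reshape, via_last_axis))
```

If the two results match, the layer arithmetic itself is the same in both scripts, which points to the loss and optimizer paths as the more likely source of the speed difference.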

Many thanks!!

Issue Analytics

State:
Created 6 years ago
Reactions: 8
Comments: 9 (3 by maintainers)

Top GitHub Comments

6 reactions

fchollet commented, Sep 5, 2017

In general, you can expect tf.nn.sparse_softmax_cross_entropy_with_logits to be much better optimized, because it processes logits directly instead of applying cross-entropy to a probability distribution. Good news: it's trivial to use in Keras when you need it.
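To illustrate the difference between the two formulations, here is a NumPy sketch. It is not the TensorFlow or Keras implementation, and the function names are hypothetical. It contrasts computing the loss directly from logits (via log-sum-exp) with computing it from an already-normalized softmax output, which needs an extra pass and clipping to avoid log(0).

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def xent_from_logits(logits, labels):
    """Sparse cross-entropy computed directly from logits via log-sum-exp."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def xent_from_probs(probs, labels, eps=1e-7):
    """Sparse cross-entropy from softmax output; clipping guards against log(0)."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, eps, 1.0))

logits = np.array([[2.0, 1.0, 0.1], [0.0, 50.0, -50.0]])
labels = np.array([0, 2])

direct = xent_from_logits(logits, labels)
two_step = xent_from_probs(softmax(logits), labels)
print(direct)
print(two_step)
```

For the first sample both paths agree. For the second, the logits path recovers a loss of 100, while the clipped-probability path caps it near 16.1, which is -log(1e-7). The two-step path also does more work per sample, since the softmax is materialized before the loss is taken.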

6 reactions

fchollet commented, Sep 5, 2017

Your model features a softmax over 10,000 classes, which is very expensive. Try an apples-to-apples comparison:

import tensorflow as tf
from keras.layers import Input, Embedding, Dense
from keras.models import Model

inputs = Input(shape=(50,))
embedding_vec = Embedding(700000, 512)(inputs)
d1 = Dense(256, activation='sigmoid')(embedding_vec)
# No softmax activation here: the layer outputs raw logits for the loss below.
d2 = Dense(10000)(d1)

model = Model(inputs=inputs, outputs=d2)

loss_fn = lambda y_true, y_pred: tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)
opt = tf.train.GradientDescentOptimizer(0.1)
model.compile(loss=loss_fn, optimizer=opt)
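For a rough sense of why the 10,000-way output layer dominates the cost, here is some back-of-the-envelope arithmetic using the shapes from the scripts in the question (batch size 50, sequence length 50, hidden size 256, 10,000 classes). The variable names are for illustration only, and the memory figure assumes float32 values.

```python
batch_size, seq_len, hidden, n_classes = 50, 50, 256, 10000

# One logit per (sample, timestep, class).
logits_per_batch = batch_size * seq_len * n_classes

# Weights plus biases of the final Dense layer.
output_layer_params = hidden * n_classes + n_classes

# Size of one (batch, seq_len, n_classes) tensor, assuming 4-byte floats.
megabytes_per_tensor = logits_per_batch * 4 / 1e6

print(logits_per_batch)       # 25000000
print(output_layer_params)    # 2570000
print(megabytes_per_tensor)   # 100.0
```

Every extra pass over a tensor of this size, such as a separate softmax followed by a log, touches about 100 MB per batch, so how the loss is formulated can matter a great deal at this output width.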
