TensorFlowSharp runs 5 times slower than in Python.
I have a tensorflow.keras model trained on Python 3.6, which I then converted to a protobuf graph for use in Unity3D (a game engine, via the TensorFlowSharp .dll plugin). My TensorFlow version is 1.9.
Inside my Unity TensorFlowSharp code, I measure the execution time of runner.Run() using a Stopwatch: the elapsed time is around 25 ms, while in Python model.predict(X) takes only 5 ms.
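For reference, a sketch of how the measurement was done (graph loading, tensor names, and input preparation are hypothetical placeholders, not the exact code from my project):

```csharp
using System.Diagnostics;
using System.IO;
using TensorFlow;

// Hypothetical sketch of the timing; "input", "output", and the file name are placeholders.
var graph = new TFGraph();
graph.Import(File.ReadAllBytes("frozen_model.pb"));

using (var session = new TFSession(graph))
{
    var runner = session.GetRunner();
    runner.AddInput(graph["input"][0], inputTensor);  // inputTensor prepared elsewhere
    runner.Fetch(graph["output"][0]);

    var sw = Stopwatch.StartNew();
    TFTensor[] output = runner.Run();  // ~25 ms here vs ~5 ms for model.predict(X) in Python
    sw.Stop();
    UnityEngine.Debug.Log($"Run() took {sw.ElapsedMilliseconds} ms");
}
```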
That's a huge difference, and both run on CPU. (I set CUDA_VISIBLE_DEVICES=-1 before importing tensorflow.keras, so the GPU was not used on the Python side.)
So how do I optimize the speed of TensorFlowSharp, or is there a different way to make prediction faster? Any suggestions are welcome: I have a model and I want to run inference on it quickly in Unity. I also tried serving the Keras model via a REST API and calling it over HTTP, but the I/O overhead there is too slow as well.
Also, is TensorFlowSharp expected to be slower than Python even when both run on CPU?
Issue Analytics
- Created 5 years ago
- Comments: 12
My issue is resolved!

It really was the overhead of a new TFSession: the first call to runner.Run() takes a lot of initialization time. If you are predicting multiple frames, the subsequent runs take significantly less time, comparable to Python prediction times. So if you are predicting multiple frames, do not re-create the session. (I remember this being an issue with the original Python object detection sample as well.) The example code is also correct in this respect: it loops through the files inside the using statement for the TFSession. I was only sampling one frame when I reported the issue. I can live with the initial ~7 s overhead for session initialization, since subsequent predictions work just fine.
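The fix can be sketched as follows (graph loading, tensor names, and the FrameToTensor helper are hypothetical placeholders; the point is that the TFSession is created once and reused across frames):

```csharp
using System.IO;
using TensorFlow;

// Hypothetical sketch: create the session ONCE and reuse it for every frame.
// The first Run() pays the one-time initialization cost (~7 s in my case);
// subsequent calls are comparable to Python's model.predict() times.
var graph = new TFGraph();
graph.Import(File.ReadAllBytes("frozen_model.pb"));

using (var session = new TFSession(graph))
{
    foreach (var frame in frames)          // "frames" is a placeholder collection
    {
        var runner = session.GetRunner();  // runners are cheap; sessions are not
        runner.AddInput(graph["input"][0], FrameToTensor(frame));  // hypothetical helper
        runner.Fetch(graph["output"][0]);
        TFTensor[] output = runner.Run();  // fast after the first call
        // ... consume output ...
    }
}
```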
Thanks!
I have the same observation (much worse in my case) using many models from the COCO model zoo below with TensorFlowSharp:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
For example, ssd_mobilenet_v1_coco_11_06_2017 took close to 7 seconds for just this one line: output = runner.Run();
Most of these models, when tested in Python on the same machine (a Xeon 8-core @ 2.4 GHz, CPU only), showed prediction times close to what Google reports.
To be clear, I am not counting the download time etc. from the example, which I removed for my testing.
Any clues?
Thanks in Advance.