
Saved model behaves differently on different machines

See original GitHub issue

After studying #439, #2228, #2743, #6737 and the new FAQ about reproducibility, I was able to get consistent, reproducible results on my development machines using Theano. If I run my code twice, I get the exact same results.
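
For reference, the reproducibility recipe in that FAQ boils down to fixing the random seeds before building the model. A minimal sketch (the seed value is an arbitrary example, not from the issue):

```python
import random
import numpy as np

np.random.seed(42)   # NumPy RNG, used by Keras for weight initialization
random.seed(42)      # Python's built-in RNG

# ...build, train, and evaluate the model as usual. With the seeds fixed
# (and Theano configured deterministically), two runs on the *same*
# machine produce identical results.
```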

The problem is that the results are reproducible only on the same machine. In other words, if I

  • Train a model on machine A
  • Evaluate the model using predict
  • Save the model (using save_model, or model_to_json and save_weights)
  • Transfer the model to machine B and load it
  • Evaluate the model again on machine B using predict

The results of the two predict calls are different. Using CPU or GPU makes no difference: after I copy the model file(s) from one machine to another, the performance of predict changes dramatically (see the sketch below).
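
Concretely, the steps look roughly like this (a sketch, not code from the issue: `model`, `x_test`, and the file names are placeholders, and the two halves run on different machines):

```python
import numpy as np
from keras.models import load_model

# --- machine A, after training ---
model.save('model.h5')   # or: model.to_json() plus model.save_weights()
np.save('preds_a.npy', model.predict(x_test))

# --- machine B, after copying model.h5, preds_a.npy and the test data ---
model_b = load_model('model.h5')
preds_b = model_b.predict(x_test)
print(np.allclose(np.load('preds_a.npy'), preds_b))
# One would expect True here; the issue reports that the outputs differ.
```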

The only difference on the two machines is the hardware (I use my laptop’s 980M and a workstation with a Titan X Pascal) and the NVIDIA driver version, which is slightly older on the workstation. Both computers run Ubuntu 16.04 LTS and Cuda 8 with cuDNN. All libraries are on the same version on both machines, and the Python version is the same as well (3.6.1).

Is this behavior intended? I expect that running a pre-trained model with the same architecture and weights on two different machines yields the same results, but this doesn't seem to be the case.

On a side note, a suggestion: the FAQ about reproducibility should state explicitly that the development version of Theano is needed to get reproducible results.
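
At the time, that meant installing Theano directly from its Git repository rather than from the PyPI release, e.g. with `pip install --upgrade git+https://github.com/Theano/Theano.git`.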

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 12
  • Comments: 29

Top GitHub Comments

7 reactions
wangchenou commented, Oct 24, 2017

@basaldella Have you fixed this issue? It seems that I have the same problem. I retrained a model by fine-tuning InceptionV3 on my own images on a GPU machine. After training, the accuracy reached 91%, which I am happy with. During training the improved model was saved with callbacks, so I can load the best retrained model with `model.load_model(model_path)`. I tested it with one image, and the prediction results are always the same and correct (because I know which class this image belongs to). The result looks like this: `[[ 0.00197385 0.01141251 0.02262068 0.9121536 0.00810914 0.01657074 0.00370198 0.00617629 0.00972648 0.00531203 0.00224261]]`

Now, when I copy the retrained model (an HDF5 file) to my laptop, load it again, and test it with the same image, I get a totally different result: `[[ 0.00373867 0.22160383 0.10066977 0.35440436 0.02839879 0.17799987 0.01744748 0.02645957 0.0299265 0.03026218 0.00908909]]`

The Python environment is the same on both machines, with Keras 2.0.8. The results are always the same within the same machine, and the weights are the same after I load the model file. …I checked many things.
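
For what it's worth, one way to check that the weight arrays really are identical on both machines is to dump them to a portable file on one machine and compare them elementwise on the other (a sketch; `my_model.h5` and the file names are placeholders):

```python
import numpy as np
from keras.models import load_model

model = load_model('my_model.h5')
weights = model.get_weights()        # list of NumPy arrays

# On machine A: dump the arrays (stored as arr_0, arr_1, ...)
np.savez('weights_a.npz', *weights)

# On machine B: load the dump copied from machine A and compare
ref = np.load('weights_a.npz')
for i, w in enumerate(weights):
    assert np.array_equal(ref['arr_%d' % i], w), 'weight array %d differs' % i
print('all weight arrays match exactly')
```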

Why are the results different on the two machines? Does anybody know about this?

3 reactions
rsmith49 commented, Sep 1, 2017

@basaldella Yes, turns out my issue was more along the lines of #4875, and was inconsistent between different Python sessions, not just different machines.

Read more comments on GitHub

Top Results From Across the Web

  • Why Loading a Previously Saved Keras Model Gives Different ...: This is due to the fact that neural networks in Keras are using randomness when initializing their weights, so on every run weights...
  • Model behaves differently after saving and loading: As you can see by running the code, the losses computed by the two loaded models are different, while the two loaded models...
  • Why Do I Get Different Results Each Time in Machine Learning?: Stochastic machine learning algorithms use randomness during learning, ensuring a different model is trained each run.
  • Why am I getting different results on a prediction using the ...: It's like my model is behaving differently if it processes the whole test set in a single run than if it processes a...
  • torch.package — PyTorch 1.13 documentation: These packages can be saved, shared, used to load and execute models at a later date or on a different machine, and can...
