ImageRecordReader crashes JVM with loaded Keras model in 1.0.0-beta7
Issue Description
I encountered a strange problem in 1.0.0-beta7 while trying to run a Keras model loaded from a .h5 file (e.g., VGG16.h5 from here); this model previously ran fine in 1.0.0-beta6.
Calling computationGraph.feedForward(features, false) crashes the JVM (see the attached error log) when using this code snippet:
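```java
import java.io.File;
import java.util.Arrays;
import org.datavec.api.split.FileSplit;
import org.datavec.image.recordreader.ImageRecordReader;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// Snippet body: run inside a method declared with "throws Exception"
// Create VGG16 from a Keras .h5 file
ComputationGraph tmpModel = KerasModelImport.importKerasModelAndWeights("VGG16.h5");
tmpModel.init();
ImageRecordReader reader = new ImageRecordReader(224, 224, 3);
reader.initialize(new FileSplit(new File("img_125_5.jpg"))); // Test with a single image
DataSetIterator it = new RecordReaderDataSetIterator(reader, 1);
// The Keras model expects channels-last (NHWC) input, so switch the order at the reader level
reader.setNchw_channels_first(false);
INDArray features = it.next().getFeatures();
// INDArray features = Nd4j.rand(1, 224, 224, 3); // Runs fine when initializing from a random array of the same size
System.out.println(Arrays.toString(features.shape())); // prints [1, 224, 224, 3]
tmpModel.feedForward(features, false);
```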
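For context on the reader flag: setNchw_channels_first(false) switches the reader's output from DL4J's default channels-first layout, [minibatch, channels, height, width], to channels-last, [minibatch, height, width, channels], which matches Keras's default channels_last image format and the [1, 224, 224, 3] shape printed above. To make that reordering concrete, the sketch below performs the same index shuffle on a plain float array. ChannelLayout and nchwToNhwc are hypothetical names, and the code is not taken from the ImageRecordReader source; it only illustrates the layout change.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration (not DL4J code): reorders a dense, row-major
 * image batch from channels-first [n, c, h, w] to channels-last [n, h, w, c].
 */
final class ChannelLayout {

    private ChannelLayout() {}

    static float[] nchwToNhwc(float[] src, int n, int c, int h, int w) {
        if (src.length != n * c * h * w) {
            throw new IllegalArgumentException("length does not match n*c*h*w");
        }
        float[] dst = new float[src.length];
        for (int b = 0; b < n; b++) {
            for (int ch = 0; ch < c; ch++) {
                for (int y = 0; y < h; y++) {
                    for (int x = 0; x < w; x++) {
                        // Row-major offset of element (b, ch, y, x) in NCHW order
                        int from = ((b * c + ch) * h + y) * w + x;
                        // Row-major offset of the same element in NHWC order
                        int to = ((b * h + y) * w + x) * c + ch;
                        dst[to] = src[from];
                    }
                }
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // One 2x2 image with 3 channels; value = channel * 10 + pixel index
        float[] nchw = {
            0, 1, 2, 3,      // channel 0
            10, 11, 12, 13,  // channel 1
            20, 21, 22, 23   // channel 2
        };
        float[] nhwc = nchwToNhwc(nchw, 1, 3, 2, 2);
        // prints [0.0, 10.0, 20.0, 1.0, 11.0, 21.0, 2.0, 12.0, 22.0, 3.0, 13.0, 23.0]
        System.out.println(Arrays.toString(nhwc));
    }
}
```

In the channels-last result, the three channel values of each pixel sit next to each other, which is the ordering the Keras VGG16 model expects.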
The crash happens specifically inside the ComputationGraph class, at line 1976 (I found this by stepping through the code in IntelliJ).
Strangely, though, the snippet above runs fine with a random INDArray of the same shape (the commented-out Nd4j.rand line), so the issue isn't caused by the shape of features. Inspecting the values of features returned by the DataSetIterator shows no NaNs or other unusual values (all are between 0 and 1).
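The value check described above can be reproduced with a plain loop. The sketch below assumes the features have already been copied into a float[] (for example with ND4J's features.dup().data().asFloat()); FeatureCheck and firstInvalid are hypothetical names. Note that a passing check rules out bad values only; it says nothing about how the array is laid out in memory.

```java
/**
 * Minimal sketch of a feature sanity check: every value must be a real
 * number (no NaN or Infinity) inside the range [min, max].
 */
final class FeatureCheck {

    private FeatureCheck() {}

    /** Returns the index of the first invalid value, or -1 if all values pass. */
    static int firstInvalid(float[] values, float min, float max) {
        for (int i = 0; i < values.length; i++) {
            float v = values[i];
            // NaN fails every comparison, so it must be tested explicitly
            if (Float.isNaN(v) || Float.isInfinite(v) || v < min || v > max) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float[] ok = {0f, 0.25f, 1f};
        float[] bad = {0.5f, Float.NaN, 0.1f};
        System.out.println(firstInvalid(ok, 0f, 1f));  // prints -1
        System.out.println(firstInvalid(bad, 0f, 1f)); // prints 1
    }
}
```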
Also worth noting: if the .h5 model is imported in beta6 and saved to a zip with model.save(new File("VGG.zip")), then loaded in beta7, the snippet above works fine (replacing the KerasModelImport... line with ComputationGraph.load(new File("beta6KerasVGG.zip"), true)).
Also, the snippet works fine with a different model (e.g., ResNet50.h5), so the problem does not occur with every Keras model.
Conclusion
On one hand, it seems like the problem is caused by updates to the KerasModelImport process: a .h5 file that loaded and ran fine in 1.0.0-beta6 no longer works in 1.0.0-beta7. Additionally, saving the beta6 model to a .zip file and loading it as a new ComputationGraph in beta7 circumvents the problem.
However, it also seems like the ImageRecordReader or DataSetIterator could be the culprit: when they are taken out of the equation (by using a random INDArray), no errors occur.
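One property that shape alone does not capture is memory layout. A freshly allocated array such as Nd4j.rand(1, 224, 224, 3) is dense, in ND4J's default row-major ('c') order, whereas an array of the same shape obtained by permuting a channels-first buffer is a strided view with different strides. Whether ImageRecordReader builds its channels-last output that way in beta7 is an assumption that has not been verified here; the plain-Java sketch below (StrideDemo, cOrderStrides and permute are hypothetical names) only shows how the two layouts differ for the shape in this report. In ND4J, the corresponding properties of features can be inspected with features.stride(), features.ordering() and features.isView().

```java
import java.util.Arrays;

/**
 * Hypothetical illustration of row-major strides: two arrays with the
 * same shape can differ in memory layout.
 */
final class StrideDemo {

    private StrideDemo() {}

    /** Strides (in elements) of a dense row-major ('c' order) array. */
    static long[] cOrderStrides(long[] shape) {
        long[] strides = new long[shape.length];
        long step = 1;
        for (int i = shape.length - 1; i >= 0; i--) {
            strides[i] = step;
            step *= shape[i];
        }
        return strides;
    }

    /** Reorders a shape or stride array: output dimension i is source dimension perm[i]. */
    static long[] permute(long[] values, int[] perm) {
        long[] out = new long[perm.length];
        for (int i = 0; i < perm.length; i++) {
            out[i] = values[perm[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] nchwShape = {1, 3, 224, 224};
        long[] nchwStrides = cOrderStrides(nchwShape);
        int[] toNhwc = {0, 2, 3, 1};

        // A permuted view reorders both the shape and the strides of its source
        long[] viewShape = permute(nchwShape, toNhwc);
        long[] viewStrides = permute(nchwStrides, toNhwc);
        // A freshly allocated dense array of the same shape
        long[] denseStrides = cOrderStrides(viewShape);

        System.out.println(Arrays.toString(viewShape));    // [1, 224, 224, 3]
        System.out.println(Arrays.toString(viewStrides));  // [150528, 224, 1, 50176]
        System.out.println(Arrays.toString(denseStrides)); // [150528, 672, 3, 1]
    }
}
```

If the strides reported for the reader's features differ from those of the random array, one untested experiment would be to pass features.dup('c') to feedForward, which copies the data into a new dense 'c'-order array.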
Attached files
Version Information
Please indicate relevant versions, including, if relevant:
- Deeplearning4j version - 1.0.0-beta7
- Platform information (OS, etc) - Ubuntu 18.04
- CUDA version, if used
- NVIDIA driver version, if in use
I’ve made a simple Gradle project to demonstrate this and help you reproduce it.
Instructions
1. Run the main() method in Main.java. The project initializes using beta6, so the main() method should complete successfully.
2. In build.gradle, change the nd4j and dl4j versions from 1.0.0-beta6 to 1.0.0-beta7. Let your IDE import these changes.
3. Run main() again. This should now cause the program to crash (a JVM crash on Ubuntu 18.04, log file attached, and a nondescript Gradle error on Windows 10).

In Main.java, I've also written in some different scenarios that I tried while debugging the issue; most notable is Scenario 3, which is the duplicating fix mentioned above.

Hopefully you can reproduce this on your machine; let me know if there's any other info you'd like 😃
Attached Files
hs_err_pid17974.log
Closing in favor of sam’s linked issue with more details: https://github.com/eclipse/deeplearning4j/issues/8785