New DL4J version breaks Keras model importing
Issue Description
I've run into a few issues using some Keras models in DL4J after updating to the most recent version (1.0.0-beta7).
These models are loaded from .h5 files that can be found here
First Issue: Keras models now load with incorrect channel ordering.
Loading a model (e.g., VGG16.h5) previously worked fine in 1.0.0-beta6, and the model could be used for both training and inference:
ComputationGraph kerasModel = KerasModelImport.importKerasModelAndWeights("VGG16.h5");
INDArray testVals = Nd4j.zeros(1, 3, 224, 224);
kerasModel.feedForward(testVals, false);
Now, when the exact same code is run with 1.0.0-beta7, the channel ordering is wrong and the forward pass fails:
Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidInputException: Cannot do forward pass in Convolution layer (layer name = block1_conv1, layer index = 1): input array channels does not match CNN layer configuration (data format = NHWC, data input channels = 224, [minibatch, height, width, channels]=[1, 3, 224, 224]; expected input channels = 3) (layer name: block1_conv1, layer index: 1, layer type: ConvolutionLayer)
Note: Convolution layers can be configured for either NCHW (channels first) or NHWC (channels last) format for input images and activations.
Layers can be configured using .dataFormat(CNN2DFormat.NCHW/NHWC) when constructing the layer, or for the entire net using .setInputType(InputType.convolutional(height, width, depth, CNN2DForman.NCHW/NHWC)).
ImageRecordReader and NativeImageLoader can also be configured to load image data in either NCHW or NHWC format which must match the network
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.validateInputDepth(ConvolutionLayer.java:327)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.preOutput(ConvolutionLayer.java:357)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:489)
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
at org.deeplearning4j.nn.graph.ComputationGraph.ffToLayerActivationsDetached(ComputationGraph.java:1976)
at org.deeplearning4j.nn.graph.ComputationGraph.feedForward(ComputationGraph.java:1581)
at org.deeplearning4j.nn.graph.ComputationGraph.feedForward(ComputationGraph.java:1524)
I can work around this by changing the order (e.g., INDArray testVals = Nd4j.zeros(1, 224, 224, 3);), but that seems like a band-aid fix that shouldn't be necessary; the DL4J version of VGG still works fine with the original channel order.
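To see why the error reports 224 input channels, here is a small stand-alone sketch. The class ChannelCheck and its channels method are hypothetical and not part of DL4J; they only mimic the convention stated in the error message, where NCHW is [minibatch, channels, height, width] and NHWC is [minibatch, height, width, channels]. Under NHWC the last dimension is read as the channel count, so a [1, 3, 224, 224] array appears to have 224 channels.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not a DL4J API): mimics how a CNN layer reads the
 * channel count from a rank-4 input shape, depending on the data format.
 */
public class ChannelCheck {

    enum Format { NCHW, NHWC }

    // NCHW = [minibatch, channels, height, width]
    // NHWC = [minibatch, height, width, channels]
    static long channels(long[] shape, Format format) {
        if (shape.length != 4) {
            throw new IllegalArgumentException("Expected rank-4 shape, got " + Arrays.toString(shape));
        }
        return format == Format.NCHW ? shape[1] : shape[3];
    }

    public static void main(String[] args) {
        long[] original = {1, 3, 224, 224};  // shape created by Nd4j.zeros(1, 3, 224, 224)
        long[] reordered = {1, 224, 224, 3}; // shape used in the workaround

        // Read as NHWC, the original array seems to have 224 channels, as in the error.
        System.out.println(channels(original, Format.NHWC));  // 224
        // The reordered array has the 3 channels the first conv layer expects.
        System.out.println(channels(reordered, Format.NHWC)); // 3
    }
}
```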
The question is: has the default channel order changed when importing Keras models? If so, how can the previous default be restored? I couldn't find anything in the docs about how to set this (KerasModelBuilder doesn't have a .setInputType() method).
Second Issue: SIGSEGV when running some models
I can fix the above error by changing the channel order in the ImageRecordReader:
reader.setNchw_channels_first(false);
However, running the VGG model then crashes the JVM with a SIGSEGV (error log).
Unfortunately, the code snippet above doesn’t reproduce this issue.
For some reason, other Keras models loaded in the same way work fine (e.g., ResNet50.h5).
The log says the problematic frame is C 0x00007f1f180a20b3. Is this an issue in the underlying native code that causes these Keras models to crash?
Version Information
Please indicate relevant versions, including, if relevant:
- Deeplearning4j version - 1.0.0-beta7
- Platform information (OS, etc) - Ubuntu 18.04
- CUDA version, if used
- NVIDIA driver version, if in use
Since the model's expected input format is fixed in your case, you have to change the input data instead, i.e., create your image record reader with nchw_channels_first = false.
Or, if you can’t change that, you can permute the channels yourself, just like the image record reader would have done: https://github.com/eclipse/deeplearning4j/blob/master/datavec/datavec-data/datavec-data-image/src/main/java/org/datavec/image/recordreader/BaseImageRecordReader.java#L250
As this is literally the only difference between nchw_channels_first = false and nchw_channels_first = true, the crash you’ve seen shouldn’t be caused by this change.
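The permutation described above moves the channel axis from position 1 to the end, which in ND4J is written as permute(0, 2, 3, 1). The sketch below is a plain-Java illustration of what that reordering does to the underlying values; NchwToNhwc and toNhwc are hypothetical names, and it assumes a contiguous row-major float buffer rather than an ND4J INDArray.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not DL4J/ND4J code): copies a row-major NCHW buffer
 * into NHWC order, i.e. the reordering ND4J expresses as permute(0, 2, 3, 1).
 */
public class NchwToNhwc {

    static float[] toNhwc(float[] src, int n, int c, int h, int w) {
        if (src.length != n * c * h * w) {
            throw new IllegalArgumentException("Buffer size does not match n*c*h*w");
        }
        float[] dst = new float[src.length];
        for (int ni = 0; ni < n; ni++) {
            for (int ci = 0; ci < c; ci++) {
                for (int hi = 0; hi < h; hi++) {
                    for (int wi = 0; wi < w; wi++) {
                        int srcIdx = ((ni * c + ci) * h + hi) * w + wi; // [n][c][h][w]
                        int dstIdx = ((ni * h + hi) * w + wi) * c + ci; // [n][h][w][c]
                        dst[dstIdx] = src[srcIdx];
                    }
                }
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // One 1x2 image with 3 channel planes: R=[1,2], G=[3,4], B=[5,6]
        float[] nchw = {1, 2, 3, 4, 5, 6};
        float[] nhwc = toNhwc(nchw, 1, 3, 1, 2);
        // NHWC interleaves channels per pixel: (R,G,B) of pixel 0, then pixel 1
        System.out.println(Arrays.toString(nhwc)); // [1.0, 3.0, 5.0, 2.0, 4.0, 6.0]
    }
}
```

Only the position of each value changes; no value is altered, which is consistent with the point that the flag and a manual permute should produce the same input.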
I believe I've found my answer… NativeImageLoader doesn't support the flag for all of the asMatrix overloads. I was trying to load a BufferedImage from memory into an NDArray …
But reviewing the code of NativeImageLoader, I see that it calls ndarray.permute to convert the channel order, so I can achieve the old behavior with NativeImageLoader by manually calling permute after loading the array, as follows:
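Both NativeImageLoader's internal conversion and the manual workaround rely on permute. As I understand it, ND4J's permute returns a view that reorders the shape and strides without copying the data (dup() gives a contiguous copy if one is needed). The hypothetical PermutedView class below is not ND4J code and is not the snippet the comment refers to; it is a minimal sketch of that stride-reordering idea, assuming a row-major buffer.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration (not ND4J code) of a permuted view:
 * the buffer stays in NCHW order and only shape/strides are reordered.
 */
public class PermutedView {

    final float[] data;
    final int[] shape;
    final int[] strides;

    PermutedView(float[] data, int[] shape, int[] strides) {
        this.data = data;
        this.shape = shape;
        this.strides = strides;
    }

    // Row-major (C-order) view over a contiguous buffer.
    static PermutedView contiguous(float[] data, int... shape) {
        int[] strides = new int[shape.length];
        int size = 1;
        for (int i = shape.length - 1; i >= 0; i--) {
            strides[i] = size;
            size *= shape[i];
        }
        if (size != data.length) {
            throw new IllegalArgumentException("Shape does not match buffer size");
        }
        return new PermutedView(data, shape.clone(), strides);
    }

    // New dimension i is old dimension perm[i]; the buffer is shared, not copied.
    PermutedView permute(int... perm) {
        int[] newShape = new int[perm.length];
        int[] newStrides = new int[perm.length];
        for (int i = 0; i < perm.length; i++) {
            newShape[i] = shape[perm[i]];
            newStrides[i] = strides[perm[i]];
        }
        return new PermutedView(data, newShape, newStrides);
    }

    // Element lookup through the (possibly permuted) strides.
    float get(int... idx) {
        int offset = 0;
        for (int i = 0; i < idx.length; i++) {
            offset += idx[i] * strides[i];
        }
        return data[offset];
    }

    public static void main(String[] args) {
        float[] buf = {1, 2, 3, 4, 5, 6};                  // NCHW, shape [1, 3, 1, 2]
        PermutedView nchw = contiguous(buf, 1, 3, 1, 2);
        PermutedView nhwc = nchw.permute(0, 2, 3, 1);       // NHWC, shape [1, 1, 2, 3]
        System.out.println(Arrays.toString(nhwc.shape));    // [1, 1, 2, 3]
        System.out.println(nhwc.get(0, 0, 1, 2));           // 6.0 (pixel w=1, channel 2)
        System.out.println(nhwc.data == buf);               // true: no data was copied
    }
}
```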