Keras TF ImageNet Preprocessing contradicts TF, may affect pretrained weights
It appears Keras’ ImageNet image preprocessing may be inconsistent with how it is done in TF: in particular, Keras scales pixel values to [-1, 1], while in TF the expected range is [0, 1].
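To see the mismatch concretely, here is a minimal sketch (it assumes the `keras.applications.imagenet_utils` import path from the Keras 2.x era and a dummy all-white image):

```python
import numpy as np
from keras.applications.imagenet_utils import preprocess_input

# A dummy 1x1 all-white RGB image batch with pixel value 255.
x = np.full((1, 1, 1, 3), 255.0)

# Keras' 'tf' mode maps [0, 255] -> [-1, 1], so white becomes 1.0.
print(preprocess_input(x.copy(), mode='tf'))  # [[[[1. 1. 1.]]]]

# slim's inception preprocessing instead expects float inputs already
# scaled to [0, 1] (e.g. via tf.image.convert_image_dtype).
```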
- slim/preprocessing/inception_preprocessing.py
  - notes: "If dtype is tf.float32 then the range should be [0, 1]"
- keras preprocess_input()
  - notes: "tf: will scale pixels between -1 and 1, sample-wise"

Here is the key doc from Keras’ preprocess_input():
```python
def preprocess_input(x, data_format=None, mode='caffe'):
    """Preprocesses a tensor encoding a batch of images.

    # Arguments
        x: input Numpy or symbolic tensor, 3D or 4D.
        data_format: data format of the image tensor.
        mode: One of "caffe", "tf".
            - caffe: will convert the images from RGB to BGR,
                then will zero-center each color channel with
                respect to the ImageNet dataset,
                without scaling.
            - tf: will scale pixels between -1 and 1,
                sample-wise.

    # Returns
        Preprocessed tensor.
    """
```
Here are the key docs from the tf slim function preprocess_for_train(), which specify a [0, 1] input range:
```python
def preprocess_for_train(image, height, width, bbox,
                         fast_mode=True,
                         scope=None,
                         add_image_summaries=True):
  """Distort one image for training a network.

  Distorting images provides a useful technique for augmenting the data
  set during training in order to make the network invariant to aspects
  of the image that do not affect the label.

  Additionally it would create image_summaries to display the different
  transformations applied to the image.

  Args:
    image: 3-D Tensor of image. If dtype is tf.float32 then the range should
      be [0, 1], otherwise it would be converted to tf.float32 assuming that
      the range is [0, MAX], where MAX is the largest positive representable
      number for int(8/16/32) data type (see `tf.image.convert_image_dtype`
      for details).
    height: integer
    width: integer
    bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords]
      where each coordinate is [0, 1) and the coordinates are arranged
      as [ymin, xmin, ymax, xmax].
    fast_mode: Optional boolean, if True avoids slower transformations (i.e.
      bi-cubic resizing, random_hue or random_contrast).
    scope: Optional scope for name_scope.
    add_image_summaries: Enable image summaries.

  Returns:
    3-D float Tensor of distorted image used for training with range [-1, 1].
  """
```
Side notes:
- https://github.com/tensorflow/tensorflow/issues/15722 is the corresponding tf issue
- the tf image adjustments guide doesn’t say much about expected input ranges.
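Note that the slim docstring takes [0, 1] input but returns [-1, 1]: the rescaling happens at the end of the slim pipeline, roughly like this (a sketch, not verbatim slim code):

```python
import tensorflow as tf

def rescale_to_signed(image):
    """Map a [0, 1] float image to [-1, 1], as slim's inception
    preprocessing does in its final step."""
    image = tf.subtract(image, 0.5)
    return tf.multiply(image, 2.0)
```

So both pipelines end at [-1, 1]; the disagreement is about what range the caller is expected to hand in.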
@ztf-ucas I ran lots of automated experiments with both to try and find an optimal choice, and I recall it honestly didn’t make a big difference in my use case. I simply default to ‘tf’ now, but as with all neural networks your mileage may vary based on your dataset and problem type.
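(For anyone comparing the two modes, the choice is made at call time; exact import paths vary across Keras versions, so treat this as a hedged example:)

```python
import numpy as np
from keras.applications.imagenet_utils import preprocess_input

x = np.random.uniform(0, 255, size=(1, 224, 224, 3))

# 'tf' mode: scale [0, 255] -> [-1, 1], sample-wise.
x_tf = preprocess_input(x.copy(), mode='tf')

# 'caffe' mode (the default): RGB -> BGR, then zero-center each channel
# against the ImageNet mean, without scaling.
x_caffe = preprocess_input(x.copy(), mode='caffe')
```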
Yes you are right. Thanks for catching this. Reopening the issue since it still remains.