Lower-than-expected ImageNet accuracies of pretrained MobileNet V2 & V3 models
I tried to validate the pretrained MobileNet V2 and V3 models available as `keras.applications.MobileNetV2()` and `keras.applications.MobileNetV3Large()`. To my surprise, both yielded lower-than-expected Top-1 accuracies on the ImageNet 2012 validation set:
- MobileNet V2: expected = 71.8%, measured = 61.6%
- MobileNet V3: expected = 75.6%, measured = 71.0%
```python
import os
import time

import numpy as np
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

print(tf.__version__)     # 2.7.1
print(keras.__version__)  # 2.7.0
```
Prepare the ImageNet 2012 validation set
```python
labels_path = tf.keras.utils.get_file(
    'ImageNetLabels.txt',
    'https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt')
imagenet_labels = np.array(open(labels_path).read().splitlines())
```
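One classic pitfall worth ruling out: this labels file has 1001 entries (a leading "background" class), while `keras.applications` models predict 1000 classes, so indexing model outputs with it directly would be off by one. A quick sanity check (not part of the original report):

```python
# Sanity check: the file has 1001 entries ("background" first), while
# keras.applications models output 1000 classes, an easy off-by-one.
print(len(imagenet_labels))   # 1001
print(imagenet_labels[0])     # 'background'
# A 1000-way prediction index maps to a name via imagenet_labels[idx + 1].
```

This file is not used in the evaluation below, though, so it cannot explain the gap by itself.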
```python
data_dir_val = '/home/le_user/imagenet_dataset/'
write_dir_val = '/home/le_user/imagenet_dataset_tfds'

# Construct a tf.data.Dataset
download_config_val = tfds.download.DownloadConfig(
    extract_dir=os.path.join(write_dir_val, 'extracted'),
    manual_dir=data_dir_val)
download_and_prepare_kwargs_val = {
    'download_dir': os.path.join(write_dir_val, 'downloaded'),
    'download_config': download_config_val,
}
```
```python
def resize_with_crop(image, label):
    # NOTE: crops/pads directly to 224x224 without first resizing the shorter
    # side, so large validation images are reduced to a small center patch.
    i = tf.cast(image, tf.float32)
    i = tf.image.resize_with_crop_or_pad(i, 224, 224)
    i = tf.keras.applications.mobilenet_v2.preprocess_input(i)  # scales to [-1, 1]
    return (i, label)

def resize_with_crop_v3(image, label):
    i = tf.cast(image, tf.float32)
    i = tf.image.resize_with_crop_or_pad(i, 224, 224)
    # MobileNetV3's preprocess_input is a pass-through: the model itself
    # contains a Rescaling layer and expects inputs in [0, 255].
    i = tf.keras.applications.mobilenet_v3.preprocess_input(i)
    return (i, label)
```
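For reference, the canonical ImageNet evaluation transform first resizes the shorter side (typically to 256) and only then takes a 224x224 center crop; cropping raw images directly, as above, discards most of each image and is a plausible contributor to the gap. A minimal sketch of that transform, assuming the usual 256-to-224 ratio (this is not from the original report):

```python
def resize_then_center_crop(image, label):
    # Resize so the shorter side becomes 256 (preserving aspect ratio),
    # then take the central 224x224 crop: the usual eval protocol.
    i = tf.cast(image, tf.float32)
    shape = tf.shape(i)
    h, w = shape[0], shape[1]
    scale = 256.0 / tf.cast(tf.minimum(h, w), tf.float32)
    new_h = tf.cast(tf.round(tf.cast(h, tf.float32) * scale), tf.int32)
    new_w = tf.cast(tf.round(tf.cast(w, tf.float32) * scale), tf.int32)
    i = tf.image.resize(i, [new_h, new_w])
    # Both dims are now >= 256, so crop_or_pad is a true center crop.
    i = tf.image.resize_with_crop_or_pad(i, 224, 224)
    i = tf.keras.applications.mobilenet_v2.preprocess_input(i)
    return (i, label)
```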
```python
ds = tfds.load('imagenet2012',
               data_dir=os.path.join(write_dir_val, 'data'),
               split='validation',
               shuffle_files=False,
               download=False,
               as_supervised=True,
               download_and_prepare_kwargs=download_and_prepare_kwargs_val)
```
```python
strategy = tf.distribute.MirroredStrategy()
AUTOTUNE = tf.data.AUTOTUNE
BATCH_SIZE_PER_REPLICA = 128
NUM_GPUS = strategy.num_replicas_in_sync

ds_single = ds.map(resize_with_crop)
ds_single = ds_single.batch(batch_size=BATCH_SIZE_PER_REPLICA)
ds_single = ds_single.cache().prefetch(buffer_size=AUTOTUNE)
```
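`strategy` and `NUM_GPUS` are only exercised by a multi-GPU variant that the report doesn't show (the print below labels the shown path "Single-GPU eval"). A minimal sketch of what that variant presumably looks like, assuming the model is built under `strategy.scope()` with a global batch of `BATCH_SIZE_PER_REPLICA * NUM_GPUS`:

```python
# Hypothetical multi-GPU variant (not in the original report)
ds_multi = ds.map(resize_with_crop, num_parallel_calls=AUTOTUNE)
ds_multi = ds_multi.batch(BATCH_SIZE_PER_REPLICA * NUM_GPUS)
ds_multi = ds_multi.prefetch(buffer_size=AUTOTUNE)

with strategy.scope():
    mbv2_multi = keras.applications.MobileNetV2(include_top=True, weights='imagenet')
    mbv2_multi.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
        metrics=['accuracy'])

mbv2_multi.evaluate(ds_multi)
```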
Use pre-trained weights to validate accuracy
```python
mbv2_eval = keras.applications.MobileNetV2(include_top=True, weights='imagenet')
mbv2_eval.trainable = False
mbv2_eval.compile(optimizer='adam',
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                  metrics=['accuracy'])

start_time = time.time()
result = mbv2_eval.evaluate(ds_single)
print(f"--- Single-GPU eval took {time.time() - start_time} seconds ---")
print(dict(zip(mbv2_eval.metrics_names, result)))
```
Output is:
```
391/391 [==============================] - 49s 118ms/step - loss: 1.7855 - accuracy: 0.6155
--- Single-GPU eval took 48.85072922706604 seconds ---
{'loss': 1.7854770421981812, 'accuracy': 0.6154599785804749}
```
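The V3 number was presumably measured the same way. A sketch of that counterpart, using the `resize_with_crop_v3` pipeline and assuming `MobileNetV3Large` (consistent with the 75.6% reference accuracy); this part is not shown in the original report:

```python
# Hypothetical V3 counterpart (not shown in the original report)
ds_v3 = ds.map(resize_with_crop_v3)
ds_v3 = ds_v3.batch(batch_size=BATCH_SIZE_PER_REPLICA)
ds_v3 = ds_v3.cache().prefetch(buffer_size=AUTOTUNE)

mbv3_eval = keras.applications.MobileNetV3Large(include_top=True, weights='imagenet')
mbv3_eval.trainable = False
mbv3_eval.compile(optimizer='adam',
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                  metrics=['accuracy'])
result_v3 = mbv3_eval.evaluate(ds_v3)
print(dict(zip(mbv3_eval.metrics_names, result_v3)))
```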
System information.
- Have I written custom code (as opposed to using a stock example script provided in Keras): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS/RHEL/Fedora
- TensorFlow installed from (source or binary): binary
- TensorFlow version: 2.7.1
- Python version: 3.8.12
- Bazel version (if compiling from source): N/A (installed from binary)
- GPU model and memory: Tesla V100, 16 GB
- Exact command to reproduce: see the code above
Top GitHub Comments
No problem. With KerasCV we are working to close this gap.
I am currently working on MobileNetV3 weight offerings for KerasCV; when those are available, they will be fully reproducible using our training scripts. So keep an eye out for those coming soon!
@ianstenbit, thanks for the explanation and pointers. The KerasCV models repo is news to me, and it definitely looks very interesting. I hope it can go a long way.
So far, it seems quite convoluted and unreliable to reproduce or train some classic CV models with TF2/Keras. Using MobileNet V2 as an example, the official MobileNet V2 training script in the TF Model Garden doesn't even specify an optimizer, so it won't run as-is. And if the proper optimizer from the paper is specified and added to that training setup, the validation accuracy is nowhere near the expected value.
This of course is outside the scope of this issue or keras-team. Thanks again.