Failed with Post Training Quantization after Quantization Aware Training
Describe the requests
I am working with recent neural networks targeting mobile devices, and I have found obstacles to performing integer quantization after QAT.
I know these APIs are not available now, but if you have plans to address the following issues, please let me know when they will be available 😃
- AveragePooling2D
x = layers.Conv2D(32, 5, padding='same', activation='relu')(input)
x = layers.AveragePooling2D((2, 2), (2, 2), padding='same')(x)  # <- converts successfully, but fails to prepare
x = layers.Conv2D(64, 5, padding='same', activation='relu')(x)
tensorflow/lite/kernels/pooling.cc:94 input->params.scale != output->params.scale (-1045139600 != 653455232) Node number 2 (AVERAGE_POOL_2D) failed to prepare.
- Same problem with MaxPooling2D.
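For completeness, a minimal end-to-end sketch of the pipeline that hits this error (shapes and the random representative dataset are stand-ins; the real test is in the gist linked below):
```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow.keras import layers

# Small model containing the problematic Conv2D -> AveragePooling2D -> Conv2D chain.
inputs = tf.keras.Input((32, 32, 3))
x = layers.Conv2D(32, 5, padding='same', activation='relu')(inputs)
x = layers.AveragePooling2D((2, 2), (2, 2), padding='same')(x)
x = layers.Conv2D(64, 5, padding='same', activation='relu')(x)
outputs = layers.GlobalAveragePooling2D()(x)
model = tf.keras.Model(inputs, outputs)

# Wrap the model for quantization aware training (the training step is omitted here).
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer='adam', loss='mse')

# Full-integer post-training conversion of the QAT model.
def representative_dataset():
    for _ in range(8):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()  # conversion succeeds

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()      # AVERAGE_POOL_2D fails to prepare here
```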
- MaxPooling2D
x = layers.Conv2D(32, 5, padding='same', activation='relu')(input)
x = layers.MaxPooling2D((2, 2), (2, 2), padding='same')(x)  # <- converts successfully, but fails to prepare
x = layers.Conv2D(64, 5, padding='same', activation='relu')(x)
tensorflow/lite/kernels/pooling.cc:94 input->params.scale != output->params.scale (-1045139600 != 653454832) Node number 2 (MAX_POOL_2D) failed to prepare.
- Same problem with AveragePooling2D.
- Residual connection
input = tf.keras.Input(input_shape)
shortcut = input
x = layers.Conv2D(16, 1, padding='same', use_bias=False)(input)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
x = x + shortcut  # <- fails to convert: '+' is lowered to a TensorFlowOpLayer, not an Add layer
Layer tf_op_layer_AddV2:<class 'tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer'> is not supported. You can quantize this layer by passing a tfmot.quantization.keras.QuantizeConfig instance to the quantize_annotate_layer API.
- This problem also causes the HardSwish failure described in the next item.
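Since the converter complains specifically about the TensorFlowOpLayer produced by `+`, one workaround sketch (not from the original report, and assuming tf.keras.layers.Add is in the default set of quantizable layers) is to express the residual addition as an explicit Keras layer:
```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input((32, 32, 16))
shortcut = inputs
x = layers.Conv2D(16, 1, padding='same', use_bias=False)(inputs)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
# layers.Add is a real Keras layer, so quantize_model / quantize_annotate_layer can see it,
# unlike the TensorFlowOpLayer that `x + shortcut` produces.
x = layers.Add()([x, shortcut])
model = tf.keras.Model(inputs, x)
```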
- HardSwish
x = layers.Conv2D(32, 3, 2, padding='same', use_bias=False)(input)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x + 3) * (1 / 6) #<- equivalent to `HardSwish`
Layer tf_op_layer_AddV2_1:<class 'tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer'> is not supported. You can quantize this layer by passing a tfmot.quantization.keras.QuantizeConfig instance to the quantize_annotate_layer API.
- There are two levels of the problem.
- I had configured a QuantizeConfig to support TensorFlowOpLayer with Add and Multiply ops; however, because these ops sit between BN and ReLU6, the Conv2D-BN-ReLU layers could not be fused correctly. -> The quantized MobileNetV3 became slower than the floating-point version on an Android device.
- The main building block of MobileNetV3, Conv2D-BN-HardSwish, is not a supported pattern.
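For reference, one way to make such ops visible to the quantizer at all is to wrap hard-swish in a single custom layer and attach a QuantizeConfig through quantize_annotate_layer. This is a sketch of the general approach, not the exact config used here; HardSwish and OutputOnlyQuantizeConfig are illustrative names, and it only quantizes the layer output rather than producing a fused Conv2D-BN-HardSwish kernel (the second level of the problem above).
```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quant = tfmot.quantization.keras

class HardSwish(tf.keras.layers.Layer):
    """hard_swish(x) = x * relu6(x + 3) / 6, expressed as a single Keras layer."""
    def call(self, inputs):
        return inputs * tf.nn.relu6(inputs + 3.0) * (1.0 / 6.0)

class OutputOnlyQuantizeConfig(quant.QuantizeConfig):
    """Quantize only the layer output; there are no weights to quantize."""
    def get_weights_and_quantizers(self, layer):
        return []
    def get_activations_and_quantizers(self, layer):
        return []
    def set_quantize_weights(self, layer, quantize_weights):
        pass
    def set_quantize_activations(self, layer, quantize_activations):
        pass
    def get_output_quantizers(self, layer):
        return [quant.quantizers.MovingAverageQuantizer(
            num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]
    def get_config(self):
        return {}

# Annotate the custom layer, then apply quantization inside quantize_scope
# so the custom classes can be deserialized.
inputs = tf.keras.Input((32, 32, 3))
x = tf.keras.layers.Conv2D(32, 3, 2, padding='same', use_bias=False)(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = quant.quantize_annotate_layer(HardSwish(), OutputOnlyQuantizeConfig())(x)
annotated = quant.quantize_annotate_model(tf.keras.Model(inputs, x))
with quant.quantize_scope({'HardSwish': HardSwish,
                           'OutputOnlyQuantizeConfig': OutputOnlyQuantizeConfig}):
    qat_model = quant.quantize_apply(annotated)
```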
- GlobalAveragePooling-Dense
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(1024, activation='relu')(x)  # <- converts successfully, but fails to prepare
tensorflow/lite/kernels/kernel_util.cc:129 std::abs(input_product_scale - bias_scale) <= 1e-6 * std::min(input_product_scale, bias_scale) was not true. Node number 4 (FULLY_CONNECTED) failed to prepare.
- This bug prevents me from benchmarking the official MobileNetV2 network imported from tf.keras.
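The MobileNetV2 attempt is essentially the same pipeline applied to the stock tf.keras model. A short sketch (the input shape and weights argument here are illustrative, and the converter settings are the same full-integer ones as in the first sketch above):
```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Stock Keras MobileNetV2, which ends in GlobalAveragePooling2D -> Dense.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3), weights=None)
qat_model = tfmot.quantization.keras.quantize_model(base)
# ...train, then convert with the full-integer settings shown earlier;
# interpreter.allocate_tensors() is where FULLY_CONNECTED fails to prepare,
# matching the error above.
```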
System information
TensorFlow installed from (source or binary): binary
TensorFlow version: 2.2.0 (release)
TensorFlow Model Optimization version: 0.3.0 (release)
Python version: 3.6.0
Code to reproduce the issue
Gist with the full test: https://gist.github.com/kalaluthien/b270c71afb6866ae61ef0dc088a762f2

Maintainer comment
Also, regarding HardSwish: if you have the time and are interested, I'm happy to guide you in how to implement support for it 😃
Regarding the MobileNetV2 reproduction, looking at your code it seems you are training on CIFAR. It won't be as straightforward to reproduce the full training.
We trained a Keras MobileNet V2 model with hyperparams from this. We then quantized the model and trained again for a few epochs.
I think the reason your conversion code is failing is due to
Try removing it, and I think conversion should work. If it doesn't, please let me know. Basically, the QAT conversion by default uses float inputs/outputs based on the model signature. There is work in progress in TFLiteConverterV2 to support a different model interface (int8/uint8, etc.). See this.
Hope this helps.
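A minimal sketch of the default conversion path described in the comment (assuming `qat_model` is the quantization-aware Keras model from quantize_model/quantize_apply; the converter setting the comment suggests removing is simply left unset here):
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# With no further settings, the converted model keeps float inputs/outputs
# taken from the Keras model signature, as the comment describes.
tflite_model = converter.convert()
```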