
Quantization: QuantizeModelsTest.testModelEndToEnd() does not check the correctness of the quantization process

See original GitHub issue

Describe the bug

Hi, I’m trying to understand the model optimization API and create a quantized version of the MobileNetV2 model for my side project (a small library for training custom detection models for mobile). However, I’m struggling to get any positive results with this library compared to the old approach, where I used converter.representative_dataset for post-training conversion.
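
For context, the older post-training path I refer to looks roughly like this (a minimal sketch; keras_model and calibration_images are placeholder names I am using for illustration, not from the test suite):

import tensorflow as tf

def representative_data_gen():
    # Yield a small number of calibration samples, one batch at a time.
    for sample in calibration_images[:100]:
        yield [sample[None, ...].astype('float32')]

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()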

I found the end-to-end tests in your repository, in the tensorflow_model_optimization.python.core.quantization.keras.quantize_models_test.py file, but it seems these tests do not check whether the conversion actually makes sense. The conversion succeeds, yet the result is so inaccurate that it cannot be used in practice.

If you check the prediction outputs of the converted model, you will see that it produces only zeros (I get similar behavior when training on real data for much longer). The image below shows the min/max and std values of the output of the converted MobileNetV2 model from your end-to-end test:

[image: min/max and std of the converted MobileNetV2 outputs]

When I initialize this model with ImageNet weights, I still get different output values between the quantized Keras model and the converted TFLite model:

[image: output statistics with ImageNet weights]

What are your general suggestions for tracking down possible sources of an invalid conversion? For example, I’m aware that I should fine-tune my quantized model for a couple of epochs, but I’m not sure for how long, or whether this actually matters. Are batch-norm layers supported? (I saw that a fused Conv-BatchNorm-ReLU composition is implemented.) In general (or maybe in the near future), should I expect similar accuracy from the quantized Keras model and the converted TFLite model?
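
The quantization-aware fine-tuning setup I have in mind would look something like the sketch below (the optimizer, loss, and epoch count are illustrative assumptions, not values from the tests):

import tensorflow_model_optimization as tfmot

# Insert fake-quantization nodes into the float model, then fine-tune briefly
# so the weights adapt to the quantized ranges.
q_aware_model = tfmot.quantization.keras.quantize_model(base_model)
q_aware_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
q_aware_model.fit(x_train, y_train, epochs=2)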

Thanks, Krzysztof

System information

TensorFlow installed from (source or binary): binary

TensorFlow version: tf-nightly==2.2.0.dev20200316

TensorFlow Model Optimization version: tf-model-optimization-nightly==0.2.1.dev20200320 (compiled from source code, from master branch)

Python version: 3.7.0

Code to reproduce the issue

This is a slightly modified version of the _verify_tflite function from your tests, which I used to check the outputs of the converted models.

import numpy as np
import tensorflow as tf

def _verify_tflite(tflite_file, x_test, y_test, model):
    interpreter = tf.lite.Interpreter(model_path=tflite_file)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]['index']
    output_index = interpreter.get_output_details()[0]['index']

    # Reference predictions from the (quantization-aware) Keras model.
    keras_predictions = model.predict(x_test)

    # Run the TFLite interpreter sample by sample.
    tflite_predictions = []
    for x, _ in zip(x_test, y_test):
        x = x.reshape((1,) + x.shape)  # add batch dimension
        interpreter.set_tensor(input_index, x)
        interpreter.invoke()
        outputs = interpreter.get_tensor(output_index)
        tflite_predictions.append(outputs)

    return np.vstack(tflite_predictions), keras_predictions

tflite_predictions, keras_predictions = _verify_tflite(tflite_file, x_train, y_train, base_model)

print(tflite_predictions.min(), tflite_predictions.max(), tflite_predictions.std())
print(keras_predictions.min(), keras_predictions.max(), keras_predictions.std())
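
If the test were to actually verify correctness, one option would be to assert that the two sets of predictions agree within some quantization tolerance, along these lines (the 0.1 threshold is an arbitrary assumption for illustration):

# Quantized and float predictions should agree up to quantization error,
# not merely both be produced without crashing.
abs_err = np.abs(tflite_predictions - keras_predictions)
print('mean abs error:', abs_err.mean(), 'max abs error:', abs_err.max())
assert abs_err.mean() < 0.1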

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 19 (8 by maintainers)

Top GitHub Comments

2 reactions
nutsiepully commented, May 5, 2020

This is a duplicate of this bug. Closing this, and following up there.

0 reactions
sayakpaul commented, Apr 30, 2020

Thanks for noticing it @kmkolasinski. Just fixed it.
