How to convert a model (trained with quantization awareness) to int8?
I’ve been trying to convert a model trained with quantization awareness to a .tflite file that only uses int8 operations. I modified the quantization aware training example from the guide, adding the following lines before the conversion (as I would do in a usual conversion):
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.representative_dataset = representative_data_gen
# Added the next 3 lines
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
quant_tflite_model = converter.convert()
with open('quant_aware_model.tflite', 'wb') as f:
    f.write(quant_tflite_model)
However, the conversion raises the following error:
RuntimeError: Max and min for dynamic tensors should be recorded during calibration
I’ve checked the documentation but didn’t find any example of how to perform this operation. Is it supported?
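For context, representative_data_gen in the snippet above is not shown; it is assumed to be a calibration generator of the kind described in the post-training quantization guide. A minimal sketch, where the calibration array and its shape are placeholders for real training samples:

import numpy as np
import tensorflow as tf

# Placeholder calibration data; in practice use a few hundred real training samples.
calibration_images = np.random.rand(100, 28, 28).astype(np.float32)

def representative_data_gen():
    # Yield one batch of size 1 at a time so the converter can record min/max ranges.
    for image in calibration_images:
        yield [image[np.newaxis, ...]]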
@suttergustavo
I have the same problem! Have you found a solution?
I have a model trained with quantization awareness. I followed the official TensorFlow guide, which says: “After this, you have an actually quantized model with int8 weights and uint8 activations.”
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
The conversion works, but the Edge TPU compiler says that the model is not quantized:
" Edge TPU Compiler version 2.1.302470888 Invalid model: bin/srmodel_full_integer_qaware.tflite Model not quantized "