How to convert a model (trained with quantization awareness) to int8?
I’ve been trying to convert a model trained with quantization awareness to a .tflite file that only uses int8 operations. I modified the quantization aware training example from the guide, adding the following lines before the conversion (as I would do in a usual conversion):
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.representative_dataset = representative_data_gen
# Added the next 3 lines
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
quant_tflite_model = converter.convert()
with open('quant_aware_model.tflite', 'wb') as f:
    f.write(quant_tflite_model)
However, the conversion raises the following error:
RuntimeError: Max and min for dynamic tensors should be recorded during calibration
I’ve checked the documentation but didn’t find any example of how to perform this operation. Is it supported?
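For context, representative_data_gen in the snippet above is not shown; it is assumed to be a calibration generator of the kind described in the post-training quantization guide. A minimal sketch, where the calibration array and its shape are placeholders for real training samples:

import numpy as np
import tensorflow as tf

# Placeholder calibration data; in practice use a few hundred real training samples.
calibration_images = np.random.rand(100, 28, 28).astype(np.float32)

def representative_data_gen():
    # Yield one batch of size 1 at a time so the converter can record min/max ranges.
    for image in calibration_images:
        yield [image[np.newaxis, ...]]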
@suttergustavo
I have the same problem! Have you found a solution?
I have a model trained with quantization awareness. I followed the official TensorFlow guide, which says: “After this, you have an actually quantized model with int8 weights and uint8 activations.”
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
The conversion works, but the Edge TPU compiler says that the model is not quantized:
" Edge TPU Compiler version 2.1.302470888 Invalid model: bin/srmodel_full_integer_qaware.tflite Model not quantized "