Yolo-v3-tiny-tf model with INT8 precision gives bad inference results
Hi, we are working on integrating the yolo-v3-tiny-tf INT8 IR model into the DL Streamer pipeline, following the documentation provided for changing the model. We were able to integrate and test the (non-quantized) yolo-v3-tiny-tf IR model, but we failed to get proper inference with the INT8 version of the same model. The converted INT8 model was validated using the Open Model Zoo object detection sample, where it gave proper inference results.
The steps followed to convert yolo-v3-tiny-tf to an INT8 model are provided below.
This quantization walkthrough is based on the yolo_v3_tiny_tf model.
Requirements
- Openvino-dev 2022.1
- Openvino 2022.1
Steps for Quantization
Step 1: Obtain the OMZ model (yolo_v3_tiny_tf)
omz_downloader --name yolo_v3_tiny_tf
omz_converter --name yolo_v3_tiny_tf
This step downloads the frozen model and converts it to its IR representation.
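For reference, omz_downloader places public models under a public/ subdirectory and omz_converter writes the IRs next to the frozen graph; a sketch of the expected default layout (it changes if --output_dir is passed to either tool):

# public/yolo-v3-tiny-tf/yolo-v3-tiny-tf.pb             <- frozen TensorFlow graph
# public/yolo-v3-tiny-tf/FP32/yolo-v3-tiny-tf.{xml,bin} <- IR used for quantization below
# public/yolo-v3-tiny-tf/FP16/yolo-v3-tiny-tf.{xml,bin}
ls -R public/yolo-v3-tiny-tf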
Step 2: Obtain the dataset for optimization
For this model, the COCO 2017 validation dataset was selected:
wget http://images.cocodataset.org/zips/val2017.zip
unzip val2017.zip
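A quick sanity check on the calibration data (COCO 2017 val contains 5,000 images; pot will only sample stat_subset_size = 300 of them):

ls val2017 | wc -l   # expect 5000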
Step 3: Create a JSON file <quantization_spec.json> (optional; the equivalent pot command-line invocation is sketched after the JSON)
Note: point "model" and "weights" at the FP16 IR to produce an FP16-INT8 model
{
"model": {
"model_name": "yolo-v3-tiny-tf",
"model": "<path to the model>/yolo-v3-tiny-tf/FP32/yolo-v3-tiny-tf.xml",
"weights": "<path to the model>/yolo-v3-tiny-tf/FP32/yolo-v3-tiny-tf.bin"
},
"engine": {
"type": "simplified",
"data_source": "<path to the dataset where the images are stored>/val2017"
},
"compression": {
"target_device": "CPU",
"algorithms": [
{
"name": "DefaultQuantization",
"params": {
"preset": "performance",
"stat_subset_size": 300,
"shuffle_data": false
}
}
]
}
}
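For reference, the same simplified-mode quantization can be run purely with pot command-line arguments instead of the JSON file. A sketch based on the POT 2022.1 CLI (paths are placeholders; verify the flags against pot --help):

pot -q default \
    -m <path to the model>/yolo-v3-tiny-tf/FP32/yolo-v3-tiny-tf.xml \
    -w <path to the model>/yolo-v3-tiny-tf/FP32/yolo-v3-tiny-tf.bin \
    --engine simplified \
    --data-source <path to the dataset where the images are stored>/val2017 \
    --output-dir yolov3_int8 --direct-dump

Note that simplified mode supports DefaultQuantization only; AccuracyAwareQuantization requires a full accuracy-checker engine configuration with an annotated dataset instead of the "simplified" engine shown above.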
Step 4: Use the OpenVINO Post-training Optimization Tool (POT) to finish the process
This step converts the FP32/FP16 model to its FP32-INT8/FP16-INT8 counterpart.
The INT8 IR will be written under the "yolov3_int8" directory (with -d/--direct-dump, POT places it in an "optimized" subfolder there).
pot -c quantization_spec.json --output-dir yolov3_int8 -d
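A quick way to confirm the quantization took effect is to look for FakeQuantize operations in the resulting IR, which POT inserts around quantized layers (the path below assumes the -d/--direct-dump layout):

grep -c 'type="FakeQuantize"' yolov3_int8/optimized/yolo-v3-tiny-tf.xml   # should print a non-zero count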
Step 5: Validation
Test the converted model with the Open Model Zoo object detection demo:
python3 object_detection_demo.py -d CPU -i <path to the input video> -m <path to INT8 model xml> -at yolo --labels <OMZ_DIR>/data/dataset_classes/coco_80cl.txt
For integrating with the pipeline server, the steps followed are as per the documentation.
Copy the downloaded and converted models under:
<pipeline-server>/models/object_detection/yolo-v3-tiny-tf
The directory structure under yolo-v3-tiny-tf then looks like this:
coco-80cl.txt FP16 FP32 FP32-INT8 yolo-v3-tiny-tf yolo-v3-tiny-tf.json
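The yolo-v3-tiny-tf.json in this listing is the model-proc file that tells gvadetect how to decode the YoloRegion outputs; wrong or missing post-processing here produces bad detections even when inference itself runs. A hedged check (the exact converter name should be verified against the model-proc shipped with DL Streamer):

grep -A1 '"converter"' models/object_detection/yolo-v3-tiny-tf/yolo-v3-tiny-tf.json
# expect something like: "converter": "yolo_v3"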
Created a new pipeline:
cp -r pipelines/gstreamer/object_detection/person_vehicle_bike pipelines/gstreamer/object_detection/yolo-v3-tiny-tf
Edited the template in pipeline.json:
sed -i -e s/\\[person_vehicle_bike\\]/\\[yolo-v3-tiny-tf\\]/g pipelines/gstreamer/object_detection/yolo-v3-tiny-tf/pipeline.json
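This sed only swaps the model reference inside the gvadetect part of the pipeline template; a sketch of the intended effect (the exact template text varies between pipeline-server releases):

# before: ... ! gvadetect model={models[object_detection][person_vehicle_bike][network]} name=detection ...
# after:  ... ! gvadetect model={models[object_detection][yolo-v3-tiny-tf][network]} name=detection ...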
Ran the pipeline server:
./docker/run.sh -v /tmp:/tmp --models models --pipelines pipelines/gstreamer
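Once the server is up, a pipeline instance can be started over REST; a sketch assuming the default port 8080 and a json-lines file destination (adjust the source URI to the test clip):

curl -X POST http://localhost:8080/pipelines/object_detection/yolo-v3-tiny-tf \
  -H 'Content-Type: application/json' \
  -d '{
        "source": {"uri": "file:///tmp/bottle.mp4", "type": "uri"},
        "destination": {"metadata": {"type": "file", "path": "/tmp/results.jsonl", "format": "json-lines"}}
      }'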
With this, we were able to run inference with the FP16 and FP32 IR models, but we were not able to run inference with the FP32-INT8 IR model.
Could you please let us know which steps we are missing to integrate the quantized model?
Thanks
Top GitHub Comments
@whbruce These are the same models that I used with the pipeline server. I have tried them with the Open Model Zoo sample on OpenVINO 2022.1.0; below are the logs and the attachment for the same.
(setup) intel@intel-WL10:~/workspace/open_model_zoo/demos/object_detection_demo/python$ python3 object_detection_demo.py -d CPU -i bottle.mp4 -m /home/intel/workspace/pipeline-server/models/object_detection/yolo-v3-tiny-tf/FP32-INT8/yolo-v3-tiny-tf.xml -at yolo --labels /home/intel/workspace/pipeline-server/models/object_detection/yolo-v3-tiny-tf/coco-80cl.txt
[ INFO ] OpenVINO Runtime
[ INFO ] build: 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Reading model /home/intel/workspace/pipeline-server/models/object_detection/yolo-v3-tiny-tf/FP32-INT8/yolo-v3-tiny-tf.xml
[ WARNING ] The parameter "input_size" not found in YOLO wrapper, will be omitted
[ WARNING ] The parameter "num_classes" not found in YOLO wrapper, will be omitted
[ INFO ] Input layer: image_input, shape: [1, 416, 416, 3], precision: f32, layout: NHWC
[ INFO ] Output layer: conv2d_12/Conv2D/YoloRegion, shape: [1, 255, 26, 26], precision: f32, layout:
[ INFO ] Output layer: conv2d_9/Conv2D/YoloRegion, shape: [1, 255, 13, 13], precision: f32, layout:
[ INFO ] The model /home/intel/workspace/pipeline-server/models/object_detection/yolo-v3-tiny-tf/FP32-INT8/yolo-v3-tiny-tf.xml is loaded to CPU
[ INFO ] Device: CPU
[ INFO ] Number of streams: 4
[ INFO ] Number of threads: AUTO
[ INFO ] Number of model infer requests: 5
[ INFO ] Metrics report:
[ INFO ] Latency: 84.1 ms
[ INFO ] FPS: 30.5
[ INFO ] Decoding: 0.4 ms
[ INFO ] Preprocessing: 0.6 ms
[ INFO ] Inference: 81.0 ms
[ INFO ] Postprocessing: 1.9 ms
[ INFO ] Rendering: 0.2 ms
Yes, @brmarkus.
With the models downloaded from the Open Model Zoo (i.e., the original models), the pipeline server is able to run inference, but with the INT8 model quantized using the Accuracy Aware algorithm, inference fails in the pipeline server. The same quantized model was validated with the object detection sample from the OMZ repo.