
Cannot quantize model in per-tensor way

See original GitHub issue

Hello team,

I am trying to quantize all the parameters of my model with per_tensor granularity, but I found that the final quantized model still contains per_channel layers.

The YAML file is as follows:

version: 1.0

model:                                               # mandatory. used to specify model specific information.
  name: mobilenetv2
  framework: onnxrt_qlinearops                       # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.

quantization:                                        # optional. tuning constraints on model-wise for advanced users to reduce tuning space.
  approach: post_training_static_quant               # optional. default value is post_training_static_quant.
  calibration:
    dataloader:
      batch_size: 1
      dataset:
        ImagenetRaw:
          data_path: /home/tau/Workspace/databank/imagenet/ILSVRC/Data/CLS-LOC/val
          image_list: /home/tau/Workspace/databank/imagenet/caffe_labels/val.txt      # download from http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
      transform:
        Rescale: {}
        Resize:
          size: 256
        CenterCrop:
          size: 224
        Normalize:
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
        Transpose:
          perm: [2, 0, 1]
        Cast:
          dtype: float32
  model_wise:                                        # optional. tuning constraints on model-wise for advanced users to reduce tuning space.
    weight:
      granularity: per_tensor
      scheme: asym
      dtype: int8
      algorithm: minmax
    activation:
      granularity: per_tensor
      scheme: asym
      algorithm: minmax

tuning:
  accuracy_criterion:
    relative:  0.02                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 2%.
  exit_policy:
    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
  random_seed: 9527                                  # optional. random seed for deterministic tuning.

Thanks.
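For reference, the leftover per_channel layers can be confirmed in the quantized output with a small inspection script. This is a minimal sketch, assuming the result is a QLinearOps ONNX model saved under the hypothetical name mobilenetv2_int8.onnx; in a QLinearConv node, a weight scale with a single element means per-tensor quantization, while one scale per output channel means per-channel.

import onnx
from onnx import numpy_helper

# Hypothetical path to the quantized model produced by the quantization run.
model = onnx.load("mobilenetv2_int8.onnx")
initializers = {init.name: init for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type != "QLinearConv":
        continue
    # QLinearConv inputs: x, x_scale, x_zero_point, w, w_scale, w_zero_point, y_scale, y_zero_point, [B]
    scale_init = initializers.get(node.input[4])
    if scale_init is None:
        continue
    w_scale = numpy_helper.to_array(scale_init)
    granularity = "per_tensor" if w_scale.size == 1 else f"per_channel ({w_scale.size} scales)"
    print(f"{node.name}: weight granularity = {granularity}")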

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7

Top GitHub Comments

3 reactions
mengniwang95 commented, Jun 2, 2022

Hi, version 1.12 supports the per-tensor approach. If you want to get a per-tensor quantized model directly, please add a model_wise section to the YAML file, as in https://github.com/intel/neural-compressor/blob/aac0a0ec860d6d875467a8b7fb119ec18713fd48/neural_compressor/template/ptq.yaml#L43, and set ‘granularity’ to per_tensor.
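
As a complement to the answer above, here is a minimal sketch of re-running the quantization once the granularity is set to per_tensor, assuming the 1.x experimental Python API and the hypothetical file names conf.yaml and mobilenetv2.onnx:

from neural_compressor.experimental import Quantization, common

# conf.yaml is assumed to be the configuration shown above, with
# model_wise weight/activation granularity set to per_tensor.
quantizer = Quantization("conf.yaml")
quantizer.model = common.Model("mobilenetv2.onnx")  # hypothetical FP32 ONNX model
q_model = quantizer.fit()                           # calibrate, quantize, and tune
q_model.save("mobilenetv2_int8.onnx")               # write the quantized model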

0 reactions
zihaomu commented, Jun 2, 2022

Hi, version 1.12 supports the per-tensor approach. If you want to get a per-tensor quantized model directly, please add a model_wise section to the YAML file, as in

https://github.com/intel/neural-compressor/blob/aac0a0ec860d6d875467a8b7fb119ec18713fd48/neural_compressor/template/ptq.yaml#L43

and set ‘granularity’ to per_tensor.

Thanks @mengniwang95, this will be of great help to us.

Read more comments on GitHub >

Top Results From Across the Web

  • Cannot quantize part of a model · Issue #46073 - GitHub
    It is possible to quantize per-layer using quantization-aware training, as you mention. You can use quantize_annotate_layer as mentioned for ...
  • Post-training quantization | TensorFlow Lite
    You can quantize an already-trained float TensorFlow model when you convert it to TensorFlow Lite format using the TensorFlow Lite Converter.
  • Quantization — PyTorch 1.13 documentation
    At a lower level, PyTorch provides a way to represent quantized tensors and perform operations with them. They can be used to directly construct ...
  • Quantization - Neural Network Distiller
    In many cases, taking a model trained for FP32 and directly quantizing it to INT8, without any re-training, can result in a relatively ...
  • Practical tips for better quantization results - Heartbeat
    Set the qconfig for only those layers you want to quantize, not the whole model. For instance, instead of model.qconfig, ...
