
Cannot quantize model in per-tensor way

See original GitHub issue

Hello team,

I am trying to quantize all the parameters of my model with per_tensor granularity, but I found that the final quantized model still contains per_channel layers.

The YAML file is as follows:

version: 1.0

model:                                               # mandatory. used to specify model specific information.
  name: mobilenetv2
  framework: onnxrt_qlinearops                       # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.

quantization:                                        # optional. tuning constraints on model-wise for advanced users to reduce tuning space.
  approach: post_training_static_quant               # optional. default value is post_training_static_quant.
  calibration:
    dataloader:
      batch_size: 1
      dataset:
        ImagenetRaw:
          data_path: /home/tau/Workspace/databank/imagenet/ILSVRC/Data/CLS-LOC/val
          image_list: /home/tau/Workspace/databank/imagenet/caffe_labels/val.txt      # download from http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
      transform:
        Rescale: {}
        Resize:
          size: 256
        CenterCrop:
          size: 224
        Normalize:
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
        Transpose:
          perm: [2, 0, 1]
        Cast:
          dtype: float32
  model_wise:                                        # optional. tuning constraints on model-wise for advanced users to reduce tuning space.
    weight:
      granularity: per_tensor
      scheme: asym
      dtype: int8
      algorithm: minmax
    activation:
      granularity: per_tensor
      scheme: asym
      algorithm: minmax

tuning:
  accuracy_criterion:
    relative:  0.02                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 2%.
  exit_policy:
    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
  random_seed: 9527                                  # optional. random seed for deterministic tuning.

Thanks.
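For reference, the leftover per_channel layers can be confirmed in the quantized output with a small inspection script. This is a minimal sketch, assuming the result is a QLinearOps ONNX model saved under the hypothetical name mobilenetv2_int8.onnx; in a QLinearConv node, a weight scale with a single element means per-tensor quantization, while one scale per output channel means per-channel.

import onnx
from onnx import numpy_helper

# Hypothetical path to the quantized model produced by the quantization run.
model = onnx.load("mobilenetv2_int8.onnx")
initializers = {init.name: init for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type != "QLinearConv":
        continue
    # QLinearConv inputs: x, x_scale, x_zero_point, w, w_scale, w_zero_point, y_scale, y_zero_point, [B]
    scale_init = initializers.get(node.input[4])
    if scale_init is None:
        continue
    w_scale = numpy_helper.to_array(scale_init)
    granularity = "per_tensor" if w_scale.size == 1 else f"per_channel ({w_scale.size} scales)"
    print(f"{node.name}: weight granularity = {granularity}")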

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7

Top GitHub Comments

3 reactions
mengniwang95 commented, Jun 2, 2022

Hi, version 1.12 supports the per-tensor approach. If you want to get a per-tensor quantized model directly, please add a model_wise section to the YAML file, as in https://github.com/intel/neural-compressor/blob/aac0a0ec860d6d875467a8b7fb119ec18713fd48/neural_compressor/template/ptq.yaml#L43, and set ‘granularity’ to per_tensor.
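
As a complement to the answer above, here is a minimal sketch of re-running the quantization once the granularity is set to per_tensor, assuming the 1.x experimental Python API and the hypothetical file names conf.yaml and mobilenetv2.onnx:

from neural_compressor.experimental import Quantization, common

# conf.yaml is assumed to be the configuration shown above, with
# model_wise weight/activation granularity set to per_tensor.
quantizer = Quantization("conf.yaml")
quantizer.model = common.Model("mobilenetv2.onnx")  # hypothetical FP32 ONNX model
q_model = quantizer.fit()                           # calibrate, quantize, and tune
q_model.save("mobilenetv2_int8.onnx")               # write the quantized model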

0 reactions
zihaomu commented, Jun 2, 2022

Hi, version 1.12 supports the per-tensor approach. If you want to get a per-tensor quantized model directly, please add a model_wise section to the YAML file, as in

https://github.com/intel/neural-compressor/blob/aac0a0ec860d6d875467a8b7fb119ec18713fd48/neural_compressor/template/ptq.yaml#L43

and set ‘granularity’ to per_tensor.

Thanks @mengniwang95, this will be of great help to us.

Read more comments on GitHub >

Top Results From Across the Web

  • Cannot quantize part of a model · Issue #46073 - GitHub
    It is possible to quantize per-layer using quantization-aware training, as you mention. You can use quantize_annotate_layer as mentioned for ...
  • Post-training quantization | TensorFlow Lite
    You can quantize an already-trained float TensorFlow model when you convert it to TensorFlow Lite format using the TensorFlow Lite Converter.
  • Quantization — PyTorch 1.13 documentation
    At a lower level, PyTorch provides a way to represent quantized tensors and perform operations with them. They can be used to directly construct ...
  • Quantization - Neural Network Distiller
    In many cases, taking a model trained for FP32 and directly quantizing it to INT8, without any re-training, can result in a relatively ...
  • Practical tips for better quantization results - Heartbeat
    Set the qconfig for only those layers you want to quantize, not the whole model. For instance, instead of model.qconfig, ...
