Error in Training

When I applied StatAssist-GradBoost to my face detection task, an error occurred during the training phase. The log is below:

Epoch: [6]  [  0/202]  eta: 0:25:54  lr: 0.001  img/s: 40.46101009839138  loss: 53.6955 (53.6955)  losses: 53.6955 (53.6955)  box losses: 4.2484 (4.2484)  class losses: 3.2239 (3.2239)  landmark losses: 41.9749 (41.9749)  time: 7.6959  data: 6.1140  max mem: 4881
Traceback (most recent call last):
  File "train_quantization_sg.py", line 586, in <module>
    main(args)
  File "train_quantization_sg.py", line 419, in main
    args.print_freq, priors)
  File "train_quantization_sg.py", line 100, in train_one_epoch
    output = model(image)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt1/FaceDetect-QAT/models/qat_slim.py", line 29, in forward
    x)
  File "/opt1/FaceDetect-QAT/models/net_slim.py", line 131, in _forward_impl
    x1 = self.conv1(inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/intrinsic/qat/modules/conv_fused.py", line 243, in forward
    return self.activation_post_process(F.relu(ConvBn2d._forward(self, input)))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/intrinsic/qat/modules/conv_fused.py", line 95, in _forward
    conv = self._conv_forward(input, self.weight_fake_quant(scaled_weight))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/fake_quantize.py", line 81, in forward
    _scale, _zero_point = self.calculate_qparams()
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/fake_quantize.py", line 76, in calculate_qparams
    return self.activation_post_process.calculate_qparams()
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/observer.py", line 510, in calculate_qparams
    return self._calculate_per_channel_qparams(self.min_vals, self.max_vals)
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/observer.py", line 148, in _calculate_per_channel_qparams
    assert (torch.sum(diff) == len(diff)), "min_vals should be less than max_vals for indices."
AssertionError: min_vals should be less than max_vals for indices.

In addition, training runs for a few epochs before the error appears.
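
For readers hitting the same assertion, here is a minimal sketch in plain Python of how a check shaped like the one in the traceback behaves. It assumes that `diff` is the element-wise comparison `min_vals <= max_vals` (that line is not shown in the log), and the helper names `check_passes` and `failing_channels` are made up for illustration. One way such a check can start failing after several epochs is that a weight channel becomes NaN, because NaN compares as False against everything. The log alone does not confirm that this is what happened here.

```python
import math


def check_passes(min_vals, max_vals):
    """Mimic the shape of the assertion: every channel must satisfy min <= max.

    Assumption: `diff` in the traceback is the element-wise `min_vals <= max_vals`.
    """
    diff = [lo <= hi for lo, hi in zip(min_vals, max_vals)]
    return sum(diff) == len(diff)


def failing_channels(min_vals, max_vals):
    """Return the indices of channels where `min <= max` does not hold.

    NaN compares False against any value, so NaN channels are reported too.
    """
    return [i for i, (lo, hi) in enumerate(zip(min_vals, max_vals)) if not (lo <= hi)]


# Hypothetical per-channel ranges, not taken from the issue.
healthy_min, healthy_max = [-0.5, -0.2, 0.0], [0.4, 0.3, 0.0]
nan_min, nan_max = [-0.5, math.nan, 0.0], [0.4, math.nan, 0.0]

print(check_passes(healthy_min, healthy_max))  # True
print(check_passes(nan_min, nan_max))          # False
print(failing_channels(nan_min, nan_max))      # [1]
```

If the failing indices turn out to hold NaN values, it may be worth checking whether the loss diverged in an earlier iteration before looking at the quantization settings.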

Issue Analytics

State:
Created 3 years ago
Comments: 12

Top GitHub Comments

1 reaction

xieydd commented, Jul 3, 2020

@yjyoo3312 One question unrelated to this issue: have you compared TF2 QAT with PyTorch QAT? :)

1 reaction

xieydd commented, Jul 2, 2020

@yjyoo3312 I have invited you to my private repo, because the repo is not ready to be made public yet :) Thanks again. My test results are below:

Plan	Hard	Medium	Easy	WiderFace Speed
FaceDetect	0.31	0.64	0.78
FaceDetect-QAT	0.294	0.612	0.749
FaceDetect-QAT-StatAssist-GradBoost-fbgemm	0.31	0.64	0.759	0.076s
FaceDetect-QAT-StatAssist-GradBoost-qnnpack	0.307	0.634	0.766	0.058s