Error in Training

When I transferred StatAssist-GradBoost to my face detection task, an error happened in the training phase. Below is the log:

Epoch: [6]  [  0/202]  eta: 0:25:54  lr: 0.001  img/s: 40.46101009839138  loss: 53.6955 (53.6955)  losses: 53.6955 (53.6955)  box losses: 4.2484 (4.2484)  class losses: 3.2239 (3.2239)  landmark losses: 41.9749 (41.9749)  time: 7.6959  data: 6.1140  max mem: 4881
Traceback (most recent call last):
  File "train_quantization_sg.py", line 586, in <module>
    main(args)
  File "train_quantization_sg.py", line 419, in main
    args.print_freq, priors)
  File "train_quantization_sg.py", line 100, in train_one_epoch
    output = model(image)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt1/FaceDetect-QAT/models/qat_slim.py", line 29, in forward
    x)
  File "/opt1/FaceDetect-QAT/models/net_slim.py", line 131, in _forward_impl
    x1 = self.conv1(inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/intrinsic/qat/modules/conv_fused.py", line 243, in forward
    return self.activation_post_process(F.relu(ConvBn2d._forward(self, input)))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/intrinsic/qat/modules/conv_fused.py", line 95, in _forward
    conv = self._conv_forward(input, self.weight_fake_quant(scaled_weight))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/fake_quantize.py", line 81, in forward
    _scale, _zero_point = self.calculate_qparams()
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/fake_quantize.py", line 76, in calculate_qparams
    return self.activation_post_process.calculate_qparams()
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/observer.py", line 510, in calculate_qparams
    return self._calculate_per_channel_qparams(self.min_vals, self.max_vals)
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/observer.py", line 148, in _calculate_per_channel_qparams
    assert (torch.sum(diff) == len(diff)), "min_vals should be less than max_vals for indices."
AssertionError: min_vals should be less than max_vals for indices.

In addition, training runs for a few epochs before this error occurs.
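The assertion in the traceback fires when at least one channel fails the per-channel min/max comparison. The thread does not confirm a cause; one possibility is that the weight statistics of some channel became NaN after a few epochs, because NaN compares as False against everything. The sketch below uses only plain Python and made-up values to show how such channels could be located. The helper name `invalid_channels` is hypothetical, and it is an assumption that the observer's check amounts to `min_vals <= max_vals` per channel.

```python
import math


def invalid_channels(min_vals, max_vals):
    """Return indices of channels where `min <= max` does not hold.

    NaN compares False against everything, so a channel whose
    statistics are NaN is reported too.
    """
    return [
        i
        for i, (lo, hi) in enumerate(zip(min_vals, max_vals))
        if not lo <= hi
    ]


# Hypothetical per-channel statistics (not taken from the issue):
# channel 1 is NaN, channel 2 has min > max.
min_vals = [-0.5, math.nan, 0.3]
max_vals = [0.7, math.nan, 0.1]

bad = invalid_channels(min_vals, max_vals)
print(bad)  # [1, 2]
```

With a real model, the lists could come from the `min_vals` and `max_vals` attributes of the observer named in the traceback, converted with `.tolist()`, checked just before the failing forward pass.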

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12

Top GitHub Comments

xieydd commented, Jul 3, 2020 (1 reaction)

@yjyoo3312 One question that has nothing to do with this issue: have you compared TF2 QAT with PyTorch QAT? :)

xieydd commented, Jul 2, 2020 (1 reaction)

@yjyoo3312 I have invited you to my private repo, because the repo is not ready to be opened yet :) Thanks again. Below are my test results:

Plan                                          Hard    Medium  Easy    WiderFace Speed
FaceDetect                                    0.31    0.64    0.78
FaceDetect-QAT                                0.294   0.612   0.749
FaceDetect-QAT-StatAssist-GradBoost-fbgemm    0.31    0.64    0.759   0.076s
FaceDetect-QAT-StatAssist-GradBoost-qnnpack   0.307   0.634   0.766   0.058s