Error in Training
Issue Description
When I transferred StatAssist-GradBoost to my face detection task, an error occurred during the training phase. The log is below:
Epoch: [6] [ 0/202] eta: 0:25:54 lr: 0.001 img/s: 40.46101009839138 loss: 53.6955 (53.6955) losses: 53.6955 (53.6955) box losses: 4.2484 (4.2484) class losses: 3.2239 (3.2239) landmark losses: 41.9749 (41.9749) time: 7.6959 data: 6.1140 max mem: 4881
Traceback (most recent call last):
  File "train_quantization_sg.py", line 586, in <module>
    main(args)
  File "train_quantization_sg.py", line 419, in main
    args.print_freq, priors)
  File "train_quantization_sg.py", line 100, in train_one_epoch
    output = model(image)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt1/FaceDetect-QAT/models/qat_slim.py", line 29, in forward
    x)
  File "/opt1/FaceDetect-QAT/models/net_slim.py", line 131, in _forward_impl
    x1 = self.conv1(inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/intrinsic/qat/modules/conv_fused.py", line 243, in forward
    return self.activation_post_process(F.relu(ConvBn2d._forward(self, input)))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/intrinsic/qat/modules/conv_fused.py", line 95, in _forward
    conv = self._conv_forward(input, self.weight_fake_quant(scaled_weight))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/fake_quantize.py", line 81, in forward
    _scale, _zero_point = self.calculate_qparams()
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/fake_quantize.py", line 76, in calculate_qparams
    return self.activation_post_process.calculate_qparams()
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/observer.py", line 510, in calculate_qparams
    return self._calculate_per_channel_qparams(self.min_vals, self.max_vals)
  File "/opt/conda/lib/python3.7/site-packages/torch/quantization/observer.py", line 148, in _calculate_per_channel_qparams
    assert (torch.sum(diff) == len(diff)), "min_vals should be less than max_vals for indices."
AssertionError: min_vals should be less than max_vals for indices.
In addition, training runs for a few epochs before this error appears.
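One possible way to narrow this down, offered as a guess and not confirmed anywhere in this thread: the assertion message suggests that every channel must satisfy min <= max, and that comparison is also false when the observed values are NaN, because any comparison with NaN is false. If the weights diverged to NaN or inf after a few epochs, the observer would trip this check. The sketch below is plain Python and only re-creates the condition for illustration; `channels_failing_check` and `nonfinite_entries` are hypothetical helper names, not part of PyTorch.

```python
import math


def channels_failing_check(min_vals, max_vals):
    """Return indices of channels where `min <= max` does not hold.

    This re-creates the condition suggested by the assertion message;
    it is an illustration, not PyTorch's own code.
    """
    return [
        i
        for i, (lo, hi) in enumerate(zip(min_vals, max_vals))
        if not (lo <= hi)
    ]


def nonfinite_entries(named_values):
    """Yield (name, count) for each entry that contains NaN or inf.

    `named_values` is a hypothetical iterable of (name, flat list of floats).
    """
    for name, values in named_values:
        bad = sum(1 for v in values if not math.isfinite(v))
        if bad:
            yield name, bad


nan = float("nan")

# Channel 1 has NaN statistics, so `nan <= nan` is False and it fails.
failing = channels_failing_check([-0.5, nan, -0.1], [0.4, nan, 0.3])

# Made-up parameter names and values, purely for demonstration.
report = list(
    nonfinite_entries(
        [
            ("conv1.weight", [0.1, nan, -0.2]),
            ("conv2.weight", [0.3, 0.4]),
        ]
    )
)
print(failing, report)
```

With a real model, the same helper could be fed from `model.named_parameters()`, for example by passing `(name, p.detach().flatten().tolist())` pairs, just before the failing forward call. If it reports non-finite weights, the cause is more likely in training stability (learning rate, loss scale) than in the quantization code itself.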
Issue Analytics
- State:
- Created 3 years ago
- Comments:12
@yjyoo3312 One question unrelated to this issue: have you compared TF2 QAT with PyTorch QAT? :)
@yjyoo3312 I have invited you to my private repo, because the repo is not ready to be made public yet :) Thanks again. Below is my test result: