Failed to resume INT8 model with 60% sparsity
Describe the bug
Some keys are missing when resuming the model.
Resnet50:
RuntimeError: Error(s) when loading model parameters:
Missing key(s):
"module.module.conv1.pre_ops.0.op._mask",
"module.module.conv1.pre_ops.0.op.uniform",
"module.module.layer1.0.conv1.pre_ops.0.op._mask",
"module.module.layer1.0.conv1.pre_ops.0.op.uniform",
"module.module.layer1.0.conv2.pre_ops.0.op._mask",
...
Inception_v3:
RuntimeError: Error(s) when loading model parameters:
Missing key(s):
"module.module.Conv2d_1a_3x3.conv.pre_ops.0.op._mask",
"module.module.Conv2d_1a_3x3.conv.pre_ops.0.op.uniform",
"module.module.Conv2d_2a_3x3.conv.pre_ops.0.op._mask",
"module.module.Conv2d_2a_3x3.conv.pre_ops.0.op.uniform",
"module.module.Conv2d_2b_3x3.conv.pre_ops.0.op._mask",
...
Steps to Reproduce
Following the README, download the pre-trained INT8 models with 60% sparsity (resnet50 and inception_v3), then run the commands below to resume each model and convert it to .onnx.
Resnet50:
python3 main.py -m test --config=configs/sparsity_quantization/resnet50_imagenet_sparsity_int8.json --resume=resnet50_imagenet_sparsity_int8.pth --to-onnx=resnet50_sparse_int8.onnx
Inception_v3:
python3 main.py -m test --config=configs/sparsity_quantization/inceptionV3_imagenet_sparsity_int8.json --resume=inceptionV3_imagenet_sparsity_int8.pth --to-onnx=inceptionV3_sparse_int8.onnx
Environment:
- OS: Linux Ubuntu 16.04
- Framework version: PyTorch 1.3.1
- Python version: 3.6.7
- OpenVINO version: 2019 R3.1
- CUDA/cuDNN version: 10.1
- GPU model and memory: 11GB * 2
Issue Analytics
- State:
- Created 4 years ago
- Comments: 8 (8 by maintainers)
Top GitHub Comments
Greetings, @FengYen-Chang !
This is once again an issue with a mismatch between the exported .pth checkpoint format and the current NNCF state, sorry about that. Still, you can convert the model to .onnx: try passing the source .pth checkpoint to the scripts via the `--weights` key instead of `--resume`.

@FengYen-Chang , `--resume` does strict checks of the loaded checkpoint parameters against what the model instantiated inside PyTorch requires, while `--weights` just does best-effort parameter loading. `--weights` is indispensable when starting compression fine-tune training from a full-precision uncompressed model; `--resume`, on the other hand, is used for continuing training with the same config/training script if the training process was interrupted for some reason, and also during evaluation runs in `-m test` mode with one of the example checkpoints linked in README.md, to be sure that we evaluate the same model that is instantiated inside PyTorch.

We aim to make all published checkpoints evaluable via `--resume`, but sadly this is not yet the case due to mismatches between the Python model code at the time the published checkpoints were trained and the current NNCF state; still, the `--weights` workaround works for the most part.
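The strict-vs-best-effort distinction the maintainers describe maps onto the `strict` flag of PyTorch's `load_state_dict`. A minimal sketch below reproduces the failure mode: the `Net` module and its `mask` buffer are illustrative stand-ins, not NNCF's actual classes; the extra buffer plays the role of the `pre_ops.0.op._mask` / `uniform` entries that newer model code expects but an older checkpoint lacks.

```python
import torch
import torch.nn as nn

# Toy module with an extra registered buffer, standing in for the
# "pre_ops.0.op._mask" / "uniform" state that the current model code
# expects but an older exported checkpoint does not contain.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.register_buffer("mask", torch.ones(8))  # assumed extra state

model = Net()

# An "old" checkpoint: convolution parameters only, no "mask" buffer.
old_ckpt = {
    "conv.weight": torch.zeros_like(model.conv.weight),
    "conv.bias": torch.zeros_like(model.conv.bias),
}

# --resume-style loading: strict=True (the default) raises RuntimeError
# listing the missing key(s), as in the tracebacks above.
try:
    model.load_state_dict(old_ckpt)
    strict_failed = False
except RuntimeError:
    strict_failed = True

# --weights-style loading: strict=False reports the mismatch but loads
# whatever parameters do match.
result = model.load_state_dict(old_ckpt, strict=False)

print(strict_failed)        # True
print(result.missing_keys)  # ['mask']
```

The non-strict call still loads the matching `conv.weight`/`conv.bias` tensors, which is why `--weights` lets the conversion proceed even though the checkpoint predates the current model code.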