Failed resume on INT8 w/ 60% of sparsity model

Describe the bug

Missing some keys when resuming the model.

Resnet50:

RuntimeError: Error(s) when loading model parameters:
        Missing key(s):
                "module.module.conv1.pre_ops.0.op._mask",
                "module.module.conv1.pre_ops.0.op.uniform",
                "module.module.layer1.0.conv1.pre_ops.0.op._mask",
                "module.module.layer1.0.conv1.pre_ops.0.op.uniform",
                "module.module.layer1.0.conv2.pre_ops.0.op._mask",
...

Inception_v3:

RuntimeError: Error(s) when loading model parameters:
        Missing key(s):
                "module.module.Conv2d_1a_3x3.conv.pre_ops.0.op._mask",
                "module.module.Conv2d_1a_3x3.conv.pre_ops.0.op.uniform",
                "module.module.Conv2d_2a_3x3.conv.pre_ops.0.op._mask",
                "module.module.Conv2d_2a_3x3.conv.pre_ops.0.op.uniform",
                "module.module.Conv2d_2b_3x3.conv.pre_ops.0.op._mask",
...

Steps to Reproduce

Following the README, download the pre-trained INT8 models with 60% sparsity (resnet50 and inception_v3), then run the commands below to resume each model and convert it to .onnx.

Resnet50:

python3 main.py -m test --config=configs/sparsity_quantization/inceptionV3_imagenet_sparsity_int8.json --resume=inceptionV3_imagenet_sparsity_int8.pth --to-onnx=resnet50_sparse_int8.onnx

Inception_v3:

python3 main.py -m test --config=configs/sparsity_quantization/inceptionV3_imagenet_sparsity_int8.json --resume=inceptionV3_imagenet_sparsity_int8.pth --to-onnx=inceptionV3_sparse_int8.onnx

Environment:

  • OS: Linux Ubuntu 16.04
  • Framework version: PyTorch 1.3.1
  • Python version: 3.6.7
  • OpenVINO version: 2019 R3.1
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: 11GB * 2

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
vshampor commented, Nov 13, 2019

Greetings, @FengYen-Chang!

This is once again a mismatch between the exported .pth checkpoint format and the current NNCF state, sorry about that. Still, you can convert the model to .onnx: when passing the source .pth checkpoint to the scripts, try the --weights key instead of --resume.
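As a rough illustration of that swap, the Python sketch below rewrites the Inception_v3 command from the report so that the checkpoint is passed through --weights. The helper name `swap_resume_for_weights` is made up for this example; only the flags and file names come from this thread.

```python
import shlex
from typing import List


def swap_resume_for_weights(argv: List[str]) -> List[str]:
    """Return a copy of argv with --resume replaced by --weights.

    Hypothetical helper for illustration; it is not part of the NNCF scripts.
    """
    swapped = []
    for arg in argv:
        if arg == "--resume":
            # Form: --resume <path>
            swapped.append("--weights")
        elif arg.startswith("--resume="):
            # Form: --resume=<path>
            swapped.append("--weights=" + arg[len("--resume="):])
        else:
            swapped.append(arg)
    return swapped


# The Inception_v3 command exactly as reported in the issue.
original = shlex.split(
    "python3 main.py -m test "
    "--config=configs/sparsity_quantization/inceptionV3_imagenet_sparsity_int8.json "
    "--resume=inceptionV3_imagenet_sparsity_int8.pth "
    "--to-onnx=inceptionV3_sparse_int8.onnx"
)

fixed = swap_resume_for_weights(original)
print(shlex.join(fixed))
```

The printed line is the same command with `--weights=inceptionV3_imagenet_sparsity_int8.pth` in place of the `--resume=` argument, ready to paste into a shell.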

0 reactions
vshampor commented, Nov 14, 2019

@FengYen-Chang, --resume strictly checks the parameters in the loaded checkpoint against what the model instantiated inside PyTorch requires, while --weights only does best-effort parameter loading. --weights is indispensable for compression fine-tuning that starts from a full-precision, uncompressed model. --resume, on the other hand, is used to continue training with the same config and training script after the training process has been interrupted. It is also used for evaluation runs in -m test mode with one of the example checkpoints linked from README.md, to be sure that we evaluate the same model that is being instantiated inside PyTorch.

We aim to make every published checkpoint evaluable via --resume, but sadly this is not yet the case: the Python model code has changed between the time the published checkpoints were trained and the current NNCF state. Still, the --weights workaround works for the most part.
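The difference between the two loading policies described above can be sketched in plain Python. This is only an illustration under stated assumptions, not NNCF's actual loader: `load_params`, the `module.module.conv1.weight` key, and all values are placeholders, while the two `pre_ops` keys are taken from the error message in the report.

```python
from typing import Dict, List


def load_params(model: Dict[str, object],
                checkpoint: Dict[str, object],
                strict: bool) -> List[str]:
    """Copy checkpoint values into `model` and return the keys that were missing."""
    missing = [key for key in model if key not in checkpoint]
    if strict and missing:
        # Strict policy: refuse to load anything if the checkpoint is incomplete.
        raise RuntimeError("Missing key(s): " + ", ".join(missing))
    # Best-effort policy: load what matches, leave the rest at its initial value.
    for key in model:
        if key in checkpoint:
            model[key] = checkpoint[key]
    return missing


# Hypothetical parameter sets; values are placeholders, not real tensors.
model = {
    "module.module.conv1.weight": 0.0,
    "module.module.conv1.pre_ops.0.op._mask": 0.0,
    "module.module.conv1.pre_ops.0.op.uniform": 0.0,
}
checkpoint = {"module.module.conv1.weight": 1.0}

try:
    load_params(dict(model), checkpoint, strict=True)
except RuntimeError as err:
    print("strict:", err)

loaded = dict(model)
skipped = load_params(loaded, checkpoint, strict=False)
print("best effort skipped:", skipped)
```

PyTorch's own `torch.nn.Module.load_state_dict` exposes a similar `strict` switch; whether the two flags map onto it exactly this way inside the scripts is an assumption.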
