Loading finetuned checkpoint. Local method not working.
Hi! For some time I thought this was the correct way to load a pretrained named model (such as `tera`, `mockingjay`, `cpc`), as provided in the `example_extract.py` files:
ckpt_path = '/path/to/valid/ckpt-1337steps.pt'
Upstream_local = getattr(importlib.import_module('hubconf'), 'tera')
model = Upstream_local(ckpt=ckpt_path).to(device)
But then I noticed that it never uses `ckpt` to load the model. I had been using `<model_name>` instead of `<model_name>_local` (as written in the examples) because the latter method fails with the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-23-44af8ca274e3> in <module>()
11 ckpt_root = '/content/TeraFineTuned/'
12 Upstream_local = getattr(importlib.import_module('hubconf'), 'tera_local')
---> 13 model = Upstream_local(ckpt=ckpt_root+"states-150000.ckpt").to(device)
3 frames
/content/S3PRL/upstream/tera/hubconf.py in tera_local(ckpt, feature_selection, *args, **kwargs)
15 if feature_selection is None:
16 feature_selection = -1
---> 17 return _UpstreamExpert(ckpt, feature_selection, *args, **kwargs)
18
19
/content/S3PRL/upstream/mockingjay/expert.py in __init__(self, ckpt, feature_selection, **kwargs)
37 'permute_input' : 'False' }
38
---> 39 self.transformer = PretrainedTransformer(options, inp_dim=-1)
40 assert hasattr(self.transformer, 'extracter'), 'This wrapper only supports `on-the-fly` ckpt with built in feature extracters.'
41
/content/S3PRL/upstream/mockingjay/builder.py in __init__(self, options, inp_dim, config, online_config, verbose)
255 """
256 def __init__(self, options, inp_dim, config=None, online_config=None, verbose=False):
--> 257 super(PretrainedTransformer, self).__init__(options, inp_dim, config, online_config, verbose)
258
259 # Build model
/content/S3PRL/upstream/mockingjay/builder.py in __init__(self, options, inp_dim, config, on_the_fly_config, verbose)
60
61 # Set model config
---> 62 self.model_config = TransformerConfig(self.config['transformer'])
63 self.hidden_size = self.model_config.hidden_size
64 self.num_layers = self.model_config.num_hidden_layers
KeyError: 'transformer'
I think this is a bug. Is there any valid way to load a fine-tuned checkpoint for now?
Edit: the same error occurs when using `torch.hub.load`:
model = torch.hub.load('s3prl/s3prl', 'tera_local', ckpt=ckpt_path).to(device)
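In case it helps with debugging, this is the kind of inspection I can run on my side (a minimal sketch; I am assuming the fine-tuned checkpoint is a plain torch-serialized dictionary, and the path is the same one as in the traceback above). The `tera_local` builder apparently indexes a stored config with `['transformer']`, so listing the checkpoint's top-level keys should show which part is missing from my fine-tuned file:

```python
import torch

# Diagnostic only (assumption: the fine-tuned .ckpt is an ordinary torch-serialized dict,
# like the pretraining checkpoints). List its top-level keys to compare against what the
# builder expects when it raises KeyError: 'transformer'.
ckpt_path = '/content/TeraFineTuned/states-150000.ckpt'
state = torch.load(ckpt_path, map_location='cpu')

print(type(state))
if isinstance(state, dict):
    for key, value in state.items():
        print(key, type(value))
```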
Hey,
This sounds interesting: perhaps `unsupervised pretraining -> supervised finetuning` has some essential difference from `supervised pretraining from scratch`, and we can examine whether the former yields better results. Sure, then I will come up with a way to support this.

Hi, thank you for your interest.
In my opinion, since tera and mockingjay are transformer representations learned via a reconstruction loss on predicting mels, their representations are robust and of high quality for reproducing mels (and hence speech). But if I need to extract only features dependent on timbre/acoustics, or only linguistic features, these representations can be improved by finetuning, thus decreasing the leakage of other (unneeded) features.
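To make the objective I am referring to concrete, here is a toy sketch of masked mel reconstruction (my own simplified illustration, not the actual S3PRL training code; the masking strategy, ratio, and loss layout are assumptions):

```python
import torch
import torch.nn.functional as F

def toy_masked_reconstruction_loss(encoder, mels, mask_ratio=0.15):
    """Toy objective: hide some mel frames and train the encoder to predict them back (L1 loss)."""
    # mels: (batch, time, n_mels)
    mask = torch.rand(mels.shape[:2], device=mels.device) < mask_ratio  # (batch, time)
    corrupted = mels.masked_fill(mask.unsqueeze(-1), 0.0)               # zero out the masked frames
    reconstructed = encoder(corrupted)                                  # expected shape: (batch, time, n_mels)
    return F.l1_loss(reconstructed[mask], mels[mask])
```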
In my experiment, I wonder whether my model's quality could be improved if the representations used as input PPG features (basically linguistic only, without acoustics) were produced by a finetuned tera/mockingjay (currently it uses the pre-trained one). Here I denote as PPG features the hidden states of a model trained for ASR or phone recognition tasks.
Also, the above applies to other models as well (cpc, apc, wav2vec2, etc.).
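For completeness, this is roughly how I intend to use such a finetuned upstream once `_local` loading works: pass the waveforms through it and feed the hidden representations to my downstream model as PPG-like features. The snippet mirrors the upstream usage from `example_extract.py` as I understand it (a list of waveform tensors in, features out), but please treat the exact interface as my assumption.

```python
import torch

def extract_ppg_like_features(upstream, wavs):
    """Run waveforms through a (finetuned) upstream and return its hidden representations.

    'upstream' stands for the finetuned tera/mockingjay model once it can be loaded
    via the *_local entry; 'wavs' is a list of 1-D waveform tensors, following the
    example_extract.py usage as I understand it.
    """
    upstream.eval()
    with torch.no_grad():
        return upstream(wavs)  # hidden representations used as PPG-like (linguistic) features

# Dummy call with random 1-second waveforms (16 kHz), just to show the intended usage:
# features = extract_ppg_like_features(model, [torch.randn(16000) for _ in range(4)])
```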