from_pretrained() does not update configuration in exp_manager
See original GitHub issue.

Describe the bug
When fine-tuning from a NeMo model (e.g. stt_en_cn1024), the exp_manager's cfg is not updated properly. In my run the model uses one config, but WandB reports another.
This issue did not occur in v1.4.0 and appeared after I upgraded to v1.5.0. Perhaps it has to do with the order of operations? See below.
Steps/Code to reproduce bug
import pytorch_lightning as pl

from nemo.collections.asr.models import EncDecCTCModelBPE
from nemo.core.config import hydra_runner
from nemo.utils.exp_manager import exp_manager


@hydra_runner(config_path="conf/citrinet/", config_name="config")
def main(cfg):
    trainer = pl.Trainer(**cfg.trainer)
    log_dir = exp_manager(trainer, cfg.get("exp_manager", None))

    asr_model = EncDecCTCModelBPE.from_pretrained(model_name=cfg.init_from_pretrained_model)
    asr_model.encoder.unfreeze()
    asr_model.change_vocabulary(
        new_tokenizer_dir=cfg.model.tokenizer.dir,
        new_tokenizer_type=cfg.model.tokenizer.type,
    )

    asr_model.setup_optimization(cfg.model.optim)
    asr_model.setup_training_data(cfg.model.train_ds)
    asr_model.setup_multiple_validation_data(cfg.model.validation_ds)
    asr_model.spec_augmentation = asr_model.from_config_dict(cfg.model.spec_augment)
    asr_model.set_trainer(trainer)

    trainer.fit(asr_model)


if __name__ == "__main__":
    main()
Expected behavior
The WandB cfg should display the proper config (Pastebin of the WandB config).
Environment overview (please complete the following information)
- Environment location: Docker (nvcr.io/nvidia/pytorch:21.10-py3) on AWS EC2 using
docker run -it <image> bash
- Method of NeMo install:
pip install nemo_toolkit[asr]==1.5.1
Additional context
- GPU model: V100
- NVIDIA driver: 460
Issue Analytics
- Created: 2 years ago
- Reactions: 1
- Comments: 16 (9 by maintainers)
Thanks for digging into this! This is a very interesting "bug": it affects cases where you load a model to fine-tune, modify its config, then save and retrain. It's not an invalid use case, so we should at least notify the PTL team of this.
On the NeMo side I'll patch up a solution for PTL 1.6 (inside the model.cfg setter) and await PTL's official solution for the NeMo 1.7 release.
Setup metric is kind of niche: you'll normally never modify metrics. ASR is unusual in that the domain can change quite drastically (English to Mandarin, for example), so of course the evaluation metric needs to change entirely (WER to CER). NLP/NMT has a universal metric like accuracy or BLEU which works without modification. I'll discuss with the ASR team whether they think a specialized setup_metric() would be worth it. Thanks for the feedback!
Oh, I think I've got the issue: the config passed to the main method is a new config, but the model is restored with the old config embedded in the .nemo file. You should be able to do model.cfg = cfg before your train step to correctly update the model's internal config. That's a good catch; I believe only one of the notebooks shows this. Might as well add it to the documentation.
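The fix can be illustrated with a minimal, NeMo-free sketch (all names here are hypothetical stand-ins, not NeMo APIs): a restored model keeps the config embedded in its checkpoint, so the logger reports stale values until model.cfg is reassigned.

```python
# Hypothetical stand-in for a model restored from a .nemo checkpoint:
# it carries the config that was embedded at save time.
class RestoredModel:
    def __init__(self, embedded_cfg):
        self.cfg = embedded_cfg  # config baked into the checkpoint


def logged_config(model):
    # Stand-in for what exp_manager/WandB report: whatever the model
    # currently holds, not what was passed on the command line.
    return model.cfg


old_cfg = {"optim": {"lr": 0.05}}   # embedded in the checkpoint
new_cfg = {"optim": {"lr": 0.001}}  # cfg passed to main()

model = RestoredModel(old_cfg)
assert logged_config(model) == old_cfg  # stale config is reported

model.cfg = new_cfg                     # the suggested fix
assert logged_config(model) == new_cfg  # logger now sees the new cfg
```

This mirrors why the repro script above logs the wrong config: nothing ever copies the Hydra cfg into the restored model.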
Still, the trainer should be set before calling any other setup function, or passed into the restore_from or from_pretrained call itself.
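A hypothetical sketch of that ordering constraint (plain Python, no NeMo; the class and error are illustrative assumptions): setup calls that depend on a trainer should fail fast when set_trainer() has not run yet.

```python
# Illustrative model enforcing "trainer first, then setup calls".
class Model:
    def __init__(self):
        self.trainer = None

    def set_trainer(self, trainer):
        self.trainer = trainer

    def setup_optimization(self, optim_cfg):
        # Guard: a trainer must be attached before optimizer setup.
        if self.trainer is None:
            raise RuntimeError("set_trainer() must be called first")
        return optim_cfg


m = Model()
try:
    m.setup_optimization({"lr": 1e-3})  # wrong order, rejected
except RuntimeError:
    pass

m.set_trainer("trainer")  # stand-in for a pl.Trainer instance
assert m.setup_optimization({"lr": 1e-3}) == {"lr": 1e-3}
```

In the repro script above, set_trainer() is called last; moving it up (or passing the trainer into from_pretrained) avoids this class of problem.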