Cannot reproduce SimulST from README
Hi all, I'm trying to reproduce your system for simultaneous ST. The README says that, for data preprocessing and ASR pre-training, I should follow the offline S2T part. I've preprocessed the MuST-C data and downloaded the best_checkpoint from the offline ASR pre-training, then tried to launch the wait-k version of the training, but this error arises:
Traceback (most recent call last):
  File "/home/fonorato/anaconda3/envs/myenv/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/home/fonorato/fairseqSLT/fairseq/fairseq_cli/train.py", line 450, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/home/fonorato/fairseqSLT/fairseq/fairseq/distributed/utils.py", line 364, in call_main
    main(cfg, **kwargs)
  File "/home/fonorato/fairseqSLT/fairseq/fairseq_cli/train.py", line 79, in main
    model = task.build_model(cfg.model)
  File "/home/fonorato/fairseqSLT/fairseq/fairseq/tasks/speech_to_text.py", line 110, in build_model
    return super(SpeechToTextTask, self).build_model(args)
  File "/home/fonorato/fairseqSLT/fairseq/fairseq/tasks/fairseq_task.py", line 633, in build_model
    model = models.build_model(args, self)
  File "/home/fonorato/fairseqSLT/fairseq/fairseq/models/__init__.py", line 96, in build_model
    return model.build_model(cfg, task)
  File "/home/fonorato/fairseqSLT/fairseq/fairseq/models/speech_to_text/convtransformer.py", line 189, in build_model
    encoder = cls.build_encoder(args)
  File "/home/fonorato/fairseqSLT/fairseq/fairseq/models/speech_to_text/convtransformer.py", line 161, in build_encoder
    component=encoder, checkpoint=args.load_pretrained_encoder_from
  File "/home/fonorato/fairseqSLT/fairseq/fairseq/checkpoint_utils.py", line 688, in load_pretrained_component_from_model
    component.load_state_dict(component_state_dict, strict=True)
  File "/home/fonorato/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ConvTransformerEncoder:
    Missing key(s) in state_dict: "conv.0.weight", "conv.0.bias", "conv.2.weight", "conv.2.bias", "out.weight", "out.bias".
    Unexpected key(s) in state_dict: "subsample.conv_layers.0.weight", "subsample.conv_layers.0.bias", "subsample.conv_layers.1.weight", "subsample.conv_layers.1.bias", "layer_norm.weight", "layer_norm.bias".
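The mismatch in this error can be reproduced without fairseq at all: the two encoders simply name their parameters differently, so strict state_dict loading fails. A minimal sketch (the key names are taken verbatim from the traceback above; no checkpoint file is needed):

```python
# Keys produced by fairseq's s2t_transformer encoder (the offline ASR model),
# as listed under "Unexpected key(s)" in the traceback.
s2t_keys = {
    "subsample.conv_layers.0.weight", "subsample.conv_layers.0.bias",
    "subsample.conv_layers.1.weight", "subsample.conv_layers.1.bias",
    "layer_norm.weight", "layer_norm.bias",
}

# Keys expected by the convtransformer encoder (the SimulST model),
# as listed under "Missing key(s)" in the traceback.
conv_keys = {
    "conv.0.weight", "conv.0.bias",
    "conv.2.weight", "conv.2.bias",
    "out.weight", "out.bias",
}

# With strict=True, load_state_dict reports exactly these two set differences:
missing = conv_keys - s2t_keys     # expected by the model but absent from the checkpoint
unexpected = s2t_keys - conv_keys  # present in the checkpoint but unknown to the model

print("Missing:", sorted(missing))
print("Unexpected:", sorted(unexpected))
```

Since the two key sets are completely disjoint, every parameter of the convtransformer subsampler is "missing" and every s2t_transformer subsampler parameter is "unexpected", which is exactly what the RuntimeError shows.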
Which architecture size did you use in your paper "SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation"? Is it incompatible with the s2t_transformer_s architecture whose checkpoints are downloadable from the README of the offline part?
I've also noticed that the link in the simultaneous README pointing to the offline README is broken, because you've recently moved the offline speech-to-text part from the examples to the main part of the repository.
Thank you
Issue Analytics
- Created: 3 years ago
- Comments: 11 (8 by maintainers)
Top GitHub Comments
Thanks for the detailed information. Did you run the unzip command on the model file we provided? It looks like the model file is loaded successfully. The error you got is mainly because we had a bug in the databin and the agent. We just updated the databin and fixed some bugs in dd74992, with new instructions on how to evaluate a pretrained model. Could you please re-download the databin and follow the new instructions?
Thank you so much for the feedback, and sorry for the inconvenience.
@sarapapi, sorry for the late reply. The main difference is that the subsampler is slightly different; compare https://github.com/pytorch/fairseq/blob/master/fairseq/models/speech_to_text/s2t_transformer.py#L50 with https://github.com/pytorch/fairseq/blob/master/fairseq/models/speech_to_text/convtransformer.py#L241. You cannot load the pre-trained s2t_transformer checkpoint and use it to initialize the encoder of convtransformer.py. While we prepare the checkpoint, I recommend following the instructions (including ASR pre-training).