Suggestion: add a more detailed data-loading pipeline explanation to the fine-tuning Colab tutorial
It’s been great to see more tutorials being added for learning SpeechBrain. In the older tutorial Pretrained Models and Fine-Tuning with HuggingFace, I noticed a slight inconsistency between how the data is loaded when using the pre-trained model and when fine-tuning it.
When using the pre-trained model to transcribe a file, the audio is loaded through the EncoderDecoderASR
interface, which 1) loads the audio with torchaudio and 2) normalizes it with an AudioNormalizer.
In the fine-tuning section, the audio is loaded with `dataset.add_dynamic_item(sb.dataio.dataio.read_audio, takes="file_path", provides="signal")`, and no normalizer is applied inside that function.
Since decoding relies on normalized audio, wouldn’t it be more natural to fine-tune the model on normalized audio rather than on the raw signal?
It might also be useful to add a section about how to use the fine-tuned model to transcribe a file.
Issue Analytics
- Created: 2 years ago
- Comments: 6
Top GitHub Comments
I am going to fix this today; thank you @ziz19 for pointing this out.
@popcornell Thank you so much for the explanation. GitHub username is perfectly fine! SpeechBrain is so much easier to use compared to other frameworks. Thank you all for the hard work! I’ll keep using SpeechBrain now and in the future.