The output wav file of the speech enhancement task is distorted.
Thank you for all your help.
I tried the vctk_noisy voice enhancement task in egs2.
However, the output wav file (e.g. espnet/egs2/vctk_noisy/enh1/exp/enh_train_raw/enhanced_tt_2spk/logdir/output.1/wavs/1/p232_001.wav) is very distorted.
The loss is decreasing, so it seems to be learning.
Is there something wrong with my settings? I would like to know why the audio is distorted and what to do about it.
Issue Analytics
- State:
- Created a year ago
- Comments:12
Top GitHub Comments
I see. You could check out espnet2/bin/enh_inference.py to understand how it works. Basically, it rescales the estimated audio to be in the range [-1.0, 1.0]. If your model is trained with a scale-invariant loss, the model output is likely to have a large magnitude, and normalization is needed in this case.

I think it is too small. If possible, please increase the amount to at least tens of hours.
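
To illustrate the rescaling point from the first comment, here is a minimal NumPy sketch of why an unnormalized estimate can sound distorted. It is not the code in espnet2/bin/enh_inference.py: the helper name normalize_peak, the peak-based scaling, and the synthetic 440 Hz signal are all assumptions made for illustration. The idea is that a model trained with a scale-invariant loss may produce samples far outside [-1.0, 1.0], which get clipped if written out directly, while dividing by the peak keeps the waveform shape intact.

```python
import numpy as np


def normalize_peak(wave, eps=1e-8):
    """Rescale a waveform so its largest absolute sample is 1.0.

    Hypothetical helper for illustration; not ESPnet's implementation.
    """
    wave = np.asarray(wave, dtype=np.float64)
    if wave.size == 0:
        return wave
    peak = np.max(np.abs(wave))
    if peak < eps:
        # Silent (or near-silent) signal: nothing to rescale.
        return wave
    return wave / peak


# Stand-in for a model output with an arbitrary, large scale (assumed values).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
estimate = 37.5 * np.sin(2 * np.pi * 440.0 * t)

# Hard-limiting to [-1, 1] flattens most of the waveform, which is heard
# as heavy distortion.
clipped = np.clip(estimate, -1.0, 1.0)

# Peak normalization keeps the shape and only changes the overall gain.
rescaled = normalize_peak(estimate)

print("fraction of clipped samples:", np.mean(np.abs(clipped) == 1.0))
print("peak after rescaling:", np.max(np.abs(rescaled)))
```

If the distorted files come from a custom export path, comparing the peak value of the raw estimate against 1.0 in this way is a quick check of whether scaling, rather than the model itself, is the cause.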