The output wav file of the speech enhancement task is distorted.

Thank you for all your help. I tried the vctk_noisy voice enhancement task in egs2. However, the output wav file (e.g. espnet/egs2/vctk_noisy/enh1/exp/enh_train_raw/enhanced_tt_2spk/logdir/output.1/wavs/1/p232_001.wav) is very distorted.

The loss is decreasing, so it seems to be learning.

Is there something wrong with my settings? I would like to know why the audio is distorted and what to do about it.

Top GitHub Comments

Emrys365 commented, Apr 21, 2022

@Emrys365 Thank you very much for your reply.

I have figured out why the output audio is distorted: the distortion appears when I change inference_args="--normalize_output_wav true" to inference_args="--normalize_output_wav false" in enh.sh.

What does normalization mean?

I see. You could check out espnet2/bin/enh_inference.py to understand how it works. Basically, it rescales the estimated audio to lie in the range [-1.0, 1.0]. If your model is trained with a scale-invariant loss, the model output is likely to have a large magnitude, and normalization is needed in this case.
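
To make the reply above concrete, here is a minimal, dependency-free sketch of why a scale-invariant loss can leave the output at an arbitrary magnitude and why rescaling before writing the wav avoids clipping. This is not the code in espnet2/bin/enh_inference.py. The function names (si_snr, peak_normalize, hard_clip), the synthetic signals, and the gain of 50 are all assumptions made for illustration, and hard clipping is only a stand-in for what may happen when out-of-range samples are saved to a fixed-point wav file.

```python
import math


def si_snr(estimate, reference):
    """Scale-invariant SNR in dB between two equal-length sample lists."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy  # projection of the estimate onto the reference
    target = [alpha * r for r in reference]
    noise_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    return 10.0 * math.log10(target_energy / noise_energy)


def peak_normalize(samples, peak=1.0):
    """Rescale samples so the largest absolute value equals `peak`."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)  # silent signal: nothing to rescale
    scale = peak / max_abs
    return [s * scale for s in samples]


def hard_clip(samples, limit=1.0):
    """Stand-in for saving out-of-range samples without normalization."""
    return [max(-limit, min(limit, s)) for s in samples]


SAMPLE_RATE = 16000  # hypothetical sampling rate for the toy signals

# Toy "clean" reference and a slightly imperfect estimate of it.
reference = [0.5 * math.sin(2 * math.pi * 440.0 * n / SAMPLE_RATE)
             for n in range(SAMPLE_RATE)]
residual = [0.05 * math.sin(2 * math.pi * 1000.0 * n / SAMPLE_RATE)
            for n in range(SAMPLE_RATE)]
estimate = [r + d for r, d in zip(reference, residual)]

# The same estimate at a much larger scale: the loss cannot tell them apart.
loud_estimate = [50.0 * e for e in estimate]

clipped = hard_clip(loud_estimate)          # no normalization: waveform is flattened
normalized = peak_normalize(loud_estimate)  # rescaled into [-1.0, 1.0]

print(f"SI-SNR, original scale: {si_snr(estimate, reference):.2f} dB")
print(f"SI-SNR, 50x scale:      {si_snr(loud_estimate, reference):.2f} dB")
print(f"SI-SNR after clipping:  {si_snr(clipped, reference):.2f} dB")
print(f"SI-SNR after rescaling: {si_snr(normalized, reference):.2f} dB")
```

The first two values agree because the metric ignores the overall gain of the estimate. Rescaling preserves the waveform shape, while clipping flattens the peaks, which is one plausible source of the audible distortion described in this thread.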

Emrys365 commented, Apr 14, 2022

What is the amount of training data (say ____ hours)?

The amount of training data is about 8.5 hours.

I think it is too small. If possible, please increase the amount to at least tens of hours.
