The output wav file of the speech enhancement task is distorted.

Thank you for all your help. I tried the vctk_noisy speech enhancement recipe in egs2. However, the output wav file (e.g. espnet/egs2/vctk_noisy/enh1/exp/enh_train_raw/enhanced_tt_2spk/logdir/output.1/wavs/1/p232_001.wav) is heavily distorted.

[Attachment: p232_001]

The loss is decreasing, so the model seems to be learning.

Is there something wrong with my settings? I would like to know why the audio is distorted and what to do about it.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12

Top GitHub Comments

Emrys365 commented, Apr 21, 2022

> @Emrys365 Thank you very much for your reply.
>
> I have figured out why the output audio is distorted. If I change inference_args="--normalize_output_wav true" to inference_args="--normalize_output_wav false" in enh.sh, the output audio is distorted.
>
> What does normalization mean?

I see. You could check out espnet2/bin/enh_inference.py to understand how it works. Basically, it rescales the estimated audio to the range [-1.0, 1.0]. If your model is trained with a scale-invariant loss, the model output will likely have a large magnitude, so normalization is needed in this case.
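To illustrate the reply above, here is a minimal NumPy sketch. It is not the actual code in espnet2/bin/enh_inference.py, whose exact normalization may differ, and it assumes SI-SNR as the example of a scale-invariant loss, since the issue does not say which loss the recipe uses. It shows three things: SI-SNR gives the same score for an estimate and a 30 times louder copy, so training does not constrain the output level; a simulated 16-bit conversion then clips most samples of the loud waveform (one plausible cause of the audible distortion); and peak normalization brings the waveform back into [-1.0, 1.0]. The helpers `si_snr`, `normalize_peak`, and `to_int16` are hypothetical names.

```python
import numpy as np


def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB (hypothetical helper; a training loss would use its negative)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)))


def normalize_peak(wave: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical helper: rescale so the largest absolute sample is (just under) 1.0."""
    return wave / (np.max(np.abs(wave)) + eps)


def to_int16(wave: np.ndarray) -> np.ndarray:
    """Simulate a 16-bit PCM conversion with explicit clipping (real writers may clip or wrap)."""
    return (np.clip(wave, -1.0, 1.0) * 32767).astype(np.int16)


rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = 0.5 * np.sin(2 * np.pi * 220 * t)
estimate = clean + 0.01 * rng.standard_normal(t.size)

# A scale-invariant metric cannot tell these apart, so training does not pin down the output level.
loud_estimate = 30.0 * estimate
print(si_snr(estimate, clean), si_snr(loud_estimate, clean))  # (nearly) identical

# Without normalization, most samples of the loud estimate saturate at the int16 limits.
clipped_fraction = np.mean(np.abs(to_int16(loud_estimate)) == 32767)
print(f"clipped samples without normalization: {clipped_fraction:.1%}")

# After peak normalization the waveform fits in [-1.0, 1.0], so nothing is clipped.
normalized = normalize_peak(loud_estimate)
print(np.max(np.abs(normalized)) <= 1.0)
```

Peak normalization is itself scale-invariant, so it removes whatever arbitrary gain the model happened to learn; the trade-off is that the output level is fixed near full scale and no longer reflects the level of the input mixture.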

Emrys365 commented, Apr 14, 2022

> What is the amount of training data (say, ____ hours)?
>
> The amount of training data is about 8.5 hours.

I think that is too little. If possible, please increase it to at least several tens of hours.
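If you are not sure how many hours your training set contains, one rough check is to sum the utterance durations. The sketch below uses only Python's standard `wave` module and assumes a Kaldi-style `wav.scp` (one `<utt-id> <path>` pair per line) whose entries are plain wav file paths; piped commands and other audio formats are not handled, and both function names are hypothetical.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM wav file, computed from its header (frames / sample rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def total_hours_from_wav_scp(wav_scp: str) -> float:
    """Sum the durations of all utterances listed in a Kaldi-style wav.scp, in hours.

    Assumption: every line is '<utt-id> <path-to-wav>'; piped commands are not supported.
    """
    total_seconds = 0.0
    for line in Path(wav_scp).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        _utt_id, path = line.split(maxsplit=1)
        total_seconds += wav_duration_seconds(path.strip())
    return total_seconds / 3600.0
```

For example, `total_hours_from_wav_scp("data/<train_set>/wav.scp")`, where `<train_set>` is a placeholder for your own training data directory.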
