Documentation: Audio + Text Feature Extraction

The usage instructions are missing some information on the feature preprocessing step.

It would be helpful to give more exact instructions on how to extract spectrograms and phoneme features. Does the code expect phoneme .lab files from Festival, as indicated in r9y9’s deepvoice code? https://github.com/r9y9/deepvoice3_pytorch/tree/master/vctk_preprocess
Is it possible to use the deepvoice code to get the exact features expected by nonparaSeq2SeqVC? If so, which scripts are needed?

I’m going to try out the code now, and I’ll send PRs on documentation when I’m confident I can add something.

Thanks!

Issue Analytics

Created 4 years ago
Comments: 26 (21 by maintainers)

Top GitHub Comments

jxzhanggg commented, Dec 20, 2019

See commit ab1f8d4 for cleaning the text alignments.

jxzhanggg commented, Dec 19, 2019

Alignments are not necessary. Although I extracted the alignments, I didn’t actually use them during training, so all you need to prepare the training data is the phoneme sequences. That said, I used https://github.com/r9y9/deepvoice3_pytorch/blob/master/vctk_preprocess/prepare_vctk_labels.py to get my phoneme labels. Once all the paths and environments are configured, it should produce cut phoneme labels like these:

0 750000 p
750000 1050000 l
1050000 3150000 iy
3150000 5600000 z
5600000 5750000 k
5750000 8450000 ao
8450000 8600000 l
8600000 9950000 s
9950000 10249999 t
10249999 12950000 eh
12950000 13200000 l
13200000 14750000 ax
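
Since only the phoneme sequence is needed for training, the label lines above can be reduced to a plain list of phonemes. The following is a minimal sketch, not the preprocessing code from the repository: the function names `parse_label_lines` and `phoneme_sequence` are hypothetical, and it assumes each line has the form `start end phoneme` with times in HTK-style 100 ns units (which would put the first `p` at 0 to 0.075 s).

```python
from typing import Iterable, List, Tuple

# Assumption: times follow the HTK label convention of 100 ns units.
HTK_UNITS_PER_SECOND = 10_000_000

Segment = Tuple[float, float, str]  # (start_sec, end_sec, phoneme)


def parse_label_lines(lines: Iterable[str]) -> List[Segment]:
    """Parse 'start end phoneme' lines into (start_sec, end_sec, phoneme)."""
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, phoneme = parts
        segments.append(
            (int(start) / HTK_UNITS_PER_SECOND,
             int(end) / HTK_UNITS_PER_SECOND,
             phoneme)
        )
    return segments


def phoneme_sequence(segments: List[Segment]) -> List[str]:
    """Drop the alignment times and keep only the phoneme symbols."""
    return [phoneme for _, _, phoneme in segments]


# The label lines quoted in the comment above.
SAMPLE = """\
0 750000 p
750000 1050000 l
1050000 3150000 iy
3150000 5600000 z
5600000 5750000 k
5750000 8450000 ao
8450000 8600000 l
8600000 9950000 s
9950000 10249999 t
10249999 12950000 eh
12950000 13200000 l
13200000 14750000 ax
"""

segments = parse_label_lines(SAMPLE.splitlines())
print(" ".join(phoneme_sequence(segments)))
# prints: p l iy z k ao l s t eh l ax
```

To read labels from disk instead, the same function can be given an open file object, since it only iterates over lines.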
