Force Align text and Audio (dataset)
Hi @BenAAndrew, I am at the step where I am trying to align the text and audio of an audiobook. I obtained both the audio and the text from Amazon Audible. Unfortunately, I was not able to assign the help label to this issue; I don't think I have permission to do that.
- using virtualenv
To get align.py working, I had to modify it, after which I was able to run the file. The modified part of the file is below. The Screenshots section also shows how I am trying to run it.
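```python
import os
import sys
import json
import logging
import argparse
from pydub import AudioSegment

# Make project packages such as `dataset` importable when the script
# is run from the repository root (assumed working directory).
sys.path.append(".")

# Modules that presumably sit next to align.py; Python puts the script's
# own directory on sys.path when the script is run directly.
from search import FuzzySearch
from audio import DEFAULT_RATE, read_frames_from_file, vad_split
from dataset.transcribe import stt
```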
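These imports suggest that align.py works in three stages: `vad_split` cuts the audio at pauses, `stt` transcribes each segment, and `FuzzySearch` finds where each transcript occurs in the book text. The sketch below illustrates only that last stage, assuming plain-text input. It uses Python's standard `difflib` as a stand-in and is not how `FuzzySearch` is actually implemented; `locate_transcript` is a hypothetical helper.

```python
import difflib
from typing import Tuple


def locate_transcript(transcript: str, book_text: str) -> Tuple[int, int, float]:
    """Find the span of book_text that best matches an STT transcript.

    Slides a word window the same length as the transcript across the book
    and scores each window with difflib's SequenceMatcher ratio (0.0 to 1.0).
    Returns (start_word, end_word, score), where end_word is exclusive.
    """
    book_words = book_text.lower().split()
    query_words = transcript.lower().split()
    n = len(query_words)
    query = " ".join(query_words)

    best = (0, 0, 0.0)
    for start in range(max(1, len(book_words) - n + 1)):
        window = " ".join(book_words[start:start + n])
        score = difflib.SequenceMatcher(None, query, window).ratio()
        # Keep the first window with the highest similarity score.
        if score > best[2]:
            best = (start, start + n, score)
    return best


if __name__ == "__main__":
    book = "It was the best of times it was the worst of times it was the age of wisdom"
    # A transcript with a typical STT slip ("time" instead of "times").
    # Expected start/end words: 6, 12 ("it was the worst of times").
    print(locate_transcript("it was the worst of time", book))
```

This brute-force scan compares the transcript against every window in the book, which gets slow for a full-length audiobook; restricting the search to the text just after the previous match is one way to keep it tractable.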
Screenshots
Failure Point
Questions
- Do you recommend using a virtualenv?
- Do I need to reduce the quality of the WAV file or the MP3 file?
Link to the dataset
I have book.txt and the MP3 file, which I converted to WAV before running align.py. Please let me know if you can try it with my dataset. Thanks in advance for the help.
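On the question of audio quality: since align.py imports `DEFAULT_RATE` alongside `read_frames_from_file`, it appears to expect audio at a specific sample rate, so it may be worth checking the converted WAV's format before running it. Below is a minimal sketch using Python's standard `wave` module. `describe_wav` and `matches_target` are hypothetical helpers, and the expected format (mono, 16-bit) is an assumption; substitute the values the project's `audio` module actually uses.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic format details of an uncompressed PCM WAV file.

    wave.open raises wave.Error for formats it cannot read (e.g. non-PCM).
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate": rate,
            "duration_s": wav.getnframes() / rate,
        }


def matches_target(info: dict, rate: int, channels: int = 1, width: int = 2) -> bool:
    """Compare against an assumed target format (mono, 16-bit by default)."""
    return (
        info["frame_rate"] == rate
        and info["channels"] == channels
        and info["sample_width_bytes"] == width
    )
```

For example, `matches_target(describe_wav("book.wav"), rate=DEFAULT_RATE)` would report whether a converted file (the path here is hypothetical) already has the expected rate, channel count, and sample width, or needs to be re-exported.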
Issue Analytics
- Created 3 years ago
- Comments: 9 (9 by maintainers)
Top GitHub Comments
@BenAAndrew I am going to close this issue for now. As you said, we can come back to this at a later stage.
I will create a PR for the import changes. Let me know if you want me to test anything related to audio conversion.