Audio-to-text alignment with trained Espnet2 asr model

Hi, can anyone tell me how to do audio-to-text alignment using ESPnet2? I can see there is asr_align.py in ESPnet and was curious whether ESPnet2 provides a similar interface. Thank you.

Issue Analytics

Created 3 years ago
Reactions: 1
Comments: 22 (9 by maintainers)

Top GitHub Comments

1 reaction

SaadBazaz commented, Dec 9, 2021

I am ashamed.

1 reaction

kamo-naoyuki commented, Mar 8, 2021

Thanks.

> In my installation, importing Speech2Text raised an import error in the text cleaner because of some libraries that are optional for ASR. I made the corresponding imports conditional and included this in a separate commit.

Text cleaner is a mandatory module for espnet. Please keep espnet2/text/cleaner.py as it is.

> The output format is a regular kaldi-style segments file.

Segments style is good for the output format of the command line tool.
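For readers unfamiliar with the format mentioned above: a Kaldi-style segments file has one segment per line, laid out as `<utterance-id> <recording-id> <start> <end>`, with times in seconds. The following is a minimal sketch of a reader for such lines. The `Segment` class, the `parse_segments` function, and the example IDs and times are hypothetical and not part of ESPnet or Kaldi. Keeping any trailing fields as free text is an assumption made here so that lines carrying extra columns would still parse.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    utt_id: str      # utterance (segment) identifier
    rec_id: str      # recording the segment belongs to
    start: float     # start time in seconds
    end: float       # end time in seconds
    extra: str = ""  # anything after the four standard fields (assumption)


def parse_segments(text):
    """Parse Kaldi-style segments lines into Segment objects."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(maxsplit=4)
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 fields, got: {line!r}")
        utt_id, rec_id, start, end = fields[:4]
        extra = fields[4] if len(fields) > 4 else ""
        segments.append(Segment(utt_id, rec_id, float(start), float(end), extra))
    return segments


# Made-up example data, for illustration only.
example = """\
utt_0000 rec1 0.26 1.73
utt_0001 rec1 1.73 3.19
"""

segs = parse_segments(example)
for seg in segs:
    print(f"{seg.utt_id}: {seg.end - seg.start:.2f} s in {seg.rec_id}")
```

Because every line names its recording, a single segments file can describe the segments of many audio files at once, which suits a command line tool that processes a whole data directory.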

Sorry, I’m not sure how you are giving the sampling rate to the ctc-segmentation function.
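As background on why the sampling rate matters here: frame indices from a CTC output only become timestamps once the sampling rate and the number of audio samples per output frame are known. The following is a minimal sketch of that conversion, assuming each CTC output frame covers a fixed number of samples. The function name and the value 640 are hypothetical illustrations, not taken from ESPnet or the ctc-segmentation package.

```python
def frame_to_seconds(frame_index, fs, samples_per_frame):
    """Convert a CTC output frame index to a time in seconds.

    fs: sampling rate of the audio in Hz.
    samples_per_frame: audio samples covered by one CTC output frame
        (assumed constant; the real value depends on the model frontend
        and its subsampling).
    """
    if fs <= 0 or samples_per_frame <= 0:
        raise ValueError("fs and samples_per_frame must be positive")
    return frame_index * samples_per_frame / fs


# Hypothetical values: 16 kHz audio, 640 samples (40 ms) per output frame.
fs = 16000
samples_per_frame = 640
t = frame_to_seconds(25, fs, samples_per_frame)
print(f"frame 25 starts at {t} s")

# The same frame index maps to a different time if the wrong rate is assumed.
t_wrong = frame_to_seconds(25, 8000, samples_per_frame)
print(f"with fs=8000 the same frame would be placed at {t_wrong} s")
```

If the sampling rate passed to the alignment step differs from that of the audio, every timestamp is scaled by the same factor, so the resulting segments drift further from the audio the later they occur.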

> Input data includes the audio file (its data or its name), the ground truth and optionally an utterance name. In espnet1, the audio data was stored in a json. In espnet2, this method has changed. For example, asr_inference uses ASRTask.build_streaming_iterator. What module / dataloader is best to use here instead?

Please use ASRTask.build_streaming_iterator as it is. I’m not sure why you asked this.
