Audio-to-text alignment with a trained ESPnet2 ASR model

Hi, can anyone tell me how to do audio-to-text alignment with ESPnet2? I can see there is asr_align.py in ESPnet and was curious whether ESPnet2 provides a similar interface. Thank you.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 22 (9 by maintainers)

Top GitHub Comments

1 reaction
SaadBazaz commented, Dec 9, 2021

I am ashamed.

1 reaction
kamo-naoyuki commented, Mar 8, 2021

Thanks.

  • In my installation, importing Speech2Text raised an import error in the text cleaner, which imports some libraries that are optional for ASR. I made the corresponding imports conditional and put this change in a separate commit.

Text cleaner is a mandatory module for espnet. Please keep espnet2/text/cleaner.py as it is.

  • The output format is a regular kaldi-style segments file.

The segments style is fine as the output format of the command-line tool.
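
For reference, a Kaldi `segments` file has one line per utterance of the form `<utterance-id> <recording-id> <segment-begin> <segment-end>`, with times in seconds. A minimal sketch of writing alignment results in that format follows; the helper name `format_kaldi_segments` and the two-decimal rounding are assumptions for illustration, not ESPnet code, and ESPnet's own aligner may emit extra columns (such as a confidence score or the aligned text).

```python
def format_kaldi_segments(recording_id: str,
                          utterances: list[tuple[str, float, float]]) -> str:
    """Build Kaldi-style segments lines: '<utt-id> <rec-id> <start> <end>'.

    Times are in seconds. Hypothetical helper for illustration only.
    """
    lines = []
    for utt_id, start, end in utterances:
        # A segment must not end before it starts.
        if end < start:
            raise ValueError(f"segment {utt_id} ends before it starts")
        # One line per utterance, times rounded to centiseconds.
        lines.append(f"{utt_id} {recording_id} {start:.2f} {end:.2f}")
    return "\n".join(lines)
```

For example, `format_kaldi_segments("rec1", [("rec1_0000", 0.0, 1.25), ("rec1_0001", 1.25, 3.5)])` yields the two lines `rec1_0000 rec1 0.00 1.25` and `rec1_0001 rec1 1.25 3.50`.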

Sorry, I’m not sure how you are giving the sampling rate to the ctc-segmentation function.
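
Some context on why the sampling rate matters here: CTC segmentation aligns text against the indices of the CTC output sequence, so converting those indices into seconds requires knowing how many audio samples each index spans (in the `ctc_segmentation` package this roughly corresponds to the `index_duration` parameter). A simplified sketch, assuming a fixed frontend hop length and a fixed encoder subsampling factor; the helper names and example values are hypothetical, and the real values depend on the model configuration.

```python
def index_duration(hop_length: int, subsampling_factor: int, fs: int) -> float:
    """Seconds of audio covered by one CTC output index (hypothetical helper)."""
    # Each frontend frame advances by hop_length samples; the encoder then
    # keeps one output per subsampling_factor frames.
    return hop_length * subsampling_factor / fs


def indices_to_seconds(start_idx: int, end_idx: int, hop_length: int,
                       subsampling_factor: int, fs: int) -> tuple[float, float]:
    """Convert a (start, end) pair of CTC output indices into seconds."""
    step = index_duration(hop_length, subsampling_factor, fs)
    return start_idx * step, end_idx * step
```

With 16 kHz audio, a hop of 128 samples and 4x subsampling, each CTC index spans 512 samples, i.e. 0.032 s; passing a wrong `fs` scales every timestamp by the same wrong factor.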

  • Input data includes the audio file (its data or name), the ground-truth text, and optionally an utterance name. In espnet1, the audio data was stored in a JSON file. In espnet2, this has changed; for example, asr_inference uses ASRTask.build_streaming_iterator. Which module or dataloader is best to use here instead?

Please use ASRTask.build_streaming_iterator as it is. I’m not sure why you asked this.


Top Results From Across the Web

espnet2.bin.asr_align — ESPnet 202211 documentation
class CTCSegmentation: """Align text to audio using CTC segmentation. Usage: Initialize with given ASR model and parameters. If needed, parameters for ...

ESPnet2 ASR model - Hugging Face
This model was trained by YushiUeda using swbd_sentiment recipe in espnet. ... ASR config: conf/tuning/train_asr_conformer_wav2vec2.yaml; token_type: word ...

IWSLT 2021 The 18th International Conference on Spoken ...
and Analyses with Sentence-Aligned Data ... Huawei Translation Services Center, China ... training of the ASR and MT components, model.

arXiv:2007.09127v2 [eess.AS] 5 Oct 2020
stage approach that uses an ASR model pre-trained with ... ment or segmentation, an utterance-wise alignment between audio and text is.

The 2020 ESPnet Update: New Features, Broadened ...
end-to-end neural ASR modeling based on these sequence to se- ... The training system of ESPnet2 is shared with all DNN tasks, ASR,....
