How to fine-tune wav2vec 2.0 with TIMIT
❓ Questions and Help
What is your question?
Hello.
This paper says that wav2vec 2.0 works well for the phoneme recognition task (with the TIMIT dataset), but some important information needed to reproduce it is missing from the README.md, and I'm at a standstill. Did anyone complete this task?
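For anyone stuck at the same point: if I read fairseq's examples/wav2vec README correctly, fine-tuning expects a `<split>.tsv` manifest (audio root on the first line, then one `relative_path<TAB>frame_count` per line) plus a parallel label file with one target sequence per line. The sketch below shows one way to build those for TIMIT phoneme targets. It is only a sketch under assumptions: the paths are hypothetical, the `.phn` extension mirrors the letter-label (`.ltr`) convention rather than anything documented for TIMIT, and the audio is assumed to be readable by soundfile (convert from NIST SPHERE first if it is not).

```python
# Sketch: build a fairseq-style manifest (train.tsv) and a phoneme label
# file (train.phn) from a TIMIT split. Paths and file layout are assumptions.
import os
import soundfile as sf

TIMIT_ROOT = "/data/TIMIT/TRAIN"   # hypothetical path to the TIMIT train split
OUT_DIR = "/data/timit_manifest"   # hypothetical output directory

os.makedirs(OUT_DIR, exist_ok=True)
with open(os.path.join(OUT_DIR, "train.tsv"), "w") as tsv, \
     open(os.path.join(OUT_DIR, "train.phn"), "w") as phn:
    tsv.write(TIMIT_ROOT + "\n")                       # manifest header: audio root dir
    for dirpath, _, files in os.walk(TIMIT_ROOT):
        for name in sorted(files):
            if not name.lower().endswith(".wav"):
                continue
            wav_path = os.path.join(dirpath, name)
            frames = sf.info(wav_path).frames           # fairseq manifests store frame counts
            rel = os.path.relpath(wav_path, TIMIT_ROOT)
            tsv.write(f"{rel}\t{frames}\n")
            # TIMIT ships one .PHN file per utterance: "<start> <end> <phoneme>" per line
            phn_path = os.path.splitext(wav_path)[0] + ".PHN"
            with open(phn_path) as f:
                phones = [line.split()[2] for line in f if line.strip()]
            phn.write(" ".join(phones) + "\n")
```

A phoneme dictionary (one symbol plus a count per line) also has to be supplied so the task can build its output vocabulary; the exact config keys are best taken from the fairseq fine-tuning configs rather than from this sketch.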
What’s your environment?
- fairseq Version: master
- PyTorch Version: 1.6
- OS: Linux (Debian)
- How you installed fairseq: based on the README.md
- Python version: 3.7
- CUDA/cuDNN version: CUDA 11.0 (GPU: Tesla T4)
```
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   77C    P0    49W /  70W |  11914MiB / 15109MiB |     95%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 38 (11 by maintainers)
Top Results From Across the Web
- How to fine-tune wav2vec 2.0 with TIMIT · Issue #2922 - GitHub (this issue)
- Fine-Tune Wav2Vec2 for English ASR with Transformers: "This makes sense taking into account that Timit is a read speech corpus. We can see that the transcriptions contain some special characters, …"
- Fine-tuning Wav2Vec for Speech Recognition with Lightning …: "To fine-tune our first Wav2Vec model, we will be using the TIMIT Acoustic-Phonetic Continuous Speech Corpus, a dataset curated with labeled …"
- Wave2Vec2.0 fine-tuning english - Kaggle: "The transcriptions look very clean and the language seems to correspond more to written text than dialogue. This makes sense taking into account …"
- Fine-tuning Wav2Vec2 for English ASR - Google Colab: "In this notebook, we will give an in-detail explanation of how Wav2Vec2's pretrained checkpoints can be fine-tuned on any English ASR dataset."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@noetits Yes, I didn't have good PER results before. Now I was able to re-create the phoneme recognition results on TIMIT with a PER of 8.6, which is very close to the values in the research paper. See this thread: https://github.com/pytorch/fairseq/issues/3425#issuecomment-813406954
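For context, the PER reported above is the usual Levenshtein (edit) distance between the predicted and reference phoneme sequences, divided by the reference length and reported as a percentage. A minimal sketch of that computation (not the evaluation script used in the linked thread):

```python
def phoneme_error_rate(ref, hyp):
    """Edit distance between phoneme lists, normalized by reference length."""
    d = list(range(len(hyp) + 1))          # DP row for the empty reference prefix
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i               # prev holds d[i-1][j-1]
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(
                d[j] + 1,                  # deletion (reference phoneme missed)
                d[j - 1] + 1,              # insertion (extra predicted phoneme)
                prev + (r != h),           # substitution (or match)
            )
    return d[-1] / max(len(ref), 1)

ref = "sil hh ax l ow sil".split()
hyp = "sil hh ax l ow w sil".split()
print(f"PER: {100 * phoneme_error_rate(ref, hyp):.1f}%")   # one insertion -> 16.7%
```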
@dzubke Thank you for your help. As you mentioned, I'm trying to recognize phonemes (WIP here), not letters.
~And I'm running into the trouble below now.~ Resolved: https://github.com/huggingface/datasets/issues/2125
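The linked issue concerns loading TIMIT through Hugging Face `datasets`. For completeness, a minimal sketch of pulling phoneme targets that way, assuming the `timit_asr` loading script and its `text`/`phonetic_detail` columns (field names and whether a local `data_dir` is required vary across library versions):

```python
from datasets import load_dataset

# Hypothetical usage: newer versions of the `timit_asr` script expect a local
# LDC copy via data_dir, while older versions fetched the data themselves.
timit = load_dataset("timit_asr", split="train")

sample = timit[0]
print(sample["text"])                            # word-level transcription
phones = sample["phonetic_detail"]["utterance"]  # time-aligned phoneme labels
print(" ".join(phones))                          # target sequence for fine-tuning
```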