How to fine-tune wav2vec 2.0 with TIMIT
❓ Questions and Help
What is your question?
Hello.
This paper says that wav2vec 2.0 works well for the phoneme recognition task (with the TIMIT dataset), but some important information needed to reproduce it is missing from the README.md, and I'm at a standstill. Did anyone complete this task?
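For anyone stuck at the same point: if I read fairseq's examples/wav2vec README correctly, fine-tuning expects a `<split>.tsv` manifest (audio root on the first line, then one `relative_path<TAB>frame_count` per line) plus a parallel label file with one target sequence per line. The sketch below shows one way to build those for TIMIT phoneme targets. It is only a sketch under assumptions: the paths are hypothetical, the `.phn` extension mirrors the letter-label (`.ltr`) convention rather than anything documented for TIMIT, and the audio is assumed to be readable by soundfile (convert from NIST SPHERE first if it is not).

```python
# Sketch: build a fairseq-style manifest (train.tsv) and a phoneme label
# file (train.phn) from a TIMIT split. Paths and file layout are assumptions.
import os
import soundfile as sf

TIMIT_ROOT = "/data/TIMIT/TRAIN"   # hypothetical path to the TIMIT train split
OUT_DIR = "/data/timit_manifest"   # hypothetical output directory

os.makedirs(OUT_DIR, exist_ok=True)
with open(os.path.join(OUT_DIR, "train.tsv"), "w") as tsv, \
     open(os.path.join(OUT_DIR, "train.phn"), "w") as phn:
    tsv.write(TIMIT_ROOT + "\n")                       # manifest header: audio root dir
    for dirpath, _, files in os.walk(TIMIT_ROOT):
        for name in sorted(files):
            if not name.lower().endswith(".wav"):
                continue
            wav_path = os.path.join(dirpath, name)
            frames = sf.info(wav_path).frames           # fairseq manifests store frame counts
            rel = os.path.relpath(wav_path, TIMIT_ROOT)
            tsv.write(f"{rel}\t{frames}\n")
            # TIMIT ships one .PHN file per utterance: "<start> <end> <phoneme>" per line
            phn_path = os.path.splitext(wav_path)[0] + ".PHN"
            with open(phn_path) as f:
                phones = [line.split()[2] for line in f if line.strip()]
            phn.write(" ".join(phones) + "\n")
```

A phoneme dictionary (one symbol plus a count per line) also has to be supplied so the task can build its output vocabulary; the exact config keys are best taken from the fairseq fine-tuning configs rather than from this sketch.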
What’s your environment?
- fairseq Version: master
- PyTorch Version: 1.6
- OS: Linux (Debian)
- How you installed fairseq: based on the README.md
- Python version: 3.7
- CUDA/cuDNN version: CUDA 11.0 (GPU: Tesla T4)
```
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   77C    P0    49W /  70W |  11914MiB / 15109MiB |     95%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 38 (11 by maintainers)
Top Results From Across the Web
- How to fine-tune wav2vec 2.0 with TIMIT · Issue #2922 - GitHub (this issue)
- Fine-Tune Wav2Vec2 for English ASR with Transformers: "This makes sense taking into account that Timit is a read speech corpus. We can see that the transcriptions contain some special characters, …"
- Fine-tuning Wav2Vec for Speech Recognition with Lightning …: "To fine-tune our first Wav2Vec model, we will be using the TIMIT Acoustic-Phonetic Continuous Speech Corpus, a dataset curated with labeled …"
- Wave2Vec2.0 fine-tuning english - Kaggle: "The transcriptions look very clean and the language seems to correspond more to written text than dialogue. This makes sense taking into account …"
- Fine-tuning Wav2Vec2 for English ASR - Google Colab: "In this notebook, we will give an in-detail explanation of how Wav2Vec2's pretrained checkpoints can be fine-tuned on any English ASR dataset."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@noetits Yes, I didn't have good PER results before. Now I was able to re-create the phoneme recognition results on TIMIT with a PER of 8.6, which is very close to the values in the research paper. See this thread: https://github.com/pytorch/fairseq/issues/3425#issuecomment-813406954
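For context, the PER reported above is the usual Levenshtein (edit) distance between the predicted and reference phoneme sequences, divided by the reference length and reported as a percentage. A minimal sketch of that computation (not the evaluation script used in the linked thread):

```python
def phoneme_error_rate(ref, hyp):
    """Edit distance between phoneme lists, normalized by reference length."""
    d = list(range(len(hyp) + 1))          # DP row for the empty reference prefix
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i               # prev holds d[i-1][j-1]
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(
                d[j] + 1,                  # deletion (reference phoneme missed)
                d[j - 1] + 1,              # insertion (extra predicted phoneme)
                prev + (r != h),           # substitution (or match)
            )
    return d[-1] / max(len(ref), 1)

ref = "sil hh ax l ow sil".split()
hyp = "sil hh ax l ow w sil".split()
print(f"PER: {100 * phoneme_error_rate(ref, hyp):.1f}%")   # one insertion -> 16.7%
```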
@dzubke Thank you for your help. As you mentioned, I'm trying to recognize phonemes (WIP here), not letters.
~And I'm running into the trouble below now.~ Resolved: https://github.com/huggingface/datasets/issues/2125
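The linked issue concerns loading TIMIT through Hugging Face `datasets`. For completeness, a minimal sketch of pulling phoneme targets that way, assuming the `timit_asr` loading script and its `text`/`phonetic_detail` columns (field names and whether a local `data_dir` is required vary across library versions):

```python
from datasets import load_dataset

# Hypothetical usage: newer versions of the `timit_asr` script expect a local
# LDC copy via data_dir, while older versions fetched the data themselves.
timit = load_dataset("timit_asr", split="train")

sample = timit[0]
print(sample["text"])                            # word-level transcription
phones = sample["phonetic_detail"]["utterance"]  # time-aligned phoneme labels
print(" ".join(phones))                          # target sequence for fine-tuning
```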