Really slow Wav2Vec 2 pretraining
What is your question?
I’m trying to pretrain the wav2vec2 base model on my own dataset, and it is really slow; I want to speed it up. My dataset contains about 100 hours of speech, stored in the data directory; all files are single-channel 16 kHz WAVs. The following command
python examples/wav2vec/wav2vec_manifest.py /path/to/waves --dest /manifest/path --ext $ext --valid-percent $valid
runs fine and produces a correct manifest. I expected a reasonable slowdown compared to the original setup of 64 Tesla V100s, but not a thousand times slower. Currently one epoch takes 5 minutes, and if I want to do 400,000 updates it would take almost 4 years!
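As a sanity check on the data setup, here is a minimal sketch for verifying the manifest (assuming fairseq's standard layout: root directory on the first line, then a tab-separated relative path and sample count per line; the train.tsv path below is a placeholder):

# Minimal manifest sanity check. Assumes the standard fairseq layout:
# line 1 is the root directory, every other line is "<relative/path.wav>\t<num_samples>".
import os

manifest = "/manifest/path/train.tsv"  # placeholder; point this at your --dest directory
with open(manifest) as f:
    root = f.readline().strip()
    total_samples, n_files = 0, 0
    for line in f:
        rel_path, num_samples = line.rstrip("\n").split("\t")
        assert os.path.exists(os.path.join(root, rel_path)), rel_path
        total_samples += int(num_samples)
        n_files += 1

# single-channel 16 kHz audio, as described above
print(f"{n_files} files, ~{total_samples / 16000 / 3600:.1f} hours of audio")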
Code
I didn’t write any code, but here is the command I’m using for training.
fairseq-train --distributed-world-size 6 ./manifest/ --save-dir ./res --fp16 --num-workers 16 --task audio_pretraining \
  --criterion wav2vec --arch wav2vec2 --log-keys '["prob_perplexity","code_perplexity","temp"]' \
  --quantize-targets --extractor-mode default --conv-feature-layers '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] * 2' \
  --final-dim 256 --latent-vars 320 --latent-groups 2 --latent-temp '(2,0.5,0.999995)' --infonce --optimizer adam \
  --adam-betas '(0.9,0.98)' --adam-eps 1e-06 --lr-scheduler polynomial_decay --total-num-update 10000 --lr 0.0005 \
  --warmup-updates 800 --mask-length 10 --mask-prob 0.65 --mask-selection static --mask-other 0 \
  --encoder-layerdrop 0.05 --dropout-input 0.1 --dropout-features 0.1 --feature-grad-mult 0.1 \
  --loss-weights '[0.1, 10]' --conv-pos 128 --conv-pos-groups 16 --num-negatives 100 \
  --cross-sample-negatives 0 --max-sample-size 250000 --min-sample-size 32000 --dropout 0.1 \
  --attention-dropout 0.1 --weight-decay 0.01 --max-tokens 1400000 --max-update 10000 \
  --skip-invalid-size-inputs-valid-test --ddp-backend no_c10d --update-freq 64/6 --save-interval 5000
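For a rough sense of scale, here is a back-of-the-envelope sketch of the effective batch size and projected wall-clock time. It assumes the base recipe's convention of simulating 64 GPUs with --update-freq = 64 / num_gpus (as far as I can tell fairseq parses --update-freq as an integer, so with 6 GPUs that would be rounded up to 11); the seconds-per-update figure is a placeholder to replace with the numbers from your own training log.

# Back-of-the-envelope estimate of effective batch size and training time.
# Assumptions: --update-freq must be an integer (64/6 rounded up to 11), and
# seconds_per_update is a placeholder to be read off your own fairseq log.
import math

num_gpus = 6
max_tokens = 1_400_000                    # per-GPU batch size in audio samples (from the command)
update_freq = math.ceil(64 / num_gpus)    # 11, to simulate the 64-GPU recipe

samples_per_update = max_tokens * num_gpus * update_freq
print(f"update_freq={update_freq}, "
      f"~{samples_per_update / 16000 / 3600:.1f} hours of audio per update")

seconds_per_update = 10.0                 # placeholder: measure this from your training log
total_updates = 400_000
print(f"projected wall clock: {seconds_per_update * total_updates / 86400:.0f} days")

Under these assumptions one simulated batch already covers roughly 1.6 hours of audio, so with only about 100 hours of data very few optimizer updates happen per epoch, and the total update count dominates the training time.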
What have you tried?
I tried installing apex, since it is supposed to speed up fairseq. It gave a 2x boost, but training would still take years.
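As a quick check that the GPUs and the apex fused kernels are actually visible to the environment, here is a minimal sketch (my assumption being that fairseq only uses apex's fused LayerNorm/Adam when they can be imported, and otherwise falls back to native PyTorch ops):

# Quick environment check: GPU visibility and apex fused kernels.
import torch

print(torch.__version__, "CUDA available:", torch.cuda.is_available())
print("GPUs:", [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])

try:
    # fairseq prefers these fused implementations when apex is importable (assumption)
    from apex.normalization import FusedLayerNorm  # noqa: F401
    from apex.optimizers import FusedAdam          # noqa: F401
    print("apex fused kernels available")
except ImportError:
    print("apex not found; falling back to native PyTorch implementations")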
What’s your environment?
- fairseq Version (e.g., 1.0 or master): 0.10.1
- PyTorch Version (e.g., 1.0): 1.8.0a0+1606899
- OS (e.g., Linux): Ubuntu 20.04.1 LTS (Focal Fossa)
- How you installed fairseq (pip, source): pip install fairseq==0.10.1
- Build command you used (if compiling from source): None
- Python version: Python 3.8.5
- CUDA/cuDNN version: CUDA 11.1.105
- GPU models and configuration: six RTX 3090
- Any other relevant information: I also tested on a VM with two RTX 2080 Ti GPUs
Top GitHub Comments
Yep @marma @apoca909, I got it, but it is still a really slow process. I experimented a lot with different setups and have yet to achieve the training speed indicated in the original paper.
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!