[Wav2Vec2] Improve SpecAugment function by converting numpy based function to pytorch based function

🚀 Feature request

As can be seen here: https://github.com/huggingface/transformers/blob/11655fafdd42eb56ad94e09ecd84d4dc2d1041ae/src/transformers/models/wav2vec2/modeling_wav2vec2.py#L47, the function _compute_mask_indices (responsible for SpecAugment) of Wav2Vec2 is written in NumPy, which means that it is not GPU compatible. The function could simply be rewritten in PyTorch, which should make training on GPU faster.

This “Good First Issue” is about converting _compute_mask_indices to PyTorch while keeping the same functionality.
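To make the idea concrete, here is a minimal sketch of span masking written with PyTorch tensor operations only. It is not the implementation in transformers and does not reproduce every detail of _compute_mask_indices; the function name compute_mask_indices_sketch and its parameters (shape, mask_prob, mask_length, device) are assumptions chosen for illustration.

```python
from typing import Tuple

import torch


def compute_mask_indices_sketch(
    shape: Tuple[int, int],
    mask_prob: float,
    mask_length: int,
    device: str = "cpu",
) -> torch.Tensor:
    """Return a boolean (batch, seq_len) mask with spans of mask_length set to True.

    Hypothetical helper for illustration; not the transformers function.
    """
    batch_size, seq_len = shape
    if mask_length < 1 or mask_length > seq_len:
        raise ValueError("mask_length must be between 1 and seq_len")

    # Number of spans per row so that roughly mask_prob of the time steps
    # are masked. Spans may overlap, so the real ratio can be lower.
    num_starts = seq_len - mask_length + 1
    num_spans = max(1, int(mask_prob * seq_len / mask_length))
    num_spans = min(num_spans, num_starts)

    # Sample distinct start positions per row, uniformly, without replacement.
    weights = torch.ones(batch_size, num_starts, device=device)
    starts = torch.multinomial(weights, num_spans, replacement=False)

    # Expand every start index into mask_length consecutive indices.
    offsets = torch.arange(mask_length, device=device)
    span_idx = (starts.unsqueeze(-1) + offsets).reshape(batch_size, -1)

    # Scatter True into the mask at all span positions.
    mask = torch.zeros(batch_size, seq_len, dtype=torch.bool, device=device)
    mask.scatter_(1, span_idx, True)
    return mask


if __name__ == "__main__":
    mask = compute_mask_indices_sketch((4, 100), mask_prob=0.2, mask_length=10)
    print(mask.shape, mask.sum(dim=1))
```

Because every step is a tensor operation, passing device="cuda" keeps the whole computation on the GPU and avoids a host-to-device copy of the mask on each training step.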

Your contribution

I’m happy to guide the contributor through the PR!

Issue Analytics

Created: 3 years ago
Reactions: 2
Comments: 17 (11 by maintainers)

Top GitHub Comments

2 reactions

punitvara commented, Mar 1, 2021

yes sure. Let me try. I will send PR tomorrow. Bit late to work on it for now.

1 reaction

01-vyom commented, May 3, 2021

@patrickvonplaten Made a PR.
