
wave2vec OOM while doing inference

See original GitHub issue

❓ Questions and Help

Before asking:

  1. search the issues. yes
  2. search the docs. yes

What is your question?

When I try to run inference on an audio clip of around 52 seconds, I get this error: RuntimeError: [enforce fail at CPUAllocator.cpp:65] DefaultCPUAllocator: can't allocate memory: you tried to allocate 326730288 bytes. Error code 12 (Cannot allocate memory). So the failing allocation alone is about 326.7 MB, yet when I run free -h I have this much free memory:

sh-4.2$ free -h
             total       used       free     shared    buffers     cached
Mem:          7.7G       977M       6.7G         0B        90M       384M
-/+ buffers/cache:       502M       7.2G
Swap:         3.0G       290M       2.7G

Would you please help me with this issue, @patrickvonplaten?
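A rough back-of-the-envelope makes an allocation of that size plausible. The figures below are assumptions about facebook/wav2vec2-base-960h (a roughly 320x convolutional downsampling factor and 12 attention heads), not something taken from the error message:

# Back-of-the-envelope sketch (assumptions: ~320x downsampling, 12 heads, float32)
samples = 52 * 16_000                      # 52 s of 16 kHz audio = 832,000 samples
frames = samples // 320                    # feature-encoder output: ~2,600 frames
heads = 12                                 # attention heads in wav2vec2-base
attn_bytes = frames * frames * heads * 4   # one layer's attention-score tensor
print(attn_bytes / 1e6)                    # ~324 MB, in the ballpark of the failed 326,730,288 bytes

A single layer's attention-score tensor for a 52-second clip already lands in this range, and with gradients enabled every layer's activations are also kept alive for backpropagation, which multiplies the footprint.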

Code

import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer

tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Load the audio, resampled to the 16 kHz rate the model expects
filename = "sample_audio_1.wav"
input_audio, _ = librosa.load(filename, sr=16000)

input_values = tokenizer(input_audio, return_tensors="pt").input_values
logits = model(input_values).logits        # forward pass (autograd graph is built here)
predicted_ids = torch.argmax(logits, dim=-1)
text = tokenizer.batch_decode(predicted_ids)[0]

Sample audio file in WAV format:

https://github.com/abhinavsp0730/video-to-text-ap/blob/main/sample_audio_1.wav

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

3 reactions
patrickvonplaten commented, Mar 16, 2021

Hey @olafthiele - make sure to wrap your code in a with torch.no_grad(): block to save memory. This snippet should work:

import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer

tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

filename = "sample_audio_1.wav"
input_audio, _ = librosa.load(filename, sr=16000)

input_values = tokenizer(input_audio, return_tensors="pt").input_values

# no_grad() skips building the autograd graph, so activations are freed as soon
# as each layer finishes instead of being kept around for backpropagation
with torch.no_grad():
    logits = model(input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
text = tokenizer.batch_decode(predicted_ids)[0]
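If a clip is still too long to fit in memory even under no_grad(), a common workaround (a sketch of my own, not something suggested in this thread; the 10-second chunk size is an arbitrary assumption to tune) is to transcribe fixed-size chunks and join the results:

import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer

tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

input_audio, _ = librosa.load("sample_audio_1.wav", sr=16000)

chunk_len = 10 * 16_000                    # hypothetical 10 s chunks; tune for your RAM
pieces = []
for start in range(0, len(input_audio), chunk_len):
    chunk = input_audio[start:start + chunk_len]
    input_values = tokenizer(chunk, return_tensors="pt").input_values
    with torch.no_grad():
        logits = model(input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    pieces.append(tokenizer.batch_decode(predicted_ids)[0])

text = " ".join(pieces)   # naive join: a word cut at a chunk boundary may be garbled

Memory then scales with the chunk length rather than the full clip, at the cost of possible errors where a word straddles a boundary.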
0 reactions
abhinavsp0730 commented, Mar 19, 2021

Thanks @patrickvonplaten for the help. I'm closing this issue as it has been resolved.

Read more comments on GitHub >

Top Results From Across the Web

wave2vec OOM while doing inference · Issue #3359 - GitHub
When I'm trying to do inference on a audio of length of around 52 sec , I'm getting this error ... wave2vec OOM...
Read more >
Wav2vec2.0 memory issue - Models - Hugging Face Forums
When I take a subset (100 sound) and fine-tune on this subset, everything is fine. What could be the problem? Is there any...
Read more >
Inference with Wav2vec 2.0 - Medium
In this post we'll show you how to perform inference with wav2vec 2.0 to speed up processing times when using speech processing models....
Read more >
Shrinking Bigfoot: Reducing wav2vec 2.0 footprint - arXiv Vanity
Hence, the inference latency of wav2vec 2.0 will be a bottleneck in production, leading to high costs and a significant environmental footprint.
Read more >
arXiv:2103.15760v2 [cs.CL] 1 Apr 2021
Our distilled student model has a faster inference speed and makes wav2vec 2.0 more cost-efficient and environmentally friendly. 2. Related work.
Read more >
