can't allocate memory error with wav2vec2
I am trying out the wav2vec2 model for ASR from the Hugging Face transformers library. I am passing a 7-minute (~15 MB) wav file containing an English conversation to the wav2vec2 model, and I get a "can't allocate memory" error; the model ends up using all 64 GB of available RAM. Can anyone help with this?
Environment
- transformers version: 4.3.2
- Platform: Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.3
- PyTorch version (GPU?): 1.7.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: (NA)
- Using distributed or parallel set-up in script?: (NA)
Code
import os

import librosa
import nltk  # requires the punkt tokenizer data: nltk.download('punkt')
import soundfile as sf
import torch
from pydub import AudioSegment
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer


def convert_audio_segment(fp, upload_dir_path):
    """Convert an uploaded audio file to wav if needed."""
    USER_UPLOAD_DIR = upload_dir_path
    formats_to_convert = ['.m4a']
    dirpath = os.path.abspath(USER_UPLOAD_DIR)
    if fp.endswith(tuple(formats_to_convert)):
        (path, file_extension) = os.path.splitext(fp)
        file_extension_final = file_extension.replace('.', '')
        file_handle = ''
        try:
            track = AudioSegment.from_file(fp, file_extension_final)
            print("track", track)
            wav_path = fp.replace(file_extension_final, 'wav')
            file_handle = track.export(wav_path, format='wav')
        except Exception:
            print("ERROR CONVERTING " + str(fp))
        return file_handle
    else:
        print("No file format conversion required " + str(fp))
        return fp


def load_wav2vec_100h_model():
    tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-100h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-100h")
    return tokenizer, model


def correct_sentence(input_text):
    sentences = nltk.sent_tokenize(input_text)
    return ' '.join([s.replace(s[0], s[0].capitalize(), 1) for s in sentences])


def asr_transcript(tokenizer, model, input_file):
    speech, fs = sf.read(input_file)
    if len(speech.shape) > 1:
        speech = speech[:, 0] + speech[:, 1]
    if fs != 16000:
        speech = librosa.resample(speech, fs, 16000)
    input_values = tokenizer(speech, return_tensors="pt").input_values
    logits = model(input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = tokenizer.decode(predicted_ids[0])
    return correct_sentence(transcription.lower())


if __name__ == "__main__":
    data_dir = os.getcwd()  # placeholder; data_dir was not defined in the original snippet
    tokenizer_100h, model_100h = load_wav2vec_100h_model()
    wav_input = 'Recording_biweu.wav'
    fp = wav_input
    processed_file = convert_audio_segment(str(fp), str(data_dir))
    text = asr_transcript(tokenizer_100h, model_100h, processed_file)
    print(text)
I am adding more details about my wav file here:
General
Complete name : Recording_biweu.wav
Format : Wave
File size : 13.8 MiB
Duration : 7 min 30 s
Overall bit rate mode : Constant
Overall bit rate : 256 kb/s
Track name : Recording_biweu
Recorded date : 2021
Writing application : Lavf57.83.100
Audio
Format : PCM
Format settings : Little / Signed
Codec ID : 1
Duration : 7 min 30 s
Bit rate mode : Constant
Bit rate : 256 kb/s
Channel(s) : 1 channel
Sampling rate : 16.0 kHz
Bit depth : 16 bits
Stream size : 13.8 MiB (100%)
Error
Some weights of the model checkpoint at facebook/wav2vec2-base-100h were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.mask_time_emb_vector']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
File "asr_wav2vec2.py", line 130, in <module>
text = asr_transcript(tokenizer_100h,model_100h,processed_file)
File "asr_wav2vec2.py", line 96, in asr_transcript
logits = model(input_values).logits
File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 795, in forward
outputs = self.wav2vec2(
File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 646, in forward
encoder_outputs = self.encoder(
File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 457, in forward
hidden_states, attn_weights = layer(hidden_states, output_attentions=output_attentions)
File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 392, in forward
hidden_states, attn_weights, _ = self.attention(hidden_states, output_attentions=output_attentions)
File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 286, in forward
attn_weights = torch.bmm(query_states, key_states.transpose(1, 2))
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 24373495488 bytes. Error code 12 (Cannot allocate memory)
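For a sense of scale, the failing torch.bmm is building the self-attention weight matrix for the entire recording in one forward pass. A rough back-of-the-envelope estimate, assuming the base checkpoint's 12 attention heads, the convolutional feature encoder's ~320x downsampling of the raw 16 kHz waveform, and float32 activations (standard values for wav2vec2-base, stated here as assumptions rather than read from the traceback):

# Rough size of the attention-weight tensor that the failing torch.bmm tries to allocate.
# Assumes wav2vec2-base defaults: 12 attention heads, ~320x downsampling in the
# convolutional feature encoder, float32 activations.
duration_s = 7 * 60 + 30                    # 7 min 30 s recording
num_samples = duration_s * 16_000           # 7,200,000 samples at 16 kHz
seq_len = num_samples // 320                # ~22,500 frames after the feature encoder
attn_bytes = 12 * seq_len * seq_len * 4     # heads * seq_len^2 * bytes per float32
print(f"{attn_bytes / 1e9:.1f} GB")         # ~24.3 GB, close to the ~24.4 GB in the error above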
Top GitHub Comments
Okay, so the issue isn't in the number of samples as I thought previously: there seems to be a single audio stream in your recording.
However, the issue here is that it's a 7 minute 30 second recording, which really is very, very long. I talked about it with @patrickvonplaten, and he mentions that Wav2Vec2 was trained on recordings of at most ~40 seconds. What one could do here is split the recording into 30-second chunks. You're already using librosa, and you can do that easily with librosa.stream. Your method to retrieve the transcript is the asr_transcript function shown above; I've updated it (please note that it's the first time I've used librosa myself, so the parameters I put for the stream values may be wrong).
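What follows is a minimal sketch of that kind of update, rewriting asr_transcript around librosa.stream; the asr_transcript_chunked name and the 30-second block_length / frame_length / hop_length values are illustrative assumptions rather than the exact code from the original comment.

def asr_transcript_chunked(tokenizer, model, input_file):
    """Transcribe a long recording in ~30-second chunks instead of all at once."""
    transcript = ""
    sample_rate = librosa.get_samplerate(input_file)

    # Stream the file in blocks of block_length * frame_length samples.
    # With frame_length == hop_length == sample_rate, each block is ~30 s of
    # non-overlapping audio (these values are a guess and may need tuning).
    stream = librosa.stream(
        input_file,
        block_length=30,
        frame_length=sample_rate,
        hop_length=sample_rate,
    )

    for speech in stream:
        # librosa.stream yields mono float32 blocks by default
        if sample_rate != 16000:
            speech = librosa.resample(speech, orig_sr=sample_rate, target_sr=16000)
        input_values = tokenizer(speech, return_tensors="pt").input_values
        with torch.no_grad():  # inference only; also keeps memory usage down
            logits = model(input_values).logits
        predicted_ids = torch.argmax(logits, dim=-1)
        transcript += tokenizer.decode(predicted_ids[0]).lower() + " "

    return correct_sentence(transcript.strip())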
With this I seem to obtain sensible results! This could probably be improved by making sure that the parameters passed to librosa.stream are correct; changing these seems to have a very big impact on the transcript.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.