
raise RuntimeError("Failed to load audio from {}".format(filepath))


System Info

I want to run run_speech_recognition_ctc.py, but I get an error when running the single-GPU CTC script:

python run_speech_recognition_ctc.py \
    --dataset_name="common_voice" \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="tr" \
    --output_dir="./wav2vec2-common_voice-tr-demo" \
    --overwrite_output_dir \
    --num_train_epochs="15" \
    --per_device_train_batch_size="16" \
    --gradient_accumulation_steps="2" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --evaluation_strategy="steps" \
    --text_column_name="sentence" \
    --length_column_name="input_length" \
    --save_steps="400" \
    --eval_steps="100" \
    --layerdrop="0.0" \
    --save_total_limit="3" \
    --freeze_feature_encoder \
    --gradient_checkpointing \
    --chars_to_ignore , ? . ! - \; \: \" “ % ‘ ” � \
    --fp16 \
    --group_by_length \
    --push_to_hub \
    --do_train --do_eval

The error:

raise RuntimeError("Failed to load audio from {}".format(filepath))
RuntimeError: Failed to load audio from /root/.cache/huggingface/datasets/downloads/extracted/05be0c29807a73c9b099873d2f5975dae6d05e9f7d577458a2466ecb9a2b0c6b/cv-corpus-6.1-2020-12-11/tr/clips/common_voice_tr_17346025.mp3

Who can help?

@patrickvonplaten @anton-l

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, …)
My own task or dataset (give details below)

Reproduction

I just ran the steps described in the examples folder.

Expected behavior

I just want the script to run and produce the result.

Issue Analytics

Created a year ago
Comments: 10 (5 by maintainers)

Top GitHub Comments

albertvillanova commented, Aug 1, 2022 (1 reaction)

Hi @mehrdad78, thanks for reporting (and thanks @LysandreJik for drawing my attention to this).

I have manually checked the TAR file, its content and specifically the MP3 file raising the error: cv-corpus-6.1-2020-12-11/ru/clips/common_voice_ru_18849051.mp3

I can load it without any problem (under the hood, our Datasets library uses torchaudio for MP3 files):

In [1]: import torchaudio

In [2]: path = "./data/common_voice/ru/cv-corpus-6.1-2020-12-11/ru/clips/common_voice_ru_18849051.mp3"

In [3]: data = torchaudio.load(path, format="mp3")

In [4]: data
Out[4]: 
(tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.6095e-04,
           3.2425e-05,  8.8751e-05]]),
 48000)
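To run the same manual check over many clips at once, a small helper along these lines might be useful. It is only a sketch: `find_unreadable` and `load_with_torchaudio` are hypothetical names introduced here, and the only torchaudio call it relies on is the `torchaudio.load(path, format="mp3")` call shown in the session above.

```python
from typing import Callable, Iterable, List, Tuple


def find_unreadable(
    paths: Iterable[str], loader: Callable[[str], object]
) -> List[Tuple[str, str]]:
    """Try to load every path and collect the ones that raise.

    Hypothetical helper: returns (path, error description) pairs.
    """
    failures = []
    for path in paths:
        try:
            loader(path)
        except Exception as exc:  # e.g. the RuntimeError from the issue
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return failures


def load_with_torchaudio(path: str):
    """Loader mirroring the call in the session above (assumes torchaudio is installed)."""
    import torchaudio  # imported lazily so the helper above works without it

    return torchaudio.load(path, format="mp3")
```

Calling `find_unreadable(list_of_mp3_paths, load_with_torchaudio)` would return an empty list if every clip loads. If nearly every clip fails, that points to the environment rather than to a corrupt file.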

This makes me think that maybe the source of your issue is sox. This is a non-Python dependency that must be installed manually using your operating system package manager, e.g.

sudo apt-get install sox

The installation instructions for Datasets with audio support are in our docs: Installation > Audio
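Before re-running the training script, a quick environment check could look like the sketch below. It uses only the Python standard library, and `check_audio_environment` is a hypothetical helper name. It only tests whether a `sox` executable is on the PATH and whether the `torchaudio` package can be found, which is a rough proxy and does not prove that MP3 decoding works.

```python
import importlib.util
import shutil


def check_audio_environment() -> dict:
    """Report whether sox and torchaudio appear to be available.

    Hypothetical helper: a rough proxy, not a decoding test.
    """
    return {
        # Full path to the sox executable, or None if it is not on PATH.
        "sox_on_path": shutil.which("sox"),
        # True if the torchaudio package can be located (it is not imported).
        "torchaudio_installed": importlib.util.find_spec("torchaudio") is not None,
    }


if __name__ == "__main__":
    print(check_audio_environment())
```

If `sox_on_path` comes back as `None`, installing sox with the system package manager as suggested above would be the first thing to try.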

LysandreJik commented, Aug 1, 2022 (1 reaction)

Have you ever encountered this error, @albertvillanova @mariosasko?
