raise RuntimeError("Failed to load audio from {}".format(filepath))

System Info

I want to run run_speech_recognition_ctc.py, but I get an error when running the single-GPU CTC script:

python run_speech_recognition_ctc.py \
    --dataset_name="common_voice" \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="tr" \
    --output_dir="./wav2vec2-common_voice-tr-demo" \
    --overwrite_output_dir \
    --num_train_epochs="15" \
    --per_device_train_batch_size="16" \
    --gradient_accumulation_steps="2" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --evaluation_strategy="steps" \
    --text_column_name="sentence" \
    --length_column_name="input_length" \
    --save_steps="400" \
    --eval_steps="100" \
    --layerdrop="0.0" \
    --save_total_limit="3" \
    --freeze_feature_encoder \
    --gradient_checkpointing \
    --chars_to_ignore , ? . ! - \; \: \" “ % ‘ ” � \
    --fp16 \
    --group_by_length \
    --push_to_hub \
    --do_train --do_eval

The error:

raise RuntimeError("Failed to load audio from {}".format(filepath))
RuntimeError: Failed to load audio from /root/.cache/huggingface/datasets/downloads/extracted/05be0c29807a73c9b099873d2f5975dae6d05e9f7d577458a2466ecb9a2b0c6b/cv-corpus-6.1-2020-12-11/tr/clips/common_voice_tr_17346025.mp3

Who can help?

@patrickvonplaten @anton-l

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

I just ran the steps described in the examples folder.

Expected behavior

I just want the script to run and produce a result.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

albertvillanova commented, Aug 1, 2022

Hi @mehrdad78, thanks for reporting (and thanks @LysandreJik for drawing my attention to this).

I have manually checked the TAR file, its content and specifically the MP3 file raising the error: cv-corpus-6.1-2020-12-11/ru/clips/common_voice_ru_18849051.mp3

I can load it without any problem (under the hood, our Datasets library uses torchaudio for MP3 files):

In [1]: import torchaudio

In [2]: path = "./data/common_voice/ru/cv-corpus-6.1-2020-12-11/ru/clips/common_voice_ru_18849051.mp3"

In [3]: data = torchaudio.load(path, format="mp3")

In [4]: data
Out[4]: 
(tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.6095e-04,
           3.2425e-05,  8.8751e-05]]),
 48000)

This makes me think that maybe the source of your issue is sox. This is a non-Python dependency that must be installed manually using your operating system package manager, e.g.

sudo apt-get install sox
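
A quick way to see whether these pieces appear to be present is sketched below. It uses only the Python standard library; the function name check_audio_dependencies is made up for illustration. Finding a sox executable on PATH is only a rough proxy for whether the audio backend can actually decode MP3 files, so treat the output as a hint rather than a verdict.

```python
import importlib.util
import shutil


def check_audio_dependencies():
    """Report whether sox and torchaudio appear to be available.

    Hypothetical helper: it does not try to decode any audio.
    """
    return {
        # Path to the sox executable, or None if it is not on PATH.
        "sox_binary": shutil.which("sox"),
        # True if the torchaudio package can be located (it is not imported).
        "torchaudio_installed": importlib.util.find_spec("torchaudio") is not None,
    }


if __name__ == "__main__":
    for name, value in check_audio_dependencies().items():
        print(f"{name}: {value}")
```

If sox_binary is reported as None, installing sox with the system package manager as shown above is the first thing to try.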

The installation instructions for Datasets with audio support are in our docs: Installation > Audio
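
To tell apart a single corrupt clip from an environment problem, the manual torchaudio.load check above can be repeated over a whole clips folder. The following is a minimal sketch: find_unloadable is a hypothetical helper, and the loader is passed in as an argument so the scan itself does not depend on any particular audio library.

```python
from pathlib import Path


def find_unloadable(clips_dir, load_fn, pattern="*.mp3"):
    """Return (path, error message) pairs for files that load_fn cannot read.

    Hypothetical helper: load_fn is any callable that takes a file path
    string and raises an exception when the file cannot be loaded.
    """
    failures = []
    for path in sorted(Path(clips_dir).glob(pattern)):
        try:
            load_fn(str(path))
        except Exception as exc:  # record the failure and keep scanning
            failures.append((path, str(exc)))
    return failures
```

With torchaudio installed, the loader from the session above could be plugged in as find_unloadable(clips_dir, lambda p: torchaudio.load(p, format="mp3")). If every file fails, the environment (for example a missing sox) is the more likely culprit; if only a few fail, those files may be damaged.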

LysandreJik commented, Aug 1, 2022

Have you ever encountered this error, @albertvillanova @mariosasko?
