
🚀 Feature Request: Loading audio data from BytesIO or memory


🚀 Feature

The torchaudio.load API does not support loading audio bytes from memory. It would be a great addition to be able to load a file-like object, e.g. BytesIO. This would be similar to SoundFile’s read function (https://github.com/bastibe/SoundFile/blob/master/soundfile.py#L170)
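The requested behavior can be sketched with Python's standard library alone. The function below, load_wav, is hypothetical (it is not torchaudio's API). It accepts a path, raw bytes such as a DB blob, or a binary file-like object such as BytesIO. Because the stdlib wave module cannot decode MP3 or FLAC, the sketch assumes 16-bit PCM WAV input only.

```python
import array
import io
import os
import sys
import wave
from typing import BinaryIO, List, Tuple, Union

Source = Union[str, os.PathLike, bytes, bytearray, BinaryIO]


def load_wav(source: Source) -> Tuple[List[List[float]], int]:
    """Hypothetical loader: path, raw bytes, or binary file-like object -> (channels, sample_rate).

    Only 16-bit PCM WAV is handled. Each channel is a list of floats in [-1, 1).
    """
    if isinstance(source, (bytes, bytearray)):
        # Wrap in-memory blobs (e.g. fetched from a DB) so they behave like an open file.
        source = io.BytesIO(source)
    elif isinstance(source, os.PathLike):
        source = os.fspath(source)
    # wave.open accepts either a filename or a file-like object opened in binary mode.
    with wave.open(source, 'rb') as wav:
        if wav.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit PCM WAV')
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        pcm = array.array('h')
        pcm.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == 'big':
        pcm.byteswap()  # WAV sample data is little-endian
    # De-interleave [c0, c1, c0, c1, ...] into one list per channel and scale to [-1, 1).
    return [[s / 32768 for s in pcm[c::channels]] for c in range(channels)], sample_rate
```

The same call then works for a file on disk, a BytesIO object, or bytes read straight from a database row, which is the use case described in the Motivation section.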

Motivation

This addition will support a use case for reading audio as blobs directly from a DB instead of writing the files locally first.

Pitch

Without this feature, torchaudio.load is not useful for users who load files from a DB and would love to use torchaudio for all audio operations.

Alternatives

SoundFile supports loading from bytes but currently does not support MP3 files. Common Voice’s audio files are saved as MP3, so they must be converted to FLAC or WAV before training.

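import io
import soundfile as sf

# audio_bytes holds an encoded FLAC/WAV/OGG blob, e.g. fetched from a DB
waveform, samplerate = sf.read(file=io.BytesIO(audio_bytes), dtype='float32')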

Issue Analytics

Created: 3 years ago
Reactions: 15
Comments: 16 (7 by maintainers)

Top GitHub Comments

8 reactions

antimora commented, Jul 23, 2020

@mthrok and others.

I found a workaround for the memory leak that I described in a comment on the bug I reported: https://github.com/irmen/pyminiaudio/issues/19#issuecomment-663178015. The solution still uses miniaudio’s functionality but calls a different function. The memory leak appears in pyminiaudio’s implementation of the decode* functions, which do not release memory.

For those wishing to use pyminiaudio’s in-memory MP3 decoder, here is the working code I will be using in my Common Voice training. Note: I have reimplemented the mp3_read_f32 function because of the https://github.com/irmen/pyminiaudio/issues/18 bug; the library’s version currently does not report sample_rate back to the caller.

import numpy as np
import torchaudio
import array
from pathlib import Path

import matplotlib.pyplot as plt
import torch
from miniaudio import DecodeError, ffi, lib
import resampy
import soundfile as sf

# get mp3 bytes
audio_bytes = Path('common_voice_en_20603299.mp3').read_bytes()


def mp3_read_f32(data: bytes) -> tuple:
    '''Reads and decodes the whole mp3 audio data. Resulting sample format is 32 bits float.

    Returns (samples, sample_rate, channels); samples is an array.array('f') of interleaved frames.'''
    config = ffi.new('drmp3_config *')
    num_frames = ffi.new('drmp3_uint64 *')
    # dr_mp3 allocates the decoded PCM buffer; it must be released with drmp3_free
    memory = lib.drmp3_open_memory_and_read_pcm_frames_f32(data, len(data), config, num_frames, ffi.NULL)
    if not memory:
        raise DecodeError('cannot load/decode data')
    try:
        samples = array.array('f')
        # 4 bytes per float32 sample, config.channels samples per frame
        buffer = ffi.buffer(memory, num_frames[0] * config.channels * 4)
        samples.frombytes(buffer)
        return samples, config.sampleRate, config.channels
    finally:
        lib.drmp3_free(memory, ffi.NULL)
        ffi.release(num_frames)


decoded_audio, sample_rate, channels = mp3_read_f32(audio_bytes)

assert channels == 1

# TODO handle channels > 1 cases

decoded_audio = np.asarray(decoded_audio)

# Resample to 16000
decoded_audio = resampy.resample(decoded_audio, sample_rate, 16000, axis=0, filter='kaiser_best')

decoded_audio = torch.FloatTensor(decoded_audio)

# Or resample with torchaudio's sinc_interpolation
# resampler = torchaudio.transforms.Resample(sample_rate, 16000)
# decoded_audio = resampler(decoded_audio)

# Peak-normalize to [-1, 1]; resampling somehow scaled the amplitude up.
decoded_audio /= decoded_audio.abs().max()

print('Max:', decoded_audio.max())
print('Min:', decoded_audio.min())
print('Shape:', decoded_audio.shape)
print('Dtype:', decoded_audio.dtype)

# plot to visually verify
plt.plot(decoded_audio.numpy(), linewidth=1)
plt.savefig('mp3_read_f32-16000-torchaudio-normalized-kaiser_best.png')

# write a WAV file to check audio quality by ear
sf.write('mp3_read_f32.wav', decoded_audio.numpy(), 16000)

Plot output: mp3_read_f32-16000-torchaudio-normalized-kaiser_best
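The script above asserts mono input and leaves channels > 1 as a TODO. dr_mp3 returns interleaved frames (hence the buffer size num_frames * channels * 4), so one possible way to fill that gap is to de-interleave and average the channels. The helpers deinterleave and downmix_to_mono below are hypothetical, stdlib-only sketches; averaging channels is an assumption that suits speech data but may not suit every task.

```python
import array
from typing import List


def deinterleave(samples: array.array, channels: int) -> List[array.array]:
    """Split interleaved samples [c0, c1, c0, c1, ...] into one array per channel."""
    if channels < 1 or len(samples) % channels != 0:
        raise ValueError('sample count must be a multiple of the channel count')
    # Extended slicing on array.array returns a new array with the same typecode.
    return [samples[c::channels] for c in range(channels)]


def downmix_to_mono(samples: array.array, channels: int) -> array.array:
    """Average interleaved float samples across channels, one output value per frame."""
    per_channel = deinterleave(samples, channels)
    # zip(*per_channel) yields one tuple per frame containing every channel's sample.
    return array.array('f', (sum(frame) / channels for frame in zip(*per_channel)))
```

In the script, the assert channels == 1 line could then be replaced with decoded_audio = downmix_to_mono(decoded_audio, channels) before the np.asarray conversion.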

3 reactions

antimora commented, Jul 21, 2020

@mthrok, thank you very much for your detailed and quick response. This is very helpful.

I agree with you regarding the challenges and limitations of the currently used backends.

After you mentioned the miniaudio library, I checked it out and can confirm that it perfectly satisfies my use case. Not only can I load MP3 data from memory, but I can also downsample (from 44100 Hz to 16000 Hz) on the fly. The library also seems to be native and does not spawn a separate process the way pydub.AudioSegment does. Another bonus is that there is no OS dependency such as ffmpeg; miniaudio wraps a C library (https://miniaud.io/). I definitely recommend looking into it as a new backend.

For those who wish to see working code, here it is:


from pathlib import Path

import matplotlib.pyplot as plt
import torch
from miniaudio import SampleFormat, decode


# get mp3 bytes
audio_bytes = Path('common_voice_en_20603299.mp3').read_bytes()

# decode mp3 bytes, downmixing to mono and resampling to 16 kHz in the same call, with signed 32-bit integer output
decoded_audio = decode(audio_bytes, nchannels=1, sample_rate=16000, output_format=SampleFormat.SIGNED32)

# create tensor out of the audio samples
decoded_audio = torch.FloatTensor(decoded_audio.samples)

# normalize signed 32-bit integer audio to [-1, 1) by dividing by 2147483648 (shorthand: 1 << 31)
decoded_audio /= (1 << 31)

print('Max:', decoded_audio.max())
print('Min:', decoded_audio.min())
print('Shape:', decoded_audio.shape)
print('Dtype:', decoded_audio.dtype)

# plot to visually verify
plt.plot(decoded_audio.numpy(), linewidth=1)
plt.savefig('miniaudio-16000-normalized.png')

MP3 file: common_voice_en_20603299.zip

Plot output: miniaudio-16000-normalized
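The two scripts in this thread normalize differently: the miniaudio version divides by the fixed full scale 1 << 31, while the mp3_read_f32 version divides by the clip's own peak. The stdlib-only sketch below contrasts the two; the helper names int32_to_float and peak_normalize are hypothetical. A fixed divisor preserves relative loudness between clips, whereas peak normalization scales every clip so its loudest sample reaches 1.0.

```python
import array
from typing import List, Sequence

INT32_FULL_SCALE = 1 << 31  # 2147483648


def int32_to_float(samples: array.array) -> List[float]:
    """Map signed 32-bit PCM to floats in [-1.0, 1.0) with a fixed full-scale divisor."""
    return [s / INT32_FULL_SCALE for s in samples]


def peak_normalize(samples: Sequence[float]) -> List[float]:
    """Rescale so the largest absolute sample becomes 1.0 (silence is returned unchanged)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]


# typecode 'i' is a C int, which is 32 bits on mainstream platforms.
pcm = array.array('i', [-(1 << 31), 0, 1 << 30, (1 << 31) - 1])
print(int32_to_float(pcm))
print(peak_normalize([0.25, -0.125]))
```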
