File objects work with librosa.load but not librosa.stream

Description

Loading a WAV file via tf.gfile.GFile works with librosa.load but not librosa.stream. Interestingly, the same error appears regardless of whether the path is a bucket path or a local path.

Steps/Code to Reproduce

import librosa as lr
import tensorflow as tf

path = 'gs://my-bucket/example.wav'

# Loading the entire file works as expected.
with tf.gfile.GFile(path, 'rb') as f:
    lr.load(f, 48000)

# The streaming method crashes.
with tf.gfile.GFile(path, 'rb') as f:
    assert f.seekable() is True
    for waveforms in lr.stream(f, 128, 1024, 512):
        print(waveforms.shape)

Expected Results

No errors.

Actual Results

RuntimeError: Error opening : File contains data in an unknown format.

Versions

import platform; print(platform.platform())...
Linux-4.9.0-9-amd64-x86_64-with-debian-9.9
Python 3.5.3 (default, Sep 27 2018, 17:25:39) 
[GCC 6.3.0 20170516]
NumPy 1.17.0
SciPy 1.3.0
librosa 0.7.0

Top GitHub Comments

carlthome commented, Aug 7, 2019

And super curiously, this will work fine

import soundfile as sf

with tf.gfile.GFile(path, 'rb') as f:
    for waveform in sf.blocks(f, blocksize=1024):
        pass

so perhaps it’s something with the overlapping blocks.

bmcfee commented, Aug 7, 2019

Thanks for tracking this down!

> Require a sample rate to be input into librosa.stream and remove the sf.info call.

I don’t like this option, because it connotes that an automatic sampling rate conversion would be invoked. If you give the wrong sampling rate in the call, there’s no obvious recourse.

> Check if path is a file object and call path.seek(0) after sf.info (hacky).

I agree that this is hacky, but I think it might be the best option.

> Update PySoundFile’s sf.info function to reset the pointer or defensively copy its argument (perhaps unwanted behaviour, but cannot think of reasons why).

That’s a tricky one. Maybe it would be worth discussing with the soundfile devs?
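
For readers following along, the second option above (rewinding the file object after the metadata call) can be sketched with the standard library alone. This is only an illustration, not the change that went into librosa: call_then_rewind and read_magic are hypothetical names, read_magic stands in for a call like sf.info that leaves the file pointer advanced, and restoring the previous position rather than always seeking to 0 is an assumption made here.

```python
import io


def call_then_rewind(fileobj, probe):
    """Run probe(fileobj), then restore the read position it started from.

    Hypothetical helper: `probe` stands in for any metadata call that
    advances the file pointer as a side effect.
    """
    # String paths have no seek/tell, so they are passed through untouched.
    seekable = hasattr(fileobj, "seek") and hasattr(fileobj, "tell")
    start = fileobj.tell() if seekable else None
    try:
        return probe(fileobj)
    finally:
        if seekable:
            fileobj.seek(start)


def read_magic(fileobj):
    # Stand-in for a header inspection: consumes the first four bytes.
    return fileobj.read(4)


# In-memory bytes used purely as a placeholder for an open audio file.
buf = io.BytesIO(b"RIFF\x00\x00\x00\x00WAVE")
magic = call_then_rewind(buf, read_magic)
position_after = buf.tell()  # the pointer is back where it started
```

Without the rewind, whatever reads the file next would start partway through the header, which is consistent with the "unknown format" error reported above.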
