librosa.display.waveshow memory leak?
Describe the bug
Running librosa.display.waveshow on a long wav file (an hour long, 44.1 kHz) and plotting the waveform works in Colab. However, run the same cell a few times and the memory just keeps growing until the session crashes. This is problematic because I'd like to run this waveform-generating function in a loop over many hour-long files.
To Reproduce
Example: here's a Colab link (assuming you have access to hour-long wav files): https://colab.research.google.com/drive/1HIXqFM4NIw6qflcS4Ss5yAMvJD_XNmta
import librosa.display
import librosa
import matplotlib.pyplot as plt
file_path = '/content/drive/MyDrive/wav_files/denoised_hour_long_wav_file.wav'
y, sr = librosa.load(file_path, mono=False, sr=None)
orig_path = '/content/drive/MyDrive/wav_files/hour_long_wav_file.wav'
y_2, sr_2 = librosa.load(orig_path, mono=False, sr=None)
fig = plt.figure(figsize=(20,4), dpi=100)
# Running this in a loop to show how it crashes after roughly 3-5 iterations.
for i in range(10):
    librosa.display.waveshow(y_2, sr=sr_2, alpha=0.4)
    librosa.display.waveshow(y, sr=sr, color='r', alpha=0.5)
    fig.tight_layout()
    fig.show()
Try running this code in a cell multiple times. If you isolate just these two lines:
librosa.display.waveshow(y_2, sr=sr_2, alpha=0.4)
librosa.display.waveshow(y, sr=sr, color='r', alpha=0.5)
And run those in their own cell multiple times (or in a while loop or for loop, if you have an hour-long file), you'll notice the memory growing until it crashes and restarts the session.
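The growth can be confirmed independently of Colab's RAM indicator by watching Python-heap usage with the standard-library tracemalloc module. The leaky() function below is a hypothetical stand-in for the plotting call (the actual retention happens inside librosa/matplotlib, which are not exercised here); it only demonstrates how the measurement harness reveals data that "sticks around" between calls:

```python
import tracemalloc

def measure_growth(fn, iterations=5):
    """Call fn repeatedly and record Python-heap usage after each call."""
    tracemalloc.start()
    sizes = []
    for _ in range(iterations):
        fn()
        current, _peak = tracemalloc.get_traced_memory()
        sizes.append(current)
    tracemalloc.stop()
    return sizes

# Hypothetical stand-in for the leaking call: each invocation appends a
# large buffer to a module-level list, so nothing is ever freed.
_leaked = []
def leaky():
    _leaked.append(bytearray(1_000_000))

sizes = measure_growth(leaky)
print(sizes)  # strictly increasing: ~1 MB is retained per call
```

If the same harness is run around the real waveshow calls and the numbers keep climbing across iterations, that points at retained references rather than ordinary allocator churn.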
Expected behavior
I generate waveforms without the memory filling up and crashing Colab.
Software versions
Please run the following Python code snippet and paste the output below.
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import librosa; print("librosa", librosa.__version__)
librosa.show_versions()
Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
Python 3.7.13 (default, Apr 24 2022, 01:04:09)
[GCC 7.5.0]
NumPy 1.21.6
SciPy 1.7.3
librosa 0.9.2
INSTALLED VERSIONS
------------------
python: 3.7.13 (default, Apr 24 2022, 01:04:09)
[GCC 7.5.0]
librosa: 0.9.2
audioread: 2.1.9
numpy: 1.21.6
scipy: 1.7.3
sklearn: 1.0.2
joblib: 1.1.0
decorator: 4.4.2
soundfile: 0.10.3
resampy: 0.3.1
numba: 0.51.2
pooch: v1.6.0
packaging: 21.3
numpydoc: None
sphinx: 1.8.6
sphinx_rtd_theme: None
sphinx_multiversion: None
sphinx_gallery: None
mir_eval: None
ipython: None
sphinxcontrib-svg2pdfconverter: None
pytest: 3.6.4
pytest-mpl: None
pytest-cov: None
matplotlib: 3.2.2
samplerate: None
soxr: None
contextlib2: installed, no version number available
presets: None
Issue Analytics
- Created: a year ago
- Comments: 9 (7 by maintainers)
Top GitHub Comments
I did some digging and it looks like every call to librosa.display.waveshow makes a copy of the data somehow, and that copy sticks around even though it's no longer in use. Given that each wav file is 600+ MB, memory usage grows really quickly. The Colab notebook probably crashes because you hit the RAM limit. I can see the memory grow and grow on my laptop, but at least it doesn't crash. There isn't an easy fix I could find right now, but here are some suggestions to work around this issue in the short term:
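The thread's specific suggestions are not preserved here, but one plausible mitigation (my sketch, not from the thread) is to shrink the signal before plotting: an hour at 44.1 kHz is roughly 159 million samples, far more than any figure can display, so reducing each block of samples to its min/max envelope keeps the plotting call cheap, and calling plt.close(fig) between files releases matplotlib's per-figure state. A numpy-only sketch of the envelope reduction (minmax_envelope and the hop value are hypothetical names/choices, not librosa API):

```python
import numpy as np

def minmax_envelope(y, hop):
    """Reduce a 1-D signal to interleaved per-block (min, max) values.

    The result can be plotted in place of the raw signal (with sr/hop
    as the effective sample rate) so each block still sweeps the full
    amplitude range it covered.
    """
    n = (len(y) // hop) * hop          # drop the ragged tail
    blocks = y[:n].reshape(-1, hop)    # one row per block of `hop` samples
    lo = blocks.min(axis=1)
    hi = blocks.max(axis=1)
    env = np.empty(2 * len(lo), dtype=y.dtype)
    env[0::2] = lo                     # interleave min then max
    env[1::2] = hi
    return env

# Example: 10 s of noise at 44.1 kHz shrinks from 441,000 to 200 points.
y = np.random.randn(44_100 * 10).astype(np.float32)
env = minmax_envelope(y, hop=4_410)    # hypothetical hop: 0.1 s per block
print(env.shape)  # (200,)
```

This does not fix the underlying retention, but it shrinks each leaked copy by orders of magnitude, which may be enough to loop over many hour-long files without hitting Colab's RAM limit.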
Thanks, that's a good data point. We're actually doing a sprint right now at SciPy and trying to work out whether this is a librosa, matplotlib, or Jupyter issue exactly.