librosa.display.waveshow memory leak?
Describe the bug
Running librosa.display.waveshow on a long wav file (an hour long, 44.1 kHz) and plotting the waveform works in Colab. However, run the same cell a few times and the memory just keeps growing until the session crashes. This is problematic because I'd like to run this waveform-generating function in a loop over many hour-long files.
To Reproduce
Example: here's a Colab link (assuming you have access to hour-long wav files): https://colab.research.google.com/drive/1HIXqFM4NIw6qflcS4Ss5yAMvJD_XNmta
import librosa.display
import librosa
import matplotlib.pyplot as plt
file_path = '/content/drive/MyDrive/wav_files/denoised_hour_long_wav_file.wav'
y, sr = librosa.load(file_path, mono=False, sr=None)
orig_path = '/content/drive/MyDrive/wav_files/hour_long_wav_file.wav'
y_2, sr_2 = librosa.load(orig_path, mono=False, sr=None)
fig = plt.figure(figsize=(20,4), dpi=100)
# Running this in a loop to show how it crashes after roughly 3-5 iterations.
for i in range(10):
    librosa.display.waveshow(y_2, sr=sr_2, alpha=0.4)
    librosa.display.waveshow(y, sr=sr, color='r', alpha=0.5)
    fig.tight_layout()
    fig.show()
Try running this code in a cell multiple times. If you isolate just these two lines:
librosa.display.waveshow(y_2, sr=sr_2, alpha=0.4)
librosa.display.waveshow(y, sr=sr, color='r', alpha=0.5)
And run those in their own cell multiple times (or in a while loop or for loop, if you have an hour-long file), you'll notice the memory growing until it crashes and restarts the session.
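The growth can be confirmed independently of Colab's RAM indicator by watching Python-heap usage with the standard-library tracemalloc module. The leaky() function below is a hypothetical stand-in for the plotting call (the actual retention happens inside librosa/matplotlib, which are not exercised here); it only demonstrates how the measurement harness reveals data that "sticks around" between calls:

```python
import tracemalloc

def measure_growth(fn, iterations=5):
    """Call fn repeatedly and record Python-heap usage after each call."""
    tracemalloc.start()
    sizes = []
    for _ in range(iterations):
        fn()
        current, _peak = tracemalloc.get_traced_memory()
        sizes.append(current)
    tracemalloc.stop()
    return sizes

# Hypothetical stand-in for the leaking call: each invocation appends a
# large buffer to a module-level list, so nothing is ever freed.
_leaked = []
def leaky():
    _leaked.append(bytearray(1_000_000))

sizes = measure_growth(leaky)
print(sizes)  # strictly increasing: ~1 MB is retained per call
```

If the same harness is run around the real waveshow calls and the numbers keep climbing across iterations, that points at retained references rather than ordinary allocator churn.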
Expected behavior
I generate waveforms without the memory filling up and crashing Colab.
Software versions
Please run the following Python code snippet and paste the output below.
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import librosa; print("librosa", librosa.__version__)
librosa.show_versions()
Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
Python 3.7.13 (default, Apr 24 2022, 01:04:09)
[GCC 7.5.0]
NumPy 1.21.6
SciPy 1.7.3
librosa 0.9.2
INSTALLED VERSIONS
------------------
python: 3.7.13 (default, Apr 24 2022, 01:04:09)
[GCC 7.5.0]
librosa: 0.9.2
audioread: 2.1.9
numpy: 1.21.6
scipy: 1.7.3
sklearn: 1.0.2
joblib: 1.1.0
decorator: 4.4.2
soundfile: 0.10.3
resampy: 0.3.1
numba: 0.51.2
pooch: v1.6.0
packaging: 21.3
numpydoc: None
sphinx: 1.8.6
sphinx_rtd_theme: None
sphinx_multiversion: None
sphinx_gallery: None
mir_eval: None
ipython: None
sphinxcontrib-svg2pdfconverter: None
pytest: 3.6.4
pytest-mpl: None
pytest-cov: None
matplotlib: 3.2.2
samplerate: None
soxr: None
contextlib2: installed, no version number available
presets: None
Issue Analytics
- Created: a year ago
- Comments: 9 (7 by maintainers)
Top GitHub Comments
I did some digging and it looks like every call to librosa.display.waveshow makes a copy of the data somehow, and that copy sticks around even though it's no longer in use. Given that each wav file is 600+ MB, memory usage grows really quickly. The Colab notebook probably crashes because you hit the RAM limit. I can see the memory grow and grow on my laptop, but at least it doesn't crash. There isn't an easy fix I could find right now, but here are some suggestions to work around this issue in the short term:
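The thread's specific suggestions are not preserved here, but one plausible mitigation (my sketch, not from the thread) is to shrink the signal before plotting: an hour at 44.1 kHz is roughly 159 million samples, far more than any figure can display, so reducing each block of samples to its min/max envelope keeps the plotting call cheap, and calling plt.close(fig) between files releases matplotlib's per-figure state. A numpy-only sketch of the envelope reduction (minmax_envelope and the hop value are hypothetical names/choices, not librosa API):

```python
import numpy as np

def minmax_envelope(y, hop):
    """Reduce a 1-D signal to interleaved per-block (min, max) values.

    The result can be plotted in place of the raw signal (with sr/hop
    as the effective sample rate) so each block still sweeps the full
    amplitude range it covered.
    """
    n = (len(y) // hop) * hop          # drop the ragged tail
    blocks = y[:n].reshape(-1, hop)    # one row per block of `hop` samples
    lo = blocks.min(axis=1)
    hi = blocks.max(axis=1)
    env = np.empty(2 * len(lo), dtype=y.dtype)
    env[0::2] = lo                     # interleave min then max
    env[1::2] = hi
    return env

# Example: 10 s of noise at 44.1 kHz shrinks from 441,000 to 200 points.
y = np.random.randn(44_100 * 10).astype(np.float32)
env = minmax_envelope(y, hop=4_410)    # hypothetical hop: 0.1 s per block
print(env.shape)  # (200,)
```

This does not fix the underlying retention, but it shrinks each leaked copy by orders of magnitude, which may be enough to loop over many hour-long files without hitting Colab's RAM limit.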
Thanks, that's a good data point. We're actually doing a sprint right now at SciPy and trying to work out whether this is a librosa, matplotlib, or Jupyter issue exactly.