
Real-time onsets/chroma with pyaudio and librosa


I haven’t really found anything regarding this, besides the PCEN Streaming example, which uses librosa.stream.

I’d like to extract the onsets and chroma information from an audio stream, but I’m getting quite confused about whether I need to implement an overlap strategy on my own or if this is already handled. I’m also not quite sure what frame hop size to choose, since the calls only operate on the current chunk of audio.

From what I’ve gathered so far:

import pyaudio
import librosa
import time
import numpy as np

CHUNK = 2048
FORMAT = pyaudio.paInt16
CHANNELS = 1
SHORT_NORMALIZE = (1.0 / 32768.0)
DVC_IDX = 2

N_FFT = 1024
HOP_LENGTH = N_FFT // 2

p = pyaudio.PyAudio()
d_info = p.get_device_info_by_index(DVC_IDX)
SAMPLE_RATE = int(d_info['defaultSampleRate'])


def callback(input_data, frame_count, time_info, flags):
    # Convert the raw int16 bytes to floats in [-1, 1]
    buffer = np.frombuffer(input_data, dtype=np.int16)
    buffer = buffer * SHORT_NORMALIZE

    onsets = librosa.onset.onset_strength(y=buffer, sr=SAMPLE_RATE, lag=1, center=False)
    chroma = librosa.feature.chroma_stft(y=buffer, sr=SAMPLE_RATE, center=False, n_fft=N_FFT, hop_length=HOP_LENGTH)

    return input_data, pyaudio.paContinue


stream = p.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=SAMPLE_RATE,
    input=True,
    frames_per_buffer=CHUNK,
    input_device_index=DVC_IDX,
    stream_callback=callback
)

stream.start_stream()

# keep main thread alive
while stream.is_active():
    time.sleep(0.1)

stream.stop_stream()
stream.close()

This returns onsets of shape (5,) and chroma of shape (12, 3) for every chunk of audio.

When looking at the onsets:

[0.         1.24526488 0.99135636 0.70424905 0.75233263]
[0.         2.03855642 3.30533546 1.6733954  0.57786365]

the first value is always zero, probably because there is no previous frame to compare against. How would I solve this?

Also, are the 3 chroma frames per chunk enough to estimate the pitch correctly?

I’d appreciate any input on this.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
bmcfee commented, Dec 20, 2021

This is probably a better question for the discussion forum, but I’ll try to answer briefly here.

I’d like to extract the onsets and chroma information from an audio stream, but I’m getting quite confused about whether I need to implement an overlap strategy on my own or if this is already handled.

Since you’re not using librosa.stream (which only works on soundfile objects and not pyaudio, at least for now), you’ll have to manage buffer overlap yourself. Really, all that stream does on top of the soundfile blocks interface is provide another layer of buffering up from frames so that you can coherently handle frame overlap across blocks. This is explained in our blog post on the topic: https://librosa.org/blog/2019/07/29/stream-processing/#stream-processing
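
For illustration, one way the overlap could be managed by hand is to carry over the tail of the previous chunk and prepend it to the next one, so that STFT frames line up across chunk boundaries. A minimal sketch, reusing the constants from the question (process_chunk is a made-up helper, not a librosa or pyaudio API, and it assumes CHUNK is a multiple of HOP_LENGTH):

import numpy as np

N_FFT = 1024
HOP_LENGTH = N_FFT // 2
OVERLAP = N_FFT - HOP_LENGTH              # samples shared between consecutive chunks
prev_tail = np.zeros(OVERLAP, dtype=np.float32)

def process_chunk(chunk):
    # Prepend the tail of the previous chunk so frames stay hop-aligned
    # across chunk boundaries, then remember the new tail for next time.
    global prev_tail
    buffer = np.concatenate([prev_tail, chunk])
    prev_tail = chunk[-OVERLAP:]
    return buffer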

I’m also not quite sure what frame hop size to choose, since the calls only operate on the current chunk of audio. … the first value is always zero, probably because there is no previous frame to compare against. How would I solve this?

If you’re doing onset detection, you’ll need at least two frames worth of data to work with. You might also want to have a rolling buffer so that the last buffer can be used to detect onsets in the current buffer. Hop length is up to you; it won’t matter for chroma, but it definitely will for onsets. Note: your example forgot to include hop-length in the call to onset detection, which is why you have 5 frames there instead of 3.
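
As a rough sketch of that rolling-buffer idea (ring and on_audio are made-up names; the constants are the ones from the question, and the buffer simply holds the previous and current chunk):

ring = np.zeros(2 * CHUNK, dtype=np.float32)   # previous chunk + current chunk

def on_audio(chunk):
    # Shift out the oldest chunk and append the newest, so onsets near the
    # start of the current chunk still have earlier audio to compare against.
    global ring
    ring = np.concatenate([ring[CHUNK:], chunk])
    env = librosa.onset.onset_strength(
        y=ring, sr=SAMPLE_RATE,
        n_fft=N_FFT, hop_length=HOP_LENGTH,    # hop_length passed explicitly this time
        center=False,
    )
    return env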

Also, are the 3 chroma frames per chunk enough to estimate the pitch correctly?

That’s a tough call. I’d guess probably not: chroma-stft is not terribly accurate to begin with, and it’s particularly noisy when there are transients/discontinuities involved. It’s impossible to say for sure without knowing the sampling rate here (which isn’t included in your example), but think about the time extent that 3 frames of audio covers for your configuration, and how that might relate to expected note duration. (Also, number of periods for the frequencies involved!)
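
As a back-of-the-envelope check (assuming a 44.1 kHz device rate, which the example doesn’t actually pin down):

SAMPLE_RATE = 44100                      # assumed; the real device rate isn't shown
CHUNK = 2048
chunk_seconds = CHUNK / SAMPLE_RATE      # ~0.046 s of audio per chunk
print(chunk_seconds * 440)               # ~20 periods of A4 (440 Hz)
print(chunk_seconds * 110)               # ~5 periods of A2 (110 Hz)

Roughly 46 ms per chunk is short relative to typical note durations, and lower notes only complete a handful of cycles in that window.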

You might also want to do some kind of moving average to smooth the chroma over time, eg:

chroma = 0.5 * chroma + 0.5 * prev_chroma

(or pick your favorite balance between current and previous). This will induce a bit of latency, but ought to stabilize things over time without much overhead.
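
A small sketch of how that state could be carried across callbacks (prev_chroma and ALPHA are just illustrative names; this assumes each chunk yields the same number of chroma frames):

ALPHA = 0.5                  # weight on the current chroma vs. the previous one
prev_chroma = None

def smooth_chroma(chroma):
    # Exponential moving average over successive chroma blocks.
    global prev_chroma
    if prev_chroma is None:
        prev_chroma = chroma
    else:
        prev_chroma = ALPHA * chroma + (1.0 - ALPHA) * prev_chroma
    return prev_chroma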

1 reaction
clbrec commented, Jan 14, 2022

Sorry that I haven’t gotten back to you; I didn’t have much time to look into this again. But your detailed answers certainly helped me understand how librosa can be used for real-time applications and what pitfalls need to be taken into account! 😄


