Real-time onsets/chroma with pyaudio and librosa
I haven’t really found anything regarding this, besides the PCEN Streaming example, which uses librosa.stream.
I’d like to extract the onsets and chroma information from an audio stream, but I’m getting confused about whether I need to implement an overlap strategy on my own or whether this is already handled. I’m also not sure what frame hop size to choose, since each call only sees the current chunk of audio.
From what I’ve gathered so far:
import pyaudio
import librosa
import time
import numpy as np
CHUNK = 2048
FORMAT = pyaudio.paInt16
CHANNELS = 1
SHORT_NORMALIZE = (1.0 / 32768.0)
DVC_IDX = 2
N_FFT = 1024
HOP_LENGTH = N_FFT // 2
p = pyaudio.PyAudio()
d_info = p.get_device_info_by_index(DVC_IDX)
SAMPLE_RATE = int(d_info['defaultSampleRate'])
def callback(input_data, frame_count, time_info, flags):
    buffer = np.frombuffer(input_data, dtype=np.int16)
    buffer = buffer * SHORT_NORMALIZE
    onsets = librosa.onset.onset_strength(y=buffer, sr=SAMPLE_RATE, lag=1, center=False)
    chroma = librosa.feature.chroma_stft(y=buffer, sr=SAMPLE_RATE, center=False, n_fft=N_FFT, hop_length=HOP_LENGTH)
    return input_data, pyaudio.paContinue
stream = p.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=SAMPLE_RATE,
    input=True,
    frames_per_buffer=CHUNK,
    input_device_index=DVC_IDX,
    stream_callback=callback,
)
stream.start_stream()
# keep main thread alive
while stream.is_active():
    time.sleep(0.1)
stream.stop_stream()
stream.close()
This returns onsets of shape (5,) and chroma of shape (12, 3) for every chunk of audio.
When looking at the onsets:
[0. 1.24526488 0.99135636 0.70424905 0.75233263]
[0. 2.03855642 3.30533546 1.6733954 0.57786365]
the first value is always zero, probably because there is no previous frame to compare against. How would I solve this?
Also, are three chroma frames enough to estimate the pitch correctly?
I’d appreciate any input on this.
Issue Analytics
- Created 2 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
This is probably a better question for the discussion forum, but I’ll try to answer briefly here.
Since you’re not using librosa.stream (which only works on soundfile objects and not pyaudio, at least for now), you’ll have to manage buffer overlap yourself. Really, all that stream does on top of the soundfile blocks interface is provide another layer of buffering up from frames so that you can coherently handle frame overlap across blocks. This is explained in our blog post on the topic: https://librosa.org/blog/2019/07/29/stream-processing/#stream-processing
If you’re doing onset detection, you’ll need at least two frames’ worth of data to work with. You might also want a rolling buffer, so that the last buffer can be used to detect onsets in the current buffer. Hop length is up to you; it won’t matter for chroma, but it definitely will for onsets. Note: your example forgot to include hop_length in the call to onset detection, which is why you have 5 frames there instead of 3.
That’s a tough call. I’d guess probably not: chroma-stft is not terribly accurate to begin with, and it’s particularly noisy when there are transients/discontinuities involved. It’s impossible to say for sure without knowing the sampling rate here (which isn’t included in your example), but think about the time extent that 3 frames of audio covers for your configuration, and how that might relate to expected note duration. (Also, number of periods for the frequencies involved!)
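For a rough sense of scale, assuming a 44.1 kHz device rate (again, not stated in the issue), the three chroma frames cover well under a tenth of a second:

```python
# Time span of 3 STFT frames: n_fft for the first frame, one hop for each
# additional frame. All parameter values assumed from the question's setup.
SR = 44100
N_FFT = 1024
HOP = 512
n_frames = 3

span = (N_FFT + (n_frames - 1) * HOP) / SR  # seconds
print(f"{span * 1000:.1f} ms")  # about 46 ms
```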
You might also want to do some kind of moving average to smooth the chroma over time, eg:
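One way to do it, sketched here with arbitrary 0.9/0.1 weights (a placeholder, not the maintainers’ exact choice):

```python
import numpy as np

# Running estimate, one bin per pitch class.
chroma_smooth = np.zeros(12)

def smooth(chroma_frame, alpha=0.9):
    """Exponential moving average over successive chroma frames."""
    global chroma_smooth
    chroma_smooth = alpha * chroma_smooth + (1 - alpha) * chroma_frame
    return chroma_smooth
```

Apply it to each incoming chroma frame (or to the per-chunk mean over frames).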
(or pick your favorite balance between current and previous). This will induce a bit of latency, but ought to stabilize things over time without much overhead.
Sorry that I haven’t gotten back to you, didn’t have too much time to look into this again. But your detailed answers certainly helped me to understand how librosa can be used for real-time application and what the pitfalls are that need to be taken into account! 😄