Fluctuating results with SRP-PHAT algorithm
This is my experimental setup: I have an anechoic room and want to test the DOA estimation of two sources (which use two samples from the ARCTIC corpus) performed by the SRP algorithm.
import matplotlib.pyplot as plt
import numpy as np
import pyroomacoustics as pra
from scipy.io import wavfile
import sounddevice
ROOM_DIMENSIONS = np.array([15, 12])  # x, y (the room is 2D)
SNR = 20 # dB
DISTANCE = 0.25 # between microphones, in meters
c = 343. # m/s
def main():
    """
    The main entry point for the application.
    """
    corpus = get_speech_corpus()
    fs = corpus.samples[0].fs

    # locate 2 sources
    azimuth_ref = np.array(
        np.radians([2, 95])
    )
    distance = 3  # meters

    # ALGORITHM parameters
    nfft = 512  # FFT size
    sigma2 = 10**(-SNR / 10) / (4. * np.pi * distance)**2
    freq_bins = np.arange(5, 60)  # FFT bins to use for DOA estimation
    freq_range = [500, 3500]  # frequency range to use for DOA estimation

    # Room without reflections and absorptions
    room = create_room(ROOM_DIMENSIONS, fs, sigma2=sigma2)

    # Get the first four speech samples
    samples = corpus.samples[:4]

    mic_array = pra.linear_2D_array(center=ROOM_DIMENSIONS/2, M=4, phi=0, d=DISTANCE)
    room.add_microphone_array(pra.MicrophoneArray(mic_array, room.fs))

    # Add sources with samples that are exactly one second long (from real speech samples)
    num_samples = int(fs)

    # add the sources to specified locations with respect to the array
    for i, angle in enumerate(azimuth_ref):
        source_location = ROOM_DIMENSIONS / 2 + distance * np.r_[np.cos(angle), np.sin(angle)]
        source_signal = samples[i].data[:num_samples]  # trim to one second of sample
        room.add_source(source_location, signal=source_signal)

    # simulate the raw mic input signals convolved with the room impulse responses (RIRs)
    room.simulate(snr=SNR)

    # generate the STFT frames for overlapping frames
    X = np.array(
        [pra.stft(signal, nfft, nfft // 2, transform=np.fft.rfft).T for signal in room.mic_array.signals]
    )

    algo_names = ['SRP']

    # run the algorithms and plot the results
    for algo_name in algo_names:
        # the max_four parameter is necessary for FRIDA only
        doa = pra.doa.algorithms[algo_name](mic_array, fs, nfft, c=c, num_src=len(azimuth_ref), max_four=4)

        # this call performs localization on the frames in X
        doa.locate_sources(X)

        doa.polar_plt_dirac(azimuth_ref=azimuth_ref)
        plt.title(algo_name)

        # doa.azimuth_recon contains the reconstructed location of the source
        print(algo_name)
        print("\tSources at:", np.sort(
            np.degrees(doa.azimuth_recon)
            ), "degrees"
        )
        print("\tReal azimuth:", np.degrees(azimuth_ref), "degrees")
        #print("\tAbsolute error:", np.degrees(pra.doa.circ_dist(doa.azimuth_recon, azimuth_ref)), "degrees")

    plt.show()

def create_room(room_dimensions, fs, absorption=0.0, max_order=0, sigma2=0):
    """
    Create the simulated ShoeBox room.

    Parameters:
    -----------
    - `room_dimensions`: Dimensions of the room
    - `fs`: The sampling rate at which the RIR (of the microphones) will be generated.
    - `absorption`: Absorption of walls, reflections are multiplied with (1 - absorption) for every wall they hit.
    - `max_order`: The maximum number of reflections allowed in the ISM.
    - `sigma2`: Variance of the additive white Gaussian noise at the microphones.
    """
    room = pra.ShoeBox(room_dimensions, fs=fs, absorption=absorption, max_order=max_order, sigma2_awgn=sigma2)
    return room

def get_speech_corpus():
    # CMU ARCTIC CORPUS
    # 16 kHz signals
    # Contains utterances of short sentences by male US speakers.
    # Here, the corpus for speaker bdl is automatically downloaded
    # if it is not available already
    corpus = pra.datasets.CMUArcticCorpus(download=True, speaker=['bdl'])
    return corpus


if __name__ == "__main__":
    main()

However, when I run this example, I get strongly fluctuating results. In about half of the runs the SRP algorithm returns [0, 265] as its estimate ([2, 95] are the correct angles of the two sources), and in the other runs I get the (fine) estimate [0, 95]. Since the room has no reverberation at all, I expected much better, more consistent results from SRP-PHAT. Is there something I need to tweak to get better results?
See those two plots:
Issue Analytics
- State:
- Created 4 years ago
- Comments:6
Hi @L0g4n, you are using a linear microphone array. A linear microphone array can inherently only resolve sources between 0 and 180 degrees. This is due to the front-back symmetry of the array: signals coming from the front cannot be distinguished from those coming from the back. There are several solutions:

1. Force all sources to be in [0, 180], for example like this.
2. SRP-PHAT is a search-based method. You are currently searching the full [0, 360] interval. Because of this, small numerical differences put the maximum sometimes on one side and sometimes on the other, which leads to the fluctuations you are observing. You can reduce the search space to [0, 180]. This will also cut down on computations. It can be achieved by specifying the azimuth argument in the constructor of SRP (see the doc). Note that azimuth should be provided in radians, not degrees.
3. If you really need to be able to resolve sources in [0, 360], the only solution is to use a 2D array. This could be a circular one, or a combination of two linear arrays at different angles.

By the way, the green line in the graph you provided is the cost function from SRP. The algorithm performs peak picking on this line. You can observe there the symmetry that leads to the fluctuations.
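To make the front-back symmetry concrete, here is a minimal sketch in plain Python (no pyroomacoustics needed). It assumes a far-field plane-wave model and a 4-microphone linear array along the x-axis with 0.25 m spacing, mirroring the setup in the question. The helper names `linear_array_positions`, `far_field_delays` and `fold_to_front` are made up for illustration and are not part of any library. It shows that 95 and 265 degrees produce the same inter-microphone delays, and one possible way to fold an estimate back into [0, 180].

```python
import math

C = 343.0  # speed of sound in m/s, same value as in the question


def linear_array_positions(num_mics=4, spacing=0.25):
    """(x, y) positions of a linear array centred on the origin, along the x-axis."""
    offset = (num_mics - 1) * spacing / 2
    return [(m * spacing - offset, 0.0) for m in range(num_mics)]


def far_field_delays(azimuth_deg, mics, c=C):
    """Relative arrival times (s) of a plane wave coming from azimuth_deg."""
    theta = math.radians(azimuth_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    # A microphone further along the arrival direction hears the wave earlier.
    return [-(x * ux + y * uy) / c for x, y in mics]


def fold_to_front(azimuth_deg):
    """Map an azimuth in degrees to its mirror image inside [0, 180]."""
    a = azimuth_deg % 360
    return a if a <= 180 else 360 - a


mics = linear_array_positions()
front = far_field_delays(95, mics)
back = far_field_delays(265, mics)

# The delays only depend on cos(azimuth), so both directions look identical.
max_diff = max(abs(f - b) for f, b in zip(front, back))
print("largest delay difference between 95 and 265 degrees:", max_diff)
print("265 degrees folded to the front:", fold_to_front(265))
```

Since every microphone has y = 0, only the cosine of the azimuth enters the delays, which is why no algorithm can tell the two directions apart from this array alone.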
I hope this helps.
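For the second option, a rough sketch of how the restricted search grid could be built. The grid itself is plain Python; `half_plane_grid` and the choice of 181 points (one per degree) are assumptions for illustration. The commented-out call reuses the variable names from the question's script and relies on the `azimuth` constructor argument mentioned above, so check it against the doc before using it.

```python
import math


def half_plane_grid(n_points=181):
    """Evenly spaced azimuths in radians covering [0, pi], both ends included."""
    step = math.pi / (n_points - 1)
    return [k * step for k in range(n_points)]


grid = half_plane_grid()
print(len(grid), "candidate directions from", grid[0], "to", grid[-1], "rad")

# Assumed usage inside main() from the question (not run here):
# doa = pra.doa.algorithms['SRP'](
#     mic_array, fs, nfft, c=c, num_src=len(azimuth_ref),
#     azimuth=np.array(grid),  # radians, not degrees
# )
```

With the search limited to this grid, the mirrored peak at 265 degrees is simply never evaluated, so peak picking can no longer jump between the two sides.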
Thanks!