
Fluctuating results with SRP-PHAT algorithm

See original GitHub issue

This is my experimental setup: I have an anechoic room and want to test the DOA estimation performance of the SRP algorithm for two sources (using two samples from the ARCTIC corpus).

import matplotlib.pyplot as plt
import numpy as np
import pyroomacoustics as pra
from scipy.io import wavfile

ROOM_DIMENSIONS = np.array([15, 12]) # x, y (2D room)
SNR = 20 # dB
DISTANCE = 0.25 # between microphones, in meters
c = 343. # m/s

def main():
    """
    The main entry point for the application.
    """
    corpus = get_speech_corpus()
    fs = corpus.samples[0].fs

    # locate 2 sources
    azimuth_ref = np.array(
        np.radians([2, 95])
    )
    distance = 3 # meters

    # ALGORITHM parameters
    nfft = 512 # FFT size
    sigma2 = 10**(-SNR / 10) / (4. * np.pi * distance)**2
    freq_bins = np.arange(5, 60)  # FFT bins to use for DOA estimation
    freq_range = [500, 3500] # frequency range to use for DOA estimation

    # Room without reflections and absorptions
    room = create_room(ROOM_DIMENSIONS, fs, sigma2=sigma2)

    # Get the first four speech samples
    samples = corpus.samples[:4]

    mic_array = pra.linear_2D_array(center=ROOM_DIMENSIONS/2, M=4, phi=0, d=DISTANCE)

    room.add_microphone_array(pra.MicrophoneArray(mic_array, room.fs))

    # Add sources with samples that are exactly one second long (from real speech samples)
    num_samples = int(fs)

    # add the sources to specified locations with respect to the array
    for i, angle in enumerate(azimuth_ref):
        source_location = ROOM_DIMENSIONS / 2 + distance * np.r_[np.cos(angle), np.sin(angle)]
        source_signal = samples[i].data[:num_samples] # trim to one second of sample
        room.add_source(source_location, signal=source_signal)

    # simulate the raw microphone signals by convolving the sources with the room impulse responses (RIRs)
    room.simulate(snr=SNR)

    # compute the STFT with half-overlapping frames for each microphone signal
    X = np.array(
        [pra.stft(signal, nfft, nfft // 2, transform=np.fft.rfft).T for signal in room.mic_array.signals]
    )

    algo_names = ['SRP']

    # run the algorithms and plot the results
    for algo_name in algo_names:
        # the max_four parameter is only used by FRIDA; the other algorithms ignore it
        doa = pra.doa.algorithms[algo_name](mic_array, fs, nfft, c=c, num_src=len(azimuth_ref), max_four=4)

        # this call performs localization on the STFT frames in X, restricted to the selected frequency bins
        doa.locate_sources(X, freq_bins=freq_bins)

        doa.polar_plt_dirac(azimuth_ref=azimuth_ref)
        plt.title(algo_name)

        # doa.azimuth_recon contains the reconstructed location of the source
        print(algo_name)
        print("\tSources at:", np.sort(
                np.degrees(doa.azimuth_recon)
            ), "degrees"
        )
        print("\tReal azimuth:", np.degrees(azimuth_ref), "degrees")
        #print("\tAbsolute error:", np.degrees(pra.doa.circ_dist(doa.azimuth_recon, azimuth_ref)), "degrees")

    plt.show()



def create_room(room_dimensions, fs, absorption=0.0, max_order=0, sigma2=0):
    """
    Create the simulated ShoeBox room.

    Parameters:
    -----------

    - `room_dimensions`: Dimensions of the room.
    - `fs`: The sampling rate at which the RIRs of the microphones will be generated.
    - `absorption`: Absorption of the walls; reflections are multiplied by (1 - absorption) for every wall they hit.
    - `max_order`: The maximum number of reflections allowed in the image source model (ISM).
    - `sigma2`: Variance of the additive white Gaussian noise at the microphones (passed to `sigma2_awgn`).

    """
    room = pra.ShoeBox(room_dimensions, fs=fs, absorption=absorption, max_order=max_order, sigma2_awgn=sigma2)

    return room

def get_speech_corpus():
    # CMU ARCTIC CORPUS
    # 16 kHz signals
    # Contains utterances of short sentences by male US speakers.
    # Here, the corpus for speaker bdl is automatically downloaded
    # if it is not available already
    corpus = pra.datasets.CMUArcticCorpus(download=True, speaker=['bdl'])

    return corpus

However, when I run this example, I get fluctuating results. In about half the runs, the SRP algorithm returns [0, 265] as the estimate (the correct angles of the two sources are [2, 95]); in the other half I get the (fine) estimate [0, 95]. Since the room has no reverberation at all, I expected much more consistent results from SRP-PHAT. Is there something I need to tweak to get better results?
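To put a number on the fluctuation, the angular error can be measured with a small helper. This is a numpy-only sketch (working in degrees) of the kind of circular distance that the commented-out `pra.doa.circ_dist` line computes for radian angles:

```python
import numpy as np

def circ_dist_deg(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = np.abs(np.asarray(a) - np.asarray(b)) % 360
    return np.minimum(d, 360 - d)

# Error of the bad run and the good run against the [2, 95] ground truth:
print(circ_dist_deg([0, 265], [2, 95]))  # the 265-degree estimate is 170 degrees off
print(circ_dist_deg([0, 95], [2, 95]))   # the good run is within 2 degrees
```

The 170-degree error of the bad run is exactly the mirror image of the correct 95-degree source, which hints at a symmetry rather than a random failure.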

See these two plots: fine_estimation, bad_estimation.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
fakufaku commented, Dec 10, 2019

Hi @L0g4n, you are using a linear microphone array. A linear array can inherently only resolve sources in 0 to 180 degrees, due to its front-back symmetry: signals coming from the front cannot be distinguished from those coming from the back. There are several solutions:

  1. Fold all estimated angles into [0, 180] degrees, for example like this:

    if angle > 180:
        angle = 360 - angle
    
  2. SRP-PHAT is a search-based method. You are currently searching over the full [0, 360] interval, so, due to small numerical differences, the maximum lands sometimes on one side and sometimes on the other, causing the fluctuations you observe. You can reduce the search space to [0, 180], which also cuts down computation. This is done by specifying the azimuth argument in the constructor of SRP (see the docs). Note that azimuth must be provided in radians, not degrees.

  3. If you really need to be able to resolve sources in [0, 360], the only solution is to use a 2D array. This could be a circular one, or the combination of two linear arrays at different angles.
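Suggestions 1 and 2 above can be sketched as follows. The folding helper is plain numpy; the restricted-grid constructor call is shown as a comment, since it needs the simulation objects from the question, and assumes the `azimuth` keyword of the pyroomacoustics DOA constructors, which takes the candidate grid in radians:

```python
import numpy as np

def fold_to_front(az_deg):
    """Map azimuth estimates into [0, 180] degrees, exploiting the
    front-back symmetry of a linear array (suggestion 1)."""
    az = np.asarray(az_deg) % 360
    return np.where(az > 180, 360 - az, az)

# The ambiguous estimate from the question folds onto the correct side:
print(fold_to_front([0, 265]))  # 265 degrees is the mirror image of 95 degrees

# Suggestion 2: restrict the search grid at construction time (in radians):
# doa = pra.doa.algorithms['SRP'](mic_array, fs, nfft, c=c, num_src=2,
#                                 azimuth=np.linspace(0., np.pi, 181))
```

Folding after the fact is the quick fix; restricting the grid is cleaner because the spurious back-side maximum never enters the search in the first place.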

By the way, the green line in the graph you provided is the cost function from SRP. The algorithm will perform peak picking on this line. You can observe the symmetry leading to fluctuations there.

I hope this helps.

0 reactions
L0g4n commented, Dec 12, 2019

Thanks!
