RQA and Smith-Waterman

Description

(Forking this issue off from #741, since it’s pretty different from the initial idea there. Tagging @ctralie for input though.)

librosa does not currently implement much in the way of recurrence quantification analysis, beyond standard DTW. It would be easy to provide some basic functions for recurrence post-processing, such as the L/S/Q methods described by Serra et al., 2009, and Smith-Waterman.

The L-method simply accumulates diagonals, and resets the counter whenever there’s a gap (R[i,j] = 0).
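
The recursion described above can be sketched in a few lines. This is a plain-Python illustration, not the numba prototype mentioned later in this issue; the function name `l_method` and the list-of-lists input are assumptions made for the example.

```python
def l_method(R):
    """Accumulate diagonal run lengths in a binary recurrence matrix R.

    Hypothetical helper for illustration only; R is a list of rows
    containing 0/1 entries.
    """
    n, m = len(R), len(R[0])
    L = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if R[i][j]:
                # Extend the run arriving from the diagonal predecessor.
                prev = L[i - 1][j - 1] if i > 0 and j > 0 else 0
                L[i][j] = prev + 1
            # Otherwise L[i][j] stays 0: a gap resets the counter.
    return L


R = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],  # gap on the main diagonal
    [0, 0, 0, 1],
]
L = l_method(R)
longest = max(max(row) for row in L)  # 2: the run is broken at R[2][2]
```

The largest entry of `L` is the length of the longest unbroken diagonal.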

The S-method accumulates diagonals, but allows for gaps of length 1 by considering the max over the immediate diagonal and knight’s moves away from R[i, j].
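
A minimal sketch of that rule, assuming the knight's-move predecessors of R[i, j] are R[i-2, j-1] and R[i-1, j-2]. The name `s_method` and the list-of-lists input are hypothetical, chosen only for this example.

```python
def s_method(R):
    """Accumulate diagonal runs in binary R, tolerating gaps of length 1.

    Hypothetical helper for illustration only.
    """
    n, m = len(R), len(R[0])
    S = [[0] * m for _ in range(n)]

    def get(i, j):
        # Out-of-range predecessors contribute nothing.
        return S[i][j] if i >= 0 and j >= 0 else 0

    for i in range(n):
        for j in range(m):
            if R[i][j]:
                # Best of the diagonal step and the two knight's moves.
                S[i][j] = 1 + max(get(i - 1, j - 1),
                                  get(i - 2, j - 1),
                                  get(i - 1, j - 2))
            # Otherwise S[i][j] stays 0.
    return S


# A path that skips row 2: (0,0) -> (1,1) -> (3,2) -> (4,3)
R = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
S = s_method(R)
longest = max(max(row) for row in S)  # 4: the knight's move bridges the skipped row
```

On the same input, a purely diagonal accumulation would restart at (3,2) and report a longest run of 2.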

The Q method is closely related to Smith-Waterman, with the following modifications:

- only gaps of length 1 are directly computed;
- instead of diagonal/horizontal/vertical, the gaps are diagonal and knight’s moves;
- different gap penalties are applied depending on whether an entry introduces a new gap or extends an existing one.

While Q and SW are related, I don’t see a clean way to implement both with the same underlying function. I think it would be simpler to have two independent functions for these. Q and S can be simply implemented though: as noted by Serra et al., when the gap penalties become large, the Q method simplifies to the S method (all gaps instigate a reset). Likewise, if we make knight’s moves a toggle, then it’s easy to restrict Q down to L (with large penalties and knight moves disabled).
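
To make the reductions above concrete, here is a rough sketch of a single function covering Q, S, and L. It assumes a gap at R[i, j] is scored as the best predecessor minus a penalty, with the onset penalty used when that predecessor was a match and the extension penalty otherwise. The parameter names `gap_onset`, `gap_extend`, and `knight_moves` are illustrative, not a settled API, and this is unoptimized plain Python rather than the numba prototype.

```python
def q_method(R, gap_onset=1.0, gap_extend=1.0, knight_moves=True):
    """Gap-penalized diagonal accumulation over a binary matrix R (sketch)."""
    n, m = len(R), len(R[0])
    Q = [[0.0] * m for _ in range(n)]
    steps = [(1, 1)]
    if knight_moves:
        steps += [(2, 1), (1, 2)]
    for i in range(n):
        for j in range(m):
            best = 0.0  # scores never drop below zero
            for di, dj in steps:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                if R[i][j]:
                    cand = Q[pi][pj]
                else:
                    # New gap if the predecessor was a match, else an extension.
                    penalty = gap_onset if R[pi][pj] else gap_extend
                    cand = Q[pi][pj] - penalty
                best = max(best, cand)
            Q[i][j] = best + 1 if R[i][j] else best
    return Q


def peak(M):
    return max(max(row) for row in M)


inf = float("inf")

# Main diagonal with a single gap at (2, 2).
R_gap = [[1 if i == j and i != 2 else 0 for j in range(5)] for i in range(5)]
q_soft = peak(q_method(R_gap, gap_onset=0.5, gap_extend=0.5))  # 3.5: gap costs 0.5
q_hard = peak(q_method(R_gap, gap_onset=inf, gap_extend=inf))  # 2.0: gap resets

# Path that skips row 2, reachable only through a knight's move.
R_skip = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
as_s = peak(q_method(R_skip, inf, inf, knight_moves=True))   # 4.0: behaves like S
as_l = peak(q_method(R_skip, inf, inf, knight_moves=False))  # 2.0: behaves like L
```

With infinite penalties every gap zeroes the score, which is the S behavior; additionally disabling knight's moves leaves only the diagonal predecessor, which is the L behavior.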

As mentioned in the previous thread, I’ve prototyped these already with numba, and they’re pretty fast. The only remaining challenges are:

- supporting sparse input;
- making a generic API to encapsulate all of the above methods;
- generalizing to accumulate link strength rather than counting steps. With binary input, this recovers the original implementation.

Top GitHub Comments

bmcfee commented, Mar 25, 2019

Nice! That phase transition between 10 and 20 is interesting; I wonder what that’s all about.

You might also find jakevdp’s post on non-uniform FFT implementations interesting: https://jakevdp.github.io/blog/2015/02/24/optimizing-python-with-numpy-and-numba/ as it touches on some of these issues.

ctralie commented, Mar 25, 2019

Great point! Thanks for keeping me honest. It looks more like a factor of 3:

[Figure: Numba vs. Cython]

Anyway, I’m excited about two things here:

- How simple numba is, and how much better it is than raw Python (never used it before)
- That there’s still a 3x factor to be had for people like me who are doing large-scale experiments

I’ve been having an absolute nightmare getting Cython to work cross-platform in another project, so I’ll definitely look into numba.
