RQA and Smith-Waterman

Description

(Forking this issue off from #741, since it’s pretty different from the initial idea there. Tagging @ctralie for input though.)

librosa does not currently implement much in the way of recurrence quantification analysis, beyond standard DTW. It would be easy to provide some basic functions for recurrence post-processing, such as the L/S/Q methods described by Serra et al., 2009, and Smith-Waterman.

The L-method simply accumulates diagonals, and resets the counter whenever there’s a gap (R[i,j] = 0).
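As a rough illustration of the L-method recurrence just described, here is a minimal sketch in plain NumPy. The function name `l_method` is made up for this example and is not part of librosa; the numba prototype mentioned in this issue may look quite different.

```python
import numpy as np

def l_method(R):
    """Accumulate diagonal run lengths over a binary recurrence matrix R.

    Hypothetical helper for illustration only (not a librosa function).
    """
    R = np.asarray(R, dtype=bool)
    n, m = R.shape
    L = np.zeros((n, m), dtype=int)
    for i in range(n):
        for j in range(m):
            if R[i, j]:
                # Extend the run arriving along the diagonal, if there is one
                prev = L[i - 1, j - 1] if (i > 0 and j > 0) else 0
                L[i, j] = prev + 1
            # else: a gap (R[i, j] == 0) resets the counter, so L[i, j] stays 0
    return L
```

The longest diagonal run is then `l_method(R).max()`.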

The S-method accumulates diagonals, but allows for gaps of length 1 by considering the max over the immediate diagonal and knight’s moves away from R[i, j].
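A sketch of the S-method under the same assumptions: plain NumPy, a binary matrix, and a hypothetical function name `s_method`. The only change from the L-method is that each cell takes the max over the diagonal predecessor and the two knight's-move predecessors.

```python
import numpy as np

def s_method(R):
    """Accumulate diagonal runs, bridging single gaps via knight's moves.

    Hypothetical helper for illustration only (not a librosa function).
    """
    R = np.asarray(R, dtype=bool)
    n, m = R.shape
    # Pad with two leading rows/columns of zeros so that the
    # (i-2, j-1) and (i-1, j-2) lookups never leave the array.
    S = np.zeros((n + 2, m + 2), dtype=int)
    for i in range(n):
        for j in range(m):
            if R[i, j]:
                S[i + 2, j + 2] = 1 + max(
                    S[i + 1, j + 1],  # diagonal predecessor (i-1, j-1)
                    S[i, j + 1],      # knight's move from (i-2, j-1)
                    S[i + 1, j],      # knight's move from (i-1, j-2)
                )
    return S[2:, 2:]
```

The padding is just one way to handle the boundary; explicit index checks would work equally well.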

The Q-method is closely related to Smith-Waterman, with the following modifications:

  • only gaps of length 1 are directly computed
  • instead of diagonal/horizontal/vertical, the gaps are diagonal and knight’s moves
  • different gap penalties are applied depending on whether an entry introduces a new gap or extends an existing one.

While Q and SW are related, I don’t see a clean way to implement both with the same underlying function. I think it would be simpler to have two independent functions for these. Q and S, on the other hand, can share one implementation: as noted by Serra et al., when the gap penalties become large, the Q-method simplifies to the S-method (all gaps instigate a reset). Likewise, if we make knight’s moves a toggle, then it’s easy to restrict Q down to L (with large penalties and knight’s moves disabled).
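The Q recurrence and the two reductions above can be sketched as follows. This is my reading of the description in this issue and of Serra et al., 2009, not the numba prototype itself. The name `q_method` and the parameters `gap_onset`, `gap_extend`, and `knight_moves` are assumptions made for this example. At a gap cell, the penalty is chosen by looking at the predecessor: if the predecessor was a match, the gap is new (onset); otherwise it is being extended.

```python
import numpy as np

def q_method(R, gap_onset=1.0, gap_extend=1.0, knight_moves=True):
    """Gap-penalized diagonal accumulation over a binary matrix R.

    Hypothetical helper for illustration only (not a librosa function).
    """
    R = np.asarray(R, dtype=bool)
    n, m = R.shape
    # Pad two leading rows/columns so every predecessor lookup is in bounds
    Rp = np.zeros((n + 2, m + 2), dtype=bool)
    Rp[2:, 2:] = R
    Q = np.zeros((n + 2, m + 2), dtype=float)

    # Allowed predecessor offsets: diagonal, plus optional knight's moves
    steps = [(1, 1)]
    if knight_moves:
        steps += [(2, 1), (1, 2)]

    for i in range(2, n + 2):
        for j in range(2, m + 2):
            if Rp[i, j]:
                # Match: extend the best incoming path by one step
                Q[i, j] = 1.0 + max(Q[i - di, j - dj] for di, dj in steps)
            else:
                # Gap: carry the best incoming path minus a penalty,
                # never dropping below zero
                best = 0.0
                for di, dj in steps:
                    penalty = gap_onset if Rp[i - di, j - dj] else gap_extend
                    best = max(best, Q[i - di, j - dj] - penalty)
                Q[i, j] = best
    return Q[2:, 2:]
```

With both penalties set to `np.inf`, every gap resets the score to zero, which gives S-method behavior. Adding `knight_moves=False` on top of that leaves only the diagonal predecessor, which gives L-method behavior.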

As mentioned in the previous thread, I’ve prototyped these already with numba, and they’re pretty fast. The only remaining challenges are:

  1. supporting sparse input
  2. making a generic API to encapsulate all of the above methods
  3. generalizing to accumulate link strength rather than counting steps. With binary input, this recovers the original implementation.
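To make the third item concrete, here is one possible way to accumulate link strength instead of step counts, shown for the simplest (L-style, diagonal-only) case. The function name `accumulate_strength` is invented for this sketch, and the rule that any entry at or below zero counts as a gap is an assumption; sparse input (the first item) is not handled here.

```python
import numpy as np

def accumulate_strength(R):
    """Sum link strengths along diagonals, resetting at gaps.

    Hypothetical helper for illustration only (not a librosa function).
    Assumes entries of R are non-negative weights, with 0 marking a gap.
    """
    R = np.asarray(R, dtype=float)
    n, m = R.shape
    out = np.zeros((n, m), dtype=float)
    for i in range(n):
        for j in range(m):
            if R[i, j] > 0:
                prev = out[i - 1, j - 1] if (i > 0 and j > 0) else 0.0
                # Add the link weight rather than a constant 1
                out[i, j] = prev + R[i, j]
    return out
```

When every nonzero entry of `R` equals 1, each step adds exactly 1, so the result matches the step-counting version.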

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 14 (9 by maintainers)

Top GitHub Comments

bmcfee commented, Mar 25, 2019

Nice! That phase transition between 10 and 20 is interesting; I wonder what that’s all about.

You might also find jakevdp’s post on non-uniform FFT implementations interesting: https://jakevdp.github.io/blog/2015/02/24/optimizing-python-with-numpy-and-numba/ as it touches on some of these issues.

ctralie commented, Mar 25, 2019

Great point! Thanks for keeping me honest. It looks more like a factor of 3:

[Figure: Numba vs. Cython timing comparison]

Anyway, I’m excited about two things here

  1. How simple numba is, and how much better it is than raw Python (never used it before)
  2. That there’s still a 3x factor to be had for people like me who are doing large scale experiments

I’ve been having an absolute nightmare getting cython to work cross-platform in another project, so I’ll definitely look into numba.
