Any benefit of multiple sample_sets in allele frequency spectrum

See also #203.

Working through the algorithms for the AFS, it’s not obvious to me that there’s any real performance benefit to providing multiple sample_set arguments. When we’re computing the AFS values, the only thing we’re actually sharing between the different sample sets is the parents array (the clinching argument for allowing other stats to run concurrently is that we share some calculations across the different output stats, thereby making it all more efficient).

When we allow for multiple sample sets in the AFS functions, we have to make choices about how to shape the output array. Firstly, to make the arrays rectangular, we need to make the frequency dimension equal to the size of the largest sample set, which is potentially wasteful if one sample set is much larger than the other. The second choice is which order the dimensions should go in; it’s not clear to me whether either ordering is better.
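To make the padding cost concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the genotype matrix, the sample sets and the helper names `afs_one_set` and `afs_padded` are all hypothetical, and it simply counts derived alleles per site for each sample set before padding the results to a common width.

```python
# Toy genotype data: one row per site, one 0/1 entry per sample.
# All names and data here are hypothetical, for illustration only.
genotypes = [
    [1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]
sample_sets = [[0, 1, 2, 3], [4, 5]]  # sizes 4 and 2


def afs_one_set(genotypes, sample_set):
    """Count sites by the number of derived alleles within one sample set."""
    afs = [0] * (len(sample_set) + 1)
    for site in genotypes:
        afs[sum(site[j] for j in sample_set)] += 1
    return afs


def afs_padded(genotypes, sample_sets):
    """Stack per-set spectra into a rectangular list of rows, zero-padded."""
    width = max(len(s) for s in sample_sets) + 1
    rows = []
    for s in sample_sets:
        afs = afs_one_set(genotypes, s)
        rows.append(afs + [0] * (width - len(afs)))
    return rows


padded = afs_padded(genotypes, sample_sets)
# Cells that exist only to keep the array rectangular.
wasted = sum(len(row) for row in padded) - sum(len(s) + 1 for s in sample_sets)
```

With sample sets of sizes 4 and 2, the smaller set's row carries two cells that can never be non-zero; the gap grows linearly with the difference in set sizes.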

Given all this, I wonder if it’s worth the trouble of allowing for multiple sample sets when computing the AFS, particularly when you’d almost certainly be better off running several of these in parallel rather than using the current vectorised version, whose memory access patterns will be pretty nasty, I think.
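As a rough sketch of the alternative, the caller could issue one independent single-set computation per sample set and run them concurrently. Everything below is illustrative: the data and the `afs_one_set` helper are hypothetical, and a thread pool over pure-Python code will not actually run faster because of the GIL, so this only shows the shape of the calling code, assuming the real per-set work happens in native code or separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: one row per site, one 0/1 entry per sample.
genotypes = [
    [1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]
sample_sets = [[0, 1, 2, 3], [4, 5]]


def afs_one_set(sample_set):
    """Single-set spectrum: entry k counts sites with k derived alleles."""
    afs = [0] * (len(sample_set) + 1)
    for site in genotypes:
        afs[sum(site[j] for j in sample_set)] += 1
    return afs


# One independent task per sample set; map() returns results in input order.
with ThreadPoolExecutor() as pool:
    spectra = list(pool.map(afs_one_set, sample_sets))
```

Each result keeps its own natural length (`len(sample_set) + 1`), so no padding or dimension-ordering decision is needed.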

Thoughts @petrelharp?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
petrelharp commented, Jun 26, 2019

Sounds good - only one sample set is fine. Although it’s maybe worth thinking ahead to how the joint AFS will work.

0 reactions
jeromekelleher commented, Aug 13, 2019

Closed in #248 and #274.
