implement mean_descendants as a node statistic

As discussed in #271, this is a node statistic. It differs from the current node stats in two ways: (a) it normalizes by the amount of the genome over which the node is actually an ancestor; and (b) it is polarised (polarised=False would give something that is always equal to 1). I propose that we make the behavior of (a) a parameter, and implement that normalization at the Python level.
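
To make point (a) concrete, here is a minimal sketch in plain Python of the two normalizations for a single node. It is not tskit code: the function name, the `intervals` layout, and the `normalise_by_ancestral_span` flag are hypothetical, and returning 0.0 for a node that is never an ancestor is an assumption made for the example.

```python
def mean_descendants_for_node(intervals, sequence_length,
                              normalise_by_ancestral_span=True):
    """Span-weighted mean number of descendant samples for one node.

    intervals: list of (left, right, num_descendant_samples), one per tree.
    All names here are illustrative, not part of any library API.
    """
    weighted_total = sum(
        (right - left) * count for left, right, count in intervals
    )
    if normalise_by_ancestral_span:
        # Divide only by the span over which the node has a descendant sample.
        denominator = sum(
            right - left for left, right, count in intervals if count > 0
        )
    else:
        # Divide by the whole sequence, as a span-normalised node stat would.
        denominator = sequence_length
    # Assumption: a node that is never an ancestor gets 0.0 rather than NaN.
    return weighted_total / denominator if denominator > 0 else 0.0


# A node with 2 descendant samples over [0, 4) and none over [4, 10).
intervals = [(0, 4, 2), (4, 10, 0)]
over_ancestral_span = mean_descendants_for_node(intervals, 10, True)
over_whole_genome = mean_descendants_for_node(intervals, 10, False)
```

The two results differ only in the denominator, which is why the normalization can plausibly be exposed as a parameter and applied after the underlying statistic is computed.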

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

1 reaction
hyanwong commented, Apr 17, 2021

Right. But there is a close link between (windowed) GNN and genomic descent. It would be helpful to give an example like this somewhere, when describing it.

1 reaction
petrelharp commented, Apr 17, 2020

I agree that normalizing by the size of the sample set is always sensible; it's at least easy for the end user to reverse, so we don't need an option for it (and certainly not a different function!).

Note: we don’t have a function called compute_ancestry already - when you say “these two exist” do you mean using the definition above in this thread?

All these things are ways of summarizing “what proportion of the genomes of X are inherited from Y” and “over what proportion of the genome is any of X descended from Y”. I vote we should just provide a simple and descriptively-named method for computing these two quantities, in a way that makes it easy to compute these various downstream statistics. For instance, if we define:

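import numpy as np

def compute_ancestry(ts, sample_sets):
   # proportion of each sample set that descends from each node
   n = np.array([len(x) for x in sample_sets])
   def f(x):
      return x / n
   A = ts.sample_count_stat(sample_sets, f, len(sample_sets), polarised=True, mode='node', strict=False)
   return A

def any_ancestry(ts, sample_sets):
   # proportion of the genome over which each node is ancestral to any of the samples
   def f(x):
      # return a length-1 array to match output_dim=1
      return np.array([float(np.sum(x) > 0)])
   A = ts.sample_count_stat(sample_sets, f, 1, polarised=True, mode='node', strict=False)
   return A

def genomic_descent(ts, sample_sets):
   A = compute_ancestry(ts, sample_sets)
   # any_ancestry has a single output column; take it as a 1D array
   D = any_ancestry(ts, sample_sets)[:, 0]
   for k in range(A.shape[1]):
      # nodes with D == 0 are never ancestral, so this gives NaN (0/0) for them
      A[:, k] /= D
   return A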

So, my proposal would be to remove (or redefine?) mean_descendants, implement the two functions that I have above named compute_ancestry and any_ancestry, and figure out more descriptive names for them. And, probably, make the normalization by any_ancestry an argument to mean_descendants.
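
As a rough illustration of what "normalization by any_ancestry as an argument" could look like, the sketch below works on plain NumPy arrays rather than a tree sequence. The function name `normalise_ancestry` and the `by_any_ancestry` flag are hypothetical, and setting entries to zero for nodes that are never ancestral is an assumption (a plain division would give NaN there instead).

```python
import numpy as np


def normalise_ancestry(A, D, by_any_ancestry=True):
    """Optionally divide each row of A by the matching entry of D.

    A: array of shape (num_nodes, num_sample_sets), as from compute_ancestry.
    D: array of shape (num_nodes, 1), as from any_ancestry.
    Both names and the flag are illustrative only.
    """
    A = np.asarray(A, dtype=float)
    D = np.asarray(D, dtype=float)
    if not by_any_ancestry:
        return A.copy()
    # D broadcasts across the columns of A; rows where D == 0 are left at 0.
    return np.divide(A, D, out=np.zeros_like(A), where=D > 0)


A = np.array([[1.0, 0.5], [0.0, 0.0], [0.2, 0.4]])
D = np.array([[0.5], [0.0], [1.0]])
normalised = normalise_ancestry(A, D)
unchanged = normalise_ancestry(A, D, by_any_ancestry=False)
```

Keeping the division in a separate step like this means the un-normalized quantities stay available to the user, who can apply or reverse the normalization as needed.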

What do you think, @awohns - is this sufficiently general? Ideas for what to call these things?
