Redefine summary function to update rather than return

While working through the changes needed to implement the AFS in C over in #248, I found myself copying and pasting a depressing amount of code, most of it doing basically the same thing as the general stats machinery.

It occurred to me that if we changed the way the summary function works slightly, we should be able to reuse all of that machinery when we’re computing the jAFS, and be able to specify a summary function just like for the other stats.

Currently, we define the summary function f such that f(x) returns a 1D result vector, which we then add on to the current output window, w. That is, we do this in the naive case:

    sigma = np.zeros((ts.num_trees, m))
    for tree in ts.trees():
        # Propagate the sample weights up the tree.
        x = np.zeros((ts.num_nodes, k))
        x[ts.samples()] = w
        for u in tree.nodes(order="postorder"):
            for v in tree.children(u):
                x[u] += x[v]
        # total is not defined in this snippet; presumably the column sums of w.
        if polarised:
            s = sum(tree.branch_length(u) * f(x[u]) for u in tree.nodes())
        else:
            s = sum(
                tree.branch_length(u) * (f(x[u]) + f(total - x[u]))
                for u in tree.nodes())
        sigma[tree.index] = s

whereas we’d now do:

    sigma = np.zeros((ts.num_trees, m))
    for tree in ts.trees():
        x = np.zeros((ts.num_nodes, k))
        x[ts.samples()] = w
        for u in tree.nodes(order="postorder"):
            for v in tree.children(u):
                x[u] += x[v]
            # x[u] is complete here, so f can update the window in place.
            f(tree.branch_length(u), x[u], polarised, sigma[tree.index])

That is:

  • We change the summary function to one that directly updates the output window by incrementing it.
  • We push the responsibility for dealing with polarisation down into the summary function (I guess, if you wanted to, you could have two different summary functions, polarised & unpolarised, for each stat). This is motivated by the AFS, where the polarised & unpolarised cases are pretty different and I can’t see how we could efficiently do both in a general way. (It’s not clear how this generalises to the site algorithm though; that is a bit different.)
  • We pass in a constant multiplier factor so that the branch length/span is multiplied in as the values are computed.
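
To make the comparison concrete, here is a minimal sketch of the two styles side by side. The toy statistic x * (n - x) and the names `f_return`, `f_update` and `n` are assumptions made up for illustration; they are not taken from the tskit code.

```python
import numpy as np

# Hypothetical toy statistic: one output per weight column, x * (n - x).
n = np.array([4.0, 6.0])  # assumed total weight per column


def f_return(x):
    """Current style: return a 1D vector for the caller to scale and add."""
    return x * (n - x)


def f_update(weight, x, polarised, out):
    """Proposed style: increment the output window `out` in place.

    `weight` is the constant multiplier (branch length or span), and the
    function itself decides how to handle polarisation.
    """
    out += weight * x * (n - x)
    if not polarised:
        # Unpolarised: also count the complementary weights, n - x.
        y = n - x
        out += weight * y * (n - y)


x = np.array([1.0, 2.0])
branch_length = 0.5

# Old style: the caller combines the returned vectors.
expected = branch_length * (f_return(x) + f_return(n - x))

# New style: the summary function writes straight into the window.
window = np.zeros(2)
f_update(branch_length, x, False, window)
```

Both routes give the same numbers here; the difference is only in who owns the accumulation into the window.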

The summary function will know about the expected output dimensions (an arbitrary n-D array).

This is a lot less elegant mathematically, but should be much more convenient and efficient computationally. The goal would then be that we do the minimal number of updates to the output window to compute the stats that we want (i.e. touching memory as little as possible). We can also then put in arbitrary dimensioned arrays as the window elements, which should be general enough for most things!
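
As a rough illustration of the "minimal number of updates" point, a joint-AFS-style summary function could touch a single cell of an n-D window element per node. This is only a sketch under assumptions: `afs_update` is a hypothetical name, the window shape is assumed to be one more than each sample set size, and the unpolarised (folded) case is deliberately left out.

```python
import numpy as np


def afs_update(weight, x, polarised, out):
    """Hypothetical joint-AFS summary function: add `weight` to one cell.

    `x` holds the number of samples from each sample set below a node, and
    `out` is an n-D window element with shape (n_1 + 1, ..., n_k + 1).
    """
    if not polarised:
        raise NotImplementedError("folding is omitted from this sketch")
    out[tuple(int(c) for c in x)] += weight


# Two windows, and two sample sets of sizes 2 and 3.
sigma = np.zeros((2, 3, 4))
afs_update(1.5, np.array([1.0, 2.0]), True, sigma[0])
afs_update(0.5, np.array([1.0, 2.0]), True, sigma[0])
afs_update(2.0, np.array([2.0, 0.0]), True, sigma[1])
```

Each call writes to exactly one element of `sigma`, whereas a return-style function would have to build and add a full array of the window's shape for every node.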

It’s a big, messy chunk of work to actually do this, so I don’t want to get started unless we agree it’s a good idea. We should also definitely land #251 before making a start.

What do you think @petrelharp, @molpopgen?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

jeromekelleher commented, Aug 22, 2019

I think we’ll have to push this back to 0.2.1. This isn’t going to be a user-visible part of the API much anyway (certainly not before we publish the paper), so I’m OK with potentially changing it after the initial release.

jeromekelleher commented, Nov 19, 2019

Closing this; I think we’ve changed the underlying code in such a way that it’s not relevant any more.
