Fst function
Fst is one of the fundamental building blocks of population structure analysis. We will want to compute Fst between pairs of populations, with the “population” labels designated by some properties of the input dataset (see #224).
There are a number of different estimators for Fst (scikit-allel implements 3), so we should provide a method to specify the estimator for the statistic as a parameter. I suggest something like the following:
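def Fst(ds, *, estimator=None, **kwargs):
    # None means "use the default", which is currently Hudson's estimator.
    if estimator is None:
        estimator = "hudson"
    # Map each estimator name to the function that implements it.
    estimator_map = {
        "hudson": hudson_Fst,
        "weir_cockerham": wc_Fst,
        "patterson": patterson_Fst,
    }
    return estimator_map[estimator](ds, **kwargs)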
These correspond to the three definitions in scikit-allel. We may not want all three initially, and just implementing the Hudson estimator may be sufficient. We can test our implementations by comparing with scikit-allel and tskit.
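To make the Hudson option concrete, here is a minimal sketch of the estimator for two populations. It is not the proposed implementation: it assumes biallelic variants and takes per-population allele counts instead of the ds dataset, and the name hudson_fst_from_counts and its input layout are hypothetical. It uses the mean pairwise difference within and between populations and returns the ratio of sums over variants.

```python
import numpy as np


def hudson_fst_from_counts(ac1, ac2):
    """Hudson's Fst for two populations from biallelic allele counts.

    ac1, ac2: array-likes of shape (n_variants, 2) holding [ref, alt]
    allele counts per variant (hypothetical input layout for this sketch).
    Each population needs at least two sampled chromosomes per variant.
    """
    ac1 = np.asarray(ac1, dtype=float)
    ac2 = np.asarray(ac2, dtype=float)
    n1 = ac1.sum(axis=1)  # chromosomes sampled in population 1
    n2 = ac2.sum(axis=1)  # chromosomes sampled in population 2

    # Mean pairwise difference within each population (unbiased form).
    within1 = 2 * ac1[:, 0] * ac1[:, 1] / (n1 * (n1 - 1))
    within2 = 2 * ac2[:, 0] * ac2[:, 1] / (n2 * (n2 - 1))
    within = (within1 + within2) / 2

    # Mean pairwise difference between the two populations.
    between = (ac1[:, 0] * ac2[:, 1] + ac1[:, 1] * ac2[:, 0]) / (n1 * n2)

    # Ratio of sums over variants, not the mean of per-variant ratios.
    return (between - within).sum() / between.sum()
```

For a single variant that is fixed for different alleles in the two populations, e.g. hudson_fst_from_counts([[4, 0]], [[0, 4]]), this gives 1.0. When comparing against scikit-allel, note that allel.hudson_fst returns per-variant numerators and denominators, so the comparable value is the ratio of their sums.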
(P.S. I prefer to use None as the default value for estimator, as there may be situations in the future where we might prefer to have a different default depending on properties of the dataset. If we leave estimator="hudson" in the signature, then there’s no way to tell if the user just wants the default or has specifically asked for “hudson”. In general, unless we’re totally sure that the default is never going to change, I think it’s better to use None as the default value in the signature.)
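A small sketch of why the None default helps. The helper resolve_estimator and the n_populations rule are hypothetical, invented only to show a default that depends on the data while an explicit request is always honoured:

```python
def resolve_estimator(estimator=None, *, n_populations=2):
    """Pick the estimator name, treating None as "no preference".

    The data-dependent rule below is an invented illustration, not a
    proposed policy.
    """
    if estimator is None:
        # The caller did not choose, so we are free to pick a default
        # based on properties of the dataset.
        return "hudson" if n_populations == 2 else "weir_cockerham"
    # The caller asked for a specific estimator, so respect it.
    return estimator
```

With estimator="hudson" baked into the signature, the last two cases (no preference versus an explicit "hudson") would be indistinguishable, and the default could not change later without overriding explicit requests.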
Top GitHub Comments
I’d like to add the estimator described by Harris and DeGiorgio (2016) to the list at some point. Their estimator is based on Hudson’s but attempts to correct for related and inbred individuals using kinship.
Closing this, since we now have Fst from #100 and #292.