Method for grouping samples which is understood by library functions
Population structure analysis depends on us being able to assign labels to samples. In this thread we discussed the possibility of using individual datasets, one for each group, but grouping by a categorical variable is much more flexible and idiomatic.
We need to develop some conventions that allow the user to do some standard grouping of samples, and functions that use these groupings (`Fst` or `divergence`, for example) should understand these conventions and update the output dataset accordingly.
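To make the idea a bit more concrete, here is a minimal sketch of one possible convention, not an existing API: each sample carries a categorical label, the labels are encoded as integer cohort indices, and a statistic function reads those indices. The function names `encode_cohorts` and `cohort_allele_frequency`, the population labels, and the allele counts are all hypothetical and only for illustration.

```python
def encode_cohorts(labels):
    """Map categorical sample labels to integer cohort indices.

    Returns (codes, names): codes[i] is the cohort index of sample i,
    and names[k] is the label of cohort k (sorted so the encoding is stable).
    """
    names = sorted(set(labels))
    index = {name: k for k, name in enumerate(names)}
    return [index[label] for label in labels], names


def cohort_allele_frequency(alt_counts, ploidy, codes, n_cohorts):
    """Alt allele frequency per cohort at a single variant.

    alt_counts[i] is the number of alt alleles carried by sample i.
    Every cohort is assumed to contain at least one sample.
    """
    alt = [0] * n_cohorts
    total = [0] * n_cohorts
    for count, code in zip(alt_counts, codes):
        alt[code] += count
        total[code] += ploidy
    return [a / t for a, t in zip(alt, total)]


# Hypothetical labels for five diploid samples.
labels = ["GBR", "YRI", "GBR", "YRI", "CHB"]
codes, names = encode_cohorts(labels)
freqs = cohort_allele_frequency([0, 2, 1, 1, 2], 2, codes, len(names))
```

The point of the integer encoding is that any function taking `codes` can stay agnostic about what the grouping means (population, phenotype, sequencing batch), while `names` lets the output be labelled again afterwards.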
It may also be worth thinking about how we might group the variants dimension while we’re thinking about it. For example, we might want to get the pairwise Fst values for all pairs of populations, for all chromosomes in a dataset. It would be super-nice if we had an idiomatic way of running these calculations.
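As a rough sketch of what "all pairs of populations, for all chromosomes" could look like, the snippet below groups variants by contig and loops over cohort pairs. It is an assumption-laden illustration, not a proposal for the library API: `pairwise_by_contig` is a hypothetical name, the input frequencies are made up, and the statistic is a simple stand-in (the probability that one allele drawn from each cohort differs at a biallelic site), not an Fst estimator.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_by_contig(contigs, freqs, cohort_names):
    """Average a two-cohort statistic per (contig, cohort pair).

    contigs[v] is the contig of variant v; freqs[v][k] is the alt allele
    frequency of cohort k at variant v.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for contig, p in zip(contigs, freqs):
        for i, j in combinations(range(len(cohort_names)), 2):
            key = (contig, cohort_names[i], cohort_names[j])
            # Probability that one allele drawn from each cohort differs.
            sums[key] += p[i] * (1 - p[j]) + p[j] * (1 - p[i])
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Three variants on two contigs, two hypothetical cohorts "A" and "B".
contigs = ["chr1", "chr1", "chr2"]
freqs = [[0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
result = pairwise_by_contig(contigs, freqs, ["A", "B"])
```

The result is keyed by `(contig, cohort_a, cohort_b)`, which is one way the two groupings (samples into cohorts, variants into contigs) could both show up as labelled dimensions in an output dataset.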
@eric-czech - you had some concrete ideas on this in the earlier thread, what are your thoughts?
Issue Analytics
- Created 3 years ago
- Comments: 7 (1 by maintainers)
Top GitHub Comments
We still need to implement cohort subsets (and add some docs), so I’d like to leave this open until that’s done.
Thanks @eric-czech - `cohort` is an excellent choice. It’s clearly about samples, but also quite flexible in that it just means “samples with some shared characteristic”.