Subsample by observations grouping
- Additional function parameters / changed functionality / changed defaults?
- New analysis tool: A simple analysis tool you have been using and are missing in sc.tools?
- New plotting function: A kind of plot you would like to see in sc.pl?
- External tools: Do you know an existing package that should go into sc.external.*?
- Other?
Related to scanpy.pp.subsample, it would be useful to have a subsampling tool that subsamples within each group of an observations key. E.g., if I have an observation key 'MyGroup' with possible values ['A', 'B'], and there are 10,000 cells of type 'A' and 2,000 cells of type 'B', and I want at most 5,000 cells of each type, then this function would subsample 5,000 cells of type 'A' but retain all 2,000 cells of type 'B'.
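To make the requested behaviour concrete, here is a small sketch of the arithmetic only: each group keeps min(group size, cap) observations. The helper name capped_group_sizes is hypothetical and not part of scanpy; it just reproduces the 'A'/'B' example above.

```python
from collections import Counter


def capped_group_sizes(labels, max_per_group):
    """Return how many observations each group keeps under a per-group cap.

    Hypothetical helper for illustration; not a scanpy function.
    """
    counts = Counter(labels)
    # A group larger than the cap is cut down to the cap; smaller groups are kept whole.
    return {group: min(n, max_per_group) for group, n in counts.items()}


# The example from the issue: 10,000 cells of type 'A', 2,000 of type 'B'.
labels = ["A"] * 10_000 + ["B"] * 2_000
sizes = capped_group_sizes(labels, max_per_group=5_000)
print(sizes)  # {'A': 5000, 'B': 2000}
```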
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:10 (8 by maintainers)
I'll reopen this because I think it's still quite relevant and could be very straightforward to implement with sklearn's resample.
Also, there is an entire package for subsampling strategies, which is probably quite relevant: https://github.com/scikit-learn-contrib/imbalanced-learn
Line here for reference: https://github.com/theislab/scanpy/blob/48cc7b38f1f31a78902a892041902cc810ddfcd3/scanpy/preprocessing/_simple.py#L857
Something like this should work. Note, this is not tested.
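The snippet that accompanied this comment did not survive here, so what follows is a stand-in and not the commenter's code: a minimal sketch of per-group capping built on sklearn.utils.resample, as suggested earlier in the thread. The function name subsample_positions_by_group and its parameters are hypothetical, and the sketch assumes the group labels contain no missing values.

```python
import numpy as np
from sklearn.utils import resample


def subsample_positions_by_group(groups, max_per_group, random_state=0):
    """Return sorted integer positions to keep, at most `max_per_group` per group.

    Hypothetical helper, not part of scanpy. `groups` is a 1-D array-like of
    group labels, e.g. the values of adata.obs["MyGroup"] (assumed name).
    """
    groups = np.asarray(groups)
    keep = []
    for label in np.unique(groups):
        positions = np.flatnonzero(groups == label)
        if len(positions) > max_per_group:
            # Draw without replacement so no observation is picked twice.
            positions = resample(
                positions,
                replace=False,
                n_samples=max_per_group,
                random_state=random_state,
            )
        keep.append(positions)
    if not keep:
        return np.array([], dtype=int)
    return np.sort(np.concatenate(keep))


# The example from the issue: 10,000 'A' cells and 2,000 'B' cells, cap of 5,000.
groups = np.array(["A"] * 10_000 + ["B"] * 2_000)
keep = subsample_positions_by_group(groups, max_per_group=5_000)
```

If the labels live in adata.obs["MyGroup"], passing that column and then indexing with adata[keep] should give the subsampled object, though this has not been checked against scanpy itself.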
Hope that helps.