High level of background: misassigned droplets
Dear coders of CellBender,
First of all, thank you for providing such a useful tool! I actually found out that it can easily be run on a free GPU-powered Google Colab instance, which is nice for those who do not have access to a GPU!
While the tool works flawlessly for several of my datasets, some particular 10X runs coming from the same lab show issues: when looking at the log PDF, many droplets from the empty-droplet plateau are assigned to cells, whereas I am inclined to believe that they are empty.
One particularity of these datasets is that they all share a very high amount of background (for the following example, the plateau is around 2000 UMIs!):
The log at the start of the run is the following:
cellbender remove-background --input drive/My Drive/ML10_raw_feature_bc_matrix.h5 --output ML10_150_output.h5 --cuda --expected-cells 21000 --total-droplets-included 50000 --epochs 150
cellbender:remove-background: 2020-02-12 12:44:41
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from file drive/My Drive/ML10_raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Trimming dataset for inference.
cellbender:remove-background: Prior on counts in empty droplets is 1807
cellbender:remove-background: Prior on counts for cells is 14313
cellbender:remove-background: Excluding barcodes with counts below 1445
cellbender:remove-background: Using 21000 probable cell barcodes, plus an additional 29000 barcodes, and 28217 empty droplets.
cellbender:remove-background: Running inference...
...
The prior on counts in empty droplets seems reasonable to me, or should I choose a higher one?
The output log PDF is as follows:
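One way to sanity-check that prior against the raw data is to look at the median UMI count over a range of barcode ranks that visibly sits on the plateau in the rank plot. The snippet below is only a rough sketch of that idea, not how CellBender computes its prior: the function name `plateau_in_rank_window`, the rank window, and the synthetic counts are all assumptions made for illustration.

```python
from statistics import median


def plateau_in_rank_window(umi_counts, start_rank, stop_rank):
    """Median UMI count of barcodes ranked in [start_rank, stop_rank).

    Hypothetical helper: barcodes are ranked by total UMI count in
    descending order, and the window is chosen by eye from the rank plot.
    """
    ranked = sorted(umi_counts, reverse=True)
    window = ranked[start_rank:stop_rank]
    if not window:
        raise ValueError("rank window is empty")
    return median(window)


# Synthetic example (made-up numbers): 100 cells, 400 high-background
# empty droplets, and 1000 barcodes with almost no counts.
counts = [14000] * 100 + [2000] * 400 + [10] * 1000
plateau = plateau_in_rank_window(counts, start_rank=200, stop_rank=400)
print(plateau)
```

If the value obtained this way on the real per-barcode counts is close to the prior reported in the log, the prior is probably not the source of the problem.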
Following the documentation, I decided to run the analysis by increasing the number of z-dims, z-layers and epochs with the following command:
cellbender remove-background --input "drive/My Drive/ML10_raw_feature_bc_matrix.h5" --output ML10_highbckgrd_output.h5 --cuda --expected-cells 21000 --total-droplets-included 50000 --epochs 300 --z-dim 200 --z-layers 1000
But this did not improve anything, and training actually shows weird behaviour, probably because these parameter values are too high:
Am I missing something? Is there a parameter that could influence the misassignment?
Issue Analytics
- Created 4 years ago
- Comments:15 (6 by maintainers)
Top GitHub Comments
It’s not clear if the v2 branch will help with this issue (yet), although once v2 is done, it will be a significant improvement in a number of ways.
We hope that cell calling is one of those improvements… but v2 is not complete yet. I think you may see some improvement in the current state of v2, but I am working on a few more ways to address this.
For now, what you can count on is: remove-background v1 does not leave cells out. All cells will be called cells. But it will pick up some empty droplets. This is worse in some datasets than others. Currently, the best practice is to filter those out based on other QC metrics downstream.
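As a rough illustration of that kind of downstream filtering, the sketch below drops called barcodes that fail simple QC thresholds. It is not part of CellBender: the function `filter_called_cells`, the field names, and the threshold values are hypothetical and would need to be tuned per dataset.

```python
def filter_called_cells(barcodes, min_genes=500, max_mito_fraction=0.2):
    """Return the barcodes that pass simple per-cell QC thresholds.

    `barcodes` is a list of dicts with the (assumed) keys "barcode",
    "n_genes" (genes detected) and "mito_fraction" (fraction of UMIs
    from mitochondrial genes). The thresholds are illustrative only.
    """
    kept = []
    for bc in barcodes:
        enough_genes = bc["n_genes"] >= min_genes
        low_mito = bc["mito_fraction"] <= max_mito_fraction
        if enough_genes and low_mito:
            kept.append(bc["barcode"])
    return kept


# Made-up example: one plausible cell, one droplet with few genes
# detected, and one droplet dominated by mitochondrial reads.
called = [
    {"barcode": "AAAC", "n_genes": 3200, "mito_fraction": 0.05},
    {"barcode": "CCCG", "n_genes": 150, "mito_fraction": 0.03},
    {"barcode": "GGGT", "n_genes": 900, "mito_fraction": 0.60},
]
kept = filter_called_cells(called)
print(kept)
```

In practice these metrics would be computed from the remove-background output with a standard single-cell toolkit, and empty droplets called as cells tend to stand out by having few detected genes for their UMI count.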
@sjfleming yes, I saw that and we are using it now. We also have a collaborator who is letting me run the samples on their GPU. I have been using it and the samples run fine 😃
Thanks Devika