Dotplot where sizes are proportional to p-value and the color to log2-fold change?
@fidelram, as discussed today, could we adapt pl.rank_genes_groups_dotplot so that it reads this information from .uns['rank_genes_groups']?
Maybe just a simple switch? Or having arguments color and size be a choice from a selection {pvals, pvals_adj, log2FC, expression, frac-genes-expressed}.
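To make the proposal a bit more concrete, here is a minimal sketch of how such size and color arguments could map stored results to dot encodings. It assumes the layout that rank_genes_groups writes to .uns as I understand it (one structured array per statistic, one field per group, with fold changes stored under the key logfoldchanges rather than log2FC). The helpers make_rec and dot_encoding, the toy values, and the scaling rule are made up for illustration and are not part of Scanpy.

```python
import numpy as np


def make_rec(values_by_group, fmt):
    """Build a structured array with one field per group (toy helper)."""
    dtype = [(group, fmt) for group in values_by_group]
    rows = list(zip(*values_by_group.values()))
    return np.array(rows, dtype=dtype)


# Toy stand-in for adata.uns['rank_genes_groups']; all values are invented.
result = {
    "names": make_rec({"B": ["CD79A", "MS4A1"], "T": ["CD3D", "IL7R"]}, "U10"),
    "pvals_adj": make_rec({"B": [1e-30, 1e-12], "T": [1e-25, 1e-4]}, "f8"),
    "logfoldchanges": make_rec({"B": [4.2, 3.1], "T": [3.8, 1.5]}, "f8"),
}


def dot_encoding(result, group, n_genes=2, size="pvals_adj",
                 color="logfoldchanges", max_size=200.0):
    """Return (gene names, dot sizes, color values) for one group."""
    names = result["names"][group][:n_genes]
    size_values = np.asarray(result[size][group][:n_genes], dtype=float)
    color_values = np.asarray(result[color][group][:n_genes], dtype=float)
    if size in ("pvals", "pvals_adj"):
        # Smaller p-values should give larger dots; clip to avoid log10(0).
        size_values = -np.log10(np.clip(size_values, 1e-300, 1.0))
    largest = size_values.max()
    if largest > 0:
        # Scale so that the largest value gets a dot of area max_size.
        sizes = max_size * size_values / largest
    else:
        sizes = np.zeros_like(size_values)
    return names, sizes, color_values


names, sizes, colors = dot_encoding(result, "T")
```

The returned sizes and colors are plain arrays, so they could be handed to a scatter call (for example the s and c arguments of matplotlib's Axes.scatter) without the plotting code knowing about the stored structured arrays.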
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:21 (18 by maintainers)
Sounds great!
Re tidy: Storing things internally in tidy format also seems inefficient to me… I remember a long discussion with Philipp more than 2 years ago… 😄
Re diffxpy: If you say that diffxpy has a good solution, why should we build a new one? Can’t we just use their solution?
Completely agreed.
Re sc.extract
One of the core ideas of Scanpy (as opposed to, say, scikit-learn) was to have this model of taking the burden of bookkeeping from the user as much as possible. This design messed up, in particular, the return values of rank_genes_groups. I would have loved to return a collection of dataframes, but I didn’t want to mess this up. Also, the return values of pp.neighbors or pl.paga aren’t great.
There is a trade-off between having nice APIs and return values (such as dataframes) and a transparent and efficient on-disk representation in terms of HDF5, zarr or another format. These days, I’d even consider simply pickling things, which would have saved us a lot of work. But I thought that we’d need established compression facilities, concatenation possibilities, and some way to manually “look into” an on-disk object (both from R and from the command line) so that it’s maximally transparent; the widely established, cross-language, but old-school and not entirely scalable HDF5 then seemed the best. The Human Cell Atlas decided in favor of zarr meanwhile. But that’s not a drama, because Scanpy only writes “storage-friendly” values to AnnData, that is, arrays and dicts. HDF5 knows how to handle them, and so does zarr. If one uses xarray or dataframes, one has to think about how this gets written to disk.
That being said: it’s likely that we’ll continue to choose representations for on-disk (and in-memory) storage that aren’t convenient: rec arrays, for instance, or a three-dimensional xarray and dicts.
A general solution for this problem would be the mentioned sc.extract API, similar to sc.plotting (which also completely hides the complexity of the object from the user), but for returning nice objects rather than visualizations.
The first function in that namespace should be sc.ex.neighbors, which should return an instance of sc.Neighbors (which can then disappear from the root API). Similarly, when sc.pp.neighbors is called with inplace=False, one should directly get a Neighbors object returned.
Now, we can apply this logic to every single function that doesn’t have a simple return value. Upon calling the function with inplace=False, you’ll get a “nice” object that is convenient to handle. If you call a function sc.tl.function in a pipeline with inplace=True but later want this nice object, you’d call sc.ex.function.
I think DataFrames (a case like tl.marker_gene_overlap) should definitely be handled within AnnData, and no extract function is necessary. But the differential expression result is a prime example for such a case. I think a function rank_genes_groups that returns a RankGenesGroups object, which then has a .to_df() function (e.g. the function rank_genes_groups from https://github.com/theislab/scanpy/pull/619), could immediately go into that namespace. Maybe we can even borrow a diffxpy object for that. The good thing is, we can keep the current rec arrays, as they are very efficient and basic data types, which will work with HDF5 and zarr and xarray and everything else that might come in the future. And: Fidel wrote a ton of plotting functions around them already, which we don’t want to simply rewrite… We don’t have to, as users won’t see the rec arrays anymore…
Other possible names for the API would be sc.cast or sc.object (sc.ob), less conflicting with sc.external. I think sc.ob makes sense as it really makes clear that Scanpy’s main API is for writing convenient scripts for compute-heavy stuff in a functional way. If one wants to transition to more light-weight “post-analysis”, one can transition to objects that are designed for specific tasks.
PS: I’d love to move away from the name rank_genes_groups at some point, and simply have something like difftest or DiffTest… I always thought that we might have differential expression tests for longitudinal data at some point (like Monocle); otherwise the function would be rank_genes. But I don’t think this is gonna happen soon, and if it does, it will be in the external API… A minimal difftest API should, though, continue to be in the core of Scanpy, with, at its heart, a scalable Wilcoxon rank-sum test (much more scalable than scipy’s or diffxpy’s), the t-test and the scikit-learn logreg approach. diffxpy with its TensorFlow dependency can then handle very complex cases…
OK, we have those alternatives:
sc.get
I think sc.get is the best option here!
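For illustration, a getter in such a namespace could look roughly like the sketch below. It flattens the per-group structured arrays into tidy records and leaves the stored representation untouched, which is the split between storage and convenient objects discussed above. The function name rank_genes_groups_records and the toy data are hypothetical and not an existing Scanpy API; the assumed layout is one structured array per statistic with one field per group.

```python
import numpy as np


def rank_genes_groups_records(uns, group, key="rank_genes_groups"):
    """Hypothetical getter: one tidy record (dict) per ranked gene of `group`."""
    result = uns[key]
    # Keep only per-group structured arrays; skip entries such as parameter dicts.
    fields = [
        name for name, value in result.items()
        if isinstance(value, np.ndarray)
        and value.dtype.names is not None
        and group in value.dtype.names
    ]
    n_genes = len(result["names"][group])
    return [
        {"group": group, **{f: result[f][group][i].item() for f in fields}}
        for i in range(n_genes)
    ]


# Toy stand-in for adata.uns; all values are invented.
str_dtype = [("B", "U10"), ("T", "U10")]
float_dtype = [("B", "f8"), ("T", "f8")]
uns = {
    "rank_genes_groups": {
        "params": {"method": "t-test"},
        "names": np.array([("CD79A", "CD3D"), ("MS4A1", "IL7R")], dtype=str_dtype),
        "pvals_adj": np.array([(1e-30, 1e-25), (1e-12, 1e-4)], dtype=float_dtype),
        "logfoldchanges": np.array([(4.2, 3.8), (3.1, 1.5)], dtype=float_dtype),
    }
}

rows = rank_genes_groups_records(uns, "B")
```

Because each record is a plain dict of Python values, the list can be passed to pandas.DataFrame(rows) if a DataFrame is wanted, while the rec arrays stay as they are on disk and for the plotting functions.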