Normalization fun with vmin/vmax
Dear people,
In our plotting functions, vmin and vmax are great arguments, we all love them. First of all, let’s all hold our hands, close our eyes and thank matplotlib developers for implementing them.
But when we jointly plot some continuous variables that are not on the exact same scale (e.g. some genes), it’s not possible to specify a single vmin/vmax value that fits all variables, especially if they have some outliers. This deeply saddens us, forces us to watch a few more episodes of Stranger Things, and certainly doesn’t help 😦
I would like to hear your thoughts about how to fix that. But before that, as a responsible person, I did some homework and I spent around 34 minutes to understand how color normalization works in matplotlib (https://matplotlib.org/3.1.1/tutorials/colors/colormapnorms.html) and tried to implement a custom normalization class (https://matplotlib.org/3.1.1/tutorials/colors/colormapnorms.html#custom-normalization-manually-implement-two-linear-ranges).
My idea is simply to specify vmin/vmax in terms of quantiles of the color vector, which can be shared between variables, instead of a specific value. One way, I thought, might be to pass a norm argument with a custom normalization object to our lovely plot_scatter. However, as far as I understand, that’s not possible, because the quantile function in the custom normalization class requires the entire color vector for each continuous variable, which is not super convenient: it’s too much preprocessing to find different quantile values for each variable and pass a vmin/vmax vector to the plotting function. Not user-friendly and still requires modifications in the code 😦
Instead, I added two ugly arguments named vmin_quantile and vmax_quantile to the plot_scatter function, which allow me to specify a single quantile value for vmin/vmax that is then translated into real values separately for each variable:
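A minimal sketch of that translation step, assuming each variable’s values are available as a 1-D array. The helper name `quantile_limits` and the toy data are hypothetical and not part of scanpy; only `np.nanquantile` is a real NumPy call.

```python
import numpy as np


def quantile_limits(values, vmin_quantile=None, vmax_quantile=None):
    """Hypothetical helper: turn shared quantiles into concrete limits for one variable."""
    values = np.asarray(values, dtype=float)
    # A quantile of None means "leave this limit to matplotlib's autoscaling".
    vmin = None if vmin_quantile is None else float(np.nanquantile(values, vmin_quantile))
    vmax = None if vmax_quantile is None else float(np.nanquantile(values, vmax_quantile))
    return vmin, vmax


# Toy data: two "genes" on very different scales, each with one outlier.
genes = {
    "gene_a": np.array([0.0, 1.0, 2.0, 3.0, 100.0]),
    "gene_b": np.array([0.0, 10.0, 20.0, 30.0, 5000.0]),
}

# One shared quantile, resolved separately for every variable.
limits = {name: quantile_limits(vals, vmax_quantile=0.75) for name, vals in genes.items()}
```

Each entry of `limits` could then be passed as vmin/vmax when the corresponding panel is drawn, so the outlier in one variable does not affect the color scale of the other.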
This solved my problem but I was wondering if it makes sense to add this to scanpy. What do you think?
Finally, here is the way I use it:
plot_scatter(adata, basis='umap', color=['louvain', 'NKG7', 'GNLY', 'KIT'], cmap='Reds', vmax_quantile=0.999)
PS: It needs a few more lines to be accessible from other functions like sc.pl.umap.
I sympathize with the problem which I tend to solve by plotting individual scatterplots, each one with its specific vmax. But I will be happy to have a better output.
I like the idea of using quantiles, but I would avoid an ever-growing list of parameters. Thus @ivirshup’s suggestion to use functools.partial seems better. I like the flexibility it provides and I think we should implement it, but I don’t know whether it would be difficult to document and explain to a user who would just like to compute the quantile. An idea would be to allow some encoding for vmax, for example vmax='q99', which would be interpreted as np.quantile.
My suggestion is to
partial
The following options would then be valid:
Fixed in #794
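
For illustration only, the two ideas discussed above (a callable built with functools.partial and a string encoding such as 'q99') might be combined roughly as follows. This is a sketch under stated assumptions, not the code that went into scanpy: the function name `resolve_limit` is hypothetical, and reading the number after 'q' as a percentage (so 'q99' means the 0.99 quantile) is an assumption.

```python
from functools import partial

import numpy as np


def resolve_limit(spec, values):
    """Hypothetical resolver: accept a number, a string like 'q99', or a callable."""
    if spec is None:
        return None
    if callable(spec):
        # e.g. partial(np.quantile, q=0.99); called with this variable's values
        return float(spec(values))
    if isinstance(spec, str):
        if not spec.startswith("q"):
            raise ValueError(f"unrecognised limit specification: {spec!r}")
        # Assumption: the digits after 'q' are a percentage, so 'q99' -> 0.99
        return float(np.quantile(values, float(spec[1:]) / 100))
    return float(spec)  # a plain number is used as-is


values = np.arange(101, dtype=float)  # 0, 1, ..., 100

fixed = resolve_limit(5, values)
from_string = resolve_limit("q99", values)
from_callable = resolve_limit(partial(np.quantile, q=0.99), values)
```

With a resolver like this, the same vmax specification can be shared across variables while the concrete value is still computed per variable.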