Normalization fun with vmin/vmax

Dear people,

In our plotting functions, vmin and vmax are great arguments; we all love them. First of all, let’s all hold hands, close our eyes and thank the matplotlib developers for implementing them.

But when we jointly plot some continuous variables that are not on the exact same scale (e.g. some genes), it’s not possible to specify a single vmin/vmax value that fits all variables, especially if they have some outliers. This deeply saddens us and forces us to watch a few more episodes of Stranger Things, which certainly doesn’t help 😦

I would like to hear your thoughts about how to fix that. But before that, as a responsible person, I did some homework: I spent around 34 minutes understanding how color normalization works in matplotlib (https://matplotlib.org/3.1.1/tutorials/colors/colormapnorms.html) and tried to implement a custom normalization class (https://matplotlib.org/3.1.1/tutorials/colors/colormapnorms.html#custom-normalization-manually-implement-two-linear-ranges).

My idea is simply to specify vmin/vmax in terms of quantiles of the color vector, which can be shared between variables, instead of a specific value. One way, I thought, might be to pass a norm argument with a custom normalization object to our lovely plot_scatter. However, as far as I understand, that doesn’t work, because the quantile function in the custom normalization class requires the entire color vector for each continuous variable. That is not super convenient: it’s too much preprocessing to find different quantile values for each variable and pass a vmin/vmax vector to the plotting function. Not user-friendly, and it still requires modifications in the code 😦
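To make that preprocessing concrete, here is a minimal sketch of turning one shared quantile into a separate vmin/vmax pair per variable with plain NumPy. The helper name quantile_limits and the toy data are made up for illustration; this is not scanpy code.

```python
import numpy as np


def quantile_limits(values_by_name, vmin_q=0.0, vmax_q=1.0):
    """Map shared quantiles to per-variable (vmin, vmax) pairs (hypothetical helper)."""
    limits = {}
    for name, values in values_by_name.items():
        values = np.asarray(values, dtype=float)
        # The same quantile resolves to a different real value for each variable.
        vmin, vmax = np.quantile(values, [vmin_q, vmax_q])
        limits[name] = (float(vmin), float(vmax))
    return limits


# Toy color vectors on different scales, each with a single large outlier.
rng = np.random.default_rng(0)
expr = {
    "gene_a": np.append(rng.uniform(0, 1, 999), 50.0),
    "gene_b": np.append(rng.uniform(0, 10, 999), 500.0),
}
limits = quantile_limits(expr, vmax_q=0.99)
```

With vmax_q=0.99 the outliers no longer set the upper limit, and each variable keeps its own scale. The cost is that every color vector has to be collected before plotting, which is the inconvenience described above.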

Instead, I added two ugly arguments named vmin_quantile and vmax_quantile to the plot_scatter function, which allow me to specify a single quantile value for vmin/vmax that is then translated into real values separately for each variable:

This solved my problem but I was wondering if it makes sense to add this to scanpy. What do you think?

Finally, here is the way I use it:

plot_scatter(adata, basis='umap', color=['louvain', 'NKG7', 'GNLY', 'KIT'], cmap='Reds', vmax_quantile=0.999)

PS: It needs a few more lines to be accessible from other functions like sc.pl.umap.

Issue Analytics

State:
Created 4 years ago
Comments: 7 (5 by maintainers)

Top GitHub Comments


fidelram commented, Aug 15, 2019

I sympathize with the problem, which I tend to solve by plotting individual scatterplots, each one with its own vmax. But I would be happy to have a better output.

I like the idea of using quantiles, but I would avoid an ever-growing list of parameters. Thus @ivirshup’s suggestion to use functools.partial seems better. I like the flexibility it provides and I think we should implement it, but I don’t know whether it would be difficult to document and explain to users who just want to compute a quantile. One idea would be to allow some encoding for vmax, for example vmax='q99', which would be interpreted as np.quantile.

My suggestion is to:

add vectorized vmax and vmin;
interpret each entry of vmax or vmin according to its data type: besides a number, a string is interpreted as, for example, a quantile if it starts with ‘q’, and an entry of type partial is interpreted as a function.

The following options would then be valid:

sc.pl.{scatterfunc}(adata, color=["gene1", "gene2"], vmax=[4., 3.])
sc.pl.{scatterfunc}(adata, color=["gene1", "gene2"], vmax=['q80', 'q90'])

import numpy as np
from functools import partial
sc.pl.{scatterfunc}(adata, color=["gene1", "gene2"], vmax=[partial(np.mean), partial(np.median)])

# combination
sc.pl.{scatterfunc}(adata, color=["gene1", "gene2", "gene3"],
                    vmax=[4., 'q85', partial(np.percentile, q=90)])
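
For what it's worth, the per-entry interpretation could be sketched roughly like this. The function name resolve_limit is hypothetical, and reading the number after 'q' as a percentile between 0 and 100 is my assumption; this is only an illustration of the dispatch, not the code that ended up in scanpy.

```python
from functools import partial
from numbers import Number

import numpy as np


def resolve_limit(spec, values):
    """Turn one vmin/vmax entry into a float for the given color vector."""
    if isinstance(spec, str):
        # Assumed encoding: 'q' followed by a percentile, e.g. 'q85'.
        if not spec.startswith("q"):
            raise ValueError(f"unrecognized limit string: {spec!r}")
        return float(np.percentile(values, float(spec[1:])))
    if callable(spec):
        # Covers functools.partial objects and plain functions alike.
        return float(spec(values))
    if isinstance(spec, Number):
        return float(spec)
    raise TypeError(f"unsupported limit type: {type(spec).__name__}")


values = np.arange(101, dtype=float)  # toy color vector: 0, 1, ..., 100
specs = [4.0, "q85", partial(np.percentile, q=90), np.median]
limits = [resolve_limit(spec, values) for spec in specs]
```

A vectorized vmax would then be resolved by pairing each entry with the color vector of the variable at the same position, so numbers, strings and callables can be mixed freely in one list.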


flying-sheep commented, Aug 23, 2019

Fixed in #794

Normalization fun with vmin/vmax · Issue #775 · scverse/scanpy
