Ability to plot embeddings that contain NaN values
While working with Kun.Leng@ucsf.edu to remix his neuronal AD dataset for Corpora, we ran into the problem of visualizing subcluster embeddings that have coordinate values for only a subpopulation of cells. The solution we would like is to fill the coordinates of the remaining cells with NaNs and have cellxgene plot only the cells with numerical coordinates for each embedding. Currently, cellxgene shows a blank plot if the embedding contains any np.nan or np.inf values.
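To make the request concrete, here is a minimal sketch of the NaN-padding approach described above. The helper names (pad_embedding, plottable_rows) and the assumption that the subcluster is identified by integer row indices are illustrative only; this is not cellxgene code.

```python
import numpy as np


def pad_embedding(sub_coords, sub_idx, n_cells):
    """Build an (n_cells, n_dims) embedding that is NaN everywhere
    except at the rows listed in sub_idx (hypothetical helper)."""
    sub_coords = np.asarray(sub_coords, dtype=np.float32)
    full = np.full((n_cells, sub_coords.shape[1]), np.nan, dtype=np.float32)
    full[np.asarray(sub_idx)] = sub_coords
    return full


def plottable_rows(embedding):
    """Boolean mask of cells whose coordinates contain no NaN."""
    return ~np.isnan(embedding).any(axis=1)


# Example: 5 cells in the dataset, only cells 0 and 3 belong to the subcluster.
full = pad_embedding([[1.0, 2.0], [3.0, 4.0]], sub_idx=[0, 3], n_cells=5)
mask = plottable_rows(full)
points_to_draw = full[mask]  # only the two cells with numerical coordinates
```

A viewer following this idea would draw only full[mask] and leave the remaining cells out of that embedding's plot.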
Previous issue identifying the “bug”: #1118
Update: cellxgene now accepts embeddings containing NaN values. +/-Inf values continue to be rejected. This issue is left open while we work out how to improve the UX for embeddings with NaN values.
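For anyone preparing a dataset, the rule in the update (NaN accepted, +/-Inf rejected) can be checked before loading. The function below is a hypothetical pre-flight check written for illustration; it is not the validation that cellxgene itself runs.

```python
import numpy as np


def check_embedding(embedding):
    """Raise if the embedding contains +/-Inf; NaN values are allowed.
    Hypothetical helper, not part of cellxgene."""
    arr = np.asarray(embedding, dtype=float)
    if np.isinf(arr).any():
        raise ValueError("embedding contains +/-Inf values")
    return arr


ok = check_embedding([[0.5, 1.5], [np.nan, np.nan]])  # passes: NaN only
```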
Issue Analytics
- State:
- Created 3 years ago
- Comments:17 (17 by maintainers)
I agree with the reasons against adding another number to the labels. The icon is also intuitive and I like it. As long as the user can find the number of cells missing/included, such as by hovering over the icon, I think it is useful.
Just to throw out another idea, what if the icon were a pie chart that conveyed the fraction of cells missing, such as a yellow wedge proportional to the missing values? That way the user can tell at a glance the severity of the ‘warning’ for each label. This might be useful in cases where there are many categories, like cell types, and it would be time-consuming to hover over each label to check how many are missing.
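As a rough illustration of the quantity such a pie-chart icon would display, the fraction of missing cells per label could be computed as sketched below. The function name and the plain-array inputs are assumptions made for the example, not part of cellxgene.

```python
import numpy as np


def missing_fraction_by_label(labels, embedding):
    """Map each category label to the fraction of its cells that have
    NaN coordinates in the given embedding (hypothetical helper)."""
    labels = np.asarray(labels)
    missing = np.isnan(np.asarray(embedding, dtype=float)).any(axis=1)
    return {
        str(label): float(missing[labels == label].mean())
        for label in np.unique(labels)
    }


fractions = missing_fraction_by_label(
    ["a", "a", "b", "b"],
    [[0.0, 0.0], [np.nan, np.nan], [1.0, 1.0], [2.0, 2.0]],
)
```

Each value lies between 0 and 1, so it could be mapped directly to the angle of the yellow wedge.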
I was afraid of that 😃
There are a number of disadvantages to doubling the number on the label, primarily:
I was thinking of something like this after the label itself, designating that there were missing values, that would provide more context on hover. I believe this should be yellow, for ‘warning’, and toggleable off to avoid distraction if unwanted.