Ability to plot embeddings that contain NaN values

See original GitHub issue

In working with Kun.Leng@ucsf.edu to remix his neuronal AD dataset for Corpora, we came upon the issue of visualizing subcluster embeddings that only have coordinate values for a subpopulation of cells. The solution we would like is to fill the other cell coordinates with NaNs and cellxgene would plot only the cells with numerical coordinates for each embedding. Currently, cellxgene shows a blank plot if there are any np.nan or np.inf in the embedding.
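To make the request concrete, here is a minimal sketch of the idea in plain Python. It is not cellxgene code: the helper names `fill_missing_coordinates` and `plottable_cells` and the sample coordinates are hypothetical, and a real dataset would more likely hold the embedding in a NumPy array.

```python
import math

NAN = float("nan")


def fill_missing_coordinates(n_cells, sub_coords):
    """Return one (x, y) pair per cell, using NaN where no coordinate exists.

    sub_coords maps a cell index to its (x, y) position in the
    subcluster embedding (hypothetical input format).
    """
    return [sub_coords.get(i, (NAN, NAN)) for i in range(n_cells)]


def plottable_cells(embedding):
    """Return the indices of cells whose coordinates contain no NaN."""
    return [
        i
        for i, point in enumerate(embedding)
        if not any(math.isnan(v) for v in point)
    ]


# Five cells in the dataset; only cells 1 and 3 belong to the subcluster.
embedding = fill_missing_coordinates(5, {1: (0.5, -1.2), 3: (2.0, 0.3)})
visible = plottable_cells(embedding)
print(visible)
```

With this layout every embedding keeps one row per cell, so it stays aligned with the rest of the dataset, and the plot simply skips the rows that are NaN.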

Previous issue identifying the “bug”: #1118


Update: cellxgene now accepts embeddings that contain NaN values. +/- Inf values continue to be rejected. This issue is left open while we work out how to improve the user experience for embeddings with NaN values.
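The rule described in the update (NaN allowed, +/- Inf rejected) could be checked roughly as follows. This is only an illustration under assumptions: `validate_embedding` is a hypothetical name, and the actual cellxgene validation may be implemented differently.

```python
import math


def validate_embedding(embedding):
    """Raise ValueError if any coordinate is +/-Inf; NaN is allowed.

    Hypothetical helper, not the cellxgene implementation.
    """
    for i, point in enumerate(embedding):
        for v in point:
            if math.isinf(v):
                raise ValueError(f"cell {i} has an infinite coordinate")
    return True


nan = float("nan")

# NaN coordinates pass the check.
ok = validate_embedding([(0.1, 0.2), (nan, nan)])

# An infinite coordinate is rejected.
try:
    validate_embedding([(0.1, float("inf"))])
    rejected = False
except ValueError:
    rejected = True

print(ok, rejected)
```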

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 17 (17 by maintainers)

Top GitHub Comments

1 reaction
mattcai commented, Jun 16, 2020

I agree with the reasons against adding another number to the labels. The icon is also intuitive and I like it. As long as the user can find the number of cells missing/included, such as by hovering over the icon, I think it is useful.

Just to throw out another idea, what if the icon was a pie chart that conveyed the fraction of cells missing? Like a yellow wedge proportional to missing values? That way the user can tell at a glance the severity of the ‘warning’ for each label. This might be useful in cases where there are many categories like cell types, and it would be time consuming to hover over each label to check how many are missing.
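As a rough sketch of the number such a pie-chart icon would need, the fraction of cells with missing coordinates can be computed per label. The function name `missing_fraction_by_label` and the example labels are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def missing_fraction_by_label(labels, embedding):
    """Return, per label, the fraction of cells with a NaN coordinate."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for label, point in zip(labels, embedding):
        totals[label] += 1
        if any(math.isnan(v) for v in point):
            missing[label] += 1
    return {label: missing[label] / totals[label] for label in totals}


nan = float("nan")
labels = ["neuron", "neuron", "astrocyte", "neuron"]
embedding = [(0.1, 0.2), (nan, nan), (nan, nan), (1.0, 2.0)]

fractions = missing_fraction_by_label(labels, embedding)
print(fractions)
```

Each fraction would set the size of the yellow wedge for its label, so a label with every cell missing shows a full circle and a label with nothing missing shows none.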

1 reaction
colinmegill commented, Jun 15, 2020

I was afraid of that 😃

There are a number of disadvantages to doubling the number on the label, primarily:

  • Variable amount of width, potentially quite a bit of space
  • Confusion with the concept of ‘currently selected’ and ‘subset’, we don’t want to confuse the user with what fraction we’re displaying — the idea of ‘out of x’ is quite nuanced as the user might be looking at x cells (current selection) out of y subset out of z total.

I was thinking of something like this after the label itself, designating that there were missing values, that would provide more context on hover. I believe this should be yellow, for ‘warning’, and toggleable off to avoid distraction if unwanted.

image


