RFC: User-defined colors
This is a proposed use case for public comment. We are considering adding this in a future release. If you have comments, please leave them, or just give us a thumbs up/down reaction. Thanks!
Authors: @mweiden, @bkmartinjr
Document status: Open for Review
Last date for comments: EOD 2020-04-06
Need
Researchers publishing datasets with cellxgene need the ability to ensure colors match their paper publication colors. Some example deployments include:
Note that these data publisher user stories are distinct from use cases of users consuming data through the cellxgene deployment. Further, we are not prioritizing custom colors for continuous variables at this time.
For more, see the user stories below.
In-scope user stories:
- As a cellxgene user hosting datasets accompanying my paper, I want to specify custom colors for data with a category label per dataset, so that I can make my cellxgene deployments and/or images match my publication.
Not-in-scope user stories:
- As a cellxgene user using a cellxgene deployment to explore data, I want to override colors in the UI with my own custom colors, so that I can better interpret the data and generate images for new publications.
- As a cellxgene user hosting datasets accompanying my paper, I want to select from a set of colormaps for continuous variables per dataset, so that I can make my cellxgene deployments and/or images match my publication.
Sources:
- https://github.com/chanzuckerberg/cellxgene/issues/446 - How to specify a custom palette for a category?
- https://github.com/chanzuckerberg/cellxgene/issues/1152 - Use categorical colors from anndata object
Definitions:
- Category-label pair: Currently in annotations, you create a category, then labels for that category. So colors will be assigned per combination of 1) category 2) label for that category. In CS-speak this is a 2-tuple of (category_name, label_name). Example: (organ, spleen)
- .cxg: .cxg is cellxgene’s native file format.
Approach
The implementation would follow this user flow:
- Upon launch, cellxgene would inspect the data file loaded for color information and use it if its usage is clear. Each data file type (cxg, h5ad, and others in the future) will require its own format-specific standard for recovering this data.
- If there are no colors specified in the data file, cellxgene uses its own default palettes and colormaps. If the user provides colors for some but not all of the category-label pairs in a category, cellxgene uses a best-effort strategy: it uses the colors specified by the user first, then falls back to default colors for the remaining labels.
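The fallback behavior described in this flow can be sketched as follows. This is a minimal illustration, not cellxgene's implementation: the function name `resolve_colors` and the `DEFAULT_PALETTE` values are hypothetical, and the RFC does not specify what the default palette contains or how defaults are assigned to labels.

```python
from itertools import cycle

# Hypothetical placeholder palette; the RFC does not define cellxgene's defaults.
DEFAULT_PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def resolve_colors(labels, user_colors=None, default_palette=DEFAULT_PALETTE):
    """Return {label: color} for one category.

    User-specified colors win; labels without one take the next default
    color, cycling through the palette if it is shorter than the label list.
    """
    user_colors = user_colors or {}
    defaults = cycle(default_palette)
    resolved = {}
    for label in labels:
        if label in user_colors:
            resolved[label] = user_colors[label]
        else:
            resolved[label] = next(defaults)
    return resolved


# Only "B cells" has a user-specified color; the other labels use defaults.
colors = resolve_colors(
    ["NK cells", "B cells", "Dendritic cells"],
    user_colors={"B cells": "#000000"},
)
```

This sketch does not check whether a default color happens to match a user-specified one; a real implementation might want to skip such collisions.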
Note that if the user starts with a .loom file and would like to add custom colors, they will have to use cellxgene prepare to convert their .loom file into an .h5ad and add custom colors to that .h5ad file using the scanpy standard.
With this model, we imagine the following user stories:
- Sally has an H5AD file and is using ScanPy.plotting to explore various visualizations. As part of this, Sally has set the ScanPy color map, i.e., .uns['{var}_colors'], to preferred colors. When Sally loads this dataset in cellxgene, it will display the categories using the same colors.
- Jane has a Loom file, and has converted it to an H5AD using cellxgene prepare. Jane would like to prepare the dataset so that the categorical colors match her collaborator’s preference. Jane uses the ScanPy/AnnData package to set the colormap in the anndata object, and saves it. She loads the resulting H5AD in cellxgene, and verifies that the colors match her expectations.
Supported file formats
This section enumerates the file formats that cellxgene can draw color information from and the heuristics it uses to do so.
Heuristics for other common data formats may be possible. We plan to add these incrementally. As always, it is important to keep these heuristics independent of cellxgene’s core data model.
.h5ad
We propose to adapt color information from scanpy objects. This involves pulling categorical color information from .uns["{category}_colors"] and zipping it together, in order, with category labels to form a mapping from category-label pairs to colors.
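The zipping step might look like the sketch below. It works on plain Python containers so it runs without scanpy installed. The function name `colors_from_uns` is hypothetical, and treating a length mismatch as "usage not clear" (returning no colors) is an assumption, not something the RFC specifies.

```python
def colors_from_uns(category, labels, uns):
    """Map each label of `category` to its color from uns["{category}_colors"].

    `labels` must be in the category's own order, since the colors are
    matched to labels by position.
    """
    colors = uns.get(f"{category}_colors")
    if colors is None:
        return {}  # no color information stored for this category
    colors = list(colors)
    labels = list(labels)
    if len(colors) != len(labels):
        # Assumption: a length mismatch means the usage is unclear, so skip it.
        return {}
    return dict(zip(labels, colors))


uns = {"louvain_colors": ["#1f77b4", "#ff7f0e"]}
mapping = colors_from_uns("louvain", ["Dendritic cells", "FCGR3A+ Monocytes"], uns)
```

With an AnnData object `adata`, the inputs would typically be `adata.obs[category].cat.categories` for the labels and `adata.uns` for the dictionary.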
.cxg
In .cxg, we propose that color information can be stored as JSON in the cxg_group_metadata. The dictionary should have the following format:
{
  "<category_name>": {
    "<label_name>": "<color_hex_code>",
    ...
  },
  ...
}
Ellipses are included for brevity.
For example, see the JSON below:
{
  "louvain": {
    "Dendritic cells": "#1f77b4",
    "FCGR3A+ Monocytes": "#ff7f0e",
    "CD14+ Monocytes": "#2ca02c",
    "NK cells": "#d62728",
    ...
  },
  ...
}
To support loading color information into .cxg files, the cxgtool.py conversion tool will draw color information from the scanpy standard color format.
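A loader could check that stored JSON follows the two-level layout described in this section before using it. The sketch below is illustrative only: `validate_color_config` is a hypothetical name, and restricting colors to six-digit hex codes is an assumption, since the RFC only says "<color_hex_code>".

```python
import json
import re

# Assumption: only six-digit hex codes such as "#1f77b4" are accepted.
HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")


def validate_color_config(text):
    """Parse JSON text and check it is {category: {label: hex_color}}."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("top level must be an object keyed by category name")
    for category, labels in config.items():
        if not isinstance(labels, dict):
            raise ValueError(f"category {category!r} must map labels to colors")
        for label, color in labels.items():
            if not isinstance(color, str) or not HEX_COLOR.match(color):
                raise ValueError(
                    f"invalid color {color!r} for ({category!r}, {label!r})"
                )
    return config


config = validate_color_config(
    '{"louvain": {"Dendritic cells": "#1f77b4", "NK cells": "#d62728"}}'
)
```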
.loom
To our understanding, .loom matrix files do not support a color data structure, nor is it part of common conventions. Please call it out if we’re wrong here!
Server-client interaction
After the cellxgene server extracts color information from either the base data file or from user configuration, that color configuration will be encoded in JSON and exposed via the /api/v0.2/colors API endpoint. The JSON data structure returned by the API will match the format stored in .cxg files.
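On the client side, a consumer of this endpoint could flatten the nested JSON into the category-label pairs defined earlier. The sketch below only parses a response body and makes no HTTP request. The name `parse_colors_payload` is hypothetical, and the payload shape is assumed to be exactly the .cxg format described above.

```python
import json


def parse_colors_payload(body):
    """Flatten {category: {label: color}} into {(category, label): color}."""
    payload = json.loads(body)
    return {
        (category, label): color
        for category, labels in payload.items()
        for label, color in labels.items()
    }


# Example body as it might be returned by /api/v0.2/colors.
body = b'{"organ": {"spleen": "#d62728", "liver": "#2ca02c"}}'
pairs = parse_colors_payload(body)
```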
Benefits
Principles motivating the approach:
- Simplicity, both in terms of system design and usability
- Extensibility across data formats supported by cellxgene in the future
- Isolation of internal cellxgene data model from external data formats
- Plug-n-play (PnP): interoperable with .h5ad files created by scanpy users
- Uses existing data formats
Known drawbacks:
- We’ll have to maintain small modules for loading color information from common color annotation patterns in each data format we support.
Alternatives
- Do not implement this feature
  - Pros
    - Keeps the system simpler
  - Cons
    - High demand for this feature
- Just take color information from the CLI
  - Pros
    - Keeps the system simpler
    - Avoids building more external format-specific standards into our code
  - Cons
    - Decreases ease of use. Newcomers to cellxgene should be able to PnP (Plug-n-play) their data into the CLI and see their data as they expect it to look.
- Specify color in the UI
  - Pros
    - WYSIWYG editing of colors
    - No configuration files or code required
  - Cons
    - Does not fulfill the core user stories, which are for data publishers, not data consumers. Data publishers need to be able to deploy the data already configured and read-only (RO).
Top GitHub Comments
@dburkhardt The rationale behind the publisher design is that you only need to set the colors once. After that point, all access of color information can be RO. Setting colors in the front-end (FE) means making colors RW, configurable more than once. For this reason, you could consider FE color configuration to be an added feature layer on top of the publisher case.
Also, there are a bunch of questions that arise out of a FE color configuration feature when cellxgene is deployed to the web that we haven’t sorted through yet. If a user changes the colors of category-label pairs, are those automatically reflected for other users? If so, do we need authentication and authorization such that only approved users can do this?
We may take this on in the future, but, for now, we’re proposing keeping the scope tight on the publisher’s problem so that we can quickly deliver the core of the needed functionality.
@nh3 note that this RFC scopes user-defined colors to category-labels. We may address colormaps, but in a future feature set.
The comment period is now closed. Thanks everyone for your thoughtful input!