Request for comment: gene list functionality & data structure
Data structure
- Lists are not mutually exclusive: a gene may be present in multiple lists
- Lists are nonredundant: a gene may only be present once in a given list
- Lists are ordered: gene position in a list is meaningful
- Lists are nonexhaustive: not every gene will appear in a list
- Lists have variable length
- List names are strings that follow the same conventions as category names
- Feature names (i.e., genes) are present in var_names
e.g.,
{'cardiomyocyte_markers': ['gene1', 'gene2', 'gene3', 'gene0'],
'microglia_markers': ['gene1', 'gene7', 'gene4']}
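The properties above can be checked mechanically. The following is a minimal sketch, not part of the proposal: `validate_gene_lists` is a hypothetical helper name, and it assumes gene lists arrive as a plain dict mapping list names to lists or tuples of gene names, with `var_names` given as any iterable of strings.

```python
def validate_gene_lists(gene_lists, var_names):
    """Check a {list_name: [gene, ...]} mapping against the properties above.

    Returns a list of human-readable problems; an empty list means valid.
    """
    known = set(var_names)
    problems = []
    for name, genes in gene_lists.items():
        if not isinstance(name, str):
            problems.append(f"list name {name!r} is not a string")
        # Order is meaningful, so require an ordered container (not a set).
        if not isinstance(genes, (list, tuple)):
            problems.append(f"{name}: genes must be an ordered sequence")
            continue
        seen = set()
        for gene in genes:
            if gene in seen:  # nonredundant: one occurrence per list
                problems.append(f"{name}: {gene} appears more than once")
            seen.add(gene)
            if gene not in known:  # genes must exist in var_names
                problems.append(f"{name}: {gene} is not in var_names")
    return problems


var_names = ["gene0", "gene1", "gene2", "gene3", "gene4", "gene7"]
gene_lists = {
    "cardiomyocyte_markers": ["gene1", "gene2", "gene3", "gene0"],
    "microglia_markers": ["gene1", "gene7", "gene4"],
}
print(validate_gene_lists(gene_lists, var_names))
```

Note that gene1 appearing in both lists is not reported, since lists are not mutually exclusive.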
Loading gene lists
We should load gene sets from three places:
1 - adata.uns['rank_genes_groups']
These are the results of precomputed one-vs-all differential expression, and include the log fold change and adjusted p-values.
Input: structured arrays as described here
Output: gene set with title "name vs. all"; contents are gene histograms as rendered for differential expression (i.e., include log fold change and p-values), ordered by log fold change
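As a rough sketch of this first source, the conversion could look like the following. Several things here are assumptions rather than part of the proposal: the function name is hypothetical, "name" in the title is taken to mean the group name, the ordering is taken to be descending log fold change, and the input is assumed to expose 'names', 'logfoldchanges', and 'pvals_adj' entries that can each be indexed by group name (as a NumPy structured array can, or a plain dict as in the example).

```python
def gene_sets_from_rank_genes_groups(rgg):
    """Build one ordered gene set per group from a rank_genes_groups-like mapping."""
    names = rgg["names"]
    # Structured arrays expose group names via dtype.names; plain dicts via keys().
    dtype = getattr(names, "dtype", None)
    groups = dtype.names if dtype is not None else list(names.keys())

    gene_sets = {}
    for group in groups:
        rows = zip(names[group], rgg["logfoldchanges"][group], rgg["pvals_adj"][group])
        # Order by log fold change, largest first (descending is an assumption).
        ranked = sorted(rows, key=lambda row: row[1], reverse=True)
        gene_sets[f"{group} vs. all"] = [
            {"gene": str(gene), "logfoldchange": float(lfc), "pval_adj": float(p)}
            for gene, lfc, p in ranked
        ]
    return gene_sets


# Toy stand-in for adata.uns['rank_genes_groups'], using dicts instead of arrays.
example = {
    "names": {"cluster_A": ["gene1", "gene2", "gene3"]},
    "logfoldchanges": {"cluster_A": [0.5, 2.0, -1.0]},
    "pvals_adj": {"cluster_A": [0.04, 0.001, 0.2]},
}
for title, genes in gene_sets_from_rank_genes_groups(example).items():
    print(title, [g["gene"] for g in genes])
```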
2 - Any .uns field specified at the command line
--gene-sets-field [Default: None]
Name of field in adata.uns to look for gene set definitions; expects a dictionary of lists, tuples, or sets.
Input: dictionary of lists, tuples, or sets. Aligns with scanpy specification of marker genes.
Output: gene set with the dictionary key as its title; contents are gene histograms rendered in the order they were provided (if list or tuple) or alphanumerically sorted (if set)
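The ordering rule for this second source can be sketched as below. This is illustrative only: `gene_sets_from_uns` is a hypothetical helper, `uns` is treated as a plain dict, and "alphanumerically sorted" is approximated with Python's default string sort, which places gene10 before gene2; a natural sort would need a custom key.

```python
def gene_sets_from_uns(uns, field=None):
    """Read gene sets from uns[field], a dict of lists, tuples, or sets."""
    if field is None:
        # Mirrors the proposed default of None: no field given, nothing to load.
        return {}
    gene_sets = {}
    for key, genes in uns[field].items():
        if isinstance(genes, (set, frozenset)):
            # Sets carry no order, so sort them for a stable display order.
            gene_sets[key] = sorted(genes)
        elif isinstance(genes, (list, tuple)):
            # Lists and tuples keep the order in which they were provided.
            gene_sets[key] = list(genes)
        else:
            raise TypeError(
                f"{key}: expected list, tuple, or set, got {type(genes).__name__}"
            )
    return gene_sets


uns = {
    "marker_genes": {
        "cardiomyocyte_markers": ["gene1", "gene2", "gene3", "gene0"],
        "microglia_markers": {"gene7", "gene1", "gene4"},
    }
}
print(gene_sets_from_uns(uns, field="marker_genes"))
```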
3 - From a CSV that mirrors the cell annotations data format
Example prepared here
Interacting with gene lists
Core functionality:
- Add and remove genes from lists
- Color by & select by overall expression of all genes in a list (method TBD)
- Duplicate existing lists (similar to existing cell annotation affordances)
- Move and copy individual genes between lists (e.g., drag and drop)
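To make the editing operations concrete, here is a small sketch of how add, remove, duplicate, copy, and move could behave while keeping lists ordered and nonredundant. The `GeneLists` class and its method names are hypothetical and say nothing about how the UI or backend would implement them; coloring and selecting by overall expression is left out because the method is still TBD.

```python
import copy


class GeneLists:
    """Ordered, nonredundant gene lists keyed by list name."""

    def __init__(self, lists=None):
        self.lists = copy.deepcopy(lists) if lists else {}

    def add_gene(self, list_name, gene, position=None):
        genes = self.lists.setdefault(list_name, [])
        if gene in genes:
            return  # nonredundant: adding an existing gene is a no-op
        genes.insert(len(genes) if position is None else position, gene)

    def remove_gene(self, list_name, gene):
        self.lists[list_name].remove(gene)

    def duplicate_list(self, source, new_name):
        if new_name in self.lists:
            raise ValueError(f"list {new_name!r} already exists")
        self.lists[new_name] = list(self.lists[source])  # independent copy

    def copy_gene(self, gene, source, target, position=None):
        if gene not in self.lists[source]:
            raise ValueError(f"{gene} is not in {source}")
        self.add_gene(target, gene, position)

    def move_gene(self, gene, source, target, position=None):
        if gene not in self.lists[source]:
            raise ValueError(f"{gene} is not in {source}")
        # Remove first so that moving within one list acts as a reorder.
        self.remove_gene(source, gene)
        self.add_gene(target, gene, position)


gl = GeneLists({
    "cardiomyocyte_markers": ["gene1", "gene2", "gene3", "gene0"],
    "microglia_markers": ["gene1", "gene7", "gene4"],
})
gl.move_gene("gene7", "microglia_markers", "cardiomyocyte_markers", position=0)
gl.duplicate_list("microglia_markers", "microglia_markers_copy")
print(gl.lists)
```

One design question this surfaces: moving a gene into a list that already contains it removes it from the source and leaves the target unchanged, which may or may not be the behavior wanted for drag and drop.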
Saving gene lists
Still noodling on this; open questions:
1 - JSON (easier merge and more ‘correct’) vs CSV (easier to edit, ship around and deal with)
Current thinking: CSV
2 - Autosave to separate file as for cell annotations?
Current thinking: yes!!
3 - Mutable vs nonmutable groups as for annotations?
Current thinking: ???
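To help weigh the JSON vs CSV question, here is a small comparison sketch. The CSV layout is purely an assumption for illustration (a long format with hypothetical gene_set and gene columns, one row per gene, row order carrying gene order); the actual format is undecided.

```python
import csv
import io
import json


def to_json(gene_lists):
    # JSON keeps the nested structure, including empty lists, as-is.
    return json.dumps(gene_lists, indent=2)


def to_csv(gene_lists):
    # Hypothetical long format: one (gene_set, gene) row per gene, in list order.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["gene_set", "gene"])
    for name, genes in gene_lists.items():
        for gene in genes:
            writer.writerow([name, gene])
    return buf.getvalue()


def from_csv(text):
    gene_lists = {}
    for row in csv.DictReader(io.StringIO(text)):
        gene_lists.setdefault(row["gene_set"], []).append(row["gene"])
    return gene_lists


gene_lists = {
    "cardiomyocyte_markers": ["gene1", "gene2", "gene3", "gene0"],
    "microglia_markers": ["gene1", "gene7", "gene4"],
}
print(to_csv(gene_lists))
```

In this particular layout an empty list produces no rows and so would not survive a round trip, whereas JSON preserves it; that is one concrete tradeoff to consider alongside ease of editing.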
Implementation plan
For reference, here is my first thought for how to indicate in the UI when genes in a gene list are not available:
Closing, tracking implementation work here