Support for `..Name`, `..Key`, and `..SearchKey` global attributes

This is a proposal to generalise the default schema we currently use internally.

Paging @slinnarsson, @gioelelm and @pl-ki for feedback. If everything is OK I can implement this, and Peter will be able to use it in his updates to the pipeline.

Support for global attributes `..Name`, `..Key`, and `..SearchKey`

Currently, we use Accession as the “row key attribute”, Gene as the default “row search key” attribute, and CellID as both “column key” attribute and “column search key” attribute. None of this is written down in the documentation.

It also ties the structure of our loom files to the current pipeline. Perhaps these labels will change in the future - they have already changed a few times internally in the past!

Suggestion: add global attributes that state which of the row/column attributes are the key attributes. That way, writing scripts for the pipeline will be more “future proof”, since it won’t have to worry about changing labels.

Similarly, we assume rows represent genes and columns represent cells, but people might use loom for different purposes. I suggest adding a name attribute for that.

For backwards compatibility, we can use the above defaults as fall-backs if these attributes are missing.

The proposed schema ends up as:

rowName (optional, is used to label the scatterplot view in loom-viewer. Defaults to “Genes” if Gene is present as a row attribute, “Rows” otherwise)
rowKey (optional, must contain a unique value for each row. Defaults to Accession if it is present in row attributes, uses row index numbers otherwise)
rowSearchKey (optional, defaults to Gene if it is present in row attributes, uses row index numbers otherwise)
colName (optional, is used to label the scatterplot view in loom-viewer. Defaults to “Cells” if CellID is present in column attributes, “Columns” otherwise)
colKey (optional, must contain a unique value for each column. Defaults to CellID if it is present in column attributes, uses column index numbers otherwise)
colSearchKey (optional, defaults to CellID if it is present in column attributes, uses column index numbers otherwise)
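
The fallback rules in this list could be sketched roughly as below. This is only an illustration: `resolve_schema` is a hypothetical helper, not an existing loompy function, and it assumes the global attributes arrive as a plain dict and the row/column attributes as collections of names. A value of `None` stands for "use index numbers".

```python
from typing import Collection, Dict, Mapping, Optional


def resolve_schema(
    global_attrs: Mapping[str, str],
    row_attrs: Collection[str],
    col_attrs: Collection[str],
) -> Dict[str, Optional[str]]:
    """Resolve the six proposed attributes, applying the listed fallbacks.

    Hypothetical helper; a value of None means "fall back to index numbers".
    """

    def pick(name: str, default_attr: str, available: Collection[str]) -> Optional[str]:
        # An explicit global attribute wins; otherwise use the legacy default
        # if that attribute exists in the file; otherwise None (index numbers).
        if name in global_attrs:
            return global_attrs[name]
        return default_attr if default_attr in available else None

    return {
        "rowName": global_attrs.get("rowName", "Genes" if "Gene" in row_attrs else "Rows"),
        "rowKey": pick("rowKey", "Accession", row_attrs),
        "rowSearchKey": pick("rowSearchKey", "Gene", row_attrs),
        "colName": global_attrs.get("colName", "Cells" if "CellID" in col_attrs else "Columns"),
        "colKey": pick("colKey", "CellID", col_attrs),
        "colSearchKey": pick("colSearchKey", "CellID", col_attrs),
    }
```

With no global attributes set, a file that has Gene, Accession and CellID resolves to the current defaults, so existing files keep working.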

Using `rowSearchKey` and `colSearchKey` for row/column getter functions

Currently, the documentation suggests the following way to access a row/column by Gene or CellID:

>>> ds[np.logical_or(ds.Gene == "Actb", ds.Gene == "Gapdh"),:]
array([[  2.,   9.,   9., ...,   0.,  14.,   0.],
       [  0.,   1.,   4., ...,   0.,  14.,   3.]], dtype=float32)

>>> ds[:, ds.CellID == "AAACATACATTCTC-1"]
array([[ 0.],
       [ 0.],
       [ 0.],
       ...,
       [ 0.],
       [ 0.],
       [ 0.]], dtype=float32)

I think it would be convenient to create a helper function that would wrap this NumPy logic:

>>> ds.getRows(["Actb", "Gapdh"])
array([[  2.,   9.,   9., ...,   0.,  14.,   0.],
       [  0.,   1.,   4., ...,   0.,  14.,   3.]], dtype=float32)

>>> ds.getColumns(["AAACATACATTCTC-1"])
array([[ 0.],
       [ 0.],
       [ 0.],
       ...,
       [ 0.],
       [ 0.],
       [ 0.]], dtype=float32)

These would use the rowSearchKey and colSearchKey attributes for choosing a default attribute, with the option for users to override this: ds.getRows(list, searchKey=self.rowSearchKey)
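
A minimal sketch of what such helpers might do internally, written as free functions over plain NumPy arrays so it runs without a loom file. `get_rows` and `get_columns` are hypothetical stand-ins for the proposed methods, and `key_values` plays the role of whichever attribute the search key names (for example `ds.Gene`).

```python
import numpy as np


def get_rows(matrix, key_values, wanted):
    """Return the rows of `matrix` whose key is in `wanted`.

    Hypothetical stand-in for the proposed ds.getRows.
    """
    mask = np.isin(np.asarray(key_values), list(wanted))
    return matrix[mask, :]


def get_columns(matrix, key_values, wanted):
    """Column counterpart, standing in for the proposed ds.getColumns."""
    mask = np.isin(np.asarray(key_values), list(wanted))
    return matrix[:, mask]


# Toy data: 3 genes x 2 cells.
matrix = np.array([[2.0, 9.0], [0.0, 1.0], [5.0, 5.0]], dtype=np.float32)
genes = np.array(["Actb", "Gapdh", "Xist"])
cells = np.array(["AAACATACATTCTC-1", "AAACATTGCTTCGC-1"])

rows = get_rows(matrix, genes, ["Actb", "Gapdh"])        # shape (2, 2)
cols = get_columns(matrix, cells, ["AAACATACATTCTC-1"])  # shape (3, 1)
```

Note that a boolean mask returns matches in matrix order, not in the order of the requested keys; a real implementation would have to decide whether that is acceptable.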

TODO list if this is all approved:

add rowName, rowKey, rowSearchKey, colName, colKey and colSearchKey support to loom creation methods
- create
- create_from_cellranger
- _create_sparse
add getRows method to LoomConnection
add getColumns method to LoomConnection

Top GitHub Comments

gioelelm commented, Oct 31, 2017

For uniformity with the Python ecosystem, and in particular with pandas, the widely used package for tabular data, I would suggest against calling the special indexing methods getRows and getColumns. I would suggest .loc and .iloc instead:

>>> ds.loc[["Actb", "Gapdh"], :]
array([[  2.,   9.,   9., ...,   0.,  14.,   0.],
       [  0.,   1.,   4., ...,   0.,  14.,   3.]], dtype=float32)

>>> ds.loc[:, ["AAACATACATTCTC-1"]]
array([[ 0.],
       [ 0.],
       [ 0.],
       ...,
       [ 0.],
       [ 0.],
       [ 0.]], dtype=float32)

slinnarsson commented, Nov 2, 2017

I don’t think this is a good idea, for several reasons:

- You’re creating a lot of new syntax that only works with special attributes. All other attributes will still have to be accessed using numpy-style fancy indexing. This will be very confusing.
- You’re borrowing syntax from pandas. This will create the expectation that loompy works like pandas. Will loc and iloc support all five kinds of argument that pandas allows? That’s a lot of new code to support and debug.
- By creating two different ways of achieving the same effect, one of which only works in special cases, you’re making loompy harder to learn. Code examples that use loc/iloc will not transfer to any attribute other than the specially designated ones.
- If the concern is specifically to support getting rows (or columns) given a set of values, numpy already has a fine syntax for that:

rows = np.isin(ds.Gene, ["Actb", "Gapdh"])
data = ds[rows, :]

This has the virtue of working the same for all attributes. Plus, isin is just one of many set operations in numpy, so it is also much more powerful.
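
To illustrate the point about working the same for all attributes, here is a small self-contained sketch on toy arrays. The arrays stand in for `ds[:, :]` and two row attributes; the `chromosome` attribute is made up for this example and is not something the issue says loom files contain.

```python
import numpy as np

# Toy stand-ins for the data matrix and two row attributes.
data = np.arange(12, dtype=np.float32).reshape(4, 3)
gene = np.array(["Actb", "Gapdh", "Xist", "Tsix"])
chromosome = np.array(["5", "6", "X", "X"])  # hypothetical attribute

# Same idiom, different attribute.
by_gene = data[np.isin(gene, ["Actb", "Gapdh"]), :]
by_chrom = data[np.isin(chromosome, ["X"]), :]

# invert=True gives the complement, and masks combine with boolean operators.
autosomal = data[np.isin(chromosome, ["X", "Y"], invert=True), :]
x_not_xist = data[np.isin(chromosome, ["X"]) & ~np.isin(gene, ["Xist"]), :]
```

Nothing here depends on a specially designated key attribute, which is the contrast being drawn with loc/iloc.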

- If the concern is to make it easy for pandas users to learn loompy, we should instead make a tutorial that teaches the equivalent loompy idioms.
- All the new syntax will be broken on loom files that do not have the rowSearchKey attribute (currently, 100% of all loom files). Before using loc/iloc you’ll have to check for (or set) rowSearchKey (every time!). That’s a lot of boilerplate code. People won’t do it and their code will break unexpectedly.
- This also makes me reconsider the rowKey (etc.) attributes. I definitely don’t think those should be loompy standards, because we will then always be tempted to rely on them. They might be loom-viewer standards though.
