
indexing of AnnData


The task is to write the following preprocessing sequence using an AnnData instance adata.

meanFilter = 0.01
cvFilter = 2
nr_pcs = 50

ddata = adata.to_dict()
X = ddata['X']
# row normalize
X = row_norm(X, max_fraction=0.05, mult_with_mean=True)
# filter out genes with mean expression < meanFilter and coefficient of
# variation < cvFilter
X, gene_filter = filter_genes_cv(X, meanFilter, cvFilter)
# compute z-score of the filtered matrix
Xz = zscore(X)
# PCA
Xpca = pca(Xz, nr_comps=nr_pcs)
# update dictionary
ddata['X'] = X
ddata['Xpca'] = Xpca
ddata['var_names'] = ddata['var_names'][gene_filter]
sett.m(0, 'Xpca has shape',
       ddata['Xpca'].shape[0], 'x', ddata['Xpca'].shape[1])
# rebuild the AnnData instance from the updated dictionary
from ..ann_data import AnnData
adata = AnnData(ddata)
print(adata.X)
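
For readers who do not have the helpers from the snippet at hand, the filtering, z-scoring and PCA steps can be approximated in plain NumPy. This is only a sketch under stated assumptions: the three `*_sketch` functions are hypothetical stand-ins, not scanpy's implementations, the way the mean and coefficient-of-variation cutoffs are combined is a guess, and `row_norm` is left out because its semantics are not described here. The toy matrix and cutoffs are made up.

```python
import numpy as np


def filter_genes_cv_sketch(X, mean_filter, cv_filter):
    """Keep genes (columns) whose mean and coefficient of variation reach both cutoffs.

    Assumption: the real helper may combine the two criteria differently.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Genes with zero mean get a CV of 0 instead of a division by zero.
    cv = np.divide(std, mean, out=np.zeros_like(std), where=mean > 0)
    gene_filter = (mean >= mean_filter) & (cv >= cv_filter)
    return X[:, gene_filter], gene_filter


def zscore_sketch(X):
    """Center each column and scale it to unit standard deviation."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero after centering
    return (X - X.mean(axis=0)) / std


def pca_sketch(X, nr_comps):
    """Project the rows of X onto the first nr_comps principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :nr_comps] * S[:nr_comps]


# Toy data: 4 cells x 4 genes (made up for illustration).
X = np.array([
    [0.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 8.0, 0.0],
    [0.0, 1.0, 0.0, 3.0],
])
Xf, gene_filter = filter_genes_cv_sketch(X, mean_filter=0.01, cv_filter=1.0)
Xz = zscore_sketch(Xf)
Xpca = pca_sketch(Xz, nr_comps=2)
print(gene_filter.tolist())  # [False, False, True, True]
print(Xpca.shape)            # (4, 2)
```

The boolean `gene_filter` returned here plays the same role as in the snippet above: it is the mask that any per-gene annotation (such as `var_names`) has to be subset with so that it stays aligned with the filtered matrix.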

While the previous snippet works as expected, doing the same without the intermediate ddata object leads to unexpected behavior: indexing no longer works. @flying-sheep: could you have a look at why adata['Xpca'] = Xpca raises

>>> adata['Xpca'] = Xpca
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

in the following snippet

X = adata.X
# row normalize
X = row_norm(X, max_fraction=0.05, mult_with_mean=True)
# filter out genes with mean expression < meanFilter and coefficient of
# variation < cvFilter
X, gene_filter = filter_genes_cv(X, meanFilter, cvFilter)
# compute z-score of the filtered matrix
Xz = zscore(X)
# PCA
Xpca = pca(Xz, nr_comps=nr_pcs)
# update adata
adata.X = X
adata = adata.var_names[gene_filter]  # filter genes
adata['Xpca'] = Xpca
sett.m(0, 'Xpca has shape',
       adata['Xpca'].shape[0], 'x', adata['Xpca'].shape[1])
print(adata.X)

I played around quite a bit, but the only solution I got running produced a numerically incorrect result. It’s quite hard to keep this sequence of steps nicely organized.

PS: the snippet appears in scanpy/preprocess/advanced.py and an example would be ./scanpy.py nestorowa16 diffmap -r pp.
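
One plausible reading of the traceback, assuming var_names is a plain NumPy array as the error message suggests: the line adata = adata.var_names[gene_filter] rebinds the name adata to that array, so the later adata['Xpca'] = Xpca becomes a string index into an ndarray. The sketch below reproduces this with NumPy alone; the gene names and the mask are made up for illustration.

```python
import numpy as np

# Hypothetical stand-ins for adata.var_names and the boolean gene mask.
var_names = np.array(["geneA", "geneB", "geneC"])
gene_filter = np.array([True, False, True])

# Mirrors `adata = adata.var_names[gene_filter]`:
# after this line the name refers to an ndarray, not to an AnnData object.
adata = var_names[gene_filter]
print(type(adata).__name__)  # ndarray

raised = False
try:
    # A string key is not a valid index for an array without named fields.
    adata["Xpca"] = np.zeros((2, 2))
except IndexError as err:
    raised = True
    print("IndexError:", err)
```

If this reading is right, the fix is to subset the AnnData object itself rather than assign one of its attributes back to the name adata.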

Issue Analytics

Created 7 years ago
Comments: 5 (5 by maintainers)

Top GitHub Comments

falexwolf commented, Feb 9, 2017

ok, of course this is right, let’s discuss in person.

flying-sheep commented, Feb 9, 2017

“something like adata.var = adata.var[gene_filter] should work”

i disagree. adata = adata[:, gene_filter] should work. the object shouldn’t be able to be in an invalid state.
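
The design point in this comment, that subsetting should go through the object so the matrix and its annotations cannot drift apart, can be illustrated with a toy container. TinyAnnData below is a hypothetical class written for this example, not the AnnData API; it only supports the two-axis form obj[rows, cols] with slices or boolean masks.

```python
import numpy as np


class TinyAnnData:
    """Hypothetical stand-in for AnnData: a matrix plus one name per column."""

    def __init__(self, X, var_names):
        X = np.asarray(X)
        var_names = np.asarray(var_names)
        # Refuse to construct an object whose parts disagree in size.
        if X.shape[1] != var_names.shape[0]:
            raise ValueError("X needs one column per entry in var_names")
        self.X = X
        self.var_names = var_names

    def __getitem__(self, index):
        # Subset rows and columns together and return a new, consistent object.
        rows, cols = index
        return TinyAnnData(self.X[rows][:, cols], self.var_names[cols])


adata = TinyAnnData(np.arange(12).reshape(3, 4), ["g0", "g1", "g2", "g3"])
gene_filter = np.array([True, False, True, False])

subset = adata[:, gene_filter]
print(subset.X.shape, subset.var_names.tolist())  # (3, 2) ['g0', 'g2']
```

Because the only way to filter genes here is adata[:, gene_filter], there is no intermediate moment in which X has been filtered but var_names has not, which is the invalid state the comment warns about.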
