Is there a way to merge AnnData objects with vars ordered differently?

I am trying to integrate and batch-correct scRNA-seq data from multiple experiments. I found a package called Scanorama that provides a convenient wrapper for AnnData objects. It generates low-dimensional embeddings that suggest it is performing relatively well. The problem is that Scanorama sorts the var axis alphabetically in the corrected AnnData objects it produces, while the original AnnData objects are NOT sorted.

I have a lot of important information in the original AnnData objects that is NOT retained in the corrected AnnData objects Scanorama produces, as discussed here (https://github.com/brianhie/scanorama/issues/57). In addition, the index of the original obs table is discarded and replaced with integers (but the row order is NOT changed).

I believe I can simply do the following to faithfully restore the original obs annotations:

corrected.obs = original.obs

because the cell order is not altered. However, I also have important gene-level data stored in var, as well as other attributes of the original object, that I really want/need in the corrected AnnData object. The author suggested sorting the var axis before passing the objects to Scanorama, but I must admit I am not even sure how to do that. My explorations lead me to believe that I do not understand the internals of these objects well enough to know whether I am on the right track.
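Sorting the var axis means reordering the columns of X and the rows of var with the same permutation, so each column still lines up with its gene annotation. With AnnData, indexing by gene names does both at once, for example adata = adata[:, sorted(adata.var_names)].copy(). The sketch below spells out the same bookkeeping with NumPy and pandas; the toy genes, counts, and the sort_genes helper are hypothetical, and it assumes gene names are unique.

```python
import numpy as np
import pandas as pd


def sort_genes(X, var):
    """Return (X, var) with genes sorted alphabetically by var.index.

    Hypothetical helper: X is a cells x genes array whose columns follow
    the row order of var (the same contract AnnData keeps for X and var).
    """
    order = var.index.argsort()  # positions that put gene names in A-Z order
    # Apply the same permutation to X's columns and var's rows.
    return X[:, order], var.iloc[order]


# Toy data: 3 cells x 3 genes, genes deliberately not in alphabetical order.
var = pd.DataFrame(
    {"chromosome": ["1", "1", "X"]},
    index=["OR4F5", "FAM138A", "ACE2"],
)
X = np.array([[1, 0, 5],
              [2, 0, 6],
              [3, 4, 7]])

X_sorted, var_sorted = sort_genes(X, var)
print(list(var_sorted.index))  # ['ACE2', 'FAM138A', 'OR4F5']
print(X_sorted[:, 0])          # [5 6 7], the ACE2 column moved with its name
```

Because the permutation is computed once and applied to both tables, per-gene annotations in var stay attached to the right column of X.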

The following

adata[:,["OR4F5", "FAM138A"]].copy().X != adata[:,["FAM138A","OR4F5"]].X.copy()

shows no difference in X, yet the corresponding var reflects the swapped order.
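A positional comparison only asks whether column i of one selection equals column i of the other. To confirm that a reorder moved each gene's values together with its name, one option is to align the second selection to the first by name before comparing. A minimal pandas sketch on toy counts (not data from this issue), where the columns stand in for var_names:

```python
import numpy as np
import pandas as pd

# Toy cells x genes table; column labels play the role of var_names.
counts = pd.DataFrame(
    np.array([[1, 0, 9],
              [0, 0, 8]]),
    columns=["OR4F5", "FAM138A", "ACE2"],
)

a = counts[["OR4F5", "ACE2"]]
b = counts[["ACE2", "OR4F5"]]

# Positional comparison: column 0 of a is OR4F5, column 0 of b is ACE2.
positional_equal = np.array_equal(a.to_numpy(), b.to_numpy())

# Name-based comparison: reorder b's columns to a's order, then compare.
by_name_equal = a.equals(b[a.columns])

print(positional_equal)  # False
print(by_name_equal)     # True
```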

Is there a way to import the corrected anndata object as a “layer” or “view” accessible from the original? Or perhaps is there a better way than I have thought of so far to integrate these two very important versions of my data?
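A layer has to match the original's X exactly in shape and in cell and gene order, so the corrected values would first need to be put back into the original gene order. Assuming the cells are in the same order in both objects and gene names are unique, a name-based alignment could look like the sketch below. If the corrected output holds only a subset of genes (for example, only genes shared across batches), the missing genes are filled with NaN here. The toy arrays and the align_to_genes helper are hypothetical; with AnnData objects the result could then be stored as something like original.layers["scanorama"] = aligned, where the layer name is also just an example.

```python
import numpy as np
import pandas as pd


def align_to_genes(values, value_genes, target_genes):
    """Reorder columns of values (cells x value_genes) to follow target_genes.

    Hypothetical helper. Genes in target_genes that are absent from
    value_genes become NaN columns. Assumes unique gene names and that
    the cell (row) order already matches.
    """
    idx = pd.Index(value_genes).get_indexer(target_genes)  # -1 where missing
    out = np.full((values.shape[0], len(target_genes)), np.nan)
    present = idx >= 0
    out[:, present] = values[:, idx[present]]
    return out


original_genes = ["OR4F5", "FAM138A", "ACE2"]  # original, unsorted order
corrected_genes = ["ACE2", "OR4F5"]            # sorted; FAM138A absent
corrected = np.array([[0.5, 0.1],
                      [0.7, 0.2]])

aligned = align_to_genes(corrected, corrected_genes, original_genes)
print(aligned)
```

Filling with NaN rather than zeros keeps "not corrected" distinguishable from "corrected value of zero".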

Thank you for your tool and any advice/guidance you are able to provide!

Top GitHub Comments

xguse commented, Nov 19, 2020

@ivirshup

OK, that was the problem. I happened to pick two genes that are zero in every cell, so swapping them is undetectable by an equality test. Choosing genes with nonzero values shows the expected behavior.
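The pitfall is easy to reproduce: swapping two columns that are both all zeros leaves the matrix unchanged, so an equality test cannot see the reorder, while swapping a populated column can. A minimal NumPy sketch with toy values:

```python
import numpy as np

# Toy cells x genes matrix: columns 0 and 1 are all zeros, column 2 is populated.
X = np.array([[0, 0, 3],
              [0, 0, 1]])

swap_zero = X[:, [1, 0, 2]]       # swap the two all-zero genes
swap_populated = X[:, [2, 1, 0]]  # swap a populated gene with a zero gene

print(np.array_equal(X, swap_zero))       # True: the reorder is invisible
print(np.array_equal(X, swap_populated))  # False: the reorder is visible

# Genes that are zero in every cell are the ones that can hide a reorder.
all_zero = ~X.any(axis=0)
print(all_zero)  # [ True  True False]
```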

brianhie commented, Nov 18, 2020

I just wrote up a new version of Scanorama that preserves .obs in the corrected output.

You can also now do integration alone (which benchmarks better than “batch correction”) if you are only concerned with clustering/visualization. Scanorama v1.7 simply adds the integrated embedding to each AnnData object as adata.obsm['X_scanorama'].

Hope that helps!
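Because X_scanorama is stored per cell in obsm, clustering and visualization would build their neighbor graph from that embedding rather than from X; in Scanpy this is typically sc.pp.neighbors(adata, use_rep="X_scanorama"). As a dependency-light illustration of what "using the embedding" means, the sketch below finds each cell's nearest neighbors with NumPy on a toy cells x dimensions array. The array, k, and the knn helper are hypothetical, and Scanpy's own neighbor search is more elaborate than this brute-force version.

```python
import numpy as np


def knn(embedding, k):
    """Hypothetical helper: indices of the k nearest other cells (Euclidean)."""
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


# Toy embedding: 4 cells x 2 dimensions forming two well-separated pairs.
emb = np.array([[0.0, 0.0],
                [0.1, 0.0],
                [5.0, 5.0],
                [5.1, 5.0]])

print(knn(emb, k=1))  # each cell's nearest neighbor is its pair partner
```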