Is there a way to merge adata objects with vars ordered differently?
I am trying to integrate and batch correct scRNA data from multiple experiments. I have found a package called scanorama that provides a convenient wrapper for anndata objects. The method generates low-dimensional embeddings that suggest it is performing relatively well. The problem is that scanorama re-orders the var axis to be sorted alphabetically in the corrected anndata objects it produces, while the original anndata objects are NOT sorted.
I have a ton of important information in the original anndata objects that is NOT retained in the corrected anndata objects scanorama produces, as discussed here (https://github.com/brianhie/scanorama/issues/57). In addition, the index of the original's obs table is obliterated and replaced with numbers (but the order here is NOT changed).
I believe that I can simply do this to faithfully regain annotations in the original obs axis:
corrected.obs = original.obs
because this order is not altered, but I have important data saved about the genes in var, as well as the other attributes in the original, that I really want/need in the corrected anndata object. The author suggested ensuring that the var axis is sorted before it is passed to the scanorama function, but I must admit that I am not sure how to do even this. My explorations lead me to believe that I do not understand the “guts” of these objects well enough to know if I am on the right track.
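For readers unsure what "sorting the var axis" involves, here is a minimal sketch of the idea using plain NumPy and pandas stand-ins for adata.X and adata.var. The gene annotations and values are made up for illustration. The key point is that one permutation is applied to both the columns of X and the rows of var, so each gene keeps its own data.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for adata.X (cells x genes) and adata.var (one row per gene).
# The values and the "chrom" column are hypothetical.
X = np.array([[1, 0, 5],
              [2, 0, 6]])
var = pd.DataFrame({"chrom": ["1", "1", "7"]},
                   index=["OR4F5", "FAM138A", "ACTB"])

# Positions that put the gene names in alphabetical order.
order = np.argsort(var.index.to_numpy())

# Apply the same permutation to the columns of X and the rows of var.
X_sorted = X[:, order]
var_sorted = var.iloc[order]

print(list(var_sorted.index))
print(X_sorted)
```

On a real AnnData object, indexing by names should do both steps at once, for example adata = adata[:, adata.var_names.sort_values()].copy(). Whether this ordering matches the one scanorama uses internally is an assumption worth checking on your own data.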
The following
adata[:,["OR4F5", "FAM138A"]].copy().X != adata[:,["FAM138A","OR4F5"]].X.copy()
shows no alteration to the X attribute, but the corresponding var attributes reflect the altered order.
Is there a way to import the corrected anndata object as a “layer” or “view” accessible from the original? Or perhaps is there a better way than I have thought of so far to integrate these two very important versions of my data?
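One possible approach, sketched here with NumPy and pandas stand-ins, not a confirmed scanorama or anndata recipe: look up each original gene's column position in the corrected object, then reorder the corrected matrix to match the original var order. The gene lists and values are invented, and the sketch assumes both objects contain exactly the same genes and the same cells in the same order.

```python
import numpy as np
import pandas as pd

# Hypothetical gene orders: the original is unsorted, the corrected one is sorted.
original_genes = pd.Index(["OR4F5", "FAM138A", "ACTB"])
corrected_genes = pd.Index(["ACTB", "FAM138A", "OR4F5"])

# Stand-in for corrected.X (cells x genes, in corrected_genes order).
corrected_X = np.array([[0.5, 0.0, 0.1],
                        [0.6, 0.0, 0.2]])

# For each original gene, find its column position in the corrected matrix.
idx = corrected_genes.get_indexer(original_genes)

# get_indexer returns -1 for names it cannot find; fail loudly in that case.
if (idx == -1).any():
    raise ValueError("some original genes are missing from the corrected object")

# Columns of aligned_X now follow the original var order.
aligned_X = corrected_X[:, idx]
print(aligned_X)
```

With real AnnData objects, the aligned matrix could then be stored next to the original data, for example original.layers["scanorama"] = corrected[:, original.var_names].X, where the layer name is arbitrary. A layer must have the same shape as X, so this only works if no genes or cells were dropped during correction.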
Thank you for your tool and any advice/guidance you are able to provide!
@ivirshup
Ok, that was the problem. I just happened to randomly choose two genes that are zero across the board, so reordering them is undetectable via equality testing. Choosing populated genes shows the expected behavior.
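The pitfall is easy to reproduce with a small NumPy array standing in for adata.X (the values are made up): swapping two all-zero columns gives an identical matrix, while swapping two populated columns does not.

```python
import numpy as np

# Columns 0 and 1 are all-zero "genes"; columns 2 and 3 are expressed.
X = np.array([[0, 0, 3, 7],
              [0, 0, 4, 8]])

# Swapping the two all-zero columns is invisible to an equality test.
swap_zero = np.array_equal(X[:, [0, 1]], X[:, [1, 0]])

# Swapping two populated columns is detected.
swap_expr = np.array_equal(X[:, [2, 3]], X[:, [3, 2]])

print(swap_zero)  # True
print(swap_expr)  # False
```

When checking whether a reordering took effect, it helps to pick genes whose columns are known to differ.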
I just wrote up a new version of Scanorama that preserves .obs in the corrected output.
You can also now just do integration (which benchmarks better than “batch correction”) if you are only concerned with clustering/visualization. Scanorama v1.7 just adds this to the adatas as adata.obsm['X_scanorama']. Hope that helps!