merge/concatenate gene expression .h5 files
Hi, thanks for the amazing tool.
I'm trying to read five samples' AIRR-seq files and GEX files using `ir.io.read_10x_vdj` and `sc.read_10x_h5` respectively, then integrate the merged objects using `ir.pp.merge_with_ir`. I can successfully concatenate the AIRR-seq data using `.concatenate`, but I get an error when I do the same with the GEX/RNA-seq .h5 files:
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
(the warning above is printed 10 times, twice per file read)
Traceback (most recent call last):
File "/gstore/home/chintalh/scirpy_tmp.py", line 16, in <module>
data_gex = data_gex1.concatenate(data_gex2, data_gex3, data_gex4, data_gex5)
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/anndata.py", line 1757, in concatenate
out = concat(
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/merge.py", line 818, in concat
alt_annot = merge_dataframes(
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/merge.py", line 531, in merge_dataframes
dfs = [df.reindex(index=new_index) for df in dfs]
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/merge.py", line 531, in <listcomp>
dfs = [df.reindex(index=new_index) for df in dfs]
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/util/_decorators.py", line 324, in wrapper
return func(*args, **kwargs)
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/frame.py", line 4767, in reindex
return super().reindex(**kwargs)
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/generic.py", line 4809, in reindex
return self._reindex_axes(
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/frame.py", line 4592, in _reindex_axes
frame = frame._reindex_index(
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/frame.py", line 4611, in _reindex_index
return self._reindex_with_indexers(
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/generic.py", line 4874, in _reindex_with_indexers
new_data = new_data.reindex_indexer(
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 666, in reindex_indexer
self.axes[axis]._validate_can_reindex(indexer)
File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3785, in _validate_can_reindex
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
Here’s my code:
#!/usr/bin/env python
import scanpy as sc
import scirpy as ir
import numpy as np
data1 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437928_SAM24396521/output/airrseq_10x/6a190d9e-181a-449e-bd43-5a3ab6bb4e04/call-consolidated_output/execution/LIB5437928_SAM24396521/vdj_outs/all_contig_annotations.csv")
data2 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437926_SAM24396520/output/airrseq_10x/88a6c238-f13c-4ca2-8e03-1407964cadb2/call-consolidated_output/execution/LIB5437926_SAM24396520/vdj_outs/all_contig_annotations.csv")
data3 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437924_SAM24396519/output/airrseq_10x/e91a811a-d224-4d5a-896b-8e597a8b1582/call-consolidated_output/execution/LIB5437924_SAM24396519/vdj_outs/all_contig_annotations.csv")
data4 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437930_SAM24396522/output/airrseq_10x/f41965bf-e0c3-4b97-a7ee-866b0c27a3b0/call-consolidated_output/execution/LIB5437930_SAM24396522/vdj_outs/all_contig_annotations.csv")
data5 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437932_SAM24396523/output/airrseq_10x/43d01408-8b1f-4ae8-a7d6-56d07376d647/call-consolidated_output/execution/LIB5437932_SAM24396523/vdj_outs/all_contig_annotations.csv")
cat=['SAM24396521','SAM24396520','SAM24396519','SAM24396522','SAM24396523']
data_tcr = data1.concatenate(data2, data3, data4, data5, batch_key='Sample', batch_categories=cat)
#########
data_gex1 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60ba4af7a2285de000d9d442/SAM24396521/output/cellranger_count_601/86f080b1-2b8d-4af5-b934-e215f362045a/call-run_cellranger_count/shard-0/execution/LIB5437929_SAM24396521/outs/filtered_feature_bc_matrix.h5")
#data_gex1.var_names_make_unique()
data_gex2 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396520/output/cellranger_count_601/4582a707-75fa-4720-9054-49ffb9b389f6/call-run_cellranger_count/shard-0/execution/LIB5437927_SAM24396520/outs/filtered_feature_bc_matrix.h5")
#data_gex2.var_names_make_unique()
data_gex3 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396519/output/cellranger_count_601/770eeaed-72a4-4402-8471-d654f7cd325d/call-run_cellranger_count/shard-0/execution/LIB5437925_SAM24396519/outs/filtered_feature_bc_matrix.h5")
#data_gex3.var_names_make_unique()
data_gex4 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396522/output/cellranger_count_601/f6389e38-5f08-421e-9028-8bd25815e13e/call-run_cellranger_count/shard-0/execution/LIB5437931_SAM24396522/outs/filtered_feature_bc_matrix.h5")
#data_gex4.var_names_make_unique()
data_gex5 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396523/output/cellranger_count_601/0752c4ea-3449-4df3-9872-08e48fbd77a9/call-run_cellranger_count/shard-0/execution/LIB5437933_SAM24396523/outs/filtered_feature_bc_matrix.h5")
#data_gex5.var_names_make_unique()
#data_gex = data_gex1.concatenate(data_gex2, data_gex3, data_gex4, data_gex5, batch_key='data_gex', batch_categories=gex_filenames)
data_gex = data_gex1.concatenate(data_gex2, data_gex3, data_gex4, data_gex5)
Any tips appreciated, thanks!
Issue created 2 years ago · 13 comments
they work also on the integrated object.
The issue here is that your obs names (i.e. cell barcodes) are not unique.
Try calling `adata.obs_names_make_unique()`. Hope that helps!
EDIT: sorry, confused obs and var (obs = cell barcodes, var = genes)
On Fri, 10 Sept 2021 at 20:46, Himanshu Chintalapudi < @.***> wrote:
Update to 0.9.0?
Yes, you’ll need v0.9!
I just updated on conda and I have 0.8.0
`conda upgrade --all` in your env should work. If not, you can try to install v0.9 explicitly: `conda install "scirpy>=0.9.0"`