
merge/concatenate gene expression .h5 files


Hi, thanks for the amazing tool.

I'm trying to read 5 samples' AIRR-seq files and GEX files using `ir.io.read_10x_vdj` and `sc.read_10x_h5`, respectively, then integrate the merged objects using `ir.pp.merge_with_ir`. I can successfully concatenate the AIRR-seq data using `.concatenate`, but I get an error when I do the same with the GEX/RNA-seq .h5 files.

Variable names are not unique. To make them unique, call `.var_names_make_unique`.
(warning repeated 10 times)
Traceback (most recent call last):
  File "/gstore/home/chintalh/scirpy_tmp.py", line 16, in <module>
    data_gex = data_gex1.concatenate(data_gex2, data_gex3, data_gex4, data_gex5)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/anndata.py", line 1757, in concatenate
    out = concat(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/merge.py", line 818, in concat
    alt_annot = merge_dataframes(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/merge.py", line 531, in merge_dataframes
    dfs = [df.reindex(index=new_index) for df in dfs]
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/merge.py", line 531, in <listcomp>
    dfs = [df.reindex(index=new_index) for df in dfs]
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/util/_decorators.py", line 324, in wrapper
    return func(*args, **kwargs)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/frame.py", line 4767, in reindex
    return super().reindex(**kwargs)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/generic.py", line 4809, in reindex
    return self._reindex_axes(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/frame.py", line 4592, in _reindex_axes
    frame = frame._reindex_index(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/frame.py", line 4611, in _reindex_index
    return self._reindex_with_indexers(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/generic.py", line 4874, in _reindex_with_indexers
    new_data = new_data.reindex_indexer(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 666, in reindex_indexer
    self.axes[axis]._validate_can_reindex(indexer)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3785, in _validate_can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

Here’s my code:

#!/usr/bin/env python
import scanpy as sc
import scirpy as ir
import numpy as np
data1 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437928_SAM24396521/output/airrseq_10x/6a190d9e-181a-449e-bd43-5a3ab6bb4e04/call-consolidated_output/execution/LIB5437928_SAM24396521/vdj_outs/all_contig_annotations.csv")

data2 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437926_SAM24396520/output/airrseq_10x/88a6c238-f13c-4ca2-8e03-1407964cadb2/call-consolidated_output/execution/LIB5437926_SAM24396520/vdj_outs/all_contig_annotations.csv")

data3 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437924_SAM24396519/output/airrseq_10x/e91a811a-d224-4d5a-896b-8e597a8b1582/call-consolidated_output/execution/LIB5437924_SAM24396519/vdj_outs/all_contig_annotations.csv")

data4 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437930_SAM24396522/output/airrseq_10x/f41965bf-e0c3-4b97-a7ee-866b0c27a3b0/call-consolidated_output/execution/LIB5437930_SAM24396522/vdj_outs/all_contig_annotations.csv")

data5 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437932_SAM24396523/output/airrseq_10x/43d01408-8b1f-4ae8-a7d6-56d07376d647/call-consolidated_output/execution/LIB5437932_SAM24396523/vdj_outs/all_contig_annotations.csv")


cat=['SAM24396521','SAM24396520','SAM24396519','SAM24396522','SAM24396523']
data_tcr = data1.concatenate(data2, data3, data4, data5, batch_key='Sample', batch_categories=cat)

#########
data_gex1 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60ba4af7a2285de000d9d442/SAM24396521/output/cellranger_count_601/86f080b1-2b8d-4af5-b934-e215f362045a/call-run_cellranger_count/shard-0/execution/LIB5437929_SAM24396521/outs/filtered_feature_bc_matrix.h5")
#data_gex1.var_names_make_unique()
data_gex2 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396520/output/cellranger_count_601/4582a707-75fa-4720-9054-49ffb9b389f6/call-run_cellranger_count/shard-0/execution/LIB5437927_SAM24396520/outs/filtered_feature_bc_matrix.h5")
#data_gex2.var_names_make_unique()
data_gex3 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396519/output/cellranger_count_601/770eeaed-72a4-4402-8471-d654f7cd325d/call-run_cellranger_count/shard-0/execution/LIB5437925_SAM24396519/outs/filtered_feature_bc_matrix.h5")
#data_gex3.var_names_make_unique()
data_gex4 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396522/output/cellranger_count_601/f6389e38-5f08-421e-9028-8bd25815e13e/call-run_cellranger_count/shard-0/execution/LIB5437931_SAM24396522/outs/filtered_feature_bc_matrix.h5")
#data_gex4.var_names_make_unique()
data_gex5 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396523/output/cellranger_count_601/0752c4ea-3449-4df3-9872-08e48fbd77a9/call-run_cellranger_count/shard-0/execution/LIB5437933_SAM24396523/outs/filtered_feature_bc_matrix.h5")
#data_gex5.var_names_make_unique()

#data_gex = data_gex1.concatenate(data_gex2, data_gex3, data_gex4, data_gex5, batch_key='data_gex', batch_categories=gex_filenames)
data_gex = data_gex1.concatenate(data_gex2, data_gex3, data_gex4, data_gex5)
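A minimal sketch of what goes wrong, using plain pandas instead of the real AnnData objects (the gene symbols here are hypothetical): Cell Ranger matrices can contain duplicate gene symbols, and reindexing over a duplicate axis is exactly what raises the `ValueError` above. Suffixing repeated names, which is roughly what `.var_names_make_unique()` does, avoids it.

```python
import pandas as pd

# A .var table with a duplicated gene symbol, like the ones read_10x_h5 can produce.
var = pd.DataFrame({"feature_type": ["Gene", "Gene", "Gene"]},
                   index=["GENE-A", "GENE-A", "GENE-B"])

# Reindexing a duplicate axis raises ValueError -- this is what concatenate hits.
try:
    var.reindex(index=["GENE-A", "GENE-B"])
except ValueError:
    print("reindex failed: duplicate axis")

# Roughly what .var_names_make_unique() does: suffix each repeat of a name.
def make_unique(names):
    seen = {}
    out = []
    for name in names:
        if name in seen:
            seen[name] += 1
            out.append(f"{name}-{seen[name]}")
        else:
            seen[name] = 0
            out.append(name)
    return out

print(make_unique(["GENE-A", "GENE-A", "GENE-B"]))  # ['GENE-A', 'GENE-A-1', 'GENE-B']
```

In other words, uncommenting the `data_gexN.var_names_make_unique()` lines before the `.concatenate(...)` call should clear this particular error.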

Any tips appreciated, thanks!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 13

Top GitHub Comments

1 reaction
grst commented, Sep 10, 2021

They also work on the integrated object.

The issue here is that your obs names (i.e. cell barcodes) are not unique.

Try calling adata.obs_names_make_unique().

Hope that helps!

EDIT: sorry, confused obs and var (obs = cell barcodes, var = genes)
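A minimal sketch of the collision, using plain Python lists with made-up barcodes rather than real AnnData objects: identical 10x barcodes from different samples clash after merging, and suffixing each barcode with its sample label (roughly what `.concatenate(..., batch_categories=...)` or `.obs_names_make_unique()` achieves) restores uniqueness.

```python
# Two samples that share a cell barcode -- common with 10x data, since
# barcodes are only unique within one sample.
samples = {
    "SAMPLE-1": ["AAACCTGAGAAACCAT-1", "AAACGGGTCAAGGCTT-1"],
    "SAMPLE-2": ["AAACCTGAGAAACCAT-1", "AAAGATGCACCTTGTC-1"],  # first barcode collides
}

# Naive merge: duplicates remain, which trips "Obs names need to be unique!"
merged = [bc for bcs in samples.values() for bc in bcs]
assert len(merged) != len(set(merged))

# Suffix each barcode with its sample label: all obs names become unique.
suffixed = [f"{bc}-{sample}" for sample, bcs in samples.items() for bc in bcs]
assert len(suffixed) == len(set(suffixed))
print(suffixed[0])  # AAACCTGAGAAACCAT-1-SAMPLE-1
```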

On Fri, 10 Sept 2021 at 20:46, Himanshu Chintalapudi < @.***> wrote:

Hi, sorry to bother again. For this type of integrated analysis, would it make sense to do the TCR quality control and pre-processing of the expression data within the for-loop, before merging with ir.pp.merge_with_ir()? Or do the QC functions for TCR and expression data work on the merged adata object? I ask because I'm getting errors when running ir.tl.define_clonotype_clusters():

Traceback (most recent call last):
  File "/gstore/home/chintalh/scripts/scirpy_example2.py", line 38, in <module>
    ir.tl.define_clonotype_clusters(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/scirpy/io/_util.py", line 67, in check_wrapper
    return f(*args, **kwargs)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/scirpy/_tools/_clonotypes.py", line 260, in define_clonotype_clusters
    ctn = ClonotypeNeighbors(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/scirpy/ir_dist/_clonotype_neighbors.py", line 58, in __init__
    self._prepare(adata)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/scirpy/ir_dist/_clonotype_neighbors.py", line 63, in _prepare
    self._make_clonotype_table(adata)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/scirpy/ir_dist/_clonotype_neighbors.py", line 73, in _make_clonotype_table
    raise ValueError("Obs names need to be unique!")
ValueError: Obs names need to be unique!

Thanks.


0 reactions
grst commented, Sep 11, 2021

Update to 0.9.0?

Yes, you’ll need v0.9!

I just updated on conda and I have 0.8.0

Will that work effectively?

conda upgrade --all in your env should work. If not you can try to install v0.9 explicitly:

conda install "scirpy>=0.9.0"
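After upgrading, it's worth confirming the installed version actually meets the requirement. A small sketch of a numeric version check (assumes simple X.Y.Z version strings like scirpy's; in a real session you would feed it `scirpy.__version__`):

```python
# Compare dotted version strings numerically rather than lexically:
# a plain string comparison would rank "0.10.0" below "0.9.0".
def version_tuple(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

assert version_tuple("0.9.0") >= version_tuple("0.9.0")
assert version_tuple("0.8.0") < version_tuple("0.9.0")    # too old for this feature
assert version_tuple("0.10.0") > version_tuple("0.9.0")   # string compare gets this wrong

# Real usage: import scirpy as ir; version_tuple(ir.__version__) >= (0, 9, 0)
```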

