
merge/concatenate gene expression .h5 files


Hi, thanks for the amazing tool.

I'm trying to read 5 samples' AIRR-seq files and GEX files using `ir.io.read_10x_vdj` and `sc.read_10x_h5`, respectively, then integrate the merged objects using `ir.pp.merge_with_ir`. I can successfully concatenate the AIRR-seq data using `.concatenate`, but I get an error when I do the same with the GEX/RNA-seq .h5 files.

Variable names are not unique. To make them unique, call `.var_names_make_unique`.
(warning repeated 10 times)
Traceback (most recent call last):
  File "/gstore/home/chintalh/scirpy_tmp.py", line 16, in <module>
    data_gex = data_gex1.concatenate(data_gex2, data_gex3, data_gex4, data_gex5)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/anndata.py", line 1757, in concatenate
    out = concat(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/merge.py", line 818, in concat
    alt_annot = merge_dataframes(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/merge.py", line 531, in merge_dataframes
    dfs = [df.reindex(index=new_index) for df in dfs]
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/anndata/_core/merge.py", line 531, in <listcomp>
    dfs = [df.reindex(index=new_index) for df in dfs]
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/util/_decorators.py", line 324, in wrapper
    return func(*args, **kwargs)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/frame.py", line 4767, in reindex
    return super().reindex(**kwargs)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/generic.py", line 4809, in reindex
    return self._reindex_axes(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/frame.py", line 4592, in _reindex_axes
    frame = frame._reindex_index(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/frame.py", line 4611, in _reindex_index
    return self._reindex_with_indexers(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/generic.py", line 4874, in _reindex_with_indexers
    new_data = new_data.reindex_indexer(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 666, in reindex_indexer
    self.axes[axis]._validate_can_reindex(indexer)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3785, in _validate_can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

Here’s my code:

#!/usr/bin/env python
import scanpy as sc
import scirpy as ir
import numpy as np
data1 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437928_SAM24396521/output/airrseq_10x/6a190d9e-181a-449e-bd43-5a3ab6bb4e04/call-consolidated_output/execution/LIB5437928_SAM24396521/vdj_outs/all_contig_annotations.csv")

data2 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437926_SAM24396520/output/airrseq_10x/88a6c238-f13c-4ca2-8e03-1407964cadb2/call-consolidated_output/execution/LIB5437926_SAM24396520/vdj_outs/all_contig_annotations.csv")

data3 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437924_SAM24396519/output/airrseq_10x/e91a811a-d224-4d5a-896b-8e597a8b1582/call-consolidated_output/execution/LIB5437924_SAM24396519/vdj_outs/all_contig_annotations.csv")

data4 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437930_SAM24396522/output/airrseq_10x/f41965bf-e0c3-4b97-a7ee-866b0c27a3b0/call-consolidated_output/execution/LIB5437930_SAM24396522/vdj_outs/all_contig_annotations.csv")

data5 = ir.io.read_10x_vdj("/gstore/data/genomics/congee_rest_runs/60eefd86a2285de000dec387/LIB5437932_SAM24396523/output/airrseq_10x/43d01408-8b1f-4ae8-a7d6-56d07376d647/call-consolidated_output/execution/LIB5437932_SAM24396523/vdj_outs/all_contig_annotations.csv")


cat=['SAM24396521','SAM24396520','SAM24396519','SAM24396522','SAM24396523']
data_tcr = data1.concatenate(data2, data3, data4, data5, batch_key='Sample', batch_categories=cat)

#########
data_gex1 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60ba4af7a2285de000d9d442/SAM24396521/output/cellranger_count_601/86f080b1-2b8d-4af5-b934-e215f362045a/call-run_cellranger_count/shard-0/execution/LIB5437929_SAM24396521/outs/filtered_feature_bc_matrix.h5")
#data_gex1.var_names_make_unique()
data_gex2 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396520/output/cellranger_count_601/4582a707-75fa-4720-9054-49ffb9b389f6/call-run_cellranger_count/shard-0/execution/LIB5437927_SAM24396520/outs/filtered_feature_bc_matrix.h5")
#data_gex2.var_names_make_unique()
data_gex3 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396519/output/cellranger_count_601/770eeaed-72a4-4402-8471-d654f7cd325d/call-run_cellranger_count/shard-0/execution/LIB5437925_SAM24396519/outs/filtered_feature_bc_matrix.h5")
#data_gex3.var_names_make_unique()
data_gex4 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396522/output/cellranger_count_601/f6389e38-5f08-421e-9028-8bd25815e13e/call-run_cellranger_count/shard-0/execution/LIB5437931_SAM24396522/outs/filtered_feature_bc_matrix.h5")
#data_gex4.var_names_make_unique()
data_gex5 = sc.read_10x_h5("/gstore/data/genomics/congee_rest_runs/60bfb4e7a2285de000da35db/SAM24396523/output/cellranger_count_601/0752c4ea-3449-4df3-9872-08e48fbd77a9/call-run_cellranger_count/shard-0/execution/LIB5437933_SAM24396523/outs/filtered_feature_bc_matrix.h5")
#data_gex5.var_names_make_unique()

#data_gex = data_gex1.concatenate(data_gex2, data_gex3, data_gex4, data_gex5, batch_key='data_gex', batch_categories=gex_filenames)
data_gex = data_gex1.concatenate(data_gex2, data_gex3, data_gex4, data_gex5)
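A minimal sketch of what goes wrong, using plain pandas instead of the real AnnData objects (the gene symbols here are hypothetical): Cell Ranger matrices can contain duplicate gene symbols, and reindexing over a duplicate axis is exactly what raises the `ValueError` above. Suffixing repeated names, which is roughly what `.var_names_make_unique()` does, avoids it.

```python
import pandas as pd

# A .var table with a duplicated gene symbol, like the ones read_10x_h5 can produce.
var = pd.DataFrame({"feature_type": ["Gene", "Gene", "Gene"]},
                   index=["GENE-A", "GENE-A", "GENE-B"])

# Reindexing a duplicate axis raises ValueError -- this is what concatenate hits.
try:
    var.reindex(index=["GENE-A", "GENE-B"])
except ValueError:
    print("reindex failed: duplicate axis")

# Roughly what .var_names_make_unique() does: suffix each repeat of a name.
def make_unique(names):
    seen = {}
    out = []
    for name in names:
        if name in seen:
            seen[name] += 1
            out.append(f"{name}-{seen[name]}")
        else:
            seen[name] = 0
            out.append(name)
    return out

print(make_unique(["GENE-A", "GENE-A", "GENE-B"]))  # ['GENE-A', 'GENE-A-1', 'GENE-B']
```

In other words, uncommenting the `data_gexN.var_names_make_unique()` lines before the `.concatenate(...)` call should clear this particular error.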

Any tips appreciated, thanks!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 13

Top GitHub Comments

1 reaction
grst commented, Sep 10, 2021

They also work on the integrated object.

The issue here is that your obs names (i.e. cell barcodes) are not unique.

Try calling adata.obs_names_make_unique().

Hope that helps!

EDIT: sorry, confused obs and var (obs = cell barcodes, var = genes)
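A minimal sketch of the collision, using plain Python lists with made-up barcodes rather than real AnnData objects: identical 10x barcodes from different samples clash after merging, and suffixing each barcode with its sample label (roughly what `.concatenate(..., batch_categories=...)` or `.obs_names_make_unique()` achieves) restores uniqueness.

```python
# Two samples that share a cell barcode -- common with 10x data, since
# barcodes are only unique within one sample.
samples = {
    "SAMPLE-1": ["AAACCTGAGAAACCAT-1", "AAACGGGTCAAGGCTT-1"],
    "SAMPLE-2": ["AAACCTGAGAAACCAT-1", "AAAGATGCACCTTGTC-1"],  # first barcode collides
}

# Naive merge: duplicates remain, which trips "Obs names need to be unique!"
merged = [bc for bcs in samples.values() for bc in bcs]
assert len(merged) != len(set(merged))

# Suffix each barcode with its sample label: all obs names become unique.
suffixed = [f"{bc}-{sample}" for sample, bcs in samples.items() for bc in bcs]
assert len(suffixed) == len(set(suffixed))
print(suffixed[0])  # AAACCTGAGAAACCAT-1-SAMPLE-1
```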

On Fri, 10 Sept 2021 at 20:46, Himanshu Chintalapudi < @.***> wrote:

Hi, sorry to bother again. For this type of integrated analysis, would it make sense to do the TCR quality control and pre-processing of the expression data within the for-loop, before merging with ir.pp.merge_with_ir()? Or do the QC functions for TCR and expression data work on the merged adata object? I ask because I'm getting errors when running ir.tl.define_clonotype_clusters():

Traceback (most recent call last):
  File "/gstore/home/chintalh/scripts/scirpy_example2.py", line 38, in <module>
    ir.tl.define_clonotype_clusters(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/scirpy/io/_util.py", line 67, in check_wrapper
    return f(*args, **kwargs)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/scirpy/_tools/_clonotypes.py", line 260, in define_clonotype_clusters
    ctn = ClonotypeNeighbors(
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/scirpy/ir_dist/_clonotype_neighbors.py", line 58, in __init__
    self._prepare(adata)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/scirpy/ir_dist/_clonotype_neighbors.py", line 63, in _prepare
    self._make_clonotype_table(adata)
  File "/gstore/home/chintalh/miniconda3/envs/scirpy/lib/python3.9/site-packages/scirpy/ir_dist/_clonotype_neighbors.py", line 73, in _make_clonotype_table
    raise ValueError("Obs names need to be unique!")
ValueError: Obs names need to be unique!

Thanks.


0 reactions
grst commented, Sep 11, 2021

Update to 0.9.0?

Yes, you’ll need v0.9!

I just updated on conda and I have 0.8.0

Will that work effectively?

conda upgrade --all in your env should work. If not you can try to install v0.9 explicitly:

conda install "scirpy>=0.9.0"
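After upgrading, it's worth confirming the installed version actually meets the requirement. A small sketch of a numeric version check (assumes simple X.Y.Z version strings like scirpy's; in a real session you would feed it `scirpy.__version__`):

```python
# Compare dotted version strings numerically rather than lexically:
# a plain string comparison would rank "0.10.0" below "0.9.0".
def version_tuple(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

assert version_tuple("0.9.0") >= version_tuple("0.9.0")
assert version_tuple("0.8.0") < version_tuple("0.9.0")    # too old for this feature
assert version_tuple("0.10.0") > version_tuple("0.9.0")   # string compare gets this wrong

# Real usage: import scirpy as ir; version_tuple(ir.__version__) >= (0, 9, 0)
```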

