Failed reading 10x h5 file - CellBender v2.1


Last week I ran CellBender v1 over a CellRanger V3 library I got, and then successfully loaded the h5 output into a Seurat object.

CreateSeuratObject(Read10X_h5("the_cell_bender_outout_filtered.h5"))

Now, when I tried using CellBender V2.1 instead of V1 over the same CellRanger library, I got the following exception:

Error in `[[.H5File`(infile, paste0(genome, "/", feature_slot)) :
  An object with name matrix/gene_names does not exist in this group

The output h5 files of the two versions appear to share the same format, except that the V2.1 output carries a "PYTABLES_FORMAT_VERSION" attribute. Is it flagged like that on purpose? This attribute is what makes the Seurat Read10X_h5 function fail, because its presence sends the reader down the legacy branch that looks for gene_names (excerpt from the Read10X_h5 code):

  infile <- hdf5r::H5File$new(filename = filename, mode = "r")
  genomes <- names(x = infile)
  output <- list()
  if (!infile$attr_exists("PYTABLES_FORMAT_VERSION")) {
    if (use.names) {
      feature_slot <- "features/name"
    }
    else {
      feature_slot <- "features/id"
    }
  }
  else {
    if (use.names) {
      feature_slot <- "gene_names"
    }
    else {
      feature_slot <- "genes"
    }
  }

Top GitHub Comments

sjfleming commented, Oct 27, 2020

Okay, so the if statement that @GreenGilad mentioned above is indeed the problem. Since we create a file using PyTables, there will be a PYTABLES_FORMAT_VERSION attribute.

My proposal is to substitute their current if statement https://github.com/satijalab/seurat/blob/fe93b05745e55ec2f66e3f0b4c4196aad9f4d5a7/R/preprocessing.R#L1155

with

if (hdf5r::existsGroup(infile, 'matrix'))
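To illustrate why a group check is more robust than the attribute check, here is a minimal sketch in which plain R values stand in for the HDF5 file. The helper names slot_by_attribute and slot_by_group are hypothetical and are not part of Seurat or hdf5r; they only mirror the two branching rules discussed in this thread.

```r
# Hypothetical helpers: plain R values stand in for the open HDF5 file.
# `has_pytables_attr` mimics infile$attr_exists("PYTABLES_FORMAT_VERSION");
# `groups` mimics names(infile), i.e. the top-level groups of the file.

# Attribute-based choice, mirroring the Read10X_h5 excerpt quoted in the issue.
slot_by_attribute <- function(has_pytables_attr, use.names = TRUE) {
  if (!has_pytables_attr) {
    if (use.names) "features/name" else "features/id"
  } else {
    if (use.names) "gene_names" else "genes"
  }
}

# Group-based choice, mirroring the proposed check for a 'matrix' group.
slot_by_group <- function(groups, use.names = TRUE) {
  if ("matrix" %in% groups) {
    if (use.names) "features/name" else "features/id"
  } else {
    if (use.names) "gene_names" else "genes"
  }
}

# A v3-layout file written with PyTables has the attribute AND a 'matrix' group.
slot_by_attribute(has_pytables_attr = TRUE)  # "gene_names" (absent from a v3 layout)
slot_by_group(groups = "matrix")             # "features/name"
```

Under these assumptions, the attribute rule picks the legacy slot for a v3-layout file and produces the "matrix/gene_names does not exist" error, while the group rule picks the slot that is actually present.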

For now, I think the code below is a potential workaround. I may try to submit a pull request to Seurat to incorporate this, as I think it makes more sense than relying on a version attribute from PyTables.

library(Matrix)

ReadCB_h5 <- function(filename, use.names = TRUE, unique.features = TRUE) {
  if (!requireNamespace('hdf5r', quietly = TRUE)) {
    stop("Please install hdf5r to read HDF5 files")
  }
  if (!file.exists(filename)) {
    stop("File not found")
  }
  infile <- hdf5r::H5File$new(filename = filename, mode = 'r')
  genomes <- names(x = infile)
  output <- list()
  if (hdf5r::existsGroup(infile, 'matrix')) {
    # cellranger version 3
    message('CellRanger version 3+ format H5')
    if (use.names) {
      feature_slot <- 'features/name'
    } else {
      feature_slot <- 'features/id'
    }
  } else {
    message('CellRanger version 2 format H5')
    if (use.names) {
      feature_slot <- 'gene_names'
    } else {
      feature_slot <- 'genes'
    }
  }
  for (genome in genomes) {
    counts <- infile[[paste0(genome, '/data')]]
    indices <- infile[[paste0(genome, '/indices')]]
    indptr <- infile[[paste0(genome, '/indptr')]]
    shp <- infile[[paste0(genome, '/shape')]]
    features <- infile[[paste0(genome, '/', feature_slot)]][]
    barcodes <- infile[[paste0(genome, '/barcodes')]]
    sparse.mat <- sparseMatrix(
      i = indices[] + 1,
      p = indptr[],
      x = as.numeric(x = counts[]),
      dims = shp[],
      repr = 'T'  # triplet form; replaces the deprecated giveCsparse = FALSE (needs Matrix >= 1.3)
    )
    if (unique.features) {
      features <- make.unique(names = features)
    }
    rownames(x = sparse.mat) <- features
    colnames(x = sparse.mat) <- barcodes[]
    sparse.mat <- as(object = sparse.mat, Class = 'dgCMatrix')
    # Split v3 multimodal
    if (infile$exists(name = paste0(genome, '/features'))) {
      types <- infile[[paste0(genome, '/features/feature_type')]][]
      types.unique <- unique(x = types)
      if (length(x = types.unique) > 1) {
        message("Genome ", genome, " has multiple modalities, returning a list of matrices for this genome")
        sparse.mat <- sapply(
          X = types.unique,
          FUN = function(x) {
            return(sparse.mat[which(x = types == x), ])
          },
          simplify = FALSE,
          USE.NAMES = TRUE
        )
      }
    }
    output[[genome]] <- sparse.mat
  }
  infile$close_all()
  if (length(x = output) == 1) {
    # Single genome: return the matrix (or list of matrices) directly
    return(output[[1]])
  } else {
    return(output)
  }
}
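The way the function above turns the data, indices, indptr and shape arrays into a matrix can be seen on a toy example. This is only a sketch, assuming the file stores the counts as compressed sparse column arrays with 0-based indices, which is how the function treats them. All values and names below are made up.

```r
library(Matrix)

# Toy 3 x 2 matrix (features x barcodes):
#       bc1 bc2
#  g1     1   0
#  g2     0   3
#  g3     2   0
counts  <- c(1, 2, 3)  # non-zero values, stored column by column
indices <- c(0, 2, 1)  # 0-based row index of each value
indptr  <- c(0, 2, 3)  # 0-based offset where each column starts, plus the total count
shp     <- c(3, 2)     # number of features, number of barcodes

# sparseMatrix() expects 1-based row indices (hence the + 1) and 0-based pointers.
m <- sparseMatrix(i = indices + 1, p = indptr, x = counts, dims = shp)
rownames(m) <- c("g1", "g2", "g3")
colnames(m) <- c("bc1", "bc2")

dense <- as.matrix(m)
print(dense)
```

The + 1 on the row indices is the only conversion needed; the column pointers are passed through unchanged.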

Screenshot (not reproduced here): loading a CellBender remove-background output file in a legacy CellRanger v2 format.

Screenshot (not reproduced here): loading a CellBender remove-background output file in the newer CellRanger v3+ format. (I had to subset to “Gene Expression” to successfully use CreateSeuratObject.)
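The "Gene Expression" subsetting step might look like the sketch below. It uses a toy stand-in for the list that ReadCB_h5 returns when a genome has several feature types. The helper pick_modality and all toy names and values are hypothetical, and the CreateSeuratObject call is left as a comment because it needs Seurat installed.

```r
library(Matrix)

# Toy stand-in for a multimodal result: one sparse matrix per feature type.
toy <- sparseMatrix(
  i = c(1, 2, 3), j = c(1, 2, 1), x = c(5, 1, 7),
  dims = c(3, 2),
  dimnames = list(c("GeneA", "GeneB", "Ab1"), c("bc1", "bc2"))
)
types <- c("Gene Expression", "Gene Expression", "Antibody Capture")
mats <- sapply(
  X = unique(types),
  FUN = function(x) toy[which(types == x), , drop = FALSE],
  simplify = FALSE,
  USE.NAMES = TRUE
)

# Hypothetical helper: return one modality from a list result,
# or pass a single matrix through unchanged.
pick_modality <- function(x, modality = "Gene Expression") {
  if (is.list(x)) {
    if (!modality %in% names(x)) {
      stop("Modality not found: ", modality)
    }
    return(x[[modality]])
  }
  x
}

gex <- pick_modality(mats)
# With Seurat installed, the subset could then be passed on, for example:
# seurat_obj <- Seurat::CreateSeuratObject(counts = gex)
```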
