Failed reading 10x h5 file - CellBender v2.1

Last week I ran CellBender v1 over a CellRanger V3 library I got, and then successfully loaded the h5 output into a Seurat object.

CreateSeuratObject(Read10X_h5("the_cell_bender_outout_filtered.h5"))

Now, when I tried using CellBender V2.1 instead of V1 over the same CellRanger library, I got the following exception:

Error in `[[.H5File`(infile, paste0(genome, "/", feature_slot)) :
  An object with name matrix/gene_names does not exist in this group

Comparing the output h5 files of the two versions, they appear to have the same format, except that the V2.1 output carries a "PYTABLES_FORMAT_VERSION" attribute. Is it flagged like that on purpose? This attribute is what causes the Seurat Read10X_h5 function to fail (excerpt from the Read10X_h5 code):

  infile <- hdf5r::H5File$new(filename = filename, mode = "r")
  genomes <- names(x = infile)
  output <- list()
  if (!infile$attr_exists("PYTABLES_FORMAT_VERSION")) {
    if (use.names) {
      feature_slot <- "features/name"
    }
    else {
      feature_slot <- "features/id"
    }
  }
  else {
    if (use.names) {
      feature_slot <- "gene_names"
    }
    else {
      feature_slot <- "genes"
    }
  }
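
To see why this branch leads to the error above, here is a minimal sketch that reduces it to a standalone function. The helper name `feature_slot_by_attr` is made up for illustration and is not part of Seurat; it only mirrors the excerpt, and it assumes the top-level group of the file is `matrix`, as the error message implies.

```r
# Hypothetical stand-in for the branch in Read10X_h5 (illustration only).
# has_pytables_attr plays the role of infile$attr_exists("PYTABLES_FORMAT_VERSION").
feature_slot_by_attr <- function(has_pytables_attr, use.names = TRUE) {
  if (!has_pytables_attr) {
    # Treated as CellRanger v3 layout
    if (use.names) "features/name" else "features/id"
  } else {
    # Treated as CellRanger v2 layout
    if (use.names) "gene_names" else "genes"
  }
}

# CellBender v2.1 output: v3-style layout under "matrix", but the
# PyTables attribute is present, so the v2 branch is taken.
genome <- "matrix"
path <- paste0(genome, "/", feature_slot_by_attr(has_pytables_attr = TRUE))
print(path)  # the path named in the error message
```

The file has a `matrix/features/name` dataset but no `matrix/gene_names`, so the lookup fails even though the data itself is fine.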

Top GitHub Comments

sjfleming commented, Oct 27, 2020

https://github.com/satijalab/seurat/issues/3653

sjfleming commented, Oct 27, 2020

Okay, so the if statement that @GreenGilad mentioned above is indeed the problem. Since we create the output file using PyTables, it will always have a PYTABLES_FORMAT_VERSION attribute, regardless of which CellRanger layout the data follows.

My proposal is to substitute their current if statement https://github.com/satijalab/seurat/blob/fe93b05745e55ec2f66e3f0b4c4196aad9f4d5a7/R/preprocessing.R#L1155

with

if (hdf5r::existsGroup(infile, 'matrix'))

For now, I think the code below is a potential workaround. I may try to submit a pull request to Seurat to incorporate this, as I think it makes more sense than relying on a version attribute from PyTables.

library(Matrix)

ReadCB_h5 <- function(filename, use.names = TRUE, unique.features = TRUE) {
  if (!requireNamespace('hdf5r', quietly = TRUE)) {
    stop("Please install hdf5r to read HDF5 files")
  }
  if (!file.exists(filename)) {
    stop("File not found")
  }
  infile <- hdf5r::H5File$new(filename = filename, mode = 'r')
  genomes <- names(x = infile)
  output <- list()
  if (hdf5r::existsGroup(infile, 'matrix')) {
    # cellranger version 3
    message('CellRanger version 3+ format H5')
    if (use.names) {
      feature_slot <- 'features/name'
    } else {
      feature_slot <- 'features/id'
    }
  } else {
    message('CellRanger version 2 format H5')
    if (use.names) {
      feature_slot <- 'gene_names'
    } else {
      feature_slot <- 'genes'
    }
  }
  for (genome in genomes) {
    counts <- infile[[paste0(genome, '/data')]]
    indices <- infile[[paste0(genome, '/indices')]]
    indptr <- infile[[paste0(genome, '/indptr')]]
    shp <- infile[[paste0(genome, '/shape')]]
    features <- infile[[paste0(genome, '/', feature_slot)]][]
    barcodes <- infile[[paste0(genome, '/barcodes')]]
    sparse.mat <- sparseMatrix(
      i = indices[] + 1,
      p = indptr[],
      x = as.numeric(x = counts[]),
      dims = shp[],
      repr = 'T'  # triplet form; use giveCsparse = FALSE on older Matrix releases, where repr is not available
    )
    if (unique.features) {
      features <- make.unique(names = features)
    }
    rownames(x = sparse.mat) <- features
    colnames(x = sparse.mat) <- barcodes[]
    sparse.mat <- as(object = sparse.mat, Class = 'dgCMatrix')
    # Split v3 multimodal
    if (infile$exists(name = paste0(genome, '/features'))) {
      types <- infile[[paste0(genome, '/features/feature_type')]][]
      types.unique <- unique(x = types)
      if (length(x = types.unique) > 1) {
        message("Genome ", genome, " has multiple modalities, returning a list of matrices for this genome")
        sparse.mat <- sapply(
          X = types.unique,
          FUN = function(x) {
            return(sparse.mat[which(x = types == x), ])
          },
          simplify = FALSE,
          USE.NAMES = TRUE
        )
      }
    }
    output[[genome]] <- sparse.mat
  }
  infile$close_all()
  if (length(x = output) == 1) {
    # Single genome: return its matrix (or list of matrices) directly
    return(output[[1]])
  } else {
    return(output)
  }
}

Loading a CellBender remove-background output file in a legacy CellRanger v2 format:

Loading a CellBender remove-background output file in the newer CellRanger v3+ format: (I had to subset to “Gene Expression” to successfully use CreateSeuratObject)
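
The subsetting step mentioned above could look roughly like the sketch below. `select_modality` is a hypothetical helper (not part of Seurat or CellBender), the toy matrices stand in for real counts, and the file name in the commented usage is a placeholder. It relies on ReadCB_h5 returning a list named by feature type when a genome has more than one modality.

```r
# Hypothetical helper: pick one modality if the reader returned a list,
# otherwise pass the single matrix through unchanged.
select_modality <- function(x, modality = "Gene Expression") {
  if (is.list(x)) {
    if (!modality %in% names(x)) {
      stop("Modality not found: ", modality)
    }
    return(x[[modality]])
  }
  x
}

# Toy stand-ins for the matrices ReadCB_h5 would return
rna <- matrix(1:4, nrow = 2, dimnames = list(c("g1", "g2"), c("c1", "c2")))
adt <- matrix(5:6, nrow = 1, dimnames = list("a1", c("c1", "c2")))
multimodal <- list("Gene Expression" = rna, "Antibody Capture" = adt)

counts <- select_modality(multimodal)

# With real data (assumes ReadCB_h5 from above and Seurat are available;
# the file name is a placeholder):
# counts <- select_modality(ReadCB_h5("cellbender_output_filtered.h5"))
# obj <- Seurat::CreateSeuratObject(counts = counts)
```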
