
Human readable files/folders as Genus_species_strain


In my older, unreleased (and scruffier) version of this tool, I create a hierarchy like this:

Kingdom/
    Genus/
        species/
            strain/
                blah.gbk
                blah.fna
                blah.gff

I currently set blah to Genus_species_strain, but that loses the GCA_xxxx accession. I was thinking of having a ‘mirror’ folder of symlinks with human-readable names.
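Here is a minimal Python sketch of the mirror idea. It assumes the real files keep their accession-based names (e.g. GCA_xxxx.gbk) in one data directory, while the human-readable Kingdom/Genus/species/strain tree holds only symlinks. That keeps the accession recoverable from every link. The helper name `build_mirror` and its parameters are hypothetical and not part of ncbi-genome-download or any other existing tool.

```python
from pathlib import Path


def build_mirror(data_dir: Path, mirror_dir: Path, accession: str,
                 kingdom: str, genus: str, species: str, strain: str,
                 suffixes: tuple[str, ...] = (".gbk", ".fna", ".gff")) -> list[Path]:
    # Hypothetical helper: the real files keep their accession-based names in
    # data_dir; mirror_dir gets a Kingdom/Genus/species/strain/ tree of
    # Genus_species_strain.<ext> symlinks pointing back at them.
    target_dir = mirror_dir / kingdom / genus / species / strain
    target_dir.mkdir(parents=True, exist_ok=True)
    label = f"{genus}_{species}_{strain}"
    links = []
    for suffix in suffixes:
        real_file = (data_dir / f"{accession}{suffix}").resolve()
        link = target_dir / f"{label}{suffix}"
        # Replace a stale link (or file) left over from a previous run
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(real_file)
        links.append(link)
    return links
```

The strain string should already be sanitized before it is passed in. A "/" in it would otherwise create extra directory levels. Also note that creating symlinks on Windows may require extra privileges.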

Extracting the strain was tricky because it sometimes appears in up to three different columns, but it mostly works.

The goal is to make it easy to work with the sequences and get human-readable labels, etc.

Issue Analytics

Created 7 years ago
Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction

kblin commented, Sep 9, 2016

So, apart from deciding how best to do the cleanup, I’ve got a working implementation of this. Expect it in the next version of ncbi-genome-download.

1 reaction

tseemann commented, Aug 25, 2016

First I try to match the strain from column 8: $x[7] =~ m/^(\S+)\s+(\S+)(\s+.*)?$/

Then, if that fails, I try column 9: $x[8] =~ m/^strain=(.*)$/

If that fails, I use the assembly ID: $strain ||= $x[0];
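The three-step fallback above could be translated into Python roughly as follows. This is a sketch under stated assumptions. It assumes each row is a tab-split line of NCBI's assembly_summary.txt, where 0-based index 7 is the organism name, 8 is the infraspecific name ("strain=..."), and 0 is the assembly accession. It also assumes that, for the first regex, the strain is whatever follows "Genus species" (the third capture group), since the comment does not show which group is kept. The function and constant names are hypothetical.

```python
import re

# Assumed 0-based column indices, mirroring $x[7], $x[8] and $x[0] above
ORGANISM_NAME = 7
INFRASPECIFIC_NAME = 8
ASSEMBLY_ACCESSION = 0


def extract_strain(fields: list[str]) -> str:
    # 1. Organism name "Genus species <rest>": assume <rest> is the strain
    m = re.match(r"^(\S+)\s+(\S+)(\s+.*)?$", fields[ORGANISM_NAME])
    if m and m.group(3) and m.group(3).strip():
        return m.group(3).strip()
    # 2. Infraspecific name of the form "strain=..."
    m = re.match(r"^strain=(.*)$", fields[INFRASPECIFIC_NAME])
    if m and m.group(1):
        return m.group(1)
    # 3. Fall back to the assembly accession, like `$strain ||= $x[0]`
    return fields[ASSEMBLY_ACCESSION]
```

For a row whose organism name is just "Escherichia coli" and whose infraspecific name is "strain=K-12", the second step returns "K-12". If both are empty, the accession is used.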

And then I sanitize the string:

$s =~ s/[\[\]]/ /g;
$s =~ s/^\s+//g;
$s =~ s/\s+$//g;
$s =~ s/\s+/_/g;
$s =~ s/['"]/_/g;
$s =~ s{[/;:]}{-}g;
$s =~ s/_+/_/g;
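For comparison, the same substitution chain can be written as one Python function, applied in the same order. The function name `sanitize` is hypothetical. The two anchored trims are collapsed into `str.strip()`, which should behave the same for ordinary whitespace.

```python
import re


def sanitize(s: str) -> str:
    s = re.sub(r"[\[\]]", " ", s)   # square brackets -> space
    s = s.strip()                   # trim leading/trailing whitespace
    s = re.sub(r"\s+", "_", s)      # runs of whitespace -> "_"
    s = re.sub(r"['\"]", "_", s)    # single/double quotes -> "_"
    s = re.sub(r"[/;:]", "-", s)    # path- and shell-unfriendly chars -> "-"
    s = re.sub(r"_+", "_", s)       # collapse repeated underscores
    return s
```

For example, "[Clostridium] difficile 630" becomes "Clostridium_difficile_630". The brackets turn into spaces, which are then trimmed or folded into a single underscore.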
