Human readable files/folders as Genus_species_strain

In my older, unreleased, scruffier version of this tool, I create a hierarchy like this:

Kingdom/
   Genus/
       species/
            strain/
                    blah.gbk
                    blah.fna
                    blah.gff

I currently set blah to Genus_species_strain, but that loses the GCA_xxxx accession. I was thinking of having a ‘mirror’ folder of symlinks with human readable names.

It was tricky to extract the strain, as it sometimes appears in up to 3 different columns, but it mostly works.

The reason for this is to make it easy to work with sequences and get human readable labels etc.
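The ‘mirror’ folder idea could look roughly like the following Python sketch. It is not taken from the tool itself: the function name link_with_readable_name, its parameters, and the mirror layout are assumptions made for illustration. The downloaded file keeps its accession-based name, and a symlink carries the Genus_species_strain label.

```python
from pathlib import Path


def link_with_readable_name(source_file, mirror_root, kingdom, genus, species, strain):
    """Create a symlink named Genus_species_strain.<ext> that points at source_file.

    Hypothetical helper: the original (accession-named) file is left untouched.
    """
    source = Path(source_file).resolve()
    target_dir = Path(mirror_root) / kingdom / genus / species / strain
    target_dir.mkdir(parents=True, exist_ok=True)

    # Only the final extension (.gbk, .fna, .gff) is kept; compressed files
    # such as .gz would need extra handling.
    link = target_dir / f"{genus}_{species}_{strain}{source.suffix}"

    # Replace a stale symlink so the function can be re-run; a regular file
    # at the same path is not overwritten and raises FileExistsError instead.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(source)
    return link
```

The genus, species and strain values are assumed to be already sanitized so that they are safe to use as path components.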

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

kblin commented, Sep 9, 2016

So, apart from making up my mind how to best do the cleanup, I’ve got a working implementation of this. Expect this in the next version of ncbi-genome-download.

tseemann commented, Aug 25, 2016

First I try to match the strain from column 8: $x[7] =~ m/^(\S+)\s+(\S+)(\s+.*)?$/

Then, if that fails, I try column 9: $x[8] =~ m/^strain=(.*)$/

If that fails I use the assembly ID: $strain ||= $x[0];
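For readers working in Python, the three-step fallback above might be sketched as follows. This is a loose translation, not the author's code: the function name extract_names is hypothetical, row is assumed to be a list of tab-split fields in the same order as the Perl @x array, and treating the third capture group of the first regex as the strain is an assumption about how the original script uses it.

```python
import re

# Same patterns as the Perl snippets above.
ORGANISM_RE = re.compile(r"^(\S+)\s+(\S+)(\s+.*)?$")
STRAIN_RE = re.compile(r"^strain=(.*)$")


def extract_names(row):
    """Return (genus, species, strain) for one row of tab-split fields."""
    genus = species = strain = None

    # Step 1: column 8 (index 7) as "Genus species [rest]"; the optional
    # remainder is taken as the strain (assumption).
    match = ORGANISM_RE.match(row[7])
    if match:
        genus, species = match.group(1), match.group(2)
        strain = (match.group(3) or "").strip() or None

    # Step 2: fall back to column 9 (index 8), expected to look like "strain=...".
    if not strain:
        match = STRAIN_RE.match(row[8])
        if match:
            strain = match.group(1).strip() or None

    # Step 3: last resort, use the assembly ID in the first column.
    if not strain:
        strain = row[0]

    return genus, species, strain
```

For example, with a made-up row whose column 8 is "Escherichia coli" and whose column 9 is "strain=O157:H7", the sketch returns ("Escherichia", "coli", "O157:H7").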

And then I sanitize the string:

$s =~ s/[\[\]]/ /g;
$s =~ s/^\s+//g;
$s =~ s/\s+$//g;
$s =~ s/\s+/_/g;
$s =~ s/['"]/_/g;
$s =~ s{[/;:]}{-}g;
$s =~ s/_+/_/g;
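A Python rendering of the same substitutions, applied in the same order, is sketched below. The function name sanitize is hypothetical, and str.strip() stands in for the two Perl anchored substitutions; edge cases around unusual whitespace may differ slightly from the Perl behaviour.

```python
import re


def sanitize(name):
    """Turn a free-text organism or strain label into a filename-friendly string."""
    s = re.sub(r"[\[\]]", " ", name)   # square brackets become spaces
    s = s.strip()                      # drop leading and trailing whitespace
    s = re.sub(r"\s+", "_", s)         # whitespace runs become one underscore
    s = re.sub(r"['\"]", "_", s)       # quotes become underscores
    s = re.sub(r"[/;:]", "-", s)       # path and list separators become hyphens
    s = re.sub(r"_+", "_", s)          # collapse repeated underscores
    return s
```

Because quotes are replaced after the trimming step, a label that ends in a quote keeps a trailing underscore: sanitize("[Clostridium] difficile 'R20291'") gives "Clostridium_difficile_R20291_". This mirrors the order of the Perl statements rather than fixing it.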
