Human readable files/folders as Genus_species_strain
In my (older, unreleased, scruffier) version of this tool, I create a hierarchy like this:
Kingdom/
  Genus/
    species/
      strain/
        blah.gbk
        blah.fna
        blah.gff
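A minimal Python sketch of that layout. It assumes every name has already been sanitized. `strain_dir` and `strain_files` are hypothetical helpers, and `.gbk`, `.fna` and `.gff` are the GenBank, nucleotide FASTA and GFF files from the listing:

```python
from pathlib import Path


def strain_dir(root, kingdom, genus, species, strain):
    """Return root/Kingdom/Genus/species/strain/ for one assembly.

    Hypothetical helper: each name must already be sanitized, since a
    "/" inside a name would silently add an extra directory level.
    """
    return Path(root, kingdom, genus, species, strain)


def strain_files(directory, basename, suffixes=(".gbk", ".fna", ".gff")):
    """Paths of the GenBank, nucleotide FASTA and GFF files in that folder."""
    return [Path(directory) / f"{basename}{suffix}" for suffix in suffixes]
```

Calling `.mkdir(parents=True, exist_ok=True)` on the returned path creates the whole Kingdom/Genus/species/strain tree in one go.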
I currently set blah to Genus_species_strain, but that loses the GCA_xxxx accession. I was thinking of having a 'mirror' folder of symlinks with human-readable names.
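One way to sketch the symlink 'mirror' is to keep the real downloads in folders named by accession and add a second folder of symlinks named Genus_species_strain. `build_mirror` and the folder names are hypothetical. The clash rule (append the accession when two assemblies would get the same label) is an assumption, not something from this issue:

```python
import os
import tempfile
from pathlib import Path


def build_mirror(genomes_dir, mirror_dir, labels):
    """Create mirror_dir/<label> -> genomes_dir/<accession> symlinks.

    `labels` maps an assembly accession (the real folder name under
    genomes_dir) to a human-readable Genus_species_strain label.
    Returns a dict of label -> target folder.
    """
    genomes_dir = Path(genomes_dir).resolve()
    mirror_dir = Path(mirror_dir)
    mirror_dir.mkdir(parents=True, exist_ok=True)
    mirror_dir = mirror_dir.resolve()
    created = {}
    for accession, label in labels.items():
        if label in created:  # assumption: disambiguate clashes with the accession
            label = f"{label}_{accession}"
        link = mirror_dir / label
        target = genomes_dir / accession
        if link.is_symlink():  # replace links left over from a previous run
            link.unlink()
        # A relative link keeps working if the whole tree is moved.
        link.symlink_to(os.path.relpath(target, mirror_dir))
        created[label] = target
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        genomes = Path(tmp, "genomes")
        (genomes / "GCA_000000001.1").mkdir(parents=True)
        links = build_mirror(genomes, Path(tmp, "by_name"),
                             {"GCA_000000001.1": "Escherichia_coli_K-12"})
        for label, target in links.items():
            print(label, "->", target.name)
```

The accession stays the canonical folder name, so nothing is lost. The mirror can be deleted and rebuilt at any time.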
Extracting the strain was tricky, as it sometimes appears in up to 3 different columns, but it mostly works.
The reason for this is to make it easy to work with the sequences and to get human-readable labels, etc.
Issue Analytics
- Created 7 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
So, apart from making up my mind about how best to do the cleanup, I've got a working implementation of this. Expect it in the next version of ncbi-genome-download.
First, I try to match the strain from column 8:
$x[7] =~ m/^(\S+)\s+(\S+)(\s+.*)?$/
If that fails, I try column 9:
$x[8] =~ m/^strain=(.*)$/
If that also fails, I use the assembly ID:
$strain ||= $x[0];
Finally, I sanitize the string:
$s =~ s/[\[\]]/ /g; $s =~ s/^\s+//g; $s =~ s/\s+$//g; $s =~ s/\s+/_/g; $s =~ s/['"]/_/g; $s =~ s{[/;:]}{-}g; $s =~ s/_+/_/g;
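ncbi-genome-download itself is written in Python, so here is a rough Python port of the Perl steps above. It rests on a few assumptions. `row` is taken to be one tab-split line of NCBI's assembly_summary.txt, where (0-based) index 0 holds the assembly accession, 7 the organism name and 8 the infraspecific name (e.g. `strain=...`). The Perl snippets don't show what is done with the capture groups, so taking the text after "Genus species" as the strain is also an assumption. All helper names are hypothetical:

```python
import re

# Hypothetical port of the Perl snippets above. Assumes `row` is one
# tab-split line of NCBI's assembly_summary.txt (0-based indices):
#   row[0] = assembly accession, row[7] = organism name,
#   row[8] = infraspecific name such as "strain=K-12".
ORGANISM_RE = re.compile(r"^(\S+)\s+(\S+)(\s+.*)?$")  # genus, species, rest
STRAIN_RE = re.compile(r"^strain=(.*)$")


def extract_strain(row):
    """Strain from column 8, else column 9, else the assembly accession."""
    match = ORGANISM_RE.match(row[7])
    if match and match.group(3):  # assumption: text after "Genus species"
        return match.group(3)
    match = STRAIN_RE.match(row[8])
    if match and match.group(1):
        return match.group(1)
    return row[0]  # same role as Perl's `$strain ||= $x[0]`


def sanitize(s):
    """Make a label safe for file names, step by step like the Perl chain."""
    s = re.sub(r"[\[\]]", " ", s)  # square brackets -> spaces
    s = s.strip()                  # trim leading/trailing whitespace
    s = re.sub(r"\s+", "_", s)     # whitespace runs -> one underscore
    s = re.sub(r"['\"]", "_", s)   # quotes -> underscores
    s = re.sub(r"[/;:]", "-", s)   # "/", ";" and ":" -> hyphens
    s = re.sub(r"_+", "_", s)      # collapse repeated underscores
    return s


def genus_species_strain(row):
    """Build a Genus_species_strain label for one assembly."""
    match = ORGANISM_RE.match(row[7])
    prefix = f"{match.group(1)} {match.group(2)}" if match else row[7]
    return sanitize(f"{prefix} {extract_strain(row)}")
```

For an organism name such as `Escherichia coli str. K-12 substr. MG1655`, `genus_species_strain` returns `Escherichia_coli_str._K-12_substr._MG1655`. The order of the substitutions matters: whitespace is trimmed before it is turned into underscores, and the final pass collapses any runs of underscores created along the way.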