Script to BLAST spacers against host genome
Motivation
Research has suggested that there may be cases when CRISPR systems are used for something besides immunity to foreign DNA - perhaps they could be regulating the host genome, or they might simply be inactive. A clue that one of these things might be happening is if there are spacers that come from their own host genome. To this end, we need functions to (A) BLAST spacers against the host genome and (B) analyze the BLAST output. The first function is described in this issue. The second is described in issue #62.
The Function
Input:
- List of spacers in the genome. Example files in data/spacers contain a list of all the CRISPR spacers identified for that organism. The organism is identified by an NCBI accession number, which is the name of the file.
- Accession number of the organism (the name of the file containing the list of spacers), e.g. NC_000853.
- Optional time window t (in months): if the genome was downloaded more than t months ago, re-download it from NCBI. (Moved to Issue #74.)
- Optional BLAST parameters: could be left as default for the first iteration.
The function should do the following:
- Check if the genome of the organism has already been downloaded, i.e., is it in phageParser/data/prokaryote_genomes? (Moved to Issue #74.)
- If it hasn't been downloaded OR if it is more than t months out of date, fetch the genome from NCBI using acc2gb.py. (Moved to Issue #74.)
- BLAST the list of spacers for that organism against the genome using BioPython's standalone BLAST wrapper. Note: you will need to install BLAST+ locally. The script BLAST_loop.py could be used as a template for this process. The main difference between the two is that in this version the subject sequence is a single genome, not a database.
Output:
- XML file with BLAST output.
Note: The default parameters may need some fiddling - the defaults in BLAST_loop.py are a good start.
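The BLAST step described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it calls the BLAST+ blastn executable through subprocess instead of BioPython's wrapper (a swap made here to avoid depending on a particular BioPython version), and it assumes the spacers are available as a list of strings and the genome as a FASTA file. The function names, output file names, and the choice of the blastn-short task are all hypothetical.

```python
import subprocess
from pathlib import Path


def write_spacer_fasta(spacers, fasta_path):
    """Write spacer sequences to a FASTA file, one record per spacer."""
    with open(fasta_path, "w") as handle:
        for index, sequence in enumerate(spacers):
            handle.write(f">spacer_{index}\n{sequence}\n")


def build_blastn_command(query_fasta, genome_fasta, out_xml, evalue=10.0):
    """Build a blastn call that uses the genome as a subject, not a database."""
    return [
        "blastn",
        "-task", "blastn-short",        # assumption: spacers are short sequences
        "-query", str(query_fasta),
        "-subject", str(genome_fasta),  # single genome instead of -db
        "-evalue", str(evalue),
        "-outfmt", "5",                 # 5 selects XML output
        "-out", str(out_xml),
    ]


def blast_spacers_against_genome(spacers, genome_fasta, work_dir, accession):
    """BLAST one organism's spacers against its own genome; return the XML path."""
    work_dir = Path(work_dir)
    query_fasta = work_dir / f"{accession}_spacers.fasta"   # hypothetical file name
    out_xml = work_dir / f"{accession}_self_blast.xml"      # hypothetical file name
    write_spacer_fasta(spacers, query_fasta)
    command = build_blastn_command(query_fasta, genome_fasta, out_xml)
    subprocess.run(command, check=True)  # requires BLAST+ to be installed and on PATH
    return out_xml
```

Keeping the command construction in its own function makes it easy to adjust the BLAST parameters later without touching the file handling.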
Issue Analytics
- State:
- Created 8 years ago
- Comments:9 (9 by maintainers)
@lwgray will be taking a look at this issue!
blast.py (#201) is now a general-purpose BLAST script.