Script to BLAST spacers against host genome

Motivation

Research has suggested that CRISPR systems may sometimes be used for something other than immunity to foreign DNA: they could be regulating the host genome, or they might simply be inactive. One clue that either of these is happening is the presence of spacers that come from the organism's own genome. To investigate this, we need functions to (A) BLAST spacers against the host genome and (B) analyze the BLAST output. The first function is described in this issue; the second is described in issue #62.


The Function

Input:

  • List of spacers in the genome. Each example file in data/spacers contains all the CRISPR spacers identified for one organism. The file name is the organism's NCBI accession number.
  • Accession number of the organism (the name of the file containing the list of spacers), e.g. NC_000853.
  • Optional time window t (in months): if the genome was downloaded more than t months ago, re-download it from NCBI. (Moved to Issue #74.)
  • Optional BLAST parameters. These could be left at their defaults for the first iteration.

The function should do the following:

  • Check whether the organism's genome has already been downloaded, i.e., is it in phageParser/data/prokaryote_genomes? (Moved to Issue #74.)
  • If it hasn't been downloaded, or if it is more than t months out of date, fetch the genome from NCBI using acc2gb.py. (Moved to Issue #74.)
  • BLAST the list of spacers for that organism against the genome using BioPython's standalone BLAST wrapper. Note: you will need to install BLAST+ locally. The script BLAST_loop.py could be used as a template. The main difference is that here the subject sequence is a single genome, not a database.
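
The download check in the first two steps was moved to Issue #74, but for reference it could look roughly like the sketch below. Everything specific here is an assumption rather than the project's actual code: the function name `genome_needs_download`, the `<accession>.gb` file naming, the use of the file's modification time as a stand-in for the download date, and the approximation of a month as 30 days.

```python
import time
from pathlib import Path

# Assumption: a "month" is approximated as 30 days.
SECONDS_PER_MONTH = 30 * 24 * 3600


def genome_needs_download(accession, genome_dir="data/prokaryote_genomes",
                          max_age_months=None, suffix=".gb", now=None):
    """Return True if the genome file is missing or older than max_age_months.

    Hypothetical helper: the "<accession><suffix>" naming scheme and the use
    of the file's modification time as the download date are assumptions.
    """
    path = Path(genome_dir) / f"{accession}{suffix}"
    if not path.exists():
        return True  # never downloaded
    if max_age_months is None:
        return False  # no time window given, so any existing copy is accepted
    now = time.time() if now is None else now
    age_seconds = now - path.stat().st_mtime
    return age_seconds > max_age_months * SECONDS_PER_MONTH
```

A caller would fetch the genome (for example with acc2gb.py) only when this returns True, e.g. `genome_needs_download("NC_000853", max_age_months=6)`.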

Output:

  • XML file with BLAST output.

Note: The default parameters may need some tuning; the defaults in BLAST_loop.py are a good starting point.
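
As a rough illustration of the BLAST step, the sketch below calls the BLAST+ `blastn` executable directly through `subprocess` instead of going through BioPython's wrapper, so it is a swapped-in technique and not the approach used in BLAST_loop.py. It relies on `blastn`'s `-subject` option to compare against a single sequence file instead of a database, and on `-outfmt 5` for XML output. The function names, the assumption that both the spacers and the genome are available as FASTA files, and the `evalue` and `task` values are illustrative placeholders, not the project's defaults.

```python
import subprocess


def build_blastn_command(spacer_fasta, genome_fasta, out_xml,
                         evalue=10.0, task="blastn-short"):
    """Build a blastn command line that writes XML output.

    Hypothetical helper. The evalue and task values are placeholder
    assumptions; blastn-short is the BLAST+ task intended for short queries.
    """
    return [
        "blastn",
        "-query", str(spacer_fasta),     # FASTA file of spacers (assumed format)
        "-subject", str(genome_fasta),   # single genome, not a BLAST database
        "-outfmt", "5",                  # 5 = XML output
        "-out", str(out_xml),
        "-evalue", str(evalue),
        "-task", task,
    ]


def blast_spacers_against_genome(spacer_fasta, genome_fasta, out_xml, **params):
    """Run blastn (BLAST+ must be installed and on PATH); return the XML path."""
    cmd = build_blastn_command(spacer_fasta, genome_fasta, out_xml, **params)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return out_xml
```

Keeping the command construction separate from the call that runs it makes the parameters easy to inspect and adjust while tuning, without needing BLAST+ installed just to check what would be executed.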

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

mbonsma commented, Apr 27, 2016 (1 reaction)

@lwgray will be taking a look at this issue!

mbonsma commented, Jun 12, 2017 (0 reactions)

blast.py (#201) is now a general-purpose blast script.
