Rerunning analyses

I used pyani to compare 23 genomes, each around 5 Mb in size. Running on two of my laptop’s processors, pyani took ~6 minutes to run the tetra analysis, ~1 hour to run ANIm, and ~4 hours each to run the ANIb and ANIblastall analyses. Not long after that, another genome became available. I was hoping I could use pyani to compare the 24th genome to the other 23 without rerunning all the previous comparisons. It looks to me like that’s what the -f and --noclobber arguments are supposed to accomplish. But when I ran

./average_nucleotide_identity.py -i tests/my_genomes/ -o tests/my_genomes_ANIm_output -m ANIm -g -v -f --noclobber

pyani still re-calculated every possible pairwise ANI (which took a little over an hour).

It would be a real time-saver (especially for larger data sets) if pyani could check which pairwise comparisons have already been done and skip them.
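To make the request concrete, here is a minimal sketch of the kind of check being asked for. It is not pyani's code: the function name `pending_pairs`, the accepted FASTA suffixes, and the `<A>_vs_<B>.delta` file naming are all assumptions made for illustration, and would need adjusting to match whatever the real output files are called.

```python
from itertools import combinations
from pathlib import Path


def pending_pairs(genome_dir, delta_dir, suffixes=(".fasta", ".fna", ".fa")):
    """Return the genome-name pairs that have no .delta file yet.

    Hypothetical helper: assumes one alignment file per pair,
    named <A>_vs_<B>.delta (an assumed naming scheme).
    """
    # Genome names are taken from the input file stems.
    names = sorted(
        p.stem for p in Path(genome_dir).iterdir() if p.suffix in suffixes
    )
    # Stems of the alignment files that already exist.
    done = {p.stem for p in Path(delta_dir).glob("*.delta")}
    # A pair counts as done if a file exists for either ordering.
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if f"{a}_vs_{b}" not in done and f"{b}_vs_{a}" not in done
    ]
```

With 23 genomes already compared and a 24th added, a check along these lines would leave only the 23 pairs involving the new genome, rather than all 276.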

Issue Analytics

Created 8 years ago
Comments: 9 (9 by maintainers)

Top GitHub Comments

peterjc commented, Mar 30, 2016

Thanks for the tip about skipping the computations to generate the graphics. It wasn’t obvious exactly how to do it, so I’ve prepared an example here.

First run forgetting to request graphics:

$ ~/repositories/pyani/average_nucleotide_identity.py -i . -o demo
$ ls -1 demo/ANIm_*
demo/ANIm_alignment_coverage.tab
demo/ANIm_alignment_lengths.tab
demo/ANIm_percentage_identity.tab
demo/ANIm_similarity_errors.tab

Then re-run to get the graphics, adding --graphics --skip_nucmer --force --noclobber:

$ ~/repositories/pyani/average_nucleotide_identity.py -i . -o demo --graphics --skip_nucmer --force --noclobber
WARNING: NOCLOBBER: not actually deleting directory
WARNING: Skipping NUCmer run (as instructed)!
$ ls -1 demo/ANIm_*
demo/ANIm_alignment_coverage.eps
demo/ANIm_alignment_coverage.pdf
demo/ANIm_alignment_coverage.png
demo/ANIm_alignment_coverage.tab
demo/ANIm_alignment_lengths.eps
demo/ANIm_alignment_lengths.pdf
demo/ANIm_alignment_lengths.png
demo/ANIm_alignment_lengths.tab
demo/ANIm_percentage_identity.eps
demo/ANIm_percentage_identity.pdf
demo/ANIm_percentage_identity.png
demo/ANIm_percentage_identity.tab
demo/ANIm_similarity_errors.eps
demo/ANIm_similarity_errors.pdf
demo/ANIm_similarity_errors.png
demo/ANIm_similarity_errors.tab

Cheers!

peterjc commented, Mar 30, 2016

Putting the *.delta files under a sub-folder would also help when there are tens of thousands of them - it makes the handful of summary files easier to see 😉
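Until something like that exists in pyani itself, the files can be tidied after a run. The sketch below is a stand-alone helper, not part of pyani; the function name `tidy_delta_files` and the sub-folder name `delta_files` are hypothetical. Note that moving the files may stop a later re-run from finding them where it expects, so this is only meant for a finished analysis.

```python
import shutil
from pathlib import Path


def tidy_delta_files(output_dir, subfolder="delta_files"):
    """Move every *.delta file in output_dir into a sub-folder.

    Hypothetical helper; the sub-folder name is an assumption.
    Returns the number of files moved.
    """
    out = Path(output_dir)
    target = out / subfolder
    target.mkdir(exist_ok=True)
    # Collect the matches first so the directory is not modified
    # while it is still being scanned.
    deltas = list(out.glob("*.delta"))
    for delta in deltas:
        shutil.move(str(delta), str(target / delta.name))
    return len(deltas)
```

Calling `tidy_delta_files("demo")` would leave the summary `.tab` and graphics files at the top level of `demo/` and everything ending in `.delta` under `demo/delta_files/`.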