Rerunning analyses
I used pyani to compare 23 genomes that are all around 5 Mb in size. Running on two of my laptop's processors, pyani took ~6 minutes to run the tetra analysis, ~1 hour to run ANIm, and ~4 hours each to run the ANIb and ANIblastall analyses. Not long after that, another genome became available. I was hoping that I could use pyani to compare the 24th genome to the other 23 without having to rerun all the previous comparisons. It looks to me like that's what the -f and --noclobber arguments are supposed to accomplish. But when I ran
./average_nucleotide_identity.py -i tests/my_genomes/ -o tests/my_genomes_ANIm_output -m ANIm -g -v -f --noclobber
pyani still recalculated every pairwise ANI (which took a little over an hour).
It would be a real time-saver (especially for larger data sets) if pyani could check whether any pairwise comparisons had already been done and then skip them.
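A minimal sketch of the requested check, assuming (hypothetically) that each finished pairwise comparison leaves a non-empty `<genome1>_vs_<genome2>.delta` file in the output directory. The function name `pending_comparisons` and that file layout are illustrative assumptions, not pyani's actual API; the point is only that each unordered pair is compared once, so existing outputs can be tested before scheduling any work.

```python
import tempfile
from itertools import combinations
from pathlib import Path


def pending_comparisons(genome_stems, outdir):
    """Return the (genome1, genome2) pairs that still need comparing.

    Hypothetical layout: each finished comparison is assumed to leave a
    non-empty "<genome1>_vs_<genome2>.delta" file in outdir.
    """
    outdir = Path(outdir)
    pending = []
    # Each unordered pair of genomes is compared once, in sorted order.
    for g1, g2 in combinations(sorted(genome_stems), 2):
        delta = outdir / f"{g1}_vs_{g2}.delta"
        # Skip pairs whose output already exists and is non-empty.
        if delta.is_file() and delta.stat().st_size > 0:
            continue
        pending.append((g1, g2))
    return pending


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = [f"genome{i:02d}" for i in range(1, 24)]  # the original 23
        # Simulate the outputs left behind by the first run.
        for g1, g2 in combinations(sorted(old), 2):
            (Path(tmp) / f"{g1}_vs_{g2}.delta").write_text("done\n")
        todo = pending_comparisons(old + ["genome24"], tmp)
        print(len(todo))  # 23: only the pairs involving genome24
```

With 23 genomes there are 23 * 22 / 2 = 253 pairs; adding a 24th makes 276, so only the 23 new pairs (about 8% of the full job) would need to run.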
Thanks for the tip about skipping the computations when re-running only to generate the graphics. It wasn't obvious exactly how to do it, so I've prepared an example here.
First, the initial run, where I forgot to request graphics:
Then re-run with these flags added to get the graphics:
--graphics --skip_nucmer --force --noclobber
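If the re-run is scripted, the second command can be derived from the first by appending the four flags above. This is a minimal sketch; the helper name `graphics_rerun_args` is hypothetical, and treating -g and -f as the short forms of --graphics and --force is an assumption based on the two commands in this thread. The resulting list could be passed to `subprocess.run`.

```python
import shlex

RERUN_FLAGS = ("--graphics", "--skip_nucmer", "--force", "--noclobber")
# Assumed short forms, so a flag already given is not added twice.
ALIASES = {"-g": "--graphics", "-f": "--force"}


def graphics_rerun_args(first_run_args):
    """Return a new argument list for a graphics-only re-run.

    Appends each flag in RERUN_FLAGS that is not already present
    (in long or assumed short form); the input list is not modified.
    """
    present = {ALIASES.get(arg, arg) for arg in first_run_args}
    extra = [flag for flag in RERUN_FLAGS if flag not in present]
    return list(first_run_args) + extra


if __name__ == "__main__":
    # Hypothetical first run: the command from the issue, minus -g.
    first = ["./average_nucleotide_identity.py", "-i", "tests/my_genomes/",
             "-o", "tests/my_genomes_ANIm_output", "-m", "ANIm", "-v"]
    print(shlex.join(graphics_rerun_args(first)))
```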
Cheers!
Putting the *.delta files under a subfolder would also help when there are tens of thousands of them: it makes the handful of summary files easier to see 😉
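The suggested layout can be illustrated with a short sketch that moves every top-level *.delta file into a subfolder. The folder name delta_files is a hypothetical choice, and moving pyani's outputs by hand could stop a later --skip_nucmer re-run from finding them, so treat this as a picture of the proposed layout rather than a safe post-processing step.

```python
import shutil
import tempfile
from pathlib import Path


def tidy_delta_files(outdir, subfolder="delta_files"):
    """Move every *.delta file in outdir into outdir/subfolder.

    The subfolder name is a hypothetical default. Returns how many files moved.
    """
    outdir = Path(outdir)
    target = outdir / subfolder
    target.mkdir(exist_ok=True)
    moved = 0
    # Materialise the (non-recursive) glob before moving anything.
    for delta in list(outdir.glob("*.delta")):
        shutil.move(str(delta), str(target / delta.name))
        moved += 1
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a_vs_b.delta", "a_vs_c.delta", "summary.tab"):
            (Path(tmp) / name).write_text("x\n")
        print(tidy_delta_files(tmp))  # 2
```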