replica exchange performance
I’ve been working with the replica exchange module, trying to run standard temperature REMD. I set up an OpenMM script using the format proposed in this thread: openmmtools – Missing feature for Replica Exchange? I’m running on Titan, whose compute nodes each have a single K80 GPU and use aprun for job submission. I have openmm, openmmtools, and yank all installed via conda.
I’m finding that performance drops when switching from plain OpenMM to a “single replica” YANK setup, and drops further when adding replicas. I suspect some of this comes from the different output file formats/needs, and from communication between nodes/cards when adding replicas, but I’m wondering how much slowdown should be expected. The degree of slowdown makes me suspect something is misconfigured, e.g. that each replica isn’t being assigned to its own node/GPU. I’m submitting test jobs with, e.g., aprun -n 2 -N 1 python yank_test.py for two replicas.
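For reference, a minimal job-script sketch for this kind of one-replica-per-node layout might look like the following. This is an illustration, not a verified Titan configuration: the PBS directives, project name, and walltime are assumptions, and yank_test.py is the test script mentioned above.

```shell
#!/bin/bash
# Hypothetical PBS job script: one MPI rank per replica, one rank per node,
# so each replica gets its own node and its own K80 GPU.
#PBS -A PROJECT_ID        # assumed project allocation
#PBS -l nodes=2           # one node per replica for a 2-replica test
#PBS -l walltime=01:00:00

cd "$PBS_O_WORKDIR"

# -n: total MPI ranks (one per replica); -N: ranks per node.
# With -N 1, each rank lands on its own node, so each replica should see
# exactly one GPU without any CUDA_VISIBLE_DEVICES juggling.
aprun -n 2 -N 1 python yank_test.py
```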
Thanks!
Issue Analytics
- Created: 5 years ago
- Comments: 6 (5 by maintainers)
Top GitHub Comments
If you’re doing temperature REMD (and you’re not doing this already), you could also try the ParallelTempering class instead of ReplicaExchange. We haven’t used it much, but the computation of the MBAR energy matrix at each iteration should be faster.

Both the Gibbs sampling procedure and the MBAR energy matrix computation scale superlinearly with the number of states, and the I/O operations scale roughly linearly, so some loss of performance is to be expected, although I’m not sure about the actual numbers.
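For context, parallel tempering setups typically place replicas on a geometrically spaced temperature ladder, which gives roughly uniform neighbor-exchange acceptance for systems with temperature-independent heat capacity. A minimal sketch of that spacing (the function name is illustrative; this is not the openmmtools implementation):

```python
def temperature_ladder(t_min, t_max, n):
    """Return n geometrically spaced temperatures from t_min to t_max (kelvin).

    Geometric spacing means each neighboring pair differs by the same
    ratio, so exchange acceptance between neighbors stays roughly even.
    """
    if n < 1:
        raise ValueError("need at least one temperature")
    if n == 1:
        return [t_min]
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** i for i in range(n)]

# Four replicas between 300 K and 400 K:
ladder = temperature_ladder(300.0, 400.0, 4)
print([round(t, 2) for t in ladder])  # → [300.0, 330.19, 363.42, 400.0]
```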
Also, I’d make sure you are not using GHMCMove, which is presented as an example in the snippet on that thread, unless you require exact sampling of the distribution.

@jlincoff: Can you provide more information to help us debug this?