MPI bug when multiple GPUs are used per calculation
I wanted to create an issue about this in openmmtools as well, since we completed transferring the multistate code from YANK.

We're still experiencing the mixing problem with the latest version of MPI when multiple GPUs are available (see choderalab/yank#1130 and #407). I've added a test to test_sampling checking this in #407, but I still haven't figured out the reason for the bug.

I'm sure this was working correctly in YANK during the SAMPLing challenge (i.e., YANK 0.20.1, right before the multistate module was added), so the next step would probably be a binary search over the YANK versions after that to identify where the problem was introduced.
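For reference, here is a rough sketch (not the actual test in #407) of what a replica-mixing check can look like with the openmmtools multistate API; the test system, temperatures, and iteration counts are illustrative, and the real test would need to run under mpiexec with one GPU per rank to exercise the buggy code path. On older installations the unit import is `from simtk import unit`.

```python
# Hypothetical sketch of a replica-mixing check (illustrative, not the test from #407).
import numpy as np
from openmm import unit  # on older installations: from simtk import unit
from openmmtools import testsystems, states, mcmc
from openmmtools.multistate import ReplicaExchangeSampler, MultiStateReporter

# Small test system with a short ladder of temperatures.
testsystem = testsystems.AlanineDipeptideVacuum()
n_replicas = 4
temperatures = [(300.0 + 10.0 * i) * unit.kelvin for i in range(n_replicas)]
thermodynamic_states = [
    states.ThermodynamicState(system=testsystem.system, temperature=t)
    for t in temperatures
]

move = mcmc.LangevinDynamicsMove(timestep=2.0 * unit.femtoseconds, n_steps=50)
sampler = ReplicaExchangeSampler(mcmc_moves=move, number_of_iterations=20)
reporter = MultiStateReporter('mixing_test.nc', checkpoint_interval=20)
sampler.create(thermodynamic_states=thermodynamic_states,
               sampler_states=states.SamplerState(positions=testsystem.positions),
               storage=reporter)

# The replica-to-state permutation (the attribute discussed in the fix below).
initial_states = np.array(sampler._replica_thermodynamic_states)
sampler.run()
final_states = np.array(sampler._replica_thermodynamic_states)

# If replica exchange is mixing at all, at least one replica should end up
# in a different thermodynamic state than the one it started in.
assert not np.array_equal(initial_states, final_states), "replicas did not mix"
```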
Top GitHub Comments
Thanks a lot for all the work and effort from @zhang-ivy, especially for pointing me to the differences between YANK versions 0.20.1 and 0.21.0, which is when this issue first appeared. I think I managed to come up with a solution and made a PR with a probable fix. As far as I could tell, the `_replica_thermodynamic_states` attribute was not getting broadcast to the MPI context. More details in the PR.

@zhang-ivy, if you can confirm that this solves it with all your examples and systems, it would be a really nice validation. You just need to install the `fix-mpi-replica-mix` branch with something like `pip install "git+https://github.com/choderalab/openmmtools.git@fix-mpi-replica-mix"`.
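For context, here is a conceptual sketch in plain mpi4py (not the actual openmmtools/mpiplus code path from the PR) of the kind of broadcast that appears to have been missing; the helper name is hypothetical:

```python
# Conceptual illustration only; the real fix lives in the openmmtools MPI utilities.
from mpi4py import MPI

def broadcast_replica_states(sampler, comm=MPI.COMM_WORLD, root=0):
    """Hypothetical helper: ensure every MPI rank sees the same
    replica-to-state permutation after the root rank performs the swaps."""
    # Without this broadcast, non-root ranks keep propagating their replicas
    # at stale thermodynamic states, so the replicas appear never to mix.
    sampler._replica_thermodynamic_states = comm.bcast(
        sampler._replica_thermodynamic_states, root=root)
    return sampler._replica_thermodynamic_states
```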
That sounds like the best thing to try now that you have a working test! You could use `git bisect run` to automate this process. Since the samplers moved from YANK to `openmmtools`, it could be a simple matter of testing the last version of YANK that included the multistate samplers (which presumably still has the bug) and then bisecting between that and 0.20.1.
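To make the `git bisect run` suggestion concrete, here is a minimal driver script; the editable install, the mpiexec invocation, and the test path/selector are placeholders for illustration, not the actual commands:

```python
#!/usr/bin/env python
# Hypothetical bisect driver (e.g. saved as bisect_mixing.py), used as:
#   git bisect start <last-yank-with-multistate> 0.20.1
#   git bisect run python bisect_mixing.py
# Exit 0 marks the commit good, 1 marks it bad, 125 tells bisect to skip it.
import subprocess
import sys

# Reinstall the currently checked-out revision; skip commits that fail to build.
if subprocess.call([sys.executable, "-m", "pip", "install", "-e", "."]) != 0:
    sys.exit(125)

# Run the replica-mixing test under MPI with two ranks (one GPU per rank).
# The test path and -k selector stand in for the test added in #407.
result = subprocess.call([
    "mpiexec", "-n", "2",
    sys.executable, "-m", "pytest", "tests/test_sampling.py", "-k", "mixing",
])
sys.exit(0 if result == 0 else 1)
```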