Binding replicas to different platforms

Hello everyone, thanks for making this package! I am trying to run a simple Hamiltonian or replica exchange simulation in which the simulation objects use multiple available GPUs.

Naively, I would have thought that the Hamiltonian exchange object would take a series of simulation objects, each bound to a platform, and that the exchange would swap the states every n steps. However, the code seems to take only the system objects, which makes me think it creates the integrator and platform objects somewhere internally. Are those options exposed somewhere, or can I pass a list of GPU indices to them?

Here is what I have right now, based on the YANK website.

nvidia-smi 
Thu Dec  8 10:03:32 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:05:00.0     Off |                    0 |
| N/A   26C    P8    28W / 175W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:09:00.0     Off |                    0 |
| N/A   23C    P8    28W / 175W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |

Here is the Python script:

cat test.py 
#!/bin/env python
from simtk import openmm as mm
from yank.repex import *
from openmmtools import testsystems
from mpi4py import MPI
testsystem = testsystems.AlanineDipeptideImplicit()
[base_system, positions] = [testsystem.system, testsystem.positions]
# Copy baseline system.
systems = [base_system for index in range(2)]
# Create temporary file for storing output.
import tempfile
file = tempfile.NamedTemporaryFile() # temporary file for testing
store_filename = "./test.dat"#file.name
# Create baseline state.
base_state = ThermodynamicState(base_system, temperature=298.0*unit.kelvin)
# Create simulation.
simulation = HamiltonianExchange(store_filename,mpicomm=MPI.COMM_WORLD)
simulation.create(base_state, systems, positions)
simulation.number_of_iterations = 2 # set the simulation to only run 2 iterations
simulation.timestep = 2.0 * unit.femtoseconds # set the timestep for integration
simulation.nsteps_per_iteration = 50 # run 50 timesteps per iteration
simulation.minimize = False
# Run simulation.
simulation.run()

And here is the error I get:

mpirun -hosts gpu-27-21 -np 1 -env CUDA_VISIBLE_DEVICES 0 python test.py  : -np 1 -env CUDA_VISIBLE_DEVICES 1 python test.py
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    simulation.create(base_state, systems, positions)
  File "/home/msultan/software/anaconda/lib/python3.5/site-packages/yank/repex.py", line 2773, in create
    ReplicaExchange.create(self, states, positions, options=options, metadata=metadata)
  File "/home/msultan/software/anaconda/lib/python3.5/site-packages/yank/repex.py", line 692, in create
    self._initialize_create()
  File "/home/msultan/software/anaconda/lib/python3.5/site-packages/yank/repex.py", line 975, in _initialize_create
    self.ncfile.close()
AttributeError: 'HamiltonianExchange' object has no attribute 'ncfile'

I am certain I am doing something incredibly dumb here, but I can't figure out what. Any help would be greatly appreciated.

Cheers!

Top GitHub Comments

andrrizzi commented, Dec 10, 2016

Sorry! I forgot about this. This trick works for me with two GPUs. Let me know if it fixes it for you.

mpicomm = MPI.COMM_WORLD
if mpicomm.rank == 0:
    simulation = HamiltonianExchange(store_filename, mpicomm=mpicomm)
    simulation.number_of_iterations = 2 # set the simulation to only run 2 iterations
    simulation.timestep = 2.0 * unit.femtoseconds # set the timestep for integration
    simulation.nsteps_per_iteration = 50 # run 50 timesteps per iteration
    simulation.minimize = False
    simulation.create(base_state, systems, positions)
    del simulation
else:
    dummy_var = False
    mpicomm.bcast(dummy_var, root=0)
    mpicomm.bcast(dummy_var, root=0)
mpicomm.barrier()
simulation = HamiltonianExchange(store_filename, mpicomm=mpicomm)
simulation.resume()

simulation.run()

msultan commented, Dec 11, 2016

Yay! It's working now. Thanks so much for all your help, @andrrizzi!

nvidia-smi 
Sat Dec 10 22:38:28 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:0A:00.0     Off |                    0 |
| N/A   29C    P0    79W / 175W |     65MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:85:00.0     Off |                    0 |
| N/A   43C    P0    86W / 175W |     65MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     44907    C   python                                          63MiB |
|    1     44908    C   python                                          63MiB |
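
The mpirun command in the question pins each process to one GPU by giving it its own CUDA_VISIBLE_DEVICES value. Where per-process -env flags are inconvenient, roughly the same binding could be done from inside the script. What follows is only a minimal sketch under stated assumptions: the helper names local_rank and bind_gpu are hypothetical and are not part of YANK or OpenMM, and the environment variable names are the ones Open MPI, MVAPICH2, Hydra-based launchers, and Slurm are assumed to set for the node-local rank, so check your launcher's documentation before relying on them.

```python
import os

# Environment variables that common launchers are assumed to set to the
# rank of a process on its own node. Which one exists depends on the launcher.
LOCAL_RANK_VARS = (
    "OMPI_COMM_WORLD_LOCAL_RANK",  # Open MPI
    "MV2_COMM_WORLD_LOCAL_RANK",   # MVAPICH2
    "MPI_LOCALRANKID",             # Hydra-based launchers (MPICH, Intel MPI)
    "SLURM_LOCALID",               # Slurm srun
)


def local_rank(environ=os.environ):
    """Return the node-local rank, or 0 if no known launcher variable is set."""
    for name in LOCAL_RANK_VARS:
        value = environ.get(name)
        if value is not None:
            return int(value)
    return 0


def bind_gpu(n_gpus, environ=os.environ):
    """Hypothetical helper: expose exactly one GPU to this process.

    It has to run before anything initializes CUDA (for example, before the
    first OpenMM context is created), because the variable is read at that point.
    """
    index = local_rank(environ) % n_gpus
    environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return index
```

Calling bind_gpu(2) at the top of test.py, before any simulation object is built, should give each of the two ranks its own K80 in the same way the -env CUDA_VISIBLE_DEVICES flags do. The modulo wraps around when there are more ranks per node than GPUs, so several replicas would then share a device.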
