GPU test_alchemy.py failures.

Hello,

I am experiencing some test failures for OpenMMTools when using the GPU platform and was hoping I could get some help resolving them.

I set up a Conda test environment for OpenMMTools from the YAML here: https://github.com/choderalab/openmmtools/blob/master/devtools/conda-envs/test_env.yaml, which yields this environment for me: environment.txt

I git clone OpenMMTools, install with pip, then run nosetests test_alchemy.py in the test directory of my clone.

On CPU this yields the following output: cpu.txt

On GPU I get many more errors about large energy differences for the PME-based HostGuestExplicit tests: gpu.txt

Am I running these tests correctly, or am I missing something in the GPU case? Any help with this would be greatly appreciated.

For reference, I’m using a 64-bit Ubuntu Linux machine with an RTX 2070S, Driver Version: 470.63.01, CUDA Version: 11.4, cudatoolkit=11.2.2

Thanks, Alex

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

1 reaction
adw62 commented, Mar 17, 2022

Hello,

I ran these tests again on my machine this morning using the latest OpenMM and OpenMMTools installed by conda, which gives this environment: environment.txt

I no longer see any of these errors on GPU. For reference, I’m now using CUDA Driver Version: 510.47.03 and CUDA Version: 11.6 with an RTX 2070S. So I think this is no longer an issue with the latest code, and I will close it now. Thanks 😃

Alex

1 reaction
jchodera commented, Sep 30, 2021

For one of the failing tests (ideally the simplest/smallest one), can you serialize out the System and State objects to XML for us to investigate? Something like this:

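from simtk.openmm import XmlSerializer  # on OpenMM >= 7.6: from openmm import XmlSerializer

# `context` is the Context used by the failing test
with open('system.xml', 'wt') as outfile:
    outfile.write(XmlSerializer.serialize(context.getSystem()))
# Save positions, forces, energy, and parameters so the evaluation can be reproduced
state = context.getState(getPositions=True, getForces=True, getEnergy=True,
                         getParameters=True, getIntegratorParameters=True)
with open('state.xml', 'wt') as outfile:
    outfile.write(XmlSerializer.serialize(state))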

This will make it easier for us to look at in detail, especially if we have to bring in Peter Eastman.
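Once those two files exist, one way to check whether the discrepancy is platform-specific is to reload them and evaluate the potential energy on each available platform. The following is only a rough sketch, not part of the original thread: the function names `potential_energy_by_platform` and `max_pairwise_difference` are hypothetical, the file names are taken from the snippet above, and it assumes the State was saved with positions and parameters as shown there.

```python
def potential_energy_by_platform(system_xml="system.xml", state_xml="state.xml",
                                 platform_names=("Reference", "CPU", "CUDA")):
    """Return {platform name: potential energy in kJ/mol} for a saved System/State."""
    # Imported lazily so this module can be loaded without OpenMM installed.
    try:
        import openmm
        from openmm import unit
    except ImportError:  # older releases expose the package as simtk.openmm
        from simtk import openmm, unit

    with open(system_xml) as infile:
        system = openmm.XmlSerializer.deserialize(infile.read())
    with open(state_xml) as infile:
        state = openmm.XmlSerializer.deserialize(infile.read())

    energies = {}
    for name in platform_names:
        try:
            platform = openmm.Platform.getPlatformByName(name)
            # The integrator is a placeholder; no dynamics are run.
            integrator = openmm.VerletIntegrator(1.0 * unit.femtoseconds)
            context = openmm.Context(system, integrator, platform)
        except Exception:
            continue  # platform missing or unusable on this machine
        # Restore box vectors, positions, and global parameters (e.g. alchemical lambdas).
        context.setPeriodicBoxVectors(*state.getPeriodicBoxVectors())
        context.setPositions(state.getPositions())
        for param, value in state.getParameters().items():
            context.setParameter(param, value)
        energy = context.getState(getEnergy=True).getPotentialEnergy()
        energies[name] = energy.value_in_unit(unit.kilojoules_per_mole)
        del context, integrator
    return energies


def max_pairwise_difference(energies):
    """Largest energy gap (kJ/mol) between any two platforms in the dict."""
    values = list(energies.values())
    return max(values) - min(values) if values else 0.0
```

Calling `max_pairwise_difference(potential_energy_by_platform())` in the directory containing the two XML files gives a single number to report. What counts as an acceptable gap depends on the system and the platform precision settings, so the sketch deliberately does not pick a threshold.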


