MPI repex trailblazing dies at end of trailblazing
I just noticed that an MPI run using repex and trailblazing for the BRD4 test system using this YAML file died at the end of trailblazing:
2018-11-19 04:58:52,202 - DEBUG - yank.pipeline - trailblazing: state_parameter lambda_sterics, simulated_value 0.08887985583853127, current_parameter_value 0.04839281265346118, std_du 0.5134917528793879
2018-11-19 04:59:25,224 - DEBUG - yank.pipeline - trailblazing: state_parameter lambda_sterics, simulated_value 0.04839281265346118, current_parameter_value 0.0, std_du 0.4132120996374473
2018-11-19 04:59:25,225 - DEBUG - yank.pipeline - Alchemical path found: {'lambda_electrostatics': [1.0, 0.95, 0.8999999999999999, 0.8499999999999999, 0.7999999999999998, 0.7499999999999998, 0.6999999999999997, 0.6387209700669697, 0.5790253242758696, 0.5204060356681246, 0.4609442605375813, 0.40433981917479267, 0.33567070513317865, 0.27416285442810556, 0.21103242826939206, 0.14972564502170452, 0.07998556932087207, 0.0009736900429375486, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'lambda_sterics': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9575795865483813, 0.9121943363134406, 0.8686284868410266, 0.8281742054378803, 0.7835707001058548, 0.7395478276369869, 0.6977469237017255, 0.6567304355151427, 0.6159670050468764, 0.5850659760831278, 0.5438792978602357, 0.5114994213498266, 0.48223554046780187, 0.4522580699381295, 0.4260537632558338, 0.397933628369692, 0.3732724389208485, 0.35396829064262253, 0.3340455062541764, 0.31613594671256723, 0.2986968556315033, 0.28266408417845745, 0.26755113898802224, 0.2544825877884863, 0.24134717331761885, 0.22876015799568095, 0.21898392917678333, 0.2079355647081965, 0.19668728551527873, 0.18387991041683313, 0.16512157756140372, 0.14415015504378637, 0.11860319976227818, 0.08887985583853127, 0.04839281265346118, 0.0]}
2018-11-19 04:59:25,226 - DEBUG - yank.mpi - Node 1/1: executing <function ExperimentBuilder._generate_yaml at 0x2ab72708ed08>
2018-11-19 04:59:25,261 - DEBUG - yank.mpi - Node 1/1: waiting for barrier after <function ExperimentBuilder._generate_yaml at 0x2ab72708ed08>
2018-11-19 08:55:18,458 - CRITICAL - yank.mpi - MPI node 12/16 raised an exception and called Abort()! The exception traceback follows
Traceback (most recent call last):
File "/home/chodera/miniconda/bin/yank", line 11, in <module>
load_entry_point('yank==0.23.8', 'console_scripts', 'yank')()
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/cli.py", line 73, in main
dispatched = getattr(commands, command).dispatch(command_args)
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/commands/script.py", line 138, in dispatch
yaml_builder.run_experiments()
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/experiment.py", line 778, in run_experiments
self._generate_experiments_protocols()
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/experiment.py", line 2452, in _generate_experiments_protocols
send_results_to=None, group_size=1, sync_nodes=True)
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/mpi.py", line 531, in distribute
*other_args, **kwargs)
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/mpi.py", line 386, in exec_tasks
raise error
TypeError: Group 12/16 Node 1/1 received an exception from another MPI process. Original stack trace follow:
Traceback (most recent call last):
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/mpi.py", line 357, in exec_tasks
results.append(task(distributed_arg, *other_args, **kwargs))
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/experiment.py", line 2548, in _generate_experiment_protocol
alchemical_phase = phase.initialize_alchemical_phase()
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/experiment.py", line 359, in initialize_alchemical_phase
alchemical_phase.equilibrate(n_iterations, mcmc_moves=mcmc_move)
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/yank.py", line 1197, in equilibrate
self._sampler.equilibrate(n_iterations=n_iterations, mcmc_moves=mcmc_moves)
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/multistate/multistatesampler.py", line 616, in equilibrate
self._propagate_replicas()
File "/home/chodera/miniconda/lib/python3.6/site-packages/openmmtools-0.17.0-py3.6.egg/openmmtools/utils.py", line 87, in _wrapper
return func(*args, **kwargs)
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/multistate/multistatesampler.py", line 1195, in _propagate_replicas
send_results_to=0)
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/mpi.py", line 531, in distribute
*other_args, **kwargs)
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/mpi.py", line 376, in exec_tasks
raise error
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/mpi.py", line 357, in exec_tasks
results.append(task(distributed_arg, *other_args, **kwargs))
File "/home/chodera/miniconda/lib/python3.6/site-packages/yank-0.23.8-py3.6-linux-x86_64.egg/yank/multistate/multistatesampler.py", line 1226, in _propagate_replica
output_dir = os.path.join(os.path.dirname(self._reporter.filepath), 'nan-error-logs')
File "/home/chodera/miniconda/lib/python3.6/posixpath.py", line 156, in dirname
p = os.fspath(p)
TypeError: Node 1/1: expected str, bytes or os.PathLike object, not method
This simulation is on lilac in /data/chodera/chodera/gsk/yank-benchmark/BRD4/repex-rmsd-2
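Reading the innermost frame of the traceback, `os.path.dirname` appears to have received `self._reporter.filepath` as a bound method rather than as a path string, which is what `expected str, bytes or os.PathLike object, not method` usually means. The line that failed builds a `nan-error-logs` directory, so it may be that this code path only runs while handling an earlier error, which would explain why the original failure is not visible in the log. The sketch below reproduces the same `TypeError` in isolation. `FakeReporter` and the example path are hypothetical stand-ins, not YANK's actual reporter class, and whether `filepath` should be called or turned into a property in YANK itself is an assumption that the real code would need to confirm.

```python
import os


class FakeReporter:
    """Hypothetical stand-in for the sampler's reporter (not YANK's real class)."""

    def __init__(self, path):
        self._path = path

    def filepath(self):
        # Defined as a plain method, so `reporter.filepath` without
        # parentheses evaluates to a bound method, not a string.
        return self._path


# Hypothetical storage path, used only for illustration.
reporter = FakeReporter("/tmp/experiment/complex.nc")

# Mirrors the failing call: the bound method itself is passed to os.path.
try:
    os.path.dirname(reporter.filepath)
    failure_message = None
except TypeError as error:
    failure_message = str(error)

# Calling the method first hands os.path an actual string.
output_dir = os.path.join(os.path.dirname(reporter.filepath()), "nan-error-logs")

print(failure_message)
print(output_dir)
```

If this is indeed the cause, either calling the method at the failing line or exposing the path as a property on the reporter would avoid the `TypeError` and let the underlying error surface.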
Issue Analytics
- State:
- Created 5 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
I was just hoping to get more information about what the actual error was so I could debug it!
Of course! Opening the PR soon.