Possibility of including a feature of dealing with dhdl files from extended simulations

Dear alchemlyb developers,

First, I want to thank you all for your hard work in developing this very user-friendly package. Today I was using alchemlyb to analyze the dhdl files of a replica-exchange simulation. Since the simulations were long, I extended the simulation of each replica several times. However, I found that this can cause two problems when parsing the GROMACS dhdl files.

Specifically, when parsing one of the files, I got the error shown below. The error occurred because the last line of the file was incomplete: the simulation had been ended by a timeout. As a result, the last line ended with -1.5258789e- instead of -1.5258789e-5, which raised a ValueError when the last string of the line was converted to a float with dtype specified as np.float64. (See Line 265 in _extract_dataframe.)

TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wehs7661/anaconda3/lib/python3.7/site-packages/alchemlyb/parsing/gmx.py", line 133, in extract_dHdl
    df = _extract_dataframe(xvg, headers)
  File "/home/wehs7661/anaconda3/lib/python3.7/site-packages/alchemlyb/parsing/gmx.py", line 267, in _extract_dataframe
    float_precision='high')
  File "/home/wehs7661/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f     
    return _read(filepath_or_buffer, kwds)
  File "/home/wehs7661/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 435, in _read        
    data = parser.read(nrows)
  File "/home/wehs7661/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1139, in read
    ret = self._engine.read(nrows)
  File "/home/wehs7661/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1995, in read        
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 991, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1123, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1197, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: could not convert string to float: '-1.5258789e-'
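
One way to work around this outside alchemlyb is to drop the truncated last line before parsing. The following is only a minimal sketch, not part of alchemlyb: drop_truncated_tail and is_complete are hypothetical helper names, and it assumes that a data line counts as complete when it has as many fields as the first data line and every field parses as a float. Lines starting with # or @ are treated as XVG comments and directives.

```python
def is_complete(line, n_fields):
    """Return True if the line has n_fields fields that all parse as floats."""
    fields = line.split()
    if len(fields) != n_fields:
        return False
    try:
        for field in fields:
            float(field)  # '-1.5258789e-' raises ValueError here
    except ValueError:
        return False
    return True


def drop_truncated_tail(lines):
    """Return a copy of lines without the last data line if it is incomplete.

    Hypothetical helper: only the final data line is checked, since a
    timeout can only cut off the end of the file.
    """
    # Indices of data lines, i.e. non-empty lines that are not '#' or '@' lines.
    data_idx = [
        i for i, line in enumerate(lines)
        if line.strip() and not line.lstrip().startswith(("#", "@"))
    ]
    if len(data_idx) < 2:
        return list(lines)  # nothing to compare against
    n_fields = len(lines[data_idx[0]].split())
    last = data_idx[-1]
    if is_complete(lines[last], n_fields):
        return list(lines)
    return list(lines[:last]) + list(lines[last + 1:])
```

The cleaned lines could then be written to a temporary file and passed to extract_dHdl or extract_u_nk as usual.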

In addition, it seems that the GROMACS parser currently cannot deal with overlapping time frames when a simulation is extended. Say that the simulation of the first replica was ended by a timeout and the last time frame in system_dhdl.xvg was 1592 ps, but the last time frame in the corresponding .cpt file was only 1562 ps, since the .cpt file is updated only every 15 minutes. As a result, if we run gmx mdrun with the -cpi option to extend the simulation, the dhdl file of the extended simulation, system_dhdl.part0002.xvg, will start from 1562 ps rather than 1592 ps. In this situation, when we use dHdl_coul = pd.concat([extract_dHdl(xvg, T=300) for xvg in files['Coulomb']]) or u_nk_coul = pd.concat([extract_u_nk(xvg, T=300) for xvg in files['Coulomb']]), it seems that extract_dHdl and extract_u_nk are not able to discard the data for the overlapping time frames (from 1562 ps to 1592 ps) in system_dhdl.xvg and use the data for these time frames from system_dhdl.part0002.xvg instead.
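
For illustration, the overlap could be handled in pandas after parsing by trimming each part at the time where the next part restarts. This is only a sketch under assumptions: concat_extended and toy are hypothetical names, the toy frames stand in for the output of extract_dHdl, and each DataFrame is assumed to carry an index level named "time".

```python
import pandas as pd


def concat_extended(parts):
    """Concatenate DataFrames from consecutive run parts (in run order),
    keeping overlapping time frames only from the later part.

    Hypothetical helper; assumes each frame has an index level named 'time'.
    """
    trimmed = []
    for current, following in zip(parts, parts[1:]):
        # Time at which the next part restarts (from the checkpoint).
        restart = following.index.get_level_values("time").min()
        times = current.index.get_level_values("time")
        # Discard frames of the earlier part at or after the restart time.
        trimmed.append(current[times < restart])
    trimmed.append(parts[-1])
    return pd.concat(trimmed)


def toy(times, value):
    """Build a small stand-in frame indexed by (time, coul-lambda)."""
    index = pd.MultiIndex.from_product(
        [times, [0.0]], names=["time", "coul-lambda"]
    )
    return pd.DataFrame({"coul": [value] * len(times)}, index=index)


# Part 1 ran to 1592 ps; part 2 restarted from the 1562 ps checkpoint.
part1 = toy([1532.0, 1562.0, 1592.0], 1.0)
part2 = toy([1562.0, 1592.0, 1622.0], 2.0)
merged = concat_extended([part1, part2])
```

Trimming by restart time, rather than dropping duplicated index entries, also removes frames from the earlier part that have no exact counterpart in the later one.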

Both problems can apparently be solved externally with another Python script that modifies the dhdl files so that the incomplete lines and the duplicated time frames are discarded. Still, I'm wondering whether it would be worth addressing these issues internally in alchemlyb instead. After all, this situation comes up often when users extend their simulations.

Thanks a lot in advance!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

dotsdl commented, May 19, 2020

We can add this as a preprocessor, yes. I quite like this philosophy of making these things easy for our data structures, which double as reference implementations for some pandas-fu.

orbeckst commented, May 19, 2020

(I tagged it “invalid” because we don’t have “wont fix” as a tag – it does not mean that it wasn’t a valid question.)
