Amber parser should be improved
Hello everyone,

I think the AMBER parser needs some adjustments beyond issue #221, which was causing time information to be misread by both the u_nk and dHdl parsers. Below are the fixes I think should be made. In my opinion the best solution would be a complete rewrite of the parser, properly tested.

1. Temperature is a required input for both parsers (`extract_dHdl` and `extract_u_nk`), but the system temperature is already read automatically from the output file (from `temp0`). We could make the `T` argument optional in the parsers; I suggest not removing it, so as not to break users’ existing scripts. We should check that the requested temperature `T` matches the one reported in the AMBER output file and report any inconsistency to the user.
2. Related to point 1: currently, if `temp0` is not found, we log ‘Non-constant temperature MD not currently supported.’ I don’t think this is right, because `temp0` is set even in non-constant-temperature MD, so the check is meaningless. I agree that we could warn users if the simulation was not performed at constant `T`, but that is not easy to check, and I don’t think a similar check exists in the other parsers (the user should know that the simulation has to be run in the NPT or NVT ensemble).
3. AMBER writes a single output file that contains all the information for TI and MBAR, reported every `ntpr` timesteps. I currently see two problems here:
   - 3a. If the user wants to extract both dHdl and u_nk values, the output file is parsed completely twice, which can waste quite some time.
   - 3b. If the user only wants dHdl values and the file doesn’t contain proper MBAR information, the user is needlessly warned that MBAR values were not read.

   I think we should refactor the code to make a single pass through the AMBER output file, extracting just what the user needs in one go. I imagine a main function called `extract_dHdl_and_u_nk` that returns both dHdl and u_nk by default, while the current `extract_dHdl` and `extract_u_nk` are kept and simply call the main function with an optional switch to extract only what’s needed.
4. AMBER can optionally report, every `ntave` steps, the averages of dHdl and other quantities. Right now, when parsing dHdl, we collect these values (along with the different components reported by AMBER), but we do not use this information anywhere in the code. I think we could simply skip reading the averages, since we already collect all the individual values (and can then perform a better equilibration/subsampling analysis).
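To make points 1 and 2 concrete, here is a minimal sketch of how an optional `T` could be reconciled with the `temp0` value found in the output. It is not the existing alchemlyb code: the helper names `read_temp0` and `resolve_temperature`, the tolerance `tol`, and the simplified `temp0 = <value>` pattern are all assumptions made for illustration.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Assumed, simplified pattern for the thermostat target temperature.
_TEMP0_RE = re.compile(r"temp0\s*=\s*(\d+(?:\.\d*)?)")


def read_temp0(lines):
    """Return the first temp0 value found in `lines`, or None if absent."""
    for line in lines:
        match = _TEMP0_RE.search(line)
        if match:
            return float(match.group(1))
    return None


def resolve_temperature(lines, T=None, tol=0.01):
    """Pick the temperature to use, cross-checking T against temp0.

    Hypothetical helper: T is optional, and a mismatch larger than
    `tol` (an assumed tolerance, in kelvin) is reported as an error.
    """
    temp0 = read_temp0(lines)
    if T is None:
        if temp0 is None:
            raise ValueError("temp0 not found in the output and no T was given")
        return temp0
    if temp0 is None:
        # Nothing to cross-check against, so trust the user's value.
        logger.warning("temp0 not found in the output; using T=%s", T)
        return T
    if abs(temp0 - T) > tol:
        raise ValueError(
            f"Requested T={T} K differs from temp0={temp0} K in the output file"
        )
    return T
```

Raising on a mismatch is only one option; logging a warning and continuing with the user's `T` would be a softer way to "report inconsistencies" and may be preferable if existing scripts pass slightly rounded temperatures.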
I’m working on implementing these changes. Please let me know if you think anything else should be changed (or if I should just not touch anything 😃)!
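As a rough illustration of the single-pass idea in point 3, the sketch below scans the lines once and collects only what was requested, with the two existing function names kept as thin wrappers. It is a toy under stated assumptions: the line patterns are simplified stand-ins rather than the real AMBER layout, an MBAR block is assumed to end at a blank line, the `want_dHdl`/`want_u_nk` switches are hypothetical, and plain lists and dicts are returned instead of the DataFrames the real parsers produce.

```python
import re

# Simplified, assumed line formats; real AMBER output is more involved.
_DVDL_RE = re.compile(r"DV/DL\s*=\s*(-?\d+\.\d+)")
_MBAR_RE = re.compile(r"Energy at\s+(\d+\.\d+)\s*=\s*(-?\d+\.\d+)")


def extract_dHdl_and_u_nk(lines, want_dHdl=True, want_u_nk=True):
    """Scan `lines` once; return (dHdl_values, u_nk_rows).

    A quantity that was not requested is returned as None and its
    pattern is never tried, so no MBAR warning can be triggered when
    only dHdl is wanted.
    """
    dhdl_values = [] if want_dHdl else None
    u_nk_rows = [] if want_u_nk else None
    current = {}  # lambda -> energy for the MBAR block being read
    for line in lines:
        if want_dHdl:
            match = _DVDL_RE.search(line)
            if match:
                dhdl_values.append(float(match.group(1)))
                continue
        if want_u_nk:
            match = _MBAR_RE.search(line)
            if match:
                current[float(match.group(1))] = float(match.group(2))
            elif current and not line.strip():
                # Assumed convention: a blank line closes an MBAR block.
                u_nk_rows.append(current)
                current = {}
    if want_u_nk and current:
        u_nk_rows.append(current)
    return dhdl_values, u_nk_rows


def extract_dHdl(lines):
    """Wrapper that keeps the existing name: dHdl only."""
    return extract_dHdl_and_u_nk(lines, want_u_nk=False)[0]


def extract_u_nk(lines):
    """Wrapper that keeps the existing name: u_nk only."""
    return extract_dHdl_and_u_nk(lines, want_dHdl=False)[1]
```

With this shape, a user who needs both quantities calls the main function once, while existing scripts that call `extract_dHdl` or `extract_u_nk` would keep working.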
Issue Analytics
- Created a year ago
- Comments: 28 (28 by maintainers)
I’m concerned that the NAMD parser has no dHdl reader, so we cannot have `read_dHdl_and_u_nk` for NAMD. I think a better approach might be to have `extract(file, T)` return a dictionary, mimicking the pymbar4 interface, where things are stored as a dictionary. For non-NAMD parsers, `extract(file, T)` returns `{'dHdl': dHdl, 'u_nk': u_nk}`; for NAMD, it returns just `{'u_nk': u_nk}`. That way we would have a common interface that also copes with the case where no dHdl parser is present.

I understand that everything has to be done in very small steps. I’m sorry I didn’t understand this earlier, but I thought the AMBER file parsing had been a little bit “left behind” here, as there are many small, distinct issues to be addressed. That’s the reason why, a year or so ago, I gave up on using alchemlyb to analyze my AMBER simulations: I didn’t have the time to check what was wrong (I couldn’t concatenate different files properly, the `ntpr` problem, among other things). I thought I could just fix (or try to 😃) several things in one go; I’m sorry.
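Going back to the dictionary-returning `extract(file, T)` suggested above, a minimal sketch of that common interface could look like the following. The factory `make_extract` and the stub parser functions are hypothetical placeholders, not alchemlyb API; in practice each stub would be the package's real per-engine parser.

```python
def make_extract(extract_u_nk=None, extract_dHdl=None):
    """Build an extract(file, T) that returns only what a parser supports."""

    def extract(file, T):
        result = {}
        if extract_dHdl is not None:
            result["dHdl"] = extract_dHdl(file, T)
        if extract_u_nk is not None:
            result["u_nk"] = extract_u_nk(file, T)
        return result

    return extract


# Placeholder parsers, standing in for the real per-engine functions.
def _amber_dHdl(file, T):
    return f"dHdl from {file} at {T} K"


def _amber_u_nk(file, T):
    return f"u_nk from {file} at {T} K"


def _namd_u_nk(file, T):
    return f"u_nk from {file} at {T} K"


# AMBER offers both quantities; NAMD has no dHdl reader.
amber_extract = make_extract(extract_u_nk=_amber_u_nk, extract_dHdl=_amber_dHdl)
namd_extract = make_extract(extract_u_nk=_namd_u_nk)
```

Callers can then test for `"dHdl" in result` instead of special-casing NAMD. Note that this wrapper still calls each parser separately, so on its own it does not remove the double read of the AMBER file.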
I think it would be better to close this issue as “not planned”, and I can open separate issues for each simple thing that I can address in a small fix.
For now I’m leaving this open, but I have opened two issues addressing points 1/2 and 4, leaving this issue just for point 3. Let me know if it would be better to close it and open another dedicated issue!