Fix netcdf string compression
Just opening an issue to record my experiments for reference.
With the current way we save strings, zlib compression doesn't work. I was able to make it work by saving strings the old way, as a fixed-length character array:
import netCDF4 as netcdf
import numpy as np

# Encode the string as a fixed-length character array so zlib can compress it.
# The dtype must be wide enough for the whole string ('S4' would truncate it to 4 bytes).
str_out = netcdf.stringtochar(np.array(python_string, dtype='S%d' % len(python_string)))
dimension = len(str_out)
dimension_name = 'string' + str(dimension)
if dimension_name not in self._storage_dict[storage].dimensions:
    self._storage_dict[storage].createDimension(dimension_name, dimension)
nc_variable = self._storage_dict[storage].createVariable(name, 'S1', (dimension_name,), zlib=True)
nc_variable[:] = str_out
With this code, the resulting nc file that contained the serialization of an alchemically modified Src got reduced from 52M to 32K!!
I didn't test the restoring, so we may have to tweak it a little (a sketch of what the restore path could look like is below). Also, with this implementation it may be impossible to update strings, since I'm not sure you can change the dimension of a variable after its creation. The point is that maybe netcdf simply doesn't support compression of the new variable-length str type variables.
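For reference, here is a minimal, untested sketch of the restore path, assuming the layout written by the snippet above (the function name read_string and its arguments are placeholders); chartostring is the netCDF4 inverse of stringtochar:

import netCDF4 as netcdf

def read_string(nc_dataset, name):
    # Read the 'S1' character array back and join it into a single Python string.
    char_array = nc_dataset.variables[name][:]
    return str(netcdf.chartostring(char_array))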
Worst-case scenario, we could permanently save the expensive bits (thermodynamic states, topology, …) as compressed character arrays, and save as vlen strings the things that need to be updated (i.e., MCMCMove statistics).
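A rough sketch of that mixed strategy might look like the following (untested; the file and variable names are placeholders, and the vlen str variable is left uncompressed since that's exactly what zlib can't handle):

import netCDF4 as netcdf
import numpy as np

ds = netcdf.Dataset('storage.nc', 'w')
ds.createDimension('scalar', 1)

# Write-once payload (e.g. a serialized thermodynamic state): fixed char array + zlib.
payload = 'long serialized thermodynamic state ...'
dim_name = 'string%d' % len(payload)
ds.createDimension(dim_name, len(payload))
fixed = ds.createVariable('state', 'S1', (dim_name,), zlib=True)
fixed[:] = netcdf.stringtochar(np.array(payload, dtype='S%d' % len(payload)))

# Mutable field (e.g. MCMCMove statistics): vlen str, uncompressed but freely updatable.
mutable = ds.createVariable('move_statistics', str, ('scalar',))
mutable[0] = '{"n_accepted": 0}'
mutable[0] = '{"n_accepted": 10, "n_proposed": 20}'  # length can change on update

ds.close()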
Top GitHub Comments
After adding both the OpenMMTools and the YANK fixes, here are some numbers for an explicit solvent setup of FXR with anisotropic dispersion correction, after 0 iterations (so this is the initial file size):
Speed of serialization step:
- Pre-comp: 195 s
- Post-comp: 49 s
- Factor: ~4x
I think this makes strong progress towards fixing this. The file sizes are still a bit large, but if that's acceptable, we can probably close this issue.
My thoughts exactly.