
Fix netcdf string compression

See original GitHub issue

Just opening an issue to record my experiments for reference.

With the current way we save strings, zlib compression doesn’t work. I was able to make it work by saving strings the old way, as a character array.

import numpy as np
import netCDF4 as netcdf

# Convert the Python string to a fixed-length character array ('S4' = 4 bytes)
str_out = netcdf.stringtochar(np.array(python_string, 'S4'))

# One dimension per string length, created on first use
dimension = len(str_out)
dimension_name = 'string' + str(dimension)
if dimension_name not in self._storage_dict[storage].dimensions:
    self._storage_dict[storage].createDimension(dimension_name, dimension)

# Single-character ('S1') variables support zlib, unlike vlen str variables
nc_variable = self._storage_dict[storage].createVariable(
    name, 'S1', (dimension_name,), zlib=True)

nc_variable[:] = str_out

With this code, the resulting .nc file containing the serialization of alchemically modified Src was reduced from 52 MB to 32 KB!!

I didn’t test the restoring, so we may have to tweak it a little. Also, with this implementation it may be impossible to update strings, since I’m not sure you can change the dimension of a variable after its creation. The point is that netCDF may not support compression of the new variable-length str type variables.

Worst-case scenario, we could permanently save the expensive bits (thermodynamic states, topology, …) and save as vlen strings the things that need to be updated (e.g. mcmcmoves statistics).

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments:10 (10 by maintainers)

Top GitHub Comments

3 reactions
Lnaden commented, Jul 5, 2017

After adding both the OpenMMTools and the YANK fixes, here are some numbers for an explicit-solvent setup of FXR with anisotropic dispersion correction, at 0 iterations taken (so this is the initial file size):

| File       | Pre-comp. Size | Post-comp. Size | Factor Reduction |
| ---------- | -------------- | --------------- | ---------------- |
| Solvent.nc | 16 MB          | 1.1 MB          | 14.5x            |
| Complex.nc | 184 MB         | 16 MB           | 11x              |

Speed of serialization step:

  • Pre-comp: 195 s
  • Post-comp: 49 s
  • Factor: ~4x

I think this makes strong progress towards fixing this. The file sizes are still a bit large, but if acceptable, we can probably close this issue.

1 reaction
Lnaden commented, Jul 7, 2017

My thoughts exactly.

Read more comments on GitHub >

Top Results From Across the Web

NetCDF: Variables - Unidata Software Documentation
Set the zlib compression and shuffle settings for a variable in a netCDF/HDF5 file ... Use this function to free resources associated with...

Adding compression to a NetCDF file using xarray
1 Answer · I don’t completely understand the variable part. It has a single band and looks like this: ` <xarray....

Can I remap and compress a NetCDF at the same time?
The following modification to your code should work: cdo -z zip -remapnn,r7432x13317 petcomp.nc FINAL.nc. Read the CDO user guide to see ...

netCDF4 API documentation
Data stored in netCDF Variable objects can be compressed and decompressed ... Since there is no native fixed-length string netCDF datatype, ...

nccopy - Copy a netCDF file, optionally changing format ...
If this option is not specified and the input file has compressed variables, the compression will still be preserved in the output, using...
