Incremental writes with scipy.io.netcdf_file

scipy.io.netcdf_file is great, but although it supports memory mapping for incremental reads of data from disk, it doesn’t support any form of incremental writes. The flush() method writes an entire file from scratch.
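For illustration, here is a minimal sketch of that asymmetry. The file path, dimension name, and variable name are made up for the example; only `netcdf_file`, `createDimension`, `createVariable`, and the `mmap` and `version` arguments come from SciPy.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.nc")

# Writing: values are staged in an in-memory array and written out
# when the file is flushed or closed.
# version=2 selects the 64-bit offset variant of the format.
with netcdf_file(path, "w", version=2) as f:
    f.createDimension("x", 10)
    var = f.createVariable("values", "d", ("x",))
    var[:] = np.arange(10.0)

# Reading: with mmap=True only the requested slice has to be paged in.
# Copy the slice so nothing refers to the mapped buffer after the file closes.
with netcdf_file(path, "r", mmap=True) as f:
    first_three = f.variables["values"][:3].copy()

print(first_three)
```

The write side has no equivalent of the memory-mapped slice on the read side, which is the gap this issue is about.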

Incremental writes would be useful for writing datasets that are too large to fit into memory (e.g., with the 64-bit offset variant).

Possible levels of support, ordered by roughly ascending difficulty / decreasing importance:

1. Modifying data in existing variables without new memory allocation (perhaps setting mmap.ACCESS_WRITE would suffice?)
2. Appending new entries for record variables (only adds data to the end of the file)
3. Modifying metadata of existing variables. In general this requires rewriting the file to make room for new and/or enlarged metadata entries, but we could do this copy in a streaming fashion.
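As a point of comparison for the record-variable case, the sketch below appends one record using the existing append mode (`mode="a"`). This is not incremental in the sense requested here: as far as I understand the current implementation, append mode reads the existing data into memory and writes the file out again on close. The file path and the `time` variable are hypothetical names chosen for the example.

```python
import os
import tempfile

from scipy.io import netcdf_file

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "records.nc")

# Create a file with an unlimited (record) dimension and two records.
with netcdf_file(path, "w") as f:
    f.createDimension("time", None)  # None marks the record dimension
    t = f.createVariable("time", "d", ("time",))
    t[0] = 0.0
    t[1] = 1.0

# Append one more record. The existing records are loaded into memory
# and the file is written out again when it is closed.
with netcdf_file(path, "a") as f:
    t = f.variables["time"]
    n = t.shape[0]  # current number of records
    t[n] = 2.0

# Read everything back without memory mapping to check the result.
with netcdf_file(path, "r", mmap=False) as f:
    values = f.variables["time"][:].copy()

print(values)
```

A true incremental append would instead write only the new record at the end of the file and update the record count in the header.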

Issue Analytics

Created 5 years ago
Comments: 8 (7 by maintainers)

Top GitHub Comments

2 reactions

shoyer commented, Jun 2, 2022

To be clear, even though I think most users would be better suited by using netcdf4-python, scipy’s pure Python netcdf3 reader is quite valuable. Xarray users (including myself) use it all the time because it often has better performance and is easier to install.

There are a few longstanding issues, but this does not detract from its usefulness. The module is basically feature complete, even if it could be improved.

I think it would be fine to mark netcdf_file as “legacy” code (per Ralf’s recent suggestion for categorization on the developer list), but I would strongly object to removing it from SciPy.

1 reaction

orbeckst commented, Jun 2, 2022

I think https://github.com/scipy/scipy/issues/9157#issuecomment-1144305199 summarized my views.

If getting good write performance (maybe for a well-defined subset of ncdf3 files) is relatively easy, then that would be convenient to have in a light-weight implementation. However, given scarce resources, I very much understand that netcdf I/O is not considered a core competency of scipy, and that’s ok, given the existence of other libraries.