Incremental writes with scipy.io.netcdf_file
scipy.io.netcdf_file is great, but although it supports memory mapping for incremental reads of data from disk, it doesn’t support any form of incremental writes. The flush() method writes the entire file from scratch.
Incremental writes would be nice to have for writing datasets that are too large to fit into memory (e.g., with the 64-bit offset variant).
Possible levels of support, ordered by roughly ascending difficulty / decreasing importance:
- Modifying data in existing variables without new memory allocation (perhaps setting mmap.ACCESS_WRITE would suffice?)
- Appending new entries for record variables (only adds data to the end of the file)
- Modifying metadata of existing variables. In general this requires rewriting the file to make room for new and/or enlarged metadata entries, but we could do this copy in a streaming fashion.
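To make the first two levels concrete, here is a minimal sketch using only the standard library. It does not use scipy.io.netcdf_file and does not write a valid netCDF file: the layout (a 4-byte magic string, a big-endian record count, then one float32 per record) and the names create, modify_in_place, append_records and read_all are assumptions made up for illustration. The idea it borrows from the classic netCDF format, as far as I understand it, is that the record count sits in the header, so appending means writing data at the end of the file and then patching that count.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical toy layout, NOT the real netCDF header.
HEADER = struct.Struct(">4sI")  # magic bytes + record count, big-endian
RECORD = struct.Struct(">f")    # one float32 value per record


def create(path, values):
    """Write the whole file from scratch (the only mode flush() offers today)."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(b"TOY1", len(values)))
        for v in values:
            f.write(RECORD.pack(v))


def modify_in_place(path, index, value):
    """Overwrite one existing record through a writable memory map."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
            _, count = HEADER.unpack_from(mm, 0)
            if not 0 <= index < count:
                raise IndexError(index)
            RECORD.pack_into(mm, HEADER.size + index * RECORD.size, value)
            mm.flush()  # push the modified page back to disk


def append_records(path, values):
    """Append records at the end of the file, then patch the count in the header."""
    with open(path, "r+b") as f:
        _, count = HEADER.unpack(f.read(HEADER.size))
        f.seek(0, os.SEEK_END)
        for v in values:
            f.write(RECORD.pack(v))
        f.seek(4)  # the count follows the 4-byte magic string
        f.write(struct.pack(">I", count + len(values)))


def read_all(path):
    """Read every record back, trusting the count stored in the header."""
    with open(path, "rb") as f:
        data = f.read()
    _, count = HEADER.unpack_from(data, 0)
    return [
        RECORD.unpack_from(data, HEADER.size + i * RECORD.size)[0]
        for i in range(count)
    ]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.bin")
    create(path, [1.0, 2.0, 3.0])
    modify_in_place(path, 1, 20.0)    # touches 4 bytes of data
    append_records(path, [4.0, 5.0])  # touches the tail and 4 header bytes
    print(read_all(path))
```

Neither modify_in_place nor append_records rewrites the existing data, which is the property the first two bullets ask for. The third bullet is different because enlarging the header shifts every byte after it.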
Issue Analytics
- Created 5 years ago
- Comments: 8 (7 by maintainers)

To be clear, even though I think most users would be better served by netcdf4-python, scipy’s pure Python netcdf3 reader is quite valuable. Xarray users (including myself) use it all the time because it often has better performance and is easier to install.
There are a few longstanding issues, but these do not detract from its usefulness. The module is basically feature complete, even if it could be improved.
I think it would be fine to mark netcdf_file as “legacy” code (per Ralf’s recent suggestion for categorization on the developer list), but I would strongly object to removing it from SciPy.
I think https://github.com/scipy/scipy/issues/9157#issuecomment-1144305199 summarized my views.
If getting good write performance (maybe for a well-defined subset of netcdf3 files) is relatively easy, then that would be convenient to have in a lightweight implementation. However, given scarce resources, I very much understand that netcdf I/O is not considered a core competency of scipy, and that’s ok given the existence of other libraries.