Incremental writes with scipy.io.netcdf_file

scipy.io.netcdf_file is great, but although it supports memory mapping for incremental reads of data from disk, it doesn’t support any form of incremental writes. The flush() method writes an entire file from scratch.

Incremental writes would be useful for writing datasets that are too large to fit into memory (e.g., with the 64-bit offset variant).

Possible levels of support, ordered by roughly ascending difficulty / decreasing importance:

  1. Modifying data in existing variables without new memory allocation (perhaps setting mmap.ACCESS_WRITE would suffice?)
  2. Appending new entries for record variables (only adds data to the end of the file)
  3. Modifying metadata of existing variables. In general this requires rewriting the file to make room for new and/or enlarged metadata entries, but we could do this copy in a streaming fashion.
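
For reference, appending to a record variable (item 2) already works through mode "a", but only by reading the file into memory and rewriting it in full on close, which is exactly the cost this issue asks to avoid. The following is a minimal sketch of that existing behaviour, not a proposed implementation; the helper name `append_records`, the variable name `x`, and the temporary file path are illustrative assumptions.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file


def append_records(path, name, values):
    """Append values to record variable `name` (hypothetical helper)."""
    # Mode "a" loads the existing file into memory; the entire file
    # is written back out when the context manager closes it.
    with netcdf_file(path, "a") as f:
        var = f.variables[name]
        start = var.shape[0]  # current number of records
        for i, value in enumerate(values):
            # Assigning past the end grows the record dimension.
            var[start + i] = value


# Create a small file with one unlimited (record) dimension.
path = os.path.join(tempfile.mkdtemp(), "demo.nc")
with netcdf_file(path, "w", version=2) as f:  # version=2: 64-bit offset
    f.createDimension("time", None)
    x = f.createVariable("x", "f8", ("time",))
    x[0:3] = np.arange(3.0)

append_records(path, "x", [3.0, 4.0])

# Read back without memory mapping and copy before the file closes.
with netcdf_file(path, "r", mmap=False) as f:
    result = f.variables["x"][:].copy()
```

Because every call to `append_records` rewrites the whole file, the cost of each append grows with the file size, so this pattern is only practical for datasets that fit comfortably in memory.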

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

shoyer commented, Jun 2, 2022 (2 reactions)

To be clear, even though I think most users would be better suited by using netcdf4-python, scipy’s pure Python netcdf3 reader is quite valuable. Xarray users (including myself) use it all the time because it often has better performance and is easier to install.

There are a few longstanding issues, but this does not detract from its usefulness. The module is basically feature complete, even if it could be improved.

I think it would be fine to mark netcdf_file as “legacy” code (per Ralf’s recent suggestion for categorization on the developer list), but I would strongly object to removing it from SciPy.

orbeckst commented, Jun 2, 2022 (1 reaction)

I think https://github.com/scipy/scipy/issues/9157#issuecomment-1144305199 summarized my views.

If getting good write performance (maybe for a well-defined subset of ncdf3 files) is relatively easy, then that would be convenient to have in a lightweight implementation. However, given scarce resources, I very much understand that netcdf I/O is not considered a core competency of scipy, and that’s ok, given the existence of other libraries.


