InferenceData.to_netcdf and arviz.from_netcdf should be able to take file objects or buffers

Most libraries that read or write files also accept a Python file object or a file-like IO object (e.g. BytesIO). Currently ArviZ requires writing these files to disk, which creates an awkward dance if one simply wants to upload to S3, for instance. This may not seem like a big deal, but the files can be large, and you have to remember to delete them from disk once you're done with them.

Currently, to upload to S3, I have to do the following:

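import tempfile
import boto3

s3 = boto3.resource("s3")  # assumed: the original does not show how s3 is created

# inference_data is an existing arviz.InferenceData
with tempfile.NamedTemporaryFile() as fp:
    inference_data.to_netcdf(fp.name)
    fp.seek(0)
    s3.Bucket("bucket-name").upload_fileobj(fp, "my-s3-file-key")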

It would be much easier to do this (without writing to disk):

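from io import BytesIO

with BytesIO() as buffer:
    inference_data.to_netcdf(buffer)  # proposed behavior, not supported today
    buffer.seek(0)  # rewind so the upload starts at the first byte
    s3.Bucket("bucket-name").upload_fileobj(buffer, "my-s3-file-key")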

Even better, I’d suggest using s3fs, which is what pandas does to support the s3:// “protocol”, so that this works:

inference_data.to_netcdf("s3://bucket-name/my-s3-file-key")

However, this last suggestion is separate from the more important ability to write to and read from a buffer or file object.
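
Until buffers are supported, the temporary-file dance described above can be hidden behind a pair of small helpers. The following is only a sketch: `write_to_buffer` and `read_from_buffer` are hypothetical names, not part of ArviZ or xarray, and they work with any function that takes a file path. A temporary directory is used instead of `NamedTemporaryFile` because, on Windows, a named temporary file that is still open generally cannot be reopened by name.

```python
import io
import os
import tempfile
from typing import Any, BinaryIO, Callable


def write_to_buffer(write_to_path: Callable[[str], Any], suffix: str = ".nc") -> io.BytesIO:
    """Run a path-based writer against a temporary file and return its bytes in memory.

    Hypothetical helper: `write_to_path` is any callable that writes a file
    at the given path, for example `inference_data.to_netcdf`.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "data" + suffix)
        write_to_path(path)
        with open(path, "rb") as fh:
            buffer = io.BytesIO(fh.read())
    # The temporary directory and the file inside it are deleted at this point.
    return buffer


def read_from_buffer(
    buffer: BinaryIO, read_from_path: Callable[[str], Any], suffix: str = ".nc"
) -> Any:
    """Spill a buffer to a temporary file and hand its path to a path-based reader.

    Hypothetical helper. Assumption: the reader loads everything it needs
    before returning, because the file is deleted right afterwards.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "data" + suffix)
        with open(path, "wb") as fh:
            fh.write(buffer.read())
        return read_from_path(path)
```

With these in place, the upload becomes `buffer = write_to_buffer(inference_data.to_netcdf)` followed by `s3.Bucket("bucket-name").upload_fileobj(buffer, "my-s3-file-key")`. The data still touches disk briefly, so this only tidies up the workaround and is not a substitute for real buffer support. For reading, a reader that opens the file lazily would lose access to the data once the temporary file is removed, so check how the reader loads data before relying on `read_from_buffer`.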

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 4
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
OriolAbril commented, Jun 13, 2020

I think this could be added upstream to xarray (I'm not sure to what extent it is already possible but undocumented, and to what extent it is not possible). The issue https://github.com/pydata/xarray/issues/4122 seems related. After all, we are basically calling xarray.Dataset.to_netcdf to do all the magic.

This other issue https://github.com/pydata/xarray/issues/2995 also looks related and it could make sense to look into zarr too.

0 reactions
OriolAbril commented, Aug 7, 2020

https://twitter.com/dopplershift/status/1286415993347047425 this could be helpful, in addition to the issues linked above
