Representing schema in xarray

To reach feature parity with pymc3 plotting, we need a way to access sampler statistics (specifically divergences) and posterior predictive (ppc) samples. @aseyboldt outlined a good way to think about the rest of the schema here.

At the same time, xarray can open a single netCDF group at a time, but it doesn’t look like it supports groups natively (yet? see the discussion at https://github.com/pydata/xarray/issues/1092).

I am proposing something like

import xarray as xr
import netCDF4 as nc

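class Trace:
    """Light wrapper exposing top-level netCDF groups as attributes."""

    def __init__(self, filename):
        self.filename = filename
        self.data = nc.Dataset(filename)
        self.groups = self.data.groups

    def __getattr__(self, name):
        # Only called when normal lookup fails. Read groups through __dict__
        # so a missing `groups` attribute cannot trigger infinite recursion.
        if name in self.__dict__.get("groups", {}):
            return xr.open_dataset(self.filename, group=name)
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute or netCDF group {name!r}"
        )

    def __dir__(self):
        """Allows for tab completion on netCDF group names"""
        return list(super().__dir__()) + list(self.groups.keys())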

This is a pretty light wrapper around netCDF and xarray. Usage is something like

t = Trace('mytrace.nc')
t.posterior  # this is an xarray.Dataset
t.posterior.mu.mean()  # calculate the mean of a variable

I think this will have to change a little bit so that nested groups work fine. In particular, something like

t = Trace('mytrace.nc')
t.sampler_stats.divergences  # should return an xarray.Dataset
t.sampler_stats  #  I think this would tend to return an empty xarray.Dataset
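
One way the nested case could work is sketched below. This is only an illustration, not what the project ended up doing: `group_tree`, `GroupNode`, and the pluggable `opener` argument are hypothetical names. It assumes that `netCDF4.Dataset` and `netCDF4.Group` objects expose a `.groups` mapping and that `xr.open_dataset` accepts a slash-separated path in `group`. Attribute access first looks for a child group, and only falls back to the dataset stored at the current path if there is none.

```python
def _open_with_xarray(filename, group):
    # Default opener (assumption): xarray is imported lazily so the
    # sketch itself can be exercised without xarray installed.
    import xarray as xr
    return xr.open_dataset(filename, group=group or None)


def group_tree(node):
    """Nested dict of group names, built from any object with a `.groups` mapping."""
    return {name: group_tree(child) for name, child in node.groups.items()}


class GroupNode:
    """Hypothetical node that resolves t.a.b to the group path 'a/b'."""

    def __init__(self, filename, tree, opener=None, path=""):
        self._filename = filename
        self._tree = tree
        self._opener = opener if opener is not None else _open_with_xarray
        self._path = path
        self._dataset = None

    def __getattr__(self, name):
        # Private and dunder names are never group or variable names here.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._tree:
            child_path = f"{self._path}/{name}" if self._path else name
            child = GroupNode(self._filename, self._tree[name], self._opener, child_path)
            # Store the child so later lookups bypass __getattr__.
            self.__dict__[name] = child
            return child
        # Not a child group: delegate to the dataset stored at this path,
        # opening it at most once.
        if self._dataset is None:
            self._dataset = self._opener(self._filename, self._path)
        return getattr(self._dataset, name)
```

With a real file, the tree would come from something like `group_tree(nc.Dataset('mytrace.nc'))`. One consequence of this design is that a child group shadows a variable of the same name in its parent group.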

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

shoyer commented, Aug 13, 2018 (1 reaction)

This seems more or less reasonable to me, but note that opening a netCDF file isn’t always cheap (this is also true to a lesser extent with creating an xarray.Dataset). I expect you will be happier with caching or eagerly creating Dataset objects rather than recreating them in __getattr__.
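
A minimal sketch of the caching suggestion follows. It is an assumption about how it could look, not the code that closed the issue: `CachedTrace`, `group_names`, and the injectable `opener` are hypothetical names, and the default opener simply calls `xr.open_dataset(filename, group=name)` as in the proposal above. Each group is opened on first access and the resulting object is reused afterwards.

```python
def _open_with_xarray(filename, group):
    # Default opener (assumption): imported lazily so the sketch can be
    # exercised with a stand-in opener when xarray is not installed.
    import xarray as xr
    return xr.open_dataset(filename, group=group)


class CachedTrace:
    """Hypothetical variant of Trace that opens each group at most once."""

    def __init__(self, filename, group_names, opener=None):
        self.filename = filename
        self._group_names = tuple(group_names)
        self._opener = opener if opener is not None else _open_with_xarray
        self._cache = {}

    def __getattr__(self, name):
        # Only reached when normal lookup fails; go through __dict__ so a
        # half-initialised instance cannot recurse.
        if name not in self.__dict__.get("_group_names", ()):
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute or group {name!r}"
            )
        cache = self.__dict__["_cache"]
        if name not in cache:
            cache[name] = self._opener(self.filename, name)
        return cache[name]

    def __dir__(self):
        # Keep tab completion on group names.
        return sorted(set(super().__dir__()) | set(self._group_names))
```

Opening every group eagerly in `__init__` would be the other option mentioned; lazy caching avoids paying for groups that are never read.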

ColCarroll commented, Aug 28, 2018 (0 reactions)

Closed by #173 and #176
