Dataset groups
EDIT: see https://github.com/pydata/xarray/issues/4118 for ongoing discussion.
This has probably been suggested already, but, similarly to netCDF4 groups, it would be nice if we could access Dataset data variables, coordinates and attributes via groups.
Currently xarray allows loading a specific netCDF4 group into a Dataset. Different groups can be loaded as separate Dataset objects, which may then be combined into a single, flat Dataset. Yet in some cases it makes sense to represent data as a single object while keeping some nested structure. For example, a Dataset representing data on a staggered grid might have scalar_vars and flux_vars groups. Here are some potential uses for groups. When there are a lot of data variables and/or attributes, groups would also help produce a more concise repr.
I am thinking of an implementation of Dataset.groups that would be specific to xarray, i.e., independent of any backend, and that would easily co-exist with the flat Dataset. Backends should not be required to support groups (some existing backends simply don't). It would be up to each backend, if it chooses, to translate the Dataset.groups logic into its own group logic.
Dataset.groups might return a DatasetGroups object which, quite similarly to xarray.core.coordinates.DatasetCoordinates, would (1) hold a reference to the Dataset object, (2) basically consist of a Mapping of group names to data variable/coordinate/attribute names and (3) dynamically create another Dataset object (sub-dataset) on __getitem__. Keys of Dataset.groups should be accessible as attributes, e.g., ds.groups['scalar_vars'] == ds.scalar_vars.
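A minimal sketch of the proposed DatasetGroups object, written outside xarray as a plain read-only Mapping. Everything here is hypothetical: the class name comes from the proposal, but the constructor layout (a dict of group name to {"variables": [...], "attrs": [...]}) is an assumption, and attribute access is offered on the groups object (groups.scalar_vars) rather than on the Dataset itself, which would need changes inside xarray. Sub-datasets are built with ds[list_of_names], which returns a new Dataset holding the listed variables plus the coordinates they depend on.

```python
from collections.abc import Mapping

import xarray as xr


class DatasetGroups(Mapping):
    """Hypothetical read-only mapping of group names to sub-datasets."""

    def __init__(self, dataset, groups):
        # groups: {group_name: {"variables": [...], "attrs": [...]}} (assumed layout)
        self._dataset = dataset
        self._groups = {
            name: (list(spec.get("variables", [])), list(spec.get("attrs", [])))
            for name, spec in groups.items()
        }

    def __getitem__(self, name):
        var_names, attr_names = self._groups[name]  # KeyError for unknown groups
        # ds[list] returns a new Dataset with the listed variables plus the
        # coordinates whose dimensions are all among those variables' dimensions.
        sub = self._dataset[var_names]
        # Keep only the attributes assigned to this group.
        sub.attrs = {key: self._dataset.attrs[key] for key in attr_names}
        return sub

    def __iter__(self):
        return iter(self._groups)

    def __len__(self):
        return len(self._groups)

    def __getattr__(self, name):
        # Called only when normal lookup fails: expose groups as attributes.
        groups = self.__dict__.get("_groups", {})
        if name in groups:
            return self[name]
        raise AttributeError(name)


# Staggered-grid example: cell-centred scalars and edge-centred fluxes.
ds = xr.Dataset(
    {"temperature": ("x", [280.0, 285.0]), "flux": ("x_edge", [0.1, 0.2, 0.3])},
    coords={"x": [0.5, 1.5], "x_edge": [0.0, 1.0, 2.0]},
    attrs={"temperature_units": "K", "flux_units": "W m-2"},
)
groups = DatasetGroups(
    ds,
    {
        "scalar_vars": {"variables": ["temperature"], "attrs": ["temperature_units"]},
        "flux_vars": {"variables": ["flux"], "attrs": ["flux_units"]},
    },
)
print(groups.scalar_vars)
```

Because each sub-dataset is created on access, the flat Dataset stays the single source of truth; the flip side is that adding variables or attributes to a sub-dataset does not change the parent, which is one reason the inplace=True question below is tricky.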
Questions:
- How to handle hierarchies of > 1 levels (i.e., groups of groups…)?
- How to ensure that a variable / attribute in one group is not also present in another group?
- How should methods called from groups with inplace=True behave?
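For the second question, one option (an assumption, not something the issue settles) is to validate the group mapping when it is built. The sketch below, with hypothetical function names find_overlaps and validate_groups, reports every name claimed by more than one group. It would be applied separately to variable names and to attribute names, since those live in different namespaces.

```python
from collections.abc import Iterable, Mapping


def find_overlaps(groups: Mapping[str, Iterable[str]]) -> dict[str, list[str]]:
    """Map each name listed in more than one group to the groups listing it."""
    owners: dict[str, list[str]] = {}
    for group_name, members in groups.items():
        # dict.fromkeys drops duplicates within a group while keeping order.
        for member in dict.fromkeys(members):
            owners.setdefault(member, []).append(group_name)
    return {name: grps for name, grps in owners.items() if len(grps) > 1}


def validate_groups(groups: Mapping[str, Iterable[str]]) -> None:
    """Raise ValueError if any name is assigned to more than one group."""
    overlaps = find_overlaps(groups)
    if overlaps:
        details = "; ".join(f"{name!r} in {grps}" for name, grps in sorted(overlaps.items()))
        raise ValueError(f"names assigned to more than one group: {details}")


# Variable names and attribute names are separate namespaces, so check each.
variable_groups = {"scalar_vars": ["temperature"], "flux_vars": ["flux", "temperature"]}
print(find_overlaps(variable_groups))  # {'temperature': ['scalar_vars', 'flux_vars']}
```

Whether an overlap should be an error at all (shared coordinates, for instance, may legitimately belong to several groups) is part of the open question; raising is just one possible policy.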
Issue Analytics
- Created 7 years ago
- Comments: 20 (11 by maintainers)
Top GitHub Comments
I am reluctant to add the additional complexity of groups directly into the xarray.Dataset data model. For example, how do groups get updated when you slice, aggregate or concatenate datasets? The rules for coordinates are already pretty complex.
I would rather see this living in another data structure built on top of xarray.Dataset, either in xarray or in a separate library.
There's a parallel discussion of hierarchical storage going on over in https://github.com/pydata/xarray/issues/4118. I'm going to close this issue in favor of the other one just to keep the ongoing discussion in one place.
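The alternative suggested in this comment, a structure layered on top of xarray.Dataset rather than built into it, can be sketched as a tree of plain Datasets. This is a hypothetical illustration (DatasetNode and map_datasets are made-up names); recent xarray releases include a DataTree class for this kind of hierarchy, and the sketch does not reproduce its API. Each node keeps its own flat Dataset, so slicing or aggregating is applied node by node and the existing coordinate rules stay unchanged, while slash-separated paths give hierarchies of more than one level.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

import xarray as xr


@dataclass
class DatasetNode:
    """Hypothetical tree node: one flat xr.Dataset plus named child nodes."""

    dataset: xr.Dataset = field(default_factory=xr.Dataset)
    children: dict[str, DatasetNode] = field(default_factory=dict)

    def __getitem__(self, path: str) -> DatasetNode:
        # Resolve a slash-separated path such as "grid/fluxes"; "/" is the root.
        node = self
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node

    def map_datasets(self, func: Callable[[xr.Dataset], xr.Dataset]) -> DatasetNode:
        # Apply func (e.g. a selection) to every node's Dataset and return a
        # new tree of the same shape; each Dataset keeps its own coordinate rules.
        return DatasetNode(
            dataset=func(self.dataset),
            children={name: child.map_datasets(func) for name, child in self.children.items()},
        )


tree = DatasetNode(
    xr.Dataset(attrs={"title": "staggered grid"}),
    children={
        "scalar_vars": DatasetNode(xr.Dataset({"temperature": ("x", [280.0, 285.0])})),
        "flux_vars": DatasetNode(xr.Dataset({"flux": ("x_edge", [0.1, 0.2, 0.3])})),
    },
)
# Select the first cell centre; nodes without an "x" dimension are left as-is.
first = tree.map_datasets(lambda ds: ds.isel(x=0, missing_dims="ignore"))
```

The trade-off is that there is no longer one flat Dataset to operate on: every operation has to decide how it propagates through the tree.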