Dataset groups

EDIT: see https://github.com/pydata/xarray/issues/4118 for ongoing discussion

This has probably been suggested already, but it would be nice if we could access Dataset data variables, coordinates and attributes via groups, similar to netCDF4 groups.

Currently xarray allows loading a specific netCDF4 group into a Dataset. Different groups can be loaded as separate Dataset objects, which may then be combined into a single, flat Dataset. Yet in some cases it makes sense to represent the data as a single object while keeping some nested structure. For example, a Dataset representing data on a staggered grid might have scalar_vars and flux_vars groups. Here are some potential uses for groups. When there are a lot of data variables and/or attributes, groups would also help to produce a more concise repr.

I am thinking of an implementation of Dataset.groups that would be specific to xarray, i.e., independent of any backend, and that would easily coexist with the flat Dataset. Backends should not be required to support groups (some existing backends simply don't). It would be up to each backend to translate the Dataset.groups logic into its own group logic, where applicable.

Dataset.groups might return a DatasetGroups object which, much like xarray.core.coordinates.DatasetCoordinates, would (1) hold a reference to the Dataset object, (2) basically consist of a Mapping of group names to data variable/coordinate/attribute names and (3) dynamically create another Dataset object (a sub-dataset) on __getitem__. Keys of Dataset.groups should also be accessible as attributes, e.g., ds.groups['scalar_vars'] == ds.scalar_vars.
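The three properties above can be sketched in a few lines. This is only an illustration of the idea, not xarray code: DatasetGroups here is a hypothetical class, and a plain dict of name -> values stands in for the flat Dataset so that the example runs without any dependency.

```python
from collections.abc import Mapping


class DatasetGroups(Mapping):
    """Hypothetical read-only view: group name -> sub-dataset.

    A plain dict stands in for xarray.Dataset (assumption made for this sketch).
    """

    def __init__(self, dataset, groups):
        # (1) keep a reference to the flat dataset, no data is copied here
        self._dataset = dataset
        # (2) mapping of group names to variable names
        self._groups = {key: tuple(names) for key, names in groups.items()}

    def __getitem__(self, key):
        # (3) build the sub-dataset on demand from the flat dataset
        return {name: self._dataset[name] for name in self._groups[key]}

    def __iter__(self):
        return iter(self._groups)

    def __len__(self):
        return len(self._groups)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails; private names are
        # rejected early to avoid recursing before __init__ has run.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Flat "dataset" for a staggered grid, with made-up values
flat = {
    "temperature": [280.0, 281.5],
    "salinity": [35.1, 34.9],
    "u_flux": [0.1, 0.2, 0.3],
}
groups = DatasetGroups(
    flat,
    {"scalar_vars": ["temperature", "salinity"], "flux_vars": ["u_flux"]},
)
scalar = groups["scalar_vars"]  # item access
flux = groups.flux_vars         # attribute access to the same kind of view
```

With a real Dataset, step (3) would presumably select the variables with something like ds[list(names)], which returns a new Dataset containing only those variables. Note that the sketch exposes attribute access on the groups object rather than on the Dataset itself.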

Questions:

How to handle hierarchies of more than one level (i.e., groups of groups)?
How to ensure that a variable or attribute in one group is not also present in another group?
How should methods called from groups with inplace=True behave?
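
To make the first two questions more concrete, here is a rough sketch of one possible answer to each, again using plain dicts and lists instead of xarray objects. Both helpers (find_overlaps and resolve) are hypothetical names invented for this example; the '/'-separated path convention is borrowed from netCDF4 group paths.

```python
def find_overlaps(groups):
    """Return {variable name: [group names]} for names listed in several groups."""
    owners = {}
    for group, names in groups.items():
        for name in names:
            owners.setdefault(name, []).append(group)
    return {name: found for name, found in owners.items() if len(found) > 1}


def resolve(tree, path):
    """Walk a nested dict of groups along a '/'-separated path."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]  # raises KeyError if a level does not exist
    return node


# "temperature" is deliberately listed twice to trigger the check
flat_groups = {
    "scalar_vars": ["temperature", "salinity"],
    "flux_vars": ["u_flux", "temperature"],
}
overlaps = find_overlaps(flat_groups)

# Groups of groups represented as nested dicts
tree = {"ocean": {"scalar_vars": ["temperature"], "flux_vars": ["u_flux"]}}
ocean_flux = resolve(tree, "ocean/flux_vars")
```

Whether overlapping names should raise an error or be allowed is a design decision the issue leaves open; the helper only reports them.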

Issue Analytics

Created 7 years ago
Comments: 20 (11 by maintainers)

Top GitHub Comments

2 reactions

shoyer commented, Nov 8, 2016

I am reluctant to add the additional complexity of groups directly into the xarray.Dataset data model. For example, how do groups get updated when you slice, aggregate or concatenate datasets? The rules for coordinates are already pretty complex.

I would rather see this living in another data structure built on top of xarray.Dataset, either in xarray or in a separate library.

0 reactions

shoyer commented, Jul 2, 2021

There’s a parallel discussion about hierarchical storage going on over in https://github.com/pydata/xarray/issues/4118. I’m going to close this issue in favor of the other one just to keep the ongoing discussion in one place.