EDIT: see https://github.com/pydata/xarray/issues/4118 for ongoing discussion


This has probably been suggested already, but it would be nice if we could access Dataset data variables, coordinates and attributes via groups, similar to netCDF4 groups.

Currently, xarray allows loading a specific netCDF4 group into a Dataset. Different groups can be loaded as separate Dataset objects, which may then be combined into a single, flat Dataset. Yet in some cases it makes sense to represent the data as a single object while keeping some nested structure. For example, a Dataset representing data on a staggered grid might have scalar_vars and flux_vars groups. Here are some potential uses for groups. When there are a lot of data variables and/or attributes, groups would also help to produce a more concise repr.

I am thinking of an implementation of Dataset.groups that would be specific to xarray, i.e., independent of any backend, and that would easily coexist with the flat Dataset. Backends should not be required to support groups (some existing backends simply don’t). It would be up to each backend to eventually translate the Dataset.groups logic into its own group logic.

Dataset.groups might return a DatasetGroups object which, much like xarray.core.coordinates.DatasetCoordinates, would (1) hold a reference to the Dataset object, (2) basically consist of a Mapping of group names to data variable/coordinate/attribute names and (3) dynamically create another Dataset object (sub-dataset) on __getitem__. Keys of Dataset.groups should also be accessible as attributes, e.g., ds.groups['scalar_vars'] == ds.scalar_vars.
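To make the three points above concrete, here is a minimal sketch of how such a DatasetGroups object could look. It is only an illustration under stated assumptions, not an existing xarray API: DatasetGroups itself is hypothetical, and FlatDataset is a tiny stand-in used so the example runs without xarray installed. The sketch relies only on the dataset supporting `name in ds` and `ds[list_of_names]`, both of which xarray.Dataset provides. Rejecting a name that appears in two groups is one possible policy, chosen here for simplicity.

```python
from collections.abc import Mapping


class FlatDataset:
    """Hypothetical stand-in for xarray.Dataset (variable name -> values)."""

    def __init__(self, variables):
        self.variables = dict(variables)

    def __getitem__(self, key):
        # Like xarray.Dataset, a list of names selects a sub-dataset.
        if isinstance(key, list):
            return FlatDataset({name: self.variables[name] for name in key})
        return self.variables[key]

    def __contains__(self, name):
        return name in self.variables


class DatasetGroups(Mapping):
    """Hypothetical mapping of group name -> sub-dataset created on demand."""

    def __init__(self, dataset, groups):
        owner = {}  # variable name -> group that already claimed it
        for group, names in groups.items():
            for name in names:
                if name not in dataset:
                    raise KeyError(f"{name!r} is not in the dataset")
                if name in owner:
                    raise ValueError(
                        f"{name!r} is in both {owner[name]!r} and {group!r}"
                    )
                owner[name] = group
        self._dataset = dataset  # (1) reference to the flat dataset
        # (2) mapping of group names to variable names
        self._groups = {group: list(names) for group, names in groups.items()}

    def __getitem__(self, group):
        # (3) build the sub-dataset dynamically from the stored names
        return self._dataset[self._groups[group]]

    def __iter__(self):
        return iter(self._groups)

    def __len__(self):
        return len(self._groups)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion on private names
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


ds = FlatDataset(
    {
        "temperature": [280.0, 281.5],
        "salinity": [35.1, 34.9],
        "u_flux": [0.1, 0.2, 0.3],
    }
)
groups = DatasetGroups(
    ds,
    {"scalar_vars": ["temperature", "salinity"], "flux_vars": ["u_flux"]},
)
scalars = groups["scalar_vars"]  # a new sub-dataset
fluxes = groups.flux_vars  # same lookup via attribute access
```

In this sketch attribute access lives on the groups object (groups.flux_vars) rather than on the dataset itself; exposing it as ds.scalar_vars would additionally require hooking into the dataset’s own attribute lookup.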

Questions:

  • How to handle hierarchies of more than one level (i.e., groups of groups…)?
  • How to ensure that a variable or attribute in one group is not also present in another group?
  • How should methods called on a group with inplace=True behave?

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 20 (11 by maintainers)

Top GitHub Comments

2 reactions
shoyer commented, Nov 8, 2016

I am reluctant to add the additional complexity of groups directly into the xarray.Dataset data model. For example, how do groups get updated when you slice, aggregate or concatenate datasets? The rules for coordinates are already pretty complex.

I would rather see this living in another data structure built on top of xarray.Dataset, either in xarray or in a separate library.

0 reactions
shoyer commented, Jul 2, 2021

There’s a parallel discussion of hierarchical storage going on over in https://github.com/pydata/xarray/issues/4118. I’m going to close this issue in favor of the other one just to keep the ongoing discussion in one place.
