Iterating over a Dataset iterates only over its data_vars

This has been a small-but-persistent issue for me for a while. I suspect that my perspective might be dependent on my current outlook, but I’m socializing it here to test if it’s secular…

Currently Dataset.keys() returns both variables and coordinates (but not its attrs keys):

In [1]: import numpy as np
In [2]: import xarray as xr
In [5]: ds = xr.Dataset({'a': (('x', 'y'), np.random.rand(10, 2))})
In [12]: list(ds.keys())
Out[12]: ['a', 'x', 'y']

Is this conceptually correct? I would posit that a Dataset is a mapping of keys to variables, and the coordinates contain values that label that data.

So should Dataset.keys() instead return just the keys of the Variables?

We’re often passing around a dataset as a Mapping of keys to values - but then when we run a function across each of the keys, we get something run on both the Variables’ keys, and the Coordinate / label’s keys.

In pandas, DataFrame.keys() returns just the columns, which is what we need. While I think the xarray design is generally much better in these areas, this is one that pandas seems to get right. Because of the inconsistency between pandas and xarray, we’re having to coerce our objects to pandas DataFrames before passing them to functions that pull out their keys. (This is also why we can’t just look at ds.data_vars.keys(): it breaks that duck-typing.)
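To make the duck-typing concern concrete, here is a minimal sketch using only the standard library. ToyDataset and apply_to_each are hypothetical names made up for illustration; ToyDataset is not xarray's implementation, it only mimics the behaviour described above, where iteration covers both data variables and coordinates.

```python
from collections.abc import Mapping


class ToyDataset(Mapping):
    """Hypothetical stand-in for a Dataset (NOT xarray's real class)."""

    def __init__(self, data_vars, coords):
        self.data_vars = dict(data_vars)
        self.coords = dict(coords)

    def __getitem__(self, key):
        if key in self.data_vars:
            return self.data_vars[key]
        return self.coords[key]

    def __iter__(self):
        # Mimics the behaviour under discussion: variables, then coordinates.
        yield from self.data_vars
        yield from self.coords

    def __len__(self):
        return len(self.data_vars) + len(self.coords)


def apply_to_each(mapping, func):
    """Generic helper that treats its argument as a plain mapping."""
    return {key: func(mapping[key]) for key in mapping.keys()}


ds = ToyDataset(
    data_vars={"a": [[0.1, 0.2], [0.3, 0.4]]},
    coords={"x": [0, 1], "y": [0, 1]},
)

# The helper also runs on the coordinate keys, which is the surprise.
result_all = apply_to_each(ds, len)

# Passing only the data variables avoids that, but callers must know to do so.
result_vars = apply_to_each(ds.data_vars, len)
```

Under these assumptions, result_all has entries for 'a', 'x' and 'y', while result_vars has only 'a'. A generic function written against a mapping interface cannot tell the two kinds of key apart.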

Does that make sense?

Issue Analytics

State:
Created 7 years ago
Comments: 11 (9 by maintainers)

Top GitHub Comments

shoyer commented, May 25, 2018

Do we need to change the behavior of dict(dataset) so that dict(dataset).keys() and dataset.keys() become consistent?

No, I think these are guaranteed to be consistent because we inherit from collections.Mapping to implement dict methods like keys(), values() and items() (via __iter__ and __getitem__).
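The consistency argument can be illustrated with a small standard-library sketch. VarsOnlyView is a hypothetical class invented for this example, not xarray code. It assumes a mapping whose __iter__ yields only data variable names while __getitem__ still resolves coordinate names, and it uses collections.abc.Mapping, the current location of the Mapping base class.

```python
from collections.abc import Mapping


class VarsOnlyView(Mapping):
    """Hypothetical mapping that iterates over data variables only."""

    def __init__(self, data_vars, coords):
        self._data_vars = dict(data_vars)
        self._coords = dict(coords)

    def __getitem__(self, key):
        # Lookup still works for coordinates, even though they are not iterated.
        if key in self._data_vars:
            return self._data_vars[key]
        return self._coords[key]

    def __iter__(self):
        return iter(self._data_vars)

    def __len__(self):
        return len(self._data_vars)


view = VarsOnlyView(data_vars={"a": [1, 2, 3]}, coords={"x": [0, 1, 2]})

# keys(), values() and items() are mixin methods derived from __iter__
# and __getitem__, so they never need to be written by hand.
keys_direct = list(view.keys())
items_direct = list(view.items())

# dict() builds itself from keys() plus __getitem__, so it sees the same keys.
keys_via_dict = list(dict(view).keys())
```

In this sketch both keys_direct and keys_via_dict contain only 'a', because both paths go through the same __iter__. Changing what __iter__ yields therefore changes keys() and dict(...) together.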

fujiisoup commented, May 25, 2018

Do we need to change the behavior of dict(dataset) so that dict(dataset).keys() and dataset.keys() become consistent?