Iterating over a Dataset iterates only over its data_vars
This has been a small but persistent issue for me for a while. I suspect that my perspective might depend on my current outlook, but I'm socializing it here to test whether it's secular…
Currently Dataset.keys() returns both variables and coordinates (but not its attrs keys):
In [1]: import numpy as np
In [2]: import xarray as xr
In [5]: ds = xr.Dataset({'a': (('x', 'y'), np.random.rand(10, 2))})
In [12]: list(ds.keys())
Out[12]: ['a', 'x', 'y']
Is this conceptually correct? I would posit that a Dataset is a mapping of keys to variables, and the coordinates contain values that label that data.
So should Dataset.keys() instead return just the keys of the Variables?
We’re often passing around a dataset as a Mapping of keys to values, but when we then run a function across each of the keys, it runs on both the Variables’ keys and the Coordinate / label keys.
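To make the concern concrete, here is a minimal sketch that uses plain dicts as stand-ins for a dataset's variables and coordinates. The helper column_means and the sample values are hypothetical and are not part of xarray or pandas; the point is only that a generic function driven by keys() also processes the labels.

```python
from collections.abc import Mapping


def column_means(table: Mapping) -> dict:
    """Hypothetical generic helper: average every entry the mapping exposes."""
    return {key: sum(table[key]) / len(table[key]) for key in table.keys()}


# Plain dicts standing in for the two kinds of entries (not real xarray objects).
data_vars = {"a": [1.0, 2.0, 3.0]}
coords = {"x": [10, 20, 30]}

# Mimics the behaviour described above: keys() exposes variables and coordinates.
everything = {**data_vars, **coords}

print(column_means(everything))  # also computes a "mean" for the label 'x'
print(column_means(data_vars))   # only the data variable 'a'
```

With the combined mapping the helper averages the coordinate labels too, which is rarely what a caller treating the object as "keys to values" intends.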
In pandas, DataFrame.keys() returns just the columns, so that conforms to what we need. While I think the xarray design is in general much better in these areas, this is one area that pandas seems to get right. Because of the inconsistency between pandas and xarray, we’re having to coerce our objects to pandas DataFrames before passing them off to functions that pull out their keys (this is also why we can’t just look at ds.data_vars.keys(): doing so breaks that duck-typing).
Does that make sense?
Issue Analytics
- Created 7 years ago
- Comments: 11 (9 by maintainers)
Top GitHub Comments
No, I think these are guaranteed to be consistent because we inherit from collections.Mapping to implement dict methods like keys(), values() and items() (via __iter__ and __getitem__).
Do we need to change the behavior of dict(dataset) so that dict(dataset).keys() and dataset.keys() become consistent?
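The consistency argument in the comments can be illustrated with a small sketch. ToyDataset is a hypothetical stand-in and not xarray's actual implementation. It assumes data-variable and coordinate names never collide. Note that in current Python the abstract base class lives at collections.abc.Mapping; the collections.Mapping alias mentioned above was removed in Python 3.10.

```python
from collections.abc import Mapping


class ToyDataset(Mapping):
    """Hypothetical stand-in for a Dataset; not xarray's actual implementation."""

    def __init__(self, data_vars, coords):
        # Assumption: the two dicts have disjoint keys.
        self._data_vars = dict(data_vars)
        self._coords = dict(coords)

    def __getitem__(self, key):
        if key in self._data_vars:
            return self._data_vars[key]
        return self._coords[key]  # raises KeyError for unknown keys

    def __iter__(self):
        # Mapping derives keys(), values() and items() from __iter__ and
        # __getitem__, so changing what is yielded here changes all of them.
        yield from self._data_vars
        yield from self._coords

    def __len__(self):
        return len(self._data_vars) + len(self._coords)


ds = ToyDataset({"a": [1, 2]}, {"x": [0, 1]})
print(list(ds.keys()))
print(list(dict(ds).keys()))
```

Because dict(ds) reads the object through keys() and __getitem__, it sees the same keys as ds.keys(). Yielding only the data variables from __iter__ would therefore change both views together.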