Inconsistency between the types of Dataset.dims and DataArray.dims
DataArray.dims is currently a tuple, whereas Dataset.dims is a dict. This results in ugly code like this, taken from xarray/core/group.py:
    try:  # Dataset
        expected_size = obj.dims[group_dim]
    except TypeError:  # DataArray
        expected_size = obj.shape[obj.get_axis_num(group_dim)]
One way to resolve this inconsistency would be to switch DataArray.dims to a (frozen) OrderedDict. The downside is that code like x.dims[0] wouldn’t work anymore (unless we add some terrible fallback logic). You’d have to write x.dims.keys()[0], which is pretty ugly. On the plus side, x.dims['time'] would always return the size of the time dimension, regardless of whether x is a DataArray or Dataset.
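To make the trade-off concrete, here is a minimal, dependency-free sketch of what a frozen, ordered dims mapping might look like. The class name FrozenDims is hypothetical and is not part of xarray; it only illustrates the behavior described above. Note that on Python 3 the keys() view is not subscriptable, so positional access would go through list(...) rather than .keys()[0].

```python
from collections import OrderedDict
from collections.abc import Mapping


class FrozenDims(Mapping):
    """Hypothetical read-only, ordered mapping of dimension name -> size."""

    def __init__(self, names, shape):
        self._sizes = OrderedDict(zip(names, shape))

    def __getitem__(self, name):
        return self._sizes[name]

    def __iter__(self):
        return iter(self._sizes)

    def __len__(self):
        return len(self._sizes)


dims = FrozenDims(("time", "x"), (3, 2))

size_of_time = dims["time"]   # lookup by dimension name works
first_name = list(dims)[0]    # positional access needs an explicit list

try:
    dims[0]                   # tuple-style access is no longer supported
    positional_ok = True
except KeyError:
    positional_ok = False
```

Because the class only implements the read-only Mapping interface, item assignment such as dims["time"] = 5 raises a TypeError, which is what "frozen" is taken to mean in this sketch.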
Another option would be to add an attribute dim_shape (or some other name) that serves as an alias to dims on Dataset and an alias to OrderedDict(zip(dims, shape)) on DataArray. This would be fully backwards compatible, but would further pollute the namespace on Dataset and DataArray.
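The alias approach can be sketched with two toy classes. ToyDataArray and ToyDataset are hypothetical stand-ins, not xarray's real classes, and dim_shape is the placeholder name floated above; the point is only that the existing dims attributes stay untouched while both types gain the same name-to-size lookup.

```python
from collections import OrderedDict


class ToyDataArray:
    """Stand-in for DataArray: dims and shape are parallel tuples."""

    def __init__(self, dims, shape):
        self.dims = tuple(dims)
        self.shape = tuple(shape)

    @property
    def dim_shape(self):
        # Pair each dimension name with its length, preserving order.
        return OrderedDict(zip(self.dims, self.shape))


class ToyDataset:
    """Stand-in for Dataset: dims is already a name -> size mapping."""

    def __init__(self, dims):
        self.dims = OrderedDict(dims)

    @property
    def dim_shape(self):
        # On the dataset side the new attribute is a plain alias.
        return self.dims


arr = ToyDataArray(("time", "x"), (3, 2))
ds = ToyDataset([("time", 3), ("x", 2)])

# The same expression now works for both types,
# and arr.dims[0] keeps working as before.
sizes_match = arr.dim_shape["time"] == ds.dim_shape["time"]
```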
Issue Analytics
- Created 7 years ago
- Comments: 10 (6 by maintainers)
Top GitHub Comments
With optional indexes (#1017) meaning that ds.coords[...].size may not always succeed (OK, depending on some implementation choices), the need for a consistent way to look up dimension sizes becomes even more severe.

Rather than resolving the type inconsistency of dims, we could simply add a new attribute sizes to both Dataset and DataArray that is exactly equivalent to Dataset.dims. The Dataset API ends up a little bit redundant, but we have a consistent way to access this information.

We have .sizes now.
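With .sizes available on both types, the try/except from the issue description could plausibly collapse to a single lookup. The sketch below is an illustration, not the code xarray actually ships: expected_group_size is a hypothetical helper, and the SimpleNamespace objects are stand-ins that expose only a sizes mapping so the example runs without xarray installed.

```python
from types import SimpleNamespace


def expected_group_size(obj, group_dim):
    """Return the length of group_dim for any object exposing .sizes."""
    return obj.sizes[group_dim]


# Stand-ins exposing only the attribute the helper relies on.
# With xarray installed, a DataArray or Dataset would be passed instead.
fake_array = SimpleNamespace(sizes={"time": 3, "x": 2})
fake_dataset = SimpleNamespace(sizes={"time": 3, "x": 2, "y": 4})

array_size = expected_group_size(fake_array, "time")
dataset_size = expected_group_size(fake_dataset, "time")
```

The helper no longer needs to know which kind of object it received, which is the consistency the issue asked for.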