Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions
Concatenating DataArrays with repeated dimensions does not work.
import xarray as xr #xarray 0.8.2
from numpy import eye
A = xr.DataArray(eye(3), dims=['dim0', 'dim0'])
xr.concat([A, A], 'newdim')
fails with
[...]
ValueError: axes don't match array
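One possible workaround, assuming the repeated names carry no meaning of their own, is to give every axis a unique name before concatenating. The sketch below is only an illustration: `with_unique_dims` and the `_1` suffix are hypothetical and not part of xarray, and the helper does not guard against a generated name colliding with an existing one.

```python
import numpy as np
import xarray as xr


def with_unique_dims(values, dims):
    """Build a DataArray, suffixing repeated dimension names (hypothetical helper)."""
    seen = {}
    new_dims = []
    for name in dims:
        count = seen.get(name, 0)
        # First occurrence keeps its name; later ones become name_1, name_2, ...
        new_dims.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return xr.DataArray(values, dims=new_dims)


A = with_unique_dims(np.eye(3), ["dim0", "dim0"])
stacked = xr.concat([A, A], "newdim")
print(stacked.dims)
```

With distinct names, concat has a well-defined axis order to work with, which is the part that breaks when both axes are called dim0.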
Issue Analytics
- State:
- Created 6 years ago
- Reactions:2
- Comments:14 (11 by maintainers)

I cannot see a use case in which repeated dims actually make sense.
In my case this situation originates from h5 files which indeed contain repeated dimensions (variables(dimensions): uint16 B0(phony_dim_0,phony_dim_0), ..., uint8 VAA(phony_dim_1,phony_dim_1)), thus xarray is not to blame here. These are “dummy” dimensions, not associated with physical values. What we do to circumvent this problem is “re-dimension” all variables. Maybe a safe approach would be for open_dataset to raise a warning by default when encountering such variables, with possibly an option to perform automatic or custom dimension naming to avoid repeated dims. I also agree with @shoyer that failing loudly when operating on such DataArrays instead of providing confusing results would be an improvement.

I’m not too fond of having multiple dimensions with the same name because, whenever you need to operate on one but not the other, you have little to no choice but to revert to positional indexing.
Consider also how many methods expect either **kwargs or a dict-like parameter with the dimension or variable names as the keys. I would not be surprised to find that many API design choices fall apart in the face of this use case.
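To make the keyword/dict point concrete, here is a plain-Python sketch with no xarray involved. `describe_selection` is a hypothetical stand-in for any method that takes dimension names as keyword arguments; it only shows that neither a dict nor keyword arguments can carry two different values for the same name.

```python
def describe_selection(**indexers):
    """Hypothetical stand-in for a method keyed by dimension name."""
    return indexers


# A dict literal silently keeps only the last value for a repeated key.
by_dict = {"dim0": 0, "dim0": 2}
print(by_dict)  # {'dim0': 2}

# Unpacking into keyword arguments inherits the same limitation.
print(describe_selection(**by_dict))  # {'dim0': 2}

# Repeating a keyword directly is rejected before the call even runs.
try:
    compile("describe_selection(dim0=0, dim0=2)", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True
```

So with two axes both named dim0, there is no way to address them separately through a name-keyed interface.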
Also, having two non-positional (as it should always be in xarray!) dimensions with the same name only makes sense when modelling symmetric N:N relationships. Two good examples are covariance matrices and the weights for a Dijkstra algorithm.
The problems start when the object represents an asymmetric relationship, e.g.:
- EUR->USD is not identical to 1/(USD->EUR) because of arbitrage and illiquidity

I could easily come up with many other cases. In case of asymmetric N:N relationships, it is highly desirable to share the same index across multiple dimensions with different names (that would typically convey the direction of the relationship, e.g. “from” and “to”).
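For comparison, the closest approximation I know of today is to use two differently named dimensions and attach the same labels to each as two independent coordinates. This is only a sketch: the names from_ccy and to_ccy are my own choice, the rates are made-up numbers, and the labels are duplicated rather than truly shared.

```python
import numpy as np
import xarray as xr

currencies = ["EUR", "USD", "GBP"]

# Made-up numbers for illustration only.
# Entry [i, j] converts currencies[i] into currencies[j].
values = np.array([
    [1.00, 1.10, 0.85],
    [0.90, 1.00, 0.77],
    [1.17, 1.29, 1.00],
])

rates = xr.DataArray(
    values,
    dims=["from_ccy", "to_ccy"],
    coords={"from_ccy": currencies, "to_ccy": currencies},
)

# Label-based selection works in both directions without positional indexing.
eur_usd = float(rates.sel(from_ccy="EUR", to_ccy="USD"))
usd_eur = float(rates.sel(from_ccy="USD", to_ccy="EUR"))
print(eur_usd, 1 / usd_eur)

# Reducing over one direction only is unambiguous because the names differ.
best_rate_into_each = rates.max(dim="from_ccy")
print(best_rate_into_each.dims)
```

The drawback is that nothing ties the two coordinates together, so they can drift apart if only one of them is reindexed.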
What if, instead of allowing for duplicate dimensions, we allowed sharing an index across different dimensions?
Something like
or, for DataArrays:
Note how this syntax doesn’t exist as of today:
From an implementation point of view, I think it could be easily implemented by keeping track of a map of aliases and with some __getitem__ magic. More effort would be needed to convince DataArrays to accept (and not accidentally drop) a coordinate whose dims don’t match any of the data variable’s.

This design would not resolve the issue of compatibility with NetCDF though. I’d be surprised if the NetCDF designers never came across this - maybe it’s a good idea to have a chat with them?
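As a very rough sketch of the alias-map idea, written in plain Python and not tied to any xarray internals: `AliasedIndexes` and all the names in it are hypothetical and exist only to show how several dimension names could resolve to one stored index through __getitem__.

```python
class AliasedIndexes:
    """Hypothetical container: several dimension names resolve to one index."""

    def __init__(self, indexes, aliases):
        self._indexes = dict(indexes)  # index name -> labels
        self._aliases = dict(aliases)  # dimension name -> index name

    def __getitem__(self, dim):
        # Fall back to the name itself when no alias is registered.
        return self._indexes[self._aliases.get(dim, dim)]


shared = AliasedIndexes(
    indexes={"currency": ["EUR", "USD", "GBP"]},
    aliases={"from_ccy": "currency", "to_ccy": "currency"},
)

# Both dimension names hand back the very same label list.
print(shared["from_ccy"] is shared["to_ccy"])  # True
```

This covers only the lookup side; keeping a DataArray from dropping such a coordinate is the harder part mentioned above and is not addressed here.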