Hooks for custom attribute handling in xarray operations
Over in #964, I am working on a rewrite/unification of the guts of xarray’s logic for computation with labelled data. The goal is to get all of xarray’s internal logic for working with labelled data going through a minimal set of flexible functions which we can also expose as part of the API.
Because we will finally have all (or at least nearly all) xarray operations using the same code path, I think it will also finally become feasible to open up hooks allowing extensions to customize how xarray handles metadata.
Two obvious use cases here are units (#525) and automatic maintenance of metadata (e.g., `cell_methods` or `history` fields). Both of these are out of scope for xarray itself, mostly because the specific logic tends to be domain specific. This could also subsume options like the existing `keep_attrs` on many operations.
I like the idea of supporting something like NumPy’s `__array_wrap__` to allow third-party code to finalize xarray objects in some way before they are returned. However, it’s not obvious to me what the right design is.
- Should we look up a custom attribute on subclasses like `__array_wrap__` (or `__numpy_ufunc__`) in NumPy, or should we have a system (e.g., unilaterally or with a context manager and `xarray.set_options`) for registering hooks that are then checked on all xarray objects? I am inclined toward the latter, even though it’s a little slower, just because it will be simpler and easier to get right.
- Should these methods be able to control the full result objects, or only set `attrs` and/or `name`?
- To be useful, do we need to allow extensions to take control of the full operation, to support things like automatic unit conversion? This would suggest something closer to `__numpy_ufunc__`, which is a little more ambitious than what I had in mind here.
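To make the "registered hooks" option from the list above more concrete, here is a minimal sketch in plain Python. It is not xarray code: `Labelled`, `register_hook`, and `apply_op` are hypothetical names invented for illustration, and the hook is deliberately limited to editing `name` and `attrs` on the result (the narrower choice in the second question).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Labelled:
    """Hypothetical stand-in for an xarray object: values plus metadata."""
    values: List[float]
    name: Optional[str] = None
    attrs: Dict[str, Any] = field(default_factory=dict)


# A hook receives the operation name, the inputs, and the draft result.
# By convention in this sketch it only edits result.name and result.attrs.
Hook = Callable[[str, List[Labelled], Labelled], None]
_HOOKS: List[Hook] = []  # hypothetical global registry


def register_hook(hook: Hook) -> Hook:
    """Register a hook that runs on every result (usable as a decorator)."""
    _HOOKS.append(hook)
    return hook


def apply_op(op_name: str, func: Callable[..., float], *inputs: Labelled) -> Labelled:
    """Single code path for element-wise operations; hooks finalize the result."""
    columns = zip(*(x.values for x in inputs))
    result = Labelled([func(*vals) for vals in columns])
    for hook in _HOOKS:
        hook(op_name, list(inputs), result)
    return result


@register_hook
def record_history(op_name: str, inputs: List[Labelled], result: Labelled) -> None:
    """Copy metadata from the first input and append the operation to 'history'."""
    first = inputs[0]
    result.name = first.name
    result.attrs.update(first.attrs)
    previous = first.attrs.get("history", "")
    result.attrs["history"] = "; ".join(part for part in (previous, op_name) if part)


a = Labelled([1.0, 2.0], name="t", attrs={"units": "K"})
b = Labelled([0.5, 0.5])
out = apply_op("add", lambda x, y: x + y, a, b)
```

Because every operation funnels through one function, the registry is checked in exactly one place. The cost is a loop over the hooks on each call, which is the "a little slower" trade-off mentioned above.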
Feedback would be greatly appreciated.
Issue Analytics
- Created 7 years ago
- Comments: 24 (19 by maintainers)
Top GitHub Comments
I definitely see the logic with regard to encouraging users to use a context manager, and from the perspective of someone building a third-party library on top of xarray it would be fine. However, I think that from the perspective of an end-user (for example, a scientist) crunching numbers and analyzing data with xarray simply as a convenience library, this produces much too obfuscated code: a standard library import (`contextlib`, which isn’t something many scientific coders would regularly use or necessarily know about) and a lot of boilerplate “enabling” the extra features they want in their calculation.
I think your earlier proposal of an `xarray.set_options` is a cleaner and simpler way forward, even if it does have thorns. Do you have any estimate of the performance penalty that checking hooks on all xarray objects would incur?
@shoyer I know elsewhere you said you weren’t sure about this idea any more, but personally I’d like to push forward on it. Do you have problems with this approach that we need to resolve? Any chance you have some preliminary code?
I think this is the right way to solve the unit issue in xarray, since at its core unit handling is mostly a metadata operation.