Hooks for custom attribute handling in xarray operations

Over in #964, I am working on a rewrite/unification of the guts of xarray’s logic for computation with labelled data. The goal is to get all of xarray’s internal logic for working with labelled data going through a minimal set of flexible functions which we can also expose as part of the API.

Because we will finally have all (or at least nearly all) xarray operations using the same code path, I think it will also finally become feasible to open up hooks that allow extensions to customize how xarray handles metadata.

Two obvious use cases here are units (#525) and automatic maintenance of metadata (e.g., cell_methods or history fields). Both of these are out of scope for xarray itself, mostly because the specific logic tends to be domain-specific. Such hooks could also subsume options like the existing keep_attrs on many operations.

I like the idea of supporting something like NumPy’s __array_wrap__ to allow third-party code to finalize xarray objects in some way before they are returned. However, it’s not obvious to me what the right design is.
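To make the `__array_wrap__` analogy concrete, here is a minimal pure-Python sketch of the subclass-lookup approach. Nothing in it is xarray or NumPy API: the `Labelled` class, the `apply_op` function, and the `__finalize_result__` method name are hypothetical stand-ins chosen only for illustration.

```python
class Labelled:
    """Toy stand-in for a labelled array: values plus a name and attrs."""

    def __init__(self, values, name=None, attrs=None):
        self.values = list(values)
        self.name = name
        self.attrs = dict(attrs or {})

    def __finalize_result__(self, result, inputs):
        # Default behaviour: return the result untouched (metadata dropped).
        return result


def apply_op(func, *inputs):
    """Apply func elementwise, then let the first input finalize the result."""
    values = [func(*items) for items in zip(*(x.values for x in inputs))]
    result = type(inputs[0])(values)
    # Hypothetical hook looked up on the class, in the spirit of __array_wrap__.
    return inputs[0].__finalize_result__(result, inputs)


class WithHistory(Labelled):
    def __finalize_result__(self, result, inputs):
        # Domain-specific metadata handling lives in the subclass.
        result.attrs = dict(inputs[0].attrs)
        result.attrs["history"] = result.attrs.get("history", "") + "apply_op;"
        return result


a = WithHistory([1, 2, 3], name="a", attrs={"units": "m"})
b = WithHistory([10, 20, 30], name="b", attrs={"units": "m"})
total = apply_op(lambda x, y: x + y, a, b)

# The base class keeps the default behaviour and drops attrs.
plain = apply_op(lambda x: x * 2, Labelled([1, 2], attrs={"units": "m"}))
print(total.values, total.attrs, plain.attrs)
```

In this sketch the hook only runs for objects of the subclass, so third-party code has to make sure its own class is the one carrying the data.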

  • Should we look up a custom attribute on subclasses, like __array_wrap__ (or __numpy_ufunc__) in NumPy, or should we have a system (e.g., unilaterally or with a context manager and xarray.set_options) for registering hooks that are then checked on all xarray objects? I am inclined toward the latter, even though it’s a little slower, just because it will be simpler and easier to get right.
  • Should these methods be able to control the full result objects, or only set attrs and/or name?
  • To be useful, do we need to allow extensions to take control of the full operation, to support things like automatic unit conversion? This would suggest something closer to __numpy_ufunc__, which is a little more ambitious than what I had in mind here.
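The registration-based alternative from the first bullet, restricted to attrs and name as in the second, could look roughly like the sketch below. It is an illustration under assumptions, not a proposed API: `_HOOKS`, `register_hook`, `apply_op`, and the toy `Labelled` dataclass are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

_HOOKS = []  # hypothetical global registry, checked on every operation


def register_hook(hook):
    """Register hook(inputs, attrs, name) -> (attrs, name)."""
    _HOOKS.append(hook)
    return hook


@dataclass
class Labelled:
    """Toy stand-in for a labelled array."""
    values: list
    name: Optional[str] = None
    attrs: dict = field(default_factory=dict)


def apply_op(func, *inputs):
    values = [func(*items) for items in zip(*(x.values for x in inputs))]
    attrs, name = {}, None  # default: drop metadata
    for hook in _HOOKS:
        # Hooks see the inputs but may only decide attrs and name.
        attrs, name = hook(inputs, attrs, name)
    return Labelled(values, name, attrs)


@register_hook
def keep_common_units(inputs, attrs, name):
    # Propagate "units" only when every input agrees on it.
    units = {x.attrs.get("units") for x in inputs}
    if len(units) == 1 and None not in units:
        attrs = {**attrs, "units": units.pop()}
    return attrs, name


@register_hook
def append_history(inputs, attrs, name):
    names = ", ".join(str(x.name) for x in inputs)
    return {**attrs, "history": f"combined {names}"}, name


a = Labelled([1, 2], "a", {"units": "m"})
b = Labelled([3, 4], "b", {"units": "m"})
c = Labelled([5, 6], "c", {"units": "s"})
same_units = apply_op(lambda x, y: x + y, a, b)
mixed_units = apply_op(lambda x, y: x + y, a, c)
print(same_units.attrs, mixed_units.attrs)
```

Because these hooks never touch `values`, they can record or drop a unit but cannot convert one. That limitation is what the third bullet is asking about.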

Feedback would be greatly appreciated.

CC @darothen @rabernat @jhamman @pwolfram

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 24 (19 by maintainers)

Top GitHub Comments

darothen commented, Aug 29, 2016 (3 reactions)

I definitely see the logic with regard to encouraging users to use a context manager, and from the perspective of someone building a third-party library on top of xarray it would be fine. However, I think that from the perspective of an end-user (for example, a scientist) crunching numbers and analyzing data with xarray simply as a convenience library, this produces much too obfuscated code: a standard library import (contextlib, which isn’t something many scientific coders would regularly use or necessarily know about) and a lot of boilerplate “enabling” the extra features they want in their calculation.

I think your earlier proposal of an xarray.set_options is a cleaner and simpler way forward, even if it does have thorns. Do you have any estimate of the performance penalty checking hooks on all xarray objects would incur?
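For reference, the two styles need not be mutually exclusive. As far as I know, xarray's existing `set_options` can be used either as a plain call or in a `with` block. The pure-Python sketch below shows that pattern in isolation. The `OPTIONS` dictionary, the `attrs_hooks` option name, and `tag_units` are hypothetical and are not part of xarray.

```python
OPTIONS = {"attrs_hooks": ()}  # hypothetical option name


class set_options:
    """Set options globally, or temporarily when used in a with block."""

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(OPTIONS)
        if unknown:
            raise ValueError(f"unknown options: {sorted(unknown)}")
        # Remember previous values so __exit__ can restore them.
        self._old = {key: OPTIONS[key] for key in kwargs}
        OPTIONS.update(kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        OPTIONS.update(self._old)


def tag_units(attrs):
    return {**attrs, "units": "m"}


# End-user style: one line, no contextlib import.
set_options(attrs_hooks=(tag_units,))
global_hooks = OPTIONS["attrs_hooks"]

# Library style: a scoped change that is undone on exit.
with set_options(attrs_hooks=()):
    scoped_hooks = OPTIONS["attrs_hooks"]
restored_hooks = OPTIONS["attrs_hooks"]
```

In a design like this one, the cost of an operation with no hooks registered is a dictionary lookup plus a loop over an empty tuple. Whether that is negligible for real xarray objects would still have to be measured.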

dopplershift commented, Aug 15, 2018 (2 reactions)

@shoyer I know elsewhere you said you weren’t sure about this idea any more, but personally I’d like to push forward on this idea. Do you have problems with this approach we need to resolve? Any chance you have some preliminary code?

I think this is the right way to solve the unit issue in xarray, since at its core unit handling is mostly a metadata operation.
