Stateful user-defined accessors

If anybody decorates a stateful class with @register_dataarray_accessor or @register_dataset_accessor, the instance will lose its state on any method that invokes _to_temp_dataset, as well as on a shallow copy.


In [1]: @xarray.register_dataarray_accessor('foo')
   ...: class Foo:
   ...:     def __init__(self, obj):
   ...:         self.obj = obj
   ...:         self.x = 1
   ...:

In [2]: a = xarray.DataArray()

In [3]: a.foo.x
Out[3]: 1

In [4]: a.foo.x = 2

In [5]: a.foo.x
Out[5]: 2

In [6]: a.roll().foo.x
Out[6]: 1

In [7]: a.copy(deep=False).foo.x
Out[7]: 1

For _to_temp_dataset, it might be possible, with substantial effort, to retain the state. For copy(), it’s impossible without modifying the accessor duck API, as one would need to tamper with the accessor instance in place and repoint its reference back to the new DataArray/Dataset.
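The mechanism can be modelled without xarray. The sketch below is a simplified stand-in, not xarray's actual code: CachedAccessor, Container and the _accessors cache key are hypothetical names chosen for illustration. It assumes the accessor is cached per object, so any operation that builds a new object starts with an empty cache and a fresh accessor.

```python
class CachedAccessor:
    """Descriptor that builds the accessor once per object and caches it.

    Hypothetical model of per-object accessor caching, not xarray's code.
    """

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner=None):
        if obj is None:
            # Accessed on the class itself: return the accessor class
            return self._accessor_cls
        cache = obj.__dict__.setdefault("_accessors", {})
        if self._name not in cache:
            cache[self._name] = self._accessor_cls(obj)
        return cache[self._name]


class Foo:
    def __init__(self, obj):
        self.obj = obj
        self.x = 1


class Container:
    """Stand-in for DataArray/Dataset."""

    foo = CachedAccessor("foo", Foo)

    def __init__(self, data):
        self.data = data

    def copy(self):
        # A shallow copy shares the data but gets its own, empty accessor cache
        return Container(self.data)


a = Container([1, 2, 3])
a.foo.x = 2   # state stored on the accessor cached for `a`
b = a.copy()  # new object, so `b.foo` is built from scratch
```

Here a.foo.x stays 2 while b.foo.x is back to 1. Carrying the state over would require copying the accessor and rewriting its obj attribute to point at b, which the accessor API gives the container no way to do.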

This issue is so glaring that it makes me strongly suspect that nobody saves any state in accessor classes. This kind of use would also be problematic in practical terms, as the accessor object would have a hard time realising when its own state is no longer coherent with the referenced DataArray/Dataset.

This design also introduces a circular reference in the DataArray/Dataset. After someone invokes an accessor method on their DataArray/Dataset, the whole object - including the numpy buffers! - won’t be collected immediately when the user drops the last reference to it; it has to wait for the next gc pass instead. This could cause huge increases in RAM usage overnight in a user application, which would be very hard to logically link to a change that just added a custom method.
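The effect of the cycle can be reproduced with plain Python objects. This is a minimal sketch assuming CPython's reference counting; Owner and Accessor are hypothetical stand-ins for the DataArray/Dataset and its cached accessor, and the bytearray plays the role of the numpy buffer. The garbage collector is disabled during the demonstration so that no automatic gc pass interferes.

```python
import gc
import weakref


class Accessor:
    def __init__(self, obj):
        self.obj = obj  # back-reference to the owner


class Owner:
    def __init__(self):
        self.buffer = bytearray(10_000_000)  # stands in for a large numpy buffer
        self.accessor = Accessor(self)       # owner -> accessor -> owner cycle


gc.disable()  # keep automatic gc passes out of the picture
try:
    owner = Owner()
    probe = weakref.ref(owner)

    del owner  # the user drops the only external reference
    alive_after_del = probe() is not None

    gc.collect()  # an explicit gc pass breaks the cycle
    alive_after_gc = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_gc)
```

On CPython, the object and its buffer survive the del and are only freed by the gc pass.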

Finally, with https://github.com/pydata/xarray/pull/3250/, this statefulness forces us to increase the RAM usage of all datasets and dataarrays by an extra slot, for all users, even if this feature is quite niche.

Proposed solution

Get rid of accessor caching altogether, and just recreate the accessor object from scratch every time it is invoked. In the documentation, clarify that the __init__ method should not perform anything computationally intensive.
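A minimal sketch of what this could look like, again using hypothetical stand-in classes (UncachedAccessor, Container) rather than xarray's real ones: the descriptor builds a new accessor on every attribute access and stores nothing on the object. It assumes CPython's reference counting for the last step.

```python
import gc
import weakref


class UncachedAccessor:
    """Descriptor that recreates the accessor on every access (hypothetical)."""

    def __init__(self, accessor_cls):
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner=None):
        if obj is None:
            return self._accessor_cls
        # Nothing is stored on obj, so no reference cycle is created
        return self._accessor_cls(obj)


class Foo:
    def __init__(self, obj):
        self.obj = obj
        self.x = 1


class Container:
    """Stand-in for DataArray/Dataset."""

    foo = UncachedAccessor(Foo)


a = Container()
a.foo.x = 2      # sets x on a temporary accessor that is discarded right away
value = a.foo.x  # a fresh accessor, so x is back to its initial value

gc.disable()     # rule out a gc pass; rely on reference counting alone
try:
    probe = weakref.ref(a)
    del a
    freed = probe() is None
finally:
    gc.enable()
```

With this design, accessor state is never retained between accesses, so copies and originals behave the same, and the object is freed as soon as its last reference goes away. The cost is that __init__ runs on every access, which is why it should stay cheap.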

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 4
  • Comments: 15 (15 by maintainers)

Top GitHub Comments

fmaussion commented, Aug 27, 2019 (1 reaction)

Interesting, thanks!

As an accessor maintainer, I can assure you that at least one accessor implementation is storing state 😉. But this state is based on the xarray object itself: for example, we derive georeferencing information and store the names of the coordinate variables we know are going to be useful to us later. That is, every new call to __init__ based on a modified object will trigger a new parsing, and we don’t run into the situation you describe above.

Getting rid of the caching logic would mean some performance loss to us, yes, but I don’t know if it’s “worse” than the circular reference issue you describe or not.

gmaze commented, Oct 8, 2019 (0 reactions)

Alright, I think I get it, thanks for the clarification @crusaderky
