Should we cache some small properties?

I was doing some profiling on isel and noticed that some properties which (I think) never change are called frequently. Should we cache these on their objects?

Pandas uses cache_readonly for these cases.

Here’s a case: we call LazilyOuterIndexedArray.shape frequently when doing a simple indexing operation. Each call takes ~150µs. An attribute lookup on a Python object takes ~50ns (i.e. ~3000x faster). IIUC, the result of that property should never change.
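For reference, the standard library's functools.cached_property (Python 3.8+) behaves much like pandas' cache_readonly, though it is a separate implementation. It runs the getter once, stores the result in the instance __dict__, and turns later accesses into plain attribute lookups. The sketch below is illustrative only: LazyArray is a hypothetical toy class, not xarray's LazilyOuterIndexedArray, and caching like this is only safe when the value genuinely never changes after construction.

```python
from functools import cached_property  # Python 3.8+


class LazyArray:
    """Hypothetical toy wrapper; not xarray's LazilyOuterIndexedArray."""

    compute_count = 0  # class-level counter, for illustration only

    def __init__(self, dims):
        self.dims = dims

    @cached_property
    def shape(self):
        # Runs only on the first access. cached_property then stores the
        # result in the instance __dict__ under "shape", so later accesses
        # are ordinary attribute lookups that skip this function entirely.
        type(self).compute_count += 1
        return tuple(int(n) for n in self.dims)


arr = LazyArray([3, 4])
print(arr.shape, arr.shape)      # (3, 4) (3, 4)
print(LazyArray.compute_count)   # 1
print(arr.__dict__["shape"])     # (3, 4)
```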

I don’t think this is the solution to our performance issues, and it adds some complexity. Could these be easy, small wins, though?

Created 4 years ago
Comments: 7 (5 by maintainers)

Top GitHub Comments

crusaderky commented, Nov 12, 2019

Judging by its implementation, cachedproperty needs a __dict__. It should be straightforward to write a variant that uses __slots__, though.
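One possible form for such a variant is sketched below, under stated assumptions. For comparison, the standard library's functools.cached_property raises TypeError on an instance that has no __dict__. The hypothetical slot_cached_property descriptor instead stores the value in a slot that the owning class must declare itself, named _cached_<property name> by convention. That naming scheme is an assumption made for this sketch, not an xarray API, and SlottedArray is a toy class for illustration.

```python
class slot_cached_property:
    """Hypothetical cached-property descriptor for classes that use __slots__.

    The owning class must declare a slot named "_cached_<name>"; the computed
    value is stored there instead of in an instance __dict__.
    """

    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__

    def __set_name__(self, owner, name):
        # Called once when the owning class is created (Python 3.6+).
        self.slot_name = f"_cached_{name}"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself, e.g. for introspection
        try:
            # An unset slot raises AttributeError, meaning "not cached yet".
            return getattr(instance, self.slot_name)
        except AttributeError:
            pass
        value = self.func(instance)
        setattr(instance, self.slot_name, value)
        return value


class SlottedArray:
    """Hypothetical slotted wrapper, standing in for an xarray indexing class."""

    __slots__ = ("dims", "_cached_shape")
    compute_count = 0  # class-level counter, for illustration only

    def __init__(self, dims):
        self.dims = dims

    @slot_cached_property
    def shape(self):
        # Only runs while the _cached_shape slot is still unset.
        type(self).compute_count += 1
        return tuple(int(n) for n in self.dims)


arr = SlottedArray([3, 4])
print(arr.shape, arr.shape)          # (3, 4) (3, 4)
print(SlottedArray.compute_count)    # 1
print(hasattr(arr, "__dict__"))      # False
```

The trade-off in this design is one extra slot per cached property per instance, and the descriptor never invalidates the cached value on its own, so it only suits values that are fixed once the object is built.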

crusaderky commented, Nov 14, 2019

%%prun -s cumulative
for _ in range(10000):
    ds.isel(x=[0])

Output (uncached):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.148    0.000    3.317    0.000 dataset.py:1854(isel)
    [...]
    60000    0.092    0.000    0.122    0.000 indexing.py:535(shape)
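Outside IPython, a similar measurement can be made with the standard-library cProfile and pstats modules. The sketch below uses hypothetical stand-ins, because ds is not defined in the thread: ToyArray replaces the real Dataset, and toy_isel reads shape six times per call, mirroring the 60000 shape calls per 10000 isel calls in the output above.

```python
import cProfile
import io
import pstats


class ToyArray:
    """Hypothetical array wrapper with an uncached, computed `shape`."""

    def __init__(self, dims):
        self.dims = dims

    @property
    def shape(self):
        # Recomputed on every access, like an uncached property.
        return tuple(int(n) for n in self.dims)


def toy_isel(arr):
    # Hypothetical indexing routine that reads `shape` six times per call.
    return [arr.shape for _ in range(6)]


def profile_calls(n=10000):
    """Profile n calls of toy_isel; return the Stats object and a text report."""
    arr = ToyArray((3, 4, 5))
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n):
        toy_isel(arr)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
    stats.print_stats(10)  # top 10 rows by cumulative time, like `%%prun -s cumulative`
    return stats, out.getvalue()


if __name__ == "__main__":
    _, report = profile_calls()
    print(report)
```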
