Should we cache some small properties?

I was doing some profiling on isel and saw that some properties that (I think) never change are called frequently. Should we cache these on their object?

Pandas uses cache_readonly for these cases.

Here’s a case: we call LazilyOuterIndexedArray.shape frequently when doing a simple indexing operation. Each call takes ~150µs. An attribute lookup on a Python object takes ~50ns (i.e. 3000x faster). IIUC the result of that property should never change.

I don’t think this is the solution to our performance issues, and caching adds some complexity. Could these be easy, small wins, though?
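
To make the proposal concrete, here is a minimal sketch of the idea using the standard library's functools.cached_property (Python 3.8+). The LazyArray class, its attributes, and the shape_calls counter are hypothetical stand-ins invented for illustration; this is not xarray's LazilyOuterIndexedArray, and it assumes base_shape and key are never reassigned after construction.

```python
from functools import cached_property


class LazyArray:
    """Hypothetical stand-in for a lazily indexed array wrapper (not xarray's class)."""

    def __init__(self, base_shape, key):
        self.base_shape = tuple(base_shape)
        self.key = tuple(key)  # one slice or integer per dimension
        self.shape_calls = 0   # counts how often the shape is actually computed

    def _compute_shape(self):
        self.shape_calls += 1
        shape = []
        for size, k in zip(self.base_shape, self.key):
            if isinstance(k, slice):
                shape.append(len(range(*k.indices(size))))
            # integer keys drop the dimension, so nothing is appended
        return tuple(shape)

    @property
    def shape_uncached(self):
        # Recomputed on every access.
        return self._compute_shape()

    @cached_property
    def shape(self):
        # Computed on first access, then stored in the instance __dict__.
        return self._compute_shape()


arr = LazyArray((10, 20), (slice(2, 8), 3))
for _ in range(1000):
    arr.shape
print(arr.shape, arr.shape_calls)
```

After the first access, later reads of shape are plain attribute lookups. The trade-off is that the cached value goes stale if the inputs it was derived from are ever mutated, so this only suits properties that are truly fixed for the object's lifetime.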

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
crusaderky commented, Nov 12, 2019

Reading the implementation of cachedproperty, it looks like it needs a __dict__ on the instance. It should be straightforward to write a variant that uses slots, though.
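
A slots-based variant could look something like the sketch below. Helpers such as functools.cached_property store the result in the instance __dict__, which classes that define __slots__ do not have, so this version writes the value into a dedicated slot instead. The names slot_cached_property and Wrapper, and the convention of reserving a slot called "_" plus the property name, are assumptions made for this example and not an existing xarray or pandas API.

```python
class slot_cached_property:
    """Hypothetical cached-property variant for classes that define __slots__."""

    def __init__(self, func):
        self.func = func
        self.slot = "_" + func.__name__  # slot that holds the cached value
        self.__doc__ = func.__doc__

    def __get__(self, obj, owner=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        try:
            # An unset slot raises AttributeError, which signals "not cached yet".
            return getattr(obj, self.slot)
        except AttributeError:
            value = self.func(obj)
            setattr(obj, self.slot, value)
            return value


class Wrapper:
    # "_shape" must be declared so the descriptor has somewhere to store the value.
    __slots__ = ("data", "_shape", "calls")

    def __init__(self, data):
        self.data = data
        self.calls = 0  # counts how often shape is actually computed

    @slot_cached_property
    def shape(self):
        self.calls += 1
        return (len(self.data),)


w = Wrapper([1, 2, 3])
w.shape
w.shape
print(w.shape, w.calls)
```

The cost of this approach is one extra slot per cached property, and each class has to remember to declare it.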

0 reactions
crusaderky commented, Nov 14, 2019
%%prun -s cumulative
for _ in range(10000):
    ds.isel(x=[0])

Output (uncached):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.148    0.000    3.317    0.000 dataset.py:1854(isel)
    [...]
    60000    0.092    0.000    0.122    0.000 indexing.py:535(shape)
