Unbounded Cache causing some Memory Issues
Currently we memoize calls to select functions, but there is no limit to how much we store. Our caches use Immutable.js, which is a bit of a memory hog. Eventually, the cache can grow so large that it causes an out-of-memory error in V8.

Doing some light memory profiling, the amount of memory created by normalizing is pretty heavy. At any rate, if we let the cache grow unchecked, it will eventually hit a size that the browser can’t collect, and it will just crash.

A couple of questions:
- Can we be more selective about which functions we memoize?
- Can we be more selective about when we memoize those functions? Should normalizing cause a memoize?
- Can we prune the cache over time?
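On the pruning question, one low-risk option is to cap the cache size and evict the least-recently-used entries. Here is a minimal sketch (hypothetical: `memoizeLRU` is not Slate’s actual `memoize` util, and a real version would need a keying strategy for Immutable.js arguments rather than `JSON.stringify`):

```javascript
// Hypothetical sketch: a size-bounded, LRU-evicting memoizer.
// A JS Map iterates keys in insertion order, so the first key is the
// least-recently-used one as long as we re-insert on every cache hit.
function memoizeLRU(fn, maxEntries = 10000) {
  const cache = new Map()
  return function (...args) {
    // NOTE: JSON.stringify is only a stand-in key; real Slate arguments
    // are Immutable.js values and would need a proper cache key.
    const key = JSON.stringify(args)
    if (cache.has(key)) {
      const value = cache.get(key)
      cache.delete(key)
      cache.set(key, value) // re-insert to refresh recency
      return value
    }
    const value = fn.apply(this, args)
    cache.set(key, value)
    if (cache.size > maxEntries) {
      // Evict the oldest (least-recently-used) entry.
      cache.delete(cache.keys().next().value)
    }
    return value
  }
}
```

Bounding the cache trades some hit rate for a hard ceiling on memory, which directly addresses the “grow until V8 crashes” failure mode.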
Issue Analytics
- State:
- Created: 6 years ago
- Reactions: 5
- Comments: 21 (14 by maintainers)
Hey folks! I did some more research re: memoization effectiveness to look for some quick wins. I instrumented the `memoize` util to track 1) the number of cache insertions and cache hits for each memoized property in each class, and 2) the computational cost of computing the property (by running the underlying function 20x and timing it), and then just used the editor for a while. (Source: https://github.com/bengotow/slate/blob/memoization-metrics/packages/slate/src/utils/memoize.js#L68)
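An instrumentation of that shape can be sketched like so (a hypothetical simplification of the linked branch: `instrumentedMemoize`, `stats`, and `hitRate` are illustrative names, and the real util wraps model methods and Immutable.js values rather than plain functions):

```javascript
// Hypothetical sketch of instrumenting a memoizer to measure cache
// usefulness: count saves (misses that insert) and reads (hits), and
// time the underlying computation on each miss.
const stats = new Map() // name -> { saves, reads, msSpent }

function instrumentedMemoize(name, fn) {
  const cache = new Map()
  stats.set(name, { saves: 0, reads: 0, msSpent: 0 })
  return function (...args) {
    const key = JSON.stringify(args)
    const s = stats.get(name)
    if (cache.has(key)) {
      s.reads += 1 // cache hit: we read a previously saved value
      return cache.get(key)
    }
    const start = Date.now()
    const value = fn.apply(this, args)
    s.msSpent += Date.now() - start // cost we'd pay without the cache
    s.saves += 1 // cache miss: we computed and saved a new value
    cache.set(key, value)
    return value
  }
}

// The "usefulness" column: the fraction of calls served from cache.
function hitRate(name) {
  const { reads, saves } = stats.get(name)
  return reads / (reads + saves)
}
```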
I think these stats would vary based on the Slate plugins you’re using. The editor I’m testing has a dozen or so instances of `slate-auto-replace`, uses `slate-soft-break` and `slate-edit-list`, and has a custom spellcheck decorator and some other things. Hopefully I’m not doing anything particularly odd and it’s a good representation of the kitchen sink. 😄

Column meanings:
- `Reads / (Reads + Saves)`: the overall “usefulness” of this value being cached.

Based on this data, I believe the following properties are very easy to calculate and are not worth caching (caching them produces little time savings). These are at the bottom of the graph above, which is sorted by `Time Savings`:

The following cached properties have a very low hit rate (<20%), meaning Slate is caching them a lot and using the cached value infrequently, so it’s a poor use of space in RAM:
(`Node.getFragmentAtRange` would have been in these sets, but it has a very high computation time, so it might be an exception to the rule.)

On the other hand, it’s nice to see that things like `Node.getParent` are producing massive speed improvements by being cached: a 95% hit rate and `283s` of computation time saved. I only typed for ~3 min to build these results, so it sounds like memoizing that alone almost doubled performance.

@bengotow this is extremely awesome! Thank you for digging into that. These results are super helpful for thinking about this, and should absolutely help us make some quick savings on memoizing lots of these pieces. Before I get into dissecting, a few questions…
- Is there any way to also factor into the results tables the overall size in memory that is taken up by the different caches? (Or even not inside the table, but just to see which cached methods are the ones taking up the most size over time.)
- Is the “Time to Calculate (ms)” column there the average? (I’m assuming yes, which is great!)
- @bengotow does your use case make heavy use of schemas to normalize the document? I know `slate-edit-list` has schema information, but is there any other schema use in your example?

(Also, I’m going to specifically call out PRs that I’d accept so far from this, just because I think it might make things more clear, since I’ve got a decent amount of context, and with this kind of research it can be hard to know when to trust the data vs. familiarity. Hope that makes sense!)
One of the first things I notice is that we have paired methods where one version returns an immutable value and the other returns the equivalent mutable value, for example `getKeys` and `getKeysAsArray`. We created these, I believe, because working on immutable values often seemed to be slower, so internally we use the mutable arrays whenever possible.

But this definitely adds duplication in the memoization. For example, `getKeysAsArray` is used a lot and has a high cache hit rate. But `getKeys` isn’t used (almost at all) in core, so we’re memoizing those potentially large sets of keys for no reason.

And even if it is read by external code, converting an array of keys to a set probably isn’t that hard, and the result will just be cached at the layer directly below by `getKeysAsArray`, saving all of the iteration over the tree, which is the big cost anyway. So, for the first change…

👌 Change 1: I’d accept a PR that removed memoization on `node.getKeys`, and also changed these two lines in core to use `getKeysAsArray` instead of `getKeys`.

Okay, so continuing on that train of thought, here are the comparisons on time savings for the “arrays vs. immutables” caching:
All but one of them has the array variant saving the most time. Again, this is probably because most of the core usage of these functions works on the array variant, and not the immutable one, for speed reasons.

The only exception is `getActiveMarksAtRange`, where it seems like the immutable variant is actually saving us more time. I think this is because the immutable version is used in the `Value` model directly, so it sees more opportunities for cache hits.

What this tells us (which is already known, but good to remember) is that this data is shaped by how much each code path is exercised. Anything that is used in core for its main calculations is going to see high use, and thus a high chance for cache hits to play a factor. But if there are paths that are heavily depended on in user land but not in core, we might not see them show up. (This is tempered by @bengotow’s use case, which is hopefully a kitchen-sink type of use case.)
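The layering idea behind the `getKeys` / `getKeysAsArray` discussion above can be sketched roughly like this. This is a hypothetical, flattened stand-in for the real `Node` model (which walks a document tree and caches per node), using a native `Set` in place of `Immutable.Set`:

```javascript
// Hypothetical stand-in for the Node model: the expensive iteration
// stays memoized on getKeysAsArray, while getKeys becomes a thin,
// unmemoized conversion on top of the cached array.
const node = {
  nodes: [{ key: 'a' }, { key: 'b' }, { key: 'c' }],
  _cache: new Map(),

  // Memoized: this is where the (potentially expensive) walk lives.
  getKeysAsArray() {
    if (!this._cache.has('getKeysAsArray')) {
      this._cache.set('getKeysAsArray', this.nodes.map(n => n.key))
    }
    return this._cache.get('getKeysAsArray')
  },

  // Unmemoized: converting the cached array to a set is cheap, so
  // caching this result too would mostly duplicate memory.
  getKeys() {
    return new Set(this.getKeysAsArray())
  },
}
```

External callers of `getKeys` still benefit from the cached walk one layer down; the only extra cost per call is the array-to-set conversion.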
Okay, so with that, I think there are two more changes I’d accept:
👌 Change 2: I’d accept a PR that removed the memoization on the immutable variant of any method that also has an `AsArray` variant directly backing it. I believe that list would be:

(I’ve included `getActiveMarksAtRange` in this list because, even though we’re seeing some decent cache hits from it, those hits will simply fall to the `*AsArray` layer below, so we should still retain most of the gains while reducing some of the memory footprint.)

👌 Change 3: I’d accept a PR that looked in the `Node`, `Value` and `Text` models for any methods that were using the non-`AsArray` variant but could be changed to use the `AsArray` variant. This change would ensure that, even after we’ve removed the memoization on the immutable variants, we’re still using the most-cached path. For example, this line is a contender because `getCharactersAtRange` returns an `Immutable.List` and we’re just iterating over it once, so it could easily be swapped for a regular array. (The only tricky part here is if the immutable value is an `Immutable.Set`, in which case the regular array won’t be de-duped, so we can’t rely on it.)

So far, after those changes, using the two lists of removal candidates that @bengotow compiled, we’re left with:
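As an aside, that `Immutable.Set` caveat from Change 3 is worth keeping in mind when swapping variants. A tiny illustration, with native structures standing in for `Immutable.List` / `Immutable.Set` (the mark names are made-up data):

```javascript
// The de-duping caveat: a Set-returning variant de-dupes automatically,
// a plain-array variant does not.
const marks = ['bold', 'italic', 'bold'] // hypothetical duplicate marks

// The Set de-dupes on construction...
const asSet = new Set(marks)

// ...but the plain array keeps duplicates, so any caller that swaps to
// the array for speed must de-dupe explicitly where uniqueness matters.
const deduped = [...new Set(marks)]
```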
For `node.getLastText`, I’m curious about that one. It seems like `getFirstText` is seeing massive savings, and I wonder whether `getLastText` isn’t simply because it’s used less, or whether these code paths just aren’t hitting it here. I’m hesitant to remove the memoization on one but not the other, for consistency.

For `node.validate` and `text.validate`, I’m also curious about these. I’d have assumed that they’d see lots of cache hits if there are recursive schema validations happening. I think we need more information before we can remove these.

Given that, I’ll wait for responses before digging more!
Thank you @bengotow!