StoreCache, a storage class to add arbitrary caches to existing stores
I’ve implemented a simple StoreCache, similar to LRUStoreCache, but it accepts a cache object to use instead of implementing the cache internally. It was created to meet a requirement to cache on disk instead of in memory, and it can be composed with LRUStoreCache to enable layered caching.
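To illustrate the layering idea, here is a minimal sketch. The `StoreCache` below is a stripped-down stand-in for the proposed class, and plain `dict`s stand in for the in-memory cache, the on-disk cache, and the remote store:

```python
# Minimal stand-in for the proposed StoreCache: reads check the cache
# first and fall back to the wrapped store, populating the cache.
class StoreCache:
    def __init__(self, store, cache):
        self._store = store
        self._cache = cache

    def __getitem__(self, key):
        try:
            return self._cache[key]
        except KeyError:
            value = self._store[key]
            self._cache[key] = value
            return value

base_store = {'foo/bar': b'chunk-bytes'}   # stands in for an S3Map
disk_cache = {}                            # stands in for diskcache.Cache()
memory_cache = {}                          # stands in for an in-memory cache

# Layer the caches: memory in front of disk in front of the base store.
layered = StoreCache(StoreCache(base_store, disk_cache), memory_cache)

assert layered['foo/bar'] == b'chunk-bytes'
# After one read, both cache layers hold the chunk.
assert disk_cache['foo/bar'] == b'chunk-bytes'
assert memory_cache['foo/bar'] == b'chunk-bytes'
```

A second read of the same key is then served entirely from the in-memory layer, never touching the disk cache or the base store.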
StoreCache enables a variety of off-the-shelf caches to be used as a chunk cache. The example script below tests diskcache, the builtin dbm, and (with a slight modification) cachey. For each cache, two requests are made to a Zarr dataset on S3:
```python
import dbm
import os
import tempfile
from timeit import timeit

import cachey
import diskcache
import s3fs
import zarr
from zarr.storage import StoreCache

s3 = s3fs.S3FileSystem(anon=True, client_kwargs=dict(region_name='eu-west-2'))
uncached_store = s3fs.S3Map(root='zarr-demo/store', s3=s3, check=False)

disk_cache_store = StoreCache(uncached_store, diskcache.Cache())
dict_cache_store = StoreCache(uncached_store, dict())
# dbm.open needs a file path, so create the database inside a fresh directory
dbm_cache_store = StoreCache(
    uncached_store,
    dbm.open(os.path.join(tempfile.mkdtemp(), 'cache'), flag='c'),
)

# adapt a cache which doesn't support item assignment
class CacheyStoreCache(cachey.Cache):
    def __setitem__(self, k, v):
        self.put(k, v, cost=len(v))

cachey_store = StoreCache(uncached_store, CacheyStoreCache(2**20))

def benchmark(cache_type, store):
    root = zarr.group(store=store)
    f = lambda: root["foo/bar/baz"][:]
    t1 = timeit(f, number=1)
    t2 = timeit(f, number=1)
    print(f'{cache_type}\nt1: {t1}\nt2: {t2}')

benchmark('disk', disk_cache_store)
benchmark('dict', dict_cache_store)
benchmark('dbm', dbm_cache_store)
benchmark('cachey', cachey_store)
benchmark('uncached', uncached_store)
```
Results from my laptop:
```
disk
t1: 1.8804557690000365
t2: 0.0005672639999829698
dict
t1: 1.9392649410001468
t2: 0.0005628700000670506
dbm
t1: 2.0158668909998596
t2: 0.0004138780000175757
cachey
t1: 1.766250748999937
t2: 0.0004628369997590198
uncached
t1: 2.013281605999964
t2: 1.9759216810002727
```
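The `CacheyStoreCache` shim in the script above is an instance of a general pattern: any cache exposing a `put`-style API can be adapted to the item-assignment interface that `StoreCache` uses to populate the cache. A minimal sketch, using a hypothetical `PutOnlyCache` rather than a real library:

```python
# Hypothetical cache exposing only put()/get(), no item assignment.
class PutOnlyCache:
    def __init__(self):
        self._data = {}

    def put(self, key, value, cost):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

# Adapter: add the __setitem__ that StoreCache calls to populate the cache.
class AdaptedCache(PutOnlyCache):
    def __setitem__(self, key, value):
        self.put(key, value, cost=len(value))

cache = AdaptedCache()
cache['foo'] = b'bytes'          # item assignment now works
assert cache.get('foo') == b'bytes'
```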
A sufficient amount of the interface was implemented to allow it to pass the test suite:
```python
from collections.abc import MutableMapping

_sentinel = object()


class StoreCache(MutableMapping):

    def __init__(self, store, cache):
        self._cache = cache
        self._store = store

    def __getitem__(self, key):
        # use a sentinel rather than a truthiness test so that falsy
        # values (e.g. empty bytes) are still served from the cache
        value = self._cache.get(key, _sentinel)
        if value is _sentinel:
            value = self._store[key]
            self._cache[key] = value
        return value

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __contains__(self, key):
        return key in self._cache or key in self._store

    def keys(self):
        return self._store.keys()

    def __delitem__(self, key):
        del self._store[key]
        try:
            del self._cache[key]
        except KeyError:
            # the key may never have been cached
            pass

    def __setitem__(self, key, value):
        self._store[key] = value
        self._cache[key] = value

    def items(self):
        return self._store.items()

    def values(self):
        return self._store.values()
```
If this would be useful then I can submit a pull request. Thanks!
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 7 (4 by maintainers)
That’s a great post; leveraging CDNs for lower-level things like Zarr chunks could be interesting. Our needs are totally aligned on:

> I should never have to pull the data across the cloud boundary more than once in a single session.

Are you looking for a general solution beyond Zarr chunks? I haven’t written anything up about this project yet, but Pangeo Showcase sounds like a good forum in which to share. I’ll check on my end whether that would be OK.

The other thing that might be worth playing with is LRUStoreCache, which was created for this purpose (data crossing the cloud boundary only once). There’s an example in the docs.
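For context, LRUStoreCache evicts least-recently-used chunks once a configured size budget is exceeded. A pure-Python sketch of that eviction policy (this is an illustration of the idea, not zarr's actual implementation), assuming values are bytes:

```python
from collections import OrderedDict

# Sketch of LRU eviction keyed on total value size (not zarr's real code).
class LRUCache:
    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()
        self._size = 0

    def __setitem__(self, key, value):
        if key in self._data:
            self._size -= len(self._data.pop(key))
        self._data[key] = value
        self._size += len(value)
        # evict least-recently-used entries until back under budget
        while self._size > self.max_size:
            _, old = self._data.popitem(last=False)
            self._size -= len(old)

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        return default

cache = LRUCache(max_size=8)
cache['a'] = b'1234'
cache['b'] = b'5678'
cache.get('a')        # touch 'a' so 'b' becomes least recently used
cache['c'] = b'90'    # total size exceeds 8 bytes; 'b' is evicted
assert cache.get('b') is None
assert cache.get('a') == b'1234'
```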