memmap reads from directory store
I've only recently started using zarr, but I'm impressed. Well done.
I want to share an experience and a possible enhancement. In one of my use cases I use vindex heavily across the whole array. I know this is likely a worst-case scenario, as zarr reads many chunks to get a small amount of data from each one. I was previously using numpy memmap arrays for a similar purpose and it was much faster, so I wondered whether an uncompressed DirectoryStore would read chunks as a memmap. No luck: it still reads full chunks. So I had a go at subclassing DirectoryStore to do this.
    import os

    import numpy as np
    import zarr


    class MemMapReadStore(zarr.DirectoryStore):
        """Directory store using memmap for reading chunks."""

        def __getitem__(self, key):
            filepath = os.path.join(self.path, key)
            if os.path.isfile(filepath):
                # Are there only 2 types of files? .zarray and the chunks?
                if key == '.zarray':
                    # Metadata is small, so read it as plain bytes.
                    with open(filepath, 'rb') as f:
                        return f.read()
                else:
                    # Map the chunk read-only instead of copying it into memory.
                    return np.memmap(filepath, mode='r')
            else:
                raise KeyError(key)
It's working well for me, but I don't really know the inner workings of zarr, so who knows what I might have broken and which other features it won't play well with. I thought the idea might be a basis for an enhancement, though. Worth sharing at least.
The speed-up depends on access pattern, compression, etc., but for the example I'm testing I'm seeing a 22x speed-up versus a compressed zarr array of the same dimensions and chunking.
It only works for reads, as that was all I needed, and I see that a write replaces the whole chunk, so memmap writes are not doable.
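To illustrate the read-path difference described above, here is a minimal sketch that uses only the Python standard library, not zarr itself. It writes one uncompressed "chunk" of float64 values to a temporary file, then fetches a single element in two ways: by reading the whole file, and by memory-mapping it so that only the touched pages need to be loaded. The names `write_chunk`, `read_full`, `read_mapped` and the chunk size are assumptions made up for this illustration.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<d")  # one little-endian float64
N_ITEMS = 100_000           # hypothetical chunk size, for illustration only


def write_chunk(path):
    """Write an uncompressed 'chunk' holding the values 0.0, 1.0, 2.0, ..."""
    with open(path, "wb") as f:
        for i in range(N_ITEMS):
            f.write(ITEM.pack(float(i)))


def read_full(path, index):
    """Read the whole file into memory, then decode one element."""
    with open(path, "rb") as f:
        data = f.read()  # copies every byte of the chunk
    return ITEM.unpack_from(data, index * ITEM.size)[0]


def read_mapped(path, index):
    """Memory-map the file read-only and decode one element from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return ITEM.unpack_from(mm, index * ITEM.size)[0]


with tempfile.TemporaryDirectory() as tmp:
    chunk_path = os.path.join(tmp, "0.0")
    write_chunk(chunk_path)
    full_value = read_full(chunk_path, 12_345)
    mapped_value = read_mapped(chunk_path, 12_345)
```

Both functions return the same value; the difference is only in how much of the file has to be brought into memory to get it, which is why the gain depends on the access pattern and disappears once chunks are compressed.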
Issue Analytics
- Created 5 years ago
- Comments: 15 (13 by maintainers)
Put together PR ( https://github.com/zarr-developers/zarr/pull/377 ), which adds the `memmap` option so we can further the discussion by looking at an implementation.
Since PR ( https://github.com/zarr-developers/zarr-python/pull/377 ) was opened, we added PR ( https://github.com/zarr-developers/zarr-python/pull/503 ), which allows users to customize how reading occurs by overriding the staticmethod `_fromfile` of `DirectoryStore`. For example:
This store can then be used with `Group`s and `Array`s.
Given a user can do this on their own easily, have turned this into a doc issue ( https://github.com/zarr-developers/zarr-python/issues/1245 ). Closing this out.
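The example from the last comment is not preserved in this copy of the thread. As a rough sketch of the pattern it describes (overriding a `_fromfile` read hook so that values come back as memory maps), the snippet below uses a hypothetical stand-in class, `FileStore`, so that it runs with the standard library alone. `FileStore` and `MmapFileStore` are names invented for this illustration; with zarr, the subclass would presumably derive from `DirectoryStore` instead, and the exact hook signature may differ between versions.

```python
import mmap
import os
import tempfile


class FileStore:
    """Hypothetical stand-in for a directory-backed store with a read hook."""

    def __init__(self, path):
        self.path = path

    @staticmethod
    def _fromfile(fn):
        # Default behaviour: read the whole file into a bytes object.
        with open(fn, "rb") as f:
            return f.read()

    def __getitem__(self, key):
        filepath = os.path.join(self.path, key)
        if not os.path.isfile(filepath):
            raise KeyError(key)
        return self._fromfile(filepath)


class MmapFileStore(FileStore):
    """Same store, but values are read-only memory maps instead of copies."""

    @staticmethod
    def _fromfile(fn):
        with open(fn, "rb") as f:
            # The mapping remains valid after the file object is closed.
            return memoryview(mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ))


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "0.0"), "wb") as f:
        f.write(b"chunk-bytes")
    value = MmapFileStore(tmp)["0.0"]
    first_five = bytes(value[:5])  # copy out only the bytes that are needed
    value.release()                # drop the mapping before the directory is removed
```

Only the hook changes; key lookup and error handling stay in the base class, which is what makes this kind of customization easy for users to do on their own.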