zarr slower than npy, hdf5 etc?
I got interested in the performance of zarr and compared it with npy, pickle, hdf5 etc.; see https://stackoverflow.com/a/58942584/353337. To my surprise, I found that zarr reads large arrays more slowly than npy. This holds for random float data as well as for more structured mesh data. I had expected zarr to take the cake by using multiple cores. Perhaps this isn't a good test for zarr to show its strengths, either.
Code to reproduce the plot:
import perfplot
import pickle
import numpy
import h5py
import tables
import zarr


def setup(n):
    data = numpy.random.rand(n)
    # import meshzoo
    # n = int(numpy.cbrt(n))
    # points, cells = meshzoo.cube(
    #     xmin=0.0, xmax=1.0, ymin=0.0, ymax=1.0, zmin=0.0, zmax=1.0, nx=n, ny=n, nz=n
    # )
    # data = cells

    # write all files
    #
    numpy.save("out.npy", data)
    #
    f = h5py.File("out.h5", "w")
    f.create_dataset("data", data=data)
    f.close()
    #
    with open("test.pkl", "wb") as f:
        pickle.dump(data, f)
    #
    f = tables.open_file("pytables.h5", mode="w")
    gcolumns = f.create_group(f.root, "columns", "data")
    f.create_array(gcolumns, "data", data, "data")
    f.close()
    #
    zarr.save("out.zip", data)
    zarr.save("out.zarr", data)


def npy_read(data):
    return numpy.load("out.npy")


def hdf5_read(data):
    f = h5py.File("out.h5", "r")
    out = f["data"][()]
    f.close()
    return out


def pickle_read(data):
    with open("test.pkl", "rb") as f:
        out = pickle.load(f)
    return out


def pytables_read(data):
    f = tables.open_file("pytables.h5", mode="r")
    out = f.root.columns.data[()]
    f.close()
    return out


def zarr_zarr_read(data):
    return zarr.load("out.zarr")


def zarr_zip_read(data):
    return zarr.load("out.zip")


b = perfplot.bench(
    setup=setup,
    kernels=[
        npy_read,
        hdf5_read,
        pickle_read,
        pytables_read,
        zarr_zarr_read,
        zarr_zip_read,
    ],
    n_range=[2 ** k for k in range(27)],
    xlabel="len(data)",
    title=f"zarr {zarr.__version__}",
)
b.save("out.png")
b.show()
Issue Analytics
- State:
- Created 4 years ago
- Comments:24 (10 by maintainers)
Afaik, zarr uses blosc compression by default. h5py does not compress by default. Also, h5py does not chunk the data if you don't specify chunks=True (or enable compression). Numpy and pickle neither compress nor chunk; I don't know about pytables. So the comparison is not very fair.
FWIW, when I benchmarked z5, which implements the zarr spec in C++, I found the performance on par with hdf5 single-threaded and better multi-threaded. Unfortunately I don't have the results right now; the code is here.
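A minimal sketch of how the h5py side of that difference could be checked, not part of the original benchmark. It writes the same array three ways and records the chunk shape and compression filter that h5py reports for each dataset. The file name layout.h5 and the dataset names plain, chunked and zipped are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy

data = numpy.random.rand(1000)

with tempfile.TemporaryDirectory() as tmp:
    # "layout.h5" is a throwaway file name used only for this sketch
    path = os.path.join(tmp, "layout.h5")
    with h5py.File(path, "w") as f:
        # default: contiguous storage, no compression
        plain = f.create_dataset("plain", data=data)
        # chunks=True lets h5py pick a chunk shape itself
        chunked = f.create_dataset("chunked", data=data, chunks=True)
        # asking for compression turns chunking on as well
        zipped = f.create_dataset("zipped", data=data, compression="gzip")
        # read the storage properties while the file is still open
        layout = {
            name: (dset.chunks, dset.compression)
            for name, dset in [
                ("plain", plain),
                ("chunked", chunked),
                ("zipped", zipped),
            ]
        }

for name, (chunks, compression) in layout.items():
    print(f"{name}: chunks={chunks}, compression={compression}")
```

Writing out.h5 in the benchmark with chunks=True and a compression filter would bring the hdf5 kernel closer to what zarr does by default, although gzip and blosc are different codecs, so the timings would still not be directly comparable.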
I’ll update the description of this issue. Here the problem is that someone tried to wrap a dask in a zarr, but you should put a zarr in your dask. 😄