Genetics data IO performance stats/doc
This is a dump of some performance experiments. It is part of a larger issue covering performance setup and best practices for dask/sgkit and genetic data. The goal is to share the findings and continue the discussion.
Unless otherwise stated, the test machine is a GCE VM with 16 cores, 64GB of memory, and 400 SPD. The Dask cluster is single-node and process-based. If the data is read from GCS, the bucket is in the same region as the VM:
Specs/libs
➜ ~ uname -a
Linux rav-dev 4.19.0-13-cloud-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux
➜ ~ conda list
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
aiohttp 3.7.2 py38h1e0a361_0 conda-forge
appdirs 1.4.4 pypi_0 pypi
argon2-cffi 20.1.0 py38h1e0a361_2 conda-forge
asciitree 0.3.3 pypi_0 pypi
async-timeout 3.0.1 py_1000 conda-forge
async_generator 1.10 py_0 conda-forge
atk 2.36.0 ha770c72_4 conda-forge
atk-1.0 2.36.0 h0d5b62e_4 conda-forge
attrs 20.2.0 pyh9f0ad1d_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.1 py_0 conda-forge
bleach 3.2.1 pyh9f0ad1d_0 conda-forge
blinker 1.4 py_1 conda-forge
bokeh 2.2.3 py38h578d9bd_0 conda-forge
brotlipy 0.7.0 py38h8df0ef7_1001 conda-forge
ca-certificates 2020.11.8 ha878542_0 conda-forge
cachetools 4.1.1 py_0 conda-forge
cairo 1.16.0 h488836b_1006 conda-forge
cbgen 0.1.6 pypi_0 pypi
certifi 2020.11.8 py38h578d9bd_0 conda-forge
cffi 1.14.3 py38h1bdcb99_1 conda-forge
chardet 3.0.4 py38h924ce5b_1008 conda-forge
click 7.1.2 pyh9f0ad1d_0 conda-forge
cloudpickle 1.6.0 py_0 conda-forge
cryptography 3.2.1 py38h7699a38_0 conda-forge
cython 0.29.21 pypi_0 pypi
cytoolz 0.11.0 py38h25fe258_1 conda-forge
dask 2.30.0 py_0 conda-forge
dask-core 2.30.0 py_0 conda-forge
dask-glm 0.2.0 pypi_0 pypi
dask-ml 1.7.0 pypi_0 pypi
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
distributed 2.30.0 pypi_0 pypi
entrypoints 0.3 py38h32f6830_1002 conda-forge
expat 2.2.9 he1b5a44_2 conda-forge
fasteners 0.15 pypi_0 pypi
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 2.001 hab24e00_0 conda-forge
font-ttf-source-code-pro 2.030 hab24e00_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.13.1 h7e3eb15_1002 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.10.4 h7ca028e_0 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
fsspec 0.8.4 py_0 conda-forge
gcsfs 0.7.1 py_0 conda-forge
gdk-pixbuf 2.42.0 h0536704_0 conda-forge
gettext 0.19.8.1 hf34092f_1004 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
gil-load 0.4.0 pypi_0 pypi
glib 2.66.3 h58526e2_0 conda-forge
gobject-introspection 1.66.1 py38h4eacb9c_3 conda-forge
google-auth 1.23.0 pyhd8ed1ab_0 conda-forge
google-auth-oauthlib 0.4.2 pyhd8ed1ab_0 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
graphviz 2.42.3 h6939c30_2 conda-forge
gtk2 2.24.32 h194ddfc_3 conda-forge
gts 0.7.6 h17b2bb4_1 conda-forge
harfbuzz 2.7.2 ha5b49bf_1 conda-forge
heapdict 1.0.1 py_0 conda-forge
icu 67.1 he1b5a44_0 conda-forge
idna 2.10 pyh9f0ad1d_0 conda-forge
importlib-metadata 2.0.0 py_1 conda-forge
importlib_metadata 2.0.0 1 conda-forge
iniconfig 1.1.1 pypi_0 pypi
ipykernel 5.3.4 py38h1cdfbd6_1 conda-forge
ipython 7.19.0 py38h81c977d_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.5.1 pypi_0 pypi
jedi 0.17.2 py38h32f6830_1 conda-forge
jinja2 2.11.2 pyh9f0ad1d_0 conda-forge
joblib 0.17.0 pypi_0 pypi
jpeg 9d h36c2ea0_0 conda-forge
json5 0.9.5 pyh9f0ad1d_0 conda-forge
jsonschema 3.2.0 py_2 conda-forge
jupyter-server-proxy 1.5.0 py_0 conda-forge
jupyter_client 6.1.7 py_0 conda-forge
jupyter_core 4.6.3 py38h32f6830_2 conda-forge
jupyterlab 2.2.9 py_0 conda-forge
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_server 1.2.0 py_0 conda-forge
lcms2 2.11 hcbb858e_1 conda-forge
ld_impl_linux-64 2.35 h769bd43_9 conda-forge
libblas 3.9.0 3_openblas conda-forge
libcblas 3.9.0 3_openblas conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc-ng 9.3.0 h5dbcf3e_17 conda-forge
libgfortran-ng 9.3.0 he4bcb1c_17 conda-forge
libgfortran5 9.3.0 he4bcb1c_17 conda-forge
libglib 2.66.3 hbe7bbb4_0 conda-forge
libgomp 9.3.0 h5dbcf3e_17 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 3_openblas conda-forge
libopenblas 0.3.12 pthreads_h4812303_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libsodium 1.0.18 h516909a_1 conda-forge
libstdcxx-ng 9.3.0 h2ae2ef3_17 conda-forge
libtiff 4.1.0 h4f3a223_6 conda-forge
libtool 2.4.6 h58526e2_1007 conda-forge
libuuid 2.32.1 h14c3975_1000 conda-forge
libwebp 1.1.0 h76fa15c_4 conda-forge
libwebp-base 1.1.0 h36c2ea0_3 conda-forge
libxcb 1.13 h14c3975_1002 conda-forge
libxml2 2.9.10 h68273f3_2 conda-forge
llvmlite 0.34.0 pypi_0 pypi
locket 0.2.0 py_2 conda-forge
lz4-c 1.9.2 he1b5a44_3 conda-forge
markupsafe 1.1.1 py38h8df0ef7_2 conda-forge
mistune 0.8.4 py38h1e0a361_1002 conda-forge
monotonic 1.5 pypi_0 pypi
msgpack-python 1.0.0 py38h82cb98a_2 conda-forge
multidict 4.7.5 py38h1e0a361_2 conda-forge
multipledispatch 0.6.0 pypi_0 pypi
nbclient 0.5.1 py_0 conda-forge
nbconvert 6.0.7 py38h32f6830_2 conda-forge
nbformat 5.0.8 py_0 conda-forge
ncurses 6.2 he1b5a44_2 conda-forge
nest-asyncio 1.4.1 py_0 conda-forge
notebook 6.1.4 py38h32f6830_1 conda-forge
numba 0.51.2 pypi_0 pypi
numcodecs 0.7.2 pypi_0 pypi
numpy 1.19.3 pypi_0 pypi
oauthlib 3.0.1 py_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openssl 1.1.1h h516909a_0 conda-forge
packaging 20.4 pyh9f0ad1d_0 conda-forge
pandas 1.1.4 pypi_0 pypi
pandoc 2.11.0.4 hd18ef5c_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
pango 1.42.4 h69149e4_5 conda-forge
parso 0.7.1 pyh9f0ad1d_0 conda-forge
partd 1.1.0 py_0 conda-forge
pcre 8.44 he1b5a44_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.0.1 py38h70fbd49_0 conda-forge
pip 20.2.4 py_0 conda-forge
pixman 0.38.0 h516909a_1003 conda-forge
pluggy 0.13.1 pypi_0 pypi
pooch 1.2.0 pypi_0 pypi
prometheus_client 0.8.0 pyh9f0ad1d_0 conda-forge
prompt-toolkit 3.0.8 py_0 conda-forge
psutil 5.7.3 py38h8df0ef7_0 conda-forge
pthread-stubs 0.4 h14c3975_1001 conda-forge
ptyprocess 0.6.0 py_1001 conda-forge
py 1.9.0 pypi_0 pypi
py-spy 0.3.3 pypi_0 pypi
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.7 py_0 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pygments 2.7.2 py_0 conda-forge
pyjwt 1.7.1 py_0 conda-forge
pyopenssl 19.1.0 py_1 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyrsistent 0.17.3 py38h1e0a361_1 conda-forge
pysocks 1.7.1 py38h924ce5b_2 conda-forge
pytest 6.1.2 pypi_0 pypi
python 3.8.6 h852b56e_0_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python-graphviz 0.15 pypi_0 pypi
python_abi 3.8 1_cp38 conda-forge
pytz 2020.1 pypi_0 pypi
pyyaml 5.3.1 pypi_0 pypi
pyzmq 19.0.2 py38ha71036d_2 conda-forge
readline 8.0 he28a2e2_2 conda-forge
rechunker 0.3.1 pypi_0 pypi
requests 2.24.0 pyh9f0ad1d_0 conda-forge
requests-oauthlib 1.3.0 pyh9f0ad1d_0 conda-forge
rsa 4.6 pyh9f0ad1d_0 conda-forge
scikit-learn 0.23.2 pypi_0 pypi
scipy 1.5.3 pypi_0 pypi
send2trash 1.5.0 py_0 conda-forge
setuptools 49.6.0 py38h924ce5b_2 conda-forge
sgkit 0.1.dev290+gb81de07 pypi_0 pypi
simpervisor 0.3 py_1 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sortedcontainers 2.2.2 pypi_0 pypi
sqlite 3.33.0 h4cf870e_1 conda-forge
tblib 1.7.0 pypi_0 pypi
terminado 0.9.1 py38h32f6830_1 conda-forge
testpath 0.4.4 py_0 conda-forge
threadpoolctl 2.1.0 pypi_0 pypi
tk 8.6.10 hed695b0_1 conda-forge
toml 0.10.2 pypi_0 pypi
toolz 0.11.1 py_0 conda-forge
tornado 6.1 py38h25fe258_0 conda-forge
traitlets 5.0.5 py_0 conda-forge
typing-extensions 3.7.4.3 0 conda-forge
typing_extensions 3.7.4.3 py_0 conda-forge
urllib3 1.25.11 py_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.35.1 pyh9f0ad1d_0 conda-forge
widgetsnbextension 3.5.1 pypi_0 pypi
xarray 0.16.1 pypi_0 pypi
xorg-kbproto 1.0.7 h14c3975_1002 conda-forge
xorg-libice 1.0.10 h516909a_0 conda-forge
xorg-libsm 1.2.3 h84519dc_1000 conda-forge
xorg-libx11 1.6.12 h516909a_0 conda-forge
xorg-libxau 1.0.9 h14c3975_0 conda-forge
xorg-libxdmcp 1.1.3 h516909a_0 conda-forge
xorg-libxext 1.3.4 h516909a_0 conda-forge
xorg-libxpm 3.5.13 h516909a_0 conda-forge
xorg-libxrender 0.9.10 h516909a_1002 conda-forge
xorg-libxt 1.1.5 h516909a_1003 conda-forge
xorg-renderproto 0.11.1 h14c3975_1002 conda-forge
xorg-xextproto 7.3.0 h14c3975_1002 conda-forge
xorg-xproto 7.0.31 h14c3975_1007 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
yarl 1.6.2 py38h1e0a361_0 conda-forge
zarr 2.5.0 pypi_0 pypi
zeromq 4.3.3 he1b5a44_2 conda-forge
zict 2.0.0 pypi_0 pypi
zipp 3.4.0 py_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
The issue with suboptimal saturation was originally reported for this code:
import fsspec
import xarray as xr
from sgkit.io.bgen.bgen_reader import unpack_variables
from dask.diagnostics import ProgressBar, ResourceProfiler, Profiler
path = "gs://foobar/data.zarr"
store = fsspec.mapping.get_mapper(path, check=False, create=False)
ds = xr.open_zarr(store, concat_characters=False, consolidated=False)
ds = unpack_variables(ds, dtype='float16')
ds["variant_dosage_std"] = ds["call_dosage"].astype("float32").std(dim="samples")
with ProgressBar(), Profiler() as prof, ResourceProfiler() as rprof:
    ds['variant_dosage_std'] = ds['variant_dosage_std'].compute()
With local input, performance graph:
The cores are clearly well saturated. I also measured the GIL: it was held for 13% of the time and waited on for 2.1%, with each of the 16 worker threads holding it for 0.7% and waiting for 0.1% of the time.
For GCS input (via fsspec):
GIL summary: the GIL was held for 18% of the time and waited on for 3.8%. Each of the 16 worker threads held it for 0.6% and waited for 0.2% of the time, while one thread held it for 6.5% and waited for 1.6% of the time.
held: 0.186 (0.191, 0.187, 0.186)
wait: 0.038 (0.046, 0.041, 0.039)
<140287451305792>
held: 0.015 (0.029, 0.017, 0.015)
wait: 0.002 (0.002, 0.002, 0.002)
<140284185433856>
held: 0.065 (0.061, 0.064, 0.065)
wait: 0.016 (0.015, 0.017, 0.016)
<140284540389120>
held: 0.0 (0.0, 0.0, 0.0)
wait: 0.0 (0.0, 0.0, 0.0)
<140284590728960>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.002 (0.002, 0.002, 0.002)
<140284599121664>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.002 (0.002, 0.002, 0.001)
<140284759570176>
held: 0.006 (0.008, 0.007, 0.007)
wait: 0.002 (0.001, 0.001, 0.002)
<140284751177472>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.002 (0.001, 0.001, 0.002)
<140283956950784>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.001 (0.001, 0.001, 0.001)
<140283948558080>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.001 (0.001, 0.001, 0.001)
<140283940165376>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.002 (0.002, 0.002, 0.002)
<140283931772672>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.002 (0.001, 0.002, 0.002)
<140283923379968>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.002 (0.001, 0.002, 0.002)
<140283914987264>
held: 0.006 (0.007, 0.007, 0.007)
wait: 0.002 (0.001, 0.002, 0.002)
<140283295561472>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.002 (0.002, 0.002, 0.002)
<140283287168768>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.002 (0.002, 0.002, 0.002)
<140283278776064>
held: 0.006 (0.006, 0.007, 0.006)
wait: 0.002 (0.002, 0.002, 0.002)
<140283270383360>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.001 (0.001, 0.001, 0.001)
<140283261990656>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.001 (0.002, 0.002, 0.001)
<140283253597952>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.002 (0.001, 0.001, 0.001)
<140283245205248>
held: 0.0 (0.0, 0.0, 0.0)
wait: 0.0 (0.0, 0.0, 0.0)
<140282691581696>
held: 0.001 (0.0, 0.001, 0.001)
wait: 0.001 (0.001, 0.001, 0.001)
<140282683188992>
held: 0.002 (0.002, 0.002, 0.002)
wait: 0.001 (0.001, 0.001, 0.001)
<140282674796288>
held: 0.001 (0.001, 0.001, 0.001)
wait: 0.003 (0.012, 0.004, 0.003)
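The per-thread figures in a dump like the one above can be pulled out with a few lines of Python, which makes the outlier thread easy to spot. This is a minimal sketch, not part of the original experiments: it assumes the layout seen above (a process-wide `held`/`wait` pair first, then one `<thread id>` marker before each thread's pair), and `parse_gil_dump` is a hypothetical helper name.

```python
import re


def parse_gil_dump(text):
    """Parse 'held:'/'wait:' lines grouped under '<thread id>' markers.

    Lines before the first marker are treated as process-wide totals
    (an assumption based on the layout of the dump above).
    """
    stats = {"total": {}}
    current = "total"
    for line in text.splitlines():
        line = line.strip()
        marker = re.fullmatch(r"<(\d+)>", line)
        if marker:
            # A new thread section starts here.
            current = int(marker.group(1))
            stats[current] = {}
            continue
        m = re.match(r"(held|wait):\s*([\d.]+)", line)
        if m:
            # Keep only the first number; the values in parentheses are ignored.
            stats[current][m.group(1)] = float(m.group(2))
    return stats


# A shortened excerpt of the GCS dump above.
sample = """\
held: 0.186 (0.191, 0.187, 0.186)
wait: 0.038 (0.046, 0.041, 0.039)
<140287451305792>
held: 0.015 (0.029, 0.017, 0.015)
wait: 0.002 (0.002, 0.002, 0.002)
<140284185433856>
held: 0.065 (0.061, 0.064, 0.065)
wait: 0.016 (0.015, 0.017, 0.016)
<140284590728960>
held: 0.006 (0.006, 0.006, 0.006)
wait: 0.002 (0.002, 0.002, 0.002)
"""

stats = parse_gil_dump(sample)
threads = {tid: s for tid, s in stats.items() if tid != "total"}
# The thread that held the GIL for the largest fraction of time.
busiest = max(threads, key=lambda tid: threads[tid]["held"])
print(busiest, threads[busiest])
```

On the excerpt, the thread picked out is the same one called out in the GIL summary (6.5% held, 1.6% wait).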
The CPU usage is clearly lower and not fully saturated, and GIL wait time is up slightly (with a concerning spike in one thread). With remote/fsspec input, we have the overhead of data decryption and potential network IO overhead (though it doesn’t seem like we hit network limits).
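One way to separate IO cost from compute is to time raw chunk reads from the store on their own, without any decoding or Dask graph. The sketch below is an illustration under assumptions, not something taken from the experiments: `measure_read_throughput` is a hypothetical helper, it works on any mapping from keys to bytes, and the chunk keys in the stand-in store are made up. The fsspec mapper from the snippet above behaves like such a mapping and could be passed in instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Mapping


def measure_read_throughput(
    store: Mapping[str, bytes], keys: Iterable[str], n_threads: int = 16
) -> dict:
    """Fetch every key from `store` with a thread pool and report raw throughput."""
    keys = list(keys)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Only the size of each fetched chunk is kept; nothing is decoded.
        sizes = list(pool.map(lambda k: len(store[k]), keys))
    elapsed = time.perf_counter() - start
    total = sum(sizes)
    return {
        "n_chunks": len(keys),
        "bytes": total,
        "seconds": elapsed,
        "mib_per_s": total / 2**20 / elapsed if elapsed > 0 else float("inf"),
    }


# Stand-in for a real store (illustrative keys, 1 KiB per chunk).
fake_store = {f"call_dosage/{i}.0": bytes(1024) for i in range(8)}
result = measure_read_throughput(fake_store, fake_store.keys(), n_threads=4)
print(result["n_chunks"], result["bytes"])
```

Comparing the measured throughput against the VM's expected network bandwidth gives a rough indication of whether the reads themselves are the limiting factor.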
Top GitHub Comments
A lot to digest here, thanks for the great work @ravwojdyla!
Based on the performance tests done above, here are some high-level guidelines for Dask performance experiments (this is a starting point; we might find a better home for this later, and potentially have someone from Dask review them):
TODO: