Large memory increase and processing slowness during graph creation

What happened:

When creating a graph of delayed dataframes built from 20k image blocks, memory climbs past 10 GB and graph creation takes several minutes, even though the graph is never executed. I experimented with fewer blocks, and the scaling is clearly non-linear. Processing time and memory usage (measured via GKE JupyterHub pod memory usage) by number of blocks:

1000  =>   3s
2000  =>   9s
4000  =>  28s, 0.6GB
8000  =>  93s, 2.4GB
16000 => 324s, 9.1GB
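
To put a number on "non-linear", the local scaling exponent between consecutive measurements can be estimated from the table above. This is only a quick sketch over the reported figures (the helper name `local_exponents` is made up for illustration); an exponent of 1.0 would mean linear scaling and 2.0 quadratic.

```python
import math

# Figures copied from the table above; memory was only reported from 4000 blocks up.
blocks = [1000, 2000, 4000, 8000, 16000]
seconds = [3, 9, 28, 93, 324]
memory_gb = {4000: 0.6, 8000: 2.4, 16000: 9.1}


def local_exponents(xs, ys):
    """Estimate k in y ~ x**k between each pair of consecutive points."""
    pairs = list(zip(xs, ys))
    return [
        math.log(y2 / y1) / math.log(x2 / x1)
        for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])
    ]


time_exponents = local_exponents(blocks, seconds)
memory_exponents = local_exponents(list(memory_gb), list(memory_gb.values()))

print("time exponents:  ", [round(k, 2) for k in time_exponents])
print("memory exponents:", [round(k, 2) for k in memory_exponents])
```

On these numbers the time exponents land between roughly 1.6 and 1.8, and the memory exponents sit close to 2, which points to something per-block that grows with the total block count.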

What you expected to happen:

I expected graph creation for 20k blocks to finish in under a minute and use under a gigabyte. I also expected performance to scale linearly with the number of blocks, since the blocks are independent (as can be verified with .visualize(..) on a single dataframe).

Minimal Complete Verifiable Example:

A scaled-down example with 8k blocks, which is enough to demonstrate the memory growth. My actual dataset has 20k blocks, roughly 3TB divided into 150MB blocks, but the problem seems to be purely a function of block count, not data size.

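import dask
import dask.array as da
import dask.dataframe as dd
import numpy as np

# 8000 single-element chunks stand in for the 8k image blocks
image = da.zeros(8000, dtype=np.uint16, chunks=1)
# Pair each block index with the Delayed object for that block
block_iter = zip(np.ndindex(*image.numblocks), image.to_delayed().flatten())

# One dask dataframe per block; only the graph is built, nothing is computed
ddf_all = np.empty(image.numblocks, dtype=object)
for idx_chunk, chunk in block_iter:
    # Placeholder for the real per-block processing step
    ddf_delayed = dask.delayed(lambda x: None)(chunk)
    ddf_all[idx_chunk] = dd.from_delayed(ddf_delayed, meta=[("z", np.float32)])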
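
For anyone who wants to reproduce the timing table without a pod memory dashboard, a small standard-library harness along these lines could be used. It is a sketch under assumptions: `measure` and `build_stand_in` are hypothetical names, the stand-in builder is not the dask code, and `tracemalloc` only sees allocations made through Python's allocator, so its peak will not match pod-level memory exactly.

```python
import time
import tracemalloc


def measure(build, n_blocks):
    """Return (seconds, peak_bytes) for a single call to build(n_blocks)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = build(n_blocks)  # hold a reference so the result stays alive while measuring
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return elapsed, peak


def build_stand_in(n_blocks):
    # Hypothetical placeholder: swap this body for the reproducer loop above,
    # using n_blocks in place of the hard-coded 8000.
    return [{"block": i} for i in range(n_blocks)]


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        seconds, peak = measure(build_stand_in, n)
        print(f"{n:>6} blocks: {seconds:.3f}s, peak {peak / 1e6:.1f} MB")
```

Tracing adds overhead of its own, so the timings are best read relative to each other across block counts, not as absolute numbers.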

Environment:

Dask version: 2021.06.2
Python version: 3.8.10
Operating System: Ubuntu 18.04
Install method (conda, pip, source): conda

Top GitHub Comments

gjoseph92 commented, Sep 27, 2021

Ah that makes sense, every from_delayed probably had to materialize and merge the full un-culled graph of the array. Nice! I had no idea I was fixing this.
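
The explanation above can be illustrated with a toy model. This is not dask's actual graph machinery, and the task names and function names are invented; it only counts how many graph entries get copied when every per-block dataframe carries the full upstream graph versus just the one task it depends on.

```python
def entries_without_culling(n_blocks):
    """Every per-block graph merges the full n-task upstream graph."""
    full_graph = {("zeros", i): ("make_block", i) for i in range(n_blocks)}
    copied = 0
    for i in range(n_blocks):
        per_block = dict(full_graph)  # copy of the whole un-culled graph
        per_block[("frame", i)] = ("to_frame", ("zeros", i))
        copied += len(per_block)
    return copied


def entries_with_culling(n_blocks):
    """Every per-block graph keeps only the single task it depends on."""
    full_graph = {("zeros", i): ("make_block", i) for i in range(n_blocks)}
    copied = 0
    for i in range(n_blocks):
        key = ("zeros", i)
        per_block = {key: full_graph[key]}  # culled down to one dependency
        per_block[("frame", i)] = ("to_frame", key)
        copied += len(per_block)
    return copied


for n in (1000, 2000, 4000):
    print(n, entries_without_culling(n), entries_with_culling(n))
```

In this model the un-culled count is n * (n + 1), so doubling the block count roughly quadruples the work, while the culled count is 2 * n and stays linear. That matches the shape of the reported numbers, though the real cause in dask was only described as probable in the comment above.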

chrisroat commented, Sep 27, 2021

#8174 by @gjoseph92 fixed this issue. In my testing, the 8000-block reproducer finished in <8s with minimal memory usage.