Use IO-Subgraph + Blockwise throughout codebase

Previously blocked by #6715 and #7281 (both blockers have since been struck out).

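After #7281 is merged, most (maybe all) IO operations in the dataframe and array modules can be converted into Blockwise high-level graph (HLG) layers. This should make HLG optimizations such as blockwise fusion more effective, because the IO layers themselves become candidates for fusion with the Blockwise operations that follow them.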
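To make the fusion argument concrete, here is a toy model in plain Python that does not use Dask. `ToyLayer`, `fuse_blockwise`, and the `read_partition`/`add_one`/`double` functions are hypothetical names. The model only handles a linear chain of layers, and Dask's real `Blockwise` fusion is more involved. It assumes that a non-Blockwise (materialized) IO layer acts as a fusion barrier, while a Blockwise IO layer can be merged with the Blockwise operations after it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyLayer:
    """Hypothetical stand-in for an HLG layer (not Dask's API).

    blockwise=True means output partition i depends only on partition i
    of the previous layer, so it may be fused with a blockwise neighbour.
    For an IO layer, func maps a partition index to that partition's data.
    """
    name: str
    func: Callable
    blockwise: bool


def fuse_blockwise(layers: List[ToyLayer]) -> List[ToyLayer]:
    """Merge each run of consecutive blockwise layers into a single layer."""
    fused: List[ToyLayer] = []
    for layer in layers:
        prev = fused[-1] if fused else None
        if prev is not None and prev.blockwise and layer.blockwise:
            # Compose the two functions: apply prev.func first, then layer.func.
            f, g = prev.func, layer.func
            fused[-1] = ToyLayer(
                name=f"{prev.name}+{layer.name}",
                func=lambda x, f=f, g=g: g(f(x)),
                blockwise=True,
            )
        else:
            # Non-blockwise layers (or the first layer) start a new group.
            fused.append(layer)
    return fused


def read_partition(i):
    # Hypothetical "IO": partition index -> a small list of values.
    return list(range(i * 3, i * 3 + 3))


def add_one(part):
    return [v + 1 for v in part]


def double(part):
    return [v * 2 for v in part]


# IO as an opaque, materialized layer: fusion stops at the IO boundary.
opaque = [ToyLayer("read", read_partition, blockwise=False),
          ToyLayer("add", add_one, blockwise=True),
          ToyLayer("double", double, blockwise=True)]

# IO as a Blockwise layer: the whole chain collapses into one layer.
blockwise_io = [ToyLayer("read", read_partition, blockwise=True),
                ToyLayer("add", add_one, blockwise=True),
                ToyLayer("double", double, blockwise=True)]

print([layer.name for layer in fuse_blockwise(opaque)])        # ['read', 'add+double']
print([layer.name for layer in fuse_blockwise(blockwise_io)])  # ['read+add+double']
print(fuse_blockwise(blockwise_io)[0].func(1))                 # [8, 10, 12]
```

In the opaque case every partition still needs a separate read task feeding a fused compute task. With Blockwise IO, each partition becomes a single task that reads and transforms its data in one step.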

This issue is intended as a “living” checklist of action items related to Blockwise IO…

Dask-Dataframe:

read_parquet [#7415]
read_csv (via text_blocks_to_pandas) [#7415]
read_orc [#7415]
make_timeseries [#7615]
daily_stock [#7615]
read_hdf [#7625]
from_array
from_pandas
from_bcolz
from_dask_array
from_delayed
read_json

Dask-Array:

from_array
from_zarr (may be covered by from_array)
from_delayed
from_func
empty_like
ones_like [#7281]
zeros_like [#7281]
full_like [#7281]
linspace
arange
eye
diag
diagonal
triu
tril
fromfunction [#7704]
repeat
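Most entries in the two lists above create partitions from scratch (from files, arrays, or functions). The sketch below shows roughly what an "IO subgraph" layer stores, under one assumption: it keeps the per-partition inputs (for example, file paths) plus a single IO function, and generates each `(name, i)` task on demand instead of holding one materialized task per partition. `ToyIOLayer`, `fake_read`, `run`, and the file names are all hypothetical, and Dask's actual layer classes differ in detail.

```python
from typing import Any, Callable, Dict, Iterator, Sequence, Tuple

Key = Tuple[str, int]


class ToyIOLayer:
    """Hypothetical IO-subgraph layer (not Dask's API).

    Stores one IO function and a list of per-partition inputs; the task
    for key (name, i) is (io_func, inputs[i]), built only when asked for.
    """

    def __init__(self, name: str, io_func: Callable[[Any], Any], inputs: Sequence[Any]):
        self.name = name
        self.io_func = io_func
        self.inputs = list(inputs)

    def __len__(self) -> int:
        return len(self.inputs)

    def keys(self) -> Iterator[Key]:
        return ((self.name, i) for i in range(len(self.inputs)))

    def task(self, key: Key) -> Tuple[Callable, Any]:
        # Task tuple in the classic Dask style: (function, argument).
        name, i = key
        if name != self.name:
            raise KeyError(key)
        return (self.io_func, self.inputs[i])

    def materialize(self) -> Dict[Key, Tuple[Callable, Any]]:
        # Only needed when a consumer requires the full low-level graph.
        return {k: self.task(k) for k in self.keys()}


def fake_read(path: str) -> str:
    # Stand-in for e.g. a CSV or Parquet reader.
    return f"<frame from {path}>"


def run(task: Tuple[Callable, Any]) -> Any:
    func, arg = task
    return func(arg)


paths = [f"part-{i}.csv" for i in range(4)]  # hypothetical file names
layer = ToyIOLayer("read-csv", fake_read, paths)
print(run(layer.task(("read-csv", 2))))  # <frame from part-2.csv>
```

Keeping the layer in this compact form is what lets graph optimizations reason about it as a whole (and fuse it) before any per-partition tasks are built.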


Top GitHub Comments


jrbourbeau commented, Jun 3, 2021:

> Is it worth creating a doc page explaining how people go about implementing an HLG operation themselves?

Cross-referencing https://github.com/dask/dask/issues/7709 and https://github.com/dask/dask/issues/7755 for updating documentation around HLGs.


jakirkham commented, May 26, 2021:

Is it worth creating a doc page explaining how people go about implementing an HLG operation themselves? Maybe that would help people feel more comfortable filling in the gaps in the codebase, not to mention writing their own HLG operations in their own code.