
Use Blockwise/`map_partitions` in various DataFrame join methods


I noticed that some join methods have things like

        dsk = {
            (name, i): (apply, merge_chunk, [left_key, right_key], kwargs)
            for i, right_key in enumerate(right.__dask_keys__())
        }

where we’re generating a low-level graph that could just be expressed with `map_partitions`. Using `map_partitions` in these scenarios would both speed up graph transmission and allow for blockwise fusion across the operations. Refactoring these simple sorts of graphs should be straightforward.
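To make the point concrete, here is a minimal stdlib-only sketch of a dask-style graph (a dict mapping keys to task tuples) with a toy executor. `toy_get`, `merge_chunk` and `map_partitions_graph` are hypothetical stand-ins, not dask's implementations, and the toy task format `(func, *arg_keys)` replaces the `(apply, func, [keys], kwargs)` form in the snippet above. It only shows that the hand-written dict comprehension is exactly "apply one function to each partition", which is what a blockwise/`map_partitions` layer expresses.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

Graph = Dict[Hashable, Any]


def toy_get(graph: Graph, key: Hashable) -> Any:
    """Evaluate `key` in a toy dask-style graph (no caching; illustration only).

    A task is a tuple ``(func, *arg_keys)`` whose arguments are all graph keys;
    any other value is treated as literal data (here, a partition).
    """
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *arg_keys = task
        return func(*(toy_get(graph, k) for k in arg_keys))
    return task


def merge_chunk(left: List[Tuple[int, str]], right: List[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """Hypothetical stand-in for a per-partition inner join on the first field."""
    lookup = dict(left)
    return [(k, lookup[k], v) for k, v in right if k in lookup]


def map_partitions_graph(name: str, func: Callable, *inputs: List[Hashable]) -> Graph:
    """Hypothetical helper: one task per output partition, zipping input keys.

    An input with a single partition is broadcast to every output partition,
    as single_partition_join needs for its one-partition side.
    """
    nparts = max(len(keys) for keys in inputs)
    return {
        (name, i): (func, *(keys[0] if len(keys) == 1 else keys[i] for keys in inputs))
        for i in range(nparts)
    }


# One "left" partition and three "right" partitions, stored as literal data.
data: Graph = {
    ("left", 0): [(1, "a"), (2, "b")],
    ("right", 0): [(1, "x")],
    ("right", 1): [(2, "y"), (3, "z")],
    ("right", 2): [(1, "w")],
}
right_keys = [("right", i) for i in range(3)]

# Hand-written low-level layer, shaped like the dict comprehension in the issue.
low_level = {
    ("join", i): (merge_chunk, ("left", 0), right_key)
    for i, right_key in enumerate(right_keys)
}

# The same layer, produced by the generic per-partition helper.
blockwise = map_partitions_graph("join", merge_chunk, [("left", 0)], right_keys)
assert blockwise == low_level

graph = {**data, **blockwise}
print([toy_get(graph, ("join", i)) for i in range(3)])
# [[(1, 'a', 'x')], [(2, 'b', 'y')], [(1, 'a', 'w')]]
```

In dask itself the per-partition version would become a `Blockwise` layer in a `HighLevelGraph`, which the graph optimizer can fuse with neighbouring blockwise layers; a hand-written dict layer is opaque to that optimization.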

- `single_partition_join`
- `hash_join`’s `merge_chunk`
- `stack_partitions` should use `HighLevelGraph.from_collections` instead of merging all of the input graphs
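The difference in the last item can be sketched with a toy layered graph. `ToyLayeredGraph`, `ToyCollection`, `from_collections` and `merge_graphs` below are hypothetical stand-ins that only mimic the idea behind dask's `HighLevelGraph.from_collections` (reuse the inputs' layers and record one new layer with dependency edges); they are not dask's classes. Merging everything into one flat dict produces the same tasks but loses the layer structure that blockwise optimizations work on.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, List, Mapping, Set

Layer = Mapping[Hashable, Any]


@dataclass
class ToyLayeredGraph:
    """Hypothetical toy model of a layered graph; not dask's HighLevelGraph."""
    layers: Dict[str, Layer]
    dependencies: Dict[str, Set[str]]

    def to_dict(self) -> Dict[Hashable, Any]:
        """Flatten every layer into one low-level dict."""
        flat: Dict[Hashable, Any] = {}
        for layer in self.layers.values():
            flat.update(layer)
        return flat


@dataclass
class ToyCollection:
    name: str  # name of the layer that produces this collection's partitions
    graph: ToyLayeredGraph


def from_collections(name: str, layer: Layer, deps: List[ToyCollection]) -> ToyLayeredGraph:
    """Reuse the input layers (by reference) and add one new layer on top."""
    layers: Dict[str, Layer] = {}
    dependencies: Dict[str, Set[str]] = {}
    for coll in deps:
        layers.update(coll.graph.layers)
        dependencies.update(coll.graph.dependencies)
    layers[name] = layer
    dependencies[name] = {coll.name for coll in deps}
    return ToyLayeredGraph(layers, dependencies)


def merge_graphs(name: str, layer: Layer, deps: List[ToyCollection]) -> ToyLayeredGraph:
    """The pattern to avoid: flatten all inputs into one opaque layer."""
    flat: Dict[Hashable, Any] = {}
    for coll in deps:
        flat.update(coll.graph.to_dict())
    flat.update(layer)
    return ToyLayeredGraph({name: flat}, {name: set()})


def leaf(name: str, nparts: int) -> ToyCollection:
    """A source collection whose partitions are literal lists."""
    layer = {(name, i): [i] for i in range(nparts)}
    return ToyCollection(name, ToyLayeredGraph({name: layer}, {name: set()}))


a, b = leaf("a", 2), leaf("b", 1)
# Hypothetical stack layer: output partitions alias the input partitions in order.
stack = {("stack", 0): ("a", 0), ("stack", 1): ("a", 1), ("stack", 2): ("b", 0)}

layered = from_collections("stack", stack, [a, b])
flat = merge_graphs("stack", stack, [a, b])

print(sorted(layered.layers), sorted(layered.dependencies["stack"]))  # ['a', 'b', 'stack'] ['a', 'b']
print(len(flat.layers))  # 1
```

Both graphs flatten to the same tasks; only the layered one still tells an optimizer which layer depends on which.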

cc @rjzamora @ncclementi @jrbourbeau

Created 2 years ago · 5 comments (3 by maintainers)

Top GitHub Comments


gjoseph92 commented, Nov 3, 2021

I just realized this is slightly more important than I’d originally thought, because now that low-level optimization is turned off for DataFrames, the only way we get task fusion is through HighLevelGraphs. So even simple linear chains won’t be fused, exposing us to root task overproduction (https://github.com/dask/distributed/issues/5223).
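Why fusion matters for a simple linear chain can be shown with another toy graph. The sketch below is hypothetical and fuses per task (one task per partition runs the whole chain), whereas dask's HighLevelGraph path fuses whole `Blockwise` layers; it only illustrates that fusing a chain cuts the task count and removes the intermediate outputs between steps.

```python
from typing import Any, Callable, Dict, Hashable, List, Sequence

Graph = Dict[Hashable, Any]


def get(graph: Graph, key: Hashable) -> Any:
    """Toy executor: a task is (func, *arg_keys); anything else is literal data."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *arg_keys = task
        return func(*(get(graph, k) for k in arg_keys))
    return task


def count_tasks(graph: Graph) -> int:
    """Number of entries that are tasks rather than literal partitions."""
    return sum(isinstance(v, tuple) and bool(v) and callable(v[0]) for v in graph.values())


def chain_graph(parts: List[Any], funcs: Sequence[Callable[[Any], Any]]) -> Graph:
    """Unfused: one task per (step, partition), each reading the previous step's output."""
    graph: Graph = {("input", i): p for i, p in enumerate(parts)}
    prev = "input"
    for step, f in enumerate(funcs):
        name = f"step{step}"
        for i in range(len(parts)):
            graph[(name, i)] = (f, (prev, i))
        prev = name
    return graph


def fused_chain_graph(parts: List[Any], funcs: Sequence[Callable[[Any], Any]]) -> Graph:
    """Fused: one task per partition that runs the whole chain in-process."""
    def run_chain(x: Any) -> Any:
        for f in funcs:
            x = f(x)
        return x

    graph: Graph = {("input", i): p for i, p in enumerate(parts)}
    for i in range(len(parts)):
        graph[("fused", i)] = (run_chain, ("input", i))
    return graph


parts = [[1, 2], [3], [4, 5, 6]]
funcs = [lambda p: [x * 10 for x in p], lambda p: sum(p)]

unfused = chain_graph(parts, funcs)
fused = fused_chain_graph(parts, funcs)
last = f"step{len(funcs) - 1}"
assert [get(unfused, (last, i)) for i in range(3)] == [get(fused, ("fused", i)) for i in range(3)]
print(count_tasks(unfused), count_tasks(fused))  # 6 3
```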

For example, this means that a single_partition_join followed by a map_partitions operation may have worse memory performance than doing the join yourself within a map_partitions, since lots of extra single_partition_join outputs can accumulate in memory.
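The memory effect can be sketched by counting how many join outputs are alive at once under two execution orders. This is a deliberately simplified toy model (the schedules and `peak_live_intermediates` are hypothetical, not the distributed scheduler's behaviour, which depends on task priorities and worker count): the "overproduced" order runs every cheap join before any consumer, while the fused order consumes each join output right away.

```python
from typing import List, Tuple


def peak_live_intermediates(schedule: List[Tuple[str, int]]) -> int:
    """Maximum number of join outputs held in memory at the same time.

    ("join", i) produces an intermediate; ("map", i) consumes and frees it.
    """
    live = set()
    peak = 0
    for op, i in schedule:
        if op == "join":
            live.add(i)
        elif op == "map":
            live.discard(i)
        peak = max(peak, len(live))
    return peak


n = 8
# Unfused, worst case: all join tasks run before their map_partitions consumers.
overproduced = [("join", i) for i in range(n)] + [("map", i) for i in range(n)]
# Fused: each partition's join output is consumed immediately by the same task.
fused = [step for i in range(n) for step in (("join", i), ("map", i))]

print(peak_live_intermediates(overproduced))  # 8
print(peak_live_intermediates(fused))  # 1
```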

cc @jrbourbeau @ncclementi


jrbourbeau commented, Nov 15, 2021

Re-opening to continue tracking this here. I’ve also updated the original post to be a checklist instead of a bulleted list (hope that’s okay).