Optimize tensordot with rechunk
Dask.array tensordot operations can be made significantly faster by doing an initial rechunk pass, making all axes over which we intend to contract single-chunked.
For example, if we have the following block-chunked arrays:
```python
import dask.array as da

x = da.random.random((500, 500, 500), chunks=(50, 50, 50))
y = da.random.random((500, 100), chunks=(50, 50))
```
And we want to contract over the 1st axis of `x` and the 0th axis of `y`, respectively:

```python
da.tensordot(x, y, axes=[1, 0]).visualize()
```
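To see why the rechunk pass helps, consider the naive graph: with these chunks each output block must sum ten partial products along the contracted axis. A quick check, as a sketch using the arrays defined above:

```python
result = da.tensordot(x, y, axes=[1, 0])
print(result.numblocks)  # (10, 10, 2): 200 output blocks, each of which
                         # sums 10 partial products along the contracted axis
```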
Then we may want to preemptively rechunk so that those axes have only one chunk as follows:
```python
# Rechunk the contracted axes (axis 1 of x, axis 0 of y) into single chunks,
# leaving the other axes unchanged (the dict keys are axis numbers):
x2 = x.rechunk({1: 500}).persist()
y2 = y.rechunk({0: 500}).persist()
```
We may also want to shrink the chunks along the other axes while we expand the contracted ones, both to avoid producing input chunks that are too large and to keep the resulting tensordot output chunks from becoming too large. In this case, though, the chunks are small enough that even a 10x increase in size is acceptable, so we leave the other dimensions as-is:
```python
x2 = x.rechunk((50, 500, 50)).persist()
y2 = y.rechunk((500, 50)).persist()
```
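To make the 10x figure concrete (assuming float64 data, 8 bytes per element): the original chunks of `x` hold 50 * 50 * 50 elements, about 1 MB each, while the rechunked ones hold 50 * 500 * 50 elements, about 10 MB each. We can verify this on the arrays themselves:

```python
print(x.chunksize, x2.chunksize)              # (50, 50, 50) (50, 500, 50)
mb = 50 * 500 * 50 * x2.dtype.itemsize / 1e6
print(mb, "MB per chunk of x2")               # 10.0 MB per chunk of x2
```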
This does incur some communication costs up front, but it will generally save us more communication down the line.
So I think the question here is the following:
Given the chunks of both arrays and the `axes=` argument, how should we rechunk these arrays prior to the normal tensordot call? The rechunking should both expand the chunks along the contracted axes to their full extent and possibly shrink the chunks along the other dimensions, based on the expected nbytes of the output of the tensordot call.
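One possible shape for an answer, as a minimal sketch: a hypothetical helper (not part of dask) that fully expands the contracted axes and then repeatedly halves the widest free axis until each input chunk fits a byte budget. It only budgets the input chunks, not the expected output nbytes discussed above:

```python
import math
import dask.array as da

def rechunk_for_tensordot(x, y, axes, target_bytes=100e6):
    """Illustrative heuristic: single-chunk the contracted axes,
    then shrink free axes until one chunk fits under target_bytes."""
    x_axes, y_axes = axes
    x_axes = (x_axes,) if isinstance(x_axes, int) else tuple(x_axes)
    y_axes = (y_axes,) if isinstance(y_axes, int) else tuple(y_axes)

    def plan(arr, contracted):
        # Expand contracted axes to single chunks; keep current sizes elsewhere.
        chunks = {i: (arr.shape[i] if i in contracted else max(c))
                  for i, c in enumerate(arr.chunks)}
        # Halve the widest free axis until a single chunk fits the budget.
        while math.prod(chunks.values()) * arr.dtype.itemsize > target_bytes:
            free = [i for i in chunks if i not in contracted and chunks[i] > 1]
            if not free:
                break
            widest = max(free, key=chunks.get)
            chunks[widest] = max(1, chunks[widest] // 2)
        return arr.rechunk(chunks)

    return plan(x, x_axes), plan(y, y_axes)

# Usage with the example arrays from above:
x2, y2 = rechunk_for_tensordot(x, y, axes=[1, 0])
result = da.tensordot(x2, y2, axes=[1, 0])
```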
Snagged this trick from this talk: https://youtu.be/dcT6c-PrloE?t=1584
cc @jcrist and @shoyer who might find this interesting. cc @pitrou who did the rechunk logic and might be able to recommend something.
Top GitHub Comments
You can write/implement any `einsum` as a batch of independent GEMMs, so it should be able to benefit from the same fast kernels that `tensordot` does. I also agree that `tensordot` is still worth having for those cases where you don't need the full flexibility of an `einsum`. Under the covers it could simply call `einsum` though.

I think `einsum` is an operation where dask could really shine as a distributed scheduler, because it is an example of a powerful tool where the distribution and scheduling can make a big difference in performance.

Thank you for sharing these diagrams @GenevieveBuckley. @tomwhite, do these match your operational experience?
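To make the point about expressing `tensordot` via `einsum` concrete, a minimal sketch using the example arrays from above (dask provides `da.einsum`, which follows the `np.einsum` subscript notation):

```python
import dask.array as da

x = da.random.random((500, 500, 500), chunks=(50, 50, 50))
y = da.random.random((500, 100), chunks=(50, 50))

# Contracting axis 1 of x with axis 0 of y, written both ways:
a = da.tensordot(x, y, axes=[1, 0])
b = da.einsum("ijk,jl->ikl", x, y)
assert a.shape == b.shape == (500, 500, 100)
```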