dask.array.jit or dask.array.vectorize

When working with dask-glm I find myself interacting with functions like the following (where x is a dask.array):

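from dask.array import absolute, sign
lamda = 0.1  # regularization strength; illustrative value, not from the original issue

def l2(x, t):
    # L2 case: scale every entry of x toward zero.
    return 1 / (1 + lamda * t) * x

def l1(x, t):
    # L1 case: soft thresholding; entries with |x| <= lamda * t become zero.
    return (absolute(x) > lamda * t) * (x - sign(x) * lamda * t)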

These are costly in a few ways:

  1. They carry noticeable overhead, because each call regenerates a relatively large graph
  2. At compute time, even with task fusion, they create many intermediate copies of NumPy arrays

So there are two partial solutions that we could combine here:

  1. For any given dtype/shape/chunks signature, we could precompute a dask graph. When the same dtype/shape/chunks signature comes in we would stitch the new keys in at the right place, change around some tokenized values, and ship the result out without calling all of the dask.array code.
  2. We could numba.jit fused tasks
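
The first idea can be sketched without dask at all. The snippet below is only an illustration under stated assumptions: a graph is a plain dict of task tuples, the signature is a (dtype, shape, chunks) tuple, and the names scale, build_template, cached_graph, "IN" and "OUT" are hypothetical, not part of dask. It builds a graph template once per signature and afterwards only swaps in fresh key names.

```python
import itertools

_template_cache = {}          # (dtype, shape, chunks) -> graph template
_counter = itertools.count()  # stands in for dask's tokenization
build_calls = 0               # counts how often the slow path runs


def scale(block, factor):
    # Hypothetical per-block operation.
    return block * factor


def build_template(numblocks):
    # Stand-in for the expensive dask.array graph construction.
    global build_calls
    build_calls += 1
    return {("OUT", i): (scale, ("IN", i), 2) for i in range(numblocks)}


def cached_graph(in_name, dtype, shape, chunks):
    signature = (dtype, shape, chunks)
    if signature not in _template_cache:
        _template_cache[signature] = build_template(len(chunks[0]))
    template = _template_cache[signature]

    # Stitch the real input name and a fresh output name into the template.
    out_name = f"scaled-{next(_counter)}"
    graph = {
        (out_name, i): (func, (in_name, i), factor)
        for (_, i), (func, _, factor) in template.items()
    }
    return out_name, graph


name1, g1 = cached_graph("x-abc", "float64", (6,), ((2, 2, 2),))
name2, g2 = cached_graph("y-def", "float64", (6,), ((2, 2, 2),))
```

The second call reuses the cached template, so only the cheap key substitution runs. A real implementation would also have to handle multi-dimensional chunking and nested task arguments, which this sketch ignores.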

Using numba would actually be pretty valuable in some cases in dask-glm. This could be an optimization at the task graph level. I suspect that if we get good at recognizing recurring patterns and cache well, we could make this fast-ish. For example, a task of the form (add, _, (mul, _, _)) could be rewritten to numba.jit(lambda x, y, z: x + y * z). We might also be able to back out patterns based on keys (not sure if this is safe).
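
As a rough sketch of that rewrite rule, the snippet below matches the (add, _, (mul, _, _)) pattern in a dict-of-tuples graph and replaces it with a single fused function. It uses only the standard library: fused_add_mul is a plain Python function standing in for the kernel one might wrap with numba.jit, and rewrite_task, rewrite_graph and evaluate are hypothetical helpers written for this example, not dask APIs.

```python
from operator import add, mul


def fused_add_mul(x, y, z):
    # One pass over the inputs, with no separate y * z temporary task.
    # This is the function one might compile with numba.jit.
    return x + y * z


def rewrite_task(task):
    """Rewrite (add, x, (mul, y, z)) to (fused_add_mul, x, y, z)."""
    if (
        isinstance(task, tuple)
        and len(task) == 3
        and task[0] is add
        and isinstance(task[2], tuple)
        and len(task[2]) == 3
        and task[2][0] is mul
    ):
        _, x, (_, y, z) = task
        return (fused_add_mul, x, y, z)
    return task


def rewrite_graph(dsk):
    return {key: rewrite_task(task) for key, task in dsk.items()}


def evaluate(dsk, key):
    """Tiny recursive evaluator, for demonstration only."""
    def run(arg):
        if isinstance(arg, tuple) and arg and callable(arg[0]):
            return arg[0](*(run(a) for a in arg[1:]))
        if isinstance(arg, str) and arg in dsk:
            return run(dsk[arg])
        return arg
    return run(dsk[key])


graph = {"a": 1, "b": 2, "c": 3, "out": (add, "a", (mul, "b", "c"))}
optimized = rewrite_graph(graph)
```

Both graphs evaluate to the same value, but the optimized one runs a single task where the original nested two. Only this exact argument order is matched; a practical optimizer would need a more general pattern matcher and a cache of compiled kernels.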

cc @jcrist @eriknw @sklam @seibert @shoyer

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 27 (23 by maintainers)

Top GitHub Comments

1 reaction
shoyer commented, Jan 30, 2017

For what it’s worth, I would expect dask.array.vectorize to be a dask friendly version of numpy.vectorize that doesn’t require numba. Numba support would also be handy but I would save that variant for another function name (e.g., numba_vectorize) or a keyword argument (numba=True).

0 reactions
mrocklin commented, Oct 23, 2018

This will likely be handled by the current effort on high level expression graphs. Closing.
