question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Implement `gufunc(pyfunc, ...)` similar to `numpy.vectorize` but without the vectorize part

See original GitHub issue

This issue was suggested in dask/dask#3109

My current understanding is that numpy.vectorize provides a way to

  1. provide a Python function,
  2. assign it a signature with information about core dimensions,
  3. bind it input data, where two things happen
    • call __array__ufunc__, if present, and/or
    • broadcast loop dimensions (according to signature) of input arrays against each other.
  4. Iterative calls of the Python function over all loop dimension entries

I would like to suggest to implement an additional wrapper, just like numpy.vectorize, which does all the steps above, except step 4). I.e. a Python function could be wrapped as gufunc and given a signature, and when binding input the same Step 3) is applied.

The benefit is, that same data binding methodology and interface is used, if the user already provides a vectorized implementation of Python function. It becomes especially important for interoperability with other libraries, e.g. dask.

Issue Analytics

  • State:open
  • Created 6 years ago
  • Comments:8 (7 by maintainers)

github_iconTop GitHub Comments

1reaction
mrocklincommented, Jul 8, 2018

As a concrete example consider Scikit-Learn’s Estimator.predict method. This typically takes an n x m shaped array and produces an n shaped array, but it generally just broadcasts along the first dimension so it’s signature is probably something like (m)->()

It would be useful for Scikit-Learn to have some mechanism to be able to say “this function can be broadcast in the following way” and defer to objects that can do that sort of broadcasting (using the __array_ufunc__ protocol) when they are provided as inputs.

Concretely, it would be nice for downstream projects to be able to say the following:

class Estimator:
    @numpy.broadcastable('(m)->()')
    def predict(self, X):
        ...
        return y

Ideally then if a user provides something like a dask array

estimator = Estimator(...)
estimator.fit(...)

estimator.predict(my_dask_array)

Then ideally the decorated predict method would go through the normal checks for __array_ufunc__ and give control over to the dask array object.

0reactions
mattipcommented, Jul 10, 2018

If we design such a wrapper it should handle these better than the current gufunc mechanism:

  • memory overlap requirements and temporary output buffer allocation
  • specifying requirements for contiguous memory layout
Read more comments on GitHub >

github_iconTop Results From Across the Web

python - NumPy vectorization without the use of numpy.vectorize
My personal definition is that code is vectorized when the array is handed off to a C/FORTRAN function which handles all or almost...
Read more >
numpy.vectorize — NumPy v1.24 Manual
Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or...
Read more >
Python: Np.Vectorize To Return "Float" - ADocLib
An open-source book about numpy vectorization techniques, based on If not stated ... each script should import numpy, scipy and matplotlib as: We'll...
Read more >
Creating NumPy universal functions - Numba
Using the vectorize() decorator, Numba can compile a pure Python function into a ... is to choose different targets for different data sizes...
Read more >
Numpy Vectorization - AskPython
Numpy Vectorization with the numpy.vectorize() function. Numpy vectorize function takes in a python function (pyfunc) and returns a vectorized version of the ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found