Incremental.predict failing for scipy.sparse input
From https://examples.dask.org/machine-learning/text-vectorization.html
```
In [14]: predictions = pipe.predict(df['text'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-2acfe3e16558> in <module>
----> 1 predictions = pipe.predict(df['text'])

/usr/local/Caskroom/miniconda/base/envs/dask-dev/lib/python3.8/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
    111
    112             # lambda, but not partial, allows help() to work with update_wrapper
--> 113             out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)  # noqa
    114         else:
    115

/usr/local/Caskroom/miniconda/base/envs/dask-dev/lib/python3.8/site-packages/sklearn/pipeline.py in predict(self, X, **predict_params)
    468         for _, name, transform in self._iter(with_final=False):
    469             Xt = transform.transform(Xt)
--> 470         return self.steps[-1][1].predict(Xt, **predict_params)
    471
    472     @available_if(_final_estimator_has("fit_predict"))

~/gh/dask/dask-ml/dask_ml/wrappers.py in predict(self, X)
    315         if isinstance(X, da.Array):
    316             if meta is None:
--> 317                 meta = _get_output_dask_ar_meta_for_estimator(
    318                     _predict, self._postfit_estimator, X
    319                 )

~/gh/dask/dask-ml/dask_ml/wrappers.py in _get_output_dask_ar_meta_for_estimator(model_fn, estimator, input_dask_ar)
    663     # sklearn fails if input array has size size
    664     # It requires at least 1 sample to run successfully
--> 665     ar = np.zeros(
    666         shape=(1, input_dask_ar.shape[1]),
    667         dtype=input_dask_ar.dtype,

TypeError: The `like` argument must be an array-like that implements the `__array_function__` protocol.
```
Here are some values.
```
ipdb> pp input_dask_ar
dask.array<_transformer, shape=(nan, 1048576), dtype=float64, chunksize=(nan, 1048576), chunktype=scipy.csr_matrix>
ipdb> pp input_dask_ar._meta
<0x0 sparse matrix of type '<class 'numpy.float64'>'
        with 0 stored elements in Compressed Sparse Row format>
ipdb> pp type(input_dask_ar._meta)
<class 'scipy.sparse.csr.csr_matrix'>
```
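For reference, the failure can be reproduced without dask at all: `np.zeros(..., like=meta)` rejects any `like=` value that does not implement the `__array_function__` protocol, which scipy sparse matrices do not. A minimal sketch (the feature count `8` is arbitrary, just standing in for `input_dask_ar.shape[1]`):

```python
import numpy as np
import scipy.sparse as sp

# Mimic input_dask_ar._meta: an empty CSR matrix.
meta = sp.csr_matrix((0, 0), dtype=np.float64)

try:
    # Essentially the call made inside _get_output_dask_ar_meta_for_estimator.
    np.zeros(shape=(1, 8), dtype=meta.dtype, like=meta)
except TypeError as e:
    # csr_matrix does not implement __array_function__, so `like=` fails here.
    print(type(e).__name__, e)
```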
cc @VibhuJawa if you have a chance to look. Maybe we check if the array implements `__array_function__`. If it doesn’t then… I’m not sure what to do.
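One possible shape for that check (illustrative only, not the dask-ml implementation): attempt the `like=`-based allocation only when the meta object participates in the protocol, and otherwise fall back to a plain dense placeholder of the same dtype.

```python
import numpy as np
import scipy.sparse as sp

meta = sp.csr_matrix((0, 0), dtype=np.float64)  # stand-in for input_dask_ar._meta
n_features = 8                                  # stand-in for input_dask_ar.shape[1]

if hasattr(meta, "__array_function__"):
    # meta supports the protocol, so `like=` can dispatch to it
    ar = np.zeros(shape=(1, n_features), dtype=meta.dtype, like=meta)
else:
    # scipy sparse matrices land here: fall back to a dense placeholder
    ar = np.zeros(shape=(1, n_features), dtype=meta.dtype)
```

The fallback loses sparsity of the meta, but it at least lets prediction proceed instead of raising.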
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 1
- Comments: 6 (2 by maintainers)
Top GitHub Comments
That might be an acceptable option for now. I wonder if we can catch the `TypeError` and reraise with a more informative message, indicating that the user can set `predict_meta`, etc. to handle that?

@VibhuJawa - Awaiting the solution for the same problem.