Incremental.predict failing for scipy.sparse input

From https://examples.dask.org/machine-learning/text-vectorization.html

In [14]: predictions = pipe.predict(df['text'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-2acfe3e16558> in <module>
----> 1 predictions = pipe.predict(df['text'])

/usr/local/Caskroom/miniconda/base/envs/dask-dev/lib/python3.8/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
    111
    112             # lambda, but not partial, allows help() to work with update_wrapper
--> 113             out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)  # noqa
    114         else:
    115

/usr/local/Caskroom/miniconda/base/envs/dask-dev/lib/python3.8/site-packages/sklearn/pipeline.py in predict(self, X, **predict_params)
    468         for _, name, transform in self._iter(with_final=False):
    469             Xt = transform.transform(Xt)
--> 470         return self.steps[-1][1].predict(Xt, **predict_params)
    471
    472     @available_if(_final_estimator_has("fit_predict"))

~/gh/dask/dask-ml/dask_ml/wrappers.py in predict(self, X)
    315         if isinstance(X, da.Array):
    316             if meta is None:
--> 317                 meta = _get_output_dask_ar_meta_for_estimator(
    318                     _predict, self._postfit_estimator, X
    319                 )

~/gh/dask/dask-ml/dask_ml/wrappers.py in _get_output_dask_ar_meta_for_estimator(model_fn, estimator, input_dask_ar)
    663     # sklearn fails if input array has size size
    664     # It requires at least 1 sample to run successfully
--> 665     ar = np.zeros(
    666         shape=(1, input_dask_ar.shape[1]),
    667         dtype=input_dask_ar.dtype,

TypeError: The `like` argument must be an array-like that implements the `__array_function__` protocol.

Here are some values.

ipdb> pp input_dask_ar
dask.array<_transformer, shape=(nan, 1048576), dtype=float64, chunksize=(nan, 1048576), chunktype=scipy.csr_matrix>
ipdb> pp input_dask_ar._meta
<0x0 sparse matrix of type '<class 'numpy.float64'>'
	with 0 stored elements in Compressed Sparse Row format>
ipdb> pp type(input_dask_ar._meta)
<class 'scipy.sparse.csr.csr_matrix'>

cc @VibhuJawa if you have a chance to look. Maybe we check if the array implements __array_function__. If it doesn’t then… I’m not sure what to do.
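One way to read the suggestion above is sketched below. This is only an illustration, not the dask-ml implementation: `supports_array_function` and `make_dummy_sample` are hypothetical names, and the dense fallback is an assumption (an estimator may behave differently on dense input than on the sparse chunks it will actually receive).

```python
import numpy as np


def supports_array_function(obj):
    """Return True if obj's type implements the __array_function__ protocol."""
    return hasattr(type(obj), "__array_function__")


def make_dummy_sample(meta, n_features, dtype):
    """Build a one-row array of zeros to probe an estimator's output type.

    Hypothetical helper: `meta` plays the role of `input_dask_ar._meta`.
    """
    if supports_array_function(meta):
        # Same call as in the traceback: `like=` dispatches to meta's array type.
        return np.zeros(shape=(1, n_features), dtype=dtype, like=meta)
    # Assumed fallback: scipy.sparse matrices do not implement the protocol,
    # so build a plain dense NumPy row instead of passing `like=`.
    return np.zeros(shape=(1, n_features), dtype=dtype)
```

With this check, a `scipy.sparse.csr_matrix` meta would take the fallback branch rather than raising the `TypeError` shown above. Whether a dense probe is acceptable for every estimator is an open question in this thread.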

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

TomAugspurger commented, Nov 28, 2021 (1 reaction)

That might be an acceptable option for now. I wonder if we can catch the TypeError and reraise with a more informative message, indicating that the user can set predict_meta, etc. to handle that?
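A rough sketch of the catch-and-reraise idea follows. It is not the actual dask-ml code: `infer_output_meta` is a hypothetical stand-in for `_get_output_dask_ar_meta_for_estimator`, and the wording of the message is an assumption.

```python
import numpy as np


def infer_output_meta(model_fn, estimator, meta, n_features, dtype):
    """Run model_fn on a one-row dummy input to discover the output type.

    Hypothetical helper: `meta` plays the role of `input_dask_ar._meta`.
    """
    try:
        sample = np.zeros(shape=(1, n_features), dtype=dtype, like=meta)
    except TypeError as exc:
        # The chunk type (e.g. a scipy.sparse matrix) does not implement
        # __array_function__, so NumPy cannot build a matching dummy array.
        raise TypeError(
            f"Could not infer the output meta for chunks of type "
            f"{type(meta).__name__!r}. Pass `predict_meta` (or the matching "
            f"`*_meta` argument) to the wrapper to skip this inference."
        ) from exc
    # Called outside the try block so that a TypeError raised by the
    # estimator itself is not masked by the message above.
    return model_fn(estimator, sample)
```

The original exception stays attached through `from exc`, so the NumPy message about the `like` argument remains visible in the traceback.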

soumyadg commented, Nov 28, 2021 (1 reaction)

@VibhuJawa - I am hitting the same problem and waiting on a solution.
