Duplicate check_finite when calling scipy.linalg functions

Most functions in scipy.linalg (e.g. svd, qr, eig, eigh, pinv, pinv2 …) accept a keyword argument check_finite, which defaults to True and which we typically leave at its default value in scikit-learn.

As we already validate the input data for most estimators in scikit-learn, this check is redundant and can cause significant overhead, especially at predict / transform time. We should probably always call those functions with an explicit check_finite=False in scikit-learn.
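A minimal sketch of the pattern proposed above, assuming the input is validated once with sklearn.utils.check_array (which rejects NaN and infinite values by default) before SciPy is called. The helper name validated_svd is hypothetical and only serves the illustration; it is not part of scikit-learn.

```python
import numpy as np
from scipy import linalg
from sklearn.utils import check_array


def validated_svd(X):
    """Hypothetical helper: validate once, then skip SciPy's own check."""
    # check_array raises ValueError if X contains NaN or infinite values.
    X = check_array(X)
    # X is known to be finite here, so SciPy's duplicate scan is skipped.
    return linalg.svd(X, full_matrices=False, check_finite=False)


rng = np.random.RandomState(0)
U, s, Vt = validated_svd(rng.uniform(size=(5, 3)))
```

With this arrangement the finiteness scan runs once per call instead of twice, while non-finite input is still rejected before it reaches LAPACK.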

This issue will probably need to be addressed in many PRs, most likely one per module that imports scipy.linalg.

We should still make sure that the estimators raise a ValueError with the expected error message when fed NumPy arrays containing non-finite values (-np.inf, np.inf or np.nan). This can be checked manually by calling sklearn.utils.estimator_checks.check_estimators_nan_inf on the estimator. That check should be called automatically by sklearn.tests.test_common, but we need to verify that this is actually the case when reviewing such PRs.
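The behavior described above can be approximated by hand as follows. This is a rough sketch, not the actual check_estimators_nan_inf implementation: the helper rejects_non_finite is hypothetical, PCA is chosen only as an example of an estimator that relies on scipy.linalg, and the sketch tests only that fit raises a ValueError, not the content of the error message.

```python
import numpy as np
from sklearn.decomposition import PCA


def rejects_non_finite(estimator, bad_value):
    """Hypothetical helper: True if fit raises ValueError on bad input."""
    X = np.random.RandomState(0).uniform(size=(10, 3))
    X[0, 0] = bad_value  # inject a single NaN or infinite entry
    try:
        estimator.fit(X)
    except ValueError:
        return True
    return False


for bad_value in (np.nan, np.inf, -np.inf):
    assert rejects_non_finite(PCA(n_components=2), bad_value)
```

Running such a check before and after switching a module to check_finite=False gives a quick indication that input validation still happens upstream of the SciPy call.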

Issue Analytics

Created: 3 years ago
Reactions: 2
Comments: 34 (17 by maintainers)

Top GitHub Comments

7 reactions

ogrisel commented, Nov 16, 2020

Please do not discuss who is working on what in the comments of this issue. Instead, read the source code of the functions I mentioned in the description and consider starting with a module that has a few occurrences of the scipy.linalg functions. Open a first pull request that mentions this issue number in its description (#18837 in this case), so that your PR is automatically linked to this issue, and put the module name in the title, e.g. “Use check_finite=False in sklearn.module_name”. That way we will automatically see who is working on what in this issue and can help individual contributors in the review of their own PRs.

If this is your first contribution, don’t forget to follow the guide (including the video tutorials):

https://scikit-learn.org/dev/developers/contributing.html#contributing-code

3 reactions

thomasjpfan commented, Dec 4, 2020

In general, I think we should set check_finite=False only if we are completely sure that the input has already been checked.
