Duplicate check_finite when calling scipy.linalg functions

Most functions in scipy.linalg (e.g. svd, qr, eig, eigh, pinv, pinv2 …) take a keyword argument check_finite that defaults to True, and we typically leave it at that default in scikit-learn.

As we already validate the input data for most estimators in scikit-learn, this check is redundant and can cause significant overhead, especially at predict / transform time. We should probably always call those functions with an explicit check_finite=False in scikit-learn.
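A minimal sketch of the idea, using only NumPy and SciPy. The helper name validated_singular_values is hypothetical and only stands in for an estimator that has already validated its input; it is not scikit-learn code. It shows that SciPy's default check rejects non-finite input, and that skipping the check on already-validated data gives the same result.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))

# With the default check_finite=True, SciPy scans the whole array and
# raises ValueError if it finds NaN or infinity.
X_bad = X.copy()
X_bad[0, 0] = np.nan
try:
    linalg.svd(X_bad, full_matrices=False)
    default_check_raised = False
except ValueError:
    default_check_raised = True


def validated_singular_values(X):
    """Hypothetical helper: validate once, then skip SciPy's own scan."""
    X = np.asarray(X, dtype=np.float64)
    if not np.isfinite(X).all():
        raise ValueError("Input contains NaN or infinity.")
    # The input is known to be finite here, so the second scan is redundant.
    return linalg.svd(X, compute_uv=False, check_finite=False)


# On finite input both code paths return the same singular values.
s_checked = linalg.svd(X, compute_uv=False)
s_unchecked = validated_singular_values(X)
```

The explicit check_finite=False is only safe because the finiteness check happens first in the same function. Without that prior validation, NaN or infinity would be passed straight to LAPACK.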

This issue will probably need to be addressed across many PRs, likely one per module that imports scipy.linalg.

We should still make sure that the estimators raise a ValueError with the expected error message when fed NumPy arrays containing non-finite values (-np.inf, np.inf or np.nan). This can be checked manually by calling sklearn.utils.estimator_checks.check_estimators_nan_inf on the estimator. That check should be called automatically by sklearn.tests.test_common, but we need to verify that this is actually the case when reviewing such PRs.
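As a rough sketch of such a manual check, assuming PCA as the estimator under review (any estimator touched by a PR could be substituted) and assuming check_estimators_nan_inf takes the estimator name and an instance, as it does in recent scikit-learn versions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.utils.estimator_checks import check_estimators_nan_inf

# Run the common check directly; it raises AssertionError if the estimator
# silently accepts NaN or infinity in fit, predict or transform.
check_estimators_nan_inf("PCA", PCA())

# The same property, spelled out by hand for fit.
rng = np.random.RandomState(0)
X = rng.uniform(size=(10, 3))

rejected = []
for bad_value in (np.nan, np.inf, -np.inf):
    X_bad = X.copy()
    X_bad[0, 0] = bad_value
    try:
        PCA().fit(X_bad)
    except ValueError:
        # Input validation caught the non-finite value as expected.
        rejected.append(bad_value)
```

If an estimator stopped validating its input after switching to check_finite=False, the first call would fail and the loop would leave rejected with fewer than three entries.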

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 34 (17 by maintainers)

Top GitHub Comments

7 reactions
ogrisel commented, Nov 16, 2020

Please do not discuss who is working on what in the comments of this issue. Instead, read the source code of the functions I mentioned in the description and consider starting with a module that has only a few occurrences of scipy.linalg functions. Then open a first pull request that mentions this issue number in the description (#18837 in this case), which automatically links your PR to this issue, and put the module name in the title, e.g. “Use check_finite=False in sklearn.module_name”. That way we will automatically see who is working on what in this issue and can help individual contributors in the review of their own PR.

If this is your first contribution, don’t forget to follow the contributing guide (including the video tutorials):

https://scikit-learn.org/dev/developers/contributing.html#contributing-code

3 reactions
thomasjpfan commented, Dec 4, 2020

In general, I think we should set check_finite=False only if we are completely sure that the input has been checked beforehand.
