Duplicate check_finite when calling scipy.linalg functions
See original GitHub issueMost functions in scipy.linalg
functions (e.g. svd
, qr
, eig
, eigh
, pinv
, pinv2
…) have a default kwarg check_finite=True
that we typically leave to the default value in scikit-learn.
As we already validate the input data for most estimators in scikit-learn, this check is redundant and can cause significant overhead, especially at predict / transform time. We should probably always call those method with an explicit check_finite=False
in scikit-learn.
This issue shall probably be addressed in many PRs, probably one per module that imports scipy.linalg
.
We should still make sure that the estimators raise a ValueError
with the expected error message when fed with numpy arrays with infinite some values (-np.inf
, np.inf
or np.nan
). This can be done manually by calling sklearn.utils.estimator_checks.check_estimators_nan_inf
on the estimator, which should be automatically be called by sklearn.tests.test_common
but we need to check that it’s actually the case when reviewing such PRs.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:2
- Comments:34 (17 by maintainers)
Please do not discuss who is working on what in the comments of this issue. Instead try to read the source code of the functions I mentioned in the description, consider first working on a module that has a few occurrences of the
scipy.linalg
function and open a first pull request that mentions this issue number in the description (#18837 in this case) to automatically link your PR to this issue and the module name in the title, e.g. “Use check_finite=False in sklearn.module_name”. Then automatically we will see who is working on what in this issue and can help individual contributors in the review of their own PR.If this is your first time contribution, don’t forget to follow the guide (including the video tutorials):
https://scikit-learn.org/dev/developers/contributing.html#contributing-code
In general, I think we should only set
check_finite=False
only if we are completely sure that the input is checked before.