More generic estimator checks
Description
While trying to roll my own estimator as part of kernelmethods, I ran into a failing estimator check, check_non_transformer_estimators_n_iter. Since my estimator is derived from SVR, this check should be skipped according to the intent of the code below:
# These models are dependent on external solvers like
# libsvm and accessing the iter parameter is non-trivial.
not_run_check_n_iter = ['Ridge', 'SVR', 'NuSVR', 'NuSVC',
                        'RidgeClassifier', 'SVC', 'RandomizedLasso',
                        'LogisticRegressionCV', 'LinearSVC',
                        'LogisticRegression']
# Tested in test_transformer_n_iter
not_run_check_n_iter += CROSS_DECOMPOSITION
if name in not_run_check_n_iter:
    return
However, since my estimator's name (OptimalKernelSVR) is not literally in that list, the check runs anyway, even though it is irrelevant to my estimator. To prevent this, I suggest making these checks less specific to sklearn's native estimators and more amenable to external libraries.
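To make the failure mode concrete, here is a minimal illustration of the exact-match lookup (list abbreviated from the snippet above):

not_run_check_n_iter = ['Ridge', 'SVR', 'NuSVR', 'NuSVC']  # abbreviated
name = 'OptimalKernelSVR'  # derived from SVR, but not literally in the list

print(name in not_run_check_n_iter)  # False: the exact match misses subclasses
print(any(skip.lower() in name.lower()
          for skip in not_run_check_n_iter))  # True: 'SVR' is a substring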
Some possible solutions are:
- Change

  if name in not_run_check_n_iter:

  to a substring match, so that, e.g., OptimalKernelSVR matches SVR:

  for no_test_name in not_run_check_n_iter:
      if no_test_name.lower() in name.lower():
          return

- Allow passing flags to check_estimator to skip specific checks (see the sketch after this list).
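As a sketch of what such a flag could enable, the wrapper below runs the individual checks and skips those named in a skip set. Note this is an illustration, not scikit-learn API: check_estimator_with_skips is a hypothetical helper, and _yield_all_checks is a private function whose location and signature match scikit-learn 0.21 but may change between releases.

from functools import partial

from sklearn.utils.estimator_checks import _yield_all_checks


def check_estimator_with_skips(estimator, skip=()):
    """Run all applicable estimator checks except those named in ``skip``.

    Sketch only: relies on the private ``_yield_all_checks`` (sklearn 0.21).
    """
    name = type(estimator).__name__
    for check in _yield_all_checks(name, estimator):
        # Checks may be plain functions or functools.partial objects.
        check_name = (check.func.__name__ if isinstance(check, partial)
                      else check.__name__)
        if check_name in skip:
            continue
        check(name, estimator)


# Usage for the case in this issue:
# check_estimator_with_skips(OptimalKernelSVR(k_bucket='light'),
#                            skip={'check_non_transformer_estimators_n_iter'})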
Steps/Code to Reproduce
- Clone and install kernelmethods in dev mode
- Run this code:

from kernelmethods.algorithms import OptimalKernelSVR
from sklearn.utils.estimator_checks import check_estimator

OKSVR = OptimalKernelSVR(k_bucket='light')
check_estimator(OKSVR)
Versions
System:
    python: 3.7.2 (default, Dec 29 2018, 00:00:04) [Clang 4.0.1 (tags/RELEASE_401/final)]
    executable: /Users/Reddy/anaconda3/envs/py36/bin/python
    machine: Darwin-18.6.0-x86_64-i386-64bit
BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
    lib_dirs: /Users/Reddy/anaconda3/envs/py36/lib
    cblas_libs: mkl_rt, pthread
Python deps:
    pip: 19.1.1
    setuptools: 41.0.1
    sklearn: 0.21.2
    numpy: 1.15.4
    scipy: 1.1.0
    Cython: None
    pandas: 0.24.2
Another peripheral suggestion: as I ran into multiple failures in check_estimator, I noticed there is too much reliance on ValueError to verify that the estimator raises an error when it is supposed to. I also noticed that some checks look for a specific string (e.g. Inf or NaN) in the exception message. While this is okay, I think it would be more valuable to raise a SKLearnException with a specific tag/flag to identify errors more precisely. This would also help distinguish errors of incorrect usage (e.g. predicting before fitting) from errors of incorrect input.
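As a minimal sketch of that idea (SKLearnException and its tag attribute are hypothetical, not part of scikit-learn), subclassing ValueError would keep existing except ValueError handlers working while letting checks assert on a machine-readable tag instead of matching message strings:

class SKLearnException(ValueError):
    """Hypothetical scikit-learn exception carrying a machine-readable tag."""

    def __init__(self, message, tag=None):
        super().__init__(message)
        self.tag = tag  # e.g. 'invalid_input', 'not_fitted'


# A check could then assert on the tag rather than on the message text:
# try:
#     estimator.fit(X_with_nan, y)
# except SKLearnException as exc:
#     assert exc.tag == 'invalid_input'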
Thanks, I would have loved to help. Given that there may be multiple facets to this (disassembling the whole check_estimator part of the codebase, maintaining different versions of the check lists for different estimator types, and refining them by targeting the tests directly), it is a non-trivial job, and I am unfortunately quite busy with my own software, trying to get kernelmethods published in JMLR, etc. But I'd be happy to review any PRs in the meantime, as I believe this is important, and I am also interested in extending the design of the estimator to allow for more advanced uses (such as passing in covariates).