More generic estimator checks

Description

While building my own estimator as part of kernelmethods, I ran into a failing estimator check, check_non_transformer_estimators_n_iter.

According to the code below, this check is not supposed to run for my estimator, since it is derived from SVR:

    # These models are dependent on external solvers like
    # libsvm and accessing the iter parameter is non-trivial.
    not_run_check_n_iter = ['Ridge', 'SVR', 'NuSVR', 'NuSVC',
                            'RidgeClassifier', 'SVC', 'RandomizedLasso',
                            'LogisticRegressionCV', 'LinearSVC',
                            'LogisticRegression']

    # Tested in test_transformer_n_iter
    not_run_check_n_iter += CROSS_DECOMPOSITION
    if name in not_run_check_n_iter:
        return
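
The skip decision in this excerpt is an exact string comparison on the class name. The self-contained sketch below illustrates the consequence. SVR and OptimalKernelSVR here are empty stand-in classes (not the real scikit-learn and kernelmethods estimators), and skip_by_exact_name / skip_by_base_class are hypothetical helpers. The first mirrors the name in not_run_check_n_iter test; the second is one possible alternative that walks the class hierarchy (__mro__), so that anything derived from a listed estimator is skipped as well.

```python
# Empty stand-ins: NOT the real scikit-learn SVR or kernelmethods estimator.
class SVR:
    pass


class OptimalKernelSVR(SVR):
    pass


# Abbreviated copy of the skip list from the excerpt above.
NOT_RUN_CHECK_N_ITER = ['Ridge', 'SVR', 'NuSVR', 'NuSVC', 'SVC', 'LinearSVC']


def skip_by_exact_name(estimator):
    """Mirrors `name in not_run_check_n_iter`: exact class-name match only."""
    return type(estimator).__name__ in NOT_RUN_CHECK_N_ITER


def skip_by_base_class(estimator):
    """Hypothetical alternative: skip if the class or any ancestor is listed."""
    return any(cls.__name__ in NOT_RUN_CHECK_N_ITER
               for cls in type(estimator).__mro__)


est = OptimalKernelSVR()
print(skip_by_exact_name(est))  # False: the n_iter check would still run
print(skip_by_base_class(est))  # True: SVR appears in the MRO
```

Matching ancestor names is still name-based; comparing against the real classes with issubclass would be stricter, at the cost of importing every listed estimator.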

However, because my estimator's name (OptimalKernelSVR) is not in the list, the check runs anyway, even though it is irrelevant to my estimator. To prevent this, I suggest making these checks less specific to sklearn's native estimators and more amenable to external libraries. Some possible solutions are:

  1. Change if name in not_run_check_n_iter: to a substring match:

         for no_test_name in not_run_check_n_iter:
             # skip when a listed name (e.g. 'SVR') occurs in this estimator's name
             if no_test_name.lower() in name.lower():
                 return

  2. Allow passing flags to check_estimator to skip specific checks, such as this one.
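
Both proposals can be sketched without scikit-learn. For the substring match, the comparison has to run in the direction no_test_name.lower() in name.lower(): the listed name must occur inside the estimator's name ('svr' in 'optimalkernelsvr'), not the other way round. skip_by_substring, run_checks, its skip parameter, and the two toy checks are all hypothetical illustrations; check_estimator in scikit-learn 0.21 has no such parameter.

```python
# Hypothetical helpers illustrating the two proposals; not scikit-learn API.
NOT_RUN_CHECK_N_ITER = ['Ridge', 'SVR', 'NuSVR', 'NuSVC', 'RidgeClassifier',
                        'SVC', 'LogisticRegressionCV', 'LinearSVC',
                        'LogisticRegression']


def skip_by_substring(name):
    """Option 1: skip if any listed name occurs inside the estimator name."""
    for no_test_name in NOT_RUN_CHECK_N_ITER:
        # The listed name must be the substring: 'svr' in 'optimalkernelsvr'.
        if no_test_name.lower() in name.lower():
            return True
    return False


def run_checks(estimator, checks, skip=()):
    """Option 2: run each check unless its function name is in `skip`.

    Returns the names of the checks that were actually run.
    """
    ran = []
    for check in checks:
        if check.__name__ in skip:
            continue
        check(type(estimator).__name__, estimator)
        ran.append(check.__name__)
    return ran


# Toy checks standing in for real estimator checks.
def check_has_fit(name, estimator):
    assert hasattr(estimator, 'fit'), f'{name} has no fit method'


def check_has_n_iter(name, estimator):
    assert hasattr(estimator, 'n_iter_'), f'{name} has no n_iter_ attribute'


class OptimalKernelSVR:  # stand-in, not the kernelmethods class
    def fit(self, X, y):
        return self


print(skip_by_substring('OptimalKernelSVR'))  # True
print(skip_by_substring('BridgeRegressor'))   # True: 'ridge' over-matches
print(run_checks(OptimalKernelSVR(),
                 [check_has_fit, check_has_n_iter],
                 skip={'check_has_n_iter'}))  # ['check_has_fit']
```

Option 2 is the more explicit of the two: substring matching can silently skip checks for unrelated estimators whose names merely contain a listed name, as the BridgeRegressor line shows.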

Steps/Code to Reproduce

  1. Clone kernelmethods and install it in development mode.

  2. Run this code:

         from kernelmethods.algorithms import OptimalKernelSVR
         from sklearn.utils.estimator_checks import check_estimator

         OKSVR = OptimalKernelSVR(k_bucket='light')
         check_estimator(OKSVR)

Versions

System:
    python: 3.7.2 (default, Dec 29 2018, 00:00:04)  [Clang 4.0.1 (tags/RELEASE_401/final)]
executable: /Users/Reddy/anaconda3/envs/py36/bin/python
   machine: Darwin-18.6.0-x86_64-i386-64bit
BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /Users/Reddy/anaconda3/envs/py36/lib
cblas_libs: mkl_rt, pthread
Python deps:
       pip: 19.1.1
setuptools: 41.0.1
   sklearn: 0.21.2
     numpy: 1.15.4
     scipy: 1.1.0
    Cython: None
    pandas: 0.24.2

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

raamana commented, Aug 21, 2019 (2 reactions)

Another peripheral suggestion: while working through multiple failures in check_estimator, I noticed that the checks rely heavily on ValueError to verify that an estimator raises an error when it is supposed to. Some checks also look for a specific string (e.g. Inf or NaN) in the exception message. While this works, I think it would be more valuable to raise a dedicated SKLearnException carrying a specific tag or flag, so that errors can be identified more precisely.

This would also help distinguish incorrect usage (e.g. predicting before fitting) from incorrect input, and so on.
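
A minimal sketch of the tagged-exception idea, assuming a hypothetical SKLearnException base class with a tag attribute. scikit-learn does not define this class (it does ship NotFittedError in sklearn.exceptions, which subclasses both ValueError and AttributeError). Subclassing ValueError keeps existing except ValueError code working, while a check can assert on the tag instead of searching the message for strings like 'Inf' or 'NaN'. UsageError, InputError, predict_stub, and raised_tag are likewise illustrative names.

```python
import math


class SKLearnException(ValueError):
    """Hypothetical base exception carrying a machine-readable tag."""

    def __init__(self, message, tag):
        super().__init__(message)
        self.tag = tag


class UsageError(SKLearnException):
    """Incorrect usage, e.g. calling predict before fit."""


class InputError(SKLearnException):
    """Incorrect input, e.g. NaN or infinite values."""


def predict_stub(X, fitted):
    """Toy predict() that raises tagged errors instead of bare ValueErrors."""
    if not fitted:
        raise UsageError('call fit before predict', tag='not_fitted')
    if any(math.isnan(v) or math.isinf(v) for v in X):
        raise InputError('input contains NaN or infinity', tag='non_finite')
    return [0 for _ in X]


def raised_tag(func, *args):
    """Return the tag of the SKLearnException raised by func, else None."""
    try:
        func(*args)
    except SKLearnException as exc:
        return exc.tag
    return None


print(raised_tag(predict_stub, [1.0, 2.0], False))          # not_fitted
print(raised_tag(predict_stub, [1.0, float('nan')], True))  # non_finite
print(raised_tag(predict_stub, [1.0, 2.0], True))           # None
```

Because the tags live on the exception object, a check can tell "not fitted" apart from "bad input" even though both are still ValueErrors.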

raamana commented, Aug 23, 2019 (0 reactions)

Thanks, I would have loved to help. Given the many facets of this (disassembling the whole check_estimator part of the codebase, maintaining different check lists for different estimator types, refining them by targeting the tests directly, etc.), it is a non-trivial job, and unfortunately I am quite busy with my own software, trying to get kernelmethods published in JMLR. But I'd be happy to review any PRs in the meantime, as I believe this is important. I am also interested in extending the design of the estimator to allow for more advanced uses (such as passing in covariates).
