Make all estimators use `_validate_params`

PR #22722 introduced a common method for the validation of the parameters of an estimator. We now need to use it in all estimators.

Please open one PR per estimator or family of estimators (if one inherits from another). The title of the PR must mention which estimator it’s dealing with. We recommend using the following pattern for titles:

MAINT Parameters validation for <Estimator>

where <Estimator> is a placeholder to be replaced with the Estimator you chose.

The description of the PR must begin with Towards #23462 so that this issue and the PR are cross-linked.

Steps

  • The estimator must define a class attribute _parameter_constraints that defines the valid types and values for the parameters of the estimator. Do not rely only on the docstring of the estimator to define it: although the docstring can help, it’s important to rely primarily on the implementation to find the valid values, because the docstring might not be completely accurate. See how it’s done in KMeans, for instance: https://github.com/scikit-learn/scikit-learn/blob/02ebf9e68fe1fc7687d9e1047b9e465ae0fd945e/sklearn/cluster/_kmeans.py#L835-L847
  • If the estimator class inherits from a base class that already defines _parameter_constraints, we just need to extend it.
  • Then, the first thing that fit and partial_fit should do is call self._validate_params.
  • All existing simple param validation can now be removed. (Simple means that the validation depends neither on the input data nor on the value of another parameter, for instance.) Any such validation that was missed should be easy to spot with codecov, since it becomes unreachable code.
  • Tests that check error messages from simple param validation can also be removed (carefully: we need to keep the tests that check more complex param validation!).
  • Finally, remove the estimator from the list of skipped estimators for the common param validation test https://github.com/scikit-learn/scikit-learn/blob/ec5d2ed9e5bfb6b0baff57ff1f994310e9a31ad9/sklearn/tests/test_common.py#L448 and make sure the test passes: pytest -vl sklearn/tests/test_common.py -k check_param_validation
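
The steps above can be sketched in plain Python. This is a toy illustration of the pattern only, not scikit-learn's implementation: Interval, Options, ToyBaseEstimator, ToyKMeans and ToyMiniBatchKMeans are hypothetical stand-ins written for this example, and the real constraint helpers, class hierarchy and error messages in scikit-learn may differ.

```python
from numbers import Integral


class Interval:
    """Hypothetical constraint: a number of type `type_` with low <= value <= high."""

    def __init__(self, type_, low=None, high=None):
        self.type_, self.low, self.high = type_, low, high

    def is_satisfied_by(self, value):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, self.type_):
            return False
        if self.low is not None and value < self.low:
            return False
        return self.high is None or value <= self.high

    def __str__(self):
        low = "-inf" if self.low is None else self.low
        high = "inf" if self.high is None else self.high
        return f"{self.type_.__name__} in the range [{low}, {high}]"


class Options:
    """Hypothetical constraint: the value is one of a fixed set of strings."""

    def __init__(self, *options):
        self.options = options

    def is_satisfied_by(self, value):
        return isinstance(value, str) and value in self.options

    def __str__(self):
        return f"one of {list(self.options)}"


class ToyBaseEstimator:
    # Maps each parameter name to a list of constraints; one match is enough.
    _parameter_constraints = {}

    def _validate_params(self):
        for name, constraints in self._parameter_constraints.items():
            value = getattr(self, name)
            if not any(c.is_satisfied_by(value) for c in constraints):
                allowed = " or ".join(str(c) for c in constraints)
                raise ValueError(
                    f"The {name!r} parameter of {type(self).__name__} must be "
                    f"{allowed}. Got {value!r} instead."
                )


class ToyKMeans(ToyBaseEstimator):
    _parameter_constraints = {
        "n_clusters": [Interval(Integral, low=1)],
        "init": [Options("k-means++", "random")],
    }

    def __init__(self, n_clusters=8, init="k-means++"):
        self.n_clusters = n_clusters
        self.init = init

    def fit(self, X):
        # First thing fit does: validate the constructor parameters.
        self._validate_params()
        self.fitted_ = True  # placeholder for the actual fitting logic
        return self


class ToyMiniBatchKMeans(ToyKMeans):
    # Extend the parent's constraints in a new dict so the parent is not mutated.
    _parameter_constraints = {
        **ToyKMeans._parameter_constraints,
        "batch_size": [Interval(Integral, low=1)],
    }

    def __init__(self, n_clusters=8, init="k-means++", batch_size=1024):
        super().__init__(n_clusters=n_clusters, init=init)
        self.batch_size = batch_size
```

With this sketch, ToyKMeans(n_clusters=0).fit(X) raises a ValueError that names the offending parameter before any fitting work starts, which is why hand-written checks of the same kind inside fit become unreachable and can be deleted.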

Estimators to update

  • ARDRegression
  • AdaBoostClassifier
  • AdaBoostRegressor
  • AdditiveChi2Sampler
  • AffinityPropagation
  • AgglomerativeClustering
  • BaggingClassifier
  • BaggingRegressor
  • BayesianGaussianMixture
  • BayesianRidge
  • BernoulliNB
  • BernoulliRBM
  • Binarizer
  • Birch
  • CCA
  • CalibratedClassifierCV
  • CategoricalNB
  • ClassifierChain
  • ComplementNB
  • CountVectorizer
  • DBSCAN
  • DecisionTreeClassifier
  • DecisionTreeRegressor
  • DictVectorizer
  • DictionaryLearning
  • DummyClassifier
  • DummyRegressor
  • ElasticNet
  • ElasticNetCV
  • EllipticEnvelope
  • EmpiricalCovariance
  • ExtraTreeClassifier
  • ExtraTreeRegressor
  • ExtraTreesClassifier
  • ExtraTreesRegressor
  • FactorAnalysis
  • FastICA
  • FeatureAgglomeration
  • FeatureHasher
  • FunctionTransformer
  • GammaRegressor
  • GaussianMixture
  • GaussianNB
  • GaussianProcessClassifier
  • GaussianProcessRegressor
  • GaussianRandomProjection
  • GenericUnivariateSelect
  • GradientBoostingClassifier
  • GradientBoostingRegressor
  • GraphicalLasso
  • GraphicalLassoCV
  • HashingVectorizer
  • HistGradientBoostingClassifier
  • HistGradientBoostingRegressor
  • HuberRegressor
  • IncrementalPCA
  • IsolationForest
  • Isomap
  • IsotonicRegression
  • IterativeImputer
  • KBinsDiscretizer
  • KNNImputer
  • KNeighborsClassifier
  • KNeighborsRegressor
  • KNeighborsTransformer
  • KernelDensity
  • KernelPCA
  • KernelRidge
  • LabelBinarizer
  • LabelPropagation
  • LabelSpreading
  • Lars
  • LarsCV
  • Lasso
  • LassoCV
  • LassoLars
  • LassoLarsCV
  • LassoLarsIC
  • LatentDirichletAllocation
  • LedoitWolf
  • LinearDiscriminantAnalysis
  • LinearRegression
  • LinearSVC
  • LinearSVR
  • LocalOutlierFactor
  • LocallyLinearEmbedding
  • LogisticRegression
  • LogisticRegressionCV
  • MDS
  • MLPClassifier
  • MLPRegressor
  • MaxAbsScaler
  • MeanShift
  • MinCovDet
  • MinMaxScaler
  • MiniBatchDictionaryLearning
  • MiniBatchNMF
  • MiniBatchSparsePCA
  • MissingIndicator
  • MultiLabelBinarizer
  • MultiOutputClassifier
  • MultiOutputRegressor
  • MultiTaskElasticNet
  • MultiTaskElasticNetCV
  • MultiTaskLasso
  • MultiTaskLassoCV
  • MultinomialNB
  • NMF
  • NearestCentroid
  • NearestNeighbors
  • NeighborhoodComponentsAnalysis
  • Normalizer
  • NuSVC
  • NuSVR
  • Nystroem
  • OAS
  • OPTICS
  • OneClassSVM
  • OneHotEncoder
  • OneVsOneClassifier
  • OneVsRestClassifier
  • OrdinalEncoder
  • OrthogonalMatchingPursuit
  • OrthogonalMatchingPursuitCV
  • OutputCodeClassifier
  • PCA
  • PLSCanonical
  • PLSRegression
  • PLSSVD
  • PassiveAggressiveClassifier
  • PassiveAggressiveRegressor
  • PatchExtractor
  • Perceptron
  • PoissonRegressor
  • PolynomialCountSketch
  • PolynomialFeatures
  • PowerTransformer
  • QuadraticDiscriminantAnalysis
  • QuantileRegressor
  • QuantileTransformer
  • RANSACRegressor
  • RBFSampler
  • RFE
  • RFECV
  • RadiusNeighborsClassifier
  • RadiusNeighborsRegressor
  • RadiusNeighborsTransformer
  • RandomForestClassifier
  • RandomForestRegressor
  • RandomTreesEmbedding
  • RegressorChain
  • Ridge
  • RidgeCV
  • RidgeClassifier
  • RidgeClassifierCV
  • RobustScaler
  • SGDClassifier
  • SGDOneClassSVM
  • SGDRegressor
  • SVC
  • SVR
  • SelectFdr
  • SelectFpr
  • SelectFromModel
  • SelectFwe
  • SelectKBest
  • SelectPercentile
  • SelfTrainingClassifier
  • SequentialFeatureSelector
  • ShrunkCovariance
  • SimpleImputer
  • SkewedChi2Sampler
  • SparsePCA
  • SparseRandomProjection
  • SpectralBiclustering
  • SpectralClustering
  • SpectralCoclustering
  • SpectralEmbedding
  • SplineTransformer
  • StackingClassifier
  • StackingRegressor
  • StandardScaler
  • TSNE
  • TfidfTransformer
  • TfidfVectorizer
  • TheilSenRegressor
  • TransformedTargetRegressor
  • TruncatedSVD
  • TweedieRegressor
  • VarianceThreshold
  • VotingClassifier
  • VotingRegressor

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 118 (107 by maintainers)

Top GitHub Comments

12 reactions
jeremiedbb commented, Sep 2, 2022

Each estimator is done. We can now close this big meta-issue. Thanks to every contributor and reviewer who worked on this issue!

2 reactions
devkaranjoshi commented, Jun 24, 2022

Thank you very much for solving my query. @Jitensid
