RFC / API add option to fit/predict without input validation
Goal
Ideally, this issue helps to sort out an API.
Describe the workflow you want to enable
I’d like to be able to switch on/off:
- Parameter validation in fit
- Input array validation in fit and predict
- All other validation steps in fit and predict (e.g. check_is_fitted)
Something like
model = RandomForestRegressor(validate_params=False)
model.fit(X, y, validate_input=False)
model.predict(X, validate_input=False)
Note that some estimators like Lasso already support check_input in fit.
The main reason to do so is improved performance.
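To make the three proposed switches concrete, here is a minimal sketch using a toy estimator written in plain Python. Everything in it is hypothetical: ToyMeanRegressor, its shift parameter, and the validate_params / validate_input flags only mirror the names suggested above and are not existing scikit-learn API. How the "other" checks (such as the fitted check) should be grouped is exactly what this issue leaves open; the sketch simply folds them into validate_input.

```python
from numbers import Real


class ToyMeanRegressor:
    """Toy estimator; the validate_* flags are hypothetical, not scikit-learn API."""

    def __init__(self, shift=0.0, validate_params=True):
        self.shift = shift
        self.validate_params = validate_params

    def _check_rows(self, X):
        # Stand-in for input array validation: every row must be a
        # non-empty sequence of real numbers.
        for row in X:
            if len(row) == 0 or not all(isinstance(v, Real) for v in row):
                raise ValueError("X must contain non-empty rows of real numbers")

    def fit(self, X, y, validate_input=True):
        # Parameter validation, switchable from the constructor.
        if self.validate_params and not isinstance(self.shift, Real):
            raise TypeError("shift must be a real number")
        # Input validation, switchable per call.
        if validate_input:
            self._check_rows(X)
            if len(X) != len(y):
                raise ValueError("X and y have inconsistent lengths")
        self.mean_ = sum(y) / len(y) + self.shift
        return self

    def predict(self, X, validate_input=True):
        if validate_input:
            # The fitted check is grouped with input checks in this sketch.
            if not hasattr(self, "mean_"):
                raise RuntimeError("estimator is not fitted")
            self._check_rows(X)
        return [self.mean_ for _ in X]


# Caller takes responsibility for well-formed inputs and skips all checks.
model = ToyMeanRegressor(validate_params=False)
model.fit([[1.0], [2.0]], [1.0, 3.0], validate_input=False)
preds = model.predict([[5.0]], validate_input=False)
```

With the flags left at their defaults the same calls run every check, so the fast path is strictly opt-in.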
Additional context
Related issues and PRs are #16653, #20657, #21578 (in particular https://github.com/scikit-learn/scikit-learn/pull/21578#discussion_r745491121).
Related discussions: https://github.com/scikit-learn/scikit-learn/discussions/21810
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:8 (7 by maintainers)
Top GitHub Comments
Mmh 🤔 @glemaitre, you have a good point indeed. The context manager helps. My intuition is still to have it as a configurable option in estimators. I would be very interested in the opinion and experience of others.
Also, there is an impact on feature names:
If b) is adopted and a meta-estimator is fit on a dataframe, then feature names are extracted by the meta-estimator and a numpy array without feature names is passed to the base estimators. But sometimes the feature names could be useful for the base estimator (e.g. to specify which features should be treated as categorical variables for a HistGBRT model).
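For comparison with the per-estimator flags, the context-manager approach mentioned in the comments above can be sketched as an ambient switch that validation code consults. This is only an illustration of the mechanism in plain Python: the names _VALIDATE, validation and predict_mean are made up for this sketch and are not scikit-learn functions.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical global switch; the names here are illustrative only.
_VALIDATE = ContextVar("validate", default=True)


@contextmanager
def validation(enabled):
    """Temporarily enable or disable validation inside a with-block."""
    token = _VALIDATE.set(enabled)
    try:
        yield
    finally:
        # Restore the previous setting even if the block raised.
        _VALIDATE.reset(token)


def predict_mean(values):
    # Validation runs only when the ambient switch is on.
    if _VALIDATE.get():
        if not values:
            raise ValueError("values must not be empty")
    return sum(values) / len(values)


with validation(False):
    inside = _VALIDATE.get()  # switch is off within the block
    result = predict_mean([1.0, 2.0, 3.0])
outside = _VALIDATE.get()  # previous setting is restored afterwards
```

The trade-off is the one raised in the thread: a context manager needs no change to estimator signatures and covers nested calls automatically, while a constructor or method option travels with the estimator itself.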