RFC / API add option to fit/predict without input validation

Goal

Ideally, this issue helps to sort out an API.

Describe the workflow you want to enable

I’d like to be able to switch on/off:

Parameter validation in fit
Input array validation in fit and predict
All other validation steps in fit and predict (e.g. check_is_fitted)

Something like

# Proposed API, not available in scikit-learn as written
model = RandomForestRegressor(validate_params=False)
model.fit(X, y, validate_input=False)
model.predict(X, validate_input=False)

Note that some estimators like Lasso already support check_input in fit.
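As a rough illustration of the existing check_input switch, the sketch below fits Lasso twice on the same data, once with the default checks and once with check_input=False. The synthetic data, the alpha value and the variable names are assumptions made for this example. When the checks are bypassed, the caller is responsible for passing arrays in the layout the solver expects, here a Fortran-ordered float64 X and a float64 y.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# With check_input=False the caller takes over the checks: X should already
# be a Fortran-ordered float64 (or float32) array and y a matching float array.
X = np.asfortranarray(rng.normal(size=(200, 5)), dtype=np.float64)
true_coef = np.array([1.0, 0.0, -2.0, 0.0, 0.5])  # made up for the example
y = X @ true_coef + 0.1 * rng.normal(size=200)

# Default behaviour: inputs are validated and converted if needed.
checked = Lasso(alpha=0.1).fit(X, y)

# Same fit with input validation bypassed.
unchecked = Lasso(alpha=0.1).fit(X, y, check_input=False)

print(np.allclose(checked.coef_, unchecked.coef_))
```

Both fits should give the same coefficients here, because the data already satisfies what the validation step would otherwise enforce.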

The main reason to do so is improved performance.
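To make the three requested switches concrete, here is a minimal sketch using a toy estimator written in plain Python. Everything in it is hypothetical: ToyMeanRegressor, _check_rows, and the validate_params and validate_input flags only illustrate the idea and are not scikit-learn APIs.

```python
import math


def _check_rows(X):
    """Hypothetical input check: every value must be a finite number."""
    for row in X:
        for value in row:
            if not isinstance(value, (int, float)) or not math.isfinite(value):
                raise ValueError(f"invalid value in X: {value!r}")


class ToyMeanRegressor:
    """Hypothetical estimator that predicts the mean of y plus a shift."""

    def __init__(self, shift=0.0, validate_params=True):
        self.shift = shift
        self.validate_params = validate_params

    def fit(self, X, y, validate_input=True):
        # 1. Parameter validation, switchable at construction time.
        if self.validate_params and not isinstance(self.shift, (int, float)):
            raise TypeError("shift must be a number")
        # 2. Input array validation, switchable per call.
        if validate_input:
            _check_rows(X)
            if len(X) != len(y):
                raise ValueError("X and y have different lengths")
        self.mean_ = sum(y) / len(y) + self.shift
        return self

    def predict(self, X, validate_input=True):
        if validate_input:
            # 3. Other checks, such as a fitted-state check.
            if not hasattr(self, "mean_"):
                raise RuntimeError("call fit before predict")
            _check_rows(X)
        return [self.mean_ for _ in X]


model = ToyMeanRegressor().fit([[1.0], [2.0]], [1.0, 3.0])
fast_prediction = model.predict([[float("nan")]], validate_input=False)
```

With validation switched off, the NaN row goes through predict unchecked, which is the trade-off the proposal accepts in exchange for speed.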

Additional context

Related issues and PRs are #16653, #20657, #21578 (in particular https://github.com/scikit-learn/scikit-learn/pull/21578#discussion_r745491121).

Related discussions: https://github.com/scikit-learn/scikit-learn/discussions/21810

Top GitHub Comments

lorentzenchr commented, Nov 29, 2021

Hmm 🤔 @glemaitre, you have a good point indeed. The context manager helps. My intuition is still to have it as a configurable option in estimators. I would be very interested in the opinions and experience of others.
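The context manager under discussion is not shown in this excerpt. As an illustration of the general pattern only, scikit-learn already ships sklearn.config_context, whose assume_finite option skips the NaN/infinity check during input validation. Treating it as a stand-in for the proposal is an assumption of this sketch.

```python
import numpy as np
from sklearn import config_context
from sklearn.utils import check_array

X_bad = np.array([[1.0, np.nan]])

# Default configuration: the finiteness check rejects the NaN.
try:
    check_array(X_bad)
    rejected = False
except ValueError:
    rejected = True

# Inside the context manager the finiteness check is skipped,
# so the same array passes validation unchanged.
with config_context(assume_finite=True):
    passed = check_array(X_bad)

print(rejected, np.isnan(passed[0, 1]))
```

The setting applies to every validation call inside the with block, so it needs no per-estimator parameter. That is the trade-off against a configurable option on each estimator.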

ogrisel commented, Nov 30, 2021

There is also an impact on feature names:

If b) is adopted and a meta-estimator is fit on a dataframe, then the meta-estimator extracts the feature names and passes a NumPy array without feature names to the base estimators. But sometimes the feature names could be useful for the base estimator (e.g. to specify which features a HistGBRT model should treat as categorical).