RFC / API add option to fit/predict without input validation

Goal

Ideally, this issue helps to sort out an API.

Describe the workflow you want to enable

I’d like to be able to switch on/off:

  1. Parameter validation in fit
  2. Input array validation in fit and predict
  3. All other validation steps in fit and predict (e.g. check_is_fitted)

Something like

model = RandomForestRegressor(validate_params=False)
model.fit(X, y, validate_input=False)
model.predict(X, validate_input=False)
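
To make the three switches concrete, here is a minimal sketch on a toy estimator. It is not scikit-learn code: the class ToyMeanRegressor and the flags validate_params and validate_input are hypothetical names that only mirror the proposal above, and the checks are deliberately simplistic.

```python
import math


class ToyMeanRegressor:
    """Toy estimator for illustration only; every flag name here is hypothetical."""

    def __init__(self, shrinkage=0.0, validate_params=True):
        self.shrinkage = shrinkage
        self.validate_params = validate_params

    def _check_params(self):
        # 1. Parameter validation in fit.
        if not 0.0 <= self.shrinkage <= 1.0:
            raise ValueError("shrinkage must be in [0, 1]")

    @staticmethod
    def _check_inputs(X, y):
        # 2. Input array validation: matching lengths, non-empty, finite targets.
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        y = [float(v) for v in y]
        if not y or any(not math.isfinite(v) for v in y):
            raise ValueError("y must be non-empty and finite")
        return y

    def fit(self, X, y, validate_input=True):
        if self.validate_params:
            self._check_params()
        if validate_input:
            y = self._check_inputs(X, y)
        self.mean_ = (1.0 - self.shrinkage) * sum(y) / len(y)
        return self

    def predict(self, X, validate_input=True):
        # 3. Other validation steps, here a stand-in for check_is_fitted.
        if validate_input and not hasattr(self, "mean_"):
            raise RuntimeError("estimator is not fitted")
        return [self.mean_] * len(X)


# All checks switched off: the caller takes responsibility for the inputs.
model = ToyMeanRegressor(validate_params=False)
model.fit([[0.0], [1.0]], [1.0, 3.0], validate_input=False)
predictions = model.predict([[5.0]], validate_input=False)
```

With the checks switched off, an out-of-range shrinkage or a non-finite target passes through silently, which is the trade-off the proposal accepts in exchange for speed.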

Note that some estimators like Lasso already support check_input in fit.

The main reason to do so is improved performance.
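
For reference, a small sketch of the existing check_input option on Lasso.fit. The data is synthetic, and the array layout used here (Fortran-ordered float64 X, float64 y) is my assumption about what the solver expects once the checks are bypassed; the scikit-learn docs warn against using this flag unless you know what you are doing.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# With check_input=False the caller must already provide solver-ready arrays;
# here that is assumed to mean Fortran-ordered float64 X and float64 y.
X = np.asfortranarray(rng.normal(size=(200, 5)))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.01 * rng.normal(size=200)

checked = Lasso(alpha=0.01).fit(X, y)
unchecked = Lasso(alpha=0.01).fit(X, y, check_input=False)

# Both fits should give the same coefficients; only the validation work differs.
same_coefficients = np.allclose(checked.coef_, unchecked.coef_)
```

The saving matters mostly for many small fits or predictions, where validation is a noticeable share of the total runtime.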

Additional context

Related issues and PRs are #16653, #20657, #21578 (in particular https://github.com/scikit-learn/scikit-learn/pull/21578#discussion_r745491121).

Related discussions: https://github.com/scikit-learn/scikit-learn/discussions/21810

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

lorentzenchr commented, Nov 29, 2021 (1 reaction)

Mmh 🤔 @glemaitre, you have a good point indeed. The context manager helps. My intuition is still to have it as a configurable option in estimators. I would be very interested in the opinion and experience of others.
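
Assuming the context manager meant here is sklearn.config_context, a minimal sketch of how it scopes a validation setting looks like this. Note that assume_finite only skips the finiteness check on input arrays, not validation as a whole, and the data is synthetic.

```python
import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -1.0, 0.5])

model = LinearRegression().fit(X, y)

# Inside the block, estimators skip the NaN/inf check on their inputs.
with sklearn.config_context(assume_finite=True):
    inside_flag = sklearn.get_config()["assume_finite"]
    fast_predictions = model.predict(X)

# Outside the block, the previous global setting is restored.
outside_flag = sklearn.get_config()["assume_finite"]
default_predictions = model.predict(X)
```

The setting applies to everything called inside the block, including nested estimators, whereas an estimator-level option would have to be passed down explicitly.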

ogrisel commented, Nov 30, 2021 (0 reactions)

Also there is an impact with feature names:

If b) is adopted and a meta-estimator is fit on a dataframe, then the feature names are extracted by the meta-estimator and a numpy array without feature names is passed to the base estimators. But sometimes the feature names could be useful for the base estimator (e.g. to specify which features a HistGBRT model should treat as categorical variables).
