RFE/RFECV doesn't work with sample weights

As far as I can tell, sklearn.feature_selection.RFE has no way to pass sample weights to the estimator alongside the data.

I have fixed this in my code with:

index bbe0cda..f5072b2 100644
--- a/sklearn/feature_selection/rfe.py
+++ b/sklearn/feature_selection/rfe.py
@@ -120,7 +120,7 @@ class RFE(BaseEstimator, MetaEstimatorMixin, SelectorMixin):
     def _estimator_type(self):
         return self.estimator._estimator_type

-    def fit(self, X, y):
+    def fit(self, X, y, **fit_params):
         """Fit the RFE model and then the underlying estimator on the selected
            features.

@@ -132,9 +132,9 @@ class RFE(BaseEstimator, MetaEstimatorMixin, SelectorMixin):
         y : array-like, shape = [n_samples]
             The target values.
         """
-        return self._fit(X, y)
+        return self._fit(X, y, **fit_params)

-    def _fit(self, X, y, step_score=None):
+    def _fit(self, X, y, step_score=None, **fit_params):
         X, y = check_X_y(X, y, "csc")
         # Initialization
         n_features = X.shape[1]
@@ -166,7 +166,7 @@ class RFE(BaseEstimator, MetaEstimatorMixin, SelectorMixin):
             if self.verbose > 0:
                 print("Fitting estimator with %d features." % np.sum(support_))

-            estimator.fit(X[:, features], y)
+            estimator.fit(X[:, features], y, **fit_params)

             # Get coefs
             if hasattr(estimator, 'coef_'):

Would this be a worthwhile contribution to scikit-learn?
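
To illustrate what the patch changes, here is a minimal pure-Python sketch of the same forwarding pattern. It is not scikit-learn's implementation: `ToyEstimator` and `rfe_fit` are hypothetical names made up for this example, and the toy estimator simply uses the weighted covariance between each feature and the target as its `coef_`. The point is only that whatever is passed in `**fit_params` (here `sample_weight`) reaches every `estimator.fit` call inside the elimination loop.

```python
class ToyEstimator:
    """Hypothetical estimator: coef_[j] is the weighted covariance of feature j with y."""

    def fit(self, X, y, sample_weight=None):
        n_samples = len(X)
        w = [1.0] * n_samples if sample_weight is None else list(sample_weight)
        total = sum(w)
        y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
        self.coef_ = []
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            x_mean = sum(wi * xi for wi, xi in zip(w, col)) / total
            cov = sum(
                wi * (xi - x_mean) * (yi - y_mean)
                for wi, xi, yi in zip(w, col, y)
            ) / total
            self.coef_.append(cov)
        return self


def rfe_fit(estimator, X, y, n_features_to_select=1, **fit_params):
    """Drop the feature with the smallest |coef_| each round.

    fit_params (e.g. sample_weight) is forwarded unchanged to every fit call,
    mirroring the change proposed in the diff.
    """
    features = list(range(len(X[0])))
    while len(features) > n_features_to_select:
        X_sub = [[row[j] for j in features] for row in X]
        estimator.fit(X_sub, y, **fit_params)
        weakest = min(range(len(features)), key=lambda k: abs(estimator.coef_[k]))
        del features[weakest]
    return features


# Feature 0 tracks y in the first two rows, feature 1 in the last two.
X = [[0, 0], [1, 0], [0, 0], [0, 1]]
y = [0, 1, 0, 1]

print(rfe_fit(ToyEstimator(), X, y, sample_weight=[10, 10, 1, 1]))  # [0]
print(rfe_fit(ToyEstimator(), X, y, sample_weight=[1, 1, 10, 10]))  # [1]
```

With the weights concentrated on the first two rows the loop keeps feature 0, and with the weights concentrated on the last two rows it keeps feature 1, so the selection depends on the weights only because they are forwarded to `fit`.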

Versions


In [1]: import platform; print(platform.platform())
Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty

In [2]: import sys; print("Python", sys.version)
('Python', '2.7.6 (default, Jun 22 2015, 17:58:13) \n[GCC 4.8.2]')

In [3]: import numpy; print("NumPy", numpy.__version__)
('NumPy', '1.11.0')

In [4]: import scipy; print("SciPy", scipy.__version__)
('SciPy', '0.17.1')

In [5]: import sklearn; print("Scikit-Learn", sklearn.__version__)
('Scikit-Learn', '0.18.dev0')

TODO:

  • Add support for sample_weight in RFE
  • Add support for sample_weight in RFECV
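
One detail that the RFECV item would probably need to handle: inside cross-validation, a per-sample argument such as `sample_weight` has to be restricted to the rows of each training fold before it is forwarded, otherwise its length no longer matches the fold's `X`. The sketch below shows that step in plain Python. The helper names `k_fold_indices` and `slice_fit_params` are hypothetical, and the rule "a list or tuple whose length equals n_samples is per-sample" is an assumption made for this example, not a description of scikit-learn's behavior.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n_samples) if j < start or j >= start + size]
        yield train, test
        start += size


def slice_fit_params(fit_params, indices, n_samples):
    """Return a copy of fit_params with per-sample sequences restricted to indices.

    Assumption: a list/tuple of length n_samples is treated as per-sample;
    everything else is passed through untouched.
    """
    sliced = {}
    for name, value in fit_params.items():
        per_sample = isinstance(value, (list, tuple)) and len(value) == n_samples
        sliced[name] = [value[i] for i in indices] if per_sample else value
    return sliced


fit_params = {"sample_weight": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], "verbose": True}
for train, test in k_fold_indices(6, 3):
    train_params = slice_fit_params(fit_params, train, 6)
    # An RFE-style fit on the training fold would receive train_params here.
    print(train, train_params["sample_weight"])
```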

Issue Analytics

  • State: open
  • Created 7 years ago
  • Reactions: 1
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

2 reactions
nathanwalker-sp commented, Jul 23, 2021

@fbidu thank you so much for this! For what it's worth, I think it would be fairly simple to also add this to RFECV (which just calls RFE), but I understand if that's outside the scope of what you're working on (since you're already building and trying to merge).

1 reaction
glemaitre commented, Jul 28, 2021

I am reopening this issue and will rename the title. Feel free to open a new PR.
