
Bug: VotingClassifier does not support estimators using class_weight

See original GitHub issue

When using VotingClassifier, the estimators used inside it cannot use the class_weight parameter. The reason, I believe, is that VotingClassifier label-encodes y before passing it to the individual estimators, but does not apply the same encoding to the keys of class_weight. This raises a ValueError: Class label <> not present.

As a matter of fact, I wonder why this encoding happens in the first place, since label handling is done by each model individually.
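
To make the mismatch concrete, here is a minimal sketch (mine, not from the original report) of the check that fails inside the sub-estimator: VotingClassifier fits a LabelEncoder on y and forwards the encoded labels, while the class_weight dict still uses the original string keys, so compute_class_weight cannot find them (the exact error message varies across scikit-learn versions).

import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.class_weight import compute_class_weight

y = np.array(["a", "b", "a", "b", "a"])
class_weight = {"a": 1, "b": 2}

# VotingClassifier.fit() label-encodes y before handing it to the sub-estimators...
le = LabelEncoder()
y_encoded = le.fit_transform(y)  # array([0, 1, 0, 1, 0])

# ...but class_weight still uses the original string keys, so the lookup fails
# with a ValueError (e.g. "Class label a not present." on sklearn 0.22).
compute_class_weight(class_weight, classes=np.unique(y_encoded), y=y_encoded)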

Here is a reproducible example:

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[1, 2], [1, 3], [2, 1], [2, 3], [2, 3]])
y = np.array(["a", "b", "a", "b", "a"])  # string class labels

# class_weight is keyed by the original string labels
LR = LogisticRegression(class_weight={"a": 1, "b": 2})
VC = VotingClassifier([("LR", LR)])

VC.fit(X, y)  # raises ValueError: Class label a not present.

Traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-3ae1f62ef011> in <module>
      9 VC = VotingClassifier([("LR", LR)])
     10 
---> 11 VC.fit(X, y)

/opt/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in fit(self, X, y, sample_weight)
    220         transformed_y = self.le_.transform(y)
    221 
--> 222         return super().fit(X, transformed_y, sample_weight)
    223 
    224     def predict(self, X):

/opt/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in fit(self, X, y, sample_weight)
     66                 delayed(_parallel_fit_estimator)(clone(clf), X, y,
     67                                                  sample_weight=sample_weight)
---> 68                 for clf in clfs if clf not in (None, 'drop')
     69             )
     70 

/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
   1002             # remaining jobs.
   1003             self._iterating = False
-> 1004             if self.dispatch_one_batch(iterator):
   1005                 self._iterating = self._original_iterator is not None
   1006 

/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    833                 return False
    834             else:
--> 835                 self._dispatch(tasks)
    836                 return True
    837 

/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in _dispatch(self, batch)
    752         with self._lock:
    753             job_idx = len(self._jobs)
--> 754             job = self._backend.apply_async(batch, callback=cb)
    755             # A job can complete so quickly than its callback is
    756             # called before we get here, causing self._jobs to

/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
    207     def apply_async(self, func, callback=None):
    208         """Schedule a func to be run"""
--> 209         result = ImmediateResult(func)
    210         if callback:
    211             callback(result)

/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
    588         # Don't delay the application, to avoid keeping the input
    589         # arguments in memory
--> 590         self.results = batch()
    591 
    592     def get(self):

/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257 
    258     def __len__(self):

/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257 
    258     def __len__(self):

/opt/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_base.py in _parallel_fit_estimator(estimator, X, y, sample_weight)
     34             raise
     35     else:
---> 36         estimator.fit(X, y)
     37     return estimator
     38 

/opt/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py in fit(self, X, y, sample_weight)
   1599                       penalty=penalty, max_squared_sum=max_squared_sum,
   1600                       sample_weight=sample_weight)
-> 1601             for class_, warm_start_coef_ in zip(classes_, warm_start_coef))
   1602 
   1603         fold_coefs_, _, n_iter_ = zip(*fold_coefs_)

/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
   1002             # remaining jobs.
   1003             self._iterating = False
-> 1004             if self.dispatch_one_batch(iterator):
   1005                 self._iterating = self._original_iterator is not None
   1006 

/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    833                 return False
    834             else:
--> 835                 self._dispatch(tasks)
    836                 return True
    837 

/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in _dispatch(self, batch)
    752         with self._lock:
    753             job_idx = len(self._jobs)
--> 754             job = self._backend.apply_async(batch, callback=cb)
    755             # A job can complete so quickly than its callback is
    756             # called before we get here, causing self._jobs to

/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
    207     def apply_async(self, func, callback=None):
    208         """Schedule a func to be run"""
--> 209         result = ImmediateResult(func)
    210         if callback:
    211             callback(result)

/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
    588         # Don't delay the application, to avoid keeping the input
    589         # arguments in memory
--> 590         self.results = batch()
    591 
    592     def get(self):

/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257 
    258     def __len__(self):

/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257 
    258     def __len__(self):

/opt/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py in _logistic_regression_path(X, y, pos_class, Cs, fit_intercept, max_iter, tol, verbose, solver, coef, class_weight, dual, penalty, intercept_scaling, multi_class, random_state, check_input, max_squared_sum, sample_weight, l1_ratio)
    841     le = LabelEncoder()
    842     if isinstance(class_weight, dict) or multi_class == 'multinomial':
--> 843         class_weight_ = compute_class_weight(class_weight, classes, y)
    844         sample_weight *= class_weight_[le.fit_transform(y)]
    845 

/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/class_weight.py in compute_class_weight(class_weight, classes, y)
     63             i = np.searchsorted(classes, c)
     64             if i >= len(classes) or classes[i] != c:
---> 65                 raise ValueError("Class label {} not present.".format(c))
     66             else:
     67                 weight[i] = class_weight[c]

ValueError: Class label a not present.

This issue was noticed here.

System:

  • python: 3.7.6
  • machine: macOS
  • sklearn: 0.22.1
  • numpy: 1.18.1
  • scipy: 1.4.1
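
A user-side workaround (my own suggestion, not from the issue) is to label-encode y yourself and key class_weight by the encoded integers, so the weights and the labels that VotingClassifier forwards to the sub-estimators live in the same space:

import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

X = np.array([[1, 2], [1, 3], [2, 1], [2, 3], [2, 3]])
y = np.array(["a", "b", "a", "b", "a"])

# Encode the labels up front and express class_weight in the encoded space.
le = LabelEncoder().fit(y)
y_enc = le.transform(y)  # array([0, 1, 0, 1, 0])
weights = {le.transform(["a"])[0]: 1, le.transform(["b"])[0]: 2}

LR = LogisticRegression(class_weight=weights)
VC = VotingClassifier([("LR", LR)])
VC.fit(X, y_enc)  # fits without the ValueError

# Predictions come back as encoded integers; map them back when needed.
print(le.inverse_transform(VC.predict(X)))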

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
thomasjpfan commented, Oct 30, 2020

For my solution to work generally, VotingClassifier would have to look into its estimators to see if there are class_weights and then encode them properly for the estimators. This feels like a fairly heavy-handed approach.

I think this is another use case where it's inconvenient to have string labels for classes.
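
As a rough illustration of that remapping idea (purely a sketch under my own assumptions, not the actual scikit-learn patch; remap_class_weight is a hypothetical helper), the ensemble, or the user, could translate dict-style class_weight keys into the encoded label space before fitting:

import numpy as np
from sklearn.base import clone
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

def remap_class_weight(estimator, le):
    # Hypothetical helper: re-key a dict class_weight into the label-encoded
    # space that VotingClassifier uses internally; estimators without a dict
    # class_weight pass through unchanged. Relies on LabelEncoder's sorted-unique
    # ordering matching the encoder VotingClassifier builds internally.
    est = clone(estimator)
    cw = est.get_params().get("class_weight")
    if isinstance(cw, dict):
        est.set_params(class_weight={le.transform([k])[0]: v for k, v in cw.items()})
    return est

X = np.array([[1, 2], [1, 3], [2, 1], [2, 3], [2, 3]])
y = np.array(["a", "b", "a", "b", "a"])

le = LabelEncoder().fit(y)
LR = LogisticRegression(class_weight={"a": 1, "b": 2})
VC = VotingClassifier([("LR", remap_class_weight(LR, le))])
VC.fit(X, y)  # the re-keyed weights now match the encoded labels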

0 reactions
MaxwellLZH commented, Mar 31, 2021

For my solution to work generally, VotingClassifier would have to look into its estimators to see if there are class_weights and then encode them properly for the estimators. This feels like a fairly heavy-handed approach.

I think this is another use case where it's inconvenient to have string labels for classes.

Hi Thomas, I took a stab at fixing it with the method you proposed 😃

