Bug: VotingClassifier does not support estimators using class_weight
When using VotingClassifier, the estimators used inside it cannot use the class_weight parameter. The reason, I believe, is that VotingClassifier encodes the labels before passing them to the individual estimators, but does not encode the labels referenced in class_weight. This raises a ValueError: Class label <> not present.
As a matter of fact, I wonder why this encoding happens in the first place, since each model handles it individually.
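To make the mismatch concrete, here is a small illustration of what the inner estimator effectively receives (this just reproduces the encoding step, it is not code taken from scikit-learn itself):

from sklearn.preprocessing import LabelEncoder
import numpy as np

y = np.array(["a", "b", "a", "b", "a"])
class_weight = {"a": 1, "b": 2}

# VotingClassifier.fit transforms y before dispatching to its estimators...
le = LabelEncoder()
transformed_y = le.fit_transform(y)   # array([0, 1, 0, 1, 0])

# ...so the estimator's classes become {0, 1}, while class_weight still
# refers to "a" and "b". compute_class_weight cannot find them and raises.
print(np.unique(transformed_y))                            # [0 1]
print(set(class_weight) & set(np.unique(transformed_y)))   # set()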
Here is a reproducible example:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
import numpy as np
X = np.array([[1, 2], [1, 3], [2, 1], [2, 3], [2, 3]])
y = np.array(["a", "b", "a", "b", "a"])
LR = LogisticRegression(class_weight={"a": 1, "b": 2})
VC = VotingClassifier([("LR", LR)])
VC.fit(X, y)
Traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-3ae1f62ef011> in <module>
9 VC = VotingClassifier([("LR", LR)])
10
---> 11 VC.fit(X, y)
/opt/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in fit(self, X, y, sample_weight)
220 transformed_y = self.le_.transform(y)
221
--> 222 return super().fit(X, transformed_y, sample_weight)
223
224 def predict(self, X):
/opt/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in fit(self, X, y, sample_weight)
66 delayed(_parallel_fit_estimator)(clone(clf), X, y,
67 sample_weight=sample_weight)
---> 68 for clf in clfs if clf not in (None, 'drop')
69 )
70
/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
1002 # remaining jobs.
1003 self._iterating = False
-> 1004 if self.dispatch_one_batch(iterator):
1005 self._iterating = self._original_iterator is not None
1006
/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
833 return False
834 else:
--> 835 self._dispatch(tasks)
836 return True
837
/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in _dispatch(self, batch)
752 with self._lock:
753 job_idx = len(self._jobs)
--> 754 job = self._backend.apply_async(batch, callback=cb)
755 # A job can complete so quickly than its callback is
756 # called before we get here, causing self._jobs to
/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
207 def apply_async(self, func, callback=None):
208 """Schedule a func to be run"""
--> 209 result = ImmediateResult(func)
210 if callback:
211 callback(result)
/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
588 # Don't delay the application, to avoid keeping the input
589 # arguments in memory
--> 590 self.results = batch()
591
592 def get(self):
/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
/opt/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_base.py in _parallel_fit_estimator(estimator, X, y, sample_weight)
34 raise
35 else:
---> 36 estimator.fit(X, y)
37 return estimator
38
/opt/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py in fit(self, X, y, sample_weight)
1599 penalty=penalty, max_squared_sum=max_squared_sum,
1600 sample_weight=sample_weight)
-> 1601 for class_, warm_start_coef_ in zip(classes_, warm_start_coef))
1602
1603 fold_coefs_, _, n_iter_ = zip(*fold_coefs_)
/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
1002 # remaining jobs.
1003 self._iterating = False
-> 1004 if self.dispatch_one_batch(iterator):
1005 self._iterating = self._original_iterator is not None
1006
/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
833 return False
834 else:
--> 835 self._dispatch(tasks)
836 return True
837
/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in _dispatch(self, batch)
752 with self._lock:
753 job_idx = len(self._jobs)
--> 754 job = self._backend.apply_async(batch, callback=cb)
755 # A job can complete so quickly than its callback is
756 # called before we get here, causing self._jobs to
/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
207 def apply_async(self, func, callback=None):
208 """Schedule a func to be run"""
--> 209 result = ImmediateResult(func)
210 if callback:
211 callback(result)
/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
588 # Don't delay the application, to avoid keeping the input
589 # arguments in memory
--> 590 self.results = batch()
591
592 def get(self):
/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
/opt/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py in _logistic_regression_path(X, y, pos_class, Cs, fit_intercept, max_iter, tol, verbose, solver, coef, class_weight, dual, penalty, intercept_scaling, multi_class, random_state, check_input, max_squared_sum, sample_weight, l1_ratio)
841 le = LabelEncoder()
842 if isinstance(class_weight, dict) or multi_class == 'multinomial':
--> 843 class_weight_ = compute_class_weight(class_weight, classes, y)
844 sample_weight *= class_weight_[le.fit_transform(y)]
845
/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/class_weight.py in compute_class_weight(class_weight, classes, y)
63 i = np.searchsorted(classes, c)
64 if i >= len(classes) or classes[i] != c:
---> 65 raise ValueError("Class label {} not present.".format(c))
66 else:
67 weight[i] = class_weight[c]
ValueError: Class label a not present.
This issue was noticed here.
System: python 3.7.6, macOS; sklearn 0.22.1, numpy 1.18.1, scipy 1.4.1
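Until this is fixed, one possible user-side workaround (assuming VotingClassifier keeps encoding labels with a LabelEncoder, which maps the sorted class labels to 0..n_classes-1) is to express class_weight in terms of those encoded labels. A sketch, not an officially supported pattern:

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
import numpy as np

X = np.array([[1, 2], [1, 3], [2, 1], [2, 3], [2, 3]])
y = np.array(["a", "b", "a", "b", "a"])

# Reproduce the encoding VotingClassifier applies internally and re-key
# the weights onto the encoded labels ("a" -> 0, "b" -> 1 here).
le = LabelEncoder().fit(y)
weights = {"a": 1, "b": 2}
encoded_weights = {le.transform([label])[0]: w for label, w in weights.items()}

LR = LogisticRegression(class_weight=encoded_weights)
VC = VotingClassifier([("LR", LR)])
VC.fit(X, y)  # fits without raising "Class label a not present."

The obvious downside is that the weights are now tied to the encoding rather than to the original label names, which is exactly the indirection a proper fix should remove.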
For my solution to work generally, VotingClassifier would have to look into its estimators to see if there is a class_weight and then encode it properly for each estimator. This feels like a fairly heavy-handed approach. I think this is another use case where it is inconvenient to have string labels for classes.
Hi Thomas, I took a stab at fixing this with the method you proposed 😃