Getting “nan” with cross_val_score and StackingClassifier or VotingClassifier

I want to use StackingClassifier and VotingClassifier with StratifiedKFold and cross_val_score. cross_val_score returns only nan values when the estimator is a StackingClassifier or a VotingClassifier. With any other algorithm in their place, cross_val_score works fine. I am using Python 3.8.5 and scikit-learn 0.23.2.

The Jupyter notebook (StackingVotingClassifierIssue.ipynb) and the dataset (Parkinsons.csv) are attached. I moved the status column (the target feature) to the rightmost position in parkinsons.csv.
The dataset can also be found at this Kaggle link. Below is the full code and the output.

StackingVotingClassifierIssue.zip

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn import preprocessing
from sklearn import metrics
from sklearn import model_selection
from sklearn import feature_selection

from imblearn.over_sampling import SMOTE

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import StackingClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier

import warnings
warnings.filterwarnings('ignore')

dataset = pd.read_csv('parkinsons.csv')


FS_X=dataset.iloc[:,:-1]
FS_y=dataset.iloc[:,-1:]

FS_X.drop(['name'],axis=1,inplace=True)

select_k_best = feature_selection.SelectKBest(score_func=feature_selection.f_classif,k=15)
X_k_best = select_k_best.fit_transform(FS_X,FS_y)

supportList = select_k_best.get_support().tolist()
p_valuesList = select_k_best.pvalues_.tolist()

toDrop=[]

for i in np.arange(len(FS_X.columns)):
    selected = supportList[i]
    if not selected:
        toDrop.append(FS_X.columns[i])

FS_X.drop(toDrop,axis=1,inplace=True)

smote = SMOTE(random_state=7)
Balanced_X,Balanced_y = smote.fit_resample(FS_X,FS_y)
before = pd.merge(FS_X,FS_y,right_index=True, left_index=True)
after = pd.merge(Balanced_X,Balanced_y,right_index=True, left_index=True)
b=before['status'].value_counts()
a=after['status'].value_counts()
print('Before')
print(b)
print('After')
print(a)

SkFold = model_selection.StratifiedKFold(n_splits=10, random_state=7, shuffle=False)

estimators_list = list()

KNN = KNeighborsClassifier()
RF = RandomForestClassifier(criterion='entropy',random_state = 1)
DT = DecisionTreeClassifier(criterion='entropy',random_state = 1)
GNB = GaussianNB()
LR = LogisticRegression(random_state = 1)

estimators_list.append(LR)
estimators_list.append(RF)
estimators_list.append(DT)
estimators_list.append(GNB)

SCLF = StackingClassifier(estimators = estimators_list,final_estimator = KNN,stack_method = 'predict_proba',cv=SkFold,n_jobs = -1)
VCLF = VotingClassifier(estimators = estimators_list,voting = 'soft',n_jobs = -1)

scores1 = model_selection.cross_val_score(estimator = SCLF,X=Balanced_X.values,y=Balanced_y.values,scoring='accuracy',cv=SkFold)
print('StackingClassifier Scores',scores1)

scores2 = model_selection.cross_val_score(estimator = VCLF,X=Balanced_X.values,y=Balanced_y.values,scoring='accuracy',cv=SkFold)
print('VotingClassifier Scores',scores2)

scores3 = model_selection.cross_val_score(estimator = DT,X=Balanced_X.values,y=Balanced_y.values,scoring='accuracy',cv=SkFold)
print('DecisionTreeClassifier Scores',scores3)

Output

Before
1    147
0     48
Name: status, dtype: int64
After
1    147
0    147
Name: status, dtype: int64
StackingClassifier Scores [nan nan nan nan nan nan nan nan nan nan]
VotingClassifier Scores [nan nan nan nan nan nan nan nan nan nan]
DecisionTreeClassifier Scores [0.86666667 0.9        0.93333333 0.86666667 0.96551724 0.82758621
 0.75862069 0.86206897 0.86206897 0.93103448]

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

glemaitre commented, Dec 2, 2020 (4 reactions)

This is because your cross_val_score call is raising an internal error. By default we are permissive and replace the failed score with nan. To get the traceback, you need to pass error_score="raise". In your case I am getting:

ValueError: Setting a random_state has no effect since shuffle is False. You should leave random_state to its default (None), or set shuffle=True.
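As a rough illustration of the error_score="raise" suggestion, here is a minimal sketch on synthetic data (an assumption, not the Parkinson's CSV from the issue). It passes bare estimators to VotingClassifier, as the original estimators_list does, whereas scikit-learn documents estimators as a list of (name, estimator) tuples. The exact exception type and message depend on the scikit-learn version, so none is shown here. Note also that the DecisionTreeClassifier run in the reported output succeeds with the same splitter, so on the reporter's version the unnamed estimators may well be what fails.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the dataset (assumption, not the original CSV).
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

# Bare estimators instead of (name, estimator) tuples: fitting is expected to fail.
broken = VotingClassifier(
    estimators=[LogisticRegression(max_iter=1000), GaussianNB()],
    voting="soft",
)

cv = StratifiedKFold(n_splits=3)

caught = None
try:
    # error_score="raise" re-raises the fit error instead of hiding it.
    cross_val_score(broken, X, y, scoring="accuracy", cv=cv, error_score="raise")
except Exception as exc:  # exact type varies by scikit-learn version
    caught = exc

print(type(caught).__name__, caught)
```

With the default error_score, the same failure is either turned into nan scores or reported less directly, depending on the version, which is why the traceback is worth requesting explicitly while debugging.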

tusharkhoche commented, Dec 2, 2020 (0 reactions)

Hi @glemaitre, thank you very much for the response. Got the issue. It worked. I feel like a fool. 😃 And regarding the SMOTE thing, this was just a minimal working example so that anyone who looks at this issue would just need to download and run the code. But thank you for this.
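
The thread does not show the reporter's final code. For reference, a configuration that avoids both the quoted random_state error and the unnamed-estimator problem might look like the sketch below. The synthetic data, the estimator names ("lr", "dt", "gnb"), and the choice of shuffle=True are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the balanced dataset (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# random_state only has an effect when shuffle=True, so set both together.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

# Ensemble estimators are given as (name, estimator) tuples; names are hypothetical.
named_estimators = [
    ("lr", LogisticRegression(max_iter=1000, random_state=1)),
    ("dt", DecisionTreeClassifier(criterion="entropy", random_state=1)),
    ("gnb", GaussianNB()),
]

stacking = StackingClassifier(
    estimators=named_estimators,
    final_estimator=KNeighborsClassifier(),
    stack_method="predict_proba",
    cv=cv,
)
voting = VotingClassifier(estimators=named_estimators, voting="soft")

# error_score="raise" keeps any remaining failure visible instead of returning nan.
stacking_scores = cross_val_score(
    stacking, X, y, scoring="accuracy", cv=cv, error_score="raise"
)
voting_scores = cross_val_score(
    voting, X, y, scoring="accuracy", cv=cv, error_score="raise"
)

print("StackingClassifier scores", stacking_scores)
print("VotingClassifier scores", voting_scores)
```

An alternative to shuffle=True is to drop random_state and keep shuffle=False, which gives unshuffled, deterministic folds.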
