Getting “nan” with cross_val_score and StackingClassifier or VotingClassifier
See original GitHub issue.

I want to use StackingClassifier & VotingClassifier with StratifiedKFold & cross_val_score. I am getting nan values in cross_val_score if I use StackingClassifier or VotingClassifier. If I use any other algorithm instead of StackingClassifier or VotingClassifier, cross_val_score works fine. I am using Python 3.8.5 & sklearn 0.23.2.
Jupyter notebook attached: StackingVotingClassifierIssue.ipynb.
Dataset attached: parkinsons.csv. I moved the status column (the target feature) to the right-most end of parkinsons.csv.
The dataset can also be found at this Kaggle link.
Below is the full code and the output.
StackingVotingClassifierIssue.zip
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn import preprocessing
from sklearn import metrics
from sklearn import model_selection
from sklearn import feature_selection
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier
import warnings
warnings.filterwarnings('ignore')
dataset = pd.read_csv('parkinsons.csv')
FS_X=dataset.iloc[:,:-1]
FS_y=dataset.iloc[:,-1:]
FS_X.drop(['name'],axis=1,inplace=True)
select_k_best = feature_selection.SelectKBest(score_func=feature_selection.f_classif,k=15)
X_k_best = select_k_best.fit_transform(FS_X,FS_y)
supportList = select_k_best.get_support().tolist()
p_valuesList = select_k_best.pvalues_.tolist()
toDrop=[]
for i in np.arange(len(FS_X.columns)):
    bool = supportList[i]
    if(bool == False):
        toDrop.append(FS_X.columns[i])
FS_X.drop(toDrop,axis=1,inplace=True)
smote = SMOTE(random_state=7)
Balanced_X,Balanced_y = smote.fit_sample(FS_X,FS_y)
before = pd.merge(FS_X,FS_y,right_index=True, left_index=True)
after = pd.merge(Balanced_X,Balanced_y,right_index=True, left_index=True)
b=before['status'].value_counts()
a=after['status'].value_counts()
print('Before')
print(b)
print('After')
print(a)
SkFold = model_selection.StratifiedKFold(n_splits=10, random_state=7, shuffle=False)
estimators_list = list()
KNN = KNeighborsClassifier()
RF = RandomForestClassifier(criterion='entropy',random_state = 1)
DT = DecisionTreeClassifier(criterion='entropy',random_state = 1)
GNB = GaussianNB()
LR = LogisticRegression(random_state = 1)
estimators_list.append(LR)
estimators_list.append(RF)
estimators_list.append(DT)
estimators_list.append(GNB)
SCLF = StackingClassifier(estimators = estimators_list,final_estimator = KNN,stack_method = 'predict_proba',cv=SkFold,n_jobs = -1)
VCLF = VotingClassifier(estimators = estimators_list,voting = 'soft',n_jobs = -1)
scores1 = model_selection.cross_val_score(estimator = SCLF,X=Balanced_X.values,y=Balanced_y.values,scoring='accuracy',cv=SkFold)
print('StackingClassifier Scores',scores1)
scores2 = model_selection.cross_val_score(estimator = VCLF,X=Balanced_X.values,y=Balanced_y.values,scoring='accuracy',cv=SkFold)
print('VotingClassifier Scores',scores2)
scores3 = model_selection.cross_val_score(estimator = DT,X=Balanced_X.values,y=Balanced_y.values,scoring='accuracy',cv=SkFold)
print('DecisionTreeClassifier Scores',scores3)
Output
Before
1 147
0 48
Name: status, dtype: int64
After
1 147
0 147
Name: status, dtype: int64
StackingClassifier Scores [nan nan nan nan nan nan nan nan nan nan]
VotingClassifier Scores [nan nan nan nan nan nan nan nan nan nan]
DecisionTreeClassifier Scores [0.86666667 0.9 0.93333333 0.86666667 0.96551724 0.82758621
0.75862069 0.86206897 0.86206897 0.93103448]
Top GitHub Comments
This is because your cross_val_score is raising an internal error. By default we are permissive and replace the score by a nan. To get the traceback, you need to pass error_score="raise", and in your case I am getting:

Hi @glemaitre, thank you very much for the response. Got the issue. It worked. I feel like a fool. 😃 And regarding the SMOTE thing, this was just a minimal working example so that anyone who looks at this issue would just need to download and run the code. But thank you for this.
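For reference, below is a minimal, self-contained sketch of the two changes the thread points to. The toy data from make_classification and the estimator names ("lr", "dt", "gnb") are illustrative, not taken from the original notebook. error_score="raise" is a real cross_val_score parameter that surfaces the underlying exception instead of silently scoring nan. scikit-learn's StackingClassifier and VotingClassifier expect estimators as a list of (name, estimator) tuples, so passing bare estimator objects as in the code above is a likely source of the internal error; since the exact traceback is not preserved here, treat that diagnosis as an assumption.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Illustrative toy data standing in for the Parkinson's dataset.
X, y = make_classification(n_samples=300, n_informative=5, random_state=7)
skfold = StratifiedKFold(n_splits=10)

# (name, estimator) tuples, as the ensemble meta-estimators expect,
# instead of a list of bare estimator objects.
named_estimators = [
    ('lr', LogisticRegression(max_iter=1000, random_state=1)),
    ('dt', DecisionTreeClassifier(criterion='entropy', random_state=1)),
    ('gnb', GaussianNB()),
]

sclf = StackingClassifier(estimators=named_estimators,
                          final_estimator=KNeighborsClassifier(),
                          stack_method='predict_proba',
                          cv=skfold)
vclf = VotingClassifier(estimators=named_estimators, voting='soft')

# error_score="raise" turns the silent nan scores into the real traceback.
for name, clf in [('StackingClassifier', sclf), ('VotingClassifier', vclf)]:
    scores = cross_val_score(estimator=clf, X=X, y=y, scoring='accuracy',
                             cv=skfold, error_score='raise')
    print(name, 'Scores', scores)

With the named tuples in place, error_score="raise" can simply be left in during debugging; it changes nothing when every fold fits successfully.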