Scalar fit_params no longer handled. Was: Singleton array (insert value here) cannot be considered a valid collection.
Description
TypeError: Singleton array array(True) cannot be considered a valid collection.
Steps/Code to Reproduce
Found when running RandomizedSearchCV with LightGBM. This previously worked fine. The latest update requires that every entry in **fit_params be checked for ‘slicability’, which is difficult when some fit params are scalars such as early_stopping_rounds=5.
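To see why a scalar trips the check, here is a minimal sketch (plain NumPy, no scikit-learn) of what happens when a scalar is coerced the way the sample-counting validation effectively does: it becomes a zero-dimensional "singleton" array, which has no length.

```python
import numpy as np

# A scalar fit param (verbose=True, early_stopping_rounds=5, ...) becomes a
# zero-dimensional "singleton" array once coerced with np.asarray -- which is
# effectively what the sample-counting check does.
v = np.asarray(True)
print(v.shape)   # () -- zero-dimensional

# A 0-d array has no length, so it cannot be treated as a collection of samples.
try:
    len(v)
except TypeError:
    print("len() fails: not a valid collection")
```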
# Import the modules
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import RandomizedSearchCV

# Fixed model parameters
mod_fixed_params = {
    'boosting_type': 'gbdt',
    'random_state': 0,
    'silent': False,
    'objective': 'multiclass',
    'num_class': len(np.unique(y_train)),  # number of classes, not the class labels
    'min_samples_split': 200,  # should be between 0.5-1% of samples
    'min_samples_leaf': 50,
    'subsample': 0.8,
}

# Search settings: fixed search options plus the hyperparameter distributions
search_params = {
    'fixed': {
        'cv': 3,
        'n_iter': 80,
        'verbose': True,
        'random_state': 0,
    },
    'variable': {
        'learning_rate': [0.1, 0.01, 0.005],
        'num_leaves': np.linspace(10, 1010, 100, dtype=int),
        'max_depth': np.linspace(2, 22, 10, dtype=int),
    },
}

# fit() keyword arguments -- a mix of scalars and validation data
fit_params = {
    'verbose': True,
    'eval_set': [(X_valid, y_valid)],
    'eval_metric': lgbm_custom_loss,
    'early_stopping_rounds': 5,
}

# Set up the model
lgb_mod = lgb.LGBMClassifier(**mod_fixed_params)

# Add the search grid
np.random.seed(0)
gbm = RandomizedSearchCV(lgb_mod, search_params['variable'], **search_params['fixed'])

# Fit the model
gbm.fit(X_train, y_train, **fit_params)
print('Best parameters found by grid search are: {}'.format(gbm.best_params_))
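For illustration, a quick way to see which of the fit_params above the search could slice per fold: only entries with a length equal to the number of samples qualify. The helper name `looks_sample_aligned` and the stand-in arrays for `eval_set` are hypothetical, not part of the report.

```python
import numpy as np

def looks_sample_aligned(value, n_samples):
    """Hypothetical check mirroring the 'slicability' requirement: the value
    must have a length, and that length must match the number of samples."""
    return hasattr(value, "__len__") and len(value) == n_samples

n_samples = 100  # stand-in for len(X_train)
fit_params = {
    'verbose': True,                                  # scalar
    'eval_set': [(np.zeros((20, 3)), np.zeros(20))],  # length 1, not n_samples
    'early_stopping_rounds': 5,                       # scalar
    'sample_weight': np.ones(n_samples),              # one entry per sample
}
for name, value in fit_params.items():
    print(name, looks_sample_aligned(value, n_samples))
# Only sample_weight is sliceable per fold; the rest must be passed through.
```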
I’ve traced the error through, and it starts at model_selection/_search.py line 652.
Expected Results
Expected LightGBM to run through RandomizedSearchCV.
Actual Results
TypeError: Singleton array array(True) cannot be considered a valid collection.
Versions
Issue Analytics
- Created: 4 years ago
- Comments: 17 (13 by maintainers)
It’s been raised 2-3 times in the couple of weeks since 0.22 was released, and not before.
In 0.21.X:
https://github.com/scikit-learn/scikit-learn/blob/ee328faa3601b40944ad43e28bce71860d39f2de/sklearn/model_selection/_search.py#L630-L632
in 0.22.X
https://github.com/scikit-learn/scikit-learn/blob/bf24c7e3d6d768dddbfad3c26bb3f23bc82c0a18/sklearn/model_selection/_search.py#L650-L654
It does. But we have tacitly supported this behaviour for many, many releases and have changed it without warning. The support is more than tacit, in the sense that _fit_and_score explicitly makes use of a helper that bypasses fit params that are not samplewise: https://github.com/scikit-learn/scikit-learn/blob/bf24c7e3d6d768dddbfad3c26bb3f23bc82c0a18/sklearn/model_selection/_validation.py#L940-L944
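A hedged sketch of what that helper does (a simplified reimplementation for illustration, not the actual scikit-learn source): values that are not array-like with one entry per sample are returned unchanged, while sample-aligned values are indexed down to the current fold.

```python
import numpy as np

def index_param_value(X, v, indices):
    """Simplified stand-in for the 0.21-era helper: slice v only when it is
    array-like with one entry per sample of X; otherwise pass it through."""
    if not hasattr(v, "__len__") or len(v) != len(X):
        return v                        # e.g. early_stopping_rounds=5, verbose=True
    return np.asarray(v)[indices]       # e.g. sample_weight

X = np.zeros((6, 2))
train_idx = np.array([0, 2, 4])
print(index_param_value(X, 5, train_idx))             # passed through unchanged: 5
print(index_param_value(X, np.arange(6), train_idx))  # sliced to the fold: [0 2 4]
```

Under this scheme a scalar fit param never reaches the indexing path, which is why the 0.21 behaviour accepted it; the 0.22 change validates every fit param up front instead.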
Thus the previous behaviour could be understood as supported and intended behaviour, even though it was untested (with respect to search at least).
Yes, we can change behaviour around things that do not conform to our conventions, but the change was introduced by @amueller in #14702 and was incidental to that PR. If we are going to change our handling of popular if non-conforming estimators, it should be done intentionally, and incidental changes should indeed be reverted in patch releases, IMO.
Let’s not deprecate non-aligned fit_params just yet; we need to think about it carefully first. Non-aligned fit_params are one proposed way to implement the new warm-start API: https://github.com/scikit-learn/scikit-learn/pull/15105
We might also want to add feature-aligned params in the future, who knows.