
Same observation being generated

See original GitHub issue

Hi,

I tried to run the code below to optimize an XGBoost classifier, but got stuck with the same observation being tested every time. I expected new observations to be generated… or am I wrong?

Console output (after the initial points were generated). Notice that every iteration generates the same observation:

('XGB', {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0})
Iteration:   1 | Last sampled value:   -0.680226 | with parameters:  {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0}
               | Current maximum:      -0.245901 | with parameters:  {'num_round': 28.712248896201515, 'subsample': 0.88492808306639748, 'eta': 0.78136949498158781, 'colsample_bytree': 0.99625386365127699, 'max_depth': 5.3806033554623252}
               | Time taken: 0 minutes and 10.953415 seconds

('XGB', {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0})
Iteration:   2 | Last sampled value:   -0.680226 | with parameters:  {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0}
               | Current maximum:      -0.245901 | with parameters:  {'num_round': 28.712248896201515, 'subsample': 0.88492808306639748, 'eta': 0.78136949498158781, 'colsample_bytree': 0.99625386365127699, 'max_depth': 5.3806033554623252}
               | Time taken: 0 minutes and 10.790525 seconds

('XGB', {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0})
Iteration:   3 | Last sampled value:   -0.680226 | with parameters:  {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0}
               | Current maximum:      -0.245901 | with parameters:  {'num_round': 28.712248896201515, 'subsample': 0.88492808306639748, 'eta': 0.78136949498158781, 'colsample_bytree': 0.99625386365127699, 'max_depth': 5.3806033554623252}
               | Time taken: 0 minutes and 10.6884 seconds

Full code for the program (uses the xgboost library):

import xgboost as xgb
from bayes_opt import BayesianOptimization
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2500, n_features=45, n_informative=12, n_redundant=7, n_classes=2, random_state=42)


def xgbcv(max_depth, eta, colsample_bytree, subsample, num_round):
    print("XGB", locals())

    dtrain = xgb.DMatrix(X, label=y)

    params = {
        'booster': 'gbtree',
        'objective': 'multi:softprob',
        'silent': 1,
        'max_depth': int(round(max_depth)),  # the optimizer proposes floats
        'eta': eta,
        'colsample_bytree': colsample_bytree,
        'subsample': subsample,
        'num_class': 2,
        'eval_metric': 'mlogloss',
        'seed': 42
    }

    r = xgb.cv(params, dtrain, int(round(num_round)), nfold=4, metrics={'mlogloss'}, seed=45, show_stdv=False)

    # BayesianOptimization maximizes, so return the negated mean logloss
    return -r['test-mlogloss-mean'].mean()


xgbBO = BayesianOptimization(xgbcv, {
    'max_depth': (2, 6),
    'eta': (0.01, 0.8),
    'colsample_bytree': (0.25, 1.0),
    'subsample': (0.25, 1.0),
    'num_round': (20, 30),
}, verbose=True)

xgbBO.maximize(init_points=32, n_iter=6)

Thanks in advance!
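A detail worth noting here: in this version of bayes_opt, GP hyperparameters such as the noise level can be forwarded through maximize() as extra keyword arguments (a later comment in this thread does exactly that). A minimal sketch, assuming the same API as the snippet above; the values are illustrative, not a confirmed fix:

# Continues the snippet above. Forward GP settings through maximize();
# alpha adds observation noise to the surrogate GP, which can keep it from
# collapsing onto a single proposal, and extra restarts help the kernel fit.
gp_params = {"alpha": 1e-5, "n_restarts_optimizer": 2}

xgbBO.maximize(init_points=32, n_iter=6, acq='ucb', kappa=5, **gp_params)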

Issue Analytics

  • State: closed
  • Created: 8 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

5 reactions
Erotemic commented, Dec 7, 2017

I’m also seeing this “edge obsession” with a few of my params (alpha, seed_thresh, and mask_thresh).

The random / given initialization points give a good sample of the space, but once I get to the maximization portion of the code, the algorithm always chooses alpha=0 or alpha=1 and seed_thresh/mask_thresh = .4 or .9.

I’m using UCB with kappas of 10, 5, and 1. My scores are only positive and are fairly well behaved.

Initialization
--------------------------------------------------------------------------------------------------------
 Step |   Time |      Value |     alpha |   mask_thresh |   min_seed_size |   min_size |   seed_thresh | 
    1 | 00m22s |    0.81479 |    0.8800 |        0.9000 |        100.0000 |   100.0000 |        0.4000 |                               
    2 | 00m22s |    0.82484 |    0.8800 |        0.8367 |         97.0000 |    33.0000 |        0.4549 |                               
    3 | 00m22s |    0.82484 |    0.8800 |        0.8367 |         97.0000 |    33.0000 |        0.4549 |                               
    4 | 00m23s |    0.80596 |    0.8800 |        0.7664 |         48.5327 |    61.8757 |        0.4090 |                               
    5 | 00m23s |    0.82962 |    0.8800 |        0.6666 |         81.5941 |    13.2919 |        0.4241 |                               
    6 | 00m22s |    0.70743 |    0.7219 |        0.7437 |         17.5233 |    28.8181 |        0.6414 |                               
    7 | 00m21s |    0.43979 |    0.2976 |        0.5215 |         94.8511 |    64.3517 |        0.8054 |                               
    8 | 00m23s |    0.84768 |    0.2408 |        0.6120 |         20.9162 |    32.0568 |        0.5938 |                               
    9 | 00m22s |    0.81603 |    0.6403 |        0.7360 |         24.6371 |    89.1438 |        0.5964 |                               
   10 | 00m24s |    0.82895 |    0.4123 |        0.4659 |         63.0934 |    10.2661 |        0.5906 |                               
   11 | 00m22s |    0.77536 |    0.1803 |        0.7268 |         12.2180 |    69.7986 |        0.7694 |                               
   12 | 00m23s |    0.71786 |    0.9697 |        0.7017 |          1.7283 |    87.1418 |        0.4590 |                               
   13 | 00m19s |    0.14442 |    0.4860 |        0.5708 |         80.0456 |    42.1833 |        0.8415 |                               
   14 | 00m22s |    0.80979 |    0.1810 |        0.8648 |          1.5454 |    53.7144 |        0.6080 |                               
   15 | 00m20s |    0.21012 |    0.9539 |        0.5251 |         94.5773 |     1.5600 |        0.7119 |                               
   16 | 00m19s |    0.15580 |    0.9824 |        0.6439 |         24.2936 |    56.2465 |        0.7527 |                               
   17 | 00m21s |    0.84999 |    0.6045 |        0.8915 |         95.3123 |    24.6991 |        0.4303 |                               
   18 | 00m18s |    0.07305 |    0.7312 |        0.8213 |         56.5674 |    86.4971 |        0.8207 |                               
   19 | 00m23s |    0.85359 |    0.1550 |        0.7519 |         28.8857 |    32.4800 |        0.5863 |                               
   20 | 00m24s |    0.82244 |    0.2414 |        0.4381 |         82.6430 |    14.5005 |        0.6036 |                               
   21 | 00m22s |    0.81988 |    0.5954 |        0.8685 |          3.5614 |    54.1788 |        0.4786 |                               
   22 | 00m20s |    0.18643 |    0.7339 |        0.4441 |         73.3577 |    27.7940 |        0.7647 |                               
   23 | 00m22s |    0.83862 |    0.6037 |        0.7404 |         53.4283 |    99.3464 |        0.5586 |                               
   24 | 00m17s |    0.01051 |    0.8708 |        0.7362 |         95.7069 |    58.4163 |        0.8590 |                               
   25 | 00m21s |    0.61327 |    0.3797 |        0.7900 |          9.6831 |    96.0789 |        0.7906 |                               
seeded {'max_params': {'alpha': 0.1550, 'mask_thresh': 0.7519, 'min_seed_size': 28.8857, 'min_size': 32.4800, 'seed_thresh': 0.5863}, 'max_val': 0.8536}
Bayesian Optimization
--------------------------------------------------------------------------------------------------------
 Step |   Time |      Value |     alpha |   mask_thresh |   min_seed_size |   min_size |   seed_thresh | 
   26 | 00m41s |    0.68458 |    0.0000 |        0.4000 |          0.0000 |     0.0000 |        0.4000 |                               
   27 | 00m33s |    0.84261 |    0.0000 |        0.4000 |         33.1328 |     0.0000 |        0.4000 |                               
   28 | 00m31s |    0.46382 |    0.0000 |        0.4000 |         77.6450 |   100.0000 |        0.9000 |                               
   29 | 00m32s |    0.85606 |    0.0000 |        0.9000 |         72.6044 |    67.5066 |        0.4000 |                               
   30 | 00m36s |    0.85364 |    0.0000 |        0.4000 |         52.9370 |    40.6417 |        0.4000 |                               
Bayesian Optimization
--------------------------------------------------------------------------------------------------------
 Step |   Time |      Value |     alpha |   mask_thresh |   min_seed_size |   min_size |   seed_thresh | 
   31 | 00m41s |    0.86287 |    0.0000 |        0.9000 |        100.0000 |    83.0868 |        0.4000 |                               
   32 | 00m32s |    0.57726 |    0.0000 |        0.4000 |         35.0292 |   100.0000 |        0.9000 |                               
   33 | 00m25s |    0.00070 |    1.0000 |        0.9000 |         51.8428 |     0.0000 |        0.9000 |                               
   34 | 00m24s |    0.00067 |    1.0000 |        0.9000 |         16.7209 |     0.0000 |        0.9000 |                               
   35 | 00m34s |    0.73777 |    0.0000 |        0.4000 |          0.0000 |    29.9011 |        0.4000 |                               
Bayesian Optimization
--------------------------------------------------------------------------------------------------------
 Step |   Time |      Value |     alpha |   mask_thresh |   min_seed_size |   min_size |   seed_thresh | 
   36 | 00m43s |    0.84708 |    0.0000 |        0.4000 |         41.5305 |    18.6506 |        0.4000 |                               
   37 | 00m36s |    0.85582 |    0.0000 |        0.4000 |         59.2420 |    55.6881 |        0.4000 |                               
   38 | 00m35s |    0.86263 |    0.0000 |        0.4000 |         86.5147 |    77.5565 |        0.4000 |                               
   39 | 00m36s |    0.56978 |    1.0000 |        0.4000 |          0.0000 |    70.0372 |        0.4000 |                               
   40 | 00m33s |    0.85290 |    0.0000 |        0.9000 |         14.2876 |    82.4162 |        0.4000 |       

Code looks like this:

    def bo_best(self):
        return {'max_val': self.Y.max(),
                'max_params': dict(zip(self.keys, self.X[self.Y.argmax()]))}

    preload, seeded_objective = _make_scorable_objective(arch_to_paths, arches,
                                                         train_data_path)
    preload()  # read data into memory

    seeded_bounds = {
        'mask_thresh': (.4, .9),
        'seed_thresh': (.4, .9),
        'min_seed_size': (0, 100),
        'min_size': (0, 100),
        'alpha': (0.0, 1.0),
    }

    seeded_bo = BayesianOptimization(seeded_objective, seeded_bounds)
    cand_params = [
        {'mask_thresh': 0.9000, 'min_seed_size': 100.0000, 'min_size': 100.0000, 'seed_thresh': 0.4000},
        {'mask_thresh': 0.8367, 'seed_thresh': 0.4549, 'min_seed_size': 97, 'min_size': 33},  # 'max_val': 0.8708
        {'mask_thresh': 0.8367, 'min_seed_size': 97.0000, 'min_size': 33.0000, 'seed_thresh': 0.4549},  # max_val': 0.8991
        {'mask_thresh': 0.7664, 'min_seed_size': 48.5327, 'min_size': 61.8757, 'seed_thresh': 0.4090},  # 'max_val': 0.9091}
        {'mask_thresh': 0.6666, 'min_seed_size': 81.5941, 'min_size': 13.2919, 'seed_thresh': 0.4241},  # full dataset 'max_val': 0.9142}
        # {'mask_thresh': 0.8, 'seed_thresh': 0.5, 'min_seed_size': 20, 'min_size': 0},
        # {'mask_thresh': 0.5, 'seed_thresh': 0.8, 'min_seed_size': 20, 'min_size': 0},
        # {'mask_thresh': 0.8338, 'min_seed_size': 25.7651, 'min_size': 38.6179, 'seed_thresh': 0.6573},
        # {'mask_thresh': 0.6225, 'min_seed_size': 93.2705, 'min_size': 5, 'seed_thresh': 0.4401},
        # {'mask_thresh': 0.7870, 'min_seed_size': 85.1641, 'min_size': 64.0634, 'seed_thresh': 0.4320},
    ]
    for p in cand_params:
        p['alpha'] = .88
    n_init = 2 if DEBUG else 40

    seeded_bo.explore(pd.DataFrame(cand_params).to_dict(orient='list'))

    # Basically just using this package for random search.
    # The BO doesn't seem to help much
    seeded_bo.plog.print_header(initialization=True)
    seeded_bo.init(n_init)
    print('seeded ' + ub.repr2(bo_best(seeded_bo), nl=0, precision=4))

    gp_params = {"alpha": 1e-5, "n_restarts_optimizer": 2}

    n_iter = 2 if DEBUG else 10
    for kappa in [10, 5, 1]:
        seeded_bo.maximize(n_iter=n_iter, acq='ucb', kappa=kappa, **gp_params)

    best_res = bo_best(seeded_bo)
    print('seeded ' + ub.repr2(best_res, nl=0, precision=4))
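For intuition on why the optimizer gravitates to the bounds: the GP’s predictive standard deviation grows with distance from the observed points, so it is largest at the edges of the box, and UCB = mu + kappa*sigma can therefore peak there even when the mean surface is flat. A self-contained 1-D sketch of the effect (plain scikit-learn, not this package’s internals; all values illustrative):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.RandomState(0)

# A handful of interior observations with well-behaved positive values.
X_obs = rng.uniform(0.3, 0.7, size=(8, 1))
y_obs = 0.8 + 0.05 * rng.randn(8)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-5,
                              n_restarts_optimizer=2, random_state=0)
gp.fit(X_obs, y_obs)

grid = np.linspace(0, 1, 500).reshape(-1, 1)
mu, sigma = gp.predict(grid, return_std=True)

for kappa in [1, 5, 10]:
    ucb = mu + kappa * sigma
    print(kappa, grid[ucb.argmax()][0])  # large kappa tends to pick 0 or 1

With only interior observations, the large-kappa runs will typically land on one of the endpoints, which is the same edge behaviour seen in the tables above.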
3 reactions
pietromarchesi commented, Dec 7, 2017

I am running into a similar issue: during the Initialization phase the parameter space is sampled nicely, but in the Optimization phase only extreme values are tried for most parameters. I’m optimizing an R2 score, which can be negative. I tried optimizing 10 + R2 instead, because I read that negative values may be a problem. While alleviated, the obsession with the edges of the parameter space is still present. Why does the presence of negative values matter in the first place, and does anyone have a suggestion on how to fix this?
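One way to read the negative-score question: if the GP is fit with a zero-mean prior (scikit-learn’s default), the predicted mean in unexplored regions reverts to 0; when every observed value is negative, 0 beats all of them, so the acquisition keeps chasing far-away, high-uncertainty (edge) regions. Shifting the score so it sits above the prior mean removes that pull, which is presumably why 10 + R2 helps. A hypothetical sketch of the wrapper (raw_r2_score is an illustrative stand-in, not the actual objective):

from bayes_opt import BayesianOptimization

def raw_r2_score(x, y):
    # stand-in for the real cross-validated R2, which can be negative
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2) - 0.5

def shifted_objective(x, y):
    # shift so every observed value sits above the GP's zero prior mean
    return 10.0 + raw_r2_score(x, y)

bo = BayesianOptimization(shifted_objective, {'x': (0.0, 1.0), 'y': (0.0, 1.0)})
bo.maximize(init_points=5, n_iter=10)

The shift changes only how the data sit relative to the prior mean, not where the optimum is, which is consistent with the report that it alleviates but does not eliminate the edge behaviour.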

