
Same observation being generated

See original GitHub issue

Hi,

I tried to run the code below to optimize an XGBoost classifier, but got stuck with the same observation being tested every time. I expected new observations to be generated… or am I wrong?

Console output (after the initial points were generated). Notice that every iteration generates the same observation:

('XGB', {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0})
Iteration:   1 | Last sampled value:   -0.680226 | with parameters:  {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0}
               | Current maximum:      -0.245901 | with parameters:  {'num_round': 28.712248896201515, 'subsample': 0.88492808306639748, 'eta': 0.78136949498158781, 'colsample_bytree': 0.99625386365127699, 'max_depth': 5.3806033554623252}
               | Time taken: 0 minutes and 10.953415 seconds

('XGB', {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0})
Iteration:   2 | Last sampled value:   -0.680226 | with parameters:  {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0}
               | Current maximum:      -0.245901 | with parameters:  {'num_round': 28.712248896201515, 'subsample': 0.88492808306639748, 'eta': 0.78136949498158781, 'colsample_bytree': 0.99625386365127699, 'max_depth': 5.3806033554623252}
               | Time taken: 0 minutes and 10.790525 seconds

('XGB', {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0})
Iteration:   3 | Last sampled value:   -0.680226 | with parameters:  {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0}
               | Current maximum:      -0.245901 | with parameters:  {'num_round': 28.712248896201515, 'subsample': 0.88492808306639748, 'eta': 0.78136949498158781, 'colsample_bytree': 0.99625386365127699, 'max_depth': 5.3806033554623252}
               | Time taken: 0 minutes and 10.6884 seconds

Full code for the program (uses the xgboost library):

import xgboost as xgb
from bayes_opt import BayesianOptimization
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2500, n_features=45, n_informative=12, n_redundant=7, n_classes=2, random_state=42)


def xgbcv(max_depth, eta, colsample_bytree, subsample, num_round):
    print("XGB", locals())

    dtrain = xgb.DMatrix(X, label=y)

    params = {
        'booster': 'gbtree',
        'objective': 'multi:softprob',
        'silent': 1,
        'max_depth': int(round(max_depth)),  # the optimizer proposes floats
        'eta': eta,
        'colsample_bytree': colsample_bytree,
        'subsample': subsample,
        'num_class': 2,
        'eval_metric': 'mlogloss',
        'seed': 42
    }

    r = xgb.cv(params, dtrain, int(round(num_round)), nfold=4, metrics={'mlogloss'}, seed=45, show_stdv=False)

    # BayesianOptimization maximizes, so return the negated mean logloss
    return -r['test-mlogloss-mean'].mean()


xgbBO = BayesianOptimization(xgbcv, {
    'max_depth': (2, 6),
    'eta': (0.01, 0.8),
    'colsample_bytree': (0.25, 1.0),
    'subsample': (0.25, 1.0),
    'num_round': (20, 30),
}, verbose=True)

xgbBO.maximize(init_points=32, n_iter=6)

Thanks in advance!
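A detail worth noting here: in this version of bayes_opt, GP hyperparameters such as the noise level can be forwarded through maximize() as extra keyword arguments (a later comment in this thread does exactly that). A minimal sketch, assuming the same API as the snippet above; the values are illustrative, not a confirmed fix:

# Continues the snippet above. Forward GP settings through maximize();
# alpha adds observation noise to the surrogate GP, which can keep it from
# collapsing onto a single proposal, and extra restarts help the kernel fit.
gp_params = {"alpha": 1e-5, "n_restarts_optimizer": 2}

xgbBO.maximize(init_points=32, n_iter=6, acq='ucb', kappa=5, **gp_params)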

Issue Analytics

  • State: closed
  • Created: 8 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

5 reactions
Erotemic commented, Dec 7, 2017

I’m also seeing this “edge obsession” with a few of my params (alpha, seed_thresh, and mask_thresh).

The random / given initialization points give a good sample of the space, but once I get to the maximization portion of the code, the algorithm always chooses alpha=0 or alpha=1 and seed_thresh/mask_thresh = .4 or .9.

I’m using UCB with kappas of 10, 5, and 1. My scores are only positive and are fairly well behaved.

Initialization
--------------------------------------------------------------------------------------------------------
 Step |   Time |      Value |     alpha |   mask_thresh |   min_seed_size |   min_size |   seed_thresh | 
    1 | 00m22s |    0.81479 |    0.8800 |        0.9000 |        100.0000 |   100.0000 |        0.4000 |                               
    2 | 00m22s |    0.82484 |    0.8800 |        0.8367 |         97.0000 |    33.0000 |        0.4549 |                               
    3 | 00m22s |    0.82484 |    0.8800 |        0.8367 |         97.0000 |    33.0000 |        0.4549 |                               
    4 | 00m23s |    0.80596 |    0.8800 |        0.7664 |         48.5327 |    61.8757 |        0.4090 |                               
    5 | 00m23s |    0.82962 |    0.8800 |        0.6666 |         81.5941 |    13.2919 |        0.4241 |                               
    6 | 00m22s |    0.70743 |    0.7219 |        0.7437 |         17.5233 |    28.8181 |        0.6414 |                               
    7 | 00m21s |    0.43979 |    0.2976 |        0.5215 |         94.8511 |    64.3517 |        0.8054 |                               
    8 | 00m23s |    0.84768 |    0.2408 |        0.6120 |         20.9162 |    32.0568 |        0.5938 |                               
    9 | 00m22s |    0.81603 |    0.6403 |        0.7360 |         24.6371 |    89.1438 |        0.5964 |                               
   10 | 00m24s |    0.82895 |    0.4123 |        0.4659 |         63.0934 |    10.2661 |        0.5906 |                               
   11 | 00m22s |    0.77536 |    0.1803 |        0.7268 |         12.2180 |    69.7986 |        0.7694 |                               
   12 | 00m23s |    0.71786 |    0.9697 |        0.7017 |          1.7283 |    87.1418 |        0.4590 |                               
   13 | 00m19s |    0.14442 |    0.4860 |        0.5708 |         80.0456 |    42.1833 |        0.8415 |                               
   14 | 00m22s |    0.80979 |    0.1810 |        0.8648 |          1.5454 |    53.7144 |        0.6080 |                               
   15 | 00m20s |    0.21012 |    0.9539 |        0.5251 |         94.5773 |     1.5600 |        0.7119 |                               
   16 | 00m19s |    0.15580 |    0.9824 |        0.6439 |         24.2936 |    56.2465 |        0.7527 |                               
   17 | 00m21s |    0.84999 |    0.6045 |        0.8915 |         95.3123 |    24.6991 |        0.4303 |                               
   18 | 00m18s |    0.07305 |    0.7312 |        0.8213 |         56.5674 |    86.4971 |        0.8207 |                               
   19 | 00m23s |    0.85359 |    0.1550 |        0.7519 |         28.8857 |    32.4800 |        0.5863 |                               
   20 | 00m24s |    0.82244 |    0.2414 |        0.4381 |         82.6430 |    14.5005 |        0.6036 |                               
   21 | 00m22s |    0.81988 |    0.5954 |        0.8685 |          3.5614 |    54.1788 |        0.4786 |                               
   22 | 00m20s |    0.18643 |    0.7339 |        0.4441 |         73.3577 |    27.7940 |        0.7647 |                               
   23 | 00m22s |    0.83862 |    0.6037 |        0.7404 |         53.4283 |    99.3464 |        0.5586 |                               
   24 | 00m17s |    0.01051 |    0.8708 |        0.7362 |         95.7069 |    58.4163 |        0.8590 |                               
   25 | 00m21s |    0.61327 |    0.3797 |        0.7900 |          9.6831 |    96.0789 |        0.7906 |                               
seeded {'max_params': {'alpha': 0.1550, 'mask_thresh': 0.7519, 'min_seed_size': 28.8857, 'min_size': 32.4800, 'seed_thresh': 0.5863}, 'max_val': 0.8536}
Bayesian Optimization
--------------------------------------------------------------------------------------------------------
 Step |   Time |      Value |     alpha |   mask_thresh |   min_seed_size |   min_size |   seed_thresh | 
   26 | 00m41s |    0.68458 |    0.0000 |        0.4000 |          0.0000 |     0.0000 |        0.4000 |                               
   27 | 00m33s |    0.84261 |    0.0000 |        0.4000 |         33.1328 |     0.0000 |        0.4000 |                               
   28 | 00m31s |    0.46382 |    0.0000 |        0.4000 |         77.6450 |   100.0000 |        0.9000 |                               
   29 | 00m32s |    0.85606 |    0.0000 |        0.9000 |         72.6044 |    67.5066 |        0.4000 |                               
   30 | 00m36s |    0.85364 |    0.0000 |        0.4000 |         52.9370 |    40.6417 |        0.4000 |                               
Bayesian Optimization
--------------------------------------------------------------------------------------------------------
 Step |   Time |      Value |     alpha |   mask_thresh |   min_seed_size |   min_size |   seed_thresh | 
   31 | 00m41s |    0.86287 |    0.0000 |        0.9000 |        100.0000 |    83.0868 |        0.4000 |                               
   32 | 00m32s |    0.57726 |    0.0000 |        0.4000 |         35.0292 |   100.0000 |        0.9000 |                               
   33 | 00m25s |    0.00070 |    1.0000 |        0.9000 |         51.8428 |     0.0000 |        0.9000 |                               
   34 | 00m24s |    0.00067 |    1.0000 |        0.9000 |         16.7209 |     0.0000 |        0.9000 |                               
   35 | 00m34s |    0.73777 |    0.0000 |        0.4000 |          0.0000 |    29.9011 |        0.4000 |                               
Bayesian Optimization
--------------------------------------------------------------------------------------------------------
 Step |   Time |      Value |     alpha |   mask_thresh |   min_seed_size |   min_size |   seed_thresh | 
   36 | 00m43s |    0.84708 |    0.0000 |        0.4000 |         41.5305 |    18.6506 |        0.4000 |                               
   37 | 00m36s |    0.85582 |    0.0000 |        0.4000 |         59.2420 |    55.6881 |        0.4000 |                               
   38 | 00m35s |    0.86263 |    0.0000 |        0.4000 |         86.5147 |    77.5565 |        0.4000 |                               
   39 | 00m36s |    0.56978 |    1.0000 |        0.4000 |          0.0000 |    70.0372 |        0.4000 |                               
   40 | 00m33s |    0.85290 |    0.0000 |        0.9000 |         14.2876 |    82.4162 |        0.4000 |       

Code looks like this:

    def bo_best(self):
        return {'max_val': self.Y.max(),
                'max_params': dict(zip(self.keys, self.X[self.Y.argmax()]))}

    preload, seeded_objective = _make_scorable_objective(arch_to_paths, arches,
                                                         train_data_path)
    preload()  # read data into memory

    seeded_bounds = {
        'mask_thresh': (.4, .9),
        'seed_thresh': (.4, .9),
        'min_seed_size': (0, 100),
        'min_size': (0, 100),
        'alpha': (0.0, 1.0),
    }

    seeded_bo = BayesianOptimization(seeded_objective, seeded_bounds)
    cand_params = [
        {'mask_thresh': 0.9000, 'min_seed_size': 100.0000, 'min_size': 100.0000, 'seed_thresh': 0.4000},
        {'mask_thresh': 0.8367, 'seed_thresh': 0.4549, 'min_seed_size': 97, 'min_size': 33},  # 'max_val': 0.8708
        {'mask_thresh': 0.8367, 'min_seed_size': 97.0000, 'min_size': 33.0000, 'seed_thresh': 0.4549},  # max_val': 0.8991
        {'mask_thresh': 0.7664, 'min_seed_size': 48.5327, 'min_size': 61.8757, 'seed_thresh': 0.4090},  # 'max_val': 0.9091}
        {'mask_thresh': 0.6666, 'min_seed_size': 81.5941, 'min_size': 13.2919, 'seed_thresh': 0.4241},  # full dataset 'max_val': 0.9142}
        # {'mask_thresh': 0.8, 'seed_thresh': 0.5, 'min_seed_size': 20, 'min_size': 0},
        # {'mask_thresh': 0.5, 'seed_thresh': 0.8, 'min_seed_size': 20, 'min_size': 0},
        # {'mask_thresh': 0.8338, 'min_seed_size': 25.7651, 'min_size': 38.6179, 'seed_thresh': 0.6573},
        # {'mask_thresh': 0.6225, 'min_seed_size': 93.2705, 'min_size': 5, 'seed_thresh': 0.4401},
        # {'mask_thresh': 0.7870, 'min_seed_size': 85.1641, 'min_size': 64.0634, 'seed_thresh': 0.4320},
    ]
    for p in cand_params:
        p['alpha'] = .88
    n_init = 2 if DEBUG else 40

    seeded_bo.explore(pd.DataFrame(cand_params).to_dict(orient='list'))

    # Basically just using this package for random search.
    # The BO doesn't seem to help much
    seeded_bo.plog.print_header(initialization=True)
    seeded_bo.init(n_init)
    print('seeded ' + ub.repr2(bo_best(seeded_bo), nl=0, precision=4))

    gp_params = {"alpha": 1e-5, "n_restarts_optimizer": 2}

    n_iter = 2 if DEBUG else 10
    for kappa in [10, 5, 1]:
        seeded_bo.maximize(n_iter=n_iter, acq='ucb', kappa=kappa, **gp_params)

    best_res = bo_best(seeded_bo)
    print('seeded ' + ub.repr2(best_res, nl=0, precision=4))
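For intuition on why the optimizer gravitates to the bounds: the GP’s predictive standard deviation grows with distance from the observed points, so it is largest at the edges of the box, and UCB = mu + kappa*sigma can therefore peak there even when the mean surface is flat. A self-contained 1-D sketch of the effect (plain scikit-learn, not this package’s internals; all values illustrative):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.RandomState(0)

# A handful of interior observations with well-behaved positive values.
X_obs = rng.uniform(0.3, 0.7, size=(8, 1))
y_obs = 0.8 + 0.05 * rng.randn(8)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-5,
                              n_restarts_optimizer=2, random_state=0)
gp.fit(X_obs, y_obs)

grid = np.linspace(0, 1, 500).reshape(-1, 1)
mu, sigma = gp.predict(grid, return_std=True)

for kappa in [1, 5, 10]:
    ucb = mu + kappa * sigma
    print(kappa, grid[ucb.argmax()][0])  # large kappa tends to pick 0 or 1

With only interior observations, the large-kappa runs will typically land on one of the endpoints, which is the same edge behaviour seen in the tables above.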
3 reactions
pietromarchesi commented, Dec 7, 2017

I am running into a similar issue: during the Initialization phase the parameter space is sampled nicely, but in the Optimization phase only extreme values are tried for most parameters. I’m optimizing an R2 score, which can be negative. I tried optimizing 10 + R2 instead, because I read that negative values may be a problem. While alleviated, the obsession with the edges of the parameter space is still present. Why does the presence of negative values matter in the first place, and does anyone have a suggestion on how to fix this?
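One way to read the negative-score question: if the GP is fit with a zero-mean prior (scikit-learn’s default), the predicted mean in unexplored regions reverts to 0; when every observed value is negative, 0 beats all of them, so the acquisition keeps chasing far-away, high-uncertainty (edge) regions. Shifting the score so it sits above the prior mean removes that pull, which is presumably why 10 + R2 helps. A hypothetical sketch of the wrapper (raw_r2_score is an illustrative stand-in, not the actual objective):

from bayes_opt import BayesianOptimization

def raw_r2_score(x, y):
    # stand-in for the real cross-validated R2, which can be negative
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2) - 0.5

def shifted_objective(x, y):
    # shift so every observed value sits above the GP's zero prior mean
    return 10.0 + raw_r2_score(x, y)

bo = BayesianOptimization(shifted_objective, {'x': (0.0, 1.0), 'y': (0.0, 1.0)})
bo.maximize(init_points=5, n_iter=10)

The shift changes only how the data sit relative to the prior mean, not where the optimum is, which is consistent with the report that it alleviates but does not eliminate the edge behaviour.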

