Hyperband doesn't use best checkpoints from the previous round
As far as I can tell from the keras-tuner sources, this implementation of Hyperband reuses the sets of best hyperparameters from the previous rounds, but not those models' weights/checkpoints: models for new rounds are initialized with random weights. Is this behaviour consistent with the original work? Are there any plans to change this behaviour?

Anyway, how could one add the corresponding checkpoint loading at the beginning of rounds to the existing KerasTuner workflow? Any minimal working example would be appreciated. Thanks!
UPD. OK, I managed to create my own solution by subclassing the Hyperband tuner and overriding `_on_train_begin`, which fires just before `model.fit` in the `run_trial` method:
```python
from kerastuner.engine import hypermodel as hm_module
from kerastuner.tuners import Hyperband


class MyTuner(Hyperband):

    def _on_train_begin(self, model, hp, *fit_args, **fit_kwargs):
        # Hyperband records the trial a round continues from under
        # 'tuner/trial_id'; the key is absent for first-round trials.
        prev_trial_id = hp.values.get('tuner/trial_id')
        if prev_trial_id:
            prev_trial = self.oracle.trials[prev_trial_id]
            best_epoch = prev_trial.best_step
            # The code below is from the load_model method of the Tuner class.
            with hm_module.maybe_distribute(self.distribution_strategy):
                model.load_weights(self._get_checkpoint_fname(
                    prev_trial.trial_id, best_epoch))
```
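For completeness, a minimal usage sketch: the subclass is a drop-in replacement for `Hyperband`, so it takes the same constructor arguments (the `build_model` function and the data below are placeholders):

```python
tuner = MyTuner(
    build_model,                  # placeholder hypermodel builder
    objective='val_accuracy',
    max_epochs=27,
    factor=3,
    directory='hb_dir',
    project_name='hb_reload')

tuner.search(x_train, y_train, validation_data=(x_val, y_val))
```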
Check it out. It seems to work fine, as can be seen in the TensorFlow accuracy plot. Any comments are appreciated!
Top GitHub Comments
For posterity: `Hyperband` was at some point updated to perform this reloading, but its code wasn't functional either, because the `Tuner` class changed again. What does seem to work is the code from the current `Hyperband` implementation, updated to override `_build_hypermodel` instead of `_build_model`, which no longer exists.

Aside: this is why static types are important. If this code had consistent type definitions and a type-linting build step, then as soon as `_build_model` was renamed, the `Hyperband` code would have thrown an error instead of just silently breaking.

When I get a chance I'll submit this fix as a PR, but until then I wanted to post it in case anyone else was wondering why their `Hyperband` trials weren't reloading weights.
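A minimal sketch of such an override, reusing the same internals as the snippet above (`self.oracle.trials`, `best_step`, and the two-argument `_get_checkpoint_fname`; exact names and signatures vary across keras-tuner versions):

```python
from kerastuner.tuners import Hyperband


class HyperbandWithReload(Hyperband):  # hypothetical subclass name

    def _build_hypermodel(self, hp):
        # Build the model as usual, then restore the best weights of the
        # trial this round continues from, if any.
        model = super()._build_hypermodel(hp)
        prev_trial_id = hp.values.get('tuner/trial_id')
        if prev_trial_id:
            prev_trial = self.oracle.trials[prev_trial_id]
            model.load_weights(self._get_checkpoint_fname(
                prev_trial.trial_id, prev_trial.best_step))
        return model
```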
I'm facing the same issue too. Here's my metric history on the training set (10 colors for 10 hyper-configs):
As you can see, every hyper-config model seems to restart training when entering a new round (max_epochs (R) = 3^3, factor (eta) = 3, with early stopping). The dataset and model are from here.
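For reference, a minimal sketch of the setup described above, assuming a stock Keras `EarlyStopping` callback (the `build_model` function, data, and patience value are placeholders):

```python
import tensorflow as tf
from kerastuner.tuners import Hyperband

tuner = Hyperband(
    build_model,              # placeholder hypermodel builder
    objective='val_accuracy',
    max_epochs=27,            # R = 3^3
    factor=3)                 # eta = 3

tuner.search(
    x_train, y_train,
    validation_data=(x_val, y_val),
    callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)])  # placeholder patience
```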