
Hyperband doesn't use best checkpoints from the previous round


As far as I can tell from the keras-tuner sources, this implementation of Hyperband reuses the sets of best hyperparameters from previous rounds, but not those models’ weights/checkpoints: models for new rounds are initialized with random weights. Is this behaviour consistent with the original Hyperband paper? Are there any plans to change this behaviour?

Anyway, how could one add loading of the corresponding checkpoints at the beginning of each round to the existing KerasTuner workflow? Any minimal working example would be appreciated. Thanks!

UPD. OK, I managed to create my own solution by subclassing the Hyperband tuner and overriding _on_train_begin, which fires just before model.fit in the run_trial method:

from kerastuner.engine import hypermodel as hm_module
from kerastuner.tuners import Hyperband

class MyTuner(Hyperband):
    def _on_train_begin(self, model, hp, *fit_args, **fit_kwargs):
        # Configurations promoted from an earlier round carry the id of the
        # trial they came from under 'tuner/trial_id'; absent in round one.
        prev_trial_id = hp.values.get('tuner/trial_id')
        if prev_trial_id:
            prev_trial = self.oracle.trials[prev_trial_id]
            best_epoch = prev_trial.best_step
            # The code below is from the load_model method of the Tuner class.
            with hm_module.maybe_distribute(self.distribution_strategy):
                model.load_weights(self._get_checkpoint_fname(
                    prev_trial.trial_id, best_epoch))

Check it out. It seems to work fine, as can be seen in the TensorFlow accuracy plot. Any comments are appreciated.
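
For completeness, here is a minimal sketch of how such a subclass might be driven end to end. The build_model function and the dummy data are hypothetical placeholders, not from the original post:

import tensorflow as tf

def build_model(hp):
    # Hypothetical search space, for illustration only.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hp.Int('units', 32, 256, step=32),
                              activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

tuner = MyTuner(build_model,
                objective='val_accuracy',
                max_epochs=27,
                factor=3,
                directory='hyperband_ckpts',
                project_name='resume_demo')

# Dummy data so the sketch is self-contained.
x = tf.random.normal((256, 20))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
tuner.search(x, y, validation_split=0.2)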

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions
TimMensch commented, Jul 5, 2022

For posterity:

  1. The above solution doesn’t work because the Tuner class changed.
  2. Hyperband was at some point updated to perform this reloading, but its code wasn’t functional either because the Tuner class changed again.
  3. For the current version of Hyperband, this seems to work:
from keras_tuner import Hyperband

class HyperbandPlus(Hyperband):
    def _build_hypermodel(self, hp):
        # Skip Hyperband's own (broken) _build_hypermodel and call the base
        # Tuner implementation directly, hence super(Hyperband, self).
        model = super(Hyperband, self)._build_hypermodel(hp)
        if "tuner/trial_id" in hp.values:
            trial_id = hp.values["tuner/trial_id"]
            print("Reloading data from", trial_id)
            fname = self._get_checkpoint_fname(trial_id)
            # Load the best checkpoint from the earlier trial.
            model.load_weights(fname)
        return model

This is the code from the current Hyperband implementation, updated to override _build_hypermodel instead of _build_model, which no longer exists. Aside: this is why static types are important. If this code had consistent type definitions and a type-checking build step, the Hyperband code would have thrown an error as soon as _build_model was renamed, instead of silently breaking.

When I get a chance I’ll submit the above fix as a PR, but until then I wanted to post it here in case anyone else was wondering why their Hyperband trials weren’t reloading weights.
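
As a quick sanity check that an override like this is actually in effect (a hypothetical snippet, not from this thread, where tuner is assumed to be a HyperbandPlus instance that has run search), you can inspect which trials were promoted from earlier ones:

# After a search, promoted trials carry the id of the trial whose
# weights they should have resumed from.
for trial in tuner.oracle.trials.values():
    parent_id = trial.hyperparameters.values.get("tuner/trial_id")
    if parent_id:
        print(f"trial {trial.trial_id} resumes from trial {parent_id}")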

2 reactions
TsaiTung-Chen commented, Mar 22, 2021

I’m facing the same issue. Here’s my metric history on the training set (10 colors for 10 hyper-configs):
[figure: SimpleCNN_bracket 2 (training)]

As you can see, every hyper-config model seems to restart training when entering a new round (max_epochs (R) = 3^3, factor (eta) = 3, with early stopping). The dataset and model are from here.
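
For context, a rough sketch of the epoch schedule this configuration implies (a simplification of keras-tuner's actual bracket arithmetic, which rounds these numbers slightly):

import math

R, eta = 27, 3  # max_epochs and factor from the comment above
rounds = round(math.log(R, eta))  # 3 promotion rounds in the largest bracket
for s in range(rounds + 1):
    epochs = R // eta ** (rounds - s)  # 1, 3, 9, 27 epochs per round
    print(f"round {s}: surviving configs train for ~{epochs} epochs")

Without weight reloading, each of those rounds starts from freshly initialized weights, which is exactly the restart pattern visible in the plot above.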


