
Hyperband bracket assignment using hash in parallelization

See original GitHub issue

I think there may be a problem when using the Hyperband pruner with parallelization. In Hyperband, each trial is assigned to a bracket by computing hash("{}_{}".format(study.study_name, trial.number)). However, hash() gives different values for the same input when executed in different processes. Trial n, for example, could be assigned to bracket 1 in process 1 and to bracket 2 in process 2, which ruins the benefit of using the Hyperband pruner instead of the SuccessiveHalving pruner.
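For illustration, here is a minimal sketch of the assignment rule described above (the function name is hypothetical, not Optuna's actual code): hash a per-trial key and take the remainder by the number of brackets.

```python
def bracket_index(study_name: str, trial_number: int, n_brackets: int) -> int:
    # Sketch of the assignment rule described above (not Optuna's real code).
    # Since Python 3.3, hash() of a str is salted per interpreter process,
    # so two worker processes can compute different indices for the same key.
    key = "{}_{}".format(study_name, trial_number)
    return hash(key) % n_brackets

# Stable within a single process, but may differ in another process:
print(bracket_index("my_study", 7, 4))
```

Within one interpreter the index is reproducible; the bug report is about the same call disagreeing across separately launched processes.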

Expected behavior

A trial should be assigned to the same bracket in every process.

Environment

  • Optuna version: 2.10.0
  • Python version: 3.6.9
  • OS: Linux

Error messages, stack traces, or logs

I attached a custom-made log that I added to the SuccessiveHalving code, which is a component of Hyperband. Trials 1, 2, 3, and 4 were included in bracket 1 in both processes 1 and 2. However, trials 5, 7, and 9 were included in the bracket only in process 2.

Trials for bracket 1 at process 1: (screenshot of log output)

Trials for the same bracket 1 at process 2: (screenshot of log output)

Steps to reproduce

  1. Add the following code at the beginning of _get_competing_values() in _successive_halving.py:

     for t in trials:
         if rung_key in t.system_attrs:
             compet = [t.system_attrs[rung_key]]
             print('   Prune SH Get_competing_values: ' + rung_key, ' trial ', t.number, ' competing ', compet)

  2. Check whether the trials that make up a bracket differ between processes.

Additional context (optional)

As you can see here, this is due to hash randomization being enabled by default since Python 3.3. Of course, the randomization can be turned off by setting PYTHONHASHSEED=0 when executing the Python script. Since I did not see any warning or suggestion about this problem, I would like to ask whether my assertion is correct.
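The behavior can be checked directly by spawning fresh interpreters: with the default randomized seed, hash() of the same string usually changes between runs, while PYTHONHASHSEED=0 pins it. A small standalone demo (nothing Optuna-specific; the helper name is made up):

```python
import os
import subprocess
import sys

def hash_in_new_interpreter(s, seed=None):
    # Compute hash(s) inside a freshly spawned Python process.
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # default: randomized per process
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", s],
        capture_output=True, text=True, env=env, check=True,
    )
    return int(out.stdout)

key = "my_study_7"  # stand-in for a "{study_name}_{trial_number}" key
# Default: each process picks a random seed, so these almost always differ.
print(hash_in_new_interpreter(key) == hash_in_new_interpreter(key))
# Fixed seed: reproducible across processes.
print(hash_in_new_interpreter(key, 0) == hash_in_new_interpreter(key, 0))
```

With the seed fixed, every process maps the key to the same value, which is exactly the property the bracket assignment needs.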

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 22 (15 by maintainers)

Top GitHub Comments

toshihikoyanase commented, Nov 9, 2021 (2 reactions)

@hvy I took a brief benchmark using timeit.

The result shows that the hashlib.sha* functions were 7 to 8 times slower than hash(). This is mainly because hashlib.sha* returns a byte array, which we need to convert to an integer to calculate the remainder when divided by self._total_trial_allocation_budget. @c-bata suggested an alternative function, binascii.crc32; it is a bit slower than hash(), but much faster than hashlib.sha*. I think we don't need cryptographic hash functions for this purpose, and binascii.crc32 may be a reasonable choice among them.

Alternatively, we may just take trial.number % self._total_trial_allocation_budget as suggested in https://github.com/optuna/optuna/pull/809#discussion_r361076742. I'm not sure whether the randomness of hash functions affects the search performance.

name             time (sec.)
---------------  -----------
hash             0.001790
hashlib.md5      0.007207
hashlib.sha1     0.006742
hashlib.sha256   0.007733
hashlib.sha512   0.009293
binascii.crc32   0.002447
timeit benchmarking
>>> import timeit
>>> import hashlib
>>> import binascii
>>> study_name = "tmp"
>>> trial_number = 1
>>> timeit.timeit(lambda: hash(f"{study_name}_{trial_number}"), number=10000)
>>> def _hash(s, f):
...     return int.from_bytes(f(s.encode()).digest(), byteorder="big")
>>> timeit.timeit(lambda: _hash(f"{study_name}_{trial_number}", hashlib.md5), number=10000)
>>> timeit.timeit(lambda: _hash(f"{study_name}_{trial_number}", hashlib.sha1), number=10000)
>>> timeit.timeit(lambda: _hash(f"{study_name}_{trial_number}", hashlib.sha256), number=10000)
>>> timeit.timeit(lambda: _hash(f"{study_name}_{trial_number}", hashlib.sha512), number=10000)
>>> timeit.timeit(lambda: binascii.crc32(f"{study_name}_{trial_number}".encode()), number=10000)
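The two deterministic alternatives discussed above can be sketched as follows (function names are illustrative, not Optuna's actual code); both give the same bracket in every process, unlike the salted built-in hash():

```python
import binascii

def bracket_index_crc32(study_name: str, trial_number: int, n_brackets: int) -> int:
    # crc32 of the same bytes is identical in every process and on every
    # machine, so all workers agree on the bracket.
    key = "{}_{}".format(study_name, trial_number).encode()
    return binascii.crc32(key) % n_brackets

def bracket_index_modulo(trial_number: int, n_brackets: int) -> int:
    # The even simpler round-robin suggested in the linked PR discussion.
    return trial_number % n_brackets

print(bracket_index_crc32("tmp", 1, 4))  # same value on every run
print(bracket_index_modulo(1, 4))        # → 1
```

The modulo variant trades the (pseudo-)random spread of a hash for a strict round-robin over brackets; whether that affects search performance is the open question raised above.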
hvy commented, Nov 9, 2021 (1 reaction)

You’re right, and a simple mod might also suffice. The logical change is probably a matter of a few lines of code (if not one), but we probably want to take a benchmark during the course of this PR, whichever approach we opt for. If a contributor is willing to work on this issue, a core maintainer could perhaps step in and help with setting up the benchmark, or alternatively, run the benchmark for him/her. (It’d be interesting to compare different bracket assignment algorithms for sure)
