Hyperband bracket assignment using hash in parallelization

I think there could be a problem when using the Hyperband pruner with parallelization. In Hyperband, each trial is assigned to a bracket by computing hash("{}_{}".format(study.study_name, trial.number)). However, hash gives a different value for the same input when executed in a different process (for example, in a different terminal). Trial n, for example, could be assigned to bracket 1 in process 1 and to bracket 2 in process 2, which ruins the benefit of using the Hyperband pruner instead of the SuccessiveHalving pruner.

Expected behavior

A trial should be assigned to the same bracket in every process.

Environment

Optuna version: 2.10.0
Python version: 3.6.9
OS: Linux

Error messages, stack traces, or logs

I attached a custom log that I added to the SuccessiveHalving code, which is part of Hyperband. Trials 1, 2, 3, and 4 were included in bracket 1 in both process 1 and process 2. However, trials 5, 7, and 9 were included in that bracket only in process 2.

Trials for bracket 1 at process 1

Trials for the same bracket 1 at process 2

Steps to reproduce

Add the following code at the beginning of _get_competing_values() in _successive_halving.py:

for t in trials:
    if rung_key in t.system_attrs:
        # Value recorded for this trial under rung_key.
        compet = [t.system_attrs[rung_key]]
        print("   Prune SH Get_competing_values: " + rung_key, " trial ", t.number, "competing ", compet)

Check whether the set of trials that make up a bracket differs between processes.

Additional context (optional)

This happens because hash randomization has been turned on by default since Python 3.3. Of course, the randomization can be turned off by setting PYTHONHASHSEED=0 when running the Python script. Since I did not see any warning or suggestion about this problem, I would like to ask whether my reasoning is correct.
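The effect described above can be reproduced without Optuna. The following is a minimal sketch, not taken from the issue: it starts fresh interpreters with different PYTHONHASHSEED values, each fixed seed standing in for one worker process whose seed would normally be drawn at random. The helper name hash_in_fresh_interpreter, the key "tmp_1", and the budget of 10 are assumptions made for illustration.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Compute hash(text) in a new Python process started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return int(result.stdout)


key = "tmp_1"  # hypothetical "{study_name}_{trial_number}" key
budget = 10    # hypothetical stand-in for the total trial allocation budget

# Seeds 1 and 2 play the role of two worker processes; the second run with
# seed 1 shows that a fixed seed makes the value reproducible.
for seed in (1, 2, 1):
    value = hash_in_fresh_interpreter(key, seed)
    print(f"PYTHONHASHSEED={seed}: hash % budget = {value % budget}")
```

If the two seeds give different remainders, the same trial would land in different brackets in the two workers, which is the mismatch reported in this issue.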

Top GitHub Comments

2 reactions

toshihikoyanase commented, Nov 9, 2021

@hvy I ran a quick benchmark using timeit.

The result shows that hashlib.sha* functions were 7 to 8 times slower than hash(). This is mainly because hashlib.sha* produces its digest as bytes, and we need to convert it to an integer to calculate the remainder when divided by self._total_trial_allocation_budget. @c-bata suggested an alternative function, binascii.crc32. It seems a bit slower than hash, but it is much faster than hashlib.sha*. I think we don’t need cryptographic hash functions for this purpose, and binascii.crc32 may be a reasonable choice among them.

Alternatively, we may just take trial.number % self._total_trial_allocation_budget as suggested in https://github.com/optuna/optuna/pull/809#discussion_r361076742. I’m not sure the randomness of hash functions affects the search performance.

name	time (sec.)
`hash`	0.001790
`hashlib.md5`	0.007207
`hashlib.sha1`	0.006742
`hashlib.sha256`	0.007733
`hashlib.sha512`	0.009293
`binascii.crc32`	0.002447

timeit benchmarking

>>> import timeit
>>> import hashlib
>>> import binascii
>>> study_name = "tmp"
>>> trial_number = 1
>>> timeit.timeit(lambda: hash(f"{study_name}_{trial_number}"), number=10000)
>>> def _hash(s, f):
...     return int.from_bytes(f(s.encode()).digest(), byteorder="big")
>>> timeit.timeit(lambda: _hash(f"{study_name}_{trial_number}", hashlib.md5), number=10000)
>>> timeit.timeit(lambda: _hash(f"{study_name}_{trial_number}", hashlib.sha1), number=10000)
>>> timeit.timeit(lambda: _hash(f"{study_name}_{trial_number}", hashlib.sha256), number=10000)
>>> timeit.timeit(lambda: _hash(f"{study_name}_{trial_number}", hashlib.sha512), number=10000)
>>> timeit.timeit(lambda: binascii.crc32(f"{study_name}_{trial_number}".encode()), number=10000)

1 reaction

hvy commented, Nov 9, 2021

You’re right, and a simple mod might also suffice. The logical change is probably a matter of a few lines of code (if not one), but we probably want to run a benchmark as part of this PR, whichever approach we opt for. If a contributor is willing to work on this issue, a core maintainer could perhaps step in and help with setting up the benchmark, or alternatively, run the benchmark for them. (It’d be interesting to compare different bracket assignment algorithms for sure.)
