Proper parallel optimisation


I have a single-threaded function which I want to optimise. I'm trying to write a wrapper that handles multiple evaluations at the same time, but I'm noticing considerable degradation in the results as I increase the number of parallel evaluations.

Here is the rough logic of what I’m doing:

from __future__ import annotations

from pprint import pprint
from typing import Callable

import numpy as np
from bayes_opt.bayesian_optimization import BayesianOptimization
from bayes_opt.util import UtilityFunction
from tqdm import tqdm, trange


def multivariable_func(r: float, x: float, y: float, diff: float) -> float:
    # Treat r as a discrete parameter and diff as a boolean switch.
    r = int(r)
    diff = diff > 0.5

    loss = (r - 5) ** 2
    loss += (x**2 + y**2 - r) ** 2
    loss += abs(x - y) * (-1) ** int(diff)
    loss += 0.5 * x
    loss += -0.25 * int(diff)
    return -loss


def optimize(func: Callable[..., float], num_iter: int, bounds: dict[str, tuple[float, float]], num_workers=0):
    init_samples = int(np.sqrt(num_iter))

    optimizer = BayesianOptimization(f=None, pbounds=bounds, verbose=0)
    # Decay kappa geometrically from 10 down to 0.1 over the iterations after the initial samples.
    init_kappa = 10
    kappa_decay = (0.1 / init_kappa) ** (1 / (num_iter - init_samples))
    utility = UtilityFunction(
        kind="ucb", kappa=init_kappa, xi=0.0, kappa_decay=kappa_decay, kappa_decay_delay=init_samples
    )

    init_queue = [optimizer.suggest(utility) for _ in range(init_samples)]
    result_queue = []
    tbar = tqdm(total=num_iter, leave=False)
    while len(optimizer.res) < num_iter:
        sample = init_queue.pop(0) if init_queue else optimizer.suggest(utility)
        loss = func(**sample)
        # Delay registration until num_workers results are pending, to mimic parallel workers.
        result_queue.append((sample, loss))
        if len(result_queue) >= num_workers:
            try:
                optimizer.register(*result_queue.pop(0))
                utility.update_params()
                tbar.update()
            except KeyError:
                # bayes_opt raises KeyError when a duplicate point is registered; skip it.
                pass
    return optimizer.max


bounds = {"r": [-10, 10], "x": [-10, 10], "y": [-10, 10], "diff": [0, 1]}

all_results = {}
for num_workers in tqdm([1, 2, 4, 8], desc="Checking num_workers"):
    results = []
    for idx in trange(2, desc=f"Sampling with {num_workers=}"):
        best = optimize(multivariable_func, 400, bounds, num_workers)
        results.append(best["target"])
    all_results[num_workers] = np.mean(results)
    tqdm.write(f"Result for optimizing with {num_workers=}: {all_results[num_workers]}")
print("\n")
pprint(all_results)

The result_queue variable simulates evaluation across multiple processes: each sample is only registered after num_workers - 1 newer samples have been drawn, mimicking the staleness introduced by a pool of parallel workers.
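
For reference, here is a minimal sketch of the actual multiprocess version I have in mind, driven by concurrent.futures (just an illustration of the structure, not the exact code I am running; kappa decay and the init queue are omitted for brevity):

from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait


def optimize_parallel(func, num_iter, bounds, num_workers):
    # func must be defined at module level so worker processes can pickle it.
    optimizer = BayesianOptimization(f=None, pbounds=bounds, verbose=0)
    utility = UtilityFunction(kind="ucb", kappa=10, xi=0.0)
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        # Keep num_workers evaluations in flight at all times.
        pending = {}
        for _ in range(num_workers):
            sample = optimizer.suggest(utility)
            pending[pool.submit(func, **sample)] = sample
        while len(optimizer.res) < num_iter:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                sample = pending.pop(future)
                try:
                    optimizer.register(params=sample, target=future.result())
                except KeyError:
                    pass  # duplicate point, skip
            # Refill the pool with fresh suggestions.
            while len(pending) < num_workers:
                sample = optimizer.suggest(utility)
                pending[pool.submit(func, **sample)] = sample
    return optimizer.max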

Here are the results:

{1: 4.320798413579277,
 2: 4.320676522735756,
 4: 3.5379530743926133,
 8: 2.175667857740832}

As can be seen, the more processes I use, the worse the final result is. I don't understand why that happens; even if a few suggestions are evaluated with slightly stale information, the results should not differ this much.
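
One thing I wondered about (just a guess on my part): since suggest() is called several times between registrations, maybe it keeps returning nearly identical points, so fewer distinct samples actually get evaluated. A quick throwaway check along these lines (count_duplicate_suggestions is a hypothetical helper, not part of bayes_opt):

def count_duplicate_suggestions(optimizer, utility, n=8):
    # Call suggest() n times without registering anything in between and
    # count how many of the returned points are (near-)identical.
    seen = set()
    duplicates = 0
    for _ in range(n):
        sample = optimizer.suggest(utility)
        key = tuple(round(v, 6) for v in sample.values())
        duplicates += key in seen
        seen.add(key)
    return duplicates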

What am I missing?

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 15

Top GitHub Comments

1 reaction
Rizhiy commented on Aug 22, 2022

OK, thank you. I’ve done some tests and here are the results for my example:

num_iter \ num_workers      1       2       4       8      16
400                     4.220   4.233   3.893   2.705  -1.569
500                     4.320   4.316   4.106   3.357   2.101
600                     4.315   4.320   4.257   4.131   3.511
700                     4.321   4.320   4.318   4.304   3.880
800                     4.321   4.321   4.320   4.309   4.145
900                     4.320   4.320   4.320   4.310   4.182

These are averaged over 10 runs with different seeds to remove the effect of lucky guesses.

In this toy example, I would say anything above 4.3 is satisfactory. 8 workers require 700 iterations instead of 500, which still equates to roughly a 470% speed-up. Currently, this is fine for me, so I will stop analysing this for now.
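
For clarity, the 470% figure comes from comparing wall-clock evaluation rounds, assuming the objective evaluations dominate the runtime and all workers stay busy:

serial_rounds = 500 / 1     # 1 worker reaches ~4.3 after 500 iterations
parallel_rounds = 700 / 8   # 8 workers need 700 iterations, evaluated 8 at a time
speedup = serial_rounds / parallel_rounds  # ~5.7x, i.e. roughly a 470% speed-up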

0 reactions
bwheelz36 commented on Oct 13, 2022

Looks really promising! Sorry, I think it is going to take me a few days to look at this properly though 😦
