[CLI]: Connection issues with local (Docker) setup and Optuna multi-threading

Describe the bug

I am combining Optuna with a wandb server running in Docker, and I get connection errors when the study runs multi-threaded. Removing the n_jobs=2 argument from study.optimize() makes the errors go away. (Removing the wandb-related code also gets rid of the errors, even with n_jobs enabled.) I hope there is a way to use wandb and still run a parallelized Optuna study.

Note that the wandb server and the client run on the same machine, just in different Docker containers, so an actual network problem is unlikely to be the cause.

import wandb
import optuna
import numpy as np


def objective(trial: optuna.Trial) -> float:
    config = dict(trial.params)
    config["trial.number"] = trial.number

    wandb.init(
        project="my_project",
        group="wandb_mwe",
        config=config,
        reinit=True,
        # settings=wandb.Settings(start_method="fork"),  # does not make a difference
        # settings=wandb.Settings(start_method="thread"),  # does not make a difference
    )

    score = float(np.random.random())  # scalar, matches the declared float return type

    wandb.run.summary["final score"] = score
    wandb.run.summary["state"] = "completed"
    wandb.finish(quiet=True)

    return score


wandb.login(
    key="",
    host="http://my.local.net:8000",
    # host="http://wandb-docker-container:8000",  # not a valid url
    relogin=True,
)

study = optuna.create_study(study_name="my_study", direction="maximize")
study.optimize(objective, n_trials=5, n_jobs=2)
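
Separately from the init error, the objective above reads the module-level wandb.run after calling wandb.init. With two worker threads, a shared module-level handle can end up pointing at whichever run was initialised most recently. wandb.init returns the run object, so keeping that return value in a local variable avoids depending on the shared handle. The sketch below illustrates the pattern using only the standard library. FakeRun, fake_init and current_run are hypothetical stand-ins, not wandb APIs, and nothing in this thread confirms that the pattern has any bearing on the reported error.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class FakeRun:
    """Hypothetical stand-in for a tracking run (not the wandb Run class)."""
    trial_number: int
    summary: dict = field(default_factory=dict)


# Shared module-level slot, overwritten by every call to fake_init.
current_run = None
_slot_lock = threading.Lock()


def fake_init(trial_number: int) -> FakeRun:
    """Create a run, store it in the shared slot, and also return it."""
    global current_run
    run = FakeRun(trial_number)
    with _slot_lock:
        current_run = run
    return run


def objective(trial_number: int) -> FakeRun:
    # Keep the handle local to this call instead of re-reading current_run,
    # which another thread may have replaced in the meantime.
    run = fake_init(trial_number)
    run.summary["final score"] = trial_number * 2
    run.summary["state"] = "completed"
    return run


def run_all(n_trials: int, n_jobs: int) -> list:
    """Run the trials on a thread pool and return the runs in trial order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(objective, range(n_trials)))
```

Applied to the MWE, the equivalent change would be to assign the result of wandb.init to a local variable and call summary and finish on that object.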

$ /opt/conda/bin/python /workspaces/my_project/src/wandb_errors_mwe.py
wandb: Appending key for my.local.net to your netrc file: /home/vscode/.netrc
[I 2022-08-18 11:27:58,400] A new study created in memory with name: my_study
/opt/conda/lib/python3.9/site-packages/optuna/study/study.py:393: FutureWarning: `n_jobs` argument has been deprecated in v2.7.0. This feature will be removed in v4.0.0. See https://github.com/optuna/optuna/releases/tag/v2.7.0.
  warnings.warn(
wandb: Currently logged in as: username. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.13.1
wandb: Run data is saved locally in /workspaces/my_project/src/wandb/run-20220818_112758-r6nbia7o
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run spring-cloud-23
wandb: ⭐️ View project at http://my.local.net:8080/username/my_project
wandb: 🚀 View run at http://my.local.net:8080/username/my_project/runs/r6nbia7o
wandb: Waiting for W&B process to finish... (success).
wandb:                                                                                
wandb: Synced spring-cloud-23: http://my.local.net:8080/username/my_project/runs/r6nbia7o
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
[I 2022-08-18 11:28:10,043] Trial 0 finished with value: 0.07086374553871444 and parameters: {}. Best is trial 0 with value: 0.07086374553871444.
wandb: Tracking run with wandb version 0.13.1
wandb: Run data is saved locally in /workspaces/my_project/src/wandb/run-20220818_112810-132mjr0f
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run kind-microwave-24
wandb: ⭐️ View project at http://my.local.net:8080/username/my_project
wandb: 🚀 View run at http://my.local.net:8080/username/my_project/runs/132mjr0f
wandb: Waiting for W&B process to finish... (success).
wandb:                                                                                
wandb: Synced kind-microwave-24: http://my.local.net:8080/username/my_project/runs/132mjr0f
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
[I 2022-08-18 11:28:23,610] Trial 2 finished with value: 0.42732459058893624 and parameters: {}. Best is trial 2 with value: 0.42732459058893624.
Problem at: /workspaces/my_project/src/wandb_errors_mwe.py 10 objective
wandb: ERROR Error communicating with wandb process
wandb: ERROR For more info see: https://docs.wandb.ai/library/init#init-start-error
[W 2022-08-18 11:28:28,455] Trial 1 failed because of the following error: UsageError('Error communicating with wandb process\nFor more info see: https://docs.wandb.ai/library/init#init-start-error')
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/workspaces/my_project/src/wandb_errors_mwe.py", line 10, in objective
    wandb.init(
  File "/opt/conda/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1043, in init
    run = wi.init()
  File "/opt/conda/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 691, in init
    raise UsageError(error_message)
wandb.errors.UsageError: Error communicating with wandb process
For more info see: https://docs.wandb.ai/library/init#init-start-error
wandb: Tracking run with wandb version 0.13.1
wandb: Run data is saved locally in /workspaces/my_project/src/wandb/run-20220818_112823-2h3zkj2i
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run serene-frost-25
wandb: ⭐️ View project at http://my.local.net:8080/username/my_project
wandb: 🚀 View run at http://my.local.net:8080/username/my_project/runs/2h3zkj2i
wandb: Waiting for W&B process to finish... (success).
wandb:                                                                                
wandb: Synced serene-frost-25: http://my.local.net:8080/username/my_project/runs/2h3zkj2i
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
[I 2022-08-18 11:28:36,211] Trial 3 finished with value: 0.5581329015664727 and parameters: {}. Best is trial 3 with value: 0.5581329015664727.
Traceback (most recent call last):
  File "/workspaces/my_project/src/wandb_errors_mwe.py", line 38, in <module>
    study.optimize(objective, n_trials=5, n_jobs=2)
  File "/opt/conda/lib/python3.9/site-packages/optuna/study/study.py", line 400, in optimize
    _optimize(
  File "/opt/conda/lib/python3.9/site-packages/optuna/study/_optimize.py", line 106, in _optimize
    f.result()
  File "/opt/conda/lib/python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/opt/conda/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/opt/conda/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.9/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/opt/conda/lib/python3.9/site-packages/optuna/study/_optimize.py", line 264, in _run_trial
    raise func_err
  File "/opt/conda/lib/python3.9/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/workspaces/my_project/src/wandb_errors_mwe.py", line 10, in objective
    wandb.init(
  File "/opt/conda/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1043, in init
    run = wi.init()
  File "/opt/conda/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 691, in init
    raise UsageError(error_message)
wandb.errors.UsageError: Error communicating with wandb process
For more info see: https://docs.wandb.ai/library/init#init-start-error
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb:                                                                                
wandb: Synced 2 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20220818_112758-2b67cjb2/logs
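
The final traceback passes through concurrent/futures/thread.py and f.result(), which indicates that the trials run on a thread pool and that an exception raised in a worker thread is re-raised in the main thread when the future's result is collected. This is why the failure of Trial 1 is first logged as a warning and later ends the whole script. The following standard-library-only sketch shows that mechanism. The names TrackerError, flaky_trial and run_trials are hypothetical, and this is not Optuna's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


class TrackerError(RuntimeError):
    """Hypothetical stand-in for the UsageError seen in the log."""


def flaky_trial(number: int) -> float:
    # Hypothetical objective: only trial 1 fails.
    if number == 1:
        raise TrackerError("Error communicating with tracker process")
    return float(number)


def run_trials(n_trials: int, n_jobs: int) -> dict:
    """Run trials on a thread pool and record each value or exception."""
    outcomes = {}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = {n: pool.submit(flaky_trial, n) for n in range(n_trials)}
        for n, future in futures.items():
            try:
                # result() re-raises, in the calling thread, whatever the
                # worker thread raised.
                outcomes[n] = future.result()
            except TrackerError as err:
                outcomes[n] = err
    return outcomes
```

Without the try/except around future.result(), the first failing trial would propagate out of run_trials, much as the UsageError propagates out of study.optimize() in the log above.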

Additional Files

wandb.zip

Environment

WandB version: 0.13.1

OS: Linux (running in VSCode dev container)

Python version: 3.9.6

Versions of relevant libraries: optuna: 2.10.1

Additional Context

I have based my MWE on this post: https://medium.com/optuna/optuna-meets-weights-and-biases-58fc6bab893

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments:8

Top GitHub Comments

thanos-wandb commented, Aug 25, 2022 (1 reaction)

Hi @mimxrt thank you for attaching the wandb folder, we will try to reproduce this error and get back to you soon with more information.

MBakirWB commented, Sep 29, 2022 (0 reactions)

Thank you for the update @mimxrt.
