Wandb sweep hanging on Google Colab

wandb --version && python --version && uname

Weights and Biases version: 0.10.1
Python version: Python 3.6.9
Operating System: Google Colab

Description

Trying to run a sweep on Google Colab.

What I Did

There is no error; the sweep just hangs indefinitely after the first run.

wandb: Agent Finished Run

I then have to kill it manually:

wandb: Ctrl-c pressed. Waiting for runs to end. Press ctrl-c again to terminate them.
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/wandb/wandb_agent.py", line 69, in _start
    run._stop_jupyter_agent()
AttributeError: 'Run' object has no attribute '_stop_jupyter_agent'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 261, in _bootstrap
    util._exit_function()
  File "/usr/lib/python3.6/multiprocessing/util.py", line 319, in _exit_function
    p.join()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 124, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
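
For reading the traceback above: the KeyboardInterrupt is only the manual ctrl-c, and the AttributeError printed first is the earlier failure that Python chains to it. The sketch below reproduces that chaining in isolation. The `Run` class and the `bootstrap` and `root_cause` helpers are hypothetical stand-ins, not wandb or multiprocessing code.

```python
class Run:
    """Hypothetical stand-in for the object in the traceback; it has no _stop_jupyter_agent."""


def bootstrap():
    """Mimic a cleanup step that gets interrupted while an earlier error is still propagating."""
    try:
        # Raises AttributeError, like the call shown at wandb_agent.py line 69.
        Run()._stop_jupyter_agent()
    finally:
        # Stand-in for ctrl-c arriving during cleanup.
        raise KeyboardInterrupt


def root_cause():
    """Return the exception that was in flight when the KeyboardInterrupt was raised."""
    try:
        bootstrap()
    except KeyboardInterrupt as exc:
        # Python stores the earlier exception on __context__ (implicit chaining).
        return exc.__context__


print(repr(root_cause()))
```

The first exception in a "During handling of the above exception" traceback is usually the one worth reporting; here that is the missing `_stop_jupyter_agent` attribute.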

This happens even with relatively simple notebooks: https://colab.research.google.com/drive/1u2xHJI_z-XoLTeFYGRzYS5PjdZGRakyG#scrollTo=HMRG0EnyyJYD

I also tested a generic train function with no model, and it still fails: https://colab.research.google.com/drive/1Ek-BpLf6BTXaZORGhgAi3NZo07mMw9nO#scrollTo=SAqGr4Vjp6ml
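
A minimal sketch of what a model-free reproduction of this kind might look like is given below. It is not the code from the linked notebooks: the sweep configuration, the `fake_loss` helper, the project name, and the run count are all assumptions made for illustration. Only `wandb.sweep`, `wandb.agent`, `wandb.init`, `wandb.config`, and `wandb.log` are real wandb calls, and wandb is imported lazily so the rest of the sketch can be checked without it.

```python
import random

# Hypothetical sweep configuration; parameter names and values are illustrative.
SWEEP_CONFIG = {
    "method": "random",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [0.1, 0.01, 0.001]},
        "epochs": {"values": [2, 4]},
    },
}


def fake_loss(learning_rate, epoch, rng):
    """Stand-in for a training step: a noisy value that shrinks as epochs pass."""
    return learning_rate + 1.0 / (epoch + 1) + rng.random() * 0.01


def train():
    """Generic train function with no model; the sweep agent calls it once per run."""
    import wandb  # lazy import: requires wandb to be installed and logged in

    wandb.init()
    config = wandb.config
    rng = random.Random(0)
    for epoch in range(config.epochs):
        # Log the same metric name that SWEEP_CONFIG["metric"] declares.
        wandb.log({"loss": fake_loss(config.learning_rate, epoch, rng), "epoch": epoch})


def launch_sweep(project="sweep-hang-repro", count=3):
    """Create the sweep and run an agent for a bounded number of runs (hypothetical defaults)."""
    import wandb

    sweep_id = wandb.sweep(SWEEP_CONFIG, project=project)
    wandb.agent(sweep_id, function=train, count=count)
```

Calling `launch_sweep()` in a Colab cell would start the agent; with the behavior described in this issue, the cell would be expected to stall after the first "Agent Finished Run" message.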

Issue Analytics

State:
Created 3 years ago
Comments: 5 (1 by maintainers)

Top GitHub Comments

issue-label-bot[bot] commented, Sep 17, 2020

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.95. Please mark this comment with 👍 or 👎 to give our bot feedback!

vanpelt commented, Sep 17, 2020

Hey @isaacmg, we’re aware of the issue. With 0.10.1 you can use an experimental agent by replacing wandb.agent with wandb_secretagent, which should fix the stalling issue.
