Wandb sweep hanging on Google Colab

wandb --version && python --version && uname

  • Weights and Biases version: 0.10.1
  • Python version: Python 3.6.9
  • Operating System: Google Colab

Description

Trying to run a sweep on Google Colab.

What I Did

There is no error: the sweep just hangs indefinitely after the first run, and the last line printed is:

wandb: Agent Finished Run

I then have to kill it manually, which prints:

wandb: Ctrl-c pressed. Waiting for runs to end. Press ctrl-c again to terminate them.
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/wandb/wandb_agent.py", line 69, in _start
    run._stop_jupyter_agent()
AttributeError: 'Run' object has no attribute '_stop_jupyter_agent'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 261, in _bootstrap
    util._exit_function()
  File "/usr/lib/python3.6/multiprocessing/util.py", line 319, in _exit_function
    p.join()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 124, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

This happens even with relatively simple notebooks: https://colab.research.google.com/drive/1u2xHJI_z-XoLTeFYGRzYS5PjdZGRakyG#scrollTo=HMRG0EnyyJYD

I also tested a generic train function with no model, and it still fails: https://colab.research.google.com/drive/1Ek-BpLf6BTXaZORGhgAi3NZo07mMw9nO#scrollTo=SAqGr4Vjp6ml
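The second traceback above shows where the process is stuck: after the AttributeError, Process-1 runs multiprocessing's exit handler (util._exit_function), which calls join() on every non-daemonic child it started, and that join() never returns. The sketch below reproduces only that standard-library pattern. It is not wandb code: helper, agent_target, demo and the stop event are hypothetical stand-ins, and the issue does not say which child process wandb 0.10.1 actually left running. It assumes a Unix system where the "fork" start method is available, as on Colab.

```python
import multiprocessing as mp

# "fork" matches Linux/Colab and avoids pickling the target functions.
CTX = mp.get_context("fork")


def helper(stop_event):
    # Hypothetical long-lived, non-daemonic child: it exits only when told to.
    stop_event.wait()


def agent_target(stop_event):
    # Hypothetical stand-in for the agent process: start a non-daemonic
    # child, then fail with the same error message as the traceback.
    child = CTX.Process(target=helper, args=(stop_event,))
    child.start()
    raise AttributeError("'Run' object has no attribute '_stop_jupyter_agent'")


def demo(timeout=2.0):
    """Return (worker_still_alive_after_timeout, worker_exit_code)."""
    stop_event = CTX.Event()
    worker = CTX.Process(target=agent_target, args=(stop_event,))
    worker.start()

    # The worker has already raised, but its exit handler is blocked
    # joining the helper, so it is still alive after the timeout.
    worker.join(timeout)
    still_alive = worker.is_alive()

    # Releasing the helper lets the worker's exit handler finish.
    stop_event.set()
    worker.join()
    return still_alive, worker.exitcode


if __name__ == "__main__":
    print(demo())
```

Run as a script, the worker prints its AttributeError traceback to stderr and demo() returns (True, 1): the worker had already failed but could not exit until its child did. Starting the helper with daemon=True would change this, because the exit handler terminates daemonic children before joining them.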

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

2 reactions
issue-label-bot[bot] commented, Sep 17, 2020

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.95. Please mark this comment with 👍 or 👎 to give our bot feedback!

1 reaction
vanpelt commented, Sep 17, 2020

Hey @isaacmg, we’re aware of the issue. With 0.10.1 you can use an experimental agent by replacing wandb.agent with wandb_secretagent, which should fix the stalling issue.