Bug with Sweep when upgrading from 0.9.3 to 0.10.1

  • wandb, version 0.10.1
  • Python 3.7.3
  • Linux

Description

When running a sweep, code that worked well on v0.9.3 now fails on v0.10.1 with the following (cryptic) error:

Traceback (most recent call last):
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 481, in init
    run = wi.init()
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 353, in init
    use_redirect=use_redirect,
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/site-packages/wandb/backend/backend.py", line 50, in ensure_launched
    self.record_q = self._wl._multiprocessing.Queue()
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/context.py", line 102, in Queue
    return Queue(maxsize, ctx=self.get_context())
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/queues.py", line 42, in __init__
    self._rlock = ctx.Lock()
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/context.py", line 67, in Lock
    return Lock(ctx=self.get_context())
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/synchronize.py", line 80, in __init__
    register(self._semlock.name)
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/semaphore_tracker.py", line 83, in register
    self._send('REGISTER', name)
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/semaphore_tracker.py", line 90, in _send
    self.ensure_running()
  File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/semaphore_tracker.py", line 46, in ensure_running
    pid, status = os.waitpid(self._pid, os.WNOHANG)
ChildProcessError: [Errno 10] No child processes
wandb: ERROR Abnormal program exit
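
For readers unfamiliar with the last frame of the traceback: `os.waitpid` raises `ChildProcessError` (errno `ECHILD`, which is 10 on Linux) when the PID it is asked about is not a child of the calling process. The sketch below only illustrates that behaviour on a POSIX system. It is not a diagnosis of the wandb bug, and the helper name `waitpid_on_non_child` is made up for this example.

```python
import errno
import os


def waitpid_on_non_child():
    """Call os.waitpid on a PID that is not our child and return the error raised."""
    try:
        # A process is never its own child, so this call fails with ECHILD.
        os.waitpid(os.getpid(), os.WNOHANG)
    except ChildProcessError as exc:
        return exc
    return None


if __name__ == '__main__':
    err = waitpid_on_non_child()
    print(type(err).__name__, err.errno == errno.ECHILD)
```

In the traceback above, the same call is made by the multiprocessing semaphore tracker on a PID it had recorded earlier (`self._pid`).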

I have created a minimal example that reproduces the bug:

import wandb
from argparse import ArgumentParser
import random

class sweepClass():

    def __init__(self, args):
        self.args = args

    def main_sweep(self):
        wandb.init(
            config=self.args,
            dir=self.args.wandbLogs_dir
        )

        args_2 = wandb.config

        # This is where a hypothetical machine learning model would go,
        # using model_param_1 and model_param_2 as hyperparameters:
        # model(args_2).fit(X, y)
        # y_pred = model.predict(Xnew)

        perf_metric = random.random()  # placeholder metric for the example

        wandb.log({'myMetric': perf_metric})


    def sweep(self):

        sweep_config = {
            "name": "mySweep_1",
            "method": "bayes",
            'metric': {
                'name':"myMetric",
                'goal':'maximize'
            },
            "parameters": {
                'model_param_1':{
                    'values':[1,4,5]
                },
                'model_param_2': {
                    'distribution': 'uniform',
                    'min':2,
                    'max':10
                },
            }
        }

        sweep_id = wandb.sweep(sweep_config, project=self.args.wandb_project)

        wandb.agent(sweep_id, function=self.main_sweep)

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--wandb_project', type=str, default='myProject')
    parser.add_argument('--wandbLogs_dir', type=str, default='./myCustomDir_wandb')

    args = parser.parse_args()

    sweepClass(args).sweep()

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

raubitsj commented, Sep 20, 2020 (1 reaction)

A quick workaround:

if __name__ == '__main__':
    import multiprocessing
    multiprocessing.set_start_method('spawn')

We will look into why the multiprocessing fork (or forkserver?) start method is not working with the Python-based sweep agent.
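
A slightly more defensive variant of the workaround is sketched below. It relies on the fact that `multiprocessing.set_start_method` raises `RuntimeError` if a start method has already been set, and that passing `force=True` overrides it. The helper name `use_spawn_start_method` is hypothetical, and the commented-out call refers to the minimal example from the issue description.

```python
import multiprocessing


def use_spawn_start_method():
    """Select the 'spawn' start method, even if one was already set."""
    try:
        multiprocessing.set_start_method('spawn')
    except RuntimeError:
        # Raised when the start method was already fixed earlier in the process.
        multiprocessing.set_start_method('spawn', force=True)
    return multiprocessing.get_start_method()


if __name__ == '__main__':
    use_spawn_start_method()
    # sweepClass(args).sweep()  # entry point of the minimal example above
```

Because `spawn` starts each child in a fresh interpreter that re-imports the main module, the call belongs inside the `if __name__ == '__main__':` guard, before the sweep agent is started.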

ariG23498 commented, Dec 18, 2020 (0 reactions)

Hey folks, we are closing this ticket due to inactivity. The PR that is said to fix the problem has been merged into the repo. We hope that solved it. Please feel free to reopen the issue if it persists. 😄
