Bug with Sweep when upgrading from 0.9.3 to 0.10.1
See original GitHub issue- wandb, version 0.10.1
- Python 3.7.3
- Linux
Description
When trying to run a sweep, a code that used to work well on v0.9.3 now breaks (v0.10.1) with the following (cryptic) error.
Traceback (most recent call last):
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 481, in init
run = wi.init()
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 353, in init
use_redirect=use_redirect,
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/site-packages/wandb/backend/backend.py", line 50, in ensure_launched
self.record_q = self._wl._multiprocessing.Queue()
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/context.py", line 102, in Queue
return Queue(maxsize, ctx=self.get_context())
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/queues.py", line 42, in __init__
self._rlock = ctx.Lock()
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/context.py", line 67, in Lock
return Lock(ctx=self.get_context())
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/synchronize.py", line 162, in __init__
SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/synchronize.py", line 80, in __init__
register(self._semlock.name)
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/semaphore_tracker.py", line 83, in register
self._send('REGISTER', name)
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/semaphore_tracker.py", line 90, in _send
self.ensure_running()
File "/home/ll582/.conda/envs/PPI-env/lib/python3.7/multiprocessing/semaphore_tracker.py", line 46, in ensure_running
pid, status = os.waitpid(self._pid, os.WNOHANG)
ChildProcessError: [Errno 10] No child processes
wandb: ERROR Abnormal program exit
I have created a minimal example that reproduces the bug:
import wandb
from argparse import ArgumentParser
import random
class sweepClass():
def __init__(self, args):
self.args = args
def main_sweep(self):
wandb.init(
config=self.args,
dir=self.args.wandbLogs_dir
)
args_2 = wandb.config
# This where a hypothetical machine learning model would go
# model using as hyperparameters: model_param_1 and model_param_2
# model(args_2).fit(X,y)
# y_pred = model.predict(Xnew)
perf_metric = random.random() #(for the example)
wandb.log({'myMetric': perf_metric})
def sweep(self):
sweep_config = {
"name": "mySweep_1",
"method": "bayes",
'metric': {
'name':"myMetric",
'goal':'maximize'
},
"parameters": {
'model_param_1':{
'values':[1,4,5]
},
'model_param_2': {
'distribution': 'uniform',
'min':2,
'max':10
},
}
}
sweep_id = wandb.sweep(sweep_config, project=self.args.wandb_project)
wandb.agent(sweep_id, function=self.main_sweep)
if __name__ == '__main__':
parser = ArgumentParser()
parser.add_argument('--wandb_project', type=str, default='myProject')
parser.add_argument('--wandbLogs_dir', type=str, default='./myCustomDir_wandb')
args = parser.parse_args()
sweepClass(args).sweep()
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (3 by maintainers)
Top Results From Across the Web
Bug with Sweep when upgrading from 0.9.3 to 0.10.1 #1250
When trying to run a sweep, a code that used to work well on v0.9.3 now breaks (v0.10.1) with the following (cryptic) error....
Read more >Bug listing with status RESOLVED with resolution FIXED as at ...
Bug :2 - "How do I attach an ebuild. ... Bug:3230 - "frozen-bubble-0.9.3 (Update)" status:RESOLVED resolution:FIXED severity:enhancement · Bug:3231 - "Wrong ...
Read more >Bug #1640978 “[SRU] Backport letsencrypt from bionic”
For Xenial, we are backporting the version of Certbot in Bionic. Note that this update includes two important functional changes: 1) Automatic ...
Read more >NEWS
Bug fixes for addGps when using a dataframe as the source, ... Updating addHydrophoneDepth to be able to take numeric input for constant...
Read more >Release Notes - Circonus Docs
Update irondb-eventer.conf with default settings for new find jobqs. ... Fix reconstitute bug where surrogate databases would get ...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
A quick workaround:
We will look into why mutiprocessing fork? or forkserver? method is not working with the python based sweep agent.
Hey folks We are closing this ticket due to the inactivity. The PR that is said to fix the solution has been merged to the repo. We hope that solved the problem. Please feel free to reopen the issue if the issue persists. 😄