fastai callback -> ValueError: signal only works in main thread


wandb --version && python --version && uname

  • Weights and Biases version: 8.24
  • Python version: 3.7.4
  • Operating System: Debian Stretch

Description

Using wandb.fastai.WandbCallback as described in the fast.ai tutorial, e.g.:

import wandb
from wandb.fastai import WandbCallback

wandb.init()

learn = my_learner(data,
                   model,
                   callback_fns=WandbCallback)

learn.fit_one_cycle(5, 3e-3)

produces the error below in JupyterLab:

ValueError: signal only works in main thread

Full Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-d81c6bd29d71> in <module>
----> 1 learn.lr_find()

/opt/anaconda3/lib/python3.7/site-packages/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, stop_div, wd)
     30     cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
     31     epochs = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 32     learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
     33 
     34 def to_fp16(learn:Learner, loss_scale:float=None, max_noskip:int=1000, dynamic:bool=True, clip:float=None,

/opt/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    200         callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
    201         self.cb_fns_registered = True
--> 202         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    203 
    204     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/opt/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
     99             for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
    100                 xb, yb = cb_handler.on_batch_begin(xb, yb)
--> 101                 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
    102                 if cb_handler.on_batch_end(loss): break
    103 

/opt/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     32     if opt is not None:
     33         loss,skip_bwd = cb_handler.on_backward_begin(loss)
---> 34         if not skip_bwd:                     loss.backward()
     35         if not cb_handler.on_backward_end(): opt.step()
     36         if not cb_handler.on_step_end():     opt.zero_grad()

/opt/anaconda3/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    116                 products. Defaults to ``False``.
    117         """
--> 118         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    119 
    120     def register_hook(self, hook):

/opt/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag
     94 
     95 

/opt/anaconda3/lib/python3.7/site-packages/wandb/wandb_torch.py in backward_hook(module, input, output)
    357                     graph.loaded = True
    358                     if wandb.run:
--> 359                         wandb.run.summary["graph_%i" % graph_idx] = graph
    360                     else:
    361                         wandb.termwarn(

/opt/anaconda3/lib/python3.7/site-packages/wandb/summary.py in __setitem__(self, k, v)
    136         self._locked_keys.add(k)
    137 
--> 138         self._root._write()
    139 
    140         return v

/opt/anaconda3/lib/python3.7/site-packages/wandb/summary.py in _write(self, commit)
    370             self._h5 = None
    371         if wandb.run and wandb.run._jupyter_agent:
--> 372             wandb.run._jupyter_agent.start()
    373 
    374 

/opt/anaconda3/lib/python3.7/site-packages/wandb/jupyter.py in start(self)
    120     def start(self):
    121         if self.paused:
--> 122             self.rm = RunManager(wandb.run, output=False, cloud=wandb.run.mode != "dryrun")
    123             wandb.run.api._file_stream_api = None
    124             self.rm.mirror_stdout_stderr()

/opt/anaconda3/lib/python3.7/site-packages/wandb/run_manager.py in __init__(self, run, project, tags, cloud, output, port)
    506         # Calling .start() on _meta and _system_stats will spin a thread that reports system stats every 30 seconds
    507         self._system_stats = stats.SystemStats(run, self._api)
--> 508         self._meta = meta.Meta(self._api, self._run.dir)
    509         self._meta.data["jobType"] = self._run.job_type
    510         self._meta.data["mode"] = self._run.mode

/opt/anaconda3/lib/python3.7/site-packages/wandb/meta.py in __init__(self, api, out_dir)
     38             self.data = {}
     39         self.lock = threading.Lock()
---> 40         self.setup()
     41         self._thread = threading.Thread(target=self._thread_body)
     42         self._thread.daemon = True

/opt/anaconda3/lib/python3.7/site-packages/wandb/meta.py in setup(self)
    101                 try:
    102                     try:
--> 103                         old_alarm = signal.signal(signal.SIGALRM, alarm_handler)
    104                         signal.alarm(25)
    105                         self._setup_code_git()

/opt/anaconda3/lib/python3.7/signal.py in signal(signalnum, handler)
     45 @_wraps(_signal.signal)
     46 def signal(signalnum, handler):
---> 47     handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
     48     return _int_to_enum(handler, Handlers)
     49 

ValueError: signal only works in main thread
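
The last frames of the traceback show signal.signal() being called from inside a backward hook. CPython only allows signal handlers to be installed from the main thread, so the call fails whenever the hook runs on another thread. That the backward pass ran on a worker thread is an assumption here; the traceback itself does not show it. Below is a minimal standard-library sketch of the restriction, with no wandb or fastai involved. The names install_handler and worker are illustrative only, and SIGINT stands in for the SIGALRM used in wandb/meta.py so the sketch also runs on platforms without SIGALRM.

```python
import signal
import threading


def install_handler():
    """Try to register a signal handler; return the error text if it fails."""
    try:
        previous = signal.signal(signal.SIGINT, signal.default_int_handler)
        if previous is not None:
            # Put back whatever handler was installed before this call.
            signal.signal(signal.SIGINT, previous)
        return None
    except ValueError as err:
        return str(err)


results = {}


def worker():
    # Same call as above, but issued from a non-main thread.
    results["worker"] = install_handler()


results["main"] = install_handler()

thread = threading.Thread(target=worker)
thread.start()
thread.join()

print("main thread:  ", results["main"])
print("worker thread:", results["worker"])
```

When run as a script, the main-thread call is expected to succeed, while the worker-thread call reports a ValueError whose message contains "signal only works in main thread" (the exact wording varies slightly between Python versions).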

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

raubitsj commented, Feb 4, 2020 (2 reactions)

Same root cause as: https://github.com/wandb/client/issues/832

The workaround, for users who have called wandb.watch() in a Jupyter notebook, is to write to the run summary before the model's forward pass is executed.
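
One reading of why this workaround helps, based on the traceback: the Jupyter agent's start() only builds a RunManager (and so only touches signal.signal()) while self.paused is true. An early summary write from a notebook cell would trigger that first start on the main thread, so a later write from the backward hook would find the agent already running. The sketch below simulates the pattern with the standard library only. LazyAgent and start_from_worker are hypothetical stand-ins, not wandb classes or functions, and resetting paused inside start() is an assumption about behavior the traceback does not show.

```python
import signal
import threading


class LazyAgent:
    """Hypothetical stand-in for an agent that installs a signal handler on first start."""

    def __init__(self):
        self.paused = True

    def start(self):
        if self.paused:
            # This call is only legal on the main thread.
            previous = signal.signal(signal.SIGINT, signal.default_int_handler)
            if previous is not None:
                signal.signal(signal.SIGINT, previous)
            self.paused = False  # assumption: later calls become no-ops


def start_from_worker(agent):
    """Call agent.start() on a worker thread; return the error text, or None."""
    outcome = {}

    def body():
        try:
            agent.start()
            outcome["error"] = None
        except ValueError as err:
            outcome["error"] = str(err)

    thread = threading.Thread(target=body)
    thread.start()
    thread.join()
    return outcome["error"]


# Without the workaround: the first start happens off the main thread.
cold_agent = LazyAgent()
cold_error = start_from_worker(cold_agent)

# With the workaround: start once on the main thread, before any worker runs.
warm_agent = LazyAgent()
warm_agent.start()
warm_error = start_from_worker(warm_agent)

print("cold start from worker:", cold_error)
print("warm start from worker:", warm_error)
```

In the notebook from the report, the equivalent step would presumably be an assignment such as wandb.run.summary["some_key"] = some_value in a cell that runs after wandb.init() and before training starts. The key and value there are placeholders.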


lukas commented, Feb 4, 2020 (1 reaction)

@borisdayma could you take a look at this?
