[CLI]: sweep agent crashes during wandb.init()


Describe the bug

The sweep agent crashes during wandb.init(), very probably because of a server-side modification. This afternoon (13:00 - 15:00 CET) I was running a sweep as usual, but when I tried to start a very similar sweep later (after 18:30 CET) it no longer worked. Even an exact copy of the previous sweep, with the same sweep.yaml, the same code, and no package updates, crashed. I therefore strongly suspect a server-side change.

Following the traceback below, I looked into sender.py and found that history is already a list of dictionaries, so its last entry does not need to be parsed again:

history = json.loads(resume_status["historyTail"])
if history:
    history = json.loads(history[-1])

I don’t know why this started to occur today or what the correct behavior would be, but removing the second json.loads (the one applied to history[-1]) resolved the issue for me as a workaround. I don’t have code to reproduce the issue, but I assume it occurs with any project that has a run history. Just to be sure, my minimal sweep.yaml looks like this, though I guess it won’t change much:

method: grid
metric:
  goal: maximize
  name: probe/student.max
parameters:
  enc:
    value: vgg11
program: dino.py
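
The workaround described above (skipping the second json.loads when the last entry is already a dictionary) could be sketched roughly as follows. This is a standalone illustration, not the wandb source: the helper name `parse_last_history_entry` and both sample payloads are hypothetical, and returning None for an empty history is a choice made here for the sketch.

```python
import json


def parse_last_history_entry(history_tail):
    """Return the last history row as a dict, or None if there is no history.

    Hypothetical helper: tolerates entries that are either JSON-encoded
    strings or already-decoded dictionaries.
    """
    history = json.loads(history_tail)
    if not history:
        return None
    last = history[-1]
    # Only decode again when the entry is still serialized.
    if isinstance(last, (str, bytes, bytearray)):
        return json.loads(last)
    return last


# Hypothetical payloads: a list of JSON strings vs. a list of dicts.
string_entries = json.dumps([json.dumps({"_step": 3})])
dict_entries = json.dumps([{"_step": 3}])

print(parse_last_history_entry(string_entries))
print(parse_last_history_entry(dict_entries))
```

Checking the type before decoding keeps the step working for both payload shapes, whereas simply deleting the second json.loads would only handle the dictionary form.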

Thanks in advance and best wishes, Felix

Thread SenderThread: wandb.init()...
Traceback (most recent call last):
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 49, in run
    self._run()
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 100, in _run
    self._process(record)
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/internal/internal.py", line 309, in _process
    self._sm.send(record)
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 305, in send
    send_handler(record)
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 770, in send_run
    error = self._maybe_setup_resume(run)
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 629, in _maybe_setup_resume
    history = json.loads(history[-1]) # ORIGINAL
  File "/usr/lib/python3.8/json/__init__.py", line 341, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict
2022-12-12 21:00:37,743 - wandb.wandb_agent - INFO - Running runs: ['w2ptq40k']
wandb: ERROR Internal wandb error: file data was not synced
Problem at: /local/home/safelix/venv/dino/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py 357 experiment
Traceback (most recent call last):
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1078, in init
    run = wi.init()
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 697, in init
    result = handle.wait(
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py", line 259, in wait
    raise MailboxError("transport failed")
wandb.errors.MailboxError: transport failed
wandb: ERROR Abnormal program exit
Traceback (most recent call last):
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1078, in init
    run = wi.init()
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 697, in init
    result = handle.wait(
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py", line 259, in wait
    raise MailboxError("transport failed")
wandb.errors.MailboxError: transport failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "dino.py", line 254, in <module>
    main(config)
  File "dino.py", line 38, in main
    wandb_logger = WandbLogger(
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py", line 311, in __init__
    _ = self.experiment
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 41, in experiment
    return get_experiment() or DummyExperiment()
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/pytorch_lightning/utilities/rank_zero.py", line 32, in wrapped_fn
    return fn(*args, **kwargs)
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 39, in get_experiment
    return fn(self)
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py", line 357, in experiment
    self._experiment = wandb.init(**self._wandb_init)
  File "/local/home/safelix/venv/dino/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1115, in init
    raise Exception("problem") from error_seen
Exception: problem
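
The TypeError at the root of this traceback can be reproduced with the standard library alone, independent of wandb. The snippet below is a minimal sketch; the dictionary `row` is a hypothetical stand-in for an already-decoded history entry.

```python
import json

# Hypothetical stand-in for a history entry that is already a dict.
row = {"_step": 3}

try:
    json.loads(row)  # json.loads expects str, bytes or bytearray
except TypeError as exc:
    message = str(exc)
    print(message)
```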

Additional Files

No response

Environment

WandB version: 0.13.6

OS: Ubuntu 20.04.2 LTS

Python version: Python 3.8.5

Versions of relevant libraries: json.__version__ = '2.0.9'

Additional Context

No response

Issue Analytics

  • State: closed
  • Created: 9 months ago
  • Comments: 7

Top GitHub Comments

MBakirWB commented, Dec 13, 2022 (1 reaction)

Hi @safelix, this stemmed from a recent server-side change. The issue has now been resolved. Please do let us know if you encounter any other problems.

chris-clem commented, Dec 12, 2022 (1 reaction)

Thanks for creating the issue. I have the same problem!

