Sweep stopped when running "openai wandb sync" inside sweep function

Describe the bug

I am trying to find the best hyperparameters for OpenAI fine-tuned models using a wandb sweep. When I run “openai wandb sync” inside the sweep function, the first run completes successfully, but then an error is thrown and the remaining runs are stopped.

import os

import openai
import wandb

def finetune():
    # Each sweep run starts its own W&B run.
    run = wandb.init()
    with run:
        response = openai.FineTune.create(**args)
        # some code here to wait for the finetune to finish and get the finetune_id
        # Sync the finished fine-tune to W&B through the OpenAI CLI.
        os.system("openai wandb sync --project {} --id {}".format(run.project_name(), finetune_id))

# sweep_configuration, project, run_count and args are defined elsewhere.
sweep_id = wandb.sweep(sweep=sweep_configuration, project=project)
wandb.agent(sweep_id=sweep_id, function=finetune, count=run_count)

🎉 wandb sync completed successfully
wandb: Waiting for W&B process to finish... (success).
Exception in thread SockSrvRdThr:
Traceback (most recent call last):
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\service\server_sock.py", line 112, in run
    shandler(sreq)
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\service\server_sock.py", line 173, in server_record_publish
    iface = self._mux.get_stream(stream_id).interface
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\service\streams.py", line 199, in get_stream
    stream = self._streams[stream_id]
KeyError: 'pxjbs8ta'
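
The thread does not confirm a cause or a fix. One thing that may be worth trying is to launch the sync command in a child process whose environment does not carry the parent run's W&B variables, so that the CLI cannot attach to the sweep run's service stream. The sketch below is only an illustration under that assumption: the helper names (`isolated_env`, `sync_command`, `sync_finetune`) are hypothetical, and the list of variables to drop is a guess rather than something documented as the remedy.

```python
import os
import subprocess

# Assumption: these variables tie a child process to the parent sweep run.
# The thread does not confirm that removing them avoids the KeyError.
PARENT_RUN_VARS = ("WANDB_SERVICE", "WANDB_RUN_ID", "WANDB_SWEEP_ID")


def isolated_env(environ, drop=PARENT_RUN_VARS):
    """Return a copy of environ without the parent run's W&B variables."""
    return {key: value for key, value in environ.items() if key not in drop}


def sync_command(project, finetune_id):
    """Build the CLI call from the issue as an argument list (no shell quoting)."""
    return ["openai", "wandb", "sync", "--project", str(project), "--id", str(finetune_id)]


def sync_finetune(project, finetune_id):
    """Run the sync in a child process and return its exit code."""
    result = subprocess.run(
        sync_command(project, finetune_id),
        env=isolated_env(os.environ),
        check=False,  # let the caller decide how to react to a failure
    )
    return result.returncode
```

Inside `finetune()`, the `os.system(...)` line would then become `sync_finetune(run.project_name(), finetune_id)`. Credentials such as `WANDB_API_KEY` are deliberately left in place so the CLI can still authenticate.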

Additional Files

No response

Environment

WandB version: 0.13.4

OS: Windows 10 Pro

Python version: 3.10.6

Versions of relevant libraries: openai==0.23.1

Additional Context

No response

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 20

Top GitHub Comments

3 reactions
bigheiniu commented, Nov 8, 2022

I had the same issue. I ran two model training scripts on a single machine, and each script used the same wandb project name. Every time, an error was thrown at the end of training.

1 reaction
CCRcmcpe commented, Nov 5, 2022

If so, wandb needs to improve its resiliency to imperfect network conditions. Random exceptions during a long training process are disastrous.

