Sweep stopped when running "openai wandb sync" inside sweep function
Describe the bug
I am trying to find the best hyperparameters for OpenAI fine-tuned models using a wandb sweep. When I run "openai wandb sync" inside the sweep function, the first run completes successfully, but then an error is thrown and the remaining runs are stopped.
import os
import openai
import wandb

def finetune():
    run = wandb.init()
    with run:
        response = openai.FineTune.create(**args)
        # some code here to wait for the fine-tune to finish and get the finetune_id
        os.system("openai wandb sync --project {} --id {}".format(run.project_name(), finetune_id))

# register the sweep, then let the agent call finetune() once per run
sweep_id = wandb.sweep(sweep=sweep_configuration, project=project)
wandb.agent(sweep_id=sweep_id, function=finetune, count=run_count)
🎉 wandb sync completed successfully
wandb: Waiting for W&B process to finish... (success).
Exception in thread SockSrvRdThr:
Traceback (most recent call last):
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\service\server_sock.py", line 112, in run
    shandler(sreq)
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\service\server_sock.py", line 173, in server_record_publish
    iface = self._mux.get_stream(stream_id).interface
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\service\streams.py", line 199, in get_stream
    stream = self._streams[stream_id]
KeyError: 'pxjbs8ta'
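The traceback shows the wandb service looking up a stream id ('pxjbs8ta') that is no longer registered. One possible explanation, which is an assumption and not something confirmed in this report, is that the "openai wandb sync" child process inherits wandb-related environment variables from the sweep run and ends up talking to the same service process. The sketch below shows one way to test that idea: launch the same CLI command through subprocess with a filtered copy of the environment, and fail loudly on a non-zero exit code. The variable names in INHERITED_WANDB_VARS and the helper names build_sync_command, child_env, and run_sync are illustrative choices, not part of wandb or openai.

```python
import os
import subprocess

# Assumption: these are the variables the sweep agent / wandb service sets for
# the current run. Adjust the tuple to match what your environment contains.
INHERITED_WANDB_VARS = ("WANDB_SERVICE", "WANDB_RUN_ID", "WANDB_SWEEP_ID")


def build_sync_command(project, finetune_id):
    """Return the CLI call from the report as an argument list (no shell quoting needed)."""
    return ["openai", "wandb", "sync", "--project", str(project), "--id", str(finetune_id)]


def child_env(environ, drop=INHERITED_WANDB_VARS):
    """Copy the environment, leaving out the variables named in `drop`."""
    return {key: value for key, value in environ.items() if key not in drop}


def run_sync(project, finetune_id):
    """Run the sync in a child process; raises CalledProcessError on a non-zero exit."""
    subprocess.run(
        build_sync_command(project, finetune_id),
        env=child_env(os.environ),
        check=True,
    )
```

Inside finetune(), the os.system(...) line could then be swapped for run_sync(run.project_name(), finetune_id), ideally after the "with run:" block has exited so the sweep run is already finished when the sync starts. Whether this avoids the KeyError has not been verified here.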
Additional Files
No response
Environment
WandB version: 0.13.4
OS: Windows 10 Pro
Python version: 3.10.6
Versions of relevant libraries: openai==0.23.1
Additional Context
No response
Issue Analytics
- Created a year ago
- Comments: 20
I had the same issue. I ran two model training scripts on a single machine, each using the same wandb project name. Every time, an error is thrown at the end of training.
If so, wandb needs to be more resilient to imperfect network conditions. Random exceptions during a long training process are disastrous.