Sweep stopped when running "openai wandb sync" inside sweep function

Describe the bug

I am trying to find the best hyperparameters for OpenAI fine-tuned models using a wandb sweep. When I run “openai wandb sync” inside the sweep function, the first run completes successfully, but then an error is thrown and the remaining runs are stopped.

import os

import openai
import wandb

def finetune():
    run = wandb.init()
    with run:
        response = openai.FineTune.create(**args)
        # some code here to wait for the fine-tune to finish and get the finetune_id
        os.system("openai wandb sync --project {} --id {}".format(run.project_name(), finetune_id))

# finetune must be defined before it is passed to wandb.agent
sweep_id = wandb.sweep(sweep=sweep_configuration, project=project)
wandb.agent(sweep_id=sweep_id, function=finetune, count=run_count)
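One way to keep the sync out of the sweep process is to record each fine-tune ID inside the sweep function and call “openai wandb sync” only after wandb.agent has returned. The following is a minimal sketch that assumes deferring the sync is acceptable for your workflow. FINE_TUNE_IDS, build_sync_command and sync_after_sweep are hypothetical names. The sketch passes only the --project and --id flags that appear in the report.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical module-level list: the sweep function appends to it
# instead of calling the CLI itself.
FINE_TUNE_IDS: List[str] = []


def build_sync_command(project: str, fine_tune_id: str) -> List[str]:
    """Return the argv for `openai wandb sync`, using only the flags from the report."""
    return ["openai", "wandb", "sync", "--project", project, "--id", fine_tune_id]


def sync_after_sweep(
    project: str,
    fine_tune_ids: Sequence[str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[int]:
    """Sync each recorded fine-tune after the sweep agent has finished.

    Returns the exit code of each sync so that failed ones can be retried later.
    """
    return [runner(build_sync_command(project, fid)).returncode for fid in fine_tune_ids]


# Intended usage (assumes the wandb/openai setup from the report):
#   def finetune():
#       run = wandb.init()
#       with run:
#           ...  # start the fine-tune and wait for it, as before
#           FINE_TUNE_IDS.append(finetune_id)
#
#   wandb.agent(sweep_id=sweep_id, function=finetune, count=run_count)
#   sync_after_sweep(project, FINE_TUNE_IDS)
```

The runner parameter exists only so the sync step can be exercised without the OpenAI CLI installed. In normal use, leave it at subprocess.run.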

🎉 wandb sync completed successfully
wandb: Waiting for W&B process to finish... (success).
Exception in thread SockSrvRdThr:
Traceback (most recent call last):
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\service\server_sock.py", line 112, in run
    shandler(sreq)
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\service\server_sock.py", line 173, in server_record_publish
    iface = self._mux.get_stream(stream_id).interface
  File "C:\Users\KarthikeyanV\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\service\streams.py", line 199, in get_stream
    stream = self._streams[stream_id]
KeyError: 'pxjbs8ta'
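The traceback ends in a KeyError inside the wandb service's stream lookup (streams.py, get_stream). The error is raised after the child “openai wandb sync” process has reported success. One unconfirmed hypothesis is that the child inherits the parent run's WANDB_* environment variables and interferes with the parent's W&B service. The following is a minimal sketch for testing that hypothesis: it launches the CLI with those variables removed. Keeping WANDB_API_KEY and WANDB_BASE_URL is an assumption, and isolated_env and run_sync_isolated are hypothetical helpers.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Assumption: the child only needs credentials and the server URL from the
# WANDB_* namespace; every other WANDB_* variable is treated as state that
# belongs to the parent sweep run and is dropped.
KEEP_VARS = {"WANDB_API_KEY", "WANDB_BASE_URL"}


def isolated_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `env` (default: os.environ) without the parent's WANDB_* run state."""
    source = os.environ if env is None else env
    return {
        key: value
        for key, value in source.items()
        if not key.startswith("WANDB_") or key in KEEP_VARS
    }


def run_sync_isolated(project: str, fine_tune_id: str) -> int:
    """Run `openai wandb sync` in a child process with a cleaned environment."""
    argv: List[str] = ["openai", "wandb", "sync", "--project", project, "--id", fine_tune_id]
    # A list argv also avoids the shell-quoting pitfalls of os.system(... .format(...)).
    completed = subprocess.run(argv, env=isolated_env())
    return completed.returncode
```

Inside finetune(), run_sync_isolated(run.project_name(), finetune_id) would replace the os.system call. If the KeyError persists with the cleaned environment, the cause lies elsewhere.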

Environment

WandB version: 0.13.4

OS: Windows 10 Pro

Python version: 3.10.6

Versions of relevant libraries: openai==0.23.1

Top GitHub Comments

bigheiniu commented, Nov 8, 2022

I had the same issue. I run two model training scripts on a single machine, and both log to the same wandb project. Every time, an error is thrown at the end of training.

CCRcmcpe commented, Nov 5, 2022

If so, wandb needs to be more resilient to imperfect network conditions. Random exceptions during a long training process are disastrous.
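In long jobs, one way to limit the damage from a transient failure is to retry only the failing step with exponential backoff, so the exception does not end the whole run. This is a generic sketch, not part of wandb, and it assumes the step is safe to repeat. retry_with_backoff is a hypothetical helper, and retrying on ConnectionError and TimeoutError is an assumption about which errors count as transient.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    step: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `step`, retrying on `retry_on` with delays of 1x, 2x, 4x, ... base_delay."""
    for attempt in range(attempts):
        try:
            return step()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Only the listed exception types are retried. Any other error, such as a programming bug, still propagates immediately.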
