Sweeps: how to run agents using qsub job scheduling
See original GitHub issuewandb --version && python --version && uname
- Weights and Biases version: 0.9.1
- Python version: 3.7.7
- Operating System: Centos 6.5
Description
Trying to run a sweep with wandb agent
, but through qsub
on a server. wandb
command line interface works from login node but is not found by the executed job. Using the workaround https://github.com/lukas/ml-class/issues/24 (using /path/to/my/python -m wandb.cli
in place of wandb agent
) fixes the first wandb
issue appears to execute the final training script with system python rather than the python at /path/to/my/python
.
What I Did/Longer description
Background detail: I am working on a cluster which uses qsub
for job scheduling. In essence, this means you write a little submission_script.sh
which generally runs a few lines that set up an environment (e.g. cd project_dir
etc) and then runs your main script. To run a Python script using a particular Python environment you run it explicitly like /users/me/miniconda3/envs/my_env/bin/python -u train.py
.
This works nicely with wandb
when wandb
is only used within Python (i.e. not command line interface like wandb agent
)
However, I’m now trying to run sweeps.
My hope was I could simply put the line wandb agent walter/myproject/sweepid02938
into the .sh
submission script. However, submitting this results in a failure with a wandb: command not found
error. This is the first mystery/bug, though I deciding that it seemed plausible that wandb
as a command line option was only installed on my login node so followed a workaround I found that suggested using python -m wandb.cli agent
in place of wandb agent
if the latter didn’t work.
Thus I instead used the line: /users/me/miniconda3/envs/my_env/bin/python -m wandb.cli agent walter/myproject/sweepid02938
in my .sh
submission script. However, I now get Python ModuleNotFound errors (e.g. import git) which are not consistent with my environment, where this module exists. Indeed, running the above line from the login node rather than as a submission script works.
Is this a known problem? Has anyone had success running sweeps with wandb
on a qsub
server?
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Hey @bibbygoodwin you can override the entrypoint in a couple ways.
wandb agent
you can specify the full path to your env:/users/me/miniconda3/envs/my_env/bin/wandb agent
/usr/bin/env python
to launch the sub process by default. You can override this in your sweep config if that doesn’t work: https://docs.wandb.com/sweeps/configuration#commandHi Walter, thanks for writing in. This is a good question, and it sounds like a complex environment issue. I’m not experienced with
qsub
myself, but I’m cc’ing my colleague @vanpelt who might be able to share some guidance.