Can't get results from HTCondorCluster
Dear developers,
I’m experiencing some problems submitting jobs with HTCondorCluster.
This is the simple example I’m trying to run:
from dask.distributed import Client, progress
from dask_jobqueue.htcondor import HTCondorCluster


def dummy_function(num):
    return num


def main():
    cluster = HTCondorCluster(cores=1, memory='1GB', disk='1GB')
    cluster.scale(jobs=3)
    client = Client(cluster)
    futures = client.map(dummy_function, range(3))
    progress(futures)
    res = client.gather(futures)
    print('\n', res)


if __name__ == '__main__':
    main()
I would expect to see the progress bar completed and then the list [0, 1, 2] printed.
What happens is that the jobs seem to run: if I run condor_q, I initially see 3 idle jobs, then they start running until all of them are done. But where I should see the bar at 100% and the list printed, I only see the bar at 0% with the elapsed time still counting.
If I do the same interactively with IPython, I see that the status of the futures is still pending even after all the jobs have finished running.
Am I doing something wrong?
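One check that might help narrow this down: futures that stay pending while the HTCondor jobs run and exit would be consistent with the workers never registering with the scheduler, although that is only a guess at this point. The helper below is a minimal sketch, not a confirmed fix. The name workers_connected and its parameters are made up for illustration; it relies only on Client.scheduler_info(), which returns a dict with a "workers" entry.

```python
import time


def workers_connected(client, n_workers=1, timeout=60.0, poll=1.0):
    """Poll the scheduler until at least `n_workers` workers have registered.

    `client` is anything exposing scheduler_info() like dask.distributed.Client.
    Returns True if enough workers show up before `timeout` seconds, else False.
    """
    deadline = time.monotonic() + timeout
    while True:
        # scheduler_info() returns a dict; "workers" maps address -> worker details
        workers = client.scheduler_info().get("workers", {})
        if len(workers) >= n_workers:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Hypothetical usage with the example above (requires a real HTCondor pool):
#     print(cluster.job_script())  # inspect what dask-jobqueue submits
#     if not workers_connected(client, n_workers=3, timeout=120):
#         print("No workers registered; check the HTCondor job logs")
```

If this returns False while condor_q shows the jobs running, the problem is more likely on the worker-to-scheduler connection side than in the submitted function itself.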
Environment:
- Dask version: 2.20.0
- Python version: 3.7.8
- Operating System: CentOS 7
- Install method (conda, pip, source): conda
As a side note, the same example run on a different cluster with SLURMCluster instead of HTCondorCluster works as expected.
Thank you,
Massimiliano
Issue Analytics
- Created 3 years ago
- Comments: 7 (2 by maintainers)
Top GitHub Comments
Thanks for the explanation, I hope Dask will answer your need and others at CERN!
Hi @guillaumeeb, sorry for the late reply. I’m working on rewriting an analysis framework for the Higgs to two gamma analysis. Since it will have to run on (at least) two clusters, one using HTCondor and one using SLURM, Dask (+ joblib) looks like the perfect solution to have a common API for both of them. But right now I’m still at an early stage, so I don’t have much more information 😃
More generally, they mentioned the idea of providing workload distribution with Dask “as a service” (you can see it in the ticket I attached above), so I guess it will be used more in the future.