Can't get results from HTCondorCluster

Dear developers,

I’m experiencing some problems submitting jobs with HTCondorCluster.

This is the simple example I’m trying to run:

from dask.distributed import Client, progress
from dask_jobqueue.htcondor import HTCondorCluster

def dummy_function(num):
    return num

def main():
    cluster = HTCondorCluster(cores=1, memory='1GB', disk='1GB')
    cluster.scale(jobs=3)
    client = Client(cluster)
    futures = client.map(dummy_function, range(3))
    progress(futures)
    res = client.gather(futures)
    print('\n', res)

if __name__ == '__main__':
    main()

I would expect to see the progress bar complete and then the list [0, 1, 2] printed. The jobs do seem to run: condor_q initially shows 3 idle jobs, which then start running until all of them are done. But instead of the bar at 100% and the printed list, I only see the bar stuck at 0% while the elapsed time keeps increasing. If I do the same thing interactively in IPython, the status of the futures is still pending even after all the jobs have finished running.

Am I doing something wrong?

Environment:

  • Dask version: 2.20.0
  • Python version: 3.7.8
  • Operating System: CentOS 7
  • Install method (conda, pip, source): conda

As a side note, the same example run on a different cluster with SLURMCluster instead of HTCondorCluster works as expected.

Thank you,

Massimiliano
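Futures stay pending for as long as no worker is registered with the scheduler. One plausible explanation for the symptom above (jobs running and finishing in HTCondor while the bar stays at 0%) is therefore that the workers start on the execute nodes but never connect back to the scheduler. This is a hypothesis, not the resolution recorded in the thread. The minimal sketch below checks it. `wait_for_n_workers` and `diagnose` are hypothetical helpers. `cluster.job_script()`, `cluster.scheduler_address`, and `Client.nthreads()` are existing dask-jobqueue / distributed APIs.

```python
import time


def wait_for_n_workers(client, n_workers, timeout=120.0, poll=2.0):
    """Poll the scheduler until at least `n_workers` workers have registered.

    Hypothetical helper. It relies only on Client.nthreads(), which maps each
    connected worker's address to its thread count. Returns the worker
    addresses, or raises TimeoutError if too few connect within `timeout` s.
    """
    deadline = time.monotonic() + timeout
    while True:
        workers = list(client.nthreads())
        if len(workers) >= n_workers:
            return workers
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(workers)}/{n_workers} workers connected after "
                f"{timeout}s; check that execute nodes can reach the scheduler"
            )
        time.sleep(poll)


def diagnose(cluster, client, n_workers, timeout=120.0):
    """Print what is submitted to the batch system and who actually connected."""
    # The job description dask-jobqueue generates for each worker job.
    print(cluster.job_script())
    # The address each worker must be able to reach from its execute node.
    print("scheduler address:", cluster.scheduler_address)
    workers = wait_for_n_workers(client, n_workers, timeout=timeout)
    print("connected workers:", workers)
    return workers
```

In the issue's `main()`, this would be called as `diagnose(cluster, client, 3)` right after `client = Client(cluster)`. If it times out, a natural next step is to compare the printed scheduler address with what the execute nodes can reach, and to read the worker jobs' output files.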

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

guillaumeeb commented, Feb 8, 2021

Thanks for the explanation. I hope Dask will meet your needs, and those of others at CERN!

maxgalli commented, Feb 8, 2021

Hi @guillaumeeb, sorry for the late reply. I’m rewriting the analysis framework for a Higgs to two photons analysis. Since it will have to run on (at least) two clusters, one using HTCondor and one using SLURM, Dask (+ joblib) looks like the perfect solution to provide a common API for both. But I’m still at an early stage, so I don’t have much more information yet 😃

More broadly, they mentioned the idea of providing workload distribution with Dask “as a service” (you can see it in the ticket I attached above), so I expect it will be used more in the future.
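The Dask + joblib approach described above, with one API for both HTCondor and SLURM, can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the thread. `make_cluster` and `run_on_batch` are hypothetical helpers. `HTCondorCluster`, `SLURMCluster`, `Client`, `joblib.parallel_backend("dask")`, `joblib.Parallel`, and `joblib.delayed` are the real APIs it assumes are installed.

```python
def make_cluster(batch_system, **kwargs):
    """Build a dask-jobqueue cluster for 'htcondor' or 'slurm' (hypothetical helper)."""
    name = batch_system.lower()
    if name not in ("htcondor", "slurm"):
        raise ValueError(f"unsupported batch system: {batch_system!r}")
    # Imported lazily so this module can be imported without dask-jobqueue.
    from dask_jobqueue import HTCondorCluster, SLURMCluster

    cls = HTCondorCluster if name == "htcondor" else SLURMCluster
    return cls(**kwargs)


def run_on_batch(batch_system, func, items, n_jobs=3, **cluster_kwargs):
    """Map `func` over `items` via joblib's Dask backend (hypothetical helper)."""
    import joblib
    from dask.distributed import Client

    cluster = make_cluster(batch_system, **cluster_kwargs)
    try:
        cluster.scale(jobs=n_jobs)
        # Creating the Client makes it the default client, which joblib's
        # "dask" backend picks up; the analysis code itself stays unchanged.
        with Client(cluster), joblib.parallel_backend("dask"):
            return joblib.Parallel()(joblib.delayed(func)(x) for x in items)
    finally:
        cluster.close()
```

For example, `run_on_batch("htcondor", dummy_function, range(3), cores=1, memory="1GB", disk="1GB")` mirrors the issue's setup. Only the batch-system name and the keyword arguments the chosen cluster class accepts change between sites. For instance, `disk` is passed to HTCondorCluster in the issue's example.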
