Support SGE scheduler

It’d be great to support SGE as a scheduler. I’m trying to use the PBS backend, since the two are somewhat similar, but polling doesn’t really work: at https://github.com/eth-cscs/reframe/blob/45fbacb23210c757724882a13c4a53f33af04800/reframe/core/schedulers/pbs.py#L186 the command that gets executed for me is qstat -f Your, so I guess something is going wrong there.

I might be able to work on this implementation in the near future. It shouldn’t be too different from PBS, but I’ll likely need some assistance along the way 🙂

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

vkarak commented, Apr 19, 2021 (1 reaction)

Actually, there is no need to look outside the scheduler backends. To implement a scheduler backend you only need to implement the interface of the JobScheduler abstract class. As soon as you do this and decorate your scheduler class with @register_scheduler(name), it should be integrated with the framework and ready to be used.

Job schedulers manage Job instances, which are the job descriptors and contain the information about the job that has been submitted. The rest of the framework does not know about the backends at all. The framework will call Job.create() to create a new job with all the information retrieved from the test spec. This in turn will call the make_job() method of the scheduler backend. Job is not abstract, but backends may choose to extend it just to add fields relevant to them; see for example the _PbsJob.

The rest of the JobScheduler API takes either a single job or a list of jobs to process. If you look into the reframe.core.schedulers module you will see the documentation for each API function. Practically, for a scheduler backend, the most important methods are emit_preamble(job), submit(job), cancel(job) and poll(*jobs). The poll method may take multiple jobs at once, because it is more efficient to issue a single poll command and retrieve the state of multiple jobs than to poll each one individually. How this is implemented is entirely up to the backend.

And a small correction to the documentation of the API. The following is not correct:

https://github.com/eth-cscs/reframe/blob/45fbacb23210c757724882a13c4a53f33af04800/reframe/core/schedulers/__init__.py#L86-L95

The finished() method does not poll (this is a stale comment 😬 ). The poll() method polls and finished() simply retrieves the job state (whether a job has finished or not) and raises any job-related error that has happened during polling.
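The split between poll() and finished() described here could look something like the sketch below. It is a self-contained toy, not the ReFrame implementation: the JobError type, the exception attribute on the job, the state strings and the fake queue dictionary are all hypothetical.

```python
class JobError(Exception):
    """Hypothetical error type used only in this sketch."""


class Job:
    def __init__(self, jobid):
        self.jobid = jobid
        self.state = 'PENDING'
        self.exception = None  # filled in by poll() when something goes wrong


class ToyScheduler:
    def __init__(self, fake_states):
        # fake_states maps job ID -> state reported by the pretend queue.
        self._fake_states = fake_states
        self.queries = 0

    def poll(self, *jobs):
        """Query the (fake) queue once and update every job passed in."""
        self.queries += 1
        for job in jobs:
            state = self._fake_states.get(job.jobid)
            if state is None:
                # Record the problem; it is raised later by finished().
                job.exception = JobError(f'job {job.jobid} not found')
            else:
                job.state = state

    def finished(self, job):
        """Report the stored state without querying the queue again."""
        if job.exception is not None:
            raise job.exception
        return job.state == 'COMPLETED'


sched = ToyScheduler({'1': 'COMPLETED', '2': 'RUNNING'})
done, running, missing = Job('1'), Job('2'), Job('3')
sched.poll(done, running, missing)
```

Here finished(done) returns True, finished(running) returns False, and finished(missing) raises the error that poll() recorded, all without a second query.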

The filternodes() and allnodes() methods are not essential (see PBS backend) unless you want your backend to support flexible jobs (see here).

giordano commented, Apr 21, 2021 (0 reactions)

That was helpful, thanks! When submitting a job with SGE I get

Your job <job id> ("<filename>") has been submitted

I’m now going through the other changes of the syntax.
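A minimal sketch of pulling the job ID out of the submission message quoted above, assuming the ID is numeric and the message has exactly that shape. The function names and the regular expression are illustrative and not taken from ReFrame. The first_token() helper shows one plausible (unconfirmed) explanation for the qstat -f Your command in the original report: a parser that takes the first whitespace-delimited token of the output would pick up the word "Your".

```python
import re

# Assumed shape of the message: Your job <job id> ("<filename>") has been submitted
_SGE_SUBMIT_RE = re.compile(
    r'Your job (?P<jobid>\d+) \("(?P<name>[^"]*)"\) has been submitted'
)


def parse_sge_submit(output):
    """Return (jobid, name) extracted from an SGE-style submission message."""
    match = _SGE_SUBMIT_RE.search(output)
    if match is None:
        raise ValueError(f'could not extract job ID from: {output!r}')
    return match.group('jobid'), match.group('name')


def first_token(output):
    """Naive parser that treats the first token as the job ID."""
    return output.split()[0]


message = 'Your job 12345 ("job.sh") has been submitted'
jobid, name = parse_sge_submit(message)
naive = first_token(message)
```

With the sample message, the regular expression yields the numeric ID, while the naive parser yields "Your".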


