Infinite job submissions when a submitted job's status is error/cancelled/unknown

Information

  • Qiskit Terra version: latest
  • Python version:
  • Operating system:

What is the current behavior?

The problem is in the run_qobj() function in qiskit/utils/run_circuits.py. When a job submitted to a non-simulator backend ends up in JobStatus.ERROR or JobStatus.CANCELLED status, it is submitted again. If the job keeps being cancelled or keeps failing, this repeats indefinitely.

if job_status == JobStatus.CANCELLED:
    logger.warning("FAILURE: Job id: %s is cancelled. Re-submit the Qobj.", job_id)
elif job_status == JobStatus.ERROR:
    logger.warning(
        "FAILURE: Job id: %s encounters the error. "
        "Error is : %s. Re-submit the Qobj.",
        job_id,
        job.error_message(),
    )
else:
    logging.warning(
        "FAILURE: Job id: %s. Unknown status: %s. " "Re-submit the Qobj.",
        job_id,
        job_status,
    )
job, job_id = _safe_submit_qobj(
    qobj, backend, backend_options, noise_config, skip_qobj_validation
)

This is especially dangerous when the job was submitted successfully but its status was read incorrectly (due to a bug in the job class or on the backend side). That can cause infinite job submissions, each costing the user money.

Steps to reproduce the problem

Execute a job that always returns the status JobStatus.CANCELLED.
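
As a rough illustration of that reproduction, the sketch below uses a stub job whose status never leaves CANCELLED and counts how often a resubmit-on-failure loop would submit it. Everything here is hypothetical and independent of Qiskit: JobStatus is a stand-in enum, and AlwaysCancelledJob and count_resubmissions are made-up names. The safety_cap parameter exists only so the demo terminates; the loop described above has no such exit.

```python
from enum import Enum


class JobStatus(Enum):
    # Stand-in for Qiskit's JobStatus, reduced to what this sketch needs.
    DONE = "done"
    CANCELLED = "cancelled"


class AlwaysCancelledJob:
    """Hypothetical stub job whose status is always CANCELLED."""

    def status(self):
        return JobStatus.CANCELLED


def count_resubmissions(safety_cap):
    """Mimic a resubmit-on-failure loop and return the number of submissions."""
    submissions = 1  # the initial submission
    job = AlwaysCancelledJob()
    while job.status() is not JobStatus.DONE:
        if submissions >= safety_cap:
            break  # artificial exit, added only so the demo stops
        job = AlwaysCancelledJob()  # "Re-submit the Qobj."
        submissions += 1
    return submissions


print(count_resubmissions(safety_cap=5))
```

Raising safety_cap raises the submission count by the same amount, which is the unbounded behavior the issue describes.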

What is the expected behavior?

The job submission should stop after a finite number of retries.

Suggested solutions

Do not use while True loops in the code. Instead, use a finite, preferably small, number of retries.
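
A minimal sketch of what a bounded retry could look like, assuming a caller-supplied submit callable that returns an object with a status() method. This is not the actual Qiskit fix: JobStatus is a stand-in enum, and run_with_bounded_retries, JobFailedError and max_retries are hypothetical names chosen for illustration. Polling while a job is queued or running is left out for brevity.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class JobStatus(Enum):
    # Stand-in for Qiskit's JobStatus, reduced to what this sketch needs.
    DONE = "done"
    ERROR = "error"
    CANCELLED = "cancelled"


class JobFailedError(RuntimeError):
    """Hypothetical error raised once all attempts are used up."""


def run_with_bounded_retries(submit, max_retries=3):
    """Submit at most 1 + max_retries times; return the first job that is DONE."""
    total_attempts = max_retries + 1
    for attempt in range(1, total_attempts + 1):
        job = submit()
        status = job.status()
        if status is JobStatus.DONE:
            return job
        logger.warning(
            "FAILURE: attempt %d of %d ended with status %s.",
            attempt,
            total_attempts,
            status,
        )
    # Give up instead of resubmitting forever.
    raise JobFailedError(f"Job did not succeed after {total_attempts} submissions.")
```

With this shape, a job that is always cancelled costs at most 1 + max_retries submissions, and the caller gets an exception instead of a silent loop.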

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
TheGupta2012 commented, Aug 17, 2021

I was thinking that too, but Mathew Treinish and Steve Wood said that it would be okay to halt and notify the user that the job set execution has failed. The reasoning was that QuantumInstance is intended for running quantum algorithms, so the failure of even one job in the job set means the algorithm as a whole has failed.

Also, if the reviewers want, I could still add some functionality in other commits, but the current PR resolves the issue by raising a QiskitError if any check for execution or status retrieval fails.

0 reactions
TheGupta2012 commented, Aug 19, 2021

Hey @shakal, is there any other advice for the PR I opened? I’m not sure if I should ask the reviewers specifically.
