
Unable to clear Failed task with retries


Apache Airflow version: 2.0.1

Kubernetes version (if you are using kubernetes) (use kubectl version): NA

Environment: Windows WSL2 (Ubuntu) Local

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): Ubuntu 18.04
  • Kernel (e.g. uname -a): Linux d255bce4dcd5 5.4.72-microsoft-standard-WSL2
  • Install tools: Docker Compose
  • Others:

What happened: I have a DAG with three tasks:

  • Task 1: Get date
  • Task 2: Get data from an API call (retries set to 3)
  • Task 3: Load data

Task 2 failed after three attempts. When I try to clear the task instance, the UI shows the error below.

Dag Code

Python version: 3.8.7
Airflow version: 2.0.1rc2
Node: d255bce4dcd5
-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/auth.py", line 34, in decorated
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/decorators.py", line 60, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/views.py", line 1547, in clear
    return self._clear_dag_tis(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/views.py", line 1475, in _clear_dag_tis
    count = dag.clear(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 65, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dag.py", line 1324, in clear
    clear_task_instances(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 160, in clear_task_instances
    ti.max_tries = ti.try_number + task_retries - 1
TypeError: unsupported operand type(s) for +: 'int' and 'str'
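The last frame of the traceback computes `ti.max_tries = ti.try_number + task_retries - 1`, where `ti.try_number` is an int. The plain-Python sketch below (no Airflow needed) mirrors only that arithmetic to show why a string `retries` value fails. `compute_max_tries` is a hypothetical helper, and the try number of 4 is an illustrative value, not one taken from the issue.

```python
def compute_max_tries(try_number, task_retries):
    """Hypothetical helper mirroring the failing line in clear_task_instances."""
    return try_number + task_retries - 1


# With an integer retries value the calculation succeeds.
print(compute_max_tries(4, 2))  # 5

# With retries given as the string '2', Python cannot add int and str.
try:
    compute_max_tries(4, '2')
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +: 'int' and 'str'
```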

What you expected to happen:

I expected to clear the Task Instance so that the task could be scheduled again.

How to reproduce it:

  1. Clone the repo linked above.
  2. Follow the instructions to set up the cluster.
  3. Change the code to force an error in Task 2.
  4. Run the DAG and, after Task 2 has failed three times, try to clear the task instance.

The error appears when clicking Clear.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
hsnprsd commented, Jun 4, 2021

@uranusjr And if retries is a string, I think we could try to parse it as an integer when possible.
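The parsing idea in this comment can be sketched as follows. `coerce_retries` is a hypothetical helper, not Airflow's actual code; it accepts ints as-is, converts numeric strings such as `'2'` with `int()`, and raises a clear error for anything else instead of failing later during `clear`.

```python
def coerce_retries(value):
    """Hypothetical helper: return retries as an int, parsing numeric strings."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("retries must be an integer, not a bool")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            # int() already ignores surrounding whitespace, e.g. ' 2 '.
            return int(value)
        except ValueError:
            raise ValueError(f"retries must be an integer, got {value!r}") from None
    raise TypeError(f"retries must be an integer, got {type(value).__name__}")


print(coerce_retries('2'))  # 2
```

Raising for unparseable strings keeps the mistake visible at DAG definition time rather than silently guessing a value.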

0 reactions
vshabarish commented, Apr 14, 2021

Isn’t the retries value supposed to be an int? The repro above has

 @dag.task(default_args={'retries': '2', 'retry_delay': timedelta(minutes=30)})

which is the cause of the exception, if I'm not mistaken.

That said, Airflow should probably be more resilient against user errors like this. Perhaps it should fall back to the default value and log a warning?
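A minimal sketch of the "fall back to the default with a warning" suggestion, assuming a hypothetical `retries_or_default` helper and an illustrative default of 0 (the real default Airflow would use is not stated in this thread). Unlike string parsing, this approach never guesses what an invalid value meant; it logs the problem and uses the default.

```python
import logging

logger = logging.getLogger(__name__)

# Assumption: 0 is used here purely as an illustrative default.
DEFAULT_RETRIES = 0


def retries_or_default(value, default=DEFAULT_RETRIES):
    """Hypothetical helper: return value if it is a valid retry count,
    otherwise log a warning and return the default."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
        return value
    logger.warning("Invalid retries value %r; falling back to %d", value, default)
    return default


logging.basicConfig(level=logging.WARNING)
print(retries_or_default(3))    # 3
print(retries_or_default('2'))  # logs a warning, then prints 0
```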

Excellent, it works for us after changing retries from a str to an int. Thanks very much for the help!
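The confirmed fix amounts to changing the type of one value in the `default_args` quoted in the repro: `retries` becomes the integer `2` instead of the string `'2'`. The decorator line is kept as a comment so this sketch runs without Airflow installed; the `assert` is an optional, illustrative guard rather than something Airflow requires.

```python
from datetime import timedelta

# Corrected default_args from the repro: retries is the int 2, not the str '2'.
# retry_delay is unchanged.
default_args = {
    'retries': 2,
    'retry_delay': timedelta(minutes=30),
}

# In the DAG file the dict is passed exactly as before, e.g.
#     @dag.task(default_args=default_args)

# Optional guard that fails at parse time if retries is not an int.
assert isinstance(default_args['retries'], int), "retries must be an int"
```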
