
Airflow CLI tasks clear command error (due to "latest" symlink?)


Apache Airflow version

2.3.2 (latest released)

What happened

When running the airflow tasks clear command, we get the following error:

[2022-06-07 15:59:58,353] {{dagbag.py:507}} INFO - Filling up the DagBag from /usr/local/airflow
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/airflow/__main__.py", line 38, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/airflow/cli/cli_parser.py", line 51, in command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/airflow/utils/cli.py", line 99, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/airflow/cli/commands/task_command.py", line 591, in task_clear
    dags = get_dags(args.subdir, args.dag_id, use_regex=args.dag_regex)
  File "/usr/local/lib/python3.8/dist-packages/airflow/utils/cli.py", line 214, in get_dags
    return [get_dag(subdir, dag_id)]
  File "/usr/local/lib/python3.8/dist-packages/airflow/utils/cli.py", line 201, in get_dag
    dagbag = DagBag(process_subdir(subdir))
  File "/usr/local/lib/python3.8/dist-packages/airflow/models/dagbag.py", line 130, in __init__
    self.collect_dags(
  File "/usr/local/lib/python3.8/dist-packages/airflow/models/dagbag.py", line 514, in collect_dags
    for filepath in list_py_file_paths(
  File "/usr/local/lib/python3.8/dist-packages/airflow/utils/file.py", line 305, in list_py_file_paths
    file_paths.extend(find_dag_file_paths(directory, safe_mode))
  File "/usr/local/lib/python3.8/dist-packages/airflow/utils/file.py", line 323, in find_dag_file_paths
    for file_path in find_path_from_directory(str(directory), ".airflowignore"):
  File "/usr/local/lib/python3.8/dist-packages/airflow/utils/file.py", line 242, in _find_path_from_directory
    raise RuntimeError(
RuntimeError: Detected recursive loop when walking DAG directory /usr/local/airflow: /usr/local/airflow/logs/splunk/scheduler/2022-06-07 has appeared more than once.

Looking at this directory, I see 2022-06-07 and latest, where latest is a symlink to 2022-06-07.

The error is raised here: https://github.com/apache/airflow/blob/0bf5f495d4131109fba449697adee68a62516851/airflow/utils/file.py#L242

We set child_process_log_directory = /usr/local/airflow/logs/splunk/scheduler in our airflow.cfg.
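The loop check trips because the directory walker follows symlinks, so the same real directory is visited twice. A minimal Python sketch of that mechanism (the layout mirrors this report; this is illustrative, not Airflow's actual code):

```python
import os
import tempfile

# Recreate the reported layout: a dated log directory plus a
# "latest" symlink pointing back at it.
root = tempfile.mkdtemp()
dated = os.path.join(root, "logs", "splunk", "scheduler", "2022-06-07")
os.makedirs(dated)
os.symlink(dated, os.path.join(root, "logs", "splunk", "scheduler", "latest"))

# Walking with followlinks=True descends into "latest" as well,
# so the dated directory's real path is seen a second time --
# the condition Airflow reports as a "recursive loop".
seen = set()
duplicates = []
for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
    real = os.path.realpath(dirpath)
    if real in seen:
        duplicates.append(real)
    seen.add(real)

print(duplicates)
```

Because the symlink points at a sibling (not an ancestor), the walk terminates; it simply records the dated directory once more via latest.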

What you think should happen instead

The clear command should run successfully.

How to reproduce

My understanding is that if you have both 2022-06-07 and a latest symlink to it within your scheduler logging directory, any attempt to clear a task fails with this error. We override child_process_log_directory = /usr/local/airflow/logs/splunk/scheduler in airflow.cfg.

Operating System

Linux

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

As a workaround, adding logs/splunk/scheduler/latest to the .airflowignore resolved the issue for us.
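For anyone hitting the same error, the workaround amounts to a one-line ignore rule. A sketch of the file (path taken from this report; note that in Airflow 2.3.x, .airflowignore entries are treated as regular expressions by default, and this literal path also works as one):

```
# .airflowignore in the DAGs folder root (/usr/local/airflow)
logs/splunk/scheduler/latest
```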

Are you willing to submit PR?

  • Yes I am willing to submit a PR!


Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
potiuk commented, Jun 14, 2022

No worries - this is not an urgent one - assigned you 😃

0 reactions
github-actions[bot] commented, Oct 17, 2022

This issue has been closed because it has not received a response from the issue author.


Top Results From Across the Web

Command Line Interface and Environment Variables Reference
Sub-commands: clear — Clear a set of task instance, as if they never ran. airflow tasks ...

Airflow: dag_id could not be found - Stack Overflow
In my case, the metadata database instance was too slow, and loading dags failed because of a timeout. I've fixed it by: Upgrading...

Airflow Documentation - Read the Docs
The airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities ...

What is a "failed to create a symbolic link: file exists" error?
This is a classical error... it's the other way around: ln -s Existing-file New-name. so in your case

Installing and Configuring Apache Airflow - Home - Clairvoyant
Delete the “Default” queue; Restart Airflow Scheduler service. Install MySQL Dependencies. If you intend to use MySQL as an DB repo you will ...
