Upgrade Airflow 2.1.4 -> 2.2.0: Kubernetes executor worker pod can't resolve dag folder

Apache Airflow version

Other Airflow 2 version

What happened

Using the Bitnami Airflow image, I’m upgrading Airflow from 2.1.4 to 2.2.0 on a Kubernetes cluster. No changes to the environment were made during the upgrade: the Kubernetes cluster, the database, and the Helm chart are all the same. The Docker images are custom-built due to company restrictions, but the only change in the build process is the Airflow version.

What happened during the upgrade:

  • The DB upgrade initially generated errors due to a different schema (task_instances and dag_runs tables), but on examination they were harmless: the records were apparently left over from a test some months ago.
  • The scheduler and web containers started without errors, found all DAGs and parsed them correctly.
  • An attempt to start a test DAG (which uses the Bash operator and prints hello world) failed with errors:

[2022-09-19 18:03:26,111] {dagbag.py:500} INFO - Filling up the DagBag from /opt/bitnami/***/dags/test_dag.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/__main__.py", line 48, in main
    args.func(args)
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 276, in task_run
    dag = get_dag(args.subdir, args.dag_id)
  File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/airflow/utils/cli.py", line 192, in get_dag
    raise AirflowException(
airflow.exceptions.AirflowException: Dag 'test_dag' could not be found; either it does not exist or it failed to parse.

So the worker pod apparently found the DAG, as the first line indicates, but then failed to find it again.

From the scheduler log:

[2022-09-22 12:28:54,476] {kubernetes_executor.py:531} INFO - Add task TaskInstanceKey(dag_id='test_dag', task_id='print_current_date', run_id='scheduled__2022-09-21T15:35:00+00:00', try_number=1) with command ['airflow', 'tasks', 'run', 'test_dag', 'print_current_date', 'scheduled__2022-09-21T15:35:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/test_dag.py'] with executor_config {}

Checking the currently running Airflow 2.1.4 instance, I see that the corresponding line there is:

[2022-09-22 12:28:54,476] {kubernetes_executor.py:531} INFO - Add task TaskInstanceKey(dag_id='test_dag', task_id='print_current_date', run_id='scheduled__2022-09-21T15:35:00+00:00', try_number=1) with command ['airflow', 'tasks', 'run', 'test_dag', 'print_current_date', 'scheduled__2022-09-21T15:35:00+00:00', '--local', '--subdir', '/airflow-workflow/dags/test_dag.py'] with executor_config {}

So in 2.1.4 the --subdir argument is resolved to an absolute path, and test_dag runs fine.
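The difference between the two scheduler log lines can be illustrated with a small sketch. It pulls the --subdir value out of a logged kubernetes_executor command and expands a DAGS_FOLDER prefix against a given folder. The expansion step is an assumption about how the worker treats the placeholder (modeled on our reading of process_subdir in airflow/utils/cli.py), and extract_subdir and expand_subdir are hypothetical helpers, not Airflow APIs. If the worker does expand the placeholder against its own dags_folder setting, a worker whose setting differs from the scheduler's would look in the wrong directory, which would be consistent with the /opt/bitnami/***/dags path in the first line of the worker log above.

```python
import ast
import os
import re
from typing import Optional

# Matches the command list the scheduler logs when queuing a task, e.g.
# "... with command ['airflow', 'tasks', 'run', ...] with executor_config {}".
_COMMAND_RE = re.compile(r"with command (\[.*?\]) with executor_config")


def extract_subdir(log_line: str) -> Optional[str]:
    """Return the value passed to --subdir in a logged task command, if any (hypothetical helper)."""
    match = _COMMAND_RE.search(log_line)
    if match is None:
        return None
    command = ast.literal_eval(match.group(1))  # the list is logged as a Python repr
    if "--subdir" not in command:
        return None
    i = command.index("--subdir")
    return command[i + 1] if i + 1 < len(command) else None


def expand_subdir(subdir: str, dags_folder: str) -> str:
    """Replace the DAGS_FOLDER placeholder with a concrete folder.

    Assumed behavior, modeled on Airflow's process_subdir; not Airflow's own code.
    """
    expanded = subdir.replace("DAGS_FOLDER", dags_folder)
    return os.path.abspath(os.path.expanduser(expanded))
```

For the 2.2.0 log line, extract_subdir returns 'DAGS_FOLDER/test_dag.py'. On a POSIX system, expanding that with '/airflow-workflow/dags' gives '/airflow-workflow/dags/test_dag.py', while expanding it with '/opt/bitnami/airflow/dags' gives '/opt/bitnami/airflow/dags/test_dag.py'. The absolute path logged by 2.1.4 passes through unchanged whatever folder is supplied.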

I think this is the reason for the failure: the command to run test_dag doesn’t resolve the “--subdir” parameter correctly, but I don’t know how to fix it:

  1. “dags_folder” is set correctly in the web and scheduler airflow.cfg.
  2. AIRFLOW_DAGS_DIR is set correctly in the web and scheduler containers.
  3. It’s difficult to get information from crashed worker pods, but they are built using the same scripts, so I assume they have the same settings.
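Because the worker pods are hard to inspect, one option is to compute the dags_folder a container would see from its environment and its airflow.cfg, and compare the scheduler's answer with the worker's. The sketch below is a simplified model of Airflow's configuration precedence that assumes only two sources: an AIRFLOW__CORE__DAGS_FOLDER environment variable, which Airflow lets override [core] dags_folder, and the [core] section of airflow.cfg. It ignores other sources such as the _CMD/_SECRET variants and built-in defaults, and effective_dags_folder is a hypothetical helper. Inside a pod where Airflow is importable, airflow.configuration.conf.get("core", "dags_folder") gives the authoritative value.

```python
import configparser
from typing import Mapping, Optional

ENV_OVERRIDE = "AIRFLOW__CORE__DAGS_FOLDER"


def effective_dags_folder(env: Mapping[str, str], cfg_text: str) -> Optional[str]:
    """Return the dags_folder under a simplified precedence (hypothetical helper).

    1. AIRFLOW__CORE__DAGS_FOLDER, if set and non-empty in the environment.
    2. [core] dags_folder from the airflow.cfg text.
    Returns None if neither source provides a value.
    """
    if env.get(ENV_OVERRIDE):
        return env[ENV_OVERRIDE]
    # interpolation=None keeps values such as "{AIRFLOW_HOME}/dags" verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("core", "dags_folder", fallback=None)
```

Calling it with os.environ and the text of each container's airflow.cfg would show whether the scheduler and a worker disagree about the folder, without relying on the assumption that both are built identically.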

What you think should happen instead

The test DAG should be run by the worker pod and print “hello world”, as it does in the setup with Airflow 2.1.4.

How to reproduce

No response

Operating System

Docker image uses:

  PRETTY_NAME="Debian GNU/Linux 10 (buster)"
  NAME="Debian GNU/Linux"
  VERSION_ID="10"
  VERSION="10 (buster)"
  VERSION_CODENAME=buster
  ID=debian
  HOME_URL="https://www.debian.org/"
  SUPPORT_URL="https://www.debian.org/support"
  BUG_REPORT_URL="https://bugs.debian.org/"

Versions of Apache Airflow Providers

No response

Deployment

Other 3rd-party Helm chart

Deployment details

  • Use Bitnami images: upgrade from 2.1.4-debian-10-r0 to 2.2.0-debian-10-r0
  • Use Bitnami Helm chart 11.4.2 in both cases
  • Bitnami images are custom-built due to a company requirement to have DAGs mounted from a separate volume. The DAG directory is custom, but the same for 2.1.4 and 2.2.0. The scripts that build the images with the custom DAG folder are the same for 2.1.4 and 2.2.0.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

potiuk commented, Oct 7, 2022

Indeed, it seems a low priority, and since it has an easy workaround, it’s not blocking anyone. Hopefully someone will take it on, investigate and fix it. There are probably hundreds of other things that are more important to spend time on.

potiuk commented, Nov 25, 2022

Converting to a discussion in case more is needed.
