max_tis_per_query=0 leads to nothing being scheduled in 2.0.0
After upgrading to Airflow 2.0.0 it seems as if the scheduler isn't working anymore. Tasks hang in the scheduled state, but nothing gets executed. I've tested this with both the SequentialExecutor and the CeleryExecutor. When using the CeleryExecutor, no messages seem to arrive in RabbitMQ.
This is on local Docker. Everything was working fine before upgrading. There don't seem to be any error messages, so I'm not completely sure if this is a bug or a misconfiguration on my end.
Using the python:3.7-slim-stretch Docker image. Our regular setup uses the CeleryExecutor. MySQL version is 5.7.
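To confirm the symptom, the task instance states can be checked directly in the metadata database. A minimal sketch, assuming the MySQL connection string from the config below and the standard Airflow task_instance table:

import sqlalchemy as sa

# Assumed connection string; adjust to match sql_alchemy_conn in airflow.cfg.
engine = sa.create_engine("mysql://airflow:airflow@postgres/airflow")

with engine.connect() as conn:
    # Count task instances per state; the stuck scheduler shows rows piling up
    # in 'scheduled' while 'queued' and 'running' stay at zero.
    rows = conn.execute(sa.text(
        "SELECT state, COUNT(*) FROM task_instance GROUP BY state"
    ))
    for state, count in rows:
        print(state, count)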
Any help would be greatly appreciated.
Python packages alembic==1.4.3 altair==4.1.0 amazon-kclpy==1.5.0 amqp==2.6.1 apache-airflow==2.0.0 apache-airflow-providers-amazon==1.0.0 apache-airflow-providers-celery==1.0.0 apache-airflow-providers-ftp==1.0.0 apache-airflow-providers-http==1.0.0 apache-airflow-providers-imap==1.0.0 apache-airflow-providers-jdbc==1.0.0 apache-airflow-providers-mysql==1.0.0 apache-airflow-providers-sqlite==1.0.0 apache-airflow-upgrade-check==1.1.0 apispec==3.3.2 appdirs==1.4.4 argcomplete==1.12.2 argon2-cffi==20.1.0 asn1crypto==1.4.0 async-generator==1.10 attrs==20.3.0 azure-common==1.1.26 azure-core==1.9.0 azure-storage-blob==12.6.0 Babel==2.9.0 backcall==0.2.0 bcrypt==3.2.0 billiard==3.6.3.0 black==20.8b1 bleach==3.2.1 boa-str==1.1.0 boto==2.49.0 boto3==1.7.3 botocore==1.10.84 cached-property==1.5.2 cattrs==1.1.2 cbsodata==1.3.3 celery==4.4.2 certifi==2020.12.5 cffi==1.14.4 chardet==3.0.4 click==7.1.2 clickclick==20.10.2 cmdstanpy==0.9.5 colorama==0.4.4 colorlog==4.0.2 commonmark==0.9.1 connexion==2.7.0 convertdate==2.3.0 coverage==4.2 croniter==0.3.36 cryptography==3.3.1 cycler==0.10.0 Cython==0.29.21 decorator==4.4.2 defusedxml==0.6.0 dill==0.3.3 dnspython==2.0.0 docutils==0.14 email-validator==1.1.2 entrypoints==0.3 ephem==3.7.7.1 et-xmlfile==1.0.1 fbprophet==0.7.1 fire==0.3.1 Flask==1.1.2 Flask-AppBuilder==3.1.1 Flask-Babel==1.0.0 Flask-Bcrypt==0.7.1 Flask-Caching==1.9.0 Flask-JWT-Extended==3.25.0 Flask-Login==0.4.1 Flask-OpenID==1.2.5 Flask-SQLAlchemy==2.4.4 flask-swagger==0.2.13 Flask-WTF==0.14.3 flatten-json==0.1.7 flower==0.9.5 funcsigs==1.0.2 future==0.18.2 graphviz==0.15 great-expectations==0.13.2 gunicorn==19.10.0 holidays==0.10.4 humanize==3.2.0 idna==2.10 importlib-metadata==1.7.0 importlib-resources==1.5.0 inflection==0.5.1 ipykernel==5.4.2 ipython==7.19.0 ipython-genutils==0.2.0 ipywidgets==7.5.1 iso8601==0.1.13 isodate==0.6.0 itsdangerous==1.1.0 JayDeBeApi==1.2.3 jdcal==1.4.1 jedi==0.17.2 jellyfish==0.8.2 Jinja2==2.11.2 jmespath==0.10.0 joblib==1.0.0 JPype1==1.2.0 json-merge-patch==0.2 jsonpatch==1.28 jsonpointer==2.0 jsonschema==3.2.0 jupyter-client==6.1.7 jupyter-core==4.7.0 jupyterlab-pygments==0.1.2 kinesis-events==0.1.0 kiwisolver==1.3.1 kombu==4.6.11 korean-lunar-calendar==0.2.1 lazy-object-proxy==1.4.3 lockfile==0.12.2 LunarCalendar==0.0.9 Mako==1.1.3 Markdown==3.3.3 MarkupSafe==1.1.1 marshmallow==3.10.0 marshmallow-enum==1.5.1 marshmallow-oneofschema==2.0.1 marshmallow-sqlalchemy==0.23.1 matplotlib==3.3.3 mistune==0.8.4 mock==1.0.1 mockito==1.2.2 msrest==0.6.19 mypy-extensions==0.4.3 mysql-connector-python==8.0.18 mysqlclient==2.0.2 natsort==7.1.0 nbclient==0.5.1 nbconvert==6.0.7 nbformat==5.0.8 nest-asyncio==1.4.3 nose==1.3.7 notebook==6.1.5 numpy==1.19.4 oauthlib==3.1.0 openapi-spec-validator==0.2.9 openpyxl==3.0.5 oscrypto==1.2.1 packaging==20.8 pandas==1.1.5 pandocfilters==1.4.3 parso==0.7.1 pathspec==0.8.1 pendulum==2.1.2 pexpect==4.8.0 phonenumbers==8.12.15 pickleshare==0.7.5 Pillow==8.0.1 prison==0.1.3 prometheus-client==0.8.0 prompt-toolkit==3.0.8 protobuf==3.14.0 psutil==5.8.0 ptyprocess==0.6.0 pyarrow==2.0.0 pycodestyle==2.6.0 pycparser==2.20 pycryptodomex==3.9.9 pydevd-pycharm==193.5233.109 Pygments==2.7.3 PyJWT==1.7.1 PyMeeus==0.3.7 pyodbc==4.0.30 pyOpenSSL==19.1.0 pyparsing==2.4.7 pyrsistent==0.17.3 pystan==2.19.1.1 python-crontab==2.5.1 python-daemon==2.2.4 python-dateutil==2.8.1 python-editor==1.0.4 python-nvd3==0.15.0 python-slugify==4.0.1 python3-openid==3.2.0 pytz==2019.3 pytzdata==2020.1 PyYAML==5.3.1 pyzmq==20.0.0 recordlinkage==0.14 regex==2020.11.13 
requests==2.23.0 requests-oauthlib==1.3.0 rich==9.2.0 ruamel.yaml==0.16.12 ruamel.yaml.clib==0.2.2 s3transfer==0.1.13 scikit-learn==0.23.2 scipy==1.5.4 scriptinep3==0.3.1 Send2Trash==1.5.0 setproctitle==1.2.1 setuptools-git==1.2 shelljob==0.5.6 six==1.15.0 sklearn==0.0 snowflake-connector-python==2.3.7 snowflake-sqlalchemy==1.2.4 SQLAlchemy==1.3.22 SQLAlchemy-JSONField==1.0.0 SQLAlchemy-Utils==0.36.8 swagger-ui-bundle==0.0.8 tabulate==0.8.7 TagValidator==0.0.8 tenacity==6.2.0 termcolor==1.1.0 terminado==0.9.1 testpath==0.4.4 text-unidecode==1.3 threadpoolctl==2.1.0 thrift==0.13.0 toml==0.10.2 toolz==0.11.1 tornado==6.1 tqdm==4.54.1 traitlets==5.0.5 typed-ast==1.4.1 typing-extensions==3.7.4.3 tzlocal==1.5.1 unicodecsv==0.14.1 urllib3==1.24.2 validate-email==1.3 vine==1.3.0 watchtower==0.7.3 wcwidth==0.2.5 webencodings==0.5.1 Werkzeug==1.0.1 widgetsnbextension==3.5.1 wrapt==1.12.1 WTForms==2.3.1 xlrd==2.0.1 XlsxWriter==1.3.7 zipp==3.4.0
Relevant config
# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository
# This path must be absolute
dags_folder = /usr/local/airflow/dags
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor
executor = CeleryExecutor
# The SqlAlchemy connection string to the metadata database.
# SqlAlchemy supports many different database engines; more information
# on their website
sql_alchemy_conn = db+mysql://airflow:airflow@postgres/airflow
# The SqlAlchemy pool size is the maximum number of database connections
# in the pool.
sql_alchemy_pool_size = 5
# The SqlAlchemy pool recycle is the number of seconds a connection
# can be idle in the pool before it is invalidated. This config does
# not apply to sqlite.
sql_alchemy_pool_recycle = 3600
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32
# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16
# Are DAGs paused by default at creation
dags_are_paused_at_creation = True
# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 128
# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 16
# How long before timing out a python file import while filling the DagBag
dagbag_import_timeout = 60
# The class to use for running task instances in a subprocess
task_runner = StandardTaskRunner
# Whether to enable pickling for xcom (note that this is insecure and allows for
# RCE exploits). This will be deprecated in Airflow 2.0 (it will be forced to False).
enable_xcom_pickling = True
# When a task is killed forcefully, this is the amount of time in seconds that
# it has to cleanup after it is sent a SIGTERM, before it is SIGKILLED
killed_task_cleanup_time = 60
# This flag decides whether to serialise DAGs and persist them in DB. If set to True, Webserver reads from DB instead of parsing DAG files
store_dag_code = True
# You can also update the following default configurations based on your needs
min_serialized_dag_update_interval = 30
min_serialized_dag_fetch_interval = 10
[celery]
# This section only applies if you are using the CeleryExecutor in
# [core] section above
# The app name that will be used by celery
celery_app_name = airflow.executors.celery_executor
# The concurrency that will be used when starting workers with the
# "airflow worker" command. This defines the number of task instances that
# a worker will take, so size up your workers based on the resources on
# your worker box and the nature of your tasks
worker_concurrency = 16
# When you start an airflow worker, airflow starts a tiny web server
# subprocess to serve the workers' local log files to the main Airflow
# web server, which then builds pages and sends them to users. This defines
# the port on which the logs are served. It needs to be unused, and
# visible from the main web server so it can connect to the workers.
worker_log_server_port = 8793
# The Celery broker URL. Celery supports RabbitMQ, Redis and experimentally
# a sqlalchemy database. Refer to the Celery documentation for more
# information.
broker_url = amqp://amqp:5672/1
# Another key Celery setting
result_backend = db+mysql://airflow:airflow@postgres/airflow
# Celery Flower is a sweet UI for Celery. Airflow has a shortcut to start
# it `airflow flower`. This defines the IP that Celery Flower runs on
flower_host = 0.0.0.0
# This defines the port that Celery Flower runs on
flower_port = 5555
# Default queue that tasks get assigned to and that workers listen on.
default_queue = airflow
# Import path for celery configuration options
celery_config_options = airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
# No SSL
ssl_active = False
[scheduler]
# Task instances listen for external kill signal (when you clear tasks
# from the CLI or the UI), this defines the frequency at which they should
# listen (in seconds).
job_heartbeat_sec = 5
# The scheduler constantly tries to trigger new tasks (look at the
# scheduler section in the docs for more information). This defines
# how often the scheduler should run (in seconds).
scheduler_heartbeat_sec = 5
# After how much time (in seconds) the scheduler should terminate
# -1 indicates to run continuously (see also num_runs)
run_duration = -1
# After how much time new DAGs should be picked up from the filesystem
min_file_process_interval = 60
use_row_level_locking = False
dag_dir_list_interval = 300
# How often should stats be printed to the logs
print_stats_interval = 30
child_process_log_directory = /usr/local/airflow/logs/scheduler
# Local task jobs periodically heartbeat to the DB. If the job has
# not heartbeat in this many seconds, the scheduler will mark the
# associated task instance as failed and will re-schedule the task.
scheduler_zombie_task_threshold = 300
# Turn off scheduler catchup by setting this to False.
# Default behavior is unchanged and
# Command Line Backfills still work, but the scheduler
# will not do scheduler catchup if this is False,
# however it can be set on a per DAG basis in the
# DAG definition (catchup)
catchup_by_default = True
# This changes the batch size of queries in the scheduling main loop.
# This depends on query length limits and how long you are willing to hold locks.
# 0 for no limit
max_tis_per_query = 0
# The scheduler can run multiple processes in parallel to parse DAGs.
# This defines how many processes will run.
parsing_processes = 4
authenticate = False
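Note the max_tis_per_query = 0 setting above: it is documented as "0 for no limit", but it turns out to be the culprit (see the reply below). A minimal SQLAlchemy sketch, not the actual Airflow code, of how a batch size of 0 fed straight into a SQL LIMIT selects nothing:

import sqlalchemy as sa

# In-memory database purely for illustration.
engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
ti = sa.Table(
    "task_instance", meta,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("state", sa.String),
)
meta.create_all(engine)

with engine.connect() as conn:
    conn.execute(ti.insert(), [{"state": "scheduled"} for _ in range(5)])
    max_tis_per_query = 0
    # LIMIT 0 returns zero rows, so no task instances would ever be queued.
    rows = conn.execute(sa.select([ti]).limit(max_tis_per_query)).fetchall()
    print(len(rows))  # prints 0

To honour "no limit", the code would have to special-case 0 (for example, skip the .limit() call entirely).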
@bouke-nederstigt Okay, I’ve found the problem.
max_tis_per_query=0 in the config is broken. A quick work-around for now is to set it to a large value (say 512). We'll fix it so 0 works as documented in 2.0.1 – this was a bug in Airflow (it turns out we don't have any tests that set it to 0).
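In airflow.cfg the work-around would look like this (any sufficiently large positive value works until 2.0.1 ships; 512 is just the value suggested above):

[scheduler]
max_tis_per_query = 512

or, equivalently, via the standard environment-variable override:

AIRFLOW__SCHEDULER__MAX_TIS_PER_QUERY=512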
The zip should contain everything to reproduce the issue.
airflow_scheduler_issue.zip
Webserver should be running on localhost:8080.
Let me know if you run into any issues with the Docker files. We tested running the containers on multiple computers, but you never know with these things.