`airflow db upgrade` Failed to write serialized DAG
Apache Airflow version
2.4.1
What happened
Running `airflow db upgrade` on an Airflow installation with 100 DAGs fails with this error:
ERROR [airflow.models.dagbag.DagBag] Failed to write serialized DAG: /usr/local/airflow/dags/REDACTED.py
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/airflow/models/dagbag.py", line 615, in _serialize_dag_capturing_errors
dag_was_updated = SerializedDagModel.write_dag(
File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 72, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/airflow/models/serialized_dag.py", line 146, in write_dag
session.query(literal(True))
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2810, in first
return self.limit(1)._iter().first()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2894, in _iter
result = self.session.execute(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1688, in execute
conn = self._connection_for_bind(bind)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1529, in _connection_for_bind
return self._transaction._connection_for_bind(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 721, in _connection_for_bind
self._assert_active()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "serialized_dag_pkey"
DETAIL: Key (dag_id)=(REDACTED) already exists.
[SQL: INSERT INTO serialized_dag (dag_id, fileloc, fileloc_hash, data, data_compressed, last_updated, dag_hash, ...
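Worth noting: the `PendingRollbackError` is only a symptom. Once the INSERT hits the duplicate key, SQLAlchemy deactivates the session's transaction, and every subsequent query on that session fails until a rollback is issued. Here is a minimal, self-contained sketch of that mechanic (SQLite standing in for Postgres; the model below is an illustrative stand-in for Airflow's `serialized_dag` table, not the real one):

```python
# Reproduces the failure mechanics from the traceback above.
from sqlalchemy import Column, String, create_engine
from sqlalchemy.exc import IntegrityError, PendingRollbackError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class SerializedDag(Base):
    __tablename__ = "serialized_dag"
    dag_id = Column(String, primary_key=True)  # mirrors serialized_dag_pkey

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# A concurrent writer (e.g. a scheduler still running during the upgrade)
# gets its row in first.
with Session(engine) as scheduler:
    scheduler.add(SerializedDag(dag_id="my_dag"))
    scheduler.commit()

# The upgrade's session, having decided the row does not exist, attempts
# the same INSERT and hits the primary-key constraint.
upgrade = Session(engine)
upgrade.add(SerializedDag(dag_id="my_dag"))
try:
    upgrade.flush()
except IntegrityError:
    pass  # SQLite's equivalent of psycopg2's UniqueViolation

# The session is now poisoned: any further query raises
# PendingRollbackError, exactly as in the traceback, until
# Session.rollback() is issued.
try:
    upgrade.query(SerializedDag).first()
except PendingRollbackError as err:
    print(err)
finally:
    upgrade.rollback()
    upgrade.close()
```

One plausible explanation for the duplicate key itself is a second writer inserting the same `dag_id` between the upgrade's existence check and its INSERT, though I have not confirmed that is what happened here.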
What you think should happen instead
`airflow db upgrade` should successfully reserialize DAGs at the end of the upgrade, just like the `airflow dags reserialize` command does.
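For reference, a conceptual sketch of what that reserialization amounts to — this is my paraphrase, not Airflow's exact internals:

```python
# Parse every DAG file from disk and rewrite its serialized row.
from airflow.models.dagbag import DagBag
from airflow.models.serialized_dag import SerializedDagModel
from airflow.utils.session import create_session

with create_session() as session:
    dagbag = DagBag(read_dags_from_db=False)
    for dag in dagbag.dags.values():
        # write_dag is the same call that fails in the traceback above
        SerializedDagModel.write_dag(dag, session=session)
```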
How to reproduce
- Upgrade to Airflow 2.4.1 on an existing codebase
- Run `airflow db upgrade`
Operating System
Debian GNU/Linux 10 (buster)
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==5.1.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.3.0
apache-airflow-providers-common-sql==1.2.0
apache-airflow-providers-datadog==3.0.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-postgres==5.2.1
apache-airflow-providers-redis==3.0.0
apache-airflow-providers-sendgrid==3.0.0
apache-airflow-providers-sftp==4.0.0
apache-airflow-providers-slack==5.1.0
apache-airflow-providers-sqlite==3.2.1
apache-airflow-providers-ssh==3.1.0
Deployment
Other Docker-based deployment
Deployment details
k8s deployment
Anything else
Fails consistently in both of these scenarios:
- Running the upgrade alone: `airflow db upgrade`
- Clearing serialized DAGs first, then upgrading: `airflow dags reserialize --clear-only` followed by `airflow db upgrade`
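Purely as speculation about a possible fix direction (not necessarily what the maintainers implemented): making the write idempotent with a PostgreSQL upsert would guarantee that a concurrent insert of the same `dag_id` cannot violate `serialized_dag_pkey`. A sketch against a pared-down stand-in table (the real `serialized_dag` has more columns):

```python
from sqlalchemy import Column, LargeBinary, MetaData, String, Table
from sqlalchemy.dialects.postgresql import insert

metadata = MetaData()
serialized_dag = Table(
    "serialized_dag", metadata,
    Column("dag_id", String(250), primary_key=True),  # serialized_dag_pkey
    Column("fileloc", String(2000)),
    Column("data", LargeBinary),
    Column("dag_hash", String(32)),
)

def upsert_serialized_dag(session, dag_id, fileloc, data, dag_hash):
    stmt = insert(serialized_dag).values(
        dag_id=dag_id, fileloc=fileloc, data=data, dag_hash=dag_hash
    )
    # If another process inserted the same dag_id first, update the row
    # in place instead of failing with a UniqueViolation.
    stmt = stmt.on_conflict_do_update(
        index_elements=["dag_id"],
        set_={
            "fileloc": stmt.excluded.fileloc,
            "data": stmt.excluded.data,
            "dag_hash": stmt.excluded.dag_hash,
        },
    )
    session.execute(stmt)
```

The trade-off is that `ON CONFLICT` is dialect-specific, so Airflow, which also supports MySQL and SQLite metadata databases, would need per-dialect handling.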
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Today we upgraded to Airflow 2.4.2 and did not notice this issue during the migration this time around. We are using the official Helm chart, so the migration occurred on deploy via the migration job. We currently run 900+ DAGs.
When we upgraded to Airflow 2.4.1 the migration took more than 20 minutes; after upgrading to 2.4.2 it took less than 2 minutes.
If there are any other data points needed, I am happy to help provide some.
(I think even -t 0 is not needed)