
`airflow db upgrade` Failed to write serialized DAG

Apache Airflow version

2.4.1

What happened

Running `airflow db upgrade` on an Airflow installation with 100 DAGs fails with this error:

ERROR [airflow.models.dagbag.DagBag] Failed to write serialized DAG: /usr/local/airflow/dags/REDACTED.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/dagbag.py", line 615, in _serialize_dag_capturing_errors
    dag_was_updated = SerializedDagModel.write_dag(
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/serialized_dag.py", line 146, in write_dag
    session.query(literal(True))
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2810, in first
    return self.limit(1)._iter().first()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2894, in _iter
    result = self.session.execute(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1688, in execute
    conn = self._connection_for_bind(bind)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1529, in _connection_for_bind
    return self._transaction._connection_for_bind(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 721, in _connection_for_bind
    self._assert_active()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
    raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "serialized_dag_pkey"
DETAIL:  Key (dag_id)=(REDACTED) already exists.

[SQL: INSERT INTO serialized_dag (dag_id, fileloc, fileloc_hash, data, data_compressed, last_updated, dag_hash, ...
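
For context, the `PendingRollbackError` above is standard SQLAlchemy behavior: once any statement fails inside a session's transaction, every further use of that session raises until `Session.rollback()` is called, so the original `UniqueViolation` is the real failure. The following minimal sketch reproduces the same failure mode; it is not Airflow code, and it uses an in-memory SQLite table with an illustrative `dag_id` primary key in place of Postgres:

import sqlalchemy as sa
from sqlalchemy import exc
from sqlalchemy.orm import Session

engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()
# Stand-in for the serialized_dag table; dag_id is the primary key,
# mirroring the serialized_dag_pkey constraint from the error above.
serialized_dag = sa.Table(
    "serialized_dag", metadata,
    sa.Column("dag_id", sa.String, primary_key=True),
)
metadata.create_all(engine)

with Session(engine) as session:
    session.execute(sa.insert(serialized_dag).values(dag_id="example"))
    try:
        # Duplicate primary key -> IntegrityError, the SQLite analogue of
        # psycopg2.errors.UniqueViolation in the report.
        session.execute(sa.insert(serialized_dag).values(dag_id="example"))
    except exc.IntegrityError:
        pass  # catching the error does not reset the failed transaction
    try:
        # Any further query now fails, just as write_dag did in the traceback.
        session.execute(sa.select(serialized_dag))
    except exc.PendingRollbackError as err:
        print(type(err).__name__)  # prints: PendingRollbackError
    session.rollback()  # what the error message asks for
    session.execute(sa.select(serialized_dag))  # the session is usable again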

What you think should happen instead

`airflow db upgrade` should successfully reserialize DAGs at the end of the upgrade, just like the `airflow dags reserialize` command does.
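
For reference, one general way to make such a write idempotent is a PostgreSQL upsert, which inserts the row or overwrites it when the `dag_id` already exists. This is only a hedged sketch of the pattern, not the actual Airflow fix, and the table is reduced to the two columns the example needs:

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import insert as pg_insert

metadata = sa.MetaData()
# Reduced, illustrative stand-in for the serialized_dag table.
serialized_dag = sa.Table(
    "serialized_dag", metadata,
    sa.Column("dag_id", sa.String, primary_key=True),
    sa.Column("data", sa.JSON),
)

def upsert_serialized_dag(session, dag_id, data):
    """Insert the serialized DAG, or overwrite it if dag_id already exists."""
    stmt = pg_insert(serialized_dag).values(dag_id=dag_id, data=data)
    stmt = stmt.on_conflict_do_update(
        index_elements=["dag_id"],          # the serialized_dag_pkey column
        set_={"data": stmt.excluded.data},  # replace the stale payload
    )
    session.execute(stmt)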

How to reproduce

  1. Upgrade to Airflow 2.4.1 on an existing codebase
  2. Run `airflow db upgrade`

Operating System

Debian GNU/Linux 10 (buster)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==5.1.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.3.0
apache-airflow-providers-common-sql==1.2.0
apache-airflow-providers-datadog==3.0.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-postgres==5.2.1
apache-airflow-providers-redis==3.0.0
apache-airflow-providers-sendgrid==3.0.0
apache-airflow-providers-sftp==4.0.0
apache-airflow-providers-slack==5.1.0
apache-airflow-providers-sqlite==3.2.1
apache-airflow-providers-ssh==3.1.0

Deployment

Other Docker-based deployment

Deployment details

k8s deployment

Anything else

Fails consistently in these two scenarios (an illustrative sketch of the suspected race follows the list):

  1. Run db upgrade only:

     airflow db upgrade
    
  2. Run along with reserialize:

     airflow dags reserialize --clear-only
     airflow db upgrade
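
A duplicate-key failure like this is the classic symptom of a check-then-insert pattern racing against a second writer, for example a scheduler serializing DAGs while `airflow db upgrade` does the same. The sketch below only illustrates that pattern under that assumption; it is not Airflow's actual code, and all names are stand-ins:

import sqlalchemy as sa

def write_dag_naive(session, table, dag_id, data):
    """Illustrative check-then-insert; table stands in for serialized_dag."""
    exists = session.execute(
        sa.select(table.c.dag_id).where(table.c.dag_id == dag_id)
    ).first()
    if exists is None:
        # Race window: another process can insert the same dag_id between the
        # SELECT above and this INSERT, after which the INSERT violates the
        # serialized_dag_pkey constraint exactly as in the report.
        session.execute(sa.insert(table).values(dag_id=dag_id, data=data))
    else:
        session.execute(
            sa.update(table).where(table.c.dag_id == dag_id).values(data=data)
        )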
    

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 17 (13 by maintainers)

Top GitHub Comments

2 reactions
DMilmont commented, Nov 7, 2022

Today we upgraded to Airflow 2.4.2 and did not notice this issue during the migration this time around.

We are using the official Helm chart, so the migration occurred on deploy via the migration job. We currently run 900+ DAGs.

When we upgraded to Airflow 2.4.1 the migration took more than 20 minutes. After upgrading to Airflow 2.4.2 it took less than 2 minutes.

If other data points are needed, I am happy to provide them.

1 reaction
potiuk commented, Oct 6, 2022

(I think even `-t 0` is not needed.)
