TriggerDagRunOperator sometimes causes the database to raise a UniqueViolation constraint error
Apache Airflow version: 1.10.10
Kubernetes version (if you are using kubernetes) (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-23T14:21:54Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-502bfb", GitCommit:"502bfb383169b124d87848f89e17a04b9fc1f6f0", GitTreeState:"clean", BuildDate:"2020-02-07T01:31:02Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Environment: EKS - KubernetesExecutor
- Cloud provider or hardware configuration: AWS EKS
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools: Docker image
- Others:
What happened:
Sometimes the TriggerDagRunOperator fails with the following error when, in fact, it successfully triggered the target DAG.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1228, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 857, in do_executemany
cursor.executemany(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "task_instance_pkey"
DETAIL: Key (task_id, dag_id, execution_date)=(<task_id>, <dag_id>, 2020-04-20 17:00:00+00) already exists.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.7/site-packages/airflow/operators/dagrun_operator.py", line 95, in execute
replace_microseconds=False)
File "/usr/local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag
replace_microseconds=replace_microseconds,
File "/usr/local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in _trigger_dag
external_trigger=True,
File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/models/dag.py", line 1469, in create_dagrun
run.verify_integrity(session=session)
File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 70, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/models/dagrun.py", line 400, in verify_integrity
session.commit()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1036, in commit
self.transaction.commit()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 503, in commit
self._prepare_impl()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 482, in _prepare_impl
self.session.flush()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2496, in flush
self._flush(objects)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2637, in _flush
transaction.rollback(_capture_exception=True)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
exc_value, with_traceback=exc_tb,
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
raise exception
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2597, in _flush
flush_context.execute()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
insert,
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1083, in _emit_insert_statements
c = cached_connections[connection].execute(statement, multiparams)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 984, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 293, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1103, in _execute_clauseelement
distilled_params,
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1288, in _execute_context
e, statement, parameters, cursor, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1482, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
raise exception
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1228, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 857, in do_executemany
cursor.executemany(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "task_instance_pkey"
DETAIL: Key (task_id, dag_id, execution_date)=(<task_id>, <dag_id>, 2020-04-20 17:00:00+00) already exists.
What you expected to happen:
It seems like this could be some kind of race condition, or perhaps an issue with rescheduling? The traceback shows the failing insert happens in DagRun.verify_integrity, which creates the TaskInstance rows for the new run; if a run with the same execution_date already exists, those rows collide on (task_id, dag_id, execution_date). I expect this not to happen, or for there to be a check for an existing run before attempting to trigger the DAG, so the operator does not report a false failure.
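For illustration, a guard along these lines would avoid the false failure. This is only a minimal sketch against the 1.10 experimental API, with a hypothetical `target_dag` and a hypothetical `trigger_if_absent` helper:

```python
from airflow.api.common.experimental.trigger_dag import trigger_dag
from airflow.models import DagRun
from airflow.utils import timezone


def trigger_if_absent(dag_id, execution_date=None):
    """Sketch: trigger dag_id only if no DagRun already exists for
    (dag_id, execution_date), instead of failing on the unique key."""
    execution_date = execution_date or timezone.utcnow()
    if DagRun.find(dag_id=dag_id, execution_date=execution_date):
        return None  # a run already exists; treat as already triggered
    return trigger_dag(
        dag_id=dag_id,
        execution_date=execution_date,
        replace_microseconds=False,  # keep full timestamp precision
    )


trigger_if_absent("target_dag")
```

Note that `DagRun.find` followed by `trigger_dag` is not atomic, so two concurrent callers can still race past the check; the guard only turns the common case into an idempotent no-op.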
How to reproduce it:
Really not sure how to reproduce it; it's a pretty strange bug.
Anything else we need to know:
It happens maybe 3-4 times a month.
Top GitHub Comments
There has been no fix of this, so I expect it is the same in 2.x, although I don’t have a MWE to check.
`trigger_dag()` calls running a DAG in the same microsecond cause an error.

Also finding this, apparently due to triggering the same DAG in the same second (https://issues.apache.org/jira/browse/AIRFLOW-699 - apparently resolved?). Reproduces for me on 1.10.10, with a Postgres 12 `airflow` database. Not being able to trigger the same DAG multiple times per second is a deal-breaker - happy to help out if pointed in the right direction for this one.
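The truncation behind this is easy to demonstrate: with the default `replace_microseconds=True`, `trigger_dag()` drops microseconds from `execution_date`, so two triggers inside the same wall-clock second produce identical keys. A minimal sketch:

```python
from datetime import datetime

# Two triggers fired within the same wall-clock second...
t1 = datetime(2020, 4, 20, 17, 0, 0, 123456)
t2 = datetime(2020, 4, 20, 17, 0, 0, 789012)

# ...collapse to the same execution_date once microseconds are dropped,
# which trigger_dag() does by default (replace_microseconds=True), so
# both DagRuns collide on (task_id, dag_id, execution_date).
assert t1.replace(microsecond=0) == t2.replace(microsecond=0)
```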
Edit

As an update, it looks like `start_date` and `execution_date` are defined identically in the `dag_run` table (https://github.com/apache/airflow/blob/dd9f04e152997b7cff56920cb73c1e5b710a6f9d/airflow/models/dagrun.py#L42), and yet in this table I find `start_date` stored with microsecond precision and `execution_date` only at second precision.

Edit 2
Looks like this is the offending line for me: https://github.com/apache/airflow/blob/dd9f04e152997b7cff56920cb73c1e5b710a6f9d/airflow/api/common/experimental/trigger_dag.py#L110. I guess rather than calling via the client I can call this function directly and pass `replace_microseconds=False`. Hopefully you can find a solution for `TriggerDagRunOperator` too; presumably it is just a call to this function underneath.
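A minimal sketch of that direct call, assuming a hypothetical `target_dag`:

```python
from airflow.api.common.experimental.trigger_dag import trigger_dag

# With replace_microseconds=False, execution_date keeps microsecond
# precision, so triggers within the same second get distinct keys
# instead of colliding on (task_id, dag_id, execution_date).
trigger_dag(dag_id="target_dag", replace_microseconds=False)
```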