MySQL deadlock when using DAG serialization
Apache Airflow version: 1.10.10
Kubernetes version: v1.16.8
MySQL version: 5.7
What happened:
Airflow tasks fail with a deadlock when running a DAG with max_active_runs > 1 and concurrency > 1 while dag_serialization is enabled.
Logs
[2020-04-22 19:19:49,018] {taskinstance.py:1145} ERROR - (_mysql_exceptions.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') [SQL: INSERT INTO rendered_task_instance_fields (dag_id, task_id, execution_date, rendered_fields) VALUES (%s, %s, %s, %s)] [parameters: ('some_dag_v.0.0.1', 'some_task_id', datetime.datetime(2019, 12, 2, 0, 0), 'Some rendered fields (837 characters truncated)')]
(Background on this error at: http://sqlalche.me/e/e3q8)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 255, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/lib/python3.7/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 252, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 378, in _query
    db.query(q)
  File "/usr/local/lib/python3.7/site-packages/MySQLdb/connections.py", line 280, in query
    _mysql.connection.query(self, query)
_mysql_exceptions.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1002, in _run_raw_task
    self.refresh_from_db(lock_for_update=True)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 45, in create_session
    session.commit()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1036, in commit
    self.transaction.commit()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 503, in commit
    self._prepare_impl()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 482, in _prepare_impl
    self.session.flush()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2496, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2637, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
    exc_value, with_traceback=exc_tb,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2597, in _flush
    flush_context.execute()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
    rec.execute(self)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
    uow,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
    insert,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1083, in _emit_insert_statements
    c = cached_connections[connection].execute(statement, multiparams)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 984, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 293, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1103, in _execute_clauseelement
    distilled_params,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1288, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1482, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 255, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/lib/python3.7/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 252, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 378, in _query
    db.query(q)
  File "/usr/local/lib/python3.7/site-packages/MySQLdb/connections.py", line 280, in query
    _mysql.connection.query(self, query)
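The error text itself suggests restarting the transaction. A minimal sketch of that idea follows; it is not Airflow's code. It assumes the driver exception carries the MySQL error code as the first element of `args`, as the `(1205, ...)` tuple in the log suggests. `run_with_retry` and its parameters are hypothetical names used only for illustration.

```python
import time

# MySQL error codes: 1205 = lock wait timeout, 1213 = deadlock found.
RETRYABLE_CODES = {1205, 1213}


def run_with_retry(txn, attempts=3, delay=0.5):
    """Call txn() and retry it when it fails with a retryable lock error.

    txn is a hypothetical callable that opens a fresh transaction,
    does its work, and commits (or rolls back on failure) each time
    it is called.
    """
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except Exception as exc:
            # Assumption: the error code is the first positional argument.
            code = exc.args[0] if exc.args else None
            if code not in RETRYABLE_CODES or attempt == attempts:
                raise
            # Linear backoff before the next attempt.
            time.sleep(delay * attempt)
```

Retrying only hides the contention; it does not remove the competing INSERT and DELETE on rendered_task_instance_fields.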
Top GitHub Comments
Hi, I have the same issue.
I was looking at the models/renderedtifields.py file and noticed that def delete_old_records( contains a line that loads the number of rendered fields to keep:
num_to_keep=conf.getint("core", "max_num_rendered_ti_fields_per_task", fallback=0)
If this value is <= 0, the function returns without doing anything.
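The guard described above can be sketched in isolation. This is an illustration only, not the code from models/renderedtifields.py: it uses the standard library's `configparser` in place of Airflow's `conf`, and `read_num_to_keep` and `rows_to_delete` are hypothetical helpers. Keeping the newest records by execution date is an assumption.

```python
import configparser


def read_num_to_keep(cfg_text):
    """Read the setting from config text, falling back to 0 as in the quoted line."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint(
        "core", "max_num_rendered_ti_fields_per_task", fallback=0
    )


def rows_to_delete(execution_dates, num_to_keep):
    """Return the execution dates a cleanup would remove for one task."""
    if num_to_keep <= 0:
        # The early return: with 0 (or a negative value) nothing is deleted.
        return []
    # Assumption: the newest num_to_keep records are the ones kept.
    newest_first = sorted(execution_dates, reverse=True)
    return newest_first[num_to_keep:]
```

With `max_num_rendered_ti_fields_per_task = 0` under `[core]`, `rows_to_delete` returns an empty list, so no DELETE would compete with the INSERT.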
Since the deadlock involves the insert and the delete on that table, setting max_num_rendered_ti_fields_per_task = 0 in the [core] config section might fix the issue.
Of course it does not work.
Using SHOW ENGINE INNODB STATUS, I see queries like this:
-----> Please note LIMIT 30
I found this code inside models/taskinstance.py, and it's the only place where delete_old_records is called, so it is weird, is it not? Where in the universe does that "30" come from?
I'll investigate further tomorrow...
Setting max_num_rendered_ti_fields_per_task = 0 seems to have fixed my problems. Of course, it can only be a temporary fix. I moved the table cleanup to an external task.
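For anyone wanting to try the same workaround, here is a rough sketch of what an external cleanup could look like. It is not the commenter's actual task. It runs against an in-memory SQLite database so it is self-contained; the column names come from the INSERT statement in the log, while the primary key, `make_demo_db`, and `prune_rendered_fields` are assumptions made for illustration. A real job would run the equivalent statements against the MySQL metadata database.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the metadata table (schema assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rendered_task_instance_fields ("
        "dag_id TEXT, task_id TEXT, execution_date TEXT, rendered_fields TEXT, "
        "PRIMARY KEY (dag_id, task_id, execution_date))"
    )
    return conn


def prune_rendered_fields(conn, dag_id, task_id, keep):
    """Delete all but the newest `keep` rows for one task; return rows deleted."""
    if keep <= 0:
        return 0
    newest = conn.execute(
        "SELECT execution_date FROM rendered_task_instance_fields "
        "WHERE dag_id = ? AND task_id = ? "
        "ORDER BY execution_date DESC LIMIT ?",
        (dag_id, task_id, keep),
    ).fetchall()
    if not newest:
        return 0
    # Oldest execution date among the rows that are kept.
    cutoff = newest[-1][0]
    cur = conn.execute(
        "DELETE FROM rendered_task_instance_fields "
        "WHERE dag_id = ? AND task_id = ? AND execution_date < ?",
        (dag_id, task_id, cutoff),
    )
    conn.commit()
    return cur.rowcount
```

Running the cleanup outside the task's own transaction, and at a quiet time, keeps the DELETE away from the concurrent INSERTs that produced the lock wait.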