DAG serialization JSONDecodeError

Apache Airflow version: 1.10.12

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others: MySQL (RDS) metadata backend (v5.6.43)

What happened:

We recently turned on DAG serialization and noticed that when we clicked on large DAGs in the UI, we got an error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.7/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/decorators.py", line 121, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/flask_appbuilder/security/decorators.py", line 109, in wraps
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/decorators.py", line 92, in view_func
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/decorators.py", line 56, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/views.py", line 1407, in tree
    dag = dagbag.get_dag(dag_id)
  File "/usr/local/lib/python3.7/dist-packages/airflow/models/dagbag.py", line 136, in get_dag
    self._add_dag_from_db(dag_id=dag_id)
  File "/usr/local/lib/python3.7/dist-packages/airflow/models/dagbag.py", line 191, in _add_dag_from_db
    row = SerializedDagModel.get(dag_id)
  File "/usr/local/lib/python3.7/dist-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/airflow/models/serialized_dag.py", line 217, in get
    row = session.query(cls).filter(cls.dag_id == dag_id).one_or_none()
  File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/query.py", line 3459, in one_or_none
    ret = list(self)
  File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 100, in instances
    cursor.close()
  File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    with_traceback=exc_tb,
  File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 80, in instances
    rows = [proc(row) for row in fetch]
  File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 80, in <listcomp>
    rows = [proc(row) for row in fetch]
  File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 588, in _instance
    populators,
  File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 725, in _populate_full
    dict_[key] = getter(row)
  File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/sql/type_api.py", line 1278, in process
    return process_value(impl_processor(value), dialect)
  File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/sql/sqltypes.py", line 2454, in process
    return json_deserializer(value)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 16275 (char 16274)
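
This "Unterminated string" message is what Python's `json` module reports when a document is cut off in the middle of a string value. A minimal sketch using only the standard library; the payload, its keys, and the cut-off length are illustrative assumptions, not Airflow's real serialized format:

```python
import json
from typing import Optional


def decode_error_for(text: str) -> Optional[str]:
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


# Hypothetical payload standing in for a serialized DAG.
payload = json.dumps({"dag": {"dag_id": "example", "doc": "x" * 100}})

# Simulate a storage layer that silently keeps only the first 50 characters,
# which cuts the document off inside the long "doc" string.
truncated = payload[:50]

print(decode_error_for(payload))    # the full document parses cleanly
print(decode_error_for(truncated))  # the cut-off document does not
```

The position in the message points at the opening quote of the unfinished string, not at the place where the data was cut off, so the reported character offset can be well below the column's size limit.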

We determined that the cause is the data column of the serialized_dag table: on MySQL its type is TEXT, which holds at most 64 KB, but some of our DAG code is larger than that, so the stored JSON is cut off. We worked around this by running the following statements manually against the serialized_dag table and then waiting for the table to be repopulated:

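-- Back up the table before changing it.
CREATE TABLE serialized_dag_backup AS SELECT * FROM serialized_dag;

-- Widen the column so it can hold more than 64 KB.
ALTER TABLE serialized_dag MODIFY data MEDIUMTEXT;

-- Inspect the rows that were cut off at the TEXT limit.
SELECT * FROM serialized_dag
WHERE LENGTH(data) = 65535;

-- Delete the cut-off rows so that they get re-serialized.
DELETE FROM serialized_dag
WHERE LENGTH(data) = 65535;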
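
To see why MEDIUMTEXT is enough, it can help to compare a payload's byte length against the limits of MySQL's TEXT-family types. The sketch below is only an illustration: the helper name `smallest_text_type` is hypothetical, and it assumes the data is stored as UTF-8, which may not match the character set of your column.

```python
# Maximum byte lengths of MySQL's TEXT-family column types.
MYSQL_TEXT_LIMITS = {
    "TINYTEXT": 2**8 - 1,
    "TEXT": 2**16 - 1,  # 65,535 bytes, the limit hit in this issue
    "MEDIUMTEXT": 2**24 - 1,
    "LONGTEXT": 2**32 - 1,
}


def smallest_text_type(payload: str, encoding: str = "utf-8") -> str:
    """Return the smallest TEXT-family type that holds payload in full.

    Hypothetical helper; the encoding is an assumption about the column.
    """
    size = len(payload.encode(encoding))
    # Dicts keep insertion order, so the types are checked smallest first.
    for type_name, limit in MYSQL_TEXT_LIMITS.items():
        if size <= limit:
            return type_name
    raise ValueError(f"payload of {size} bytes exceeds LONGTEXT")


print(smallest_text_type("x" * 65_535))  # still fits in TEXT
print(smallest_text_type("x" * 65_536))  # one byte more needs MEDIUMTEXT
```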

What you expected to happen:

We should be able to click on the DAG in the UI without an error.

How to reproduce it: With a MySQL metadata backend, create a DAG with code that is larger than 64KB and enable DAG serialization. Then attempt to click on that DAG in the UI.
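
One way to get a DAG file of that size is to generate it. The following is a rough sketch, not part of the original report: the function name, the `dag_id`, and the task names are made up, and it assumes the Airflow 1.10 import path for `DummyOperator`. It only builds the source text; it does not import Airflow itself.

```python
# Header of the generated DAG file (assumes Airflow 1.10 import paths).
HEADER = '''from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

with DAG(
    dag_id="large_dag_repro",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
) as dag:
'''


def build_large_dag_source(min_bytes: int = 64 * 1024) -> str:
    """Return DAG source text whose size is larger than min_bytes."""
    parts = [HEADER]
    size = len(HEADER)
    task_number = 0
    # Keep adding trivial tasks until the file passes the size threshold.
    while size <= min_bytes:
        line = f'    DummyOperator(task_id="task_{task_number:05d}")\n'
        parts.append(line)
        size += len(line)
        task_number += 1
    return "".join(parts)


source = build_large_dag_source()
print(len(source.encode("utf-8")))  # size of the generated file in bytes
```

Writing `source` to a file in the DAGs folder should give the scheduler a large DAG to serialize. Whether the serialized JSON also exceeds 64 KB depends on how Airflow serializes each task, so check the length of the stored row rather than assuming it.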

Anything else we need to know: N/A

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

1 reaction
kaxil commented, Nov 20, 2020

Cool, yeah, with Airflow 2.0 around the corner, I think it would be worth it for you to at least upgrade to 5.7, or even better 8.0 if you want to run multiple Schedulers in 2.0 😉

0 reactions
ahuynh3 commented, Nov 20, 2020

Ah nice catch – seems like we can resolve this issue fairly easily by upgrading our MySQL version. Feel free to close this issue out if you don’t think it’s worth fixing for 5.6.x. Thanks again!
