Example tutorial_taskflow_api_etl crashes w/ AttributeError: Can't pickle local object
See original GitHub issueApache Airflow version: from the docker-compose file in the tutorial, image apache/airflow:master-python3.8 and apache/airflow:master-python3.7
Kubernetes version (if you are using kubernetes) (use kubectl version
): n/a
Environment:
- Cloud provider or hardware configuration: Bare metal server in lab, AMD threadripper, 64GB ram
- OS (e.g. from /etc/os-release): Ubuntu 18.04.5 LTS BIONIC VM running atop KVM
- Kernel (e.g.
uname -a
): Linux zxdev01 4.15.0-135-generic #139-Ubuntu SMP Mon Jan 18 17:38:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux - Install tools: docker-compose as provided in airflow tutorial w/ mods as above
- Others: set up per the tutorial here
What happened:
Running a version of the included example tutorial dag taskflow_api_etl that is set to be on a schedule causes a crash
docker exec -ti airflow_airflow-webserver_1 airflow dags backfill test_tutorial_taskflow_api_etl --start-date <any date> --end-date <any date>
The following error is received:
AttributeError: Can't pickle local object 'test_tutorial_taskflow_api_etl.<locals>.transform'
Modifications to this dag, including removing the return values from each function and the parameters in each, yield identical results. Running a single day backfill does not cause this crash. Deleting the dag and letting it re-import does not help.
What you expected to happen: The dag should complete the backfill for all days. I removed the return statement in all functions, and removed the parameters in each of the function definitions thinking this had to do with the passing along of return results but this did not help.
I tested with multiple workers and a single worker as well as with python 3.7. The same result was achieved in all cases.
How to reproduce it:
- Install an ubuntu VM, versions as specified, 8GB ram, 4 cores
- Install Docker CE from the docker repositories per their installation directions
- Download the Airflow docker-compose image, perform the tutorial set up tasks, modify only the airflow image as noted in the versions above (this is a different bug maybe, default causes a crash).
docker-compose up -d
- Copy the code in tutorial_taskflow_api_etl, modify to set the schedule_interval to daily, save into ./dags/test.py and let airflow discover it.
- execute
docker exec -ti airflow_airflow-webserver_1 airflow dags backfill test_tutorial_taskflow_api_etl --start-date <any date> --end-date <any date>
Anything else we need to know:
At least two other dag examples appear to work just fine when executed this way. Neither are using the Taskflow API. Maybe that is a contributing factor.
This crash occurs consistently at every run as described.
Issue Analytics
- State:
- Created 3 years ago
- Comments:8 (5 by maintainers)
Top GitHub Comments
FWIW, I’m not able to reproduce this on
main
with Breeze, 2.0.0 in Breeze, nor 2.3.0 with Astro CLI. I used the tutorial_taskflow_api_etl example DAG and changedschedule_interval="@daily"
.Breeze (Airflow 2.0.0) Commands:
breeze exec
,airflow dags backfill test_tutorial_taskflow_api_etl --start-date 2022-05-01 --end-date 2022-05-05
Breeze (main) Commands:
breeze exec
,airflow dags backfill tutorial_taskflow_api_etl -s 2022-05-01 -e 2022-05-05
Astro CLI (Astro Runtime 5.0.0, Airflow 2.3.0) Command (directly on webserver container):
docker exec -it 2-3-0_d06ac9-webserver-1 airflow dags backfill tutorial_taskflow_api_etl -s 2022-05-01 -e 2022-05-05
Command (using built-in Astro CLI command):astrocloud dev run dags backfill tutorial_taskflow_api_etl -s 2022-05-06 -e 2022-05-10
@gschwim Does this pickling error still occur on Airflow 2.3.0 with your environment setup? Is the only modification made to the example DAG was the
schedule_interval
initially? Does setting the--donot-pickle
flag alleviate the issue? I guess it’s obvious to everyone else but what executor were you using?This issue has been closed because it has not received response from the issue author.