Example tutorial_taskflow_api_etl crashes w/ AttributeError: Can't pickle local object

Apache Airflow version: from the docker-compose file in the tutorial, image apache/airflow:master-python3.8 and apache/airflow:master-python3.7

Kubernetes version (if you are using kubernetes) (use kubectl version): n/a

Environment:

  • Cloud provider or hardware configuration: Bare metal server in lab, AMD threadripper, 64GB ram
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.5 LTS BIONIC VM running atop KVM
  • Kernel (e.g. uname -a): Linux zxdev01 4.15.0-135-generic #139-Ubuntu SMP Mon Jan 18 17:38:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: docker-compose as provided in airflow tutorial w/ mods as above
  • Others: set up per the tutorial here

What happened:

Running a backfill on a copy of the included example DAG tutorial_taskflow_api_etl, modified to run on a schedule, causes a crash:

docker exec -ti airflow_airflow-webserver_1 airflow dags backfill test_tutorial_taskflow_api_etl --start-date <any date> --end-date <any date>

The following error is received:

AttributeError: Can't pickle local object 'test_tutorial_taskflow_api_etl.<locals>.transform'

Modifications to this dag, including removing the return values from each function and the parameters in each, yield identical results. Running a single day backfill does not cause this crash. Deleting the dag and letting it re-import does not help.

What you expected to happen: The dag should complete the backfill for all days. I removed the return statement in all functions and removed the parameters from each function definition, thinking the crash had to do with passing return values between tasks, but this did not help.

I tested with multiple workers and with a single worker, as well as with Python 3.7. The result was the same in all cases.

How to reproduce it:

  1. Install an Ubuntu VM, versions as specified, 8GB RAM, 4 cores.
  2. Install Docker CE from the Docker repositories per their installation directions.
  3. Download the Airflow docker-compose file, perform the tutorial setup tasks, and modify only the airflow image as noted in the versions above (this may be a different bug; the default image causes a crash).
  4. Run docker-compose up -d
  5. Copy the code in tutorial_taskflow_api_etl, set the schedule_interval to daily, save it as ./dags/test.py, and let airflow discover it.
  6. Execute docker exec -ti airflow_airflow-webserver_1 airflow dags backfill test_tutorial_taskflow_api_etl --start-date <any date> --end-date <any date>

Anything else we need to know:

At least two other dag examples appear to work just fine when executed this way. Neither uses the Taskflow API, which may be a contributing factor.

This crash occurs consistently at every run as described.
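
The qualified name in the error message ('test_tutorial_taskflow_api_etl.<locals>.transform') indicates that transform is defined inside another function, which is how the TaskFlow example declares its tasks. A minimal sketch of that behaviour using only the standard library follows. It does not use Airflow at all: the function names mirror the tutorial but are stand-ins, and try_pickle is a hypothetical helper written for this illustration.

```python
import pickle


def extract_top_level():
    """Module-level function: pickle stores it by its importable name."""
    return {"1001": 301.27}


def tutorial_taskflow_api_etl():
    """Stand-in for the DAG function; it only returns its nested callable."""

    def transform(order_data):
        # Defined inside another function, so its qualified name
        # contains "<locals>" and cannot be looked up by import.
        return {"total_order_value": sum(order_data.values())}

    return transform


def try_pickle(obj):
    """Return None on success, or the error message if pickling fails."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError) as exc:
        # Which of the two is raised depends on the Python version.
        return str(exc)
    return None


transform = tutorial_taskflow_api_etl()
print(transform.__qualname__)
print(try_pickle(transform))          # an error message, not None
print(try_pickle(extract_top_level))  # module-level function, for comparison
```

This only shows why pickle rejects a nested function. Whether Airflow pickles the DAG during a backfill depends on the executor and command-line options, which the report does not state.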

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

josh-fell commented, Jun 1, 2022 (1 reaction)

FWIW, I’m not able to reproduce this on main with Breeze, 2.0.0 in Breeze, nor 2.3.0 with Astro CLI. I used the tutorial_taskflow_api_etl example DAG and changed schedule_interval="@daily".

Breeze (Airflow 2.0.0)
  • Commands: breeze exec, airflow dags backfill test_tutorial_taskflow_api_etl --start-date 2022-05-01 --end-date 2022-05-05

Breeze (main)
  • Commands: breeze exec, airflow dags backfill tutorial_taskflow_api_etl -s 2022-05-01 -e 2022-05-05

Astro CLI (Astro Runtime 5.0.0, Airflow 2.3.0)
  • Command (directly on webserver container): docker exec -it 2-3-0_d06ac9-webserver-1 airflow dags backfill tutorial_taskflow_api_etl -s 2022-05-01 -e 2022-05-05
  • Command (using built-in Astro CLI command): astrocloud dev run dags backfill tutorial_taskflow_api_etl -s 2022-05-06 -e 2022-05-10

@gschwim Does this pickling error still occur on Airflow 2.3.0 with your environment setup? Was the schedule_interval the only modification initially made to the example DAG? Does setting the --donot-pickle flag alleviate the issue? I guess it’s obvious to everyone else, but what executor were you using?
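
For anyone retrying with the --donot-pickle flag suggested above, the sketch below assembles the reporter's backfill command with that flag appended. It only builds and prints the command; nothing is executed. The helper backfill_command is hypothetical, the dates are placeholders borrowed from the Breeze commands above, and the container name is the one from the original report.

```python
import shlex


def backfill_command(dag_id, start_date, end_date, donot_pickle=False):
    """Assemble the `airflow dags backfill` argv; nothing is executed."""
    argv = [
        "airflow", "dags", "backfill", dag_id,
        "--start-date", start_date,
        "--end-date", end_date,
    ]
    if donot_pickle:
        # Asks the backfill not to pickle the DAG object for the workers.
        argv.append("--donot-pickle")
    return argv


# Container name taken from the report; the dates are placeholders.
cmd = ["docker", "exec", "-ti", "airflow_airflow-webserver_1"] + backfill_command(
    "test_tutorial_taskflow_api_etl", "2022-05-01", "2022-05-05", donot_pickle=True
)
print(shlex.join(cmd))
```

The thread never confirms whether the flag resolves the crash, so treat this as something to try rather than a known fix.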

github-actions[bot] commented, Jul 10, 2022 (0 reactions)

This issue has been closed because it has not received a response from the issue author.
