emr_pyspark_step_launcher fails with packaged airflow dags

When EmrPysparkStepLauncher(deploy_local_pipeline_package=True) is used with dagster-airflow and the DAG is a packaged DAG (i.e. shipped as a zip file), emr_pyspark_step_launcher's post_artifacts zips up the already-zipped DAG.

https://github.com/dagster-io/dagster/blob/439022652453b770e8f7ce92b6ab2d1c6766278b/python_modules/libraries/dagster-aws/dagster_aws/emr/pyspark_step_launcher.py#L179-L183

This results in an EMR artifact, code.zip, that contains {packaged_dag}.zip, which cannot be imported in pyspark EMR-land.
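The import failure can be reproduced locally without EMR. The sketch below is only an illustration of the nested-archive problem, not the launcher's actual code path: the helper names (build_nested_archive, can_import) and the module name are hypothetical. It builds a code.zip that contains another zip and then tries to import a module that lives in the inner archive.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_nested_archive(workdir, module_name):
    """Create code.zip containing packaged_dag.zip, which contains <module_name>.py."""
    inner_path = os.path.join(workdir, "packaged_dag.zip")
    with zipfile.ZipFile(inner_path, "w") as inner:
        inner.writestr(module_name + ".py", "VALUE = 1\n")

    outer_path = os.path.join(workdir, "code.zip")
    with zipfile.ZipFile(outer_path, "w") as outer:
        # Mirrors the reported situation: the packaged DAG is zipped a second time.
        outer.write(inner_path, arcname="packaged_dag.zip")
    return outer_path


def can_import(archive_path, module_name):
    """Return True if module_name is importable with archive_path on sys.path."""
    sys.path.insert(0, archive_path)
    try:
        importlib.invalidate_caches()
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
    finally:
        sys.path.remove(archive_path)
        sys.modules.pop(module_name, None)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    archive = build_nested_archive(workdir, "example_dag_module")
    # Python's zip importer only sees entries of the outer archive,
    # so the module inside the inner zip is not found.
    print(can_import(archive, "example_dag_module"))
```

Python's zip import mechanism treats the inner packaged_dag.zip as an opaque file, so the module inside it is never discovered when only code.zip is on the path.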

The step_run_ref that gets pickled and executed from EMR also references an import of {packaged_dag}.zip.

https://github.com/dagster-io/dagster/blob/439022652453b770e8f7ce92b6ab2d1c6766278b/python_modules/libraries/dagster-aws/dagster_aws/emr/pyspark_step_launcher.py#L188

The only resolution I see at the moment is to unzip the packaged dag prior to post_artifacts.
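A minimal sketch of that workaround, assuming the packaged DAG can be extracted to a scratch directory before the launcher packages it. The helper unpack_if_zipped is hypothetical and is not part of dagster or dagster-aws. How its result is handed to the step launcher depends on your configuration and is not shown here.

```python
import os
import tempfile
import zipfile


def unpack_if_zipped(package_path, dest_root=None):
    """Return a directory of plain source files for package_path.

    If package_path is a zip archive, extract it into a fresh temporary
    directory and return that directory. Otherwise return package_path
    unchanged, since it is assumed to be an ordinary source directory.
    """
    if not zipfile.is_zipfile(package_path):
        return package_path

    dest = tempfile.mkdtemp(prefix="unpacked_dag_", dir=dest_root)
    with zipfile.ZipFile(package_path) as archive:
        archive.extractall(dest)
    return dest


if __name__ == "__main__":
    # Build a throwaway packaged DAG to demonstrate the helper.
    scratch = tempfile.mkdtemp()
    packaged = os.path.join(scratch, "packaged_dag.zip")
    with zipfile.ZipFile(packaged, "w") as zf:
        zf.writestr("example_dag_module.py", "VALUE = 1\n")

    source_dir = unpack_if_zipped(packaged)
    print(os.listdir(source_dir))
```

Pointing the launcher at the extracted directory instead of the zip would mean code.zip contains the source files directly rather than a nested archive. Whether the import referenced by the pickled step_run_ref then resolves on EMR would still need to be verified.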

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

dpeng817 commented, Mar 3, 2021

Hey! Thanks @Benyuel for bringing this to our attention. We don’t have the bandwidth to investigate a fix right now, but as we head into 0.12.0 planning in a month or so, will be happy to touch base and revisit if you are still running into this.

catherinewu commented, Mar 3, 2021

bump @dpeng817?

