execute_pipeline migration (dagster 0.13.4)
Summary
The execute_pipeline method is not working on Dagster 0.13.2 when trying to run an op on a Databricks cluster.
Reproduction
Consider the following code sample, defining op “test_op”.
prod_adls2 = {
    "pyspark_step_launcher": databricks_pyspark_step_launcher,
    "pyspark": pyspark_resource,
    "adls2": adls2_resource,
    "io_manager": adls2_delta_io_manager,
    "adls_csv_loader": adls_csv_loader,
    "database": adls_delta_resource,
}
@op(required_resource_keys={"pyspark_step_launcher", "pyspark"})
def test_op(context):
    context.log.debug("Op started")
On previous Dagster versions (at least up to 0.12.2), the way to execute this op (solid) was to wrap it in a pipeline and run it with execute_pipeline plus reconstructable:
execute_pipeline(reconstructable(test_pipeline), mode="prod_adls2", run_config=run_config)
Nowadays (dagster 0.13.2), we instead define a job directly, or define a graph and convert it with its to_job method.
I tried the following approaches, each of which produced the error shown below it.
@job(resource_defs=prod_adls2)
def test_job():
    test_op()

test_job.execute_in_process(run_config=config)
Error:
dagster.check.ParameterCheckError: Param "recon_pipeline" is not a ReconstructablePipeline. Got <dagster.core.definitions.pipeline_base.InMemoryPipeline object at 0x7f2f43fe10a0> which is type <class 'dagster.core.definitions.pipeline_base.InMemoryPipeline'>
reconstructable(test_job).execute_in_process(run_config=config)
or execute_pipeline(reconstructable(test_job), run_config=config)
Error:
dagster.core.errors.DagsterInvariantViolationError: Reconstructable target was not a function returning a job definition, or a job definition produced by a decorated function. If your job was constructed using GraphDefinition.to_job, you must wrap the to_job call in a function at module scope, ie not within any other functions. To learn more, check out the docs on reconstructable: https://docs.dagster.io/_apidocs/execution#dagster.reconstructable
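For context on that error: reconstructable works by recording an importable location for the target, which is why the target must live at module scope. The standard-library pickle module has the same restriction when serializing functions by reference, which makes for a quick illustration (this is only an analogy, not Dagster's actual implementation):

```python
import pickle


def module_level_job():
    # A module-scope function can be serialized by reference:
    # pickle records its module and qualified name.
    return "job"


def make_job():
    # A function defined inside another function has no importable
    # path, so serializing it by reference fails.
    def inner_job():
        return "job"
    return inner_job


pickle.dumps(module_level_job)  # succeeds

try:
    pickle.dumps(make_job())
except (pickle.PicklingError, AttributeError) as exc:
    print(f"cannot serialize nested function: {exc}")
```

The same intuition explains the error message's advice: wrapping to_job in a module-scope function gives reconstructable something it can locate by import.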
reconstructable(make_test_job).execute_in_process(run_config=config)
Error:
AttributeError: 'ReconstructablePipeline' object has no attribute 'execute_in_process'
execute_pipeline(reconstructable(make_test_job), run_config=config)
Error:
dagster.core.errors.DagsterUnmetExecutorRequirementsError: You have attempted to use an executor that uses multiple processes with an ephemeral DagsterInstance. A non-ephemeral instance is needed to coordinate execution between multiple processes. You can configure your default instance via $DAGSTER_HOME or ensure a valid one is passed when invoking the python APIs. You can learn more about setting up a persistent DagsterInstance from the DagsterInstance docs here: https://docs.dagster.io/deployment/dagster-instance#default-local-behavior
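This last error concerns the execution environment rather than the API: the multiprocess executor needs a persistent DagsterInstance to coordinate between processes. One way to provide it is to set $DAGSTER_HOME to a writable directory before invoking the Python API (the path below is just an example); execute_pipeline also accepts an instance argument, e.g. instance=DagsterInstance.get().

```shell
# Create a persistent instance directory and point Dagster at it.
# The path is an example; any writable directory works.
export DAGSTER_HOME="$HOME/dagster_home"
mkdir -p "$DAGSTER_HOME"
```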
Am I missing the point, or was execute_pipeline "mismigrated"?
Thanks!
Dagit UI/UX
Oddly enough, Dagit does manage to run the job.
Environment
- Python 3.9.5
- dagster 0.13.2
Issue Analytics
- Created 2 years ago
- Comments: 7 (4 by maintainers)
@bernardocortez yes, this has gone out!
@bernardocortez thanks so much for posting the example. After further examination, this case is indeed a bug. Was able to put up this fix, which should go in by next release. Thanks again for surfacing!