Unable to template target / location when mapped tasks are involved and deploying on a dask cluster
Description
I am using a static Dask cluster deployment (following this deployment recipe).
I am on Prefect v0.11.4 with a Kubernetes agent, and I am trying to template the location or target for a mapped task.
Please see the two example code snippets below:
Using target templating
import os

from prefect import Flow, task
from prefect.engine.results import S3Result
from prefect.environments import RemoteDaskEnvironment
from prefect.environments.storage import Docker

# flow_name, registry_url, image_name and image_tag are defined elsewhere.
template = 'prefect-testing/{task_name}/{filename}_{map_index}.prefect'

s3_result = S3Result(
    bucket=os.environ["AWS_BUCKET"],
)

@task()
def gen_list():
    return [x for x in range(10)]

# The template is passed as the task's target.
@task(
    target=template
)
def add(x, y):
    return x + y

@task(
    target=template
)
def multiply(x, y):
    return x * y

with Flow(
    flow_name,
    environment=RemoteDaskEnvironment(address="tcp://dask-scheduler:8786"),
    storage=Docker(
        registry_url=registry_url,
        image_name=image_name,
        image_tag=image_tag,
        python_dependencies=[
            'boto3==1.13.14',
        ]
    ),
    result=s3_result
) as flow:
    x = gen_list()
    y = gen_list()
    added = add.map(x, y)
    multiply.map(added, added)
I get the following error:
Unexpected error while reading from S3: KeyError('filename')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/prefect/engine/results/s3_result.py", line 166, in exists
    self.client.get_object(Bucket=self.bucket, Key=location.format(**kwargs))
KeyError: 'filename'
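
For reference, the KeyError in this traceback is what Python's built-in str.format raises when a template names a field that is missing from the keyword arguments. The sketch below is plain Python, independent of Prefect. The context dictionary is hypothetical and only assumes, for illustration, that filename is not among the values supplied at formatting time.

```python
template = "prefect-testing/{task_name}/{filename}_{map_index}.prefect"

# Hypothetical formatting context: task_name and map_index are present,
# but filename is not (an assumption made for illustration only).
context = {"task_name": "add", "map_index": 0}

try:
    template.format(**context)
    missing = None
except KeyError as exc:
    # str.format reports the first missing field name as the exception argument.
    missing = exc.args[0]

print(missing)

# Once every field in the template has a value, formatting succeeds.
rendered = template.format(filename="result", **context)
print(rendered)
```

If this is the cause, every placeholder in the template needs a matching value at the point where the location is formatted.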
Using Result.location templating
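import os

from prefect import Flow, task
from prefect.engine.results import S3Result
from prefect.environments import RemoteDaskEnvironment
from prefect.environments.storage import Docker

# flow_name, registry_url, image_name and image_tag are defined elsewhere.
template = 'prefect-testing/{task_name}/{filename}_{map_index}.prefect'

# The template is passed as the Result's location instead of as a task target.
s3_result = S3Result(
    bucket=os.environ["AWS_BUCKET"],
    location=template
)

@task
def gen_list():
    return [x for x in range(10)]

@task
def add(x, y):
    return x + y

@task
def multiply(x, y):
    return x * y

with Flow(
    flow_name,
    environment=RemoteDaskEnvironment(address="tcp://dask-scheduler:8786"),
    storage=Docker(
        registry_url=registry_url,
        image_name=image_name,
        image_tag=image_tag,
        python_dependencies=[
            'boto3==1.13.14',
        ]
    ),
    result=s3_result
) as flow:
    x = gen_list()
    y = gen_list()
    added = add.map(x, y)
    multiply.map(added, added)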
I get the following error after multiply is mapped, i.e. the flow runs fine until it reaches multiply[0]:
3 June 2020,04:40:57 prefect.S3Result DEBUG Starting to download result from prefect-testing/{task_name}/{filename}_{map_index}.prefect...
3 June 2020,04:40:57 prefect.S3Result ERROR Unexpected error while reading from result handler: ClientError('An error occurred (404) when calling the HeadObject operation: Not Found')
It fails to format the location before reading it: the log says "Starting to download result from prefect-testing/{task_name}/{filename}_{map_index}.prefect", with the placeholders still in place.
For some reason the location formatting is not invoked for the second mapped task.
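
One way to confirm that a location reached the read step unformatted is to check it for leftover placeholders. The sketch below uses the standard library's string.Formatter for that. The helper name unformatted_fields and the second example location are hypothetical, and none of this is part of Prefect's API.

```python
from string import Formatter


def unformatted_fields(location: str) -> list:
    """Return the names of any {placeholders} still present in a location string."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text with no placeholder.
    return [field for _, field, _, _ in Formatter().parse(location) if field is not None]


# The location exactly as it appears in the DEBUG log line above.
logged = "prefect-testing/{task_name}/{filename}_{map_index}.prefect"
leftover = unformatted_fields(logged)
print(leftover)

# A hypothetical, fully rendered location has no placeholders left.
clean = unformatted_fields("prefect-testing/multiply/data_0.prefect")
print(clean)
```

A non-empty result for the logged location would support the reading that the template was passed to S3 as a literal key, which is consistent with the 404 from HeadObject.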
Issue Analytics
- State:
- Created 3 years ago
- Comments:13 (13 by maintainers)
Hi @cicdw - just tested it out - Result.location templating is working fine now! Closing this issue as both templating approaches are working. Thanks!
Yes - I don't think authentication is the issue here, because the results of all the tasks before multiply, i.e. gen_list and add, are being saved to S3 just fine.