Unable to template target / location when mapped tasks are involved and deploying on a dask cluster


Description

I am using a static Dask cluster deployment (following this deployment recipe).

I am on Prefect v0.11.4 with a Kubernetes (k8s) agent, and I am trying to template the location or target for a mapped task.

Please see two example code snippets below:

Using target templating

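import os

from prefect import Flow, task
from prefect.engine.results import S3Result
from prefect.environments import RemoteDaskEnvironment
from prefect.environments.storage import Docker

# Target template applied to the mapped tasks below.
template = 'prefect-testing/{task_name}/{filename}_{map_index}.prefect'

s3_result = S3Result(
    bucket=os.environ["AWS_BUCKET"],
)

@task()
def gen_list():
    return [x for x in range(10)]


@task(
    target=template
)
def add(x, y):
    return x + y


@task(
    target=template
)
def multiply(x, y):
    return x * y


# flow_name, registry_url, image_name and image_tag are defined elsewhere
# (not shown in the report).
with Flow(
    flow_name,
    environment=RemoteDaskEnvironment(address="tcp://dask-scheduler:8786"),
    storage=Docker(
        registry_url=registry_url,
        image_name=image_name,
        image_tag=image_tag,
        python_dependencies=[
            'boto3==1.13.14',
        ]
    ),
    result=s3_result
) as flow:
    x = gen_list()
    y = gen_list()
    added = add.map(x, y)
    multiply.map(added, added)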

I get the following error:

Unexpected error while reading from S3: KeyError('filename')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/prefect/engine/results/s3_result.py", line 166, in exists
    self.client.get_object(Bucket=self.bucket, Key=location.format(**kwargs))
KeyError: 'filename'
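
The traceback ends in location.format(**kwargs), so the KeyError is what plain Python string formatting raises when a placeholder has no matching keyword. A minimal sketch of that behavior outside Prefect follows; the format_kwargs dictionary is a hypothetical stand-in for whatever values are supplied at format time, under the assumption that filename is not one of them:

```python
# Template copied from the report above.
template = "prefect-testing/{task_name}/{filename}_{map_index}.prefect"

# Hypothetical stand-in for the values supplied at format time.
# Assumption: "filename" is not among them.
format_kwargs = {"task_name": "add", "map_index": 0}

try:
    location = template.format(**format_kwargs)
    error = None
except KeyError as exc:
    # str.format raises KeyError for the first placeholder it cannot resolve.
    location = None
    error = exc

print(repr(error))
```

If that assumption holds, any placeholder used in a target template has to correspond to a value that is actually available when the template is formatted.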

Using Result.location templating

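import os

from prefect import Flow, task
from prefect.engine.results import S3Result
from prefect.environments import RemoteDaskEnvironment
from prefect.environments.storage import Docker

# Location template set on the flow-level result.
template = 'prefect-testing/{task_name}/{filename}_{map_index}.prefect'

s3_result = S3Result(
    bucket=os.environ["AWS_BUCKET"],
    location=template
)

@task
def gen_list():
    return [x for x in range(10)]


@task
def add(x, y):
    return x + y


@task
def multiply(x, y):
    return x * y


# flow_name, registry_url, image_name and image_tag are defined elsewhere
# (not shown in the report).
with Flow(
    flow_name,
    environment=RemoteDaskEnvironment(address="tcp://dask-scheduler:8786"),
    storage=Docker(
        registry_url=registry_url,
        image_name=image_name,
        image_tag=image_tag,
        python_dependencies=[
            'boto3==1.13.14',
        ]
    ),
    result=s3_result
) as flow:
    x = gen_list()
    y = gen_list()
    added = add.map(x, y)
    multiply.map(added, added)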

I get the following error after multiply is mapped, i.e. the flow runs fine until it reaches multiply[0]:

3 June 2020,04:40:57 	prefect.S3Result	DEBUG	Starting to download result from prefect-testing/{task_name}/{filename}_{map_index}.prefect...
3 June 2020,04:40:57 	prefect.S3Result	ERROR	Unexpected error while reading from result handler: ClientError('An error occurred (404) when calling the HeadObject operation: Not Found')

It fails to format the location before reading it: the log says "Starting to download result from prefect-testing/{task_name}/{filename}_{map_index}.prefect". For some reason, the location formatting is not invoked for the second mapped task.
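
As a debugging aid, a location string can be checked for leftover placeholders before it is used as an S3 key. The sketch below uses only the standard library; unformatted_fields is a hypothetical helper, not part of Prefect, and the filename value "out" is made up for illustration:

```python
from string import Formatter


def unformatted_fields(location):
    """Return the placeholder names still present in a location string."""
    return [name for _, name, _, _ in Formatter().parse(location) if name]


# Location exactly as it appears in the DEBUG log line above.
raw = "prefect-testing/{task_name}/{filename}_{map_index}.prefect"

# The same template after formatting with illustrative values.
rendered = raw.format(task_name="multiply", filename="out", map_index=0)

print(unformatted_fields(raw))
print(unformatted_fields(rendered))
```

A non-empty list for the logged location is consistent with the 404: the literal template text, braces included, was used as the object key.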

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:13 (13 by maintainers)

Top GitHub Comments

marwan116 commented, Jun 4, 2020

Hi @cicdw - just tested it out - Result.location templating is working fine now! Closing this issue as both templating approaches are working. Thanks!

marwan116 commented, Jun 3, 2020

"For the second issue, can you confirm that your dask workers have the appropriate configuration so they can authenticate with S3?"

Yes. I don’t think authentication is the issue here, because the results of all tasks upstream of multiply (i.e. gen_list and add) are being saved to S3 just fine.
