
Inconsistent configuration of S3Result vs S3 Storage with custom S3 backend

See original GitHub issue

Description

The following S3 storage configuration works as expected when connecting to Minio:

import os
from prefect.environments.storage import S3

storage = S3(
    bucket="flows",
    aws_access_key_id=os.getenv("MY_MINIO_ID"),
    aws_secret_access_key=os.getenv("MY_MINIO_KEY"),
    client_options=dict(endpoint_url=os.getenv("MINIO_ENDPOINT")),
)

But when I configure S3Result the same way:

result = S3Result(
    bucket="results",
    boto3_kwargs=dict(
        aws_access_key_id=os.getenv("MY_MINIO_ID"),
        aws_secret_access_key=os.getenv("MY_MINIO_KEY"),
        client_options=dict(endpoint_url=os.getenv("MINIO_ENDPOINT")),
    ),
)

I receive the following error in the UI:

prefect-agent_1      | [2020-06-09 07:59:36] DEBUG - prefect.S3Result | Starting to upload result to 2020/6/9/5a96e8ef-70d1-4d47-a495-15b0afa99169.prefect_result...
prefect-agent_1      | [2020-06-09 07:59:36] ERROR - prefect.CloudTaskRunner | Unexpected error: TypeError("client() got multiple values for keyword argument 'aws_access_key_id'")
prefect-agent_1      | Traceback (most recent call last):
prefect-agent_1      |   File "/usr/local/lib/python3.7/site-packages/prefect/engine/runner.py", line 48, in inner
prefect-agent_1      |     new_state = method(self, state, *args, **kwargs)
prefect-agent_1      |   File "/usr/local/lib/python3.7/site-packages/prefect/engine/task_runner.py", line 986, in get_task_run_state
prefect-agent_1      |     result = self.result.write(value, filename="output", **prefect.context)
prefect-agent_1      |   File "/usr/local/lib/python3.7/site-packages/prefect/engine/results/s3_result.py", line 103, in write
prefect-agent_1      |     self.client.upload_fileobj(stream, Bucket=self.bucket, Key=new.location)
prefect-agent_1      |   File "/usr/local/lib/python3.7/site-packages/prefect/engine/results/s3_result.py", line 60, in client
prefect-agent_1      |     self.initialize_client()
prefect-agent_1      |   File "/usr/local/lib/python3.7/site-packages/prefect/engine/results/s3_result.py", line 49, in initialize_client
prefect-agent_1      |     "s3", credentials=None, use_session=True, **self.boto3_kwargs
prefect-agent_1      |   File "/usr/local/lib/python3.7/site-packages/prefect/utilities/aws.py", line 49, in get_boto_client
prefect-agent_1      |     **kwargs
prefect-agent_1      | TypeError: client() got multiple values for keyword argument 'aws_access_key_id'
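
Judging from the traceback, prefect.utilities.aws.get_boto_client already passes aws_access_key_id to the boto3 client call explicitly (resolved from its credentials argument or from Prefect context) and then also forwards everything in boto3_kwargs, so the same keyword arrives twice. A minimal sketch of the collision, using simplified stand-ins rather than Prefect's actual code:

def client(resource, aws_access_key_id=None, **kwargs):
    # Stand-in for boto3.client; only the signature matters here.
    return resource, aws_access_key_id, kwargs

def get_boto_client(resource, credentials=None, **kwargs):
    # Stand-in for prefect.utilities.aws.get_boto_client: it resolves
    # credentials itself and passes them explicitly...
    key_id = credentials["ACCESS_KEY"] if credentials else None
    # ...while also forwarding the caller's kwargs verbatim.
    return client(resource, aws_access_key_id=key_id, **kwargs)

# boto3_kwargs contained aws_access_key_id, so it lands in **kwargs and
# collides with the explicit keyword above:
get_boto_client("s3", aws_access_key_id="minio-id")
# TypeError: client() got multiple values for keyword argument 'aws_access_key_id'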

Expected Behavior

I expect to be able to configure S3Result by passing boto3 arguments directly (just as S3 storage allows) and to have consistent behavior across these two interfaces.
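
In the meantime, a possible workaround (my assumption, not something confirmed in this issue) is to let boto3 resolve the credentials from its standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables and to pass only the endpoint through boto3_kwargs, so no credential keyword is forwarded twice. Note that endpoint_url is a real boto3 client() argument, while the nested client_options dict above mirrors the S3 storage API and would likely be rejected by boto3 even without the collision:

import os
from prefect.engine.results.s3_result import S3Result

# Assumption: the Minio credentials are exported under boto3's standard
# names, so boto3's default credential chain finds them and Prefect never
# has to pass aws_access_key_id explicitly.
os.environ["AWS_ACCESS_KEY_ID"] = os.environ["MY_MINIO_ID"]
os.environ["AWS_SECRET_ACCESS_KEY"] = os.environ["MY_MINIO_KEY"]

result = S3Result(
    bucket="results",
    # endpoint_url goes straight through to boto3's client() call and does
    # not collide with the credential keywords.
    boto3_kwargs=dict(endpoint_url=os.getenv("MINIO_ENDPOINT")),
)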

Reproduction

Sample flow I’m using:

import os
from prefect import task, Flow
from prefect.engine.results.s3_result import S3Result
from prefect.environments.storage import Docker

@task
def add(x, y=1):
    """
    The only task we use so far here ;-)
    """
    return x + y

def create_flow():
    """
    Create the flow
    """
    result = S3Result(
        bucket="results",
        boto3_kwargs=dict(
            aws_access_key_id=os.getenv("MY_MINIO_ID"),
            aws_secret_access_key=os.getenv("MY_MINIO_KEY"),
            client_options=dict(endpoint_url=os.getenv("MINIO_ENDPOINT")),
        ),
    )

    with Flow("Sample Flow", result=result) as flow:
        first_result = add(1, y=2)
        second_result = add(x=first_result, y=100)
    
    storage = Docker()
    storage.add_flow(flow)
    flow.storage = storage

    return flow
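
For a quick local check without the Docker agent (again my own addition, not part of the original report), checkpointing can be forced on so that flow.run() actually calls S3Result.write. Prefect reads its config at import time, so the variable has to be set before prefect is imported, or exported in the shell instead:

import os

# Must happen before `import prefect` anywhere in the process; results
# are only written when checkpointing is enabled.
os.environ["PREFECT__FLOWS__CHECKPOINTING"] = "true"

from sample_flow import create_flow  # hypothetical module holding the flow above

if __name__ == "__main__":
    create_flow().run()  # should fail in S3Result.write with the same TypeError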

Environment

Any additional information about your environment

  • OSX
  • Docker Compose
  • Docker Agent

Optionally run prefect diagnostics from the command line and paste the information here

root@9675cb8de5d4:/opt/packages# prefect diagnostics
{
  "config_overrides": {},
  "env_vars": [
    "PREFECT__LOGGING__LEVEL",
    "PREFECT__SERVER__HOST",
    "PREFECT__BACKEND"
  ],
  "system_information": {
    "platform": "Linux-4.19.76-linuxkit-x86_64-with-debian-10.4",
    "prefect_version": "0.11.5",
    "python_version": "3.7.7"
  }
}

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 18

Top GitHub Comments

1 reaction
joshmeek commented, Jun 9, 2020

@oleksandr Yeah that could be it. Removing that and also adding -U to the pip install should install it from the branch.

0 reactions
joshmeek commented, Jun 16, 2020

Thanks for the follow up! Will keep this issue in mind when doing #2714

