botocore.exceptions.CredentialRetrievalError raised when opening many files in parallel from S3

See original GitHub issue

Problem description

Be sure your description clearly answers the following questions:

  • What are you trying to achieve? I’m trying to read several files in parallel using multiprocessing from S3. I’m using a single c5.24xlarge or m5a.24xlarge EC2 instance which is running a single container. Note that each process is reading a different file.

  • What is the expected result? The opens should be successful. smart_open should be smarter at supporting parallelism.

  • What are you seeing instead? The following exception is raised:

botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from container-role: Error retrieving metadata: Received non 200 response (429) from ECS metadata: You have reached maximum request limit.

Note that setting the environment variable ECS_TASK_METADATA_RPS_LIMIT="8000,9000" didn’t help at all.

What helped was catching the exception, sleeping for a random interval with time.sleep(random.random()), and then retrying, but there has got to be a cleaner way.
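For reference, the workaround described above could look roughly like the sketch below. The helper name retry_with_random_sleep and its parameters are hypothetical, not part of smart_open or botocore; the commented usage assumes both libraries are installed and that input_s3_object_uri is defined as in the reproduction step.

```python
import random
import time


def retry_with_random_sleep(fn, retry_on, max_attempts=5, sleep=time.sleep):
    """Call fn(); on a retryable exception, pause for a random time and retry.

    Hypothetical helper: `retry_on` is an exception class (or tuple of
    classes) and `sleep` is injectable so the pause can be observed in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # A random pause in [0, 1) seconds keeps the processes from
            # all retrying at the same instant.
            sleep(random.random())


# Hypothetical usage, assuming smart_open and botocore are installed:
#
#   from botocore.exceptions import CredentialRetrievalError
#   import smart_open
#
#   def read_object(uri):
#       with smart_open.open(uri, "rb") as input_file:
#           return input_file.read()
#
#   data = retry_with_random_sleep(
#       lambda: read_object(input_s3_object_uri),
#       retry_on=CredentialRetrievalError,
#   )
```

The attempt limit of 5 is an arbitrary choice for illustration, not a value taken from the issue.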

Steps/code to reproduce the problem

This line of code, when run at about the same time from 168 simultaneous processes in a Docker container, raises the aforementioned exception:

with smart_open.open(input_s3_object_uri, "rb") as input_file:

Versions

Please provide the output of:

import platform, sys, smart_open
print(platform.platform())
print("Python", sys.version)
print("smart_open", smart_open.__version__)
>>> import platform, sys, smart_open
>>> print(platform.platform())
Linux-4.19.76-linuxkit-x86_64-with-glibc2.10
>>> print("Python", sys.version)
Python 3.8.3 (default, May 19 2020, 18:47:26) 
[GCC 7.3.0]
>>> print("smart_open", smart_open.__version__)
smart_open 2.0.0

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
impredicative commented, Jun 16, 2020

I’m not sure if there’s anything smart_open should do about that.

Is it not supposed to be smart? Or is it just semismart?

For example, there’s exponential backoff.

A pure exponential backoff doesn’t help with a uniform load distribution. time.sleep(random.random() * exp_backoff_multiplier) does, although this doesn’t define an appropriate choice of the backoff parameters.

I think it’d be difficult for smart_open to pick an approach that works for everyone

The strategy suggested above should work for 90% of users. On AWS the number of cores is limited to 128. It is doubtful that anyone would run more than 512 worker processes on such a node. Most users who use a naive retry would be better off with the above strategy.
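A minimal sketch of that strategy, assuming the multiplier doubles on each attempt and is capped. The function names and the values of base, factor, and cap are illustrative assumptions; the thread does not specify any backoff parameters.

```python
import random
import time


def jittered_delay(attempt, base=0.5, factor=2.0, cap=30.0, rng=random.random):
    """Return a random delay in [0, exp_backoff_multiplier).

    exp_backoff_multiplier grows exponentially with the zero-based attempt
    number and is capped at `cap` seconds. All defaults are assumptions.
    """
    exp_backoff_multiplier = min(cap, base * factor ** attempt)
    return rng() * exp_backoff_multiplier


def call_with_backoff(fn, retry_on, max_attempts=6, sleep=time.sleep):
    """Call fn(), retrying on `retry_on` with a jittered exponential delay."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # last attempt failed: re-raise the original error
            sleep(jittered_delay(attempt))
```

Scaling a uniform random number by the growing multiplier spreads the retries of many workers over a widening window, which is what a pure exponential backoff alone fails to do when every process starts at the same time.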

0 reactions
mpenkov commented, Jun 16, 2020

Is it not supposed to be smart? Or is it just semismart?

Having “smart” in the name doesn’t mean “it does everything for you”. For example, equipping yourself with a smartphone, smartwatch, etc. doesn’t instantly make you a genius (quite often the opposite). You still need to apply some of your own effort.

time.sleep(random.random() * exp_backoff_multiplier) does, although this doesn’t define an appropriate choice of the backoff parameters.

Yeah, the devil is always in the details. What parameters to pass? How to pass them? How to enable/disable this functionality?

I think the best way forward in your use case is to handle the above questions in your application logic.
