S3 list operation requested when file not found

Original GitHub issue: #382

What happened: When I try to open a file that does not exist in an AWS S3 bucket, a list operation is requested before a FileNotFoundError exception is raised.

What you expected to happen: I would expect only a FileNotFoundError exception to be raised, with no list operation requested. The listing seems unnecessary, and AWS S3 charges for it.

Minimal Complete Verifiable Example: When running:

import s3fs


storage_options = {
    'key': 'my-key',
    'secret': 'my-secret',
    'use_ssl': False,
    'client_kwargs': {
        'endpoint_url': 'http://my-url:my-port'
    },
}
file_system = s3fs.S3FileSystem(**storage_options)
file_system.open('s3://my-bucket/my_file.txt', 'r')

The output is:

FileNotFoundError: my-bucket/my_file.txt

My HTTP traffic monitor also shows the GET request for the list operation (note the list-type=2 parameter):

GET     /my-bucket?list-type=2&prefix=my_file.txt%2F&delimiter=%2F&max-keys=1&encoding-type=url
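Decoding the captured query string makes the request easier to read. The short sketch below uses only Python's standard library. list-type=2 selects the ListObjectsV2 API, and the decoded parameters ask for at most one key under the prefix my_file.txt/ with a / delimiter. That looks like a check for whether my_file.txt is a "directory", rather than a lookup of the file itself.

```python
from urllib.parse import parse_qs, urlsplit

# Path and query string from the captured GET request above
request_target = (
    "/my-bucket?list-type=2&prefix=my_file.txt%2F"
    "&delimiter=%2F&max-keys=1&encoding-type=url"
)

parts = urlsplit(request_target)
# parse_qs percent-decodes each value and returns a list per parameter
params = {name: values[0] for name, values in parse_qs(parts.query).items()}

print(parts.path)  # /my-bucket
print(params)
# {'list-type': '2', 'prefix': 'my_file.txt/', 'delimiter': '/', 'max-keys': '1', 'encoding-type': 'url'}
```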

Anything else we need to know?: The list operation seems to be requested when this line is executed.
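One way to read the capture is as a two-step lookup. First the key itself is checked. When that fails, a one-key LIST probes for a "directory" with the same name. The sketch below models that assumed order with a hypothetical in-memory bucket and contrasts it with the HEAD-only behavior the reporter expected. FakeBucket, info_head_then_list, info_head_only and trace are illustrative names, not s3fs APIs, and the order is an assumption rather than a copy of the s3fs source.

```python
class FakeBucket:
    """Hypothetical in-memory bucket that records each request it serves."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.requests = []

    def head(self, key):
        # Models a HEAD on a single object key
        self.requests.append(f"HEAD {key}")
        return key in self.keys

    def list_prefix(self, prefix, max_keys=1):
        # Models a ListObjectsV2 call with a prefix and max-keys
        self.requests.append(f"LIST {prefix}")
        return sorted(k for k in self.keys if k.startswith(prefix))[:max_keys]


def info_head_then_list(bucket, key):
    """Assumed order: HEAD the key, then probe for a 'directory' prefix."""
    if bucket.head(key):
        return {"name": key, "type": "file"}
    # This fallback is what would produce the extra LIST in the capture
    if bucket.list_prefix(key.rstrip("/") + "/"):
        return {"name": key, "type": "directory"}
    raise FileNotFoundError(key)


def info_head_only(bucket, key):
    """What the reporter expected: one HEAD and no LIST."""
    if bucket.head(key):
        return {"name": key, "type": "file"}
    raise FileNotFoundError(key)


def trace(lookup, bucket, key):
    """Run a lookup and return (result type or 'missing', requests issued)."""
    bucket.requests.clear()
    try:
        result = lookup(bucket, key)["type"]
    except FileNotFoundError:
        result = "missing"
    return result, list(bucket.requests)


bucket = FakeBucket(["data/part-0.csv"])
print(trace(info_head_then_list, bucket, "my_file.txt"))
# ('missing', ['HEAD my_file.txt', 'LIST my_file.txt/'])
print(trace(info_head_only, bucket, "my_file.txt"))
# ('missing', ['HEAD my_file.txt'])
```

The directory probe only runs when the key itself is missing. In either model, a file that exists costs a single HEAD.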

Environment:

  • Python version: 3.6.9
  • Operating System: Ubuntu 18.04.3 LTS
  • Install method (conda, pip, source): pip install s3fs
  • Output of pip list:
Package           Version
----------------- -------
aiobotocore       1.1.2
aiohttp           3.6.2
aioitertools      0.7.0
async-timeout     3.0.1
attrs             20.2.0
botocore          1.17.44
chardet           3.0.4
docutils          0.15.2
fsspec            0.8.3
idna              2.10
idna-ssl          1.1.0
jmespath          0.10.0
multidict         4.7.6
pip               20.2.3
pkg-resources     0.0.0
python-dateutil   2.8.1
s3fs              0.5.1
setuptools        50.3.0
six               1.15.0
typing-extensions 3.7.4.3
urllib3           1.25.10
wheel             0.35.1
wrapt             1.12.1
yarl              1.6.0

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Oct 30, 2020

The problem with this is that it ignores the file-listing cache, so the lookup happens on every open, even if you have previously listed the prefix. A middle ground would be to check self._ls_from_cache first, but you do end up repeating code.
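The middle ground described in this comment works like this: consult a cached listing of the parent prefix first, and fall back to a single HEAD only when nothing is cached. The code below is a simplified model with hypothetical names (FakeBucket, CachedLookup, and dircache as a plain dict). It does not reproduce the caching in s3fs or fsspec, including self._ls_from_cache, and it ignores cache invalidation.

```python
import posixpath


class FakeBucket:
    """Hypothetical in-memory bucket that records each request it serves."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.requests = []

    def head(self, key):
        self.requests.append(f"HEAD {key}")
        return key in self.keys

    def list_children(self, parent):
        self.requests.append(f"LIST {parent}/")
        # Only keys directly under `parent`; subdirectories are ignored here
        return {k for k in self.keys if posixpath.dirname(k) == parent}


class CachedLookup:
    """Check a cached parent listing first; otherwise issue one HEAD."""

    def __init__(self, bucket):
        self.bucket = bucket
        self.dircache = {}  # parent prefix -> set of keys listed under it

    def ls(self, parent):
        self.dircache[parent] = self.bucket.list_children(parent)
        return sorted(self.dircache[parent])

    def exists(self, key):
        parent = posixpath.dirname(key)
        if parent in self.dircache:
            # Parent already listed: answer locally, no request at all
            return key in self.dircache[parent]
        # Cache miss: a single HEAD, with no LIST fallback
        return self.bucket.head(key)


bucket = FakeBucket(["data/a.csv", "data/b.csv"])
fs = CachedLookup(bucket)
print(fs.exists("data/missing.csv"))  # False (one HEAD)
print(fs.ls("data"))                  # ['data/a.csv', 'data/b.csv'] (one LIST)
print(fs.exists("data/missing.csv"))  # False (from cache, no request)
print(bucket.requests)                # ['HEAD data/missing.csv', 'LIST data/']
```

This shows the trade-off the comment points at. Once a prefix has been listed, repeated existence checks cost no requests. A lookup that always issues a HEAD would ignore that cache and pay one request per open.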

0 reactions
estebanag commented, Oct 30, 2020

Would it be desirable to modify S3FileSystem._open() so that it does something like this before attempting to open a file?

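...
# Assumes module-level imports such as:
#   from botocore.exceptions import ClientError
#   from s3fs.errors import translate_boto_error
bucket, key, _ = self.split_path(path)
try:
    # A single HEAD request on the key; no LIST is issued
    self.s3.head_object(Bucket=bucket, Key=key)
except ClientError as e:
    # Convert the botocore error (e.g. a 404) into a Python exception
    raise translate_boto_error(e) from e
...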

According to this, translate_boto_error(e) will return a FileNotFoundError when e.response['Error'].get('Code') is '404'.
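The translation described here amounts to mapping the error code in a boto-style response dict onto a Python exception type. The minimal sketch below uses a hypothetical table and function name and is not the actual translate_boto_error from s3fs. A HEAD response has no body, so botocore typically reports the HTTP status (for example '404') as the error code. Other calls usually carry a named code such as NoSuchKey.

```python
# Hypothetical mapping; not the table used by s3fs's translate_boto_error
CODE_TO_EXCEPTION = {
    "404": FileNotFoundError,        # HEAD errors carry only the status code
    "NoSuchKey": FileNotFoundError,  # named code returned by e.g. GetObject
    "403": PermissionError,
    "AccessDenied": PermissionError,
}


def exception_from_response(response, path):
    """Build a Python exception from a boto-style error response dict."""
    code = response.get("Error", {}).get("Code")
    # Unknown codes fall back to a generic OSError
    exc_type = CODE_TO_EXCEPTION.get(code, OSError)
    return exc_type(path)


head_404 = {"Error": {"Code": "404", "Message": "Not Found"}}
err = exception_from_response(head_404, "my-bucket/my_file.txt")
print(type(err).__name__, err)  # FileNotFoundError my-bucket/my_file.txt
```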

In the meantime, I’m using file_system.s3.head_object() as a workaround.


