S3 list_objects_v2 paginator MaxItems only counts keys (Contents) not prefixes (CommonPrefixes)
Describe the bug
When iterating an S3 bucket with boto3 using a Delimiter, MaxItems only counts keys (Contents), not prefixes (CommonPrefixes). If a bucket contains only prefixes, MaxItems never takes effect, so the paginator keeps making API calls and may run for an unbounded time.
Steps to reproduce
Set up a bucket with 20,000 keys of the form result1/results.txt … result20000/results.txt
Run this code:
import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
for result in paginator.paginate(Bucket='mybucket', Delimiter='/',
                                 PaginationConfig={'MaxItems': 2000}):
    for prefix in result.get('CommonPrefixes', []):
        print("prefix {}".format(prefix['Prefix']))
    for key in result.get('Contents', []):
        print("key {}".format(key['Key']))
Expected behavior
The program above should list at most 2,000 items. It actually lists all 20,000 prefixes, because MaxItems doesn't count CommonPrefixes.
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 8 (4 by maintainers)
Top GitHub Comments
In this case, there will be nothing in Contents ever, there will only be CommonPrefixes.
And the problem is not truncation: even after collecting 2,000 CommonPrefixes, the paginator keeps making API calls forever.
I have worked around this locally by doing something like this:
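The commenter's original snippet is not preserved here. As a hedged illustration only, a workaround along these lines enforces the limit outside the paginator by counting both prefixes and keys (the helper name `take_limited` is made up for this sketch; since `paginate()` yields pages lazily, returning early also stops further API calls):

```python
def take_limited(pages, max_items):
    """Collect up to max_items entries from list_objects_v2 pages,
    counting both prefixes (CommonPrefixes) and keys (Contents)."""
    items = []
    for page in pages:
        for cp in page.get('CommonPrefixes', []):
            if len(items) >= max_items:
                return items
            items.append(cp['Prefix'])
        for obj in page.get('Contents', []):
            if len(items) >= max_items:
                return items
            items.append(obj['Key'])
    return items
```

It would be called as `take_limited(paginator.paginate(Bucket='mybucket', Delimiter='/'), 2000)`, with no MaxItems in the PaginationConfig, so the limit applies to prefixes and keys alike.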
However, I don’t believe this is, or should be, the expected behavior of the boto3 paginator. If it is the expected behavior, then the paginator docs need to be updated to warn about it.
I believe that this issue is still valid. Neither of the docs at https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#customizing-page-iterators or https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2 document this behavior, and there is still no way to limit the pagination to both prefixes and keys.