S3 list_objects_v2 paginator MaxItems only counts keys (Contents) not prefixes (CommonPrefixes)

Describe the bug

When using boto3 to iterate an S3 bucket with a Delimiter, MaxItems only counts the keys, not the prefixes. So if you have a bucket with only prefixes, MaxItems will never stop searching and may take unbounded time.

Steps to reproduce

Set up a bucket with 20000 keys of the form result1/results.txt … result20000/results.txt

Run this code:

import boto3
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
for result in paginator.paginate(Bucket='mybucket', Delimiter='/', PaginationConfig={'MaxItems': 2000}):
    for prefix in result.get('CommonPrefixes', []):
        print("prefix {}".format(prefix['Prefix']))
    for key in result.get('Contents', []):
        print("key {}".format(key['Key']))

Expected behavior

The above program should return at most 2000 entries. It actually returns all 20,000 prefixes, because MaxItems doesn’t count prefixes.
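
One way to confirm the behavior described above is to count what the paginator actually hands back. The sketch below is only an illustration: `summarize_pages` is a hypothetical helper name, not part of boto3, and the stand-in pages are made-up data shaped like list_objects_v2 responses so the snippet runs without AWS access.

```python
def summarize_pages(pages):
    """Count pages, common prefixes, and keys across an iterable of response pages."""
    page_count = prefix_count = key_count = 0
    for page in pages:
        page_count += 1
        # Either field may be absent from a page, so default to an empty list.
        prefix_count += len(page.get('CommonPrefixes', []))
        key_count += len(page.get('Contents', []))
    return page_count, prefix_count, key_count


# Stand-in pages (hypothetical data, no AWS call is made here).
fake_pages = [
    {'CommonPrefixes': [{'Prefix': 'result1/'}, {'Prefix': 'result2/'}]},
    {'CommonPrefixes': [{'Prefix': 'result3/'}]},
]
print(summarize_pages(fake_pages))  # (2, 3, 0)
```

Against a real bucket, the same helper could be given the iterator from paginator.paginate(Bucket='mybucket', Delimiter='/', PaginationConfig={'MaxItems': 2000}); if the prefix count comes back well above 2000, the limit is not being applied to CommonPrefixes.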

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
bsmedberg-xometry commented, Apr 13, 2020

In this case, there will be nothing in Contents ever, there will only be CommonPrefixes.

And the problem is not the truncation: the problem is that even after getting 2000 CommonPrefixes, it keeps making calls forever.

I have worked around this locally by doing something like this:

import boto3
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
found_keys = 0
found_prefixes = 0
for result in paginator.paginate(Bucket='mybucket', Delimiter='/', PaginationConfig={'MaxItems': 2000}):
    found_prefixes += len(result.get('CommonPrefixes', []))
    found_keys += len(result.get('Contents', []))
    if found_prefixes + found_keys >= 2000:
        break  # limit reached: stop iterating so the paginator makes no further calls

However, I don’t believe that this is or should be the expected behavior of the boto3 paginator. If this is the expected behavior of the paginator, then the paginator docs need to be updated to warn of this behavior.
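
The workaround above counts whole pages, so the last page can still push the total past the limit. A possible refinement, sketched here under the assumption that pages are fetched lazily as they are iterated, is a generator that caps prefixes and keys together and stops requesting pages once the cap is hit. The name `iter_limited` is hypothetical and not a boto3 API; the fake page source exists only so the example runs without AWS access.

```python
def iter_limited(pages, limit):
    """Yield ('prefix', name) or ('key', name) tuples, at most `limit` in total."""
    remaining = limit
    if remaining <= 0:
        return
    for page in pages:
        for prefix in page.get('CommonPrefixes', []):
            yield 'prefix', prefix['Prefix']
            remaining -= 1
            if remaining == 0:
                return  # stop before the next page is requested
        for obj in page.get('Contents', []):
            yield 'key', obj['Key']
            remaining -= 1
            if remaining == 0:
                return


fetched = []  # records which fake pages were actually requested


def fake_pages():
    """Lazily produce five stand-in pages with two prefixes each (made-up data)."""
    for i in range(5):
        fetched.append(i)
        yield {'CommonPrefixes': [
            {'Prefix': 'result{}/'.format(2 * i + 1)},
            {'Prefix': 'result{}/'.format(2 * i + 2)},
        ]}


entries = list(iter_limited(fake_pages(), 3))
print(entries)
print(fetched)
```

With boto3, the first argument would be the iterator returned by paginator.paginate(Bucket='mybucket', Delimiter='/'), with no MaxItems needed, since the generator enforces the limit itself.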

0 reactions
bsmedberg-xometry commented, Jun 10, 2022

I believe that this issue is still valid. Neither of the docs at https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#customizing-page-iterators or https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2 document this behavior, and there is still no way to limit the pagination to both prefixes and keys.


