S3 list_objects_v2 paginator MaxItems only counts keys (Contents) not prefixes (CommonPrefixes)

Describe the bug

When using boto3 to iterate an S3 bucket with a Delimiter, MaxItems only counts the keys, not the prefixes. So if you have a bucket with only prefixes, the paginator never reaches MaxItems and keeps requesting pages until the whole listing is exhausted, which can take unbounded time.

Steps to reproduce

Set up a bucket with 20000 keys of the form result1/results.txt … result20000/results.txt
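
One way to populate such a bucket is sketched below. The helper names `repro_keys` and `populate_bucket` are made up for this illustration and are not part of the original report; `put_object` is the standard boto3 client call, and the bucket name is whatever test bucket you pass in.

```python
def repro_keys(count=20000):
    """Yield the keys result1/results.txt ... result<count>/results.txt."""
    for i in range(1, count + 1):
        yield "result{}/results.txt".format(i)


def populate_bucket(bucket, count=20000):
    """Upload one empty object per key (hypothetical helper; issues `count` PUT requests)."""
    import boto3  # imported lazily so repro_keys can be used without boto3 installed

    s3 = boto3.client("s3")
    for key in repro_keys(count):
        s3.put_object(Bucket=bucket, Key=key, Body=b"")
```

Calling `populate_bucket('mybucket')` would then create the 20000 objects used in the steps below.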

Run this code:

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
# Ask the paginator to stop after 2000 items in total.
for result in paginator.paginate(Bucket='mybucket', Delimiter='/', PaginationConfig={'MaxItems': 2000}):
    # With Delimiter='/', every resultN/ "folder" is reported as a common prefix.
    for prefix in result.get('CommonPrefixes', []):
        print("prefix {}".format(prefix['Prefix']))
    for key in result.get('Contents', []):
        print("key {}".format(key['Key']))

Expected behavior

The above program should print at most 2000 entries. It actually prints all 20,000 prefixes, because MaxItems doesn’t count prefixes.

Issue Analytics

State:
Created 3 years ago
Reactions: 1
Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction

bsmedberg-xometry commented, Apr 13, 2020

In this case, there will be nothing in Contents ever, there will only be CommonPrefixes.

And the problem is not the truncation: the problem is that even after getting 2000 CommonPrefixes, it keeps making calls forever.
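
The effect described above can be reproduced without touching S3. The snippet below is a toy model, not botocore's implementation: `toy_paginate` and `prefix_only_pages` are hypothetical names, and the model assumes only that the limit is checked against the number of `Contents` entries seen so far.

```python
def toy_paginate(pages, max_items):
    """Yield pages until `max_items` Contents entries have been seen.

    Toy model only: this is NOT botocore's code, and unlike the real
    paginator it does not truncate the last page.
    """
    counted = 0
    for page in pages:
        yield page
        counted += len(page.get("Contents", []))  # CommonPrefixes are ignored
        if counted >= max_items:
            return


def prefix_only_pages(total=20000, page_size=1000):
    """Simulate responses for a bucket that lists only prefixes, no keys."""
    for start in range(1, total + 1, page_size):
        stop = min(start + page_size, total + 1)
        prefixes = [{"Prefix": "result{}/".format(i)} for i in range(start, stop)]
        yield {"CommonPrefixes": prefixes}


seen = sum(len(p["CommonPrefixes"]) for p in toy_paginate(prefix_only_pages(), 2000))
print(seen)  # 20000: the limit of 2000 is never reached
```

Because the counter only ever sees `Contents`, it stays at zero for this bucket and every page is fetched.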

I have worked around this locally by doing something like this:

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
found_keys = 0
found_prefixes = 0
for result in paginator.paginate(Bucket='mybucket', Delimiter='/', PaginationConfig={'MaxItems': 2000}):
    # Count both kinds of entries, since MaxItems only counts Contents.
    found_prefixes += len(result.get('CommonPrefixes', []))
    found_keys += len(result.get('Contents', []))
    # Use >= so that no extra page is requested once the limit is reached.
    if found_prefixes + found_keys >= 2000:
        break  # stop iterating here to prevent eternal iteration

However, I don’t believe that this is or should be the expected behavior of the boto3 paginator. If this is the expected behavior of the paginator, then the paginator docs need to be updated to warn of this behavior.
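
A variation on this workaround, sketched here under the assumption that an exact cut-off is wanted rather than a whole-page one, flattens the pages into a single stream of entries and truncates it with `itertools.islice`. The names `iter_entries`, `limited_entries` and `list_bucket` are hypothetical helpers, not boto3 APIs.

```python
from itertools import islice


def iter_entries(pages):
    """Yield ("prefix", name) and ("key", name) tuples from list_objects_v2 pages."""
    for page in pages:
        for prefix in page.get("CommonPrefixes", []):
            yield ("prefix", prefix["Prefix"])
        for obj in page.get("Contents", []):
            yield ("key", obj["Key"])


def limited_entries(pages, limit):
    """Stop after `limit` entries in total, prefixes and keys combined."""
    return islice(iter_entries(pages), limit)


def list_bucket(bucket, limit, delimiter="/"):
    """Hypothetical convenience wrapper around the real boto3 paginator."""
    import boto3  # lazy import: the helpers above work on plain dicts

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Delimiter=delimiter)
    return list(limited_entries(pages, limit))
```

Since the pages are consumed lazily, no further page is requested once `limit` entries have been yielded. `list_bucket('mybucket', 2000)` would return at most 2000 entries.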

0 reactions

bsmedberg-xometry commented, Jun 10, 2022

I believe that this issue is still valid. Neither of the docs at https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#customizing-page-iterators or https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2 document this behavior, and there is still no way to limit the pagination to both prefixes and keys.
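
Until the paginator offers such a limit, one alternative is to skip it and drive `list_objects_v2` directly with continuation tokens. The sketch below assumes that `MaxKeys` bounds keys and common prefixes together in each response, as the ListObjectsV2 documentation describes; `list_limited` is a hypothetical helper, and the final slice is a defensive trim in case that assumption does not hold.

```python
def list_limited(client, bucket, limit, delimiter="/"):
    """Return up to `limit` entries (prefixes + keys) using manual continuation tokens."""
    entries = []
    kwargs = {"Bucket": bucket, "Delimiter": delimiter}
    while len(entries) < limit:
        # S3 returns at most 1000 entries per response.
        kwargs["MaxKeys"] = min(1000, limit - len(entries))
        response = client.list_objects_v2(**kwargs)
        entries.extend(p["Prefix"] for p in response.get("CommonPrefixes", []))
        entries.extend(o["Key"] for o in response.get("Contents", []))
        if not response.get("IsTruncated"):
            break  # listing exhausted before the limit was reached
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
    return entries[:limit]  # defensive trim, see the assumption above
```

With a real client this would be called as `list_limited(boto3.client('s3'), 'mybucket', 2000)`.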
