Improper paging when invoking EC2:DescribeSnapshots

Describe the bug

When invoking EC2:DescribeSnapshots, clients can specify the MaxResults request parameter to limit the number of results returned per call.

botocore supports a paginator for the EC2:DescribeSnapshots API:

https://botocore.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Paginator.DescribeSnapshots

To make botocore actually paginate, however, one must pass the PaginationConfig parameter with a PageSize, which botocore sends as MaxResults. If it is omitted, MaxResults is not sent and the EC2:DescribeSnapshots API returns all results in a single response.
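
As a minimal sketch of the paginator call described above (not Cloud Custodian's actual code): the helper name `iter_snapshots` and the default page size of 100 are assumptions for illustration, while `get_paginator`, `paginate`, and the `PageSize` key of `PaginationConfig` are the documented boto3/botocore paginator interface. The client is passed in, so the sketch itself needs no credentials.

```python
from typing import Any, Dict, Iterator


def iter_snapshots(ec2_client: Any, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield the caller's snapshots, fetching them one page at a time.

    `ec2_client` is expected to be a boto3 EC2 client, for example
    boto3.client("ec2", region_name="eu-central-1").
    The function name and default page size are illustrative assumptions.
    """
    paginator = ec2_client.get_paginator("describe_snapshots")
    pages = paginator.paginate(
        OwnerIds=["self"],
        # PageSize is forwarded to the API as MaxResults on every request.
        PaginationConfig={"PageSize": page_size},
    )
    for page in pages:
        # Each page is one DescribeSnapshots response.
        yield from page.get("Snapshots", [])
```

With a real client, `for snap in iter_snapshots(boto3.client("ec2")): ...` should issue one request per page rather than a single unbounded request.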

In accounts / regions with a substantial number of snapshots, improperly configuring paging for the EC2:DescribeSnapshots API causes Cloud Custodian to fail with read timeouts because it tries to describe potentially hundreds of thousands of snapshots all at once:

2021-08-27 19:24:09,860: custodian.output:ERROR Error while executing policy
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/botocore/httpsession.py", line 314, in send
    urllib_response = conn.urlopen(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 507, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: AWSHTTPSConnectionPool(host='ec2.eu-central-1.amazonaws.com', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/src/c7n/policy.py", line 285, in run
    resources = self.policy.resource_manager.resources()
  File "/src/c7n/resources/ebs.py", line 71, in resources
    return super(Snapshot, self).resources(query=query)
  File "/src/c7n/query.py", line 514, in resources
    resources = self.source.resources(query)
  File "/src/c7n/query.py", line 223, in resources
    return self.query.filter(self.manager, **query)
  File "/src/c7n/query.py", line 75, in filter
    return self._invoke_client_enum(
  File "/src/c7n/query.py", line 56, in _invoke_client_enum
    data = results.build_full_result()
  File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 449, in build_full_result
    for response in self:
  File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 255, in __iter__
    response = self._make_request(current_kwargs)
  File "/src/c7n/query.py", line 744, in _make_request
    return self.retry(self._method, **current_kwargs)
  File "/src/c7n/utils.py", line 444, in _retry
    return func(*args, **kw)
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 662, in _make_api_call
    http, parsed_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 682, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 136, in _send_request
    while self._needs_retry(attempts, operation_model, request_dict,
  File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 253, in _needs_retry
    responses = self._event_emitter.emit(
  File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 250, in __call__
    should_retry = self._should_retry(attempt_number, response,
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 277, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 316, in __call__
    checker_response = checker(attempt_number, response,
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 222, in __call__
    return self._check_caught_exception(
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 269, in _send
    return self.http_session.send(request)
  File "/usr/local/lib/python3.8/site-packages/botocore/httpsession.py", line 349, in send
    raise ReadTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://ec2.eu-central-1.amazonaws.com/"
2021-08-27 19:24:09,864: custodian.commands:ERROR Error while executing policy garbage-collect-snapshots, continuing
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/botocore/httpsession.py", line 314, in send
    urllib_response = conn.urlopen(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 507, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: AWSHTTPSConnectionPool(host='ec2.eu-central-1.amazonaws.com', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/src/c7n/commands.py", line 271, in run
    policy()
  File "/src/c7n/policy.py", line 1178, in __call__
    resources = PullMode(self).run()
  File "/src/c7n/policy.py", line 285, in run
    resources = self.policy.resource_manager.resources()
  File "/src/c7n/resources/ebs.py", line 71, in resources
    return super(Snapshot, self).resources(query=query)
  File "/src/c7n/query.py", line 514, in resources
    resources = self.source.resources(query)
  File "/src/c7n/query.py", line 223, in resources
    return self.query.filter(self.manager, **query)
  File "/src/c7n/query.py", line 75, in filter
    return self._invoke_client_enum(
  File "/src/c7n/query.py", line 56, in _invoke_client_enum
    data = results.build_full_result()
  File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 449, in build_full_result
    for response in self:
  File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 255, in __iter__
    response = self._make_request(current_kwargs)
  File "/src/c7n/query.py", line 744, in _make_request
    return self.retry(self._method, **current_kwargs)
  File "/src/c7n/utils.py", line 444, in _retry
    return func(*args, **kw)
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 662, in _make_api_call
    http, parsed_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 682, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 136, in _send_request
    while self._needs_retry(attempts, operation_model, request_dict,
  File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 253, in _needs_retry
    responses = self._event_emitter.emit(
  File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 250, in __call__
    should_retry = self._should_retry(attempt_number, response,
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 277, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 316, in __call__
    checker_response = checker(attempt_number, response,
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 222, in __call__
    return self._check_caught_exception(
  File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 269, in _send
    return self.http_session.send(request)
  File "/usr/local/lib/python3.8/site-packages/botocore/httpsession.py", line 349, in send
    raise ReadTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://ec2.eu-central-1.amazonaws.com/"

This is likely the root of the problem behind the following issues:

Further proof of what Cloud Custodian is doing can be shown using CloudTrail. Here’s the requestParameters field in the event record for an EC2:DescribeSnapshots invocation from Cloud Custodian:

"requestParameters": {
    "snapshotSet": {},
    "ownersSet": {
        "items": [
            {
                "owner": "self"
            }
        ]
    },
    "sharedUsersSet": {},
    "filterSet": {},
    "includeRecoveryBin": false
}

Here it is for an invocation that properly configures paging (with MaxResults set to 100):

"requestParameters": {
    "maxResults": 100,
    "snapshotSet": {},
    "ownersSet": {
        "items": [
            {
                "owner": "self"
            }
        ]
    },
    "sharedUsersSet": {},
    "filterSet": {
        "items": [
            {
                "name": "tag:appian:system",
                "valueSet": {
                    "items": [
                        {
                            "value": "SiteWorkerGroupOperator"
                        }
                    ]
                }
            },
            {
                "name": "tag:appian:expires-on",
                "valueSet": {
                    "items": [
                        {
                            "value": "2021-08-30"
                        }
                    ]
                }
            }
        ]
    },
    "includeRecoveryBin": false
}

Note how the latter event record includes "maxResults": 100 whereas the former does not.
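
The same comparison can be automated when reviewing many event records. The following is a small sketch; the helper name `uses_paging` is hypothetical, and the two sample records are abbreviated from the ones shown above.

```python
import json
from typing import Any, Dict


def uses_paging(event_record: Dict[str, Any]) -> bool:
    """Return True if a DescribeSnapshots event record carried maxResults."""
    params = event_record.get("requestParameters") or {}
    return "maxResults" in params


# Abbreviated versions of the two CloudTrail records shown above.
unpaged = json.loads('{"requestParameters": {"snapshotSet": {}, "filterSet": {}}}')
paged = json.loads('{"requestParameters": {"maxResults": 100, "snapshotSet": {}}}')

print(uses_paging(unpaged))  # False
print(uses_paging(paged))    # True
```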

To Reproduce

Run Cloud Custodian with a policy that uses the ebs-snapshot resource against an account / region containing hundreds of thousands of snapshots.

Expected behavior

Cloud Custodian should describe the snapshots incrementally using pagination.
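
At the API level, describing snapshots incrementally means sending MaxResults and following the NextToken returned with each response. The sketch below shows that loop under stated assumptions: `describe_snapshots_in_pages` is a hypothetical helper, not part of Cloud Custodian or botocore, and `ec2_client` is assumed to be a boto3 EC2 client.

```python
from typing import Any, Dict, Iterator, List


def describe_snapshots_in_pages(
    ec2_client: Any, max_results: int = 100
) -> Iterator[List[Dict[str, Any]]]:
    """Yield one list of snapshots per DescribeSnapshots request."""
    kwargs: Dict[str, Any] = {"OwnerIds": ["self"], "MaxResults": max_results}
    while True:
        response = ec2_client.describe_snapshots(**kwargs)
        yield response.get("Snapshots", [])
        token = response.get("NextToken")
        if not token:
            # No token means the service has no further pages.
            break
        kwargs["NextToken"] = token
```

Because each request is bounded by `max_results`, no single response has to carry the whole snapshot inventory, which is what the 60-second read timeout in the traceback was tripping over.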

Background (please complete the following information):

  • OS: Linux 47c05caf41de 5.10.47-linuxkit #1 SMP Sat Jul 3 21:51:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Python Version: Python 3.8.5
  • Custodian Version: 0.9.11
  • Cloud Provider: AWS
  • Policy:
policies:
  - name: garbage-collect-snapshots
    resource: ebs-snapshot
    max-resources-percent: 20
    filters:
      - type: value
        key: tag:appian:system
        value: SiteWorkerGroupOperator
      - type: value
        key: tag:appian:deletion-protected
        op: ne
        value: "true"
      - type: value
        key: tag:appian:expires-on
        value: "2021-08-30"
    actions:
      - delete
  • Traceback: See above
  • custodian version --debug output
Custodian:   0.9.11
Python:      3.8.5 (default, Jan 27 2021, 15:41:15)
             [GCC 9.3.0]
Platform:    posix.uname_result(sysname='Linux', nodename='f50e6da2a3a1', release='5.10.47-linuxkit', version='#1 SMP Sat Jul 3 21:51:47 UTC 2021', machine='x86_64')
Using venv:  True
Docker: True
Installed:

PyJWT==1.7.1
PyYAML==5.4.1
adal==1.2.6
appdirs==1.4.4
applicationinsights==0.11.9
apscheduler==3.7.0
argcomplete==1.12.2
attrs==20.3.0
azure-common==1.1.26
azure-core==1.12.0
azure-cosmos==3.2.0
azure-cosmosdb-nspkg==2.0.2
azure-cosmosdb-table==1.0.6
azure-functions==1.6.0
azure-graphrbac==0.61.1
azure-identity==1.5.0
azure-keyvault==4.1.0
azure-keyvault-certificates==4.2.1
azure-keyvault-keys==4.3.1
azure-keyvault-secrets==4.2.0
azure-mgmt-apimanagement==1.0.0
azure-mgmt-applicationinsights==1.0.0
azure-mgmt-authorization==1.0.0
azure-mgmt-batch==15.0.0
azure-mgmt-cdn==10.0.0
azure-mgmt-cognitiveservices==11.0.0
azure-mgmt-compute==19.0.0
azure-mgmt-containerinstance==7.0.0
azure-mgmt-containerregistry==8.0.0b1
azure-mgmt-containerservice==15.0.0
azure-mgmt-core==1.2.2
azure-mgmt-cosmosdb==6.1.0
azure-mgmt-costmanagement==1.0.0
azure-mgmt-databricks==1.0.0b1
azure-mgmt-datafactory==1.1.0
azure-mgmt-datalake-store==1.0.0
azure-mgmt-dns==8.0.0b1
azure-mgmt-eventgrid==8.0.0
azure-mgmt-eventhub==8.0.0
azure-mgmt-hdinsight==7.0.0
azure-mgmt-iothub==1.0.0
azure-mgmt-keyvault==8.0.0
azure-mgmt-logic==9.0.0
azure-mgmt-managementgroups==1.0.0b1
azure-mgmt-monitor==2.0.0
azure-mgmt-msi==1.0.0
azure-mgmt-network==17.1.0
azure-mgmt-policyinsights==1.0.0
azure-mgmt-rdbms==8.0.0
azure-mgmt-redis==12.0.0
azure-mgmt-resource==16.0.0
azure-mgmt-resourcegraph==7.0.0
azure-mgmt-search==8.0.0
azure-mgmt-sql==1.0.0
azure-mgmt-storage==17.0.0
azure-mgmt-subscription==1.0.0
azure-mgmt-web==2.0.0
azure-nspkg==3.0.2
azure-storage-blob==12.8.0
azure-storage-common==2.1.0
azure-storage-file==2.1.0
azure-storage-file-share==12.4.1
azure-storage-queue==12.1.5
boto3==1.17.33
botocore==1.20.33
cachetools==4.2.1
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==7.1.2
cryptography==3.4.6
decorator==4.4.2
distlib==0.3.1
dogpile.cache==1.1.2
google-api-core==1.26.1
google-api-python-client==1.12.8
google-auth==1.28.0
google-auth-httplib2==0.1.0
google-cloud-core==1.6.0
google-cloud-logging==1.15.1
google-cloud-monitoring==0.34.0
google-cloud-storage==1.36.2
google-crc32c==1.1.2
google-resumable-media==1.2.0
googleapis-common-protos==1.53.0
httplib2==0.19.0
idna==2.10
importlib-metadata==3.7.3
iso8601==0.1.14
isodate==0.6.0
jmespath==0.10.0
jsonpatch==1.32
jsonpickle==1.3
jsonpointer==2.1
jsonschema==3.2.0
keystoneauth1==4.3.1
kubernetes==10.0.1
msal==1.10.0
msal-extensions==0.3.0
msrest==0.6.21
msrestazure==0.6.4
munch==2.5.0
netaddr==0.7.20
netifaces==0.10.9
oauthlib==3.1.0
openstacksdk==0.52.0
os-service-types==1.7.0
packaging==20.9
pbr==5.5.1
portalocker==1.7.1
protobuf==3.15.6
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2021.1
pyyaml==5.4.1
ratelimiter==1.2.0.post0
requests==2.25.1
requests-oauthlib==1.3.0
requestsexceptions==1.4.0
retrying==1.3.3
rsa==4.7.2
s3transfer==0.3.6
setuptools==44.0.0
six==1.15.0
stevedore==3.3.0
tabulate==0.8.9
tzlocal==2.1
uritemplate==3.0.1
urllib3==1.26.4
websocket-client==0.58.0
zipp==3.4.1

Additional context

I should also note that I expected the above policy to result in some amount of server-side filtering. In particular, AWS should be able to apply the following two filters server-side:

- type: value
  key: tag:appian:system
  value: SiteWorkerGroupOperator
- type: value
  key: tag:appian:expires-on
  value: "2021-08-30"

The event record’s filterSet field in CloudTrail, however, was empty. I created #6874 to track this.
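
To illustrate what server-side filtering could look like, here is a hedged sketch that maps the policy's equality filters on tags to the `Filters` request parameter of DescribeSnapshots. `to_ec2_filters` is a hypothetical helper and does not reflect how Cloud Custodian builds its queries; filters that carry an `op` (such as `ne`) are skipped here on the assumption that they would stay client-side.

```python
from typing import Any, Dict, List


def to_ec2_filters(policy_filters: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Translate simple tag-equality value filters into EC2 API Filters."""
    ec2_filters = []
    for f in policy_filters:
        key = f.get("key", "")
        # Only plain equality on a tag key is translated in this sketch.
        if f.get("type") == "value" and key.startswith("tag:") and "op" not in f:
            ec2_filters.append({"Name": key, "Values": [str(f["value"])]})
    return ec2_filters


policy_filters = [
    {"type": "value", "key": "tag:appian:system", "value": "SiteWorkerGroupOperator"},
    {"type": "value", "key": "tag:appian:deletion-protected", "op": "ne", "value": "true"},
    {"type": "value", "key": "tag:appian:expires-on", "value": "2021-08-30"},
]
print(to_ec2_filters(policy_filters))
```

The result has the same shape as the filterSet in the properly paged CloudTrail record above: one entry per tag, each with a single value.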

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 6
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

kapilt commented, Sep 5, 2021 (1 reaction)

@robbie-demuth thank you for the detailed bug report btw

castrojo commented, Sep 1, 2021 (1 reaction)

Hi Robbie! We’ve discussed your two issues during our last community meeting and figured I would link you to the video discussion to add some background: https://youtu.be/N97x0OI0wXg?t=1470
