Improper paging when invoking EC2:DescribeSnapshots
Describe the bug
When invoking EC2:DescribeSnapshots, clients can specify the MaxResults request parameter. botocore supports a paginator for the EC2:DescribeSnapshots API.
To instruct botocore to actually paginate, however, one must specify the PaginationConfig parameter. If it is unspecified, meaning MaxResults is unspecified, the EC2:DescribeSnapshots API returns all results.
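As a minimal sketch of what paginated enumeration could look like with botocore's paginator: the helper name `iter_snapshots` and the default page size of 100 are illustrative assumptions and are not taken from Cloud Custodian. The paginator sends `PageSize` to the service as the `MaxResults` request parameter on each call.

```python
from typing import Any, Dict, Iterator


def iter_snapshots(ec2_client: Any, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield snapshots owned by the caller, fetched one page at a time.

    `ec2_client` is assumed to be a boto3/botocore EC2 client, for example
    boto3.client("ec2"). `iter_snapshots` itself is a hypothetical helper.
    """
    paginator = ec2_client.get_paginator("describe_snapshots")
    pages = paginator.paginate(
        OwnerIds=["self"],
        # PageSize is forwarded as MaxResults, so each response carries at
        # most `page_size` snapshots plus a NextToken for the next page.
        PaginationConfig={"PageSize": page_size},
    )
    for page in pages:
        for snapshot in page.get("Snapshots", []):
            yield snapshot
```

With a real client this would be used as `for snap in iter_snapshots(boto3.client("ec2")): ...`, which requires AWS credentials and network access.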
In accounts / regions with a substantial number of snapshots, improperly configuring paging for the EC2:DescribeSnapshots API causes Cloud Custodian to fail with read timeouts because it tries to describe potentially hundreds of thousands of snapshots all at once:
2021-08-27 19:24:09,860: custodian.output:ERROR Error while executing policy
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.8/http/client.py", line 1347, in getresponse
response.begin()
File "/usr/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/botocore/httpsession.py", line 314, in send
urllib_response = conn.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 507, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 735, in reraise
raise value
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 447, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: AWSHTTPSConnectionPool(host='ec2.eu-central-1.amazonaws.com', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/src/c7n/policy.py", line 285, in run
resources = self.policy.resource_manager.resources()
File "/src/c7n/resources/ebs.py", line 71, in resources
return super(Snapshot, self).resources(query=query)
File "/src/c7n/query.py", line 514, in resources
resources = self.source.resources(query)
File "/src/c7n/query.py", line 223, in resources
return self.query.filter(self.manager, **query)
File "/src/c7n/query.py", line 75, in filter
return self._invoke_client_enum(
File "/src/c7n/query.py", line 56, in _invoke_client_enum
data = results.build_full_result()
File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 449, in build_full_result
for response in self:
File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 255, in __iter__
response = self._make_request(current_kwargs)
File "/src/c7n/query.py", line 744, in _make_request
return self.retry(self._method, **current_kwargs)
File "/src/c7n/utils.py", line 444, in _retry
return func(*args, **kw)
File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 662, in _make_api_call
http, parsed_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 682, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 136, in _send_request
while self._needs_retry(attempts, operation_model, request_dict,
File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 253, in _needs_retry
responses = self._event_emitter.emit(
File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 183, in __call__
if self._checker(attempts, response, caught_exception):
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 250, in __call__
should_retry = self._should_retry(attempt_number, response,
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 277, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 316, in __call__
checker_response = checker(attempt_number, response,
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 222, in __call__
return self._check_caught_exception(
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
raise caught_exception
File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 269, in _send
return self.http_session.send(request)
File "/usr/local/lib/python3.8/site-packages/botocore/httpsession.py", line 349, in send
raise ReadTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://ec2.eu-central-1.amazonaws.com/"
2021-08-27 19:24:09,864: custodian.commands:ERROR Error while executing policy garbage-collect-snapshots, continuing
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.8/http/client.py", line 1347, in getresponse
response.begin()
File "/usr/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/botocore/httpsession.py", line 314, in send
urllib_response = conn.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 507, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 735, in reraise
raise value
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 447, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: AWSHTTPSConnectionPool(host='ec2.eu-central-1.amazonaws.com', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/src/c7n/commands.py", line 271, in run
policy()
File "/src/c7n/policy.py", line 1178, in __call__
resources = PullMode(self).run()
File "/src/c7n/policy.py", line 285, in run
resources = self.policy.resource_manager.resources()
File "/src/c7n/resources/ebs.py", line 71, in resources
return super(Snapshot, self).resources(query=query)
File "/src/c7n/query.py", line 514, in resources
resources = self.source.resources(query)
File "/src/c7n/query.py", line 223, in resources
return self.query.filter(self.manager, **query)
File "/src/c7n/query.py", line 75, in filter
return self._invoke_client_enum(
File "/src/c7n/query.py", line 56, in _invoke_client_enum
data = results.build_full_result()
File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 449, in build_full_result
for response in self:
File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 255, in __iter__
response = self._make_request(current_kwargs)
File "/src/c7n/query.py", line 744, in _make_request
return self.retry(self._method, **current_kwargs)
File "/src/c7n/utils.py", line 444, in _retry
return func(*args, **kw)
File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 662, in _make_api_call
http, parsed_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 682, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 136, in _send_request
while self._needs_retry(attempts, operation_model, request_dict,
File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 253, in _needs_retry
responses = self._event_emitter.emit(
File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 183, in __call__
if self._checker(attempts, response, caught_exception):
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 250, in __call__
should_retry = self._should_retry(attempt_number, response,
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 277, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 316, in __call__
checker_response = checker(attempt_number, response,
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 222, in __call__
return self._check_caught_exception(
File "/usr/local/lib/python3.8/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
raise caught_exception
File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 269, in _send
return self.http_session.send(request)
File "/usr/local/lib/python3.8/site-packages/botocore/httpsession.py", line 349, in send
raise ReadTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://ec2.eu-central-1.amazonaws.com/"
This is likely the root of the problem behind the following issues:
Further proof of what Cloud Custodian is doing can be shown using CloudTrail. Here's the requestParameters field in the event record for an EC2:DescribeSnapshots invocation from Cloud Custodian:
"requestParameters": {
    "snapshotSet": {},
    "ownersSet": {
        "items": [
            {
                "owner": "self"
            }
        ]
    },
    "sharedUsersSet": {},
    "filterSet": {},
    "includeRecoveryBin": false
}
Here it is for an invocation that properly configures paging (with MaxResults set to 100):
"requestParameters": {
    "maxResults": 100,
    "snapshotSet": {},
    "ownersSet": {
        "items": [
            {
                "owner": "self"
            }
        ]
    },
    "sharedUsersSet": {},
    "filterSet": {
        "items": [
            {
                "name": "tag:appian:system",
                "valueSet": {
                    "items": [
                        {
                            "value": "SiteWorkerGroupOperator"
                        }
                    ]
                }
            },
            {
                "name": "tag:appian:expires-on",
                "valueSet": {
                    "items": [
                        {
                            "value": "2021-08-30"
                        }
                    ]
                }
            }
        ]
    },
    "includeRecoveryBin": false
},
Note how the latter event record includes "maxResults": 100 whereas the former does not.
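The comparison above can be automated when auditing many event records. The following is a small sketch, assuming each record has already been parsed from JSON into a dict with a `requestParameters` field as shown; the function name `requests_paging` is hypothetical.

```python
from typing import Any, Dict


def requests_paging(event_record: Dict[str, Any]) -> bool:
    """Return True if a CloudTrail event record asked for a bounded page.

    Only the presence of "maxResults" in requestParameters is checked,
    mirroring the difference between the two records shown above.
    """
    params = event_record.get("requestParameters") or {}
    return "maxResults" in params


# Trimmed-down versions of the two records above.
unpaged = {"requestParameters": {"snapshotSet": {}, "filterSet": {}}}
paged = {"requestParameters": {"maxResults": 100, "snapshotSet": {}}}
print(requests_paging(unpaged), requests_paging(paged))  # prints: False True
```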
To Reproduce
Run Cloud Custodian with a policy that uses the ebs-snapshot resource against an account / region containing hundreds of thousands of snapshots.
Expected behavior
Cloud Custodian should describe the snapshots incrementally using pagination.
Background (please complete the following information):
- OS: Linux 47c05caf41de 5.10.47-linuxkit #1 SMP Sat Jul 3 21:51:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Python Version: Python 3.8.5
- Custodian Version: 0.9.11
- Cloud Provider: AWS
- Policy:
policies:
  - name: garbage-collect-snapshots
    resource: ebs-snapshot
    max-resources-percent: 20
    filters:
      - type: value
        key: tag:appian:system
        value: SiteWorkerGroupOperator
      - type: value
        key: tag:appian:deletion-protected
        op: ne
        value: "true"
      - type: value
        key: tag:appian:expires-on
        value: "2021-08-30"
    actions:
      - delete
- Traceback: See above
- custodian version --debug output:
Custodian: 0.9.11
Python: 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0]
Platform: posix.uname_result(sysname='Linux', nodename='f50e6da2a3a1', release='5.10.47-linuxkit', version='#1 SMP Sat Jul 3 21:51:47 UTC 2021', machine='x86_64')
Using venv: True
Docker: True
Installed:
PyJWT==1.7.1
PyYAML==5.4.1
adal==1.2.6
appdirs==1.4.4
applicationinsights==0.11.9
apscheduler==3.7.0
argcomplete==1.12.2
attrs==20.3.0
azure-common==1.1.26
azure-core==1.12.0
azure-cosmos==3.2.0
azure-cosmosdb-nspkg==2.0.2
azure-cosmosdb-table==1.0.6
azure-functions==1.6.0
azure-graphrbac==0.61.1
azure-identity==1.5.0
azure-keyvault==4.1.0
azure-keyvault-certificates==4.2.1
azure-keyvault-keys==4.3.1
azure-keyvault-secrets==4.2.0
azure-mgmt-apimanagement==1.0.0
azure-mgmt-applicationinsights==1.0.0
azure-mgmt-authorization==1.0.0
azure-mgmt-batch==15.0.0
azure-mgmt-cdn==10.0.0
azure-mgmt-cognitiveservices==11.0.0
azure-mgmt-compute==19.0.0
azure-mgmt-containerinstance==7.0.0
azure-mgmt-containerregistry==8.0.0b1
azure-mgmt-containerservice==15.0.0
azure-mgmt-core==1.2.2
azure-mgmt-cosmosdb==6.1.0
azure-mgmt-costmanagement==1.0.0
azure-mgmt-databricks==1.0.0b1
azure-mgmt-datafactory==1.1.0
azure-mgmt-datalake-store==1.0.0
azure-mgmt-dns==8.0.0b1
azure-mgmt-eventgrid==8.0.0
azure-mgmt-eventhub==8.0.0
azure-mgmt-hdinsight==7.0.0
azure-mgmt-iothub==1.0.0
azure-mgmt-keyvault==8.0.0
azure-mgmt-logic==9.0.0
azure-mgmt-managementgroups==1.0.0b1
azure-mgmt-monitor==2.0.0
azure-mgmt-msi==1.0.0
azure-mgmt-network==17.1.0
azure-mgmt-policyinsights==1.0.0
azure-mgmt-rdbms==8.0.0
azure-mgmt-redis==12.0.0
azure-mgmt-resource==16.0.0
azure-mgmt-resourcegraph==7.0.0
azure-mgmt-search==8.0.0
azure-mgmt-sql==1.0.0
azure-mgmt-storage==17.0.0
azure-mgmt-subscription==1.0.0
azure-mgmt-web==2.0.0
azure-nspkg==3.0.2
azure-storage-blob==12.8.0
azure-storage-common==2.1.0
azure-storage-file==2.1.0
azure-storage-file-share==12.4.1
azure-storage-queue==12.1.5
boto3==1.17.33
botocore==1.20.33
cachetools==4.2.1
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==7.1.2
cryptography==3.4.6
decorator==4.4.2
distlib==0.3.1
dogpile.cache==1.1.2
google-api-core==1.26.1
google-api-python-client==1.12.8
google-auth==1.28.0
google-auth-httplib2==0.1.0
google-cloud-core==1.6.0
google-cloud-logging==1.15.1
google-cloud-monitoring==0.34.0
google-cloud-storage==1.36.2
google-crc32c==1.1.2
google-resumable-media==1.2.0
googleapis-common-protos==1.53.0
httplib2==0.19.0
idna==2.10
importlib-metadata==3.7.3
iso8601==0.1.14
isodate==0.6.0
jmespath==0.10.0
jsonpatch==1.32
jsonpickle==1.3
jsonpointer==2.1
jsonschema==3.2.0
keystoneauth1==4.3.1
kubernetes==10.0.1
msal==1.10.0
msal-extensions==0.3.0
msrest==0.6.21
msrestazure==0.6.4
munch==2.5.0
netaddr==0.7.20
netifaces==0.10.9
oauthlib==3.1.0
openstacksdk==0.52.0
os-service-types==1.7.0
packaging==20.9
pbr==5.5.1
portalocker==1.7.1
protobuf==3.15.6
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2021.1
pyyaml==5.4.1
ratelimiter==1.2.0.post0
requests==2.25.1
requests-oauthlib==1.3.0
requestsexceptions==1.4.0
retrying==1.3.3
rsa==4.7.2
s3transfer==0.3.6
setuptools==44.0.0
six==1.15.0
stevedore==3.3.0
tabulate==0.8.9
tzlocal==2.1
uritemplate==3.0.1
urllib3==1.26.4
websocket-client==0.58.0
zipp==3.4.1
Additional context
I should also note that I expected the above policy to result in some amount of server-side filtering. In particular, the following two filters should be able to be performed server-side by AWS:
- type: value
  key: tag:appian:system
  value: SiteWorkerGroupOperator
- type: value
  key: tag:appian:expires-on
  value: "2021-08-30"
The event record's filterSet field in CloudTrail, however, was empty. I created #6874 to track this.
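To illustrate the kind of translation meant here, the sketch below converts simple equality filters on tags into the `Filters` structure that EC2:DescribeSnapshots accepts (`tag:<key>` names with a list of values). This is a hypothetical helper, not how Cloud Custodian works internally; it assumes only plain `type: value` filters with no `op` (or `op: eq`) can safely be pushed server-side, and it ignores special values and options such as `value_type`.

```python
from typing import Any, Dict, List


def to_server_side_filters(value_filters: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert simple tag equality filters into EC2 `Filters` entries.

    Anything that is not an equality check on a "tag:" key is skipped and
    would still need to be evaluated client-side.
    """
    converted = []
    for f in value_filters:
        if f.get("type") != "value":
            continue
        key = f.get("key", "")
        if not key.startswith("tag:"):
            continue
        if f.get("op", "eq") != "eq":
            continue
        converted.append({"Name": key, "Values": [str(f["value"])]})
    return converted


policy_filters = [
    {"type": "value", "key": "tag:appian:system", "value": "SiteWorkerGroupOperator"},
    {"type": "value", "key": "tag:appian:deletion-protected", "op": "ne", "value": "true"},
    {"type": "value", "key": "tag:appian:expires-on", "value": "2021-08-30"},
]
server_side = to_server_side_filters(policy_filters)
```

The resulting list could be passed as the `Filters` argument of a `describe_snapshots` call; the `ne` filter on `tag:appian:deletion-protected` is left out because it has no direct server-side equivalent in this sketch.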
Top GitHub Comments
@robbie-demuth thank you for the detailed bug report btw
Hi Robbie! We’ve discussed your two issues during our last community meeting and figured I would link you to the video discussion to add some background: https://youtu.be/N97x0OI0wXg?t=1470