Random GetBucketLocation Failures Since Upgrading to v0.9.19
Describe the bug
While testing v0.9.18 we encountered AccessDenied errors on the GetBucketLocation API while policies were trying to write to a central S3 logging bucket in one region. After realizing this was a known issue with a fix slated for v0.9.19, we held off on upgrading from v0.9.14. When v0.9.19 was released we tested it in our PreProd environment, and the issue appeared to be resolved.
Two nights ago we moved forward with the upgrade, and since then we have been receiving random failures. We have several dozen policies running against an Organization of 500+ accounts across 16 regions. Pull-type policies run hourly and daily, and this is where we are seeing the random errors. Event-based policies that run in Lambda do not appear to be affected, or at least we haven't identified any occurrences yet.
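For context, the failing call is the region lookup Custodian performs against the output bucket before writing policy records to it (see the log excerpt further down). A minimal boto3 sketch of that lookup, with a placeholder bucket name and not Custodian's actual code, reproduces the same condition when the calling role lacks s3:GetBucketLocation on the cross-account bucket:

import boto3
from botocore.exceptions import ClientError

# Sketch only: "central-logging-bucket" stands in for our real bucket name.
s3 = boto3.client("s3")
try:
    resp = s3.get_bucket_location(Bucket="central-logging-bucket")
    # S3 reports us-east-1 as a null LocationConstraint.
    region = resp.get("LocationConstraint") or "us-east-1"
    print("output bucket region:", region)
except ClientError as err:
    if err.response["Error"]["Code"] == "AccessDenied":
        # The failure mode described above: the assumed role in the member
        # account lacks s3:GetBucketLocation on the central logging bucket.
        print("AccessDenied calling GetBucketLocation")
    else:
        raise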
What did you expect to happen?
The policies would run successfully and not receive AccessDenied failures for the GetBucketLocation API.
Cloud Provider
Amazon Web Services (AWS)
Cloud Custodian version and dependency information
Custodian: 0.9.19
Python: 3.9.10 (main, Jan 15 2022, 11:48:04)
[Clang 13.0.0 (clang-1300.0.29.3)]
Platform: posix.uname_result(sysname='Darwin', nodename='PL1USCLT001MAC.local', release='21.6.0', version='Darwin Kernel Version 21.6.0: Mon Aug 22 20:17:10 PDT 2022; root:xnu-8020.140.49~2/RELEASE_X86_64', machine='x86_64')
Using venv: True
Docker: False
Installed:
PyYAML==6.0
Pygments==2.13.0
argcomplete==2.0.0
attrs==22.1.0
aws-xray-sdk==2.10.0
bleach==5.0.1
boto3==1.24.87
botocore==1.27.87
c7n==0.9.19
cachetools==5.2.0
certifi==2022.9.24
charset-normalizer==2.1.1
click==8.1.3
colorama==0.4.5
coverage==6.5.0
docutils==0.17.1
execnet==1.9.0
flake8==3.9.2
freezegun==1.2.2
google-api-core==2.10.1
google-api-python-client==2.64.0
google-auth==2.12.0
google-auth-httplib2==0.1.0
google-cloud-appengine-logging==1.1.5
google-cloud-audit-log==0.2.4
google-cloud-core==2.3.2
google-cloud-logging==3.2.4
google-cloud-monitoring==2.11.2
google-cloud-storage==1.44.0
google-crc32c==1.5.0
google-resumable-media==2.4.0
googleapis-common-protos==1.56.4
grpc-google-iam-v1==0.12.4
grpcio==1.49.1
grpcio-status==1.49.1
httplib2==0.20.4
idna==3.4
importlib-metadata==4.13.0
importlib-resources==5.9.0
iniconfig==1.1.1
jaraco.classes==3.2.3
jmespath==1.0.1
jsonpatch==1.32
jsonpointer==2.3
jsonschema==4.16.0
keyring==23.9.3
mccabe==0.6.1
mock==4.0.3
more-itertools==8.14.0
multidict==6.0.2
packaging==21.3
pkginfo==1.8.3
pkgutil-resolve-name==1.3.10
placebo==0.9.0
pluggy==1.0.0
portalocker==2.5.1
proto-plus==1.22.1
protobuf==4.21.7
psutil==5.9.2
py==1.11.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.7.0
pyflakes==2.3.1
pygments==2.13.0
pyparsing==3.0.9
pyrsistent==0.18.1
pytest==7.1.3
pytest-cov==3.0.0
pytest-forked==1.4.0
pytest-recording==0.12.1
pytest-sugar==0.9.5
pytest-terraform==0.6.4
pytest-xdist==2.5.0
python-dateutil==2.8.2
pyyaml==6.0
ratelimiter==1.2.0.post0
readme-renderer==37.2
requests==2.28.1
requests-toolbelt==0.9.1
retrying==1.3.3
rfc3986==2.0.0
rsa==4.9
s3transfer==0.6.0
six==1.16.0
tabulate==0.8.10
termcolor==2.0.1
tomli==2.0.1
tqdm==4.64.1
twine==3.8.0
typing-extensions==4.3.0
uritemplate==4.1.1
urllib3==1.26.12
vcrpy==4.2.1
webencodings==0.5.1
wrapt==1.14.1
yarl==1.8.1
zipp==3.8.1
Policy
The failure is not tied to a specific policy.
Relevant log/traceback output
2022-10-12 13:20:03,574: c7n_org:ERROR Exception running policy:ec2-optin-start-instances-off-hours-periodic account:XXXXXX region:us-east-2 error:unable to determine a region for output bucket XXX-XXX-XXX: An error occurred (AccessDenied) when calling the GetBucketLocation operation: Access Denied
Extra information or context
No response
Top GitHub Comments
Unfortunately, I hit the following error while running with the updated module:
2022-10-25 15:56:31,337: custodian.aws:WARNING unable to determine output bucket region with HTTP HEAD request: HTTP Error 503: Slow Down
I received a few of these errors in the logs.
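For anyone following along: the 503 suggests the fallback resolves the bucket region from the x-amz-bucket-region header of a plain HTTP HEAD request and is getting throttled. A rough sketch of that approach with backoff on Slow Down, assuming this is how the fallback works and not quoting the actual c7n implementation:

import time
import urllib.error
import urllib.request

def head_bucket_region(bucket, retries=4):
    # Hypothetical helper: resolve a bucket's region from the
    # x-amz-bucket-region header of an anonymous HEAD request.
    url = "https://%s.s3.amazonaws.com" % bucket
    for attempt in range(retries):
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.headers.get("x-amz-bucket-region")
        except urllib.error.HTTPError as err:
            # S3 includes the region header even on 403/404 responses.
            region = err.headers.get("x-amz-bucket-region")
            if region:
                return region
            if err.code == 503:
                # "Slow Down", as seen in the log above; back off and retry.
                time.sleep(2 ** attempt)
                continue
            raise
    return None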
Sounds great. Unless you see any issues with this, I would like to take the updated aws.py module with your changes and place it into our runtime environment. Considering I too am unable to reproduce the issue manually, this is the best way we can test the efficacy of the enhancement.
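In case it helps anyone testing the same way: assuming the module in question is c7n/resources/aws.py, the installed copy to overwrite inside the venv can be located with a snippet like this (hypothetical, adjust to your layout):

import c7n.resources.aws as aws_module

# Print the path of the installed module so the patched aws.py can be
# dropped over it for testing (assumes a standard pip install in the venv).
print(aws_module.__file__)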