ElasticSearch Streaming Function - You have exceeded the number of permissible concurrent requests with unique IAM Identities
Before opening, please confirm:
- I have installed the latest version of the Amplify CLI (see above), and confirmed that the issue still persists.
- I have searched for duplicate or closed issues.
- I have read the guide for submitting bug reports.
- I have done my best to include a minimal, self-contained set of instructions for consistently reproducing the issue.
- I have removed any sensitive information from my code snippets and submission.
How did you install the Amplify CLI?
npm
If applicable, what version of Node.js are you using?
v10.23.2
Amplify CLI Version
6.3.1
What operating system are you using?
Linux
Did you make any manual changes to the cloud resources managed by Amplify? Please describe the changes made.
No
Amplify Categories
Not applicable
Amplify Commands
Not applicable
Describe the bug
Firstly, let me say that I know we are on an old version of the Amplify CLI, and thus an old version of the Python Lambda function that streams data from DynamoDB to ElasticSearch. We are of course also on ElasticSearch 6.2. We started to look at migrating to the V2 Transformer and getting onto the latest version, but ran into a problem that prevented us from moving forward (#9123).
That aside, we have found a problem with the DdbToEsFn-* function as shown below:
[ERROR] 2021-12-21T05:55:54.3Z 990a0f9b-c075-419c-81e9-86b105006d24 Traceback (most recent call last):
File "/var/task/python_streaming_function.py", line 261, in lambda_handler
return _lambda_handler(event, context)
File "/var/task/python_streaming_function.py", line 255, in _lambda_handler
post_to_es(es_payload) # Post to ES with exponential backoff
File "/var/task/python_streaming_function.py", line 102, in post_to_es
payload, es_region, creds, es_endpoint, '/_bulk')
File "/var/task/python_streaming_function.py", line 76, in post_data_to_es
raise ES_Exception(res.status_code, res._content)
python_streaming_function.ES_Exception: ES_Exception: status_code=400, payload=b'
{ "Message": "You have exceeded the number of permissible concurrent requests with unique IAM Identities. Please retry." }
'
In order to get the streaming function to work in a timely fashion (#4695), I have set "BatchSize": 100 and "MaximumBatchingWindowInSeconds": 2. I'm wondering whether these settings might have an impact, or whether there are other configuration settings I should adjust.
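For reference, those two values correspond to the DynamoDB event source mapping on the streaming Lambda. A minimal sketch of applying them with boto3 (the function name below is a placeholder, not our actual DdbToEsFn-* name):

```python
import boto3

lambda_client = boto3.client("lambda")

# Find the DynamoDB stream mapping attached to the streaming function.
mappings = lambda_client.list_event_source_mappings(
    FunctionName="DdbToEsFn-example"  # placeholder name
)["EventSourceMappings"]

for mapping in mappings:
    # Apply the batching settings described above.
    lambda_client.update_event_source_mapping(
        UUID=mapping["UUID"],
        BatchSize=100,
        MaximumBatchingWindowInSeconds=2,
    )
```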
Secondly, if this is an issue that can only be fixed with an update to the Python Lambda function itself, can you confirm that the latest V2 OpenSearch Lambda function does not have this problem - i.e. that it retries when a failure like this occurs?
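To be clear about the kind of retry I mean, here is a minimal sketch of retrying the bulk post with exponential backoff. The post_data_to_es helper and ES_Exception class are stand-ins named after the ones in the traceback above, not the actual Amplify implementation:

```python
import time

MAX_RETRIES = 3

def post_to_es_with_backoff(payload):
    """Retry the bulk post with exponential backoff on transient errors."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            # Hypothetical stand-in for the function in the traceback that
            # signs and sends the /_bulk request.
            return post_data_to_es(payload)
        except ES_Exception:
            if attempt == MAX_RETRIES:
                raise
            # Back off 1s, 2s, 4s, ... before retrying the same payload.
            time.sleep(2 ** attempt)
```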
Expected behavior
Any update in DynamoDB is streamed to ElasticSearch in a fail-safe manner.
Reproduction steps
See description
GraphQL schema(s)
# Put schemas below this line
Log output
# Put your logs below this line
Additional information
No response
Hi @cjihrig - thanks for pointing me in the right direction.
I logged an issue with AWS OpenSearch Support and got this response. I’m including it here in case others run into this problem.
"…The above error happens when an Amazon OpenSearch node receives signed requests for authentication. Every node which received signed requests will interact with our internal service to validate an IAM access key and secret key included in the signature. The request rate is limited and throttled at each node level. This is done so that the internal service is not overloaded.
Unique IAM Identities mean unique access keys and secret keys. With the current setup, you might be using a lot of unique credentials to sign the requests, which would be causing the problem. I.e. you may have many different clients sending signed requests simultaneously to this cluster to index or search data, and all of those clients are using different IAM roles or credentials which have different secret and access keys.
To resolve the issue, the workarounds will be:
[A] Reducing variety of credentials (the number of IAM entities sending signed requests to the cluster) should help. To deal with this, you need to use the same credentials for your request for some time so that they won’t get throttled. Only unique IAM credentials matter. Requests with the same credentials won’t be throttled.
[B] The threshold for throttling is per node based. That means adding nodes to the cluster can increase limit.
[C] Retry the failed requests after some interval.
[D] Gradually increase the requests. Please try hitting the cluster with requests signed with the same credentials, or with fewer unique credentials, so that the limit is not reached, rather than hitting the cluster in bulk. (Load testing)…"
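To illustrate workaround [A], here is a minimal sketch of signing requests to the cluster with one cached set of credentials rather than resolving fresh credentials per request. The endpoint, region, and index path are placeholders, and this is not the Amplify streaming function itself:

```python
import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Resolve credentials once at module load and reuse them, so every request to
# the cluster is signed with the same IAM identity instead of a new one.
_credentials = boto3.Session().get_credentials()
ES_ENDPOINT = "https://example-domain.us-east-1.es.amazonaws.com"  # placeholder
ES_REGION = "us-east-1"  # placeholder

def signed_post(path, body):
    """POST a payload to the cluster, SigV4-signed with the cached credentials."""
    url = ES_ENDPOINT + path
    request = AWSRequest(method="POST", url=url, data=body,
                         headers={"Content-Type": "application/x-ndjson"})
    SigV4Auth(_credentials, "es", ES_REGION).add_auth(request)
    return requests.post(url, data=body, headers=dict(request.headers))
```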
Hi @cjihrig - just putting this here in case someone else runs into the same issue. I followed the above advice from OpenSearch support (doubling the number of nodes), but it made no difference.
Eventually OpenSearch Support said: "We will increase the cache if you confirm that you want it to be increased, so we need a confirmation to proceed. As stated by the internal team: 'Increasing cache size may or may not improve cache hit ratio and thus reduce throttling'." Since the support team increased the cache limit on the OpenSearch cluster, this issue has not reoccurred.