Redshift IAM Credential Expiration
Describe the feature
An option to “autorefresh” temporary Redshift credentials acquired through the Redshift GetClusterCredentials API call.
Describe alternatives you’ve considered
The only alternative I can think of is splitting the models into separate calls to dbt run. While this would technically work, it would be nice not to have to do this to run all our dbt models at once. If there is an existing workaround for this, I’m open to that as well.
Additional context
This feature is Redshift-specific, and I believe it would only require changes to the Redshift plugin. My team uses dbt with Redshift and uses the IAM temporary credentials method for both convenience and security. We have several long-running dbt processes and have run into the temporary credentials expiring in the middle of a run several times. Individual models are unaffected, because an already-open connection to Redshift stays alive after the credentials expire. However, as soon as dbt attempts to open another connection to Redshift with the now-expired credentials, all subsequent models in the run fail.
This came up in the original PR that introduced the IAM method for authenticating to Redshift, but it seems the conclusion was that it is acceptable for the credentials to expire after the duration the user allots to them. That makes sense, but the one catch is that Amazon doesn’t allow an iam_duration_seconds value greater than 3600 seconds.
I would like to propose using a time-to-live (TTL) cache to store the temporary username and password provided by Redshift. The cache entry can be set to expire slightly before the credentials themselves expire. Then, whenever something in dbt accesses the credentials, the cache is checked, and on a miss the Redshift GetClusterCredentials API call is made again to replace them. We have used this approach in our own code for long-running processes that make multiple queries to Redshift, and it has worked smoothly.
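A minimal sketch of the proposed TTL cache, assuming a pluggable `fetch` callable that returns a username, password, and lifetime in seconds. The class and function names (`TempCredentials`, `TTLCredentialCache`, `make_redshift_fetcher`) are hypothetical and not part of dbt or the Redshift plugin. The boto3 fetcher shows one way to back the cache with `get_cluster_credentials`; it needs AWS credentials and is not exercised here.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TempCredentials:
    user: str
    password: str
    lifetime_seconds: float  # how long the provider says this pair is valid


class TTLCredentialCache:
    """Hypothetical cache that refetches credentials shortly before they expire."""

    def __init__(self, fetch: Callable[[], TempCredentials],
                 margin_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._margin = margin_seconds
        self._clock = clock
        self._lock = threading.Lock()  # dbt can run models on several threads
        self._cached: Optional[TempCredentials] = None
        self._refresh_at = 0.0

    def get(self) -> TempCredentials:
        with self._lock:
            now = self._clock()
            if self._cached is None or now >= self._refresh_at:
                # Cache miss or entry about to expire: fetch a fresh pair.
                creds = self._fetch()
                self._cached = creds
                # Expire the cache entry `margin` seconds before the real expiry.
                self._refresh_at = now + creds.lifetime_seconds - self._margin
            return self._cached


def make_redshift_fetcher(cluster_id: str, db_user: str, db_name: str,
                          duration: int = 3600) -> Callable[[], TempCredentials]:
    """Hypothetical fetcher backed by boto3's Redshift get_cluster_credentials."""
    import boto3  # imported lazily so the cache itself has no AWS dependency

    client = boto3.client("redshift")

    def fetch() -> TempCredentials:
        resp = client.get_cluster_credentials(
            DbUser=db_user, DbName=db_name, ClusterIdentifier=cluster_id,
            DurationSeconds=duration, AutoCreate=False)
        return TempCredentials(resp["DbUser"], resp["DbPassword"], float(duration))

    return fetch
```

Injecting the clock keeps the refresh logic testable without waiting an hour, and the safety margin absorbs the small gap between when the credentials are issued and when the cache records their lifetime.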
Who will this benefit?
I believe it will benefit anyone who uses dbt with Redshift and has runs with multiple models that may take longer than an hour.
Are you interested in contributing this feature?
I have looked into the Redshift plugin and would be willing to contribute this myself. However, I am a first-time contributor to dbt, so I would most likely be looking for some guidance on whether the solution I have in mind would be acceptable.
Issue Analytics
- Created 3 years ago
- Comments: 7 (4 by maintainers)
So, I’m here to eat my words. I was mistaken about how we were authenticating to Redshift. We were manually making the GetClusterCredentials call and passing the username and password it returned into the profile. We’re going to double-check that using the IAM method correctly fixes the issue, but after that I’m gonna close this. Thanks for the quick responses!
Right on @livinlefevreloca! Everything you said matches my understanding as well - really appreciate you diving deep into this one!
Please do feel free to close this ticket out once you’ve confirmed that the IAM auth method is working as intended, and let us know if there’s anything else we can help out with 😃
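The resolution turns on when the credentials are requested. The toy model below is not dbt-redshift’s code; every name in it is hypothetical, and `FakeCredentialService` merely stands in for GetClusterCredentials. It contrasts a password fetched once and pasted into a profile, which fails for any connection opened after expiry, with fetching a fresh pair each time a connection is opened.

```python
import itertools
from typing import Dict


class FakeCredentialService:
    """Hypothetical stand-in for GetClusterCredentials: pairs live `ttl` seconds."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._issued: Dict[str, float] = {}  # password -> expiry time
        self._counter = itertools.count(1)

    def issue(self, now: float) -> str:
        password = f"tmp-{next(self._counter)}"
        self._issued[password] = now + self.ttl
        return password

    def connect(self, password: str, now: float) -> bool:
        # New connections are only accepted while the password is unexpired.
        return now < self._issued.get(password, float("-inf"))


def static_profile_connector(svc: FakeCredentialService, start: float):
    # Credentials fetched once up front and reused for every new connection.
    password = svc.issue(start)
    return lambda now: svc.connect(password, now)


def per_connection_connector(svc: FakeCredentialService):
    # A fresh pair is requested every time a connection is opened.
    return lambda now: svc.connect(svc.issue(now), now)
```

With a one-hour lifetime, the static connector can still open a connection at the 100-second mark but not at 4000 seconds, while the per-connection connector succeeds at both.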