Zero-downtime deployments
tl;dr
The lambdas are unreachable for a while during deployment because CloudFront keeps referring to the old function version, which no longer exists. Since `function.latestVersion` is not allowed in Lambda@Edge, I think the next best thing would be to retain the previous version until the deployment is complete, and then clean up old function versions.
More Info
Is your feature request related to a problem? Please describe.
At the moment, using the latest versions of `Builder` from `@sls-next/cdk-core` and `NextJSLambdaEdge` from `@sls-next/cdk-construct` (I haven't tested the serverless component), all lambdas are unreachable for about 3 minutes while the CloudFront distribution is being updated. Every response that is not already in CloudFront's edge cache returns a 503 during this time, until the distribution finishes updating.
I believe this is because the previous versions of the lambdas are being deleted (as per `currentVersion.deletionPolicy`), while the CloudFront distribution uses `functionVersion: this.defaultNextLambda.currentVersion`. During an update, CloudFormation first creates the new lambda version and deletes the previous one, and only then tells CloudFront to start using the new version. This means that for a short while, the distribution still refers to the ARN of the now-deleted previous version. Once the distribution finishes deploying, it refers to the new ARN and all responses work again.
Describe alternatives you’ve considered
I have checked what happens when I change the lambda versions' `deletionPolicy` to `RETAIN`, and indeed I no longer experience the 503 errors in that case, even for pages that use `getServerSideProps`. We could enable lambda retention in our projects for now and clean up old versions manually. That wouldn't be ideal in my opinion, though, because every dev would need to remember to do it, or we'd have to build our own automated cleanup solution.
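For reference, the retention experiment above can be expressed in CDK roughly like this. This is only a sketch: the import paths assume CDK v2, and `edgeFunction` is a hypothetical stand-in for however your stack reaches the construct's function (e.g. its `defaultNextLambda` property).

```typescript
import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";

// `edgeFunction` stands in for the function created by NextJSLambdaEdge;
// how you obtain a reference to it depends on your stack.
declare const edgeFunction: lambda.Function;

// Keep every published version around instead of deleting it on update,
// so CloudFront never points at a version ARN that no longer exists.
edgeFunction.currentVersion.applyRemovalPolicy(cdk.RemovalPolicy.RETAIN);
```

The trade-off is exactly the one described above: retained versions accumulate until something cleans them up.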
I’ve also tried using `functionVersion: this.defaultLambda.latestVersion`, as I’m sure you have as well, only to be told off by AWS that this is not supported for Lambda@Edge (but why though 😢). I’ve also looked into retrieving the lambda by a `"live"` alias, but that would only work if the previous lambdas are retained, in which case the alias itself doesn’t add anything.
Describe the solution you’d like
What I think would be a decent solution (unless there’s an actual proper way to do this in AWS that I haven’t found; please let me know if that’s the case 😄) is to first create the new version of the lambda without deleting the current one (i.e. `DeletionPolicy.RETAIN`), then update the CloudFront distribution, and only then delete the previous version(s) of the lambda, all as part of the CDK/SLS deployment.
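The final cleanup step could be driven by a small helper like the one below. This is a hypothetical sketch, not part of `@sls-next`: the `FunctionVersion` shape mirrors entries from the Lambda `ListVersionsByFunction` API, and the selected versions would then be deleted one by one (for Lambda@Edge, note that a version can only be deleted after CloudFront has removed its replicas, which can take a while after disassociation).

```typescript
// Shape of one entry in a ListVersionsByFunction response (simplified).
interface FunctionVersion {
  Version: string; // "$LATEST" or a numeric string like "3"
}

// Return the numbered versions that are safe to delete, keeping the
// newest `keep` versions (and never touching "$LATEST").
function versionsToDelete(versions: FunctionVersion[], keep = 1): string[] {
  const numbered = versions
    .map((v) => v.Version)
    .filter((v) => v !== "$LATEST")
    .sort((a, b) => Number(a) - Number(b)); // oldest first
  return numbered.slice(0, Math.max(0, numbered.length - keep));
}

// Example: after publishing version 4 while retaining older versions,
// everything but the newest retained version is eligible for deletion.
console.log(
  versionsToDelete(
    [{ Version: "$LATEST" }, { Version: "2" }, { Version: "3" }, { Version: "4" }],
    1
  )
); // prints [ '2', '3' ]
```

Running this after the CloudFront update has finished would give roughly the `RETAIN_ONE` behavior described below.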
Is there a way of implementing this that doesn’t feel like a workaround? (Wouldn’t it be nice if AWS had a `FunctionVersionDeletionPolicy.RETAIN_ONE` or something 😏) I’m also curious: does anyone else experience this behavior, or did I implement something incorrectly?
Issue Analytics
- Created 2 years ago
- Reactions: 12
- Comments: 8 (4 by maintainers)
Top GitHub Comments
@thijsdaniels We are also having this issue, except in our case our lambdas have been unreachable for upwards of 15 minutes during a build, though this generally only happens after we haven’t pushed code for a few days, like after a weekend. This only started happening about a month and a half ago for us.
Thanks for the `deletionPolicy` tip, we will try that for now so at least our visitors don’t experience the 503 errors.

@gengoro Thanks for this tip. In your experience, could this also lead to a lambda returning a 503? I haven’t seen any spikes in 404 errors during our deployments, only 503s, but it may still be related.