Zero-downtime deployments

tl;dr

The lambdas are unreachable for a while during deployment because CloudFront still refers to the old function version, which no longer exists. Since function.latestVersion is not allowed in Lambda@Edge, I think the next best thing would be to retain the previous version until deployment is complete, and then clean up old function versions.

More Info

Is your feature request related to a problem? Please describe.

At the moment, using the latest versions of Builder from @sls-next/cdk-core and NextJSLambdaEdge from @sls-next/cdk-construct (I haven’t tested the serverless component), all lambdas are unreachable for about 3 minutes while the CloudFront distribution is being updated. All responses that were not already in CloudFront’s edge cache return a 503 during this time, until the CloudFront distribution is done updating.

I believe this is because the previous versions of the lambdas are being deleted (as per the currentVersion.deletionPolicy), and the CloudFront distribution uses the functionVersion: this.defaultNextLambda.currentVersion. At the moment, CloudFormation first creates the new lambda version and deletes the previous version, and then tells CloudFront to start using the new version. This means that for a short while, the distribution still refers to the ARN of the now deleted previous version. Once the distribution is done deploying, it refers to the new ARN and all the responses work again.
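The ordering problem described above can be illustrated with a small toy model. This is only a sketch: it makes no AWS or CDK calls, and the names `DeployState`, `Step`, `applyStep`, and `findOutageSteps` are hypothetical and exist only for this illustration. It assumes a request fails whenever the distribution points at a version that has already been deleted.

```typescript
// Toy model of the deployment ordering; no real AWS or CDK calls are made.
type DeployState = {
  existingVersions: Set<string>; // published Lambda versions that still exist
  distributionTarget: string; // version the CloudFront distribution points at
};

type Step =
  | { kind: "publish"; version: string }
  | { kind: "delete"; version: string }
  | { kind: "repoint"; version: string };

// Returns a new state with the given step applied.
function applyStep(state: DeployState, step: Step): DeployState {
  const existingVersions = new Set(state.existingVersions);
  let distributionTarget = state.distributionTarget;
  switch (step.kind) {
    case "publish":
      existingVersions.add(step.version);
      break;
    case "delete":
      existingVersions.delete(step.version);
      break;
    case "repoint":
      distributionTarget = step.version;
      break;
  }
  return { existingVersions, distributionTarget };
}

// Returns the indices of the steps after which the distribution
// references a version that no longer exists.
function findOutageSteps(initial: DeployState, steps: Step[]): number[] {
  const outages: number[] = [];
  let state = initial;
  steps.forEach((step, index) => {
    state = applyStep(state, step);
    if (!state.existingVersions.has(state.distributionTarget)) {
      outages.push(index);
    }
  });
  return outages;
}

const initialState: DeployState = {
  existingVersions: new Set(["1"]),
  distributionTarget: "1",
};

// Order described in the issue: publish new, delete old, then update CloudFront.
const currentOrder: Step[] = [
  { kind: "publish", version: "2" },
  { kind: "delete", version: "1" },
  { kind: "repoint", version: "2" },
];

// Order with the old version retained until CloudFront has been updated.
const retainedOrder: Step[] = [
  { kind: "publish", version: "2" },
  { kind: "repoint", version: "2" },
  { kind: "delete", version: "1" },
];

const currentOutages = findOutageSteps(initialState, currentOrder);
const retainedOutages = findOutageSteps(initialState, retainedOrder);
console.log(currentOutages, retainedOutages);
```

In this model the first ordering leaves a gap after the delete step, while the second never points the distribution at a missing version.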

Describe alternatives you’ve considered

I have checked what happens when I change the lambda versions’ deletionPolicy to RETAIN, and indeed I no longer experience the 503 errors in that case, even for pages that use getServerSideProps. We could enable lambda retention for now in our projects, and clean up old versions manually. This wouldn’t be ideal in my opinion though, because every dev would need to remember to do so, or otherwise build their own automated cleanup solution.

I’ve also tried using functionVersion: this.defaultLambda.latestVersion as I’m sure you have as well, only to be told off by AWS that this is not supported for Lambda@Edge (but why though 😢). I’ve also looked into retrieving the lambda by its "live" alias, but this would only work if the previous lambdas are retained, in which case the alias itself doesn’t add anything.

Describe the solution you’d like

What I think would be a decent solution (unless there’s an actual proper way to do this in AWS that I haven’t found; please let me know if that’s the case 😄), is to first create the new version of the lambda without deleting the current version (i.e. DeletionPolicy.RETAIN), then update the CloudFront distribution, and then delete the previous version(s) of the lambda, all as part of the CDK/SLS deployment.
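The final cleanup step of that sequence could look roughly like the sketch below. It only covers the selection logic, under the assumption that the list of published versions and the version currently used by the distribution have already been fetched by some other means. The function name `selectVersionsToDelete` and the `keepPrevious` parameter are hypothetical and not part of CDK, the AWS SDK, or @sls-next.

```typescript
// Hypothetical helper: choose which published versions are safe to delete
// once the distribution has finished deploying. It does not call AWS itself.
function selectVersionsToDelete(
  publishedVersions: string[],
  versionInUse: string,
  keepPrevious: number = 0, // how many of the newest older versions to retain
): string[] {
  const inUse = Number(versionInUse);
  const olderVersions = publishedVersions
    // "$LATEST" is not a published version and the in-use version must stay.
    .filter((version) => version !== "$LATEST" && version !== versionInUse)
    .map(Number)
    // Only consider versions older than the one in use, so a version that was
    // published but is not yet referenced by CloudFront is left alone.
    .filter((n) => Number.isInteger(n) && n < inUse)
    .sort((a, b) => b - a); // newest first
  return olderVersions.slice(keepPrevious).map(String);
}

const published = ["$LATEST", "1", "2", "3", "4"];
const deleteAll = selectVersionsToDelete(published, "4");
const keepOne = selectVersionsToDelete(published, "4", 1);
console.log(deleteAll, keepOne);
```

Passing `keepPrevious = 1` approximates a "retain one" policy: the version directly before the one in use is kept as a fallback, and everything older is selected for deletion.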

Is there a way of implementing this that doesn’t feel like a workaround? (Wouldn’t it be nice if AWS had a FunctionVersionDeletionPolicy.RETAIN_ONE or something 😏) And I’m also curious: does anyone else experience this behavior, or did I implement something incorrectly?

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 12
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

5 reactions
jlegreid commented, Mar 28, 2022

@thijsdaniels We are also having this issue, except that in our case the lambdas have been unreachable for upwards of 15 minutes during a build. This generally only happens when we haven’t pushed code for a few days, such as after a weekend, and it only started about a month and a half ago.

Thanks for the deletionPolicy tip, we will try that for now so at least our visitors don’t experience the 503 errors.

0 reactions
jlegreid commented, Jun 21, 2022

@gengoro thanks for this tip. In your experience, could this also lead to a lambda returning a 503? I haven’t seen any spikes in 404 errors during our deployments, only 503s, but it may still be related.