Very high cold start times
Describe the bug
I’m experiencing invocation times of more than 7 seconds for cold Lambdas; subsequent invocations take ~250 ms. I’m using the @sls-next/lambda-at-edge@1.6.0-alpha.0 package with serverless trace mode and deploy the Lambdas/public resources using CDK. My (zipped) bundle size is 6.9 MB.
First invocation:
Second invocation (within a few minutes after the first invocation):
Expected behavior
Invocation duration for cold Lambdas should be similar to hot Lambdas; it shouldn’t be 30 times higher.
Question
How can I measure/debug the performance of the default-handler? The performance problem seems related to the handler code rather than resource initialization during cold starts.
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 2
- Comments: 42 (26 by maintainers)
Top GitHub Comments
I wouldn’t put it on Lambda@Edge itself, though: it doesn’t matter how good the runtime is if the dependencies in play aren’t efficient enough. In terms of cold starts, I forgot to mention that AWS announced provisioned concurrency for Lambda (https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/), which is a game changer. It’s still not supported by Lambda@Edge, but may be at some point 🤞 I’m going to close this for now, and good luck with everything. Feel free to open a new issue if you stick with Next and this project and need further help 👍🏻
Alright, I’ve created a repro here.
Repro link: https://github.com/dphang/nextjs-repros
Some notes:
- You may need to change the bucket name in serverless.yml, since I currently use that bucket.
- Run the load test with:
K6_BASE_URL=https://xxx.cloudfront.net k6 run perf/perf-test.js --vus 50 --duration 30s
(Note: this script runs 50 VUs (virtual users) that hit the endpoint every 2-5 seconds, for 30 seconds total, i.e. ~300-400 iterations. Feel free to tune this, but it should at least trigger a bunch of cold starts due to simultaneous requests. Also, I redacted the CloudFront URL since that’s my own AWS account; feel free to use your own.)
With the serverless target (Lambda zip ~1.4 MB, index.js page ~2.9 MB uncompressed), I got these results:

✓ check_failure_rate: 0.00% ✓ 0 ✗ 448
  checks: 100.00% ✓ 898 ✗ 0
  data_received: 1.3 MB 44 kB/s
  data_sent: 64 kB 2.1 kB/s
  http_req_blocked: avg=6.13ms min=0s med=0s max=224.2ms p(90)=47.02ms p(95)=50.78ms
  http_req_connecting: avg=1.6ms min=0s med=0s max=16.97ms p(90)=12.69ms p(95)=14.18ms
✓ http_req_duration: avg=55ms min=27.18ms med=35.9ms max=701.61ms p(90)=141.08ms p(95)=162.49ms
  http_req_receiving: avg=92.86µs min=31µs med=72µs max=2.76ms p(90)=108.19µs p(95)=127µs
  http_req_sending: avg=60.93µs min=20µs med=52µs max=1.15ms p(90)=86µs p(95)=100µs
  http_req_tls_handshaking: avg=4.42ms min=0s med=0s max=183.36ms p(90)=33.3ms p(95)=35.58ms
  http_req_waiting: avg=54.85ms min=27.07ms med=35.76ms max=701.44ms p(90)=140.94ms p(95)=162.38ms
  http_reqs: 449 14.966646/s
  iteration_duration: avg=3.52s min=269.52ms med=3.46s max=5.17s p(90)=4.72s p(95)=4.92s
  iterations: 399 13.299982/s
  vus: 50 min=50 max=50
  vus_max: 50 min=50 max=50
I looked at the Lambda logs, and Lambda execution time was generally around 350-400 ms, of which ~150 ms was initialization time (this was a manual process; I’m not sure of an easy way to extract percentile data from the CloudWatch Lambda logs, so if someone knows a good way, please let me know 😃). So I guess the difference is network latency.
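One way to get percentiles without reading logs by hand is CloudWatch Logs Insights, which can aggregate the Lambda REPORT lines directly. A query along these lines, run against the function’s log group, should work (field names are from memory, so double-check against the Logs Insights docs):

```
filter @type = "REPORT"
| stats avg(@duration), pct(@duration, 90), pct(@duration, 99),
        avg(@initDuration), max(@initDuration)
```

`@initDuration` is only present on cold-start REPORT lines, so its stats isolate initialization cost from steady-state execution time.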
With the serverless-trace target (interestingly, the Lambda zip is larger here at 3.7 MB compressed, which is not what I would expect; node_modules takes ~22 MB uncompressed and the index.js page is ~100 kB), I got these results:

✓ check_failure_rate: 0.00% ✓ 0 ✗ 426
  checks: 100.00% ✓ 854 ✗ 0
  data_received: 1.3 MB 42 kB/s
  data_sent: 63 kB 2.1 kB/s
  http_req_blocked: avg=6.87ms min=0s med=0s max=162.3ms p(90)=53.37ms p(95)=54.91ms
  http_req_connecting: avg=1.95ms min=0s med=0s max=19.62ms p(90)=15ms p(95)=16.57ms
✓ http_req_duration: avg=187.7ms min=27.32ms med=37.11ms max=1.5s p(90)=1.31s p(95)=1.38s
  http_req_receiving: avg=105.9µs min=40µs med=74µs max=2.05ms p(90)=103.4µs p(95)=128.69µs
  http_req_sending: avg=57.1µs min=21µs med=48µs max=1.17ms p(90)=82.4µs p(95)=109.69µs
  http_req_tls_handshaking: avg=4.86ms min=0s med=0s max=147.46ms p(90)=35.44ms p(95)=39.04ms
  http_req_waiting: avg=187.54ms min=27.23ms med=36.99ms max=1.5s p(90)=1.31s p(95)=1.38s
  http_reqs: 427 14.233232/s
  iteration_duration: avg=3.71s min=1.66s med=3.8s max=6.02s p(90)=4.86s p(95)=5.03s
  iterations: 377 12.566577/s
  vus: 50 min=50 max=50
  vus_max: 50 min=50 max=50
Similarly, in the Lambda logs I was seeing ~1000 ms durations with init durations of ~150 ms.
I ran each of the k6 tests a few times just to be sure, and the serverless-trace target does indeed have higher times on cold starts (1.3-1.5 s vs. 700 ms request time, 1 s vs. 350 ms Lambda execution time). But Lambda init was about the same at ~150 ms, so I don’t think the Lambda zip size is affecting it much.

I’m not a JS expert (I normally work in Python/Java), but I suspect:
- node_modules seems too large and could be reduced further; also, could it be bundled into one file?
- The index.js file is minimal (100 kB) and needs to do a bunch of require() calls, which could add to JS execution time.