Zombie Process problem.
Hello,
Recently we discussed this problem in issues #1823 and #1791.
Environment:
- Puppeteer Version: 1.0.
- Chrome Version: 64.0.3282.71 (https://github.com/adieuadieu/serverless-chrome/releases/tag/v1.0.0-34)
- Platform / OS version: AWS Lambda
- Node.js version: 6.10 (https://aws.amazon.com/about-aws/whats-new/2017/03/aws-lambda-supports-node-js-6-10/)
Use Case:
We are using Puppeteer on AWS Lambda. We take a screenshot of a given HTML template, upload it to S3, and use that image for future requests. The service handles over 100 million requests each month, so every process needs to be atomic and immutable. (AWS Lambda has disk and process limits.)
Example Code:
const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  args: ['--disable-gpu', '--no-sandbox', '--single-process',
         '--disable-web-security', '--disable-dev-profile']
});
const page = await browser.newPage();
await page.goto('https://s3bucket.com/markup/a.html');
const response = await page.screenshot({ type: 'jpeg', quality: 95 });
await browser.close();
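For context, the upload step mentioned above looks roughly like the following. This is a minimal sketch, assuming the aws-sdk v2 client; the bucket name and key are hypothetical placeholders, not our production values:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Upload the screenshot buffer returned by page.screenshot().
// 'screenshots-bucket' and the key below are hypothetical placeholders.
await s3.putObject({
  Bucket: 'screenshots-bucket',
  Key: 'markup/a.jpeg',
  Body: response,
  ContentType: 'image/jpeg'
}).promise();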
Problem
When we use the example code, we get a disk error from AWS Lambda.
Example /tmp folder:
2018-01-12T14:55:38.553Z a6ef3454-f7a8-11e7-be0f-17f405d5a180 start stdout: total 226084
drwx------ 3 sbx_user1067 479 4096 Jan 12 14:55 .
drwxr-xr-x 21 root root 4096 Jan 12 10:53 ..
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:33 core.headless-chromi.129
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:15 core.headless-chromi.131
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:49 core.headless-chromi.135
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:52 core.headless-chromi.137
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:50 core.headless-chromi.138
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:51 core.headless-chromi.14
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:49 core.headless-chromi.15
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:36 core.headless-chromi.169
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:15 core.headless-chromi.174
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:52 core.headless-chromi.178
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:50 core.headless-chromi.180
drwx------ 3 sbx_user1067 479 4096 Jan 12 14:14 .pki
When we investigated these files, we realized they are core dumps. We removed these files after the process completed.
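The cleanup we run between invocations is roughly equivalent to the sketch below (a minimal sketch, assuming the dumps always land in /tmp and keep the core.* prefix shown above):

const fs = require('fs');
const path = require('path');

// Delete leftover Chrome core dumps from /tmp after each invocation.
// Assumes the files keep the "core.headless-chromi.*" naming seen above.
fs.readdirSync('/tmp')
  .filter((name) => name.startsWith('core.'))
  .forEach((name) => fs.unlinkSync(path.join('/tmp', name)));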
When we monitored the process list, we saw zombie processes. The number of zombie Chrome processes keeps growing, and we can't kill them. AWS Lambda has a maximum process limit (1024 processes), so we eventually hit that limit.
483 1 3.3 1.6 1226196 65408 ? Ssl 22:07 0:05 /var/lang/bin/node --max-old-space-size=870 --max-semi-space-size=54 --max-executable-size=109 --expose-gc /var/runtime/node_modules/awslambda/index.js
483 22 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 73 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 119 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 166 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 214 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 262 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 307 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 353 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 1915 0.0 0.0 0 0 ? Z 22:09 0:00 [sh] <defunct>
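To see how close an invocation gets to that 1024-process limit, we can count processes from inside the function. A minimal sketch using ps; the exact columns available depend on the ps shipped in the Lambda runtime:

const child_process = require('child_process');

// Count all processes and zombie ("Z" state) processes visible in the sandbox.
const out = child_process.execSync('ps -eo pid,stat,comm').toString();
const rows = out.trim().split('\n').slice(1); // drop the header row
const zombies = rows.filter((row) => row.trim().split(/\s+/)[1].startsWith('Z'));
console.log(`processes: ${rows.length}, zombies: ${zombies.length}`);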
We couldn't use dumb-init on Lambda, because Lambda already has its own init system.
How did we fix it? (very hacky method)
We used browser.disconnect() instead of browser.close(), and we managed the Chrome process manually, e.g. by killing it ourselves.
Example Code:
const child_process = require('child_process');

// PID of the Chrome process that puppeteer.launch() spawned.
const pid = browser.process().pid;

browser.on('disconnected', () => {
  console.log('sleeping 100ms'); // sleep to eliminate race condition
  setTimeout(function () {
    console.log(`Browser Disconnected... Process Id: ${pid}`);
    child_process.exec(`kill -9 ${pid}`, (error, stdout, stderr) => {
      if (error) {
        console.log(`Process Kill Error: ${error}`);
      }
      console.log(`Process Kill Success. stdout: ${stdout} stderr: ${stderr}`);
    });
  }, 100);
});
At first we didn't use the setTimeout; we killed the process immediately after the browser disconnected, and we got the following error:
Error: read ECONNRESET at exports._errnoException (util.js:1018:11) at TCP.onread (net.js:568:26)
I think this looks like a Puppeteer process-management problem. When we used this method, we didn't receive any Puppeteer-related errors. How can we fix it?
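For reference, a variation on this workaround (not the code we actually deployed) would be to wait for the Chrome child process to exit instead of sleeping a fixed 100 ms, and only send SIGKILL if it never exits. A minimal sketch, assuming browser.process() returns the ChildProcess spawned by puppeteer.launch():

// Grab the Chrome child process before disconnecting.
const chrome = browser.process();

browser.disconnect();

// If Chrome has not exited after one second, force-kill it by PID.
const timer = setTimeout(() => {
  try {
    process.kill(chrome.pid, 'SIGKILL');
  } catch (err) {
    console.log(`Process Kill Error: ${err}`);
  }
}, 1000);

chrome.once('exit', () => {
  clearTimeout(timer);
  console.log('Chrome exited cleanly');
});

Like the kill -9 approach above, this only targets the top-level Chrome process, so it cannot reap grandchildren that Chrome itself leaves behind.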
Thanks.
Top GitHub Comments
I've overcome these issues by adding flags for headless Chrome. I think the child processes are orphaned when the parent is killed, and that leads to the zombies. With this, I only get one process and it works pretty well.
@bahattincinic - thanks, I've tried your method of disconnecting + killing the process, and while it does kill the “main” process returned by puppeteer.launch(), each run seems to leave another defunct zombie with a PID that is different than the killed one… What's worse, when I run ps aux right after puppeteer.launch(), aside from the “main” process, there is already one that's defunct, right away, before running any code or trying to kill anything. I've tried sending a kill -15, hoping that will allow the main process to clean up its children, but -15 or -9 doesn't make any difference, so I'm still stuck with an ever-growing list of zombies and rising memory…

Do you have any advice on how you managed to keep it clean of those as well (if you had a similar experience)? I'm also running on Lambda, same args used, puppeteer 1.1.1. Thanks!