Issue running scrapy-playwright with --single-process flag in chromium and AWS Lambda

Hello! Thank you for your hard work on this Scrapy plugin. It has been really valuable!

I am having some issues running scrapy-playwright in an AWS Lambda Docker container. Running Chromium and Playwright in Lambda is a pretty well-understood problem, but for some reason the combination of Playwright and Scrapy in a Docker container does not seem to work well.

I am getting the following error:

playwright._impl._api_types.Error: Target page, context or browser has been closed
2021-10-14 22:24:57 [scrapy.core.scraper] ERROR: Error downloading <GET https://example.com>

I have recreated a simple example here: https://github.com/maxneuvians/scrapy-playwright-lambda-debug

While working on this example, I found that Chromium crashes if I pass the '--single-process' flag. If I comment it out, Scrapy runs fine. If you want to test it, you can do the following:

docker build -t sp-test -f Dockerfile .
docker run -p 9000:8080 sp-test
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'

The Dockerfile builds the image the way AWS Lambda expects to invoke it. If you send the curl request, you will see Scrapy run successfully. Uncomment the flag, rebuild, and run again, and it throws the error.
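For repeated A/B runs, the curl invocation can also be scripted. The sketch below is a minimal example using only the Python standard library; it assumes the container from the commands above is running on port 9000. The helper names (build_invoke_request, invoke) and the explicit JSON Content-Type header are illustrative additions, not part of the linked repository.

```python
import json
import urllib.request
from typing import Optional

# Endpoint of the Lambda runtime interface emulator exposed by
# `docker run -p 9000:8080 sp-test` (same URL as the curl command).
INVOKE_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_invoke_request(payload: dict, url: str = INVOKE_URL) -> urllib.request.Request:
    """Build a POST request whose body is the JSON-encoded event."""
    body = json.dumps(payload).encode("utf-8")
    # Explicit JSON Content-Type is an illustrative choice; curl -d sends a
    # form-encoded Content-Type by default.
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def invoke(payload: Optional[dict] = None, timeout: float = 120.0) -> str:
    """POST the event to the running container and return the raw response body."""
    request = build_invoke_request(payload if payload is not None else {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the container running, invoke() returns whatever the handler sent back as a raw string, so the same call can be repeated against the build with the flag commented out and the build with it enabled.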

Unfortunately, conventional wisdom on the internet seems to imply that '--single-process' is a mandatory flag for headless Chromium on AWS Lambda (see Puppeteer and other implementations, e.g. https://github.com/JupiterOne/playwright-aws-lambda/blob/main/src/chromium.ts#L63).
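In scrapy-playwright, Chromium flags like this one are passed through the PLAYWRIGHT_LAUNCH_OPTIONS setting. The settings.py sketch below shows one way to wire that up. The extra flags in BASE_CHROMIUM_ARGS are ones commonly seen in Lambda-oriented Chromium setups (they are not copied from the linked chromium.ts), and the CHROMIUM_SINGLE_PROCESS environment variable is a hypothetical toggle so both variants can be tried with docker run -e instead of editing and rebuilding.

```python
import os

# scrapy-playwright wiring: route http/https through the Playwright download
# handler and use the asyncio-based Twisted reactor.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Illustrative flags often used for headless Chromium in Lambda-style sandboxes.
BASE_CHROMIUM_ARGS = [
    "--no-sandbox",
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-zygote",
]


def chromium_args(single_process: bool) -> list:
    """Return the Chromium launch args, optionally adding --single-process."""
    args = list(BASE_CHROMIUM_ARGS)
    if single_process:
        args.append("--single-process")
    return args


# Hypothetical toggle: set CHROMIUM_SINGLE_PROCESS=1 to add the flag.
SINGLE_PROCESS = os.environ.get("CHROMIUM_SINGLE_PROCESS", "0") == "1"

PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,
    "args": chromium_args(SINGLE_PROCESS),
}
```

Keeping the flag behind a toggle means the crashing and working configurations come from the same image, which helps separate a flag problem from a build problem.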

Let me know if you have any ideas that could help!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

maxneuvians commented, Oct 16, 2021 (1 reaction)

Last update for future searchers:

I had to run playwright install chromium after I had installed all the dependencies: playwright, scrapy, and scrapy-playwright. If I ran it after installing playwright but before scrapy, it would install chromium-907428. If I installed Chromium after all the dependencies, it would install chromium-920619. I think it also set the permissions properly.
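To confirm which Chromium build actually ended up in an image, you can list the chromium-<revision> directories Playwright created. Each Playwright release pins a specific browser revision, so it is worth checking that the revision on disk is the one the final Playwright install expects. The sketch assumes Playwright's default Linux cache location (~/.cache/ms-playwright) unless PLAYWRIGHT_BROWSERS_PATH is set; chromium_revisions is a hypothetical helper, not a Playwright API.

```python
import os
import re
from pathlib import Path
from typing import List, Optional

# Playwright's default browser cache on Linux. PLAYWRIGHT_BROWSERS_PATH overrides
# it; the special value "0" (browsers inside the package) is not handled here.
DEFAULT_LINUX_CACHE = Path.home() / ".cache" / "ms-playwright"

_CHROMIUM_DIR = re.compile(r"^chromium-(\d+)$")


def browsers_path() -> Path:
    """Return the directory Playwright is assumed to install browsers into."""
    override = os.environ.get("PLAYWRIGHT_BROWSERS_PATH")
    return Path(override) if override else DEFAULT_LINUX_CACHE


def chromium_revisions(root: Optional[Path] = None) -> List[int]:
    """Return installed Chromium revisions under root, sorted ascending."""
    root = root if root is not None else browsers_path()
    if not root.is_dir():
        return []
    revisions = []
    for entry in root.iterdir():
        match = _CHROMIUM_DIR.match(entry.name)
        # Only directories named chromium-<digits> count as installs.
        if match and entry.is_dir():
            revisions.append(int(match.group(1)))
    return sorted(revisions)
```

Running this as a final build step (or from the handler) would show whether the image contains, for example, chromium-907428, chromium-920619, or both.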

maxneuvians commented, Oct 15, 2021 (1 reaction)

It looks like installing Playwright as part of the Docker build process causes a failure with the flag. When I manually run playwright install chromium in a shell inside the container, it installs and the flag works. So I no longer think this is directly related to scrapy-playwright; it seems to be something to do with permissions on the executable. Thank you for the rubber ducking!
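If permissions are the suspect, one quick check is whether the Chromium binaries are executable by users other than the one that installed them. A plausible reason this matters (an assumption, not confirmed in this thread) is that Lambda runs the function as a non-root user, while Docker build steps usually run as root. The helper below is hypothetical, and the binary names it looks for (chrome, headless_shell) are assumptions that may differ between Playwright versions.

```python
import stat
from pathlib import Path
from typing import List, Tuple


def missing_other_exec(
    root: Path, names: Tuple[str, ...] = ("chrome", "headless_shell")
) -> List[Path]:
    """Return files under root named in `names` that lack the others-execute bit (o+x)."""
    flagged = []
    for path in root.rglob("*"):
        if path.is_file() and path.name in names:
            mode = path.stat().st_mode
            # A non-root runtime user typically falls into "others".
            if not mode & stat.S_IXOTH:
                flagged.append(path)
    return sorted(flagged)
```

If it reports anything, adding something like chmod -R a+rX on the browsers directory in the Dockerfile is one option; reinstalling Chromium as the last step, as described above, is what worked in this thread.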


Top Results From Across the Web

Obscure errors on AWS Lambda with container image using ...
I am not setting the --security-opt flag when running the container. Not sure if Lambda allows to set that. I'm setting the --disable-gpu...
Scraping without JavaScript using Chromium on AWS Lambda
Scraping without JavaScript using Chromium on AWS Lambda: The Novel ... Ensure that Chromium is started with these flags: --single-process ...
Running End-to-End Tests with Playwright on AWS Lambda
Luckily with Chromium you can use a launch flag called --single-process to disable the use of multiple processes. "In this model, both the ......
Troubleshoot deployment issues in Lambda
This issue can occur when you specify an Amazon S3 object in a call to UpdateFunctionCode, or use the package and deploy commands...
headless chrome & aws lambda - Google Groups
Has anyone tried to run headless chrome inside aws lambda function? ... The problem with this is that, with the --single-process flag, ...