Issue running scrapy-playwright with --single-process flag in chromium and AWS Lambda
Hello! Thank you for your hard work on this Scrapy plugin - it has been really valuable!
I am having some issues running scrapy-playwright in an AWS Lambda Docker container. Running chromium and playwright in Lambda is a pretty well understood problem, but for some reason the combination of playwright + scrapy in a Docker container does not seem to work well.
I am getting the following error:
playwright._impl._api_types.Error: Target page, context or browser has been closed
2021-10-14 22:24:57 [scrapy.core.scraper] ERROR: Error downloading <GET https://example.com>
I have recreated a simple example here: https://github.com/maxneuvians/scrapy-playwright-lambda-debug
In working on this example I have found that chromium crashes if I pass the '--single-process' flag. If I comment it out, scrapy runs fine. If you want to test it you can do the following:
docker build -t sp-test -f Dockerfile .
docker run -p 9000:8080 sp-test
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
The Docker image is built the way AWS Lambda expects to invoke it. If you send the curl request, you will see scrapy executing successfully. Uncomment the flag, rebuild and run, and it throws the error.
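For readers who want to toggle the flag without editing the file by hand, here is a minimal sketch of how the launch arguments could be passed to scrapy-playwright. The setting names (DOWNLOAD_HANDLERS, TWISTED_REACTOR, PLAYWRIGHT_BROWSER_TYPE, PLAYWRIGHT_LAUNCH_OPTIONS) come from scrapy-playwright's documented configuration; the helper build_playwright_settings is hypothetical and is not taken from the linked repository, whose actual settings may differ.

```python
from typing import Any, Dict, List


def build_playwright_settings(single_process: bool) -> Dict[str, Any]:
    """Return Scrapy settings for scrapy-playwright (hypothetical helper).

    When single_process is True, chromium is launched with the
    '--single-process' flag discussed in this issue.
    """
    args: List[str] = ["--single-process"] if single_process else []
    handler = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"
    return {
        # Route http and https requests through the Playwright handler.
        "DOWNLOAD_HANDLERS": {"http": handler, "https": handler},
        # scrapy-playwright requires the asyncio-based Twisted reactor.
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "PLAYWRIGHT_BROWSER_TYPE": "chromium",
        # These options are forwarded to the browser launch call.
        "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": True, "args": args},
    }
```

The returned dictionary could be used as a spider's custom_settings or passed to CrawlerProcess, so that the same image can be tested with and without the flag.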
Unfortunately, conventional wisdom on the internet seems to imply that '--single-process' is a mandatory flag for headless chromium on AWS Lambda (looking at puppeteer and other implementations, e.g. https://github.com/JupiterOne/playwright-aws-lambda/blob/main/src/chromium.ts#L63).
Let me know if you have any ideas that could help!
Top GitHub Comments
Last update for future searchers:
I had to run
playwright install chromium
after I had installed all dependencies: playwright, scrapy, and scrapy-playwright. If I ran it after installing playwright but not scrapy, it would install chromium-907428. If I installed chromium after all the dependencies, it would install chromium-920619. I think it also set the permissions properly.
It looks like installing playwright as part of the docker build process causes a failure with the flag. When I manually run
playwright install chromium
in a shell for the container, it installs and the flag works. So I don’t think this has anything to do with scrapy-playwright directly anymore, but rather with permissions on the executable. Thank you for the rubber ducking!
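Since the fix came down to which chromium build was installed and whether its executable had the right permissions, a small check like the following could be run inside the container to compare a build-time install against a manual one. This is only a sketch under stated assumptions: it assumes Playwright's default Linux cache location (~/.cache/ms-playwright, or the PLAYWRIGHT_BROWSERS_PATH environment variable when set) and the chromium-<revision>/chrome-linux/chrome layout, both of which may differ in other Playwright versions or in a Lambda image. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Tuple


def default_browsers_root() -> Path:
    """Guess where Playwright stores browsers (assumption: Linux defaults)."""
    override = os.environ.get("PLAYWRIGHT_BROWSERS_PATH")
    if override and override != "0":
        return Path(override)
    return Path.home() / ".cache" / "ms-playwright"


def find_chromium_builds(root: Path) -> List[Tuple[str, Path, bool]]:
    """List (build name, executable path, is_executable) for each chromium build."""
    results: List[Tuple[str, Path, bool]] = []
    if not root.is_dir():
        return results
    for build_dir in sorted(root.glob("chromium-*")):
        # Assumed layout of a Linux chromium download.
        exe = build_dir / "chrome-linux" / "chrome"
        runnable = exe.is_file() and os.access(exe, os.X_OK)
        results.append((build_dir.name, exe, runnable))
    return results


if __name__ == "__main__":
    for name, exe, runnable in find_chromium_builds(default_browsers_root()):
        status = "executable" if runnable else "missing or not executable"
        print(f"{name}: {exe} ({status})")
```

Running it once after the docker build and once after a manual install would show whether the revision (for example chromium-907428 versus chromium-920619) or the executable bit differs between the two.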