Allocation failed - JavaScript heap out of memory
Hi,
This issue is related to #18. The error still occurs with scrapy-playwright 0.0.4: the Scrapy script crawled about 2,500 of the 10k domains from the Majestic list and then crashed with the final error JavaScript heap out of memory, so I think this is a bug.
My main code:
import scrapy
from scrapy_playwright.page import PageCoroutine

# Inside the spider's request-generating method:
domain = self.get_domain(url=url)
context_name = domain.replace(".", "_")
yield scrapy.Request(
    url=url,
    meta={
        "playwright": True,
        "playwright_page_coroutines": {
            # "path" must be a keyword argument: Page.screenshot() takes
            # keyword-only arguments in the Python Playwright API.
            "screenshot": PageCoroutine("screenshot", path=domain + ".png"),
        },
        # Create a new context per domain
        "playwright_context": context_name,
    },
)
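A minimal sketch, assuming the playwright_include_page meta key is available in the installed scrapy-playwright version and that no later request reuses the same context, of closing each per-domain context from the spider callback so contexts do not accumulate in the Node driver:

# Sketch only (inside the same spider class): ask scrapy-playwright to
# hand the Page object to the callback, then close the page and its
# per-domain context explicitly once the response has been handled.
def start_requests(self):
    for url in self.start_urls:
        domain = self.get_domain(url=url)
        yield scrapy.Request(
            url,
            meta={
                "playwright": True,
                "playwright_context": domain.replace(".", "_"),
                "playwright_include_page": True,  # assumed supported here
            },
        )

async def parse(self, response):
    page = response.meta["playwright_page"]
    await page.close()
    await page.context.close()  # one context per domain, so safe to drop

Because each domain gets exactly one context, closing it in the callback releases that domain's share of the driver's heap as soon as its response is processed.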
My env:
Python 3.8.10
Scrapy 2.5.0
playwright 1.12.1
scrapy-playwright 0.0.4
The error details:
2021-07-17 14:47:48 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.costco.com/>: HTTP status code is not handled or not allowed
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: 0xa18150 node::Abort() [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
2: 0xa1855c node::OnFatalError(char const*, char const*) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
3: 0xb9715e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
4: 0xb974d9 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
5: 0xd54755 [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
6: 0xd650a8 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
7: 0xd2bd9d v8::internal::Factory::NewFixedArrayWithFiller(v8::internal::RootIndex, int, v8::internal::Object, v8::internal::AllocationType) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
8: 0xd2be90 v8::internal::Handle<v8::internal::FixedArray> v8::internal::Factory::NewFixedArrayWithMap<v8::internal::FixedArray>(v8::internal::RootIndex, int, v8::internal::AllocationType) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
9: 0xf5abd0 v8::internal::OrderedHashTable<v8::internal::OrderedHashMap, 2>::Allocate(v8::internal::Isolate*, int, v8::internal::AllocationType) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
10: 0xf5ac81 v8::internal::OrderedHashTable<v8::internal::OrderedHashMap, 2>::Rehash(v8::internal::Isolate*, v8::internal::Handle<v8::internal::OrderedHashMap>, int) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
11: 0xf5b2cb v8::internal::OrderedHashTable<v8::internal::OrderedHashMap, 2>::EnsureGrowable(v8::internal::Isolate*, v8::internal::Handle<v8::internal::OrderedHashMap>) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
12: 0x1051b38 v8::internal::Runtime_MapGrow(int, unsigned long*, v8::internal::Isolate*) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
13: 0x140a8f9 [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
Aborted (core dumped)
2021-07-17 14:48:34 [scrapy.extensions.logstats] INFO: Crawled 2533 pages (at 15 pages/min), scraped 2362 items (at 12 items/min)
Temporary fix: in handler.py I replaced line 166 with await page.context.close()
to close the current context, since my script uses one context per domain. This fixes the Allocation failed - JavaScript heap out of memory
error, and the Scrapy script crawled all 10k domains, but the success rate was only about 72%, compared with about 85% without the change. Also, with the new code in place, new errors appeared (a more defensive variant is sketched after the tracebacks below):
2021-07-17 15:04:59 [scrapy.core.scraper] ERROR: Error downloading <GET http://usatoday.com>
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/ubuntu/.local/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
return (yield download_func(request=request, spider=spider))
File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 824, in adapt
extracted = result.result()
File "/home/ubuntu/python/scrapy-playwright/scrapy_playwright/handler.py", line 138, in _download_request
result = await self._download_request_with_page(request, page)
File "/home/ubuntu/python/scrapy-playwright/scrapy_playwright/handler.py", line 149, in _download_request_with_page
response = await page.goto(request.url)
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/async_api/_generated.py", line 6006, in goto
await self._async(
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_page.py", line 429, in goto
return await self._main_frame.goto(**locals_to_params(locals()))
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_frame.py", line 117, in goto
await self._channel.send("goto", locals_to_params(locals()))
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 36, in send
return await self.inner_send(method, params, False)
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 54, in inner_send
result = next(iter(done)).result()
playwright._impl._api_types.Error: Navigation failed because page was closed!
...
2021-07-17 19:31:15 [asyncio] ERROR: Task exception was never retrieved
future: <Task finished name='Task-38926' coro=<Route.continue_() done, defined at /home/ubuntu/.local/lib/python3.8/site-packages/playwright/async_api/_generated.py:544> exception=Error('Target page, context or browser has been closed')>
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/async_api/_generated.py", line 582, in continue_
await self._async(
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_network.py", line 207, in continue_
await self._channel.send("continue", cast(Any, overrides))
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 36, in send
return await self.inner_send(method, params, False)
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 54, in inner_send
result = next(iter(done)).result()
playwright._impl._api_types.Error: Target page, context or browser has been closed
...
2021-07-18 03:51:34 [scrapy.core.scraper] ERROR: Error downloading <GET http://bbc.co.uk>
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/ubuntu/.local/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
return (yield download_func(request=request, spider=spider))
File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 824, in adapt
extracted = result.result()
File "/home/ubuntu/python/scrapy-playwright/scrapy_playwright/handler.py", line 138, in _download_request
result = await self._download_request_with_page(request, page)
File "/home/ubuntu/python/scrapy-playwright/scrapy_playwright/handler.py", line 165, in _download_request_with_page
body = (await page.content()).encode("utf8")
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/async_api/_generated.py", line 5914, in content
await self._async("page.content", self._impl_obj.content())
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_page.py", line 412, in content
return await self._main_frame.content()
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_frame.py", line 325, in content
return await self._channel.send("content")
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 36, in send
return await self.inner_send(method, params, False)
File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 54, in inner_send
result = next(iter(done)).result()
playwright._impl._api_types.Error: Execution context was destroyed, most likely because of a navigation.
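These "page/context was closed" failures look like a race: the context is torn down while other requests for the same domain are still in flight. A minimal sketch of a more defensive variant of the same handler.py change (hypothetical, not the project's actual fix) closes the page first and drops its context only when no other pages remain open in it, using Playwright's BrowserContext.pages list:

# Hypothetical variant of the edit around handler.py line 166: only drop
# the context when the page being closed was its last open page, so
# in-flight requests for the same domain keep their context alive.
await page.close()
if not page.context.pages:  # no other open pages in this context
    await page.context.close()

A request that is still navigating keeps its page open, so the context survives until the last page for that domain finishes.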
Top GitHub Comments
I’m not sure. When I run Scrapy in Celery as a separate process, it doesn’t log to the file when it crashes. Something is still going on, though: occasionally it stops and keeps emitting the same page/item count indefinitely without terminating, and I have a separate problem where it doesn’t kill the Chrome process correctly. I’ll investigate more and open another issue for that if I find anything. (A week of use spawned a quarter of a million zombie processes…)
@xanrag Hi, are you still getting the “Aborted (core dumped)” error?
I added
export NODE_OPTIONS=--max-old-space-size=8192
to my ~/.profile file and ran the Scrapy script. However, the Aborted (core dumped)
error still occurs once scrapy-playwright has crawled more than 10k URLs, sometimes around 100k URLs.
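One possible pitfall (an assumption, not something confirmed in this thread): NODE_OPTIONS from ~/.profile only reaches Playwright's Node driver if the shell that launches Scrapy actually sourced that file. A minimal sketch that sets the variable from the project itself, e.g. near the top of settings.py, before scrapy-playwright spawns the driver process:

import os

# Raise the V8 heap limit for the Node process that backs Playwright.
# Child processes inherit the environment, so this must run before the
# Playwright driver is launched. The 8192 MB value mirrors the shell export.
os.environ.setdefault("NODE_OPTIONS", "--max-old-space-size=8192")

Even with a larger heap, memory use still grows with the number of open contexts, so raising the limit only postpones the crash unless contexts are also closed.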