
Allocation failed - JavaScript heap out of memory


Hi,

This issue is related to #18.

The error still occurs with scrapy-playwright 0.0.4. The Scrapy script crawled about 2,500 of the 10k domains from the Majestic list and then crashed with a final JavaScript heap out of memory error, so I think this is a bug.

My main code:

import scrapy
from scrapy_playwright.page import PageCoroutine

domain = self.get_domain(url=url)

context_name = domain.replace('.', '_')
yield scrapy.Request(
    url=url,
    meta={
        "playwright": True,
        "playwright_page_coroutines": {
            "screenshot": PageCoroutine("screenshot", path=domain + ".png"),
        },
        # Create a new context per domain
        "playwright_context": context_name,
    },
)
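
The snippet assumes a get_domain helper that is not shown in the issue; a minimal sketch of what it might do, using only the standard library (the name and exact behavior are an assumption):

from urllib.parse import urlsplit

def get_domain(self, url):
    # Hypothetical helper: keep only the host part of the URL,
    # e.g. "https://www.costco.com/" -> "www.costco.com"
    return urlsplit(url).netloc.split(":")[0]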

My env:

Python 3.8.10
Scrapy 2.5.0
playwright 1.12.1
scrapy-playwright 0.0.4

Error details:

2021-07-17 14:47:48 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.costco.com/>: HTTP status code is not handled or not allowed
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: 0xa18150 node::Abort() [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 2: 0xa1855c node::OnFatalError(char const*, char const*) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 3: 0xb9715e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 4: 0xb974d9 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 5: 0xd54755  [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 6: 0xd650a8 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 7: 0xd2bd9d v8::internal::Factory::NewFixedArrayWithFiller(v8::internal::RootIndex, int, v8::internal::Object, v8::internal::AllocationType) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 8: 0xd2be90 v8::internal::Handle<v8::internal::FixedArray> v8::internal::Factory::NewFixedArrayWithMap<v8::internal::FixedArray>(v8::internal::RootIndex, int, v8::internal::AllocationType) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 9: 0xf5abd0 v8::internal::OrderedHashTable<v8::internal::OrderedHashMap, 2>::Allocate(v8::internal::Isolate*, int, v8::internal::AllocationType) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
10: 0xf5ac81 v8::internal::OrderedHashTable<v8::internal::OrderedHashMap, 2>::Rehash(v8::internal::Isolate*, v8::internal::Handle<v8::internal::OrderedHashMap>, int) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
11: 0xf5b2cb v8::internal::OrderedHashTable<v8::internal::OrderedHashMap, 2>::EnsureGrowable(v8::internal::Isolate*, v8::internal::Handle<v8::internal::OrderedHashMap>) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
12: 0x1051b38 v8::internal::Runtime_MapGrow(int, unsigned long*, v8::internal::Isolate*) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
13: 0x140a8f9  [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
Aborted (core dumped)
2021-07-17 14:48:34 [scrapy.extensions.logstats] INFO: Crawled 2533 pages (at 15 pages/min), scraped 2362 items (at 12 items/min)

Temporary fix: because my script uses one context per domain, I replaced line 166 of handler.py with await page.context.close(), so that the current context is closed after its page is downloaded. This fixed the Allocation failed - JavaScript heap out of memory error and the script crawled all 10k domains, but the success rate dropped to about 72%, versus about 85% without the added code.
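
For context, here is a minimal sketch of the idea behind that one-line change. It is not the actual scrapy-playwright handler code: the method shape is taken from the tracebacks below, and everything else is an assumption.

# Hedged sketch, not the real handler: close the per-domain context
# once its page has been downloaded, so idle contexts stop piling up
# inside the Node.js driver process.
async def _download_request_with_page(self, request, page):
    response = await page.goto(request.url)
    body = (await page.content()).encode("utf8")
    await page.close()
    await page.context.close()  # the replacement for line 166 described above
    return body, response

The downside is a race: closing a context tears down any other in-flight pages that share it, which is consistent with the “page was closed” errors below. After adding the change, the new errors were: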

2021-07-17 15:04:59 [scrapy.core.scraper] ERROR: Error downloading <GET http://usatoday.com>
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 824, in adapt
    extracted = result.result()
  File "/home/ubuntu/python/scrapy-playwright/scrapy_playwright/handler.py", line 138, in _download_request
    result = await self._download_request_with_page(request, page)
  File "/home/ubuntu/python/scrapy-playwright/scrapy_playwright/handler.py", line 149, in _download_request_with_page
    response = await page.goto(request.url)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/async_api/_generated.py", line 6006, in goto
    await self._async(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_page.py", line 429, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_frame.py", line 117, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 36, in send
    return await self.inner_send(method, params, False)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 54, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Navigation failed because page was closed!

...

2021-07-17 19:31:15 [asyncio] ERROR: Task exception was never retrieved
future: <Task finished name='Task-38926' coro=<Route.continue_() done, defined at /home/ubuntu/.local/lib/python3.8/site-packages/playwright/async_api/_generated.py:544> exception=Error('Target page, context or browser has been closed')>
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/async_api/_generated.py", line 582, in continue_
    await self._async(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_network.py", line 207, in continue_
    await self._channel.send("continue", cast(Any, overrides))
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 36, in send
    return await self.inner_send(method, params, False)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 54, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Target page, context or browser has been closed

...

2021-07-18 03:51:34 [scrapy.core.scraper] ERROR: Error downloading <GET http://bbc.co.uk>
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 824, in adapt
    extracted = result.result()
  File "/home/ubuntu/python/scrapy-playwright/scrapy_playwright/handler.py", line 138, in _download_request
    result = await self._download_request_with_page(request, page)
  File "/home/ubuntu/python/scrapy-playwright/scrapy_playwright/handler.py", line 165, in _download_request_with_page
    body = (await page.content()).encode("utf8")
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/async_api/_generated.py", line 5914, in content
    await self._async("page.content", self._impl_obj.content())
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_page.py", line 412, in content
    return await self._main_frame.content()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_frame.py", line 325, in content
    return await self._channel.send("content")
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 36, in send
    return await self.inner_send(method, params, False)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 54, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Execution context was destroyed, most likely because of a navigation.

Top GitHub Comments

xanrag commented on Sep 14, 2021 (2 reactions):

> @xanrag Hi, did you get “Aborted (core dumped)” error anymore?

I’m not sure. When I run Scrapy in Celery as a separate process, it doesn’t log to the file when it crashes. Something is still going on, though: occasionally it stops and keeps putting out the same page/item count indefinitely without finishing. I also have another issue where it doesn’t kill the Chrome processes correctly; I’ll investigate more and open a separate issue if I find anything. (A week of use spawned a quarter of a million zombie processes…)
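
A quick way to check for the zombie processes mentioned above, as a sketch (psutil is an assumption; any process-listing tool works):

import psutil

# Count defunct (zombie) Chrome processes left behind by the crawler.
zombies = [
    p for p in psutil.process_iter(["name", "status"])
    if p.info["status"] == psutil.STATUS_ZOMBIE
    and "chrome" in (p.info["name"] or "").lower()
]
print(len(zombies), "zombie Chrome processes")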

phongtnit commented on Sep 8, 2021 (1 reaction):

> @xanrag Hi, did you get “Aborted (core dumped)” error anymore?

I added export NODE_OPTIONS=--max-old-space-size=8192 to my ~/.profile and reran the Scrapy script. However, the Aborted (core dumped) error still occurs once scrapy-playwright has crawled more than 10k URLs, sometimes around 100k URLs.
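
For reference, that workaround raises the V8 heap limit of the Node.js driver process that Playwright spawns, which inherits the environment of the Python process. The same effect can be sketched from inside the script, assuming it runs before the driver is launched (the 8192 MB figure comes from the comment above):

import os

# Must run before scrapy-playwright launches the Playwright Node.js
# driver; the value is the V8 old-space limit in megabytes.
os.environ.setdefault("NODE_OPTIONS", "--max-old-space-size=8192")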
