out of memory when using cheerio in crawler

See original GitHub issue

Hi, I’m using cheerio to parse HTML pages in a simple crawler, shown below. The process quickly runs out of memory after processing tens of pages, even though my computer has more than 4 GB of free memory. I notice that cheerio has a load operation. Do I need to unload the page explicitly, or otherwise tell cheerio to release the memory once processing finishes?

var cheerio = require('cheerio');
var request = require('request');


// Fetch a single listing page and print the fields of interest.
function parseSpecificRoom(url)
{
    request({uri: url}, function(err, resp, body) {
        // body is undefined when the request fails, so bail out early.
        if (err) {
            console.error('request failed: ' + url + ': ' + err.message);
            return;
        }
        var $ = cheerio.load(body);
        var price = $('.house-price').text();
        var pay = $('.pay-method').text();
        var type = $('.house-type').text().replace(/\s/g, '');
        var location = $('.xiaoqu').text().replace(/\s/g, '');
        var phone = $('.tel-num').text().replace(/\s/g, '');
        console.log(price + ', ' + pay + ', ' + type + ', ' + location + ', ' + phone);
    });
}

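// Fetch one index page and start a request for every listing linked from it.
function parsePage(index)
{
    var pageUrl = 'http://sz.58.com/chuzu/pn' + index;
    request({uri: pageUrl}, function(err, resp, body) {
        // body is undefined when the request fails, so bail out early.
        if (err) {
            console.error('request failed: ' + pageUrl + ': ' + err.message);
            return;
        }
        var $ = cheerio.load(body);
        var zufang = $('#infolist').children('table').eq(1).children('tr');
        zufang.each(function(i, elem) {
            var url = $(this).children().eq(1).children().eq(0).attr('href');
            // attr() returns undefined when the row has no link.
            if (url) {
                parseSpecificRoom(url);
            }
        });
    });
}

// Starts all 99 page requests immediately, without waiting for any to finish.
for (var i = 1; i < 100; i++)
    parsePage(i);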
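
The loop above starts all 99 page requests without waiting, and each page callback then starts one more request per listing, so many response bodies and parsed documents can be alive at the same time. The thread does not confirm that this is the cause of the error, but capping the number of requests in flight is a cheap thing to try. The following is a minimal sketch of such a cap; `runWithLimit` is a hypothetical helper written for this illustration, not part of cheerio or request, and it assumes `limit` is at least 1.

```javascript
// Hypothetical helper (not part of cheerio or request): runs `worker` over
// `items` with at most `limit` calls in flight at any time.
// Resolves with an array holding each worker's result (or its error).
function runWithLimit(items, limit, worker) {
    return new Promise(function (resolve) {
        var results = new Array(items.length);
        var next = 0;      // index of the next item to start
        var active = 0;    // number of workers currently running

        function launch() {
            // Every item has been started and every worker has finished.
            if (next >= items.length && active === 0) {
                return resolve(results);
            }
            // Start workers until the limit is reached or items run out.
            while (active < limit && next < items.length) {
                (function (index) {
                    active++;
                    Promise.resolve()
                        .then(function () { return worker(items[index], index); })
                        .then(
                            function (value) { results[index] = value; },
                            function (err) { results[index] = err; } // keep going on failure
                        )
                        .then(function () {
                            active--;
                            launch();
                        });
                })(next++);
            }
        }

        launch();
    });
}
```

To use it with the crawler, `parsePage` and `parseSpecificRoom` would each need to return a promise that settles when their request callback has finished; the page indexes 1 to 99 could then be passed as `items` with a small `limit` such as 5. That adaptation is an assumption about how the code might be restructured, not something proposed in the issue.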

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 19 (6 by maintainers)

Top GitHub Comments

3 reactions
mike442144 commented, Aug 31, 2016

Actually, I ran into the “out of memory” error recently and tried replacing cheerio with whacko, which solved my problem. Thanks a lot, but will cheerio fix this problem in the future?

2 reactions
luanmuniz commented, Aug 31, 2016

@mike442144 I’m working on this

Top Results From Across the Web

Cheerio memory leak - Stack Overflow
I'm using cheerio to parse the results of otodom.pl advs. I'm experiencing problems regarding memory. Node 8.10.0 and cheerio@1.0.0-rc.3.

CheerioCrawler guide - Crawlee
CheerioCrawler crawls by making plain HTTP requests to the provided URLs using the specialized got-scraping HTTP client. The URLs are fed to the...

Cheerio crawler - Apify SDK
This example demonstrates how to use CheerioCrawler to crawl a list of URLs from an external file, load each URL using a plain...

Intro To Web Scraping With Node.js & Cheerio - YouTube
In this video we will take a look at the Node.js library, Cheerio which is a jQuery like tool for the server used...

cheerio
Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API. ϟ Blazingly fast: Cheerio works...