out of memory when using cheerio in crawler
Hi, I'm using cheerio to parse HTML pages in a simple crawler, shown below. The process quickly runs out of memory after processing a few dozen pages, even though my computer has more than 4GB of free memory. I notice that cheerio has a load operation. Do I need to unload the page explicitly, or is there some other way to make cheerio release the memory once processing finishes?
var cheerio = require('cheerio');
var request = require('request');

function parseSpecificRoom(url)
{
    request({uri: url}, function(err, resp, body) {
        var $ = cheerio.load(body);
        var price = $('.house-price').text();
        var pay = $('.pay-method').text();
        var type = $('.house-type').text().replace(/\s/g, '');
        var location = $('.xiaoqu').text().replace(/\s/g, '');
        var phone = $('.tel-num').text().replace(/\s/g, '');
        console.log(price + ', ' + pay + ', ' + type + ', ' + location + ', ' + phone)
    });
}

function parsePage(index)
{
    request({uri:'http://sz.58.com/chuzu/pn' + index}, function(err, resp, body) {
        var $ = cheerio.load(body);
        var zufang = $('#infolist').children('table').eq(1).children('tr');
        zufang.each(function(i, elem) {
            var url = $(this).children().eq(1).children().eq(0).attr('href');
            parseSpecificRoom(url)
        });
    });
}

for(var i = 1; i < 100; i++)
    parsePage(i);
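
One thing the snippet does regardless of the parser: the loop starts all 99 page requests without waiting for any of them, and each page then starts one more request per listing row, so many response bodies and parsed documents can be alive at once. The thread does not confirm that this is the cause of the memory growth, so treat the following only as a minimal sketch of capping how many tasks run at a time. `runWithLimit` is a hypothetical helper, not part of cheerio or request, and `fetchAndParse` in the usage comment is likewise an assumed name.

```javascript
// Hypothetical helper (not part of cheerio or request): runs `worker` over
// `items` with at most `limit` tasks in flight at any time, and resolves
// with the results in the same order as `items`.
function runWithLimit(items, limit, worker) {
  limit = Math.max(1, limit); // guard against a zero or negative limit
  return new Promise(function (resolve, reject) {
    var results = new Array(items.length);
    var next = 0;       // index of the next item to start
    var active = 0;     // number of tasks currently running
    var failed = false; // set once any task rejects

    if (items.length === 0) {
      resolve(results);
      return;
    }

    function launch() {
      // Start tasks until the limit is reached or no items remain.
      while (!failed && active < limit && next < items.length) {
        (function (index) {
          active++;
          Promise.resolve()
            .then(function () { return worker(items[index], index); })
            .then(function (value) {
              results[index] = value;
              active--;
              if (next >= items.length && active === 0) {
                resolve(results);
              } else {
                launch();
              }
            })
            .catch(function (err) {
              failed = true;
              reject(err);
            });
        })(next++);
      }
    }

    launch();
  });
}

// Usage sketch (assumption): fetchAndParse would wrap request(...) and
// cheerio.load(...) in a Promise and return only the extracted strings.
// runWithLimit(pageIndexes, 5, fetchAndParse).then(function (rows) { ... });
```

Returning only the extracted strings from the worker, rather than the loaded document, means each parsed page can be garbage-collected as soon as its task finishes.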
Issue Analytics
- State:
- Created 8 years ago
- Comments:19 (6 by maintainers)
Top GitHub Comments
Actually, I ran into the "out of memory" error recently and tried replacing cheerio with whacko, which solved my problem. Thanks a lot, but will cheerio solve this problem in the future?

@mike442144 I'm working on this.