Uncaught exception with ImmoScout/ScrapingAnt

Hi,

Yesterday I tried the ImmoScout provider for the first time. Scraping worked at least once and yielded results. After a few hours, though, Fredy crashed with the following error:

node:internal/process/promises:246
          triggerUncaughtException(err, true /* fromPromise */);
          ^

Error: Request failed with status code 404
    at createError (/usr/home/.../fredy/node_modules/axios/lib/core/createError.js:16:15)
    at settle (/usr/home/.../fredy/node_modules/axios/lib/core/settle.js:17:12)
    at IncomingMessage.handleStreamEnd (/usr/home/.../fredy/node_modules/axios/lib/adapters/http.js:293:11)
    at IncomingMessage.emit (node:events:402:35)
    at endReadableNT (node:internal/streams/readable:1340:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
[...]

Does this need to be caught somewhere or am I doing something wrong?
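
A minimal sketch of what "catching it" could look like, assuming the request is awaited inside an async function. The trace above goes through triggerUncaughtException, which is how Node.js (version 15 and later, by default) terminates the process when a promise rejection has no handler. The names fakeRequest and fetchListings are hypothetical and are not part of Fredy or axios; fakeRequest only imitates the rejection axios produces for a 404.

```javascript
// Hypothetical stand-in for the HTTP call: it rejects the same way
// axios does when the server answers with a 404.
function fakeRequest(url) {
  return Promise.reject(new Error('Request failed with status code 404'));
}

// Await the request inside try/catch so a failed call is logged and
// turned into an empty result instead of an unhandled rejection.
async function fetchListings(url, request = fakeRequest) {
  try {
    const response = await request(url);
    return response.data;
  } catch (err) {
    console.error(`Request to ${url} failed: ${err.message}`);
    return [];
  }
}

// The process keeps running; the caller simply sees no listings.
fetchListings('https://example.invalid/search').then((listings) => {
  console.log(`Got ${listings.length} listings`);
});
```

Returning an empty array is only one possible policy; rethrowing after logging, or retrying, may fit better depending on how the caller treats missing results.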

Another issue I faced is:

TypeError: Cannot read properties of undefined (reading 'substring')
    at normalize (/usr/home/.../fredy/lib/provider/immoscout.js:8:58)
    at Array.map (<anonymous>)

This one is easy to fix, though: check whether o.link is defined and fall back to an empty string if it is not. Apparently some ImmoScout entries have no link, or the parsing goes wrong.
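
As a rough illustration of that guard, here is a sketch that returns an empty string when the link is missing. The helper name normalizeLink and the BASE_URL constant are hypothetical; only the URL prefix and the '/expose' lookup come from the stack trace's normalize function as quoted later in this thread.

```javascript
const BASE_URL = 'https://www.immobilienscout24.de';

// Build the absolute expose URL for a scraped entry, or return ''
// when the entry carries no usable link (undefined, null, empty, non-string).
function normalizeLink(link) {
  if (typeof link !== 'string' || link.length === 0) {
    return '';
  }
  // If '/expose' is absent, indexOf returns -1 and substring(-1)
  // behaves like substring(0), so the whole link is appended unchanged.
  return `${BASE_URL}${link.substring(link.indexOf('/expose'))}`;
}

console.log(normalizeLink(undefined) === '');
console.log(normalizeLink('/Suche/expose/123'));
```

Whether an empty string or a visible placeholder is the better fallback depends on how downstream code uses the link.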

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

1 reaction
modk commented, Nov 13, 2021

Has been running fine for almost 18 hours now so appears to be fixed. Thanks a lot.

0 reactions
modk commented, Nov 28, 2021

For the record, this is what works somewhat reliably now:

diff --git a/lib/provider/immoscout.js b/lib/provider/immoscout.js
index f7a52a4..1edea1f 100644
--- a/lib/provider/immoscout.js
+++ b/lib/provider/immoscout.js
@@ -9,7 +9,7 @@ function nullOrEmpty(val) {
 function normalize(o) {
   const title = nullOrEmpty(o.title) ? 'NO TITLE FOUND' : o.title.replace('NEU', '');
   const address = nullOrEmpty(o.address) ? 'NO ADDRESS FOUND' : (o.address || '').replace(/\(.*\),.*$/, '').trim();
-  const link = `https://www.immobilienscout24.de${o.link.substring(o.link.indexOf('/expose'))}`;
+  const link = nullOrEmpty(o.link) ? 'NO LINK FOUND' : `https://www.immobilienscout24.de${o.link.substring(o.link.indexOf('/expose'))}`;
   return Object.assign(o, { title, address, link });
 }
 
diff --git a/lib/services/requestDriver.js b/lib/services/requestDriver.js
index 89ccf44..2fb5491 100644
--- a/lib/services/requestDriver.js
+++ b/lib/services/requestDriver.js
@@ -1,7 +1,7 @@
 const axios = require('axios');
 const axiosRetry = require('axios-retry');
 
-axiosRetry(axios, { retryDelay: axiosRetry.exponentialDelay, retries: 3 });
+axiosRetry(axios, { retryDelay: axiosRetry.exponentialDelay, retries: 5 });
 
 function makeDriver(headers = {}) {
   let cookies = '';
@@ -22,14 +22,20 @@ function makeDriver(headers = {}) {
       callback(null, []);
     }
 
-    if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
-      //assume we have gotten a response from scrapingAnt
-      if (cookies.length === 0) {
-        cookies = result.data.cookies;
+    try {
+      if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
+        //assume we have gotten a response from scrapingAnt
+        if (cookies.length === 0) {
+          cookies = result.data.cookies;
+        }
+        callback(null, result.data.content);
+      } else {
+        callback(null, result.data);
       }
-      callback(null, result.data.content);
-    } else {
-      callback(null, result.data);
+
+    } catch (exception) {
+      console.error(`Error while trying to scrape data. Received error: ${exception.message}`);
+      callback(null, []);
     }
   };
 }