crawlsite.js crashes on PDFs

When the script reaches a PDF, it crashes.

Example:

(node:23872) UnhandledPromiseRejectionWarning: Error: net::ERR_ABORTED at https://code.design/files/code-design-magazine-001.pdf
    at navigate (/Users/martin/Sites/crawlsite/node_modules/puppeteer/lib/Page.js:539:37)
    at process._tickCallback (internal/process/next_tick.js:68:7)
(node:23872) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:23872) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
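
The trace above shows the rejection from the navigation call going unhandled. One way to keep the crawl alive is to catch that rejection and skip the URL. This is a minimal sketch, not the crawlsite.js code: `safeGoto` is a hypothetical helper name, and it only assumes that `page` exposes Puppeteer's `page.goto(url, options)`.

```javascript
// Hypothetical wrapper (not part of crawlsite.js).
// Resolves to the navigation response, or to null when navigation rejects,
// for example with net::ERR_ABORTED on a PDF link.
async function safeGoto(page, url, options = {}) {
  try {
    return await page.goto(url, options);
  } catch (err) {
    // Log and move on instead of leaving the rejection unhandled.
    console.warn(`Skipping ${url}: ${err.message}`);
    return null;
  }
}
```

A caller would check for `null` and continue with the next URL in the queue.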

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 2
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

aamakerlsa commented, Oct 13, 2018 (1 reaction)

I found a way around this by making this modification:

.filter(el => el.localName === 'a' && el.href && el.href.indexOf('.pdf') < 0) // element is an anchor with an href.

Basically, it checks that the href of the a tag does NOT contain .pdf.
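
A possible refinement of the filter above, as a sketch only: `indexOf('.pdf')` is case-sensitive, so it lets `.PDF` links through, and it also drops any URL that merely contains `.pdf` somewhere in the middle. The hypothetical helper `isPdfLink` below checks only the end of the URL path instead. The `anchors` array and the `https://example.com/` base are stand-ins for illustration. In the crawler the filter presumably runs in the page context, so the helper would have to be defined there too.

```javascript
// Hypothetical helper (not part of crawlsite.js):
// true when the path portion of a link ends in ".pdf", ignoring case and query strings.
function isPdfLink(href, base = 'https://example.com/') {
  try {
    const { pathname } = new URL(href, base);
    return pathname.toLowerCase().endsWith('.pdf');
  } catch (err) {
    return false; // unparsable href: treat it as not a PDF
  }
}

// Stand-ins for the elements the crawler collects.
const anchors = [
  { localName: 'a', href: 'https://code.design/files/code-design-magazine-001.pdf' },
  { localName: 'a', href: 'https://code.design/about' },
  { localName: 'a', href: 'https://code.design/report.PDF?download=1' },
  { localName: 'div', href: '' },
];

// Same shape as the filter above, with the PDF test swapped in.
const crawlable = anchors.filter(
  el => el.localName === 'a' && el.href && !isPdfLink(el.href)
);
```

This still relies on the file extension, so a PDF served from a URL without `.pdf` in its path would not be caught.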

ebidel commented, Jan 29, 2019 (0 reactions)

Not sure if that would work but you could try. You’d have to read the response body of every request though 😦
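
The suggestion this comment replies to is not shown here. Assuming it was about detecting PDFs from the response itself rather than from the URL, a body check could look like the sketch below. It relies on the fact that PDF files begin with the bytes `%PDF-`. `looksLikePdf` is a hypothetical name, and `body` is assumed to be a Node.js `Buffer` (with Puppeteer, that could come from `response.buffer()`).

```javascript
// Hypothetical helper (not part of crawlsite.js):
// true when the body starts with the PDF signature "%PDF-".
function looksLikePdf(body) {
  const signature = Buffer.from('%PDF-', 'latin1');
  return (
    body.length >= signature.length &&
    body.subarray(0, signature.length).equals(signature)
  );
}
```

As the comment notes, this means reading the body of every response, which costs more than a URL check.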
