
All Internet Archive links, used in reference testing, are broken


Apparently the Internet Archive, which we depend on for a very large number of (linked) reference test-cases, has recently changed how they serve PDF files.

Previously, a URL such as http://web.archive.org/web/20160112115354/http://www.fao.org/fileadmin/user_upload/tci/docs/2_About Stacks.pdf would return a PDF file directly. Now, however, an HTML file is returned instead (which then points to the actual PDF file). For someone cloning the PDF.js repo and attempting to set up testing for the first time, this means that all linked test-cases will now fail. It also means that we cannot use the Internet Archive when adding new test-cases.
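
One way to catch this failure mode early is to check whether a downloaded file actually looks like a PDF before treating it as a test-case. The sketch below is an assumption, not existing PDF.js code: `looksLikePdf` is a hypothetical helper, and it relies only on the fact that PDF files contain a `%PDF-` header near the start of the file, whereas the HTML page now returned by the Internet Archive does not.

```javascript
// Hypothetical helper (not part of PDF.js): does a downloaded Buffer look like a PDF?
function looksLikePdf(buffer) {
  // Many PDF readers tolerate a few junk bytes before the header, so search
  // the first 1024 bytes for "%PDF-" instead of requiring it at offset 0.
  const head = buffer.subarray(0, 1024).toString("latin1");
  return head.includes("%PDF-");
}
```

A check like this would turn a confusing rendering failure into a clear "not a PDF" error for the affected linked test-case.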

Since the HTML file returned does contain a direct link to the PDF file, embedded in an <iframe> tag, we could perhaps add special-casing for Internet Archive URLs in test/downloadutils.js, such that the HTML file is first downloaded and parsed to obtain a direct PDF link.
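
The HTML-parsing idea could look roughly like the following. This is a minimal sketch under stated assumptions: `extractIframeSrc` is a hypothetical name, the Internet Archive page is assumed to contain a single `<iframe src="...">` pointing at the PDF, and a regular expression stands in for a proper HTML parser (it does not decode entities such as `&amp;` in the attribute value).

```javascript
// Hypothetical helper (not part of PDF.js): given the HTML page returned by the
// Internet Archive, find the first <iframe src="..."> and resolve it against the
// page URL to obtain a direct link to the archived PDF.
function extractIframeSrc(html, pageUrl) {
  // A regular expression is enough for a sketch; a real implementation might
  // prefer an HTML parser, since attribute order and quoting can vary.
  const match = /<iframe\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/i.exec(html);
  if (!match) {
    return null; // No iframe found: the page layout may have changed again.
  }
  // Relative src values (e.g. "/web/...") are resolved against the page URL.
  return new URL(match[1], pageUrl).href;
}
```

The downloader would then fetch the returned URL instead of saving the HTML page, which is the "download and parse first" step described above.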

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

timvandermeij commented, Sep 17, 2017

I have one more idea for this. It’s a bit of a hybrid of the two solutions. How about detecting, in test/downloadutils.js, that we are dealing with an Internet Archive URL and performing the if_ transformation there? That way we don’t have to touch the link files (search/replace), and we can easily adjust the code if the Internet Archive changes its format again (or implement HTML parsing there later on, if that happens often). It will be quick, and it keeps the option of HTML parsing open while we avoid it for now.
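
The if_ transformation amounts to inserting the Wayback Machine's `if_` modifier directly after the timestamp, so that the archived file is served as-is rather than wrapped in an HTML page. The sketch below assumes links of the form `http(s)://web.archive.org/web/<digits>/<original URL>`; `toRawArchiveUrl` is a hypothetical name, not an existing PDF.js function.

```javascript
// Hypothetical helper for test/downloadutils.js: add the Wayback Machine "if_"
// modifier after the timestamp of an Internet Archive URL.
function toRawArchiveUrl(url) {
  // Matches e.g. http://web.archive.org/web/20160112115354/<original URL>.
  // URLs whose timestamp already carries a modifier (e.g. "...if_/") do not
  // match, and non-archive URLs are returned unchanged.
  const pattern = /^(https?:\/\/web\.archive\.org\/web\/\d+)\//;
  return url.replace(pattern, "$1if_/");
}
```

Because the rewrite happens at download time, the `.link` files keep their original URLs, and only this one function would need updating if the URL format changes again.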

timvandermeij commented, Sep 26, 2017

Yes, I’m hoping I can take a look at this before or during the weekend.

