A set of failing PDFs
I recently used ocrmypdf to mass-OCR my PDFs and a bunch of DjVu files I had converted to PDF (the conversion strips the original Tesseract OCR layer, so I needed some way to restore it). It worked very nicely, and I like the better compression compared with the default ddjvu output.
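For anyone reproducing this workflow, a minimal Python sketch of the two steps might look like the following. It assumes the `ddjvu` (DjVuLibre) and `ocrmypdf` command-line tools are on the `PATH` and uses only their basic invocations; the helper names `build_commands` and `djvu_to_searchable_pdf` and the `-images.pdf` intermediate name are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(djvu_path: str) -> list[list[str]]:
    """Return argv lists for DjVu -> image PDF -> OCRed PDF (hypothetical helper)."""
    src = Path(djvu_path)
    # Intermediate image-only PDF written by ddjvu; per the report above,
    # the original DjVu OCR text does not survive this conversion.
    intermediate = src.with_name(src.stem + "-images.pdf")
    final = src.with_name(src.stem + ".pdf")
    return [
        ["ddjvu", "-format=pdf", str(src), str(intermediate)],
        # ocrmypdf then adds a fresh OCR text layer to the converted PDF.
        ["ocrmypdf", str(intermediate), str(final)],
    ]


def djvu_to_searchable_pdf(djvu_path: str) -> None:
    """Run both steps; raises CalledProcessError if either tool exits nonzero."""
    for argv in build_commands(djvu_path):
        subprocess.run(argv, check=True)
```

Keeping command construction separate from execution makes it easy to print or log the exact commands for a file that later fails.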
Some files failed. I noticed the mention of a test corpus, so I thought you might like a list of the failing files (each failed on multiple attempts, so they should be reliable test cases) and the errors.
The errors:
The files:
- https://www.gwern.net/docs/cs/1988-borenstein.pdf
- https://www.gwern.net/docs/eva/1999-japanedge-childrenmiyazakicocacola.pdf
- https://www.gwern.net/docs/statistics/decision/2006-drescher-goodandreal.pdf
- https://www.gwern.net/docs/statistics/decision/1962-blackett-studiesofwarnuclearandconventional.pdf
- https://www.gwern.net/docs/statistics/decision/1987-jonker.pdf
- https://www.gwern.net/docs/statistics/causality/2006-papanikolaou.pdf
- https://www.gwern.net/docs/nicotine/1996-foulds.pdf
- https://www.gwern.net/docs/history/1989-stern.pdf
- https://www.gwern.net/docs/sunkcosts/1990-wiklund-a.pdf
- https://www.gwern.net/docs/sunkcosts/1980-dawkins.pdf
- https://www.gwern.net/docs/genetics/correlation/1996-billig.pdf
- https://www.gwern.net/docs/genetics/correlation/1995-wadsworth.pdf
- https://www.gwern.net/docs/genetics/correlation/1989-tambs.pdf
- https://www.gwern.net/docs/genetics/correlation/1988-hewitt.pdf
- https://www.gwern.net/docs/genetics/correlation/1993-petrill.pdf
- https://www.gwern.net/docs/genetics/selection/2014-blasco.pdf
- https://www.gwern.net/docs/genetics/selection/2014-yao.pdf
- https://www.gwern.net/docs/genetics/selection/1980-yoo-3.pdf
- https://www.gwern.net/docs/genetics/selection/2014-montague.pdf
- https://www.gwern.net/docs/genetics/selection/2010-stearns.pdf
- https://www.gwern.net/docs/genetics/selection/1957-clayton.pdf
- https://www.gwern.net/docs/genetics/selection/2004-dekkers.pdf
- https://www.gwern.net/docs/genetics/heritable/1995-serpell-thedomesticdog.pdf
- https://www.gwern.net/docs/genetics/heritable/1989-coon.pdf
- https://www.gwern.net/docs/genetics/heritable/1979-jensen.pdf
- https://www.gwern.net/docs/genetics/heritable/1987-behavioralgeneticsabstracts.pdf
- https://www.gwern.net/docs/genetics/heritable/1989-reed.pdf
- https://www.gwern.net/docs/genetics/heritable/1986-davis-stormoverbiology.pdf
- https://www.gwern.net/docs/genetics/heritable/1995-willis.pdf
- https://www.gwern.net/docs/genetics/heritable/2014-cronqvist.pdf
- https://www.gwern.net/docs/genetics/heritable/1985-defries.pdf
- https://www.gwern.net/docs/music-distraction/2013-pacheco-unguetti.pdf
- https://www.gwern.net/docs/iodine/2005-caldwell.pdf
- https://www.gwern.net/docs/iq/smpy/1986-brody.pdf
- https://www.gwern.net/docs/iq/2014-beaver.pdf
- https://www.gwern.net/docs/iq/2015-daniele.pdf
- https://www.gwern.net/docs/iq/2013-barnes.pdf
- https://www.gwern.net/docs/iq/2010-lynn.pdf
- https://www.gwern.net/docs/iq/2004-van-hiel.pdf
- https://www.gwern.net/docs/rl/2002-schmidhuber.pdf
- https://www.gwern.net/docs/catnip/1976-preti.pdf
- https://www.gwern.net/docs/catnip/1987-tucker.pdf
- https://www.gwern.net/docs/catnip/1976-hill.pdf
- https://www.gwern.net/docs/catnip/1979-bland.pdf
- https://www.gwern.net/docs/catnip/1998-vandenbos.pdf
- https://www.gwern.net/docs/psychology/writing/1989-hartley.pdf
- https://www.gwern.net/docs/psychology/1957-clark.pdf
- https://www.gwern.net/docs/lithium/1990-schrauzer.pdf
- https://www.gwern.net/docs/vitamind/2013-li.pdf
- https://www.gwern.net/docs/japanese/1999-keene-seedsintheheart-teika.pdf
- https://www.gwern.net/docs/japanese/1999-keene-seedsintheheart-shotetsu.pdf
- https://www.gwern.net/docs/biology/1975-southern.pdf
- https://www.gwern.net/docs/culture/2007-cohen.pdf
- https://www.gwern.net/docs/nootropics/1992-heikinheimo.pdf
- https://www.gwern.net/docs/nootropics/2010-giesbrecht.pdf
- https://www.gwern.net/docs/nootropics/2004-juliano.pdf
- https://www.gwern.net/docs/nootropics/2012-pase.pdf
- https://www.gwern.net/docs/modafinil/2004-turner.pdf
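Since the list is meant as a regression corpus, a small Python sketch for re-running it could look like this. It assumes the URLs above still resolve and that the `ocrmypdf` CLI is installed; `local_name`, `rerun_corpus`, the `corpus/` directory, and the `.ocr.pdf`/`.log` naming are hypothetical choices. Any nonzero exit code is recorded as a failure, and stderr is saved next to the input so the errors can be attached to a report.

```python
import subprocess
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Flatten the URL path into a local filename (hypothetical naming scheme)."""
    # /docs/genetics/selection/2014-yao.pdf -> genetics_selection_2014-yao.pdf
    parts = urlparse(url).path.strip("/").split("/")
    if parts and parts[0] == "docs":
        parts = parts[1:]
    return "_".join(parts)


def rerun_corpus(urls: list[str], workdir: str = "corpus") -> dict[str, int]:
    """Download each PDF, run ocrmypdf on it, and return {filename: exit code} for failures."""
    out = Path(workdir)
    out.mkdir(parents=True, exist_ok=True)
    failures: dict[str, int] = {}
    for url in urls:
        src = out / local_name(url)
        if not src.exists():  # reuse earlier downloads between runs
            urllib.request.urlretrieve(url, src)
        dst = src.with_name(src.stem + ".ocr.pdf")
        result = subprocess.run(
            ["ocrmypdf", str(src), str(dst)], capture_output=True, text=True
        )
        if result.returncode != 0:
            failures[src.name] = result.returncode
            # Keep the error output alongside the input for the bug report.
            src.with_name(src.stem + ".log").write_text(result.stderr)
    return failures
```

Folding the directory names into the filename keeps files from different directories from overwriting each other in the single working directory.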
Issue Analytics
- Created 5 years ago
- Comments: 10
Top GitHub Comments
The problem is definitely how these files are formatted. In any case, the next release should be more tolerant of PDFs with these types of errors: it will issue warnings instead.
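To illustrate the general idea of downgrading a hard error to a warning, here is a minimal Python sketch. It is not ocrmypdf's code: `MalformedPageError`, `process_pages`, and the pass-through policy (keeping an unparseable page unchanged instead of aborting) are all hypothetical.

```python
import warnings


class MalformedPageError(Exception):
    """Hypothetical error for a page whose PDF structure cannot be parsed."""


def process_pages(pages, handle_page, strict: bool = False) -> list:
    """Apply handle_page to each page; in tolerant mode, warn and keep bad pages as-is."""
    results = []
    for number, page in enumerate(pages, start=1):
        try:
            results.append(handle_page(page))
        except MalformedPageError as exc:
            if strict:
                raise  # strict mode: one bad page aborts the whole file
            warnings.warn(f"page {number}: {exc}; leaving page unchanged", RuntimeWarning)
            results.append(page)  # tolerant mode: pass the page through untouched
    return results
```

The `strict` flag keeps the old fail-fast behaviour available, which is useful when running a test corpus like the one above.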
Going by the logs, I concluded that the errors are mostly the same.
This is probably fixed in the next release, or at least the immediate cause of the stack trace is suppressed.