Parser: book content's corrupted or not present: 9781098122836
[#] Parser: book content's corrupted or not present: node13-ch5.html (Chapter 5: Top 5 Developer-friendly Node.js API Frameworks)
However, I can open the page in a browser without any problem:
https://learning.oreilly.com/library/view/nodejs-tools/9781098122836/Text/node13-ch5.html
Issue Analytics
- Created 3 years ago
- Comments: 10 (1 by maintainers)
Top GitHub Comments
A dead ugly workaround is to download the failing file again and then parse it in a slightly different way.

Funnily enough, the object returned by the parser has the wrong type, `Element`, and must be converted to an `HtmlElement` to match the expectations of the code that uses it later on. For this I apply `fromstring` and `tostring` conversions, which is certainly not an efficient approach, but my lxml-fu is simply too weak. In my case this code runs rarely enough and is fast enough that I don't care.

Because the whole thing is so cheesy and I don't even understand the root cause, I don't plan to create an MR. So the next best thing is to provide a patch. To apply the patch, store it in a file and apply it with `git apply <patch file>` on the safaribooks git repo. If the patch fails to apply, consider checking out version af22b43c1 or a sufficiently compatible version and try again.

Limitation: Because I use the path `/tmp`, the hack only works on *nix-based systems (incl. Macs); I didn't bother to use `StringIO` or at least the Pythonic temporary file module.

Please upgrade lxml to the latest version. In my case, lxml<=4.4.2 can't parse HTML content that contains mathematical Unicode characters (https://stackoverflow.com/questions/69334692/lxml-can-not-parse-html-fragment-contains-certain-unicode-character).
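
For readers who want to try the ideas from the two comments above without the patch, here is a minimal sketch. It is not the patch itself and not the safaribooks code: the function names `parse_as_html_element` and `lxml_is_newer_than_4_4_2` are hypothetical, and the assumption that the "slightly different way to parse" means lxml's generic `etree.HTMLParser` is mine. The sketch does the `tostring`/`fromstring` round trip in memory, so no file under `/tmp` is needed.

```python
from lxml import etree, html


def parse_as_html_element(raw: bytes) -> html.HtmlElement:
    """Parse HTML with the generic etree parser, then convert the result.

    Hypothetical helper: assumes the alternative parse path is etree.HTMLParser.
    """
    # Step 1: the generic parser returns a plain etree element, not an HtmlElement.
    generic_root = etree.fromstring(raw, parser=etree.HTMLParser())
    # Step 2: serialize to bytes and re-parse with lxml.html to get an HtmlElement.
    # This is the inefficient tostring/fromstring round trip the comment describes.
    serialized = etree.tostring(generic_root, method="html")
    return html.fromstring(serialized)


def lxml_is_newer_than_4_4_2() -> bool:
    """Report whether the installed lxml is newer than 4.4.2 (the version cited above)."""
    # etree.LXML_VERSION is a tuple of integers, e.g. (major, minor, patch, build).
    return tuple(etree.LXML_VERSION[:3]) > (4, 4, 2)
```

If `lxml_is_newer_than_4_4_2()` returns `False`, upgrading lxml (for example with `pip install --upgrade lxml`) is probably the cleaner fix, as the second comment suggests; the round-trip conversion is only a stopgap.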