Text in SVG pages is not aligned to unicode codepoints
See original GitHub issue: https://github.com/mozilla/pdf.js/issues/8546
When creating HTML pages, the text is perfectly fine.
When creating SVG pages, the text looks fine, but copy/paste yields only mojibake.
SVG pages can be used in <img src="/path/page.svg"> tags, which are often handier than <iframe src="/path/page.html">.
Steps to reproduce (PDF: r2l.pdf):
gulp dist-install
node examples/node/pdf2svg.js /tmp/r2l.pdf
firefox svgdump/*.svg
Copy and paste the text into an editor.
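A quick way to check whether the pasted text consists of Private Use Area (PUA) code points rather than real Unicode text is to measure how much of it falls into the PUA. This is a minimal TypeScript sketch; it assumes, as the mojibake suggests, that pdf.js relocates the glyphs of converted fonts into the PUA, and the helper name `privateUseRatio` is hypothetical.

```typescript
// Unicode Private Use Area ranges (inclusive): the BMP block plus the
// Supplementary Private Use Areas A (plane 15) and B (plane 16).
const PRIVATE_USE_RANGES: Array<[number, number]> = [
  [0xe000, 0xf8ff],
  [0xf0000, 0xffffd],
  [0x100000, 0x10fffd],
];

function isPrivateUse(cp: number): boolean {
  return PRIVATE_USE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Hypothetical helper: fraction of non-whitespace code points in `text`
// that lie in a Private Use Area (0 for empty or all-whitespace input).
function privateUseRatio(text: string): number {
  let total = 0;
  let pua = 0;
  for (let i = 0; i < text.length; ) {
    // codePointAt combines a surrogate pair into one astral code point.
    const cp = text.codePointAt(i) as number;
    i += cp > 0xffff ? 2 : 1;
    if (/\s/.test(String.fromCodePoint(cp))) continue;
    total++;
    if (isPrivateUse(cp)) pua++;
  }
  return total === 0 ? 0 : pua / total;
}
```

A ratio close to 1 for text pasted from the SVG, and close to 0 for text pasted from the HTML page, would support that assumption.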
@brendandahl commented that this function already exists but is only called when creating HTML pages: https://github.com/mozilla/pdf.js/blob/ad74f6e7410420dc6ae27edc863a2ef906d77b57/src/core/fonts.js#L731
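The difference between the two output paths can be illustrated with a toy glyph record that carries both the remapped font character and the Unicode text. It is loosely modelled on the `fontChar` and `unicode` fields of pdf.js glyph objects; treat the exact shape, and the helper names `GlyphInfo` and `tspanText`, as assumptions. Emitting `unicode` into the SVG text node makes copy/paste produce real text, but on its own it only renders correctly if the embedded font's cmap also maps those Unicode values to the right glyphs.

```typescript
// Hypothetical glyph record: `fontChar` is the code the converted font's
// cmap maps to this glyph (e.g. a Private Use Area value); `unicode` is the
// text the PDF says the glyph represents (from /ToUnicode or the encoding).
interface GlyphInfo {
  fontChar: string;
  unicode: string;
}

// Escape the characters that are special in SVG/XML text content.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the text content of an SVG <tspan>. With useUnicode=false the copied
// text is whatever fontChar holds (presumably what the SVG output does today);
// with useUnicode=true the copied text is the Unicode the PDF provides.
function tspanText(glyphs: GlyphInfo[], useUnicode: boolean): string {
  return escapeXml(
    glyphs.map((g) => (useUnicode ? g.unicode : g.fontChar)).join("")
  );
}
```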
I would donate $500 for a fix. https://www.bountysource.com/issues/55452997-text-in-svg-pages-is-not-aligned-to-unicode-codepoints
Issue analytics: created 6 years ago, 27 comments (3 by maintainers).
Top GitHub Comments
Hey there. I’ve investigated the issue and will most likely post a quick fix sometime this week. If I’m correct, for most documents the problem will disappear if I straighten out both the text and the embedded fonts’ cmaps to use proper Unicode. The HTML renderer may benefit too, since its selection layer would then match the text shape more closely. I’ll then investigate possible edge cases, such as several fonts being baked into a single font file, which will likely require splitting the font file back into pieces. Suggestions are welcome, though.
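The cmap "straightening" described in this comment could look roughly like the following TypeScript sketch. It assumes a per-font table from glyph id to the Unicode string the PDF assigns to that glyph; the names `GlyphUnicodeMap`, `CmapResult` and `buildUnicodeCmap` are hypothetical, and serialising the result into an actual cmap subtable is not shown.

```typescript
// Hypothetical input: for each glyph id in the embedded font, the Unicode
// string the PDF assigns to it (undefined if the PDF gives none).
type GlyphUnicodeMap = Map<number, string | undefined>;

interface CmapResult {
  // Unicode code point -> glyph id, to be written into a rebuilt cmap.
  cmap: Map<number, number>;
  // Glyphs that could not get a proper Unicode entry and still need a
  // fallback, e.g. keeping a Private Use Area code point.
  unresolved: number[];
}

function buildUnicodeCmap(glyphs: GlyphUnicodeMap): CmapResult {
  const cmap = new Map<number, number>();
  const unresolved: number[] = [];
  glyphs.forEach((text, glyphId) => {
    // Split into code points (not UTF-16 units).
    const cps: number[] =
      text === undefined ? [] : Array.from(text, (c) => c.codePointAt(0) as number);
    // A cmap entry can express exactly one code point per glyph.
    // Reject: no Unicode, multi-code-point text (a ligature, which needs
    // GSUB), or a code point already claimed by an earlier glyph.
    if (cps.length !== 1 || cmap.has(cps[0])) {
      unresolved.push(glyphId);
      return;
    }
    cmap.set(cps[0], glyphId);
  });
  return { cmap, unresolved };
}
```

Keeping collisions in `unresolved` rather than overwriting earlier entries is one possible policy; it leaves every glyph reachable, at the cost of some text still copying as PUA characters.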
For the document above, I believe you’ll need to support Unicode ligatures (see https://github.com/mozilla/pdf.js/blob/c2cbeaa34d81bbb7cced856e2888867df587a1fa/src/core/fonts.js#L780), which means we’ll have to create a GSUB table.
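A ligature glyph whose Unicode value spans several code points cannot be reached through the cmap alone; a GSUB ligature substitution (LookupType 4) instead maps a sequence of component glyphs to it. The TypeScript sketch below derives such rules; it assumes a code point to glyph id cmap already exists for the component characters, and `LigatureRule` and `ligatureRules` are hypothetical names. Encoding the rules into a binary GSUB table is not shown.

```typescript
// A ligature-substitution rule: when the shaper sees `components`
// (glyph ids, in order) it substitutes the single glyph `ligature`.
interface LigatureRule {
  components: number[];
  ligature: number;
}

// Hypothetical helper: turn glyphs whose Unicode text is several code points
// (e.g. a glyph for U+FB01 whose ToUnicode entry is "fi") into ligature rules.
function ligatureRules(
  multi: Map<number, string>, // ligature glyph id -> its Unicode text
  cmap: Map<number, number> // code point -> glyph id
): { rules: LigatureRule[]; missing: number[] } {
  const rules: LigatureRule[] = [];
  const missing: number[] = [];
  multi.forEach((text, ligature) => {
    const components: number[] = [];
    for (const ch of Array.from(text)) {
      const gid = cmap.get(ch.codePointAt(0) as number);
      if (gid === undefined) {
        // A component character has no glyph in the font: cannot build a rule.
        missing.push(ligature);
        return;
      }
      components.push(gid);
    }
    if (components.length < 2) {
      missing.push(ligature); // single code points belong in the cmap instead
      return;
    }
    rules.push({ components, ligature });
  });
  // Ligatures in a GSUB LigatureSet are tried in order, so put longer
  // sequences first (e.g. "ffi" before "fi").
  rules.sort((a, b) => b.components.length - a.components.length);
  return { rules, missing };
}
```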