Text in SVG pages is not aligned to unicode codepoints

See issue https://github.com/mozilla/pdf.js/issues/8546

When creating HTML pages, the text is perfectly fine. When creating SVG pages, the text looks fine, but copy/paste yields only mojibake. SVG pages can be used in <img src="/path/page.svg"> tags, which are often handier than <iframe src="/path/page.html">.

Steps to reproduce: PDF: r2l.pdf

gulp dist-install
node examples/node/pdf2svg.js /tmp/r2l.pdf
firefox svgdump/*.svg

Copy/paste the text to an editor.
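
To check the result without a browser, the code points in the generated SVG can be inspected directly. The following is a minimal sketch, not part of pdf.js: the helper names are hypothetical, the regex assumes the text sits in <tspan> elements (optionally prefixed as <svg:tspan>), and counting Private Use Area characters is only an assumption about where the garbled characters might land.

```javascript
// Hypothetical helper: collect the raw text of every <tspan>
// (optionally namespace-prefixed as <svg:tspan>) in an SVG string.
function extractTspanText(svg) {
  const re = /<(?:svg:)?tspan\b[^>]*>([^<]*)<\/(?:svg:)?tspan>/g;
  const runs = [];
  let match;
  while ((match = re.exec(svg)) !== null) {
    runs.push(match[1]);
  }
  return runs;
}

// Format each code point of a string as U+XXXX.
function toCodePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// True if the character lies in the BMP Private Use Area (U+E000..U+F8FF).
// Assumption: mis-mapped glyphs may show up in this range.
function isPrivateUse(ch) {
  const cp = ch.codePointAt(0);
  return cp >= 0xe000 && cp <= 0xf8ff;
}

// Summarize every text run: its text, its code points, and how many
// characters fall into the Private Use Area.
function summarize(svg) {
  return extractTspanText(svg).map((text) => ({
    text,
    codePoints: toCodePoints(text),
    privateUseCount: Array.from(text).filter(isPrivateUse).length,
  }));
}
```

Reading one of the files in svgdump/ with fs.readFileSync(path, "utf8") and passing the string to summarize() lists the code points per text run, which can then be compared with the text the PDF is expected to contain.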

@brendandahl commented that the required function already exists, but it is only called when creating HTML pages: https://github.com/mozilla/pdf.js/blob/ad74f6e7410420dc6ae27edc863a2ef906d77b57/src/core/fonts.js#L731

I would donate $500 for a fix. https://www.bountysource.com/issues/55452997-text-in-svg-pages-is-not-aligned-to-unicode-codepoints

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 27 (3 by maintainers)

Top GitHub Comments

fngeorgy commented, May 28, 2018 (2 reactions)

Hey there. I’ve investigated the issue and I’ll most likely post a quick fix sometime this week. If I’m correct, for most documents the problem will disappear if I straighten out both the text and the embedded fonts’ cmaps to use proper Unicode. Maybe even the HTML renderer will benefit from this, if the selection layer then matches the text shape more closely. I’ll then investigate possible edge cases, like several fonts being baked into a single font file; this will likely require splitting the font file back into pieces. Suggestions are welcome, though.
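
The text half of that idea can be sketched as follows. This is only an illustration under assumptions, not the fix that was posted: remapToUnicode and the toUnicode table are hypothetical, and the matching change to the embedded font's cmap is not shown.

```javascript
// Hypothetical sketch: rewrite a text run so that every character code
// used by the renderer is replaced with the Unicode string it stands for.
// `toUnicode` is an assumed Map from code point (number) to string;
// characters without an entry are kept unchanged.
function remapToUnicode(text, toUnicode) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return toUnicode.has(cp) ? toUnicode.get(cp) : ch;
  }).join("");
}

// Example table: two made-up codes mapped to Arabic text,
// the second one standing for two code points (a ligature).
const exampleTable = new Map([
  [0xe000, "\u0633"],
  [0xe001, "\u0644\u0627"],
]);
```

Calling remapToUnicode("\uE000\uE001", exampleTable) returns the three Arabic code points, so a copy/paste of the rewritten text would carry real Unicode. The font's cmap would have to map those same code points back to the original glyphs for the page to look unchanged.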

brendandahl commented, May 30, 2018 (1 reaction)

For the above document, I believe you’ll need to support unicode ligatures (see https://github.com/mozilla/pdf.js/blob/c2cbeaa34d81bbb7cced856e2888867df587a1fa/src/core/fonts.js#L780) and we’ll have to create a GSUB table.
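
Why ligatures complicate the mapping can be shown with plain JavaScript: one ligature code point stands for several characters, so a one-to-one cmap entry is not enough. The sketch below uses the standard String.prototype.normalize and is not pdf.js code. expandLigatures is a hypothetical helper, and limiting it to the Unicode presentation-form blocks is a choice made for this example.

```javascript
// True if the code point is in one of the Unicode presentation-form blocks:
// Alphabetic Presentation Forms (U+FB00..U+FB4F),
// Arabic Presentation Forms-A (U+FB50..U+FDFF),
// Arabic Presentation Forms-B (U+FE70..U+FEFF).
function isPresentationForm(cp) {
  return (cp >= 0xfb00 && cp <= 0xfdff) || (cp >= 0xfe70 && cp <= 0xfeff);
}

// Hypothetical helper: replace presentation-form characters (including
// ligatures) with their NFKC compatibility form; leave the rest untouched.
function expandLigatures(text) {
  return Array.from(text, (ch) =>
    isPresentationForm(ch.codePointAt(0)) ? ch.normalize("NFKC") : ch
  ).join("");
}
```

For instance, expandLigatures("\uFB01") yields the two letters "fi", and the Arabic lam-alef ligature U+FEFB expands to U+0644 followed by U+0627. Going the other way, from the character sequence back to the single ligature glyph, is the job of the font's substitution data, which is where the GSUB table mentioned above comes in.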
