I don't understand how to interpret the result from getTextContent
I am using pdf.js to understand the text layout of PDF documents. However, I am having trouble understanding the information returned by getTextContent. Sometimes it appears that I have to scale the height of each item by the vertical scale in the transform, i.e., transform[3]. Other times I don't. I have no idea how to determine when I have to and when I don't. Below are two examples from different PDF documents. In the first, I should not scale the height. In the second, I should. Does anyone know how I can figure this out?
-- Result from Document 1 (In this case the font is 14pt)
{str:"one two three four",
dir:"ltr",
width:151.67308125000002,
height:20.6625,
transform:[20.6625,0,0,20.6625,110.854,651.853],
fontName:"g_d0_f1"}
{fontFamily:"sans-serif",
ascent:0.694,
descent:-0.195}
-- Result from Document 2 (In this case the font is 8pt)
{str:"some text goes here to see if 101 and 2011xx: 1–8",
dir:"ltr",
width:171.72799999999987,
height:64,
transform:[8,0,0,8,30,684.5359],
fontName:"g_d0_f1"
}
{fontFamily:"serif",
ascent:0.883,
descent:-0.217}
(This is also posted on Stack Overflow.)
Issue Analytics
- State:
- Created 7 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
Closing since the height and width calculation was wrong. This has been fixed in #10508.
See the comments in the example at https://github.com/mozilla/pdf.js/blob/master/examples/text-only/pdf2svg.js.
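For anyone stuck on a version that still reports inconsistent heights, one possible workaround is to derive the font height from the transform itself rather than relying on the `height` field. The sketch below is only an illustration, not the fix that was merged: `fontHeightFromTransform` and `describeItem` are hypothetical helper names, and it assumes the transform is the six-element matrix `[a, b, c, d, e, f]` shown in the question. It runs against the two items quoted above, so no PDF needs to be loaded.

```javascript
// Hypothetical helper (not part of pdf.js): derive the vertical scale of a
// text item from its transform [a, b, c, d, e, f] instead of trusting
// item.height. For an unrotated item this is simply transform[3].
function fontHeightFromTransform(transform) {
  const [, , c, d] = transform;
  return Math.hypot(c, d);
}

// Hypothetical helper: summarize an item and flag whether the reported
// height agrees with the height implied by the transform.
function describeItem(item) {
  const fontHeight = fontHeightFromTransform(item.transform);
  return {
    x: item.transform[4],
    y: item.transform[5],
    fontHeight,
    heightMatchesTransform: Math.abs(item.height - fontHeight) < 1e-6,
  };
}

// The two items from the question, reduced to the fields used here.
const doc1Item = {
  height: 20.6625,
  transform: [20.6625, 0, 0, 20.6625, 110.854, 651.853],
};
const doc2Item = {
  height: 64,
  transform: [8, 0, 0, 8, 30, 684.5359],
};

console.log(describeItem(doc1Item));
console.log(describeItem(doc2Item));
```

With these inputs the first item's reported height agrees with its transform, while the second does not: its reported height of 64 equals 8 × 8, which is consistent with the scale having been applied twice. Using the transform-derived value sidesteps the question of when to rescale, on the assumption that the transform itself is reliable.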