I don't understand how to interpret the result from getTextContent


I am using pdf.js to understand the text layout of PDF documents. However, I am having trouble understanding the information returned by getTextContent. Sometimes it appears that I have to scale the height of each item by the vertical scale in the transform, i.e., transform[3]. Other times I don't. I have no idea how to determine when I have to and when I don't. Below are two examples from different PDF documents. In the first, I should not scale the height. In the second, I should. Does anyone know how I can figure this out?

-- Result from Document 1 (In this case the font is 14pt)

{str:"one two three four",
 dir:"ltr",
 width:151.67308125000002,
 height:20.6625,
 transform:[20.6625,0,0,20.6625,110.854,651.853],
 fontName:"g_d0_f1"}

{fontFamily:"sans-serif",
 ascent:0.694,
 descent:-0.195}

-- Result from Document 2 (In this case the font is 8pt)

{str:"some text goes here to see if 101 and 2011xx: 1–8",
 dir:"ltr",
 width:171.72799999999987,
 height:64,
 transform:[8,0,0,8,30,684.5359],
 fontName:"g_d0_f1"
}

{fontFamily:"serif",
 ascent:0.883,
 descent:-0.217}

(This is also posted on Stack Overflow.)

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
timvandermeij commented, Jan 29, 2019

Closing since the height and width calculation was wrong. This has been fixed in #10508.
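
For anyone reading items produced by a build that predates that fix, one possible workaround is to take the vertical scale from `transform` and not rely on `height`. The sketch below is only an illustration using the two items from the question. `verticalScale` and `heightToScaleRatio` are hypothetical helper names, not part of the pdf.js API, and the code assumes `transform` is the usual six-element matrix `[a, b, c, d, e, f]`, where `c` and `d` carry the vertical scaling.

```javascript
// Hypothetical helper (not a pdf.js function): vertical scale of a text item,
// taken from the c and d entries of its transform [a, b, c, d, e, f].
// Using hypot keeps the result meaningful if the text is rotated.
function verticalScale(item) {
  const [, , c, d] = item.transform;
  return Math.hypot(c, d);
}

// Hypothetical helper: how the reported height relates to the vertical scale.
// A ratio near 1 suggests height is already in scaled units.
function heightToScaleRatio(item) {
  return item.height / verticalScale(item);
}

// The two items quoted in the question (only the fields used here).
const doc1Item = {
  height: 20.6625,
  transform: [20.6625, 0, 0, 20.6625, 110.854, 651.853],
};
const doc2Item = {
  height: 64,
  transform: [8, 0, 0, 8, 30, 684.5359],
};

for (const [name, item] of [["Document 1", doc1Item], ["Document 2", doc2Item]]) {
  console.log(name, "scale:", verticalScale(item), "ratio:", heightToScaleRatio(item));
}
```

With these samples the reported `height` matches the vertical scale in Document 1, while in Document 2 it is the scale multiplied by itself (8 × 8 = 64). That mismatch fits the maintainer's note that the height calculation was wrong, though the sketch does not reproduce the actual fix.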


