
Extend --spreadsheet to handle invisible lines


It would be nice to support this kind of table:

$ wget https://static.healthcare.siemens.com/siemens_hwem-hwem_ssxa_websites-context-root/wcm/idc/groups/public/@global/@services/documents/download/mdaw/mtiz/~edisp/x500_1_0_conformance_statement-00074148.pdf
$ java -jar $TABULA --spreadsheet -p 26 x500_1_0_conformance_statement-00074148.pdf

Because the table has multi-line cells, the --spreadsheet option is required here. Without --spreadsheet I can extract some of the content, but joining the cell text by hand is then difficult. It would be nice if --spreadsheet did not return empty output.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
jeremybmerrill commented, Nov 27, 2016

Unfortunately, there are no invisible lines in this PDF, just white space. The “stream”/no-spreadsheet algorithm is built to use that white space; the spreadsheet/“lattice” algorithm needs actual line primitives in the PDF to work. There is, unfortunately, no way I know of to turn white space into lines, in the general case, with enough confidence to use the lattice algorithm: even though our eyes make the distinction easily, the computer can often see white lines where we know there aren’t any, e.g. “rivers” of space running through ordinary text.
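
To make the “rivers” problem concrete, here is a minimal Python sketch of whitespace-gap detection. It is not Tabula's stream algorithm: it assumes each text line is a monospaced string (a real PDF gives positioned glyphs instead), and `blank_columns` and `gap_runs` are hypothetical helpers. It reports the character positions that are blank in every line, roughly the evidence a whitespace-based extractor has to work with.

```python
def blank_columns(lines):
    """Return character positions that are a space in every line.

    Shorter lines are right-padded with spaces, so positions past the end
    of a line count as blank for that line.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    return [i for i in range(width) if all(row[i] == " " for row in padded)]


def gap_runs(positions):
    """Group consecutive positions into inclusive (start, end) runs."""
    runs = []
    for pos in positions:
        if runs and pos == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], pos)  # extend the current run
        else:
            runs.append((pos, pos))  # start a new run
    return runs


# A tiny made-up table: one real gap between the two columns.
table = [
    "Name   Size",
    "alpha  10",
    "beta   200",
]

# Made-up prose whose word boundaries happen to line up (a "river").
prose = [
    "one two six",
    "red big hot",
    "cat dog pig",
]

print(gap_runs(blank_columns(table)))  # [(5, 6)]
print(gap_runs(blank_columns(prose)))  # [(3, 3), (7, 7)]
```

Both inputs produce gaps: the table's single gap is real, while the two gaps in the prose come only from word boundaries that happen to align. Gap detection alone cannot tell them apart, which is why promoting such gaps to ruling lines for the lattice algorithm would not be reliable in general.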

So, because that PDF lacks lines, the spreadsheet algorithm will not work on it. The “stream” mode output is the best you’re going to get. I know it’s a pain to try to combine cells, but we have not been able to come up with a general-purpose method for combining them.
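
For the cell-joining step, one simple heuristic, offered here as an assumption and not as anything Tabula provides, is to treat a row whose first cell is empty as the continuation of the previous row and append its text cell by cell. The minimal Python sketch below assumes the stream-mode output has been saved as CSV; `merge_continuation_rows` and the sample data are hypothetical.

```python
import csv
import io


def merge_continuation_rows(rows):
    """Fold rows whose first cell is empty into the preceding row.

    Heuristic only: a genuine row with an empty first cell is merged by
    mistake, and a wrapped first column starts a new row instead.
    """
    merged = []
    for row in rows:
        if not row:
            continue  # csv.reader yields [] for blank lines
        if merged and not row[0].strip():
            prev = merged[-1]
            # Pad the previous row so every continuation cell has a slot.
            prev.extend([""] * (len(row) - len(prev)))
            for i, cell in enumerate(row):
                if cell.strip():
                    prev[i] = f"{prev[i]} {cell.strip()}".strip()
        else:
            merged.append(list(row))
    return merged


# Made-up stream-style output: the Description cell wraps onto a second line.
sample = (
    "Name,Code,Description\n"
    "Widget,A1,first half of\n"
    ",,the text\n"
    "Gadget,B2,single line\n"
)

rows = list(csv.reader(io.StringIO(sample)))
for row in merge_continuation_rows(rows):
    print(row)
# ['Name', 'Code', 'Description']
# ['Widget', 'A1', 'first half of the text']
# ['Gadget', 'B2', 'single line']
```

The heuristic fails whenever a genuine row has an empty first cell, or when the first column's own text wraps onto the next line, which illustrates why a general-purpose merging method is hard to get right.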

I’ve attached the table we’re talking about: ultrasound-26.pdf

0 reactions
malaterre commented, Nov 25, 2016

Specifically, I want Table 6 (Ultrasound Image and Ultrasound Retired Image IOD Attributes, pages 24-29), Table 7 (Ultrasound MultiFrame and Ultrasound MultiFrame Retired Image IOD Attributes, pages 30-36), Table 8 (Secondary Capture Image IOD Attributes, pages 37-41) and Table 9 (Comprehensive SR IOD Attributes, pages 41-43).
