Extend --spreadsheet to handle invisible lines
See original GitHub issueI would be nice to support this kind of tables:
$ wget https://static.healthcare.siemens.com/siemens_hwem-hwem_ssxa_websites-context-root/wcm/idc/groups/public/@global/@services/documents/download/mdaw/mtiz/~edisp/x500_1_0_conformance_statement-00074148.pdf
$ java -jar $TABULA --spreadsheet -p 26 x500_1_0_conformance_statement-00074148.pdf
Since this is a multiline table, the option --spreadsheet
is required here. Without --spreadsheet
I can extract some stuff, but it is then difficult to join cell text by hand. It would be nice if --spreadsheet
would not return an empty output.
Issue Analytics
- State:
- Created 7 years ago
- Comments:9 (6 by maintainers)
Top Results From Across the Web
Show or hide gridlines on a worksheet - Microsoft Support
Show gridlines on a worksheet · Click the sheet. · To show gridlines: On the Layout tab, under View, select the Gridlines check...
Read more >Quickly Hide Rows & Columns with Groups and ... - YouTube
In this video, you can learn how to group rows and columns in Excel so that you can quickly hide and unhide rows...
Read more >How to fill down sequence numbers skip hidden rows in Excel?
This article introduces some good tricks for filling down sequence numbers only to visible cells and skip hidden rows in Excel.
Read more >Solving the Mystery of the Hidden Rows in Excel - ThreeWill
I was able to Unhide Row 7 as shown below: I just need to figure out why my current spreadsheet will NOT allow...
Read more >How to hide and unhide rows in Excel - Ablebits
Alternatively, you can click Home tab >Format > Row Height… and type 0 in the Row Height box. Either way, the selected rows...
Read more >
Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free
Top Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Unfortunately, there are no invisible lines in this PDF, just white spaces. The “stream”/no-spreadsheet algorithm is built to use these white spaces; the spreadsheet/“lattice” algorithm needs actual line primitives in the PDF to work. There is, unfortunately, no way I know of to transform white spaces into lines in the general case into lines with sufficient confidence to use the lattice algorithm: even though our eyes can make the distinction easily, the computer can often see white lines when we know there aren’t any, e.g. “rivers”.
So, unfortunately, that PDF lacks lines, so the spreadsheet algorithm will not work on it. The “stream” mode output is the best you’re going to get. I know it’s a pain to try to combine cells, but we have not been able to come up with a general purpose method for combining them.
I’ve attached the table we’re talking about: ultrasound-26.pdf
Technically I want
Table 6 Ultrasound Image and Ultrasound Retired Image IOD Attributes
(page 24-29),Table 7 Ultrasound MultiFrame and Ultrasound MultiFrame Retired Image IOD Attributes
(page 30-36),Table 8 Secondary Capture Image IOD Attributes
(page 37-41) andTable 9 Comprehensive SR IOD Attributes
(page 41-43).