Multiline cells not well detected with guess mode
Hello,
I’m trying to use your fantastic piece of software to analyse the table on page 1 of this PDF: input.pdf
I’m using the latest version of tabula-java. I built it from the git repository with mvn (as specified in readme.md).
I run this command line:
java -jar target/tabula-1.0.2-SNAPSHOT-jar-with-dependencies.jar -g input.pdf > output.csv
I get the following output: output.txt
But I expected this: output_expected.txt
Do you think that is possible?
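For readers hitting the same problem, one possible workaround is to post-process the extracted rows rather than change the detection. The sketch below is only an illustration: it assumes that the unwanted output splits each multiline cell across several rows and that the continuation rows have an empty first column. The attached output.txt is not reproduced here, so that layout is an assumption, the sample data is invented, and `MultilineRowMerger` is a hypothetical helper that is not part of tabula-java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of tabula-java.
// Assumption: a row whose first cell is blank continues the row above it.
class MultilineRowMerger {

    // Folds every continuation row into the previous row, joining the
    // cell texts column by column with a single space.
    static List<List<String>> mergeContinuationRows(List<List<String>> rows) {
        List<List<String>> merged = new ArrayList<>();
        for (List<String> row : rows) {
            boolean continuation = !merged.isEmpty()
                    && !row.isEmpty()
                    && row.get(0).trim().isEmpty();
            if (!continuation) {
                // Copy the row so the input lists are never modified.
                merged.add(new ArrayList<>(row));
                continue;
            }
            List<String> previous = merged.get(merged.size() - 1);
            for (int col = 0; col < row.size(); col++) {
                String extra = row.get(col).trim();
                if (extra.isEmpty()) {
                    continue;
                }
                // Pad the previous row if the continuation row is wider.
                while (previous.size() <= col) {
                    previous.add("");
                }
                String base = previous.get(col).trim();
                previous.set(col, base.isEmpty() ? extra : base + " " + extra);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Invented sample rows, standing in for parsed CSV output.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "first line", "10"),
                Arrays.asList("", "second line", ""),
                Arrays.asList("2", "single line", "20"));
        for (List<String> row : mergeContinuationRows(rows)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

With the sample rows above, the second row is folded into the first, so two rows remain. The heuristic breaks down for tables whose first column is legitimately empty in some rows, so it would need adapting to the actual layout.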
Issue Analytics
- State:
- Created 6 years ago
- Comments:5 (2 by maintainers)
Tabula itself! Get it here: http://tabula.technology — The script export format contains the area of interest:
Hello there, I’ve come across a similar scenario: detecting tables that contain multiline cells. I tried the idea above. Although it works very well in the tool (on Mac OS X), I’m unable to reproduce the exact results using tabula.jar.
I tried both the jar freshly built from source and the one that ships with Tabula.app (Tabula.app/Contents/Java). In both cases the observation is consistent: the last row is missed, while the rest of the results come out well.
I also tried to achieve a similar workflow by tweaking the code to handle the -g and -l options together, so that it first guesses the rectangles and then processes them with the SpreadsheetDetection algorithm, but the observation is still the same: the last rows get skipped.
Please suggest how I could combine the guess option with the lattice way of extraction.